Data Standardization Question

Posted: Mon Apr 01, 2019 11:19 pm UTC
by pogrmman
Hi everyone.
I’m doing a project on the urban heat island effect. To that end, I’ve got daily temperature data at different climate sites for the 100 largest metro areas from 1966-2015. I’m using census data to get the populations of each metro area during each decade.

The thing is, there’s a very noticeable effect from global warming: after standardizing temperature data for each climate site based on averages over the whole timescale, I’m getting an R-squared of nearly 30% just from comparing decade to standardized temps. Some of that is due to an increasing urban heat island effect from growing population, but it exists pretty strongly even for the most rural climate sites I’ve got data from (35-50km from metro center), where the urban heat island is lessened.

Sure, some metro areas are big enough that a station that far out might still be within a dense area, but some of the ones showing this certainly aren’t. For instance, I’m pretty sure the station in Claremore, OK outside of Tulsa is in a pretty darn rural area.

Is there a good way to separate the global warming effect from the population effect, or is that unrealistic?
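
For readers following along, here is a minimal sketch of the per-site standardization and the decade-versus-temperature R-squared described above. It is not the poster's actual code: the record layout `(site, decade, temp)` and the function names `standardize_by_site` and `r_squared` are assumptions made for illustration, and it assumes every site has some variation in temperature (otherwise the standard deviation is zero).

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_site(records):
    """Convert temperatures to z-scores using each site's own mean and
    standard deviation over the whole period.

    records: iterable of (site, decade, temp) tuples (hypothetical layout).
    Returns a list of (site, decade, z_score) tuples.
    """
    records = list(records)
    temps_by_site = defaultdict(list)
    for site, _decade, temp in records:
        temps_by_site[site].append(temp)

    # Per-site mean and population standard deviation.
    stats = {s: (mean(t), pstdev(t)) for s, t in temps_by_site.items()}

    standardized = []
    for site, decade, temp in records:
        mu, sigma = stats[site]
        standardized.append((site, decade, (temp - mu) / sigma))
    return standardized


def r_squared(xs, ys):
    """R-squared of a simple linear regression of ys on xs
    (the squared Pearson correlation)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

Calling `r_squared` on the decade and z-score columns of the standardized records gives the share of variance in standardized temperature that a linear trend over decades accounts for. That share mixes the warming trend with any population-driven trend, which is exactly the confound being asked about.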

Re: Data Standardization Question

Posted: Tue Apr 02, 2019 2:32 pm UTC
by Sizik
Have a control group of climate sites in rural areas that shouldn't be affected by the effect you're testing for?

Re: Data Standardization Question

Posted: Wed Apr 03, 2019 6:35 am UTC
by pogrmman
Sizik wrote: Have a control group of climate sites in rural areas that shouldn't be affected by the effect you're testing for?

The biggest reason I haven't done that is I have no need or want for additional data! The whole, clean dataset is already quite large (~2GB) and it's already made certain things sluggish to calculate... I'm wondering if just taking the most rural 10 or 15% of the stations in my dataset, removing the effect on temperature predicted by population and distance from the metro area, then averaging those all out by decade, to create a baseline for each decade would be a valid technique.
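
A rough sketch of the baseline idea in the post above, under stated assumptions: records are `(site, decade, std_temp, population, distance_km)` tuples, "most rural" means farthest from the metro center, and `urban_effect` is a hypothetical, already-fitted function that returns the temperature offset predicted from population and distance. None of these names come from the original analysis, and whether the approach is statistically valid is the open question in this thread.

```python
import math
from collections import defaultdict
from statistics import mean


def rural_baseline(records, urban_effect, rural_fraction=0.15):
    """Estimate a per-decade baseline from the most rural stations.

    records: iterable of (site, decade, std_temp, population, distance_km)
             tuples (hypothetical layout).
    urban_effect: callable (population, distance_km) -> predicted offset;
                  assumed to have been fitted elsewhere.
    rural_fraction: share of sites, ranked by distance, treated as rural.
    Returns {decade: mean residual temperature}.
    """
    records = list(records)

    # Rank sites by distance from the metro center, farthest first.
    site_distance = {site: dist for site, _, _, _, dist in records}
    ranked = sorted(site_distance, key=site_distance.get, reverse=True)
    n_rural = max(1, math.ceil(rural_fraction * len(ranked)))
    rural_sites = set(ranked[:n_rural])

    # Subtract the predicted urban offset, then pool residuals by decade.
    residuals = defaultdict(list)
    for site, decade, temp, population, dist in records:
        if site in rural_sites:
            residuals[decade].append(temp - urban_effect(population, dist))

    return {decade: mean(vals) for decade, vals in sorted(residuals.items())}
```

Subtracting the returned baseline from every station's standardized temperature for the matching decade would then leave an anomaly relative to the rural trend. The result is only as good as `urban_effect`: if that fit itself absorbs part of the warming trend, the baseline inherits the error.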

Re: Data Standardization Question

Posted: Wed Apr 03, 2019 10:40 pm UTC
by gmalivuk
pogrmman wrote:
Sizik wrote: Have a control group of climate sites in rural areas that shouldn't be affected by the effect you're testing for?

The biggest reason I haven't done that is I have no need or want for additional data!

I mean, it sounds a lot like you *do* need additional data because otherwise you have no good way to remove the biggest confounder from your analysis.