
@chalg
Last active April 30, 2018 06:21
Daily maximum and minimum temperature trends for selected Bureau of Meteorology (BOM) weather stations.

BOM maximum and minimum temperature trends

1. Data preparation

Perform imports of required packages. Read the daily maximum and minimum temperature CSV files downloaded from the BOM website into pandas dataframes, then concatenate them into one dataframe of daily maximum temperatures and one of daily minimum temperatures. Data source: http://www.bom.gov.au/climate/data/index.shtml

# Perform imports, set styles
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
import matplotlib as mpl
%pylab inline
%config InlineBackend.figure_format = 'retina'
sns.set(font_scale = 1.25)
plt.style.use('bmh')

# BOM station numbers, as they appear in the downloaded CSV file names
stations = ['66062', '40043', '76031', '23034', '14015', '9021', '036031', '016001', '90015']

# Read each station's daily maximum temp CSV file (product IDCJAC0010) and concatenate them
temp_data_max = pd.concat([pd.read_csv('IDCJAC0010_{}_1800_Data.csv'.format(s))
                           for s in stations])

# Read each station's daily minimum temp CSV file (product IDCJAC0011) and concatenate them
temp_data_min = pd.concat([pd.read_csv('IDCJAC0011_{}_1800_Data.csv'.format(s))
                           for s in stations])
Populating the interactive namespace from numpy and matplotlib

Read in an additional file, because station names are more meaningful than station numbers. The file also holds location data (latitude, longitude and elevation) that could be used later if necessary.

# Read in temp_geo subset, which contains meaningful weather station names and location data if required
temp_geo = pd.read_csv('temp_geo.csv')
temp_geo
   Bureau of Meteorology station number  Station                    Lat     Lon     Elevation
0  66062                                 Sydney (Observatory Hill)  -33.86  151.21  39
1  40043                                 Cape Moreton Lighthouse    -27.03  153.47  100
2  76031                                 Mildura Airport            -34.24  142.09  50
3  90015                                 Cape Otway Lighthouse      -38.86  143.51  82
4  23034                                 Adelaide Airport           -34.95  138.52  2
5  9021                                  Perth Airport              -31.93  115.98  15
6  14015                                 Darwin Airport             -12.42  130.89  30
7  36031                                 Longreach Aero             -23.44  144.28  192
8  16001                                 Woomera Aerodrome          -31.16  136.81  167

Merge the minimum and maximum temperature data, then add the station details from temp_geo.

# Merge max and min temperature dataframes...
temp_data_all = pd.merge(temp_data_max, temp_data_min, how='left',
                         left_on=['Bureau of Meteorology station number', 'Year', 'Month', 'Day'],
                        right_on=['Bureau of Meteorology station number', 'Year', 'Month', 'Day'])
# Merge station subset
temp_data_all = pd.merge(temp_data_all, temp_geo, how='left', 
                         left_on='Bureau of Meteorology station number',
                        right_on='Bureau of Meteorology station number')
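To show what how='left' does in the merges above, here is a minimal sketch on two tiny, made-up frames (the short column names station, year, month, day and the values are hypothetical stand-ins for the BOM columns). Every row of the maximum-temperature frame is kept, and a day with no matching minimum reading gets NaN. When the key columns have the same name in both frames, passing on= is equivalent to passing identical left_on= and right_on= lists.

```python
import pandas as pd

# Hypothetical daily maximum and minimum readings for one station
max_df = pd.DataFrame({'station': [66062, 66062, 66062],
                       'year': [1859, 1859, 1859],
                       'month': [1, 1, 1],
                       'day': [1, 2, 3],
                       'max_temp': [24.4, 24.4, 24.2]})
min_df = pd.DataFrame({'station': [66062, 66062],
                       'year': [1859, 1859],
                       'month': [1, 1],
                       'day': [1, 3],
                       'min_temp': [14.5, 15.3]})

# Left merge on the shared key columns: keep every max_df row and attach
# min_temp wherever a matching (station, year, month, day) row exists
merged = pd.merge(max_df, min_df, how='left',
                  on=['station', 'year', 'month', 'day'])
print(merged)
```

One consequence worth keeping in mind: days that appear only in the minimum-temperature files are dropped by a left merge, whereas how='outer' would keep them.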

Review the combined dataframe.

temp_data_all.head()
Product code_x Bureau of Meteorology station number Year Month Day Maximum temperature (Degree C) Days of accumulation of maximum temperature Quality_x Product code_y Minimum temperature (Degree C) Days of accumulation of minimum temperature Quality_y Station Lat Lon Elevation
0 IDCJAC0010 66062 1859 1 1 24.4 NaN Y IDCJAC0011 14.5 NaN Y Sydney (Observatory Hill) -33.86 151.21 39
1 IDCJAC0010 66062 1859 1 2 24.4 1.0 Y IDCJAC0011 15.7 1.0 Y Sydney (Observatory Hill) -33.86 151.21 39
2 IDCJAC0010 66062 1859 1 3 24.2 1.0 Y IDCJAC0011 15.3 1.0 Y Sydney (Observatory Hill) -33.86 151.21 39
3 IDCJAC0010 66062 1859 1 4 24.7 1.0 Y IDCJAC0011 17.4 1.0 Y Sydney (Observatory Hill) -33.86 151.21 39
4 IDCJAC0010 66062 1859 1 5 24.6 1.0 Y IDCJAC0011 16.9 1.0 Y Sydney (Observatory Hill) -33.86 151.21 39
print (temp_data_all.shape)
(302845, 16)

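Add an aggregation column called 'decade' to the dataframe. Then clean up by dropping columns that are not needed for this analysis (the product codes, days of accumulation and quality flags, which the merge duplicated with _x and _y suffixes).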

# Create decade variable for grouping BOM temperature data
# Take a quick look at the dataframe

temp_data_all['decade'] = 10 * (temp_data_all['Year'] // 10)
temp_data_all['decade'] = temp_data_all['decade'].astype(str) + 's' 
print (temp_data_all.shape)
temp_data_all.head()
(302845, 17)
Product code_x Bureau of Meteorology station number Year Month Day Maximum temperature (Degree C) Days of accumulation of maximum temperature Quality_x Product code_y Minimum temperature (Degree C) Days of accumulation of minimum temperature Quality_y Station Lat Lon Elevation decade
0 IDCJAC0010 66062 1859 1 1 24.4 NaN Y IDCJAC0011 14.5 NaN Y Sydney (Observatory Hill) -33.86 151.21 39 1850s
1 IDCJAC0010 66062 1859 1 2 24.4 1.0 Y IDCJAC0011 15.7 1.0 Y Sydney (Observatory Hill) -33.86 151.21 39 1850s
2 IDCJAC0010 66062 1859 1 3 24.2 1.0 Y IDCJAC0011 15.3 1.0 Y Sydney (Observatory Hill) -33.86 151.21 39 1850s
3 IDCJAC0010 66062 1859 1 4 24.7 1.0 Y IDCJAC0011 17.4 1.0 Y Sydney (Observatory Hill) -33.86 151.21 39 1850s
4 IDCJAC0010 66062 1859 1 5 24.6 1.0 Y IDCJAC0011 16.9 1.0 Y Sydney (Observatory Hill) -33.86 151.21 39 1850s
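The decade label is plain integer arithmetic: floor division by 10 drops the last digit of the year, and multiplying by 10 restores the magnitude. A minimal sketch on a few example years:

```python
import pandas as pd

# A few example years, for illustration only
years = pd.Series([1859, 1860, 1969, 2017])

# 1859 // 10 == 185, and 10 * 185 == 1850
decade = 10 * (years // 10)
labels = decade.astype(str) + 's'
print(labels.tolist())  # ['1850s', '1860s', '1960s', '2010s']
```

Because every year in this data has four digits, these string labels also sort in chronological order, which is why the decade-indexed pivot tables come out in the right order.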
# Perform clean-up by dropping unnecessary columns from temp_data_all
print (temp_data_all.shape)
temp_data_all = temp_data_all.drop(['Product code_x', 'Product code_y', 'Days of accumulation of maximum temperature',
                                   'Days of accumulation of minimum temperature', 'Quality_x', 'Quality_y'], axis=1) 
print (temp_data_all.shape)
(302845, 17)
(302845, 11)
# Convert column headers to lowercase and remove spaces
temp_data_all.columns = temp_data_all.columns.str.lower().str.replace(' ', '_')

# Abbreviate long names..
temp_data_all.rename(columns={'bureau_of_meteorology_station_number' : 'station_number',
'maximum_temperature_(degree_c)' : 'max_temp',
'minimum_temperature_(degree_c)' : 'min_temp'}, inplace=True)
# Check
print (temp_data_all.columns)
Index(['station_number', 'year', 'month', 'day', 'max_temp', 'min_temp',
       'station', 'lat', 'lon', 'elevation', 'decade'],
      dtype='object')

Although the counts of missing values below look large, they are small relative to the 302,845 rows (about 1.6% for max_temp and 1.5% for min_temp). I have decided to drop missing values where necessary rather than impute them, as imputation would be unlikely to improve accuracy much.

# Check the number of NaNs (missing data)
print (temp_data_all.isnull().sum())
station_number       0
year                 0
month                0
day                  0
max_temp          4959
min_temp          4427
station              0
lat                  0
lon                  0
elevation            0
decade               0
dtype: int64
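Missing max_temp and min_temp values do not always fall on the same days, so calling dropna() on the whole frame (as the boxplot cells later do) also discards valid readings of one variable whenever the other is missing. A minimal sketch of dropping only where necessary, using dropna(subset=...) on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps on different days
df = pd.DataFrame({'max_temp': [24.4, np.nan, 24.2, 24.7],
                   'min_temp': [14.5, 15.7, np.nan, 17.4]})

print(df.isnull().sum())  # one NaN in each column

all_complete = df.dropna()                    # rows with no NaN in any column
max_only = df.dropna(subset=['max_temp'])     # rows usable for max_temp analysis

print(len(all_complete), len(max_only))  # 2 3
```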
# Drop 2018 values to avoid temperature distortions because it is not a full year of data
temp_data_all = temp_data_all[(temp_data_all['year'] >= 1859) & (temp_data_all['year'] < 2018)]
# Check
temp_data_all.tail()
station_number year month day max_temp min_temp station lat lon elevation decade
302764 90015 2017 12 27 33.8 17.3 Cape Otway Lighthouse -38.86 143.51 82 2010s
302765 90015 2017 12 28 19.0 17.7 Cape Otway Lighthouse -38.86 143.51 82 2010s
302766 90015 2017 12 29 17.8 13.8 Cape Otway Lighthouse -38.86 143.51 82 2010s
302767 90015 2017 12 30 17.8 14.4 Cape Otway Lighthouse -38.86 143.51 82 2010s
302768 90015 2017 12 31 18.7 14.1 Cape Otway Lighthouse -38.86 143.51 82 2010s

2. Time Series Visualisation

Use the pivot_table function to reshape the data into a wide format (one row per year, one column per station), which suits time series plotting.

The first line plot below shows an upward trend in mean maximum temperatures, particularly from the mid-1940s to the end of 2017. The second plot zooms in on the period from 1946 onwards.
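To make the reshaping concrete, a minimal sketch with a handful of made-up readings (hypothetical values, not BOM data): pivot_table turns one row per observation into one row per year and one column per station, averaging repeated year/station pairs. Calling .plot() on such a frame draws one line per column, which is why each station gets its own line.

```python
import pandas as pd

# Hypothetical long-format data: one row per observation
long_df = pd.DataFrame({
    'station': ['Darwin Airport', 'Darwin Airport', 'Darwin Airport',
                'Perth Airport', 'Perth Airport'],
    'year': [2016, 2016, 2017, 2016, 2017],
    'max_temp': [32.0, 33.0, 32.5, 25.0, 26.0],
})

# Wide format: index = year, columns = station, values = mean max_temp
wide = long_df.pivot_table('max_temp', index='year', columns='station', aggfunc='mean')
print(wide)
```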

# Plot mean maximum temperatures over time.
temp_data_all.pivot_table('max_temp',
                          index='year',
                          columns='station',
                          aggfunc='mean').plot(figsize=(14, 8)) 


plt.title('Mean Maximum Temperatures (1859-2017)')
plt.ylabel('mean max. temperature (degrees Celsius)');

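[Figure: Mean Maximum Temperatures (1859-2017), one line per station; y-axis: mean max. temperature (degrees Celsius)]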

# Zoom into the above plot from 1946, from which point the trend seems consistent.
temp_data_all[temp_data_all['year'] > 1945].pivot_table('max_temp',
                                                        index='year',
                                                        columns='station',
                                                        aggfunc='mean').plot(figsize=(14, 8)) 


plt.title('Mean Maximum Temperatures (1946-2017)')
plt.ylabel('mean max. temperature (degrees Celsius)');

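[Figure: Mean Maximum Temperatures (1946-2017), one line per station; y-axis: mean max. temperature (degrees Celsius)]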

Similarly, the line plot below shows an upward trend in mean minimum temperatures. The trends at Darwin Airport and Mildura Airport seem relatively flat compared with the other stations.

# Plot mean minimum temperatures over time.
temp_data_all.pivot_table('min_temp',
                          index='year', columns='station',
                          aggfunc='mean').plot(figsize=(14, 8)) 


plt.title('Mean Minimum Temperatures (1859-2017)')
plt.ylabel('mean min. temperature (degrees Celsius)');

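[Figure: Mean Minimum Temperatures (1859-2017), one line per station; y-axis: mean min. temperature (degrees Celsius)]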

# Zoom into the above plot from 1946, from which point the trend seems consistent.
temp_data_all[temp_data_all['year'] > 1945].pivot_table('min_temp',
                          index='year', columns='station',
                          aggfunc='mean').plot(figsize=(14, 8)) 


plt.title('Mean Minimum Temperatures (1946-2017)')
plt.ylabel('mean min. temperature (degrees Celsius)');

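[Figure: Mean Minimum Temperatures (1946-2017), one line per station; y-axis: mean min. temperature (degrees Celsius)]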

The pivot table below shows the trend indexed by decade rather than by year. The increase in mean maximum temperature between the 1960s and the 2010s is largest at the temperate station Perth Airport, followed by Woomera Aerodrome and Mildura Airport.

Note: the 2010s are not yet complete (the data end in December 2017), so this difference could increase by the end of December 2019.

# Pivot on mean maximum & minimum temperatures across weather stations by decades
temp_data_all.pivot_table(['max_temp', 'min_temp'], index='decade',
                      columns='station', aggfunc='mean').dropna()
max_temp min_temp
station Adelaide Airport Cape Moreton Lighthouse Cape Otway Lighthouse Darwin Airport Longreach Aero Mildura Airport Perth Airport Sydney (Observatory Hill) Woomera Aerodrome Adelaide Airport Cape Moreton Lighthouse Cape Otway Lighthouse Darwin Airport Longreach Aero Mildura Airport Perth Airport Sydney (Observatory Hill) Woomera Aerodrome
decade
1960s 21.244402 23.010364 16.764082 31.859584 30.457359 23.600219 23.901726 21.914834 25.291089 11.041128 17.479949 10.110051 22.832565 14.794326 10.321955 12.156513 13.841897 12.329047
1970s 21.467716 23.073547 16.778510 32.141298 30.997238 23.333196 24.410825 22.123977 25.455930 11.198110 17.842113 10.741237 23.184254 14.898042 10.388438 12.103931 14.175446 12.650357
1980s 21.461894 23.067178 17.064353 32.015987 31.282510 23.838986 24.363400 22.263482 25.707473 11.359852 18.140847 10.871571 23.694863 16.044472 10.330000 12.384489 14.440236 12.808719
1990s 21.284662 23.298253 16.764007 32.163247 31.462072 23.669981 24.596550 22.236791 25.893591 11.609890 18.366725 10.726756 23.341561 16.047817 10.138757 12.807157 14.331968 12.806590
2000s 22.018889 24.092782 17.021355 32.315467 31.932001 24.588366 25.032850 22.893892 26.538127 11.825623 18.834042 11.169956 23.081096 16.068784 10.426088 12.002874 14.690496 13.165314
2010s 22.140623 24.247343 17.502405 32.566598 31.711434 24.908385 25.576626 23.204175 26.643151 12.162628 18.840315 11.558267 23.247826 16.266528 10.713963 12.674778 14.978520 13.319129

Summarise the above table by printing the mean and median differences in temperature between the 1960s and the 2010s. Every station's mean maximum and mean minimum temperatures have increased over this period, and overall the increase in mean maximum temperatures is larger than the increase in mean minimum temperatures (a mean of about 1.16 versus 0.98 degrees Celsius). It is also noteworthy that the most tropical station, Darwin Airport, shows the smallest increase in mean maximum temperature and the second-smallest increase in mean minimum temperature (after Mildura Airport). Although Perth, Woomera and Mildura show the largest increases in mean maximum temperatures, their increases in mean minimum temperatures are all below the median, and Perth and Mildura have among the smallest. One possible explanation is that reduced cloud cover at these locations still allows heat to escape quickly at night.

Note: as mentioned above, the 2010s are not yet complete, so these differences could increase by the end of December 2019.

# Create a new dataframe for reviewing mean maximum temp differences from the 1960s to 2010s.
pivot_df = temp_data_all.pivot_table(['max_temp'], index='decade',
                      columns='station', aggfunc='mean').dropna()

# Use a loop to print column name and the difference between the last decade and the first decade mean maximum temperatures.
lst = []
ln =  len(pivot_df.index) - 1
i = 0
for column in pivot_df:
    print (pivot_df.columns[i], pivot_df.iloc[ln,i] - pivot_df.iloc[0,i])
    lst.append(pivot_df.iloc[ln,i] - pivot_df.iloc[0,i])
    i += 1
print ('Overall mean increase: ', mean(lst), 'degrees Celsius.')
print ('Overall median increase: ', median(lst), 'degrees Celsius.')
('max_temp', 'Adelaide Airport') 0.89622099957
('max_temp', 'Cape Moreton Lighthouse') 1.23697932396
('max_temp', 'Cape Otway Lighthouse') 0.738322700181
('max_temp', 'Darwin Airport') 0.707014309163
('max_temp', 'Longreach Aero') 1.25407471527
('max_temp', 'Mildura Airport') 1.30816566995
('max_temp', 'Perth Airport') 1.67490004426
('max_temp', 'Sydney (Observatory Hill)') 1.28934111222
('max_temp', 'Woomera Aerodrome') 1.35206157602
Overall mean increase:  1.16189782784 degrees Celsius.
Overall median increase:  1.25407471527 degrees Celsius.
# Re-use previous dataframe and variables for reviewing mean minimum temp differences from the 1960s to 2010s.
pivot_df = temp_data_all.pivot_table(['min_temp'], index='decade',
                      columns='station', aggfunc='mean').dropna()

# Use a loop to print column name and the difference between the last decade and the first decade mean minimum temperatures.
lst = []
ln =  len(pivot_df.index) - 1
i = 0
for column in pivot_df:
    print (pivot_df.columns[i], pivot_df.iloc[ln,i] - pivot_df.iloc[0,i])
    lst.append(pivot_df.iloc[ln,i] - pivot_df.iloc[0,i])
    i += 1
print ('Overall mean increase: ', mean(lst), 'degrees Celsius.')
print ('Overall median increase: ', median(lst), 'degrees Celsius.')
('min_temp', 'Adelaide Airport') 1.1215001878
('min_temp', 'Cape Moreton Lighthouse') 1.36036622562
('min_temp', 'Cape Otway Lighthouse') 1.44821604666
('min_temp', 'Darwin Airport') 0.415261485678
('min_temp', 'Longreach Aero') 1.47220283951
('min_temp', 'Mildura Airport') 0.392008481117
('min_temp', 'Perth Airport') 0.518264192422
('min_temp', 'Sydney (Observatory Hill)') 1.13662259115
('min_temp', 'Woomera Aerodrome') 0.990081755563
Overall mean increase:  0.983835978392 degrees Celsius.
Overall median increase:  1.1215001878 degrees Celsius.
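The counter-based loops above can be written more directly: iloc[-1] and iloc[0] select the last and first decade rows, and subtracting them gives one difference per station column in a single step. A minimal sketch on a hypothetical two-station, two-decade table (station names and values are made up); it calls np.mean and np.median explicitly rather than relying on the names that %pylab injects into the namespace.

```python
import numpy as np
import pandas as pd

# Hypothetical decade-by-station table of mean maximum temperatures
pivot_df = pd.DataFrame({'Station A': [20.0, 21.5],
                         'Station B': [30.0, 30.5]},
                        index=['1960s', '2010s'])

# Last decade minus first decade, for every station at once
diffs = pivot_df.iloc[-1] - pivot_df.iloc[0]
for station, diff in diffs.items():
    print(station, diff)

print('Overall mean increase: ', np.mean(diffs), 'degrees Celsius.')
print('Overall median increase: ', np.median(diffs), 'degrees Celsius.')
```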

The boxplots below clearly show the distribution of maximum and minimum temperatures across the decades of BOM data (from 1940 onwards). As expected, the interquartile ranges are larger for the temperate weather stations, while the ranges for the tropical and sub-tropical stations are largely constant. The upward shift in each station's group of boxplots is clear.
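Each box spans the interquartile range (IQR), from the 25th to the 75th percentile; seaborn's default whiskers extend to 1.5 times the IQR. If you want the numbers behind the boxes, a minimal sketch using groupby and quantile on hypothetical daily maxima for one station:

```python
import pandas as pd

# Hypothetical daily maxima for one station in two decades
df = pd.DataFrame({'decade': ['1960s'] * 5 + ['2010s'] * 5,
                   'max_temp': [20.0, 22.0, 24.0, 26.0, 28.0,
                                21.0, 23.0, 25.0, 27.0, 29.0]})

# 25th and 75th percentiles per decade (one column each), then the IQR
q = df.groupby('decade')['max_temp'].quantile([0.25, 0.75]).unstack()
q['iqr'] = q[0.75] - q[0.25]
print(q)
```

In this made-up example the box shifts up by one degree between decades while its width (the IQR) stays the same, which is the pattern to look for when comparing boxes across decades.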

# Plot the above pivot table separately using boxplots
# Draw maximum temperature boxplots across decades for each station
plt.figure(figsize=(19, 12))

sns.boxplot(data=(temp_data_all[temp_data_all['year'] > 1939]).dropna(), x="max_temp",
            y="station", orient='h', hue="decade", palette="Oranges_d")
plt.title('Maximum Temperatures')
plt.ylabel('Weather station')
plt.xlabel("Maximum temp (Degree C)")
plt.savefig("max_temp_boxplot.png");

[Figure: Maximum Temperatures, boxplots by decade for each weather station; x-axis: Maximum temp (Degree C)]

# Draw minimum temperature boxplots across decades for each station
plt.figure(figsize=(19, 12))

sns.boxplot(data=(temp_data_all[temp_data_all['year'] > 1939]).dropna(), x="min_temp",
            y="station", orient='h', hue="decade", palette="Blues_d")
plt.title('Minimum Temperatures')
plt.ylabel('Weather station')
plt.xlabel("Minimum temp (Degree C)")
plt.savefig("min_temp_boxplot.png");

[Figure: Minimum Temperatures, boxplots by decade for each weather station; x-axis: Minimum temp (Degree C)]

The pivot tables and plots below give some insight into changes in the frequency of extreme temperatures.

# Create pivot table for the number of 40 degrees Celsius days or above, per decade
# Note 2010s not complete! Only those stations that register >= 40 will appear.

temp_data_all[temp_data_all['max_temp'] >= 40].pivot_table('max_temp', index='decade', columns='station',
                                                                                aggfunc='count', margins=True,
                                                                               margins_name='Total', fill_value='')
                                                                                
station Adelaide Airport Cape Otway Lighthouse Longreach Aero Mildura Airport Perth Airport Sydney (Observatory Hill) Woomera Aerodrome Total
decade
1860s 3 1 4.0
1870s 5 1 6.0
1880s 2 2.0
1890s 5 2 7.0
1900s 7 4 11.0
1910s 1 1 2.0
1920s 4 4.0
1930s 1 4 5.0
1940s 9 7 5 4 25.0
1950s 2 2 36 31 4 76 151.0
1960s 5 1 62 51 46 4 97 266.0
1970s 9 199 38 37 2 106 391.0
1980s 24 6 247 69 42 3 133 524.0
1990s 12 3 220 52 46 3 123 459.0
2000s 31 2 260 102 40 4 183 622.0
2010s 22 1 225 85 46 5 152 536.0
Total 105 39 1213 442 295 47 874 3015.0

In the plot below, the number of days at or above 40 degrees Celsius appears to be increasing at a faster rate at the temperate inland stations (allowing for the low base caused by incomplete data in each dataset's first decade). Keep in mind that the 2010s are incomplete and two more summers are yet to come (2018 and 2019).
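The aggfunc='count' pivot tables count rows that survive the >= 40 filter, so a station or decade with no such days disappears entirely. An equivalent count that also reports zeros, sketched on hypothetical data, sums a boolean mask per group. Missing readings compare as False, so they are not counted as hot days.

```python
import pandas as pd

# Hypothetical daily maxima for one station
df = pd.DataFrame({'decade': ['1960s', '1960s', '1960s', '2010s', '2010s', '2010s'],
                   'max_temp': [38.5, 40.0, 41.2, 39.9, 36.0, 35.0]})

# 1 where the day reaches 40 degrees C, else 0; summing per decade counts the days
hot_days = (df['max_temp'] >= 40).astype(int).groupby(df['decade']).sum()
print(hot_days)  # 1960s: 2, 2010s: 0
```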

# Plot the number of days with a maximum of 40 degrees Celsius or above, per decade
# Note 2010s not complete! Only those stations that register >= 40 will appear. Datasets that start mid-way through the decade 
# will start from a low base.

temp_data_all[temp_data_all['max_temp'] >= 40].pivot_table('max_temp', index='decade', columns='station',
                                                                                aggfunc='count',
                                                                                fill_value=nan).plot(figsize=(12, 6))

plt.title('Number of 40 degrees Celsius days or above per decade')
plt.ylabel('Count')
plt.savefig("40_and_above.png");

[Figure: Number of 40 degrees Celsius days or above per decade, one line per station; y-axis: Count]

# Zoom into the above from the 1950s to the 2000s to exclude incomplete decades.
# Only those stations that register >= 40 will appear.

temp_data_all[(temp_data_all['max_temp'] >= 40) & (temp_data_all['year'] >= 1950) & (temp_data_all['year'] < 2010)].pivot_table('max_temp',
                                                                                index='decade', columns='station',
                                                                                aggfunc='count',
                                                                                fill_value=nan).plot(figsize=(12, 6))

plt.title('Number of 40 degrees Celsius days or above per decade (1950 - 2009)')
plt.ylabel('Count');

[Figure: Number of 40 degrees Celsius days or above per decade (1950 - 2009); y-axis: Count]

# Create pivot table for the number of 5 degrees Celsius days or below, per decade
# Note 2010s not complete! Only those stations that register <= 5 will appear.
# Datasets that start mid-way through the decade will start from a low base.

temp_data_all[temp_data_all['min_temp'] <= 5].pivot_table('min_temp', index='decade', columns='station',
                                                          aggfunc='count', margins=True, margins_name='Total', fill_value='')
station Adelaide Airport Cape Otway Lighthouse Longreach Aero Mildura Airport Perth Airport Sydney (Observatory Hill) Woomera Aerodrome Total
decade
1850s 16.0 16.0
1860s 389 75.0 464.0
1870s 131 53.0 184.0
1880s 138 23.0 161.0
1890s 108 39.0 147.0
1900s 169 63.0 232.0
1910s 81 29.0 110.0
1920s 75 32.0 107.0
1930s 77 60.0 137.0
1940s 137 247 132 48.0 63 627.0
1950s 160 130 676 289 20.0 380 1655.0
1960s 323 151 99 688 206 16.0 376 1859.0
1970s 313 76 348 703 189 19.0 295 1943.0
1980s 258 63 271 663 160 13.0 302 1730.0
1990s 263 80 260 698 146 5.0 286 1738.0
2000s 267 53 267 717 291 5.0 262 1862.0
2010s 195 35 179 554 221 3.0 265 1452.0
Total 1779 1893 1424 4946 1634 519.0 2229 14424.0
# Plot the number of days with a minimum of 5 degrees Celsius or below, per decade
# Note 2010s not complete! Datasets that start mid-way through the decade will start from a low base.
temp_data_all[temp_data_all['min_temp'] <= 5].pivot_table('min_temp', index='decade', columns='station',
                                                           aggfunc='count', fill_value=nan).plot(figsize=(12, 6))

plt.title('Number of 5 degrees Celsius days or below per decade')
plt.ylabel('Count')
plt.savefig("5_and_below.png");

[Figure: Number of 5 degrees Celsius days or below per decade, one line per station; y-axis: Count]

# Create pivot table of the highest maximum and highest minimum temperatures recorded, per decade
# Note 2010s not complete!
temp_data_all.pivot_table(['max_temp', 'min_temp'],
                          index='decade', columns='station',
                          aggfunc='max', fill_value='')
max_temp min_temp
station Adelaide Airport Cape Moreton Lighthouse Cape Otway Lighthouse Darwin Airport Longreach Aero Mildura Airport Perth Airport Sydney (Observatory Hill) Woomera Aerodrome Adelaide Airport Cape Moreton Lighthouse Cape Otway Lighthouse Darwin Airport Longreach Aero Mildura Airport Perth Airport Sydney (Observatory Hill) Woomera Aerodrome
decade
1850s 36.9 23.2
1860s 40.6 41.6 22.8 26.3
1870s 42.2 40.7 26.1 25.4
1880s 40.6 39.3 25.6 25.7
1890s 40.6 42.5 30.6 24.9
1900s 42.8 41.9 29.4 24.2
1910s 32.8 40.6 40.3 25.4 27.2 25.3
1920s 32.2 39.2 42.1 26.3 30.3 23.4
1930s 32.9 40 45.3 25.6 26.7 24.7
1940s 35 39.4 37.1 43.5 42.8 41.9 45.4 25.6 25.6 28.8 25.6 25.6 22.9 30.5
1950s 40.9 32.9 42.2 37 44.2 44.1 42.2 44.6 25 25 28.3 29.2 30.7 28.3 23.1 30.3
1960s 44 32.2 40.6 37 43.3 45.5 44.6 42.4 47.6 28.7 24.4 27.3 29.3 30.6 28.9 28.7 26.1 32.2
1970s 41.9 33 39.2 37.2 46 46.8 44.2 41.4 45.8 29.8 25.3 26 29.2 29.7 30.3 27.6 26.6 29.5
1980s 42.9 33.4 43.3 38.9 45.1 46 44.5 41.8 45.8 31.4 25.4 27.9 29.7 29.8 29.2 27.8 25.6 34.7
1990s 42.9 32.9 42 37.8 47.3 46.9 46.7 40.9 45.8 30.9 25.5 27 29.6 30.2 30.7 28.7 26.2 31.2
2000s 44 35.4 41.7 38 45.9 46.7 44.5 44.2 46.3 33.5 26.4 26.6 29.7 30.4 30.9 27.8 25.0 31.4
2010s 44.1 32.1 40.9 37.5 46.3 46.3 44.2 45.8 48.1 31.2 25.8 26.3 29.7 31.5 31.9 28.6 27.6 31.8
# Create pivot table of the lowest maximum and lowest minimum temperatures recorded, per decade
# Note 2010s not complete!
temp_data_all.pivot_table(['max_temp', 'min_temp'],
                          index='decade', columns='station',
                          aggfunc='min', fill_value='')
max_temp min_temp
station Adelaide Airport Cape Moreton Lighthouse Cape Otway Lighthouse Darwin Airport Longreach Aero Mildura Airport Perth Airport Sydney (Observatory Hill) Woomera Aerodrome Adelaide Airport Cape Moreton Lighthouse Cape Otway Lighthouse Darwin Airport Longreach Aero Mildura Airport Perth Airport Sydney (Observatory Hill) Woomera Aerodrome
decade
1850s 11.1 2.8
1860s 8.9 7.7 -1.1 2.2
1870s 8.3 9.1 1.7 2.7
1880s 7.2 10.2 1.1 3.8
1890s 9.4 9.6 -1.1 2.2
1900s 7.8 10.0 -0.6 3.2
1910s 13.1 8.9 11.0 7.2 2.4 3.0
1920s 12.8 7.2 10.8 6.1 1.7 2.9
1930s 10.6 6.7 10.6 6.7 1.1 2.1
1940s 12.8 8.3 24.9 9 11.9 11.1 11.7 6.1 1.7 10.4 -2.4 -1.1 3.2 0
1950s 10.3 12.2 7.8 24 7.9 9.4 11.5 8.3 -1.2 5.6 1.7 13.9 -3.4 -0.8 4.3 -0.3
1960s 9.8 13.3 7.8 21.1 14.6 7.9 10.7 9.3 9.4 -1.1 6.1 0 10.8 -1.7 -2.5 -0.4 2.7 -1.4
1970s 10.4 14.3 9.2 24.1 14.9 8.9 10.7 11.4 9.8 -2.2 7.2 3 13.3 -1.6 -2 0 2.7 -0.9
1980s 10 12.5 7.4 22.7 13.1 8 11.5 9.6 9.4 -2.6 7.9 2.2 13.4 -2.9 -4 0.2 3.1 0.4
1990s 9.8 13.1 8 24.8 12.5 9.3 11.7 10.4 9.3 0.4 7.4 2 13.1 -0.8 -3.1 -0.3 4.3 0.5
2000s 10.2 12.3 8.7 22.7 8.3 9.7 11.7 11.9 9.5 -0.5 8 3 12.4 -0.6 -3.7 -1.3 3.7 0.4
2010s 10.9 14.1 9.1 21.9 9.1 9.1 12.3 11.7 11.2 0.1 7.2 2.7 12.7 -0.7 -3.2 -1 4.3 0.7

Create a new variable, range, holding the daily difference between the maximum and minimum temperatures (the diurnal temperature range). There does not appear to be any noticeable increase or decrease in this range over time.
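Subtraction propagates NaN: if either reading is missing on a day, that day's range is NaN. This is why, in the Mildura check further down, range has more NaNs (268) than max_temp (251) or min_temp (263). The mean in pivot_table skips NaN values. A minimal sketch on hypothetical readings:

```python
import numpy as np
import pandas as pd

# Hypothetical readings: day 2 lacks max_temp, day 3 lacks min_temp
df = pd.DataFrame({'max_temp': [24.4, np.nan, 24.2],
                   'min_temp': [14.5, 15.7, np.nan]})

# Daily range; NaN wherever either input is NaN
df['range'] = df['max_temp'] - df['min_temp']
print(df['range'].isnull().sum())  # 2
```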

# Create new column to hold temp range
print (temp_data_all.shape)
temp_data_all['range'] = temp_data_all['max_temp'] - temp_data_all['min_temp']
print (temp_data_all.shape)
(302409, 11)
(302409, 12)
# Plot the trend in temperature ranges across time for weather stations.
# Limit the x-axis to 1900-2017 (2018 was already removed because it is incomplete)
temp_data_all.pivot_table('range', index='year', columns='station',
                          aggfunc='mean', fill_value=nan).plot(figsize=(14, 10))
plt.xlim(1900, 2017)
plt.title('Temperature range volatility')
plt.ylabel('Temperature range');

[Figure: Temperature range volatility, mean daily range per year for each station; y-axis: Temperature range]

Take a look at temperatures from a seasonal perspective. There has been a lot of talk in the media recently about autumn being warm, especially in the southern states, and there were late-season bushfires in Australia in April 2018.

Based on the plot below, mean maximum temperatures appear to have increased most in summer, followed by autumn and spring. Winter has increased to a lesser extent.

# Plot mean maximum temperatures for all four seasons.

# create a new dataframe from 1966, where all stations have data to avoid distortions
temp_data_66 = temp_data_all[temp_data_all['year'] > 1965]

summer = temp_data_66[(temp_data_66['month'] == 12) |
              (temp_data_66['month'] <= 2)].pivot_table('max_temp',
                                                         index='year',
                                                         aggfunc='mean',
                                                         fill_value=nan)
autumn = temp_data_66[(temp_data_66['month'] >= 3) &
              (temp_data_66['month'] <= 5)].pivot_table('max_temp',
                                                         index='year',
                                                         aggfunc='mean',
                                                         fill_value=nan)
winter = temp_data_66[(temp_data_66['month'] >= 6) &
              (temp_data_66['month'] <= 8)].pivot_table('max_temp',
                                                         index='year',
                                                         aggfunc='mean',
                                                         fill_value=nan)
spring = temp_data_66[(temp_data_66['month'] >= 9) &
              (temp_data_66['month'] <= 11)].pivot_table('max_temp',
                                                         index='year',
                                                         aggfunc='mean',
                                                         fill_value=nan)
plt.figure(figsize=(12, 10))
plt.plot(summer, 'r--', label='Summer')
plt.plot(autumn, 'y--', label='Autumn')
plt.plot(winter, 'b--', label='Winter')
plt.plot(spring, 'g--', label='Spring')

plt.legend()
plt.title('All stations - seasonal (1966-2017)')
plt.ylabel('mean max. temperature (degrees Celsius)')
plt.show();

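[Figure: All stations - seasonal (1966-2017), dashed lines for Summer, Autumn, Winter and Spring; y-axis: mean max. temperature (degrees Celsius)]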

3. Further Visualisations

In the kernel density estimate (KDE) plots below, we can see the broader distribution of temperatures in the temperate zone at Mildura Airport (blues), compared with Cape Moreton Lighthouse (greens) and Darwin Airport (oranges).

# Plot 3 bivariate densities with Seaborn
# see https://seaborn.pydata.org/generated/seaborn.kdeplot.html

# Use data from 1981 onwards only (year > 1980)

yr = 1980
mld = temp_data_all[(temp_data_all['station'] == 'Mildura Airport') & (temp_data_all['year'] > yr)].dropna()
dwn = temp_data_all[(temp_data_all['station'] == 'Darwin Airport') & (temp_data_all['year'] > yr)].dropna()
cml = temp_data_all[(temp_data_all['station'] == 'Cape Moreton Lighthouse') & (temp_data_all['year'] > yr)].dropna()



ax = sns.kdeplot(mld.max_temp, mld.min_temp,
                  cmap="Blues", shade=True, shade_lowest=False)
ax = sns.kdeplot(dwn.max_temp, dwn.min_temp,
                  cmap="Oranges", shade=True, shade_lowest=False)
ax = sns.kdeplot(cml.max_temp, cml.min_temp,
                  cmap="Greens", shade=True, shade_lowest=False)

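[Figure: bivariate KDEs of max_temp versus min_temp for Mildura Airport (blues), Darwin Airport (oranges) and Cape Moreton Lighthouse (greens)]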

The seaborn pairplot below plots the variables max_temp, min_temp, year and month against each other, coloured by station. This kind of plot is useful for spotting relationships that need further analysis.

# pairplot creates its own figure, so a preceding plt.figure() call only adds an empty figure
sns.pairplot(data=(temp_data_all[temp_data_all['year'] > 1939]).dropna(),
             vars=['max_temp','min_temp', 'year', 'month'], hue='station', palette="Set2");

[Figure: seaborn pairplot of max_temp, min_temp, year and month, coloured by station]

According to some articles, R is better suited to further time series analysis. I intend to find out! Produce a univariate extract to investigate later, and perhaps build a prediction model in R.

# Create a new univariate time series dataset to explore further in R
# Resampling requires a datetime index; work on a copy so temp_data_all keeps its original index.
temp_data = temp_data_all.copy()
temp_data['date'] = pd.to_datetime(temp_data[['year', 'month', 'day']])
temp_data.set_index('date', inplace=True)

# Get data for Station Mildura only, check the number of NaNs, remove NaNs
print (temp_data[temp_data.station == 'Mildura Airport'].isnull().sum())
# Number of NaNs is not excessive, just drop them.
mildura = temp_data[temp_data.station == 'Mildura Airport'].dropna()
print (mildura.shape)
print (mildura.head(5))
print (type(mildura))

# Convert daily to the monthly mean, with a time stamp of the first day of the month
mildura = mildura['max_temp'].resample('MS').mean()

# Review
print (mildura.shape)
print (mildura.head(5))
print (mildura.tail(5))
print (type(mildura))

# Export to csv for use in RStudio in a format that can easily be converted to an xts object.
mildura.to_csv("mildura_temps.csv", header=True)
station_number      0
year                0
month               0
day                 0
max_temp          251
min_temp          263
station             0
lat                 0
lon                 0
elevation           0
decade              0
range             268
dtype: int64
(26030, 12)
            station_number  year  month  day  max_temp  min_temp  \
date                                                               
1946-08-26           76031  1946      8   26      19.8       0.0   
1946-08-27           76031  1946      8   27      14.8       8.3   
1946-08-28           76031  1946      8   28      14.5       4.3   
1946-08-29           76031  1946      8   29      16.3       4.3   
1946-08-30           76031  1946      8   30      18.3       3.3   

                    station    lat     lon  elevation decade  range  
date                                                                 
1946-08-26  Mildura Airport -34.24  142.09         50  1940s   19.8  
1946-08-27  Mildura Airport -34.24  142.09         50  1940s    6.5  
1946-08-28  Mildura Airport -34.24  142.09         50  1940s   10.2  
1946-08-29  Mildura Airport -34.24  142.09         50  1940s   12.0  
1946-08-30  Mildura Airport -34.24  142.09         50  1940s   15.0  
<class 'pandas.core.frame.DataFrame'>
(857,)
date
1946-08-01    16.783333
1946-09-01    21.092000
1946-10-01    23.292857
1946-11-01    26.953846
1946-12-01    30.430000
Freq: MS, Name: max_temp, dtype: float64
date
2017-08-01    17.416129
2017-09-01    22.846667
2017-10-01    26.770968
2017-11-01    30.516667
2017-12-01    31.209677
Freq: MS, Name: max_temp, dtype: float64
<class 'pandas.core.series.Series'>
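The monthly conversion in the cell above relies on two pandas features: pd.to_datetime can assemble dates from a frame whose columns are named year, month and day, and resample('MS') needs a datetime index. A minimal sketch on a few hypothetical days spanning a month boundary, including one missing reading:

```python
import numpy as np
import pandas as pd

# Hypothetical year/month/day columns, as in temp_data_all
parts = pd.DataFrame({'year': [2017, 2017, 2017, 2017],
                      'month': [11, 11, 12, 12],
                      'day': [29, 30, 1, 2]})

# Assemble dates from the year, month and day columns
dates = pd.DatetimeIndex(pd.to_datetime(parts))
daily = pd.Series([30.0, 32.0, np.nan, 28.0], index=dates, name='max_temp')

# 'MS' (month start) labels each monthly mean with the first day of the month;
# mean() skips the missing day
monthly = daily.resample('MS').mean()
print(monthly)
```

Because mean() skips NaN, a month with only a few valid days still gets a value, so it may be worth checking how many readings each month contains before modelling.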