@psychemedia
Last active March 28, 2023 10:50
DEFRA AURN Data Downloader - quick hack for downloading UK air quality data
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DEFRA AURN Data Downloader\n",
"\n",
"The openair R package (https://github.com/davidcarslaw/openair) appears to offer access to this data from R, so we can perhaps reverse engineer it to find a way into the data downloads from Python.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pyreadr in /home/oustudent/.local/lib/python3.5/site-packages (0.2.1)\n",
"\u001b[33mYou are using pip version 10.0.1, however version 19.2.3 is available.\n",
"You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n"
]
}
],
"source": [
"!pip3 install --user pyreadr"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pyreadr"
]
},
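{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Site metadata for the AURN network, published as an RData file\n",
"metadata_url='http://uk-air.defra.gov.uk/openair/R_data/AURN_metadata.RData'\n",
"\n",
"# Local filename: the last segment of the URL path\n",
"fn = metadata_url.split('/')[-1]"
]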
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"URL transformed to HTTPS due to an HSTS policy\n",
"--2019-09-09 15:17:42-- https://uk-air.defra.gov.uk/openair/R_data/AURN_metadata.RData\n",
"Resolving uk-air.defra.gov.uk (uk-air.defra.gov.uk)... 213.251.9.44\n",
"Connecting to uk-air.defra.gov.uk (uk-air.defra.gov.uk)|213.251.9.44|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 23386 (23K) [application/octet-stream]\n",
"Saving to: ‘AURN_metadata.RData’\n",
"\n",
"AURN_metadata.RData 100%[===================>] 22.84K --.-KB/s in 0.001s \n",
"\n",
"2019-09-09 15:17:42 (24.1 MB/s) - ‘AURN_metadata.RData’ saved [23386/23386]\n",
"\n"
]
}
],
"source": [
"!wget $metadata_url"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>site_id</th>\n",
" <th>site_name</th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" <th>altitude</th>\n",
" <th>location_type</th>\n",
" <th>zone_name</th>\n",
" <th>zone_id</th>\n",
" <th>la_region</th>\n",
" <th>la_region_id</th>\n",
" <th>parameter</th>\n",
" <th>date_started</th>\n",
" <th>date_ended</th>\n",
" <th>ratified_to</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>A3</td>\n",
" <td>London A3 Roadside</td>\n",
" <td>51.37348</td>\n",
" <td>-0.291853</td>\n",
" <td>32</td>\n",
" <td>Urban Traffic</td>\n",
" <td>Greater London Urban Area</td>\n",
" <td>A01</td>\n",
" <td>Kingston upon Thames London Boro</td>\n",
" <td>160</td>\n",
" <td>CO</td>\n",
" <td>20/03/1997</td>\n",
" <td>30/09/2007</td>\n",
" <td>2007-09-30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>A3</td>\n",
" <td>London A3 Roadside</td>\n",
" <td>51.37348</td>\n",
" <td>-0.291853</td>\n",
" <td>32</td>\n",
" <td>Urban Traffic</td>\n",
" <td>Greater London Urban Area</td>\n",
" <td>A01</td>\n",
" <td>Kingston upon Thames London Boro</td>\n",
" <td>160</td>\n",
" <td>PM10</td>\n",
" <td>20/03/1997</td>\n",
" <td>30/09/2007</td>\n",
" <td>2007-09-30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>A3</td>\n",
" <td>London A3 Roadside</td>\n",
" <td>51.37348</td>\n",
" <td>-0.291853</td>\n",
" <td>32</td>\n",
" <td>Urban Traffic</td>\n",
" <td>Greater London Urban Area</td>\n",
" <td>A01</td>\n",
" <td>Kingston upon Thames London Boro</td>\n",
" <td>160</td>\n",
" <td>NO</td>\n",
" <td>20/03/1997</td>\n",
" <td>30/09/2007</td>\n",
" <td>2007-09-30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>A3</td>\n",
" <td>London A3 Roadside</td>\n",
" <td>51.37348</td>\n",
" <td>-0.291853</td>\n",
" <td>32</td>\n",
" <td>Urban Traffic</td>\n",
" <td>Greater London Urban Area</td>\n",
" <td>A01</td>\n",
" <td>Kingston upon Thames London Boro</td>\n",
" <td>160</td>\n",
" <td>NO2</td>\n",
" <td>20/03/1997</td>\n",
" <td>30/09/2007</td>\n",
" <td>2007-09-30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>A3</td>\n",
" <td>London A3 Roadside</td>\n",
" <td>51.37348</td>\n",
" <td>-0.291853</td>\n",
" <td>32</td>\n",
" <td>Urban Traffic</td>\n",
" <td>Greater London Urban Area</td>\n",
" <td>A01</td>\n",
" <td>Kingston upon Thames London Boro</td>\n",
" <td>160</td>\n",
" <td>NOXasNO2</td>\n",
" <td>20/03/1997</td>\n",
" <td>30/09/2007</td>\n",
" <td>2007-09-30</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" site_id site_name latitude longitude altitude location_type \\\n",
"0 A3 London A3 Roadside 51.37348 -0.291853 32 Urban Traffic \n",
"1 A3 London A3 Roadside 51.37348 -0.291853 32 Urban Traffic \n",
"2 A3 London A3 Roadside 51.37348 -0.291853 32 Urban Traffic \n",
"3 A3 London A3 Roadside 51.37348 -0.291853 32 Urban Traffic \n",
"4 A3 London A3 Roadside 51.37348 -0.291853 32 Urban Traffic \n",
"\n",
" zone_name zone_id la_region \\\n",
"0 Greater London Urban Area A01 Kingston upon Thames London Boro \n",
"1 Greater London Urban Area A01 Kingston upon Thames London Boro \n",
"2 Greater London Urban Area A01 Kingston upon Thames London Boro \n",
"3 Greater London Urban Area A01 Kingston upon Thames London Boro \n",
"4 Greater London Urban Area A01 Kingston upon Thames London Boro \n",
"\n",
" la_region_id parameter date_started date_ended ratified_to \n",
"0 160 CO 20/03/1997 30/09/2007 2007-09-30 \n",
"1 160 PM10 20/03/1997 30/09/2007 2007-09-30 \n",
"2 160 NO 20/03/1997 30/09/2007 2007-09-30 \n",
"3 160 NO2 20/03/1997 30/09/2007 2007-09-30 \n",
"4 160 NOXasNO2 20/03/1997 30/09/2007 2007-09-30 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"# read_r returns a dict of pandas DataFrames keyed by R object name\n",
"pyreadr.read_r(fn)['AURN_metadata'].head()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"# URL template for the per-site, per-year data files; {fn} is the filename\n",
"package_url_base = 'https://uk-air.defra.gov.uk/openair/R_data/{fn}'"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-09-09 15:30:03-- https://uk-air.defra.gov.uk/openair/R_data/KC1_2019.RData\n",
"Resolving uk-air.defra.gov.uk (uk-air.defra.gov.uk)... 213.251.9.44\n",
"Connecting to uk-air.defra.gov.uk (uk-air.defra.gov.uk)|213.251.9.44|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 242036 (236K) [application/octet-stream]\n",
"Saving to: ‘KC1_2019.RData’\n",
"\n",
"KC1_2019.RData 100%[===================>] 236.36K --.-KB/s in 0.09s \n",
"\n",
"2019-09-09 15:30:03 (2.52 MB/s) - ‘KC1_2019.RData’ saved [242036/242036]\n",
"\n"
]
}
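],
"source": [
"# Site code (upper-cased in the filename) and year to fetch\n",
"site = 'kc1'\n",
"year = 2019\n",
"\n",
"# Filenames follow the pattern SITE_YEAR.RData, e.g. KC1_2019.RData\n",
"fn = '{site}_{year}.RData'.format(site=site.upper(), year=year)\n",
"\n",
"url = package_url_base.format(fn=fn)\n",
"!wget $url"
]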
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date</th>\n",
" <th>O3</th>\n",
" <th>NO2</th>\n",
" <th>CO</th>\n",
" <th>SO2</th>\n",
" <th>PM10</th>\n",
" <th>NOXasNO2</th>\n",
" <th>NO</th>\n",
" <th>PM2.5</th>\n",
" <th>temp</th>\n",
" <th>ws</th>\n",
" <th>wd</th>\n",
" <th>RAWPM25</th>\n",
" <th>site</th>\n",
" <th>code</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2019-01-01 00:00:00</td>\n",
" <td>49.89250</td>\n",
" <td>18.87596</td>\n",
" <td>0.104272</td>\n",
" <td>1.62547</td>\n",
" <td>21.275</td>\n",
" <td>19.60464</td>\n",
" <td>0.47523</td>\n",
" <td>15.849</td>\n",
" <td>5.5</td>\n",
" <td>2.3</td>\n",
" <td>259.3</td>\n",
" <td>16.800</td>\n",
" <td>London N. Kensington</td>\n",
" <td>KC1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2019-01-01 01:00:00</td>\n",
" <td>52.48691</td>\n",
" <td>19.75871</td>\n",
" <td>0.084962</td>\n",
" <td>0.91433</td>\n",
" <td>12.775</td>\n",
" <td>20.34165</td>\n",
" <td>0.38018</td>\n",
" <td>8.726</td>\n",
" <td>4.6</td>\n",
" <td>2.6</td>\n",
" <td>269.4</td>\n",
" <td>9.250</td>\n",
" <td>London N. Kensington</td>\n",
" <td>KC1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2019-01-01 02:00:00</td>\n",
" <td>51.38928</td>\n",
" <td>21.51622</td>\n",
" <td>0.081101</td>\n",
" <td>1.06672</td>\n",
" <td>8.800</td>\n",
" <td>22.71853</td>\n",
" <td>0.78413</td>\n",
" <td>4.859</td>\n",
" <td>4.4</td>\n",
" <td>3.0</td>\n",
" <td>280.5</td>\n",
" <td>5.150</td>\n",
" <td>London N. Kensington</td>\n",
" <td>KC1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2019-01-01 03:00:00</td>\n",
" <td>50.49121</td>\n",
" <td>21.12616</td>\n",
" <td>0.075307</td>\n",
" <td>1.06672</td>\n",
" <td>9.525</td>\n",
" <td>22.05522</td>\n",
" <td>0.60592</td>\n",
" <td>5.071</td>\n",
" <td>4.1</td>\n",
" <td>2.7</td>\n",
" <td>287.0</td>\n",
" <td>5.375</td>\n",
" <td>London N. Kensington</td>\n",
" <td>KC1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2019-01-01 04:00:00</td>\n",
" <td>43.85551</td>\n",
" <td>25.71346</td>\n",
" <td>0.083997</td>\n",
" <td>0.99052</td>\n",
" <td>9.425</td>\n",
" <td>26.58787</td>\n",
" <td>0.57028</td>\n",
" <td>5.212</td>\n",
" <td>4.0</td>\n",
" <td>3.0</td>\n",
" <td>291.5</td>\n",
" <td>5.525</td>\n",
" <td>London N. Kensington</td>\n",
" <td>KC1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date O3 NO2 CO SO2 PM10 \\\n",
"0 2019-01-01 00:00:00 49.89250 18.87596 0.104272 1.62547 21.275 \n",
"1 2019-01-01 01:00:00 52.48691 19.75871 0.084962 0.91433 12.775 \n",
"2 2019-01-01 02:00:00 51.38928 21.51622 0.081101 1.06672 8.800 \n",
"3 2019-01-01 03:00:00 50.49121 21.12616 0.075307 1.06672 9.525 \n",
"4 2019-01-01 04:00:00 43.85551 25.71346 0.083997 0.99052 9.425 \n",
"\n",
" NOXasNO2 NO PM2.5 temp ws wd RAWPM25 site \\\n",
"0 19.60464 0.47523 15.849 5.5 2.3 259.3 16.800 London N. Kensington \n",
"1 20.34165 0.38018 8.726 4.6 2.6 269.4 9.250 London N. Kensington \n",
"2 22.71853 0.78413 4.859 4.4 3.0 280.5 5.150 London N. Kensington \n",
"3 22.05522 0.60592 5.071 4.1 2.7 287.0 5.375 London N. Kensington \n",
"4 26.58787 0.57028 5.212 4.0 3.0 291.5 5.525 London N. Kensington \n",
"\n",
" code \n",
"0 KC1 \n",
"1 KC1 \n",
"2 KC1 \n",
"3 KC1 \n",
"4 KC1 "
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The data frame is stored under the file stem, e.g. 'KC1_2019'\n",
"pyreadr.read_r(fn)[fn.split('.')[0]].head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
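
The steps in the notebook above (build a filename from a site code and year, fetch it from the `R_data` directory, read it with `pyreadr`) could be gathered into a few small functions. The following is only a sketch under assumptions: the helper names `aurn_filename`, `aurn_url` and `load_aurn` are hypothetical and not part of any library, the URL pattern is the one observed in the notebook, and `urlretrieve` from the standard library stands in for the `!wget` shell call.

```python
import os
from urllib.request import urlretrieve

# Base location of the RData files, as observed in the notebook
BASE_URL = "https://uk-air.defra.gov.uk/openair/R_data/"


def aurn_filename(site, year):
    """Hypothetical helper: filename for one site and year, e.g. KC1_2019.RData."""
    return "{site}_{year}.RData".format(site=site.upper(), year=year)


def aurn_url(site, year):
    """Hypothetical helper: full download URL for one site and year."""
    return BASE_URL + aurn_filename(site, year)


def load_aurn(site, year, dest_dir="."):
    """Hypothetical helper: download one file and return its data frame.

    Needs network access and the pyreadr package. Assumes, as in the
    notebook, that the R object is named after the file stem.
    """
    import pyreadr  # imported here so the URL helpers work without it

    fn = aurn_filename(site, year)
    path = os.path.join(dest_dir, fn)
    urlretrieve(aurn_url(site, year), path)
    key = os.path.splitext(fn)[0]
    return pyreadr.read_r(path)[key]


print(aurn_url("kc1", 2019))
# prints https://uk-air.defra.gov.uk/openair/R_data/KC1_2019.RData
```

With a network connection, `load_aurn("kc1", 2019).head()` should give the same preview as the last executed notebook cell, provided the server still serves files under this pattern.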