Skip to content

Instantly share code, notes, and snippets.

@jhconning
Last active April 3, 2021 08:45
Show Gist options
  • Save jhconning/66c432d0d3239cdd4d3636fc98885625 to your computer and use it in GitHub Desktop.
Save jhconning/66c432d0d3239cdd4d3636fc98885625 to your computer and use it in GitHub Desktop.
json into a dataframe with normalize
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reading in nested json from an url into pandas"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import json \n",
"import requests\n",
"from pandas.io.json import json_normalize"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"url = \"http://overpass-api.de/api/interpreter?data=[out:json];node[highway=traffic_signals](40.37753432750217,-74.35908999097687,41.01617676767561,-73.60019572851844);out%20meta;\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The requests library is built from urllib3 but makes it a bit easier.\n",
"\n",
"If this data had not contained nested data (i.e. dictionaries within dictionaries) we could have simply used [pd.read_json](https://chrisalbon.com/python/data_wrangling/load_json_file_into_pandas/)\n",
"\n",
"```df = pd.read_json(url)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"d = json.loads(requests.get(url).text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `d` is now loaded as a dictionary with entries that look like this:\n",
"\n",
" 'version': 0.6,\n",
" 'generator': 'Overpass API 0.7.55.4 3079d8ea',\n",
" 'osm3s': {'timestamp_osm_base': '2018-10-31T17:29:02Z',\n",
" 'copyright': 'The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.'},\n",
" 'elements': [{'type': 'node',\n",
" 'id': 42421728,\n",
" 'lat': 40.7980486,\n",
" 'lon': -73.9600434,\n",
" 'timestamp': '2018-07-18T16:25:37Z',\n",
" 'version': 5,\n",
" 'changeset': 60842978,\n",
" 'user': 'tabaya',\n",
" 'uid': 8500573,\n",
" 'tags': {'crossing': 'traffic_signals', 'highway': 'traffic_signals'}},\n",
" {'type': 'node',\n",
" 'id': 42421731,\n",
" 'lat': 40.798645,\n",
" 'lon': -73.9614743,\n",
" 'timestamp': '2017-03-21T19:36:26Z',\n",
" 'version': 7,\n",
" 'changeset': 47049187,\n",
" 'user': 'poornibadrinath',\n",
" 'uid': 1597155,\n",
" 'tags': {'highway': 'traffic_signals'}},"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['version', 'generator', 'osm3s', 'elements']"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(d.keys())"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>changeset</th>\n",
" <th>id</th>\n",
" <th>lat</th>\n",
" <th>lon</th>\n",
" <th>tags</th>\n",
" <th>timestamp</th>\n",
" <th>type</th>\n",
" <th>uid</th>\n",
" <th>user</th>\n",
" <th>version</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>60842978</td>\n",
" <td>42421728</td>\n",
" <td>40.798049</td>\n",
" <td>-73.960043</td>\n",
" <td>{'crossing': 'traffic_signals', 'highway': 'tr...</td>\n",
" <td>2018-07-18T16:25:37Z</td>\n",
" <td>node</td>\n",
" <td>8500573</td>\n",
" <td>tabaya</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>47049187</td>\n",
" <td>42421731</td>\n",
" <td>40.798645</td>\n",
" <td>-73.961474</td>\n",
" <td>{'highway': 'traffic_signals'}</td>\n",
" <td>2017-03-21T19:36:26Z</td>\n",
" <td>node</td>\n",
" <td>1597155</td>\n",
" <td>poornibadrinath</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>16529897</td>\n",
" <td>42421737</td>\n",
" <td>40.799237</td>\n",
" <td>-73.962876</td>\n",
" <td>{'highway': 'traffic_signals'}</td>\n",
" <td>2013-06-12T21:03:39Z</td>\n",
" <td>node</td>\n",
" <td>1541236</td>\n",
" <td>AlexSof</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>16529671</td>\n",
" <td>42421741</td>\n",
" <td>40.800431</td>\n",
" <td>-73.965705</td>\n",
" <td>{'highway': 'traffic_signals'}</td>\n",
" <td>2013-06-12T20:48:02Z</td>\n",
" <td>node</td>\n",
" <td>1541236</td>\n",
" <td>AlexSof</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>13681649</td>\n",
" <td>42421745</td>\n",
" <td>40.801401</td>\n",
" <td>-73.967994</td>\n",
" <td>{'highway': 'traffic_signals'}</td>\n",
" <td>2012-10-30T02:20:04Z</td>\n",
" <td>node</td>\n",
" <td>716239</td>\n",
" <td>WestsideGuy</td>\n",
" <td>7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" changeset id lat lon \\\n",
"0 60842978 42421728 40.798049 -73.960043 \n",
"1 47049187 42421731 40.798645 -73.961474 \n",
"2 16529897 42421737 40.799237 -73.962876 \n",
"3 16529671 42421741 40.800431 -73.965705 \n",
"4 13681649 42421745 40.801401 -73.967994 \n",
"\n",
" tags timestamp \\\n",
"0 {'crossing': 'traffic_signals', 'highway': 'tr... 2018-07-18T16:25:37Z \n",
"1 {'highway': 'traffic_signals'} 2017-03-21T19:36:26Z \n",
"2 {'highway': 'traffic_signals'} 2013-06-12T21:03:39Z \n",
"3 {'highway': 'traffic_signals'} 2013-06-12T20:48:02Z \n",
"4 {'highway': 'traffic_signals'} 2012-10-30T02:20:04Z \n",
"\n",
" type uid user version \n",
"0 node 8500573 tabaya 5 \n",
"1 node 1597155 poornibadrinath 7 \n",
"2 node 1541236 AlexSof 8 \n",
"3 node 1541236 AlexSof 7 \n",
"4 node 716239 WestsideGuy 7 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = json_normalize(d,'elements')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice the tags column is a column of dictionaries.\n",
"We could further unpack the tags column:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>FIXME</th>\n",
" <th>NJDOT_SRI</th>\n",
" <th>addr:housenumber</th>\n",
" <th>addr:postcode</th>\n",
" <th>addr:street</th>\n",
" <th>asset_ref</th>\n",
" <th>bicycle</th>\n",
" <th>button_operated</th>\n",
" <th>census:population</th>\n",
" <th>crossing</th>\n",
" <th>...</th>\n",
" <th>railway</th>\n",
" <th>ref</th>\n",
" <th>route_ref</th>\n",
" <th>side</th>\n",
" <th>sloped_curb</th>\n",
" <th>source</th>\n",
" <th>sources</th>\n",
" <th>traffic_signals</th>\n",
" <th>traffic_signals:direction</th>\n",
" <th>tunnel</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>traffic_signals</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 46 columns</p>\n",
"</div>"
],
"text/plain": [
" FIXME NJDOT_SRI addr:housenumber addr:postcode addr:street asset_ref \\\n",
"0 NaN NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN NaN NaN \n",
"\n",
" bicycle button_operated census:population crossing ... railway \\\n",
"0 NaN NaN NaN traffic_signals ... NaN \n",
"1 NaN NaN NaN NaN ... NaN \n",
"2 NaN NaN NaN NaN ... NaN \n",
"3 NaN NaN NaN NaN ... NaN \n",
"4 NaN NaN NaN NaN ... NaN \n",
"\n",
" ref route_ref side sloped_curb source sources traffic_signals \\\n",
"0 NaN NaN NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN NaN NaN NaN \n",
"\n",
" traffic_signals:direction tunnel \n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"4 NaN NaN \n",
"\n",
"[5 rows x 46 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"json_normalize(df.tags).head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"gist": {
"data": {
"description": "from url to pandas via json and json_normalize ",
"public": true
},
"id": ""
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment