  • Save jorisvandenbossche/ccd34426d7fe182c929089b6cd4557ac to your computer and use it in GitHub Desktop.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Some timings with GeoPandas' new Parquet and Feather file format support"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import geopandas"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Ignore the warning that the Parquet/Feather support is still experimental\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\", \"this is an initial implementation of Parquet file support\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test case 1: Natural Earth 1:10m Admin 1 – States, Provinces\n",
"\n",
"https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-1-states-provinces/"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"df = geopandas.read_file(\"zip+https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces.zip\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Writing"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.85 s, sys: 67.3 ms, total: 1.91 s\n",
"Wall time: 1.91 s\n"
]
}
],
"source": [
"%time df.to_file(\"test_ne_10m.shp\", driver='ESRI Shapefile')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.53 s, sys: 146 ms, total: 1.68 s\n",
"Wall time: 2.45 s\n"
]
}
],
"source": [
"%time df.to_file(\"test_ne_10m.gpkg\", driver=\"GPKG\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 209 ms, sys: 28.2 ms, total: 237 ms\n",
"Wall time: 236 ms\n"
]
}
],
"source": [
"%time df.to_parquet(\"test_ne_10m.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 212 ms, sys: 19.2 ms, total: 231 ms\n",
"Wall time: 215 ms\n"
]
}
],
"source": [
"%time df.to_feather(\"test_ne_10m.feather\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Reading"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"825 ms ± 8.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%timeit geopandas.read_file(\"test_ne_10m.shp\", driver='ESRI Shapefile')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"729 ms ± 4.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"%timeit geopandas.read_file(\"test_ne_10m.gpkg\", driver='GPKG')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"161 ms ± 2.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit geopandas.read_parquet(\"test_ne_10m.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"134 ms ± 839 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit geopandas.read_feather(\"test_ne_10m.feather\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### File sizes"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 joris joris 10 Mai 20 20:35 test_ne_10m.cpg\n",
"-rw-r--r-- 1 joris joris 23M Mai 20 20:35 test_ne_10m.dbf\n",
"-rw-r--r-- 1 joris joris 19M Mai 20 20:35 test_ne_10m.feather\n",
"-rw-r--r-- 1 joris joris 27M Mai 20 20:35 test_ne_10m.gpkg\n",
"-rw-r--r-- 1 joris joris 20M Mai 20 20:35 test_ne_10m.parquet\n",
"-rw-r--r-- 1 joris joris 145 Mai 20 20:35 test_ne_10m.prj\n",
"-rw-r--r-- 1 joris joris 21M Mai 20 20:35 test_ne_10m.shp\n",
"-rw-r--r-- 1 joris joris 36K Mai 20 20:35 test_ne_10m.shx\n"
]
}
],
"source": [
"!ls test_ne_10m.* -lh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test case 2: OpenStreetMap buildings (London)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import pyrosm"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Download pbf data\n",
"fp = pyrosm.get_data(\"London\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the OSM object\n",
"osm = pyrosm.OSM(fp)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"buildings = osm.get_buildings()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"df = buildings[[\"id\", \"osm_type\", \"building\", \"amenity\", \"addr:street\", \"timestamp\", \"geometry\"]].rename(columns={\"id\": \"osm_id\"})"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>osm_id</th>\n",
" <th>osm_type</th>\n",
" <th>building</th>\n",
" <th>amenity</th>\n",
" <th>addr:street</th>\n",
" <th>timestamp</th>\n",
" <th>geometry</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2956186</td>\n",
" <td>way</td>\n",
" <td>block</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>0</td>\n",
" <td>POLYGON ((-0.02162 51.44472, -0.02033 51.44469...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2956187</td>\n",
" <td>way</td>\n",
" <td>yes</td>\n",
" <td>townhall</td>\n",
" <td>Catford Broadway</td>\n",
" <td>0</td>\n",
" <td>POLYGON ((-0.02110 51.44523, -0.02132 51.44508...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2956188</td>\n",
" <td>way</td>\n",
" <td>yes</td>\n",
" <td>theatre</td>\n",
" <td>None</td>\n",
" <td>0</td>\n",
" <td>POLYGON ((-0.02004 51.44536, -0.02006 51.44528...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2956192</td>\n",
" <td>way</td>\n",
" <td>store</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>0</td>\n",
" <td>POLYGON ((-0.01900 51.44462, -0.01864 51.44458...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2956193</td>\n",
" <td>way</td>\n",
" <td>store</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>0</td>\n",
" <td>POLYGON ((-0.01752 51.44542, -0.01815 51.44551...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>584187</th>\n",
" <td>266218115929</td>\n",
" <td>relation</td>\n",
" <td>residential</td>\n",
" <td>None</td>\n",
" <td>Bedford Gardens</td>\n",
" <td>0</td>\n",
" <td>POLYGON ((-0.19751 51.50561, -0.19750 51.50562...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>584188</th>\n",
" <td>266229200037</td>\n",
" <td>relation</td>\n",
" <td>residential</td>\n",
" <td>None</td>\n",
" <td>Bedford Gardens</td>\n",
" <td>0</td>\n",
" <td>MULTIPOLYGON (((-0.19738 51.50565, -0.19730 51...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>584189</th>\n",
" <td>266395470798</td>\n",
" <td>relation</td>\n",
" <td>yes</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>0</td>\n",
" <td>POLYGON ((-0.11464 51.45445, -0.11467 51.45450...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>584190</th>\n",
" <td>266406556085</td>\n",
" <td>relation</td>\n",
" <td>yes</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>0</td>\n",
" <td>POLYGON ((-0.11409 51.45358, -0.11412 51.45362...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>584191</th>\n",
" <td>266417641373</td>\n",
" <td>relation</td>\n",
" <td>yes</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>0</td>\n",
" <td>POLYGON ((-0.11420 51.45375, -0.11422 51.45378...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>584192 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" osm_id osm_type building amenity addr:street \\\n",
"0 2956186 way block None None \n",
"1 2956187 way yes townhall Catford Broadway \n",
"2 2956188 way yes theatre None \n",
"3 2956192 way store None None \n",
"4 2956193 way store None None \n",
"... ... ... ... ... ... \n",
"584187 266218115929 relation residential None Bedford Gardens \n",
"584188 266229200037 relation residential None Bedford Gardens \n",
"584189 266395470798 relation yes None None \n",
"584190 266406556085 relation yes None None \n",
"584191 266417641373 relation yes None None \n",
"\n",
" timestamp geometry \n",
"0 0 POLYGON ((-0.02162 51.44472, -0.02033 51.44469... \n",
"1 0 POLYGON ((-0.02110 51.44523, -0.02132 51.44508... \n",
"2 0 POLYGON ((-0.02004 51.44536, -0.02006 51.44528... \n",
"3 0 POLYGON ((-0.01900 51.44462, -0.01864 51.44458... \n",
"4 0 POLYGON ((-0.01752 51.44542, -0.01815 51.44551... \n",
"... ... ... \n",
"584187 0 POLYGON ((-0.19751 51.50561, -0.19750 51.50562... \n",
"584188 0 MULTIPOLYGON (((-0.19738 51.50565, -0.19730 51... \n",
"584189 0 POLYGON ((-0.11464 51.45445, -0.11467 51.45450... \n",
"584190 0 POLYGON ((-0.11409 51.45358, -0.11412 51.45362... \n",
"584191 0 POLYGON ((-0.11420 51.45375, -0.11422 51.45378... \n",
"\n",
"[584192 rows x 7 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Polygon 583953\n",
"MultiPolygon 132\n",
"LineString 107\n",
"dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.geom_type.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# Keep only Polygon/MultiPolygon geometries (drop the LineStrings) to have a uniform geometry type\n",
"df = df[df.geom_type != \"LineString\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Writing"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 49.9 s, sys: 2.16 s, total: 52.1 s\n",
"Wall time: 52.2 s\n"
]
}
],
"source": [
"%time df.to_file(\"test_london_buildings.shp\", driver='ESRI Shapefile')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 54.6 s, sys: 442 ms, total: 55.1 s\n",
"Wall time: 55.2 s\n"
]
}
],
"source": [
"%time df.to_file(\"test_london_buildings.gpkg\", driver=\"GPKG\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.14 s, sys: 104 ms, total: 1.25 s\n",
"Wall time: 1.26 s\n"
]
}
],
"source": [
"%time df.to_parquet(\"test_london_buildings.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.1 s, sys: 90.8 ms, total: 1.19 s\n",
"Wall time: 1.16 s\n"
]
}
],
"source": [
"%time df.to_feather(\"test_london_buildings.feather\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Reading"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 16.7 s, sys: 350 ms, total: 17.1 s\n",
"Wall time: 16.9 s\n"
]
}
],
"source": [
"%time _ = geopandas.read_file(\"test_london_buildings.shp\", driver='ESRI Shapefile')"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 17 s, sys: 234 ms, total: 17.2 s\n",
"Wall time: 17.2 s\n"
]
}
],
"source": [
"%time _ = geopandas.read_file(\"test_london_buildings.gpkg\", driver='GPKG')"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.04 s, sys: 120 ms, total: 1.16 s\n",
"Wall time: 1.03 s\n"
]
}
],
"source": [
"%time _ = geopandas.read_parquet(\"test_london_buildings.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 858 ms, sys: 94.6 ms, total: 952 ms\n",
"Wall time: 897 ms\n"
]
}
],
"source": [
"%time _ = geopandas.read_feather(\"test_london_buildings.feather\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### File sizes"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 joris joris 10 Mai 20 21:20 test_london_buildings.cpg\n",
"-rw-r--r-- 1 joris joris 199M Mai 20 21:21 test_london_buildings.dbf\n",
"-rw-r--r-- 1 joris joris 65M Mai 20 21:22 test_london_buildings.feather\n",
"-rw-r--r-- 1 joris joris 147M Mai 20 21:22 test_london_buildings.gpkg\n",
"-rw-r--r-- 1 joris joris 56M Mai 20 21:22 test_london_buildings.parquet\n",
"-rw-r--r-- 1 joris joris 145 Mai 20 21:20 test_london_buildings.prj\n",
"-rw-r--r-- 1 joris joris 95M Mai 20 21:21 test_london_buildings.shp\n",
"-rw-r--r-- 1 joris joris 4,5M Mai 20 21:21 test_london_buildings.shx\n"
]
}
],
"source": [
"!ls test_london_buildings.* -lh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test case 3: OpenStreetMap points of interest (London)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import pyrosm"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Download pbf data\n",
"fp = pyrosm.get_data(\"London\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the OSM object\n",
"osm = pyrosm.OSM(fp)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"pois = osm.get_pois()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"df = pois[[\"id\", \"osm_type\", \"amenity\", \"addr:street\", \"timestamp\", \"geometry\"]].rename(columns={\"id\": \"osm_id\"})"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>osm_id</th>\n",
" <th>osm_type</th>\n",
" <th>amenity</th>\n",
" <th>addr:street</th>\n",
" <th>timestamp</th>\n",
" <th>geometry</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>108042</td>\n",
" <td>node</td>\n",
" <td>pub</td>\n",
" <td>University Street</td>\n",
" <td>NaN</td>\n",
" <td>POINT (-0.13551 51.52356)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>108539</td>\n",
" <td>node</td>\n",
" <td>bicycle_rental</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>POINT (-0.09339 51.52913)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>109575</td>\n",
" <td>node</td>\n",
" <td>advice</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>POINT (-0.14312 51.52826)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>110075</td>\n",
" <td>node</td>\n",
" <td>bicycle_parking</td>\n",
" <td>None</td>\n",
" <td>NaN</td>\n",
" <td>POINT (-0.14028 51.53426)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>451152</td>\n",
" <td>node</td>\n",
" <td>pub</td>\n",
" <td>Regents Park Road</td>\n",
" <td>NaN</td>\n",
" <td>POINT (-0.19461 51.60084)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>134244</th>\n",
" <td>260535140001</td>\n",
" <td>relation</td>\n",
" <td>school</td>\n",
" <td>Brick Lane</td>\n",
" <td>0.0</td>\n",
" <td>MULTIPOLYGON (((-0.07169 51.51895, -0.07171 51...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>134245</th>\n",
" <td>261175302034</td>\n",
" <td>relation</td>\n",
" <td>college</td>\n",
" <td>None</td>\n",
" <td>0.0</td>\n",
" <td>MULTIPOLYGON (((0.00889 51.54038, 0.00842 51.5...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>134246</th>\n",
" <td>261594886517</td>\n",
" <td>relation</td>\n",
" <td>school</td>\n",
" <td>None</td>\n",
" <td>0.0</td>\n",
" <td>MULTIPOLYGON (((0.03907 51.51750, 0.04075 51.5...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>134247</th>\n",
" <td>262136116245</td>\n",
" <td>relation</td>\n",
" <td>school</td>\n",
" <td>None</td>\n",
" <td>0.0</td>\n",
" <td>MULTIPOLYGON (((-0.03010 51.51048, -0.02994 51...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>134248</th>\n",
" <td>262334979711</td>\n",
" <td>relation</td>\n",
" <td>school</td>\n",
" <td>None</td>\n",
" <td>0.0</td>\n",
" <td>MULTIPOLYGON (((-0.02641 51.51923, -0.02532 51...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>134249 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" osm_id osm_type amenity addr:street timestamp \\\n",
"0 108042 node pub University Street NaN \n",
"1 108539 node bicycle_rental None NaN \n",
"2 109575 node advice None NaN \n",
"3 110075 node bicycle_parking None NaN \n",
"4 451152 node pub Regents Park Road NaN \n",
"... ... ... ... ... ... \n",
"134244 260535140001 relation school Brick Lane 0.0 \n",
"134245 261175302034 relation college None 0.0 \n",
"134246 261594886517 relation school None 0.0 \n",
"134247 262136116245 relation school None 0.0 \n",
"134248 262334979711 relation school None 0.0 \n",
"\n",
" geometry \n",
"0 POINT (-0.13551 51.52356) \n",
"1 POINT (-0.09339 51.52913) \n",
"2 POINT (-0.14312 51.52826) \n",
"3 POINT (-0.14028 51.53426) \n",
"4 POINT (-0.19461 51.60084) \n",
"... ... \n",
"134244 MULTIPOLYGON (((-0.07169 51.51895, -0.07171 51... \n",
"134245 MULTIPOLYGON (((0.00889 51.54038, 0.00842 51.5... \n",
"134246 MULTIPOLYGON (((0.03907 51.51750, 0.04075 51.5... \n",
"134247 MULTIPOLYGON (((-0.03010 51.51048, -0.02994 51... \n",
"134248 MULTIPOLYGON (((-0.02641 51.51923, -0.02532 51... \n",
"\n",
"[134249 rows x 6 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Point 86677\n",
"Polygon 47308\n",
"LineString 176\n",
"MultiPolygon 88\n",
"dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.geom_type.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# Keep only Point geometries to have a uniform geometry type\n",
"df = df[df.geom_type == \"Point\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Writing"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 3.66 s, sys: 240 ms, total: 3.9 s\n",
"Wall time: 3.91 s\n"
]
}
],
"source": [
"%time df.to_file(\"test_london_pois.shp\", driver='ESRI Shapefile')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 4.09 s, sys: 79.6 ms, total: 4.17 s\n",
"Wall time: 4.36 s\n"
]
}
],
"source": [
"%time df.to_file(\"test_london_pois.gpkg\", driver=\"GPKG\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 122 ms, sys: 8.4 ms, total: 131 ms\n",
"Wall time: 128 ms\n"
]
}
],
"source": [
"%time df.to_parquet(\"test_london_pois.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 91.8 ms, sys: 11.8 ms, total: 104 ms\n",
"Wall time: 91.6 ms\n"
]
}
],
"source": [
"%time df.to_feather(\"test_london_pois.feather\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Reading"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.95 s, sys: 43.9 ms, total: 2 s\n",
"Wall time: 1.99 s\n"
]
}
],
"source": [
"%time _ = geopandas.read_file(\"test_london_pois.shp\", driver='ESRI Shapefile')"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.82 s, sys: 0 ns, total: 1.82 s\n",
"Wall time: 1.82 s\n"
]
}
],
"source": [
"%time _ = geopandas.read_file(\"test_london_pois.gpkg\", driver='GPKG')"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 104 ms, sys: 23.1 ms, total: 127 ms\n",
"Wall time: 107 ms\n"
]
}
],
"source": [
"%time _ = geopandas.read_parquet(\"test_london_pois.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 80.2 ms, sys: 8.25 ms, total: 88.4 ms\n",
"Wall time: 82.3 ms\n"
]
}
],
"source": [
"%time _ = geopandas.read_feather(\"test_london_pois.feather\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### File sizes"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 joris joris 10 Mai 20 20:33 test_london_pois.cpg\n",
"-rw-r--r-- 1 joris joris 24M Mai 20 20:33 test_london_pois.dbf\n",
"-rw-r--r-- 1 joris joris 2,9M Mai 20 20:34 test_london_pois.feather\n",
"-rw-r--r-- 1 joris joris 24M Mai 20 20:34 test_london_pois.gpkg\n",
"-rw-r--r-- 1 joris joris 2,3M Mai 20 20:34 test_london_pois.parquet\n",
"-rw-r--r-- 1 joris joris 145 Mai 20 20:33 test_london_pois.prj\n",
"-rw-r--r-- 1 joris joris 2,4M Mai 20 20:33 test_london_pois.shp\n",
"-rw-r--r-- 1 joris joris 678K Mai 20 20:33 test_london_pois.shx\n"
]
}
],
"source": [
"!ls test_london_pois.* -lh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test case 4: US Census TIGER/Line Shapefiles\n",
"\n",
"The 2019 ZIP Code Tabulation Area (ZCTA5) shapefile: https://www2.census.gov/geo/tiger/TIGER2019/ZCTA5/"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df = geopandas.read_file(\"zip://../Downloads/tl_2019_us_zcta510.zip\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ZCTA5CE10</th>\n",
" <th>GEOID10</th>\n",
" <th>CLASSFP10</th>\n",
" <th>MTFCC10</th>\n",
" <th>FUNCSTAT10</th>\n",
" <th>ALAND10</th>\n",
" <th>AWATER10</th>\n",
" <th>INTPTLAT10</th>\n",
" <th>INTPTLON10</th>\n",
" <th>geometry</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>43451</td>\n",
" <td>43451</td>\n",
" <td>B5</td>\n",
" <td>G6350</td>\n",
" <td>S</td>\n",
" <td>63484186</td>\n",
" <td>157689</td>\n",
" <td>+41.3183010</td>\n",
" <td>-083.6174935</td>\n",
" <td>POLYGON ((-83.70873 41.32733, -83.70815 41.327...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>43452</td>\n",
" <td>43452</td>\n",
" <td>B5</td>\n",
" <td>G6350</td>\n",
" <td>S</td>\n",
" <td>121522304</td>\n",
" <td>13721730</td>\n",
" <td>+41.5157923</td>\n",
" <td>-082.9809454</td>\n",
" <td>POLYGON ((-83.08698 41.53780, -83.08256 41.537...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>43456</td>\n",
" <td>43456</td>\n",
" <td>B5</td>\n",
" <td>G6350</td>\n",
" <td>S</td>\n",
" <td>9320975</td>\n",
" <td>1003775</td>\n",
" <td>+41.6318300</td>\n",
" <td>-082.8393923</td>\n",
" <td>MULTIPOLYGON (((-82.83558 41.71082, -82.83515 ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>43457</td>\n",
" <td>43457</td>\n",
" <td>B5</td>\n",
" <td>G6350</td>\n",
" <td>S</td>\n",
" <td>48004681</td>\n",
" <td>0</td>\n",
" <td>+41.2673301</td>\n",
" <td>-083.4274872</td>\n",
" <td>POLYGON ((-83.49650 41.25371, -83.48382 41.253...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>43458</td>\n",
" <td>43458</td>\n",
" <td>B5</td>\n",
" <td>G6350</td>\n",
" <td>S</td>\n",
" <td>2573816</td>\n",
" <td>39915</td>\n",
" <td>+41.5304461</td>\n",
" <td>-083.2133648</td>\n",
" <td>POLYGON ((-83.22229 41.53102, -83.22228 41.532...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ZCTA5CE10 GEOID10 CLASSFP10 MTFCC10 FUNCSTAT10 ALAND10 AWATER10 \\\n",
"0 43451 43451 B5 G6350 S 63484186 157689 \n",
"1 43452 43452 B5 G6350 S 121522304 13721730 \n",
"2 43456 43456 B5 G6350 S 9320975 1003775 \n",
"3 43457 43457 B5 G6350 S 48004681 0 \n",
"4 43458 43458 B5 G6350 S 2573816 39915 \n",
"\n",
" INTPTLAT10 INTPTLON10 \\\n",
"0 +41.3183010 -083.6174935 \n",
"1 +41.5157923 -082.9809454 \n",
"2 +41.6318300 -082.8393923 \n",
"3 +41.2673301 -083.4274872 \n",
"4 +41.5304461 -083.2133648 \n",
"\n",
" geometry \n",
"0 POLYGON ((-83.70873 41.32733, -83.70815 41.327... \n",
"1 POLYGON ((-83.08698 41.53780, -83.08256 41.537... \n",
"2 MULTIPOLYGON (((-82.83558 41.71082, -82.83515 ... \n",
"3 POLYGON ((-83.49650 41.25371, -83.48382 41.253... \n",
"4 POLYGON ((-83.22229 41.53102, -83.22228 41.532... "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"33144"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Writing"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 15.7 s, sys: 640 ms, total: 16.3 s\n",
"Wall time: 16.5 s\n"
]
}
],
"source": [
"%time df.to_file(\"test_us_zcta.shp\", driver='ESRI Shapefile')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 16.8 s, sys: 3.68 s, total: 20.5 s\n",
"Wall time: 23.7 s\n"
]
}
],
"source": [
"%time df.to_file(\"test_us_zcta.gpkg\", driver=\"GPKG\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5 s, sys: 1.25 s, total: 6.25 s\n",
"Wall time: 7.09 s\n"
]
}
],
"source": [
"%time df.to_parquet(\"test_us_zcta.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 4.55 s, sys: 1.46 s, total: 6.02 s\n",
"Wall time: 6.13 s\n"
]
}
],
"source": [
"%time df.to_feather(\"test_us_zcta.feather\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Reading"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10.3 s, sys: 385 ms, total: 10.7 s\n",
"Wall time: 10.7 s\n"
]
}
],
"source": [
"%time _ = geopandas.read_file(\"test_us_zcta.shp\", driver='ESRI Shapefile')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10.1 s, sys: 492 ms, total: 10.6 s\n",
"Wall time: 10.6 s\n"
]
}
],
"source": [
"%time _ = geopandas.read_file(\"test_us_zcta.gpkg\", driver='GPKG')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 4.24 s, sys: 1.02 s, total: 5.25 s\n",
"Wall time: 5.17 s\n"
]
}
],
"source": [
"%time _ = geopandas.read_parquet(\"test_us_zcta.parquet\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 3.83 s, sys: 412 ms, total: 4.25 s\n",
"Wall time: 4.27 s\n"
]
}
],
"source": [
"%time _ = geopandas.read_feather(\"test_us_zcta.feather\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### File sizes"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 joris joris 10 Mai 20 20:38 test_us_zcta.cpg\n",
"-rw-r--r-- 1 joris joris 19M Mai 20 20:38 test_us_zcta.dbf\n",
"-rw-r--r-- 1 joris joris 783M Mai 20 20:39 test_us_zcta.feather\n",
"-rw-r--r-- 1 joris joris 839M Mai 20 20:39 test_us_zcta.gpkg\n",
"-rw-r--r-- 1 joris joris 790M Mai 20 20:39 test_us_zcta.parquet\n",
"-rw-r--r-- 1 joris joris 167 Mai 20 20:38 test_us_zcta.prj\n",
"-rw-r--r-- 1 joris joris 811M Mai 20 20:38 test_us_zcta.shp\n",
"-rw-r--r-- 1 joris joris 260K Mai 20 20:38 test_us_zcta.shx\n"
]
}
],
"source": [
"!ls test_us_zcta.* -lh"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
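The notebook above times each call with IPython's `%time` (a single run) and, for the reads in test case 1, `%timeit` (mean ± std. dev. over 7 runs). Outside IPython those magics are not available. A rough stdlib equivalent of the `%timeit` summary, built on `timeit.repeat`, could look like the sketch below; the helper name `time_call` is hypothetical and not part of GeoPandas or IPython.

```python
import statistics
import timeit
from typing import Callable, Tuple


def time_call(func: Callable[[], object], repeat: int = 7, number: int = 1) -> Tuple[float, float]:
    """Return (mean, standard deviation) in seconds of one call to `func`.

    Hypothetical helper approximating the summary printed by IPython's %timeit:
    each of the `repeat` measurements runs `func` `number` times.
    """
    if repeat < 2:
        raise ValueError("repeat must be >= 2 to compute a standard deviation")
    # timeit.repeat returns one total duration (for `number` calls) per measurement
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


if __name__ == "__main__":
    # Stand-in workload; with geopandas installed and test case 1's file present, one could
    # time e.g. `lambda: geopandas.read_parquet("test_ne_10m.parquet")` instead.
    mean, std = time_call(lambda: sum(range(100_000)), number=10)
    print(f"{mean * 1e3:.3g} ms ± {std * 1e3:.3g} ms per loop (mean ± std. dev. of 7 runs, 10 loops each)")
```

Unlike `%timeit`, this sketch does not calibrate the loop count automatically, so `number` has to be picked by hand. Keep in mind as well that the `%time` results in test cases 2 to 4 are single runs and therefore noisier than the `%timeit` numbers in test case 1.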
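A Shapefile is actually a set of sidecar files (`.shp`, `.shx`, `.dbf`, `.prj`, `.cpg`). A fair size comparison with the single-file GeoPackage, Parquet and Feather outputs should therefore add those components together. The sketch below does this for the `ls -lh` listing of test case 2. The sizes are copied from that output. `ls -h` rounds each entry and uses powers of 1024, and the comma in `4,5M` is a German-locale decimal separator. The names `parse_ls_size` and `size_per_format` are hypothetical.

```python
import re
from collections import defaultdict

# `ls -lh` sizes from test case 2 (London buildings), copied from the notebook output.
LISTING = {
    "test_london_buildings.cpg": "10",
    "test_london_buildings.dbf": "199M",
    "test_london_buildings.feather": "65M",
    "test_london_buildings.gpkg": "147M",
    "test_london_buildings.parquet": "56M",
    "test_london_buildings.prj": "145",
    "test_london_buildings.shp": "95M",
    "test_london_buildings.shx": "4,5M",
}

SHAPEFILE_PARTS = {".shp", ".shx", ".dbf", ".prj", ".cpg"}
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_ls_size(text: str) -> float:
    """Convert an `ls -h` size such as '4,5M' or '145' to an approximate byte count."""
    match = re.fullmatch(r"([\d.,]+)([KMG]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    # Accept both '.' and the locale-specific ',' as decimal separator
    return float(number.replace(",", ".")) * UNITS[unit]


def size_per_format(listing: dict) -> dict:
    """Sum sizes per format, grouping all Shapefile sidecar files under 'shapefile'."""
    totals = defaultdict(float)
    for name, size in listing.items():
        ext = name[name.rfind("."):]
        key = "shapefile" if ext in SHAPEFILE_PARTS else ext.lstrip(".")
        totals[key] += parse_ls_size(size)
    return dict(totals)


if __name__ == "__main__":
    # Print the formats from smallest to largest total size
    for fmt, nbytes in sorted(size_per_format(LISTING).items(), key=lambda kv: kv[1]):
        print(f"{fmt:>10}: {nbytes / 1024 ** 2:6.1f} MiB")
```

With these numbers, the Shapefile components add up to roughly 298.5 MiB, compared with 147 MiB for GeoPackage, 65 MiB for Feather and 56 MiB for Parquet. Because every `ls -h` entry is rounded, the totals are approximate.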
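The wall times are easier to compare as speedups relative to the Shapefile baseline. The sketch below uses the reading times reported for test case 2 (London buildings), copied from the `%time` output above. The dictionary layout and the `speedups` helper are only illustrations. These numbers come from single runs, so treat the ratios as indicative rather than precise.

```python
# Wall times (seconds) for reading test case 2 (London buildings), copied from the %time output.
READ_WALL_TIME_S = {
    "ESRI Shapefile": 16.9,
    "GPKG": 17.2,
    "Parquet": 1.03,
    "Feather": 0.897,
}


def speedups(times: dict, baseline: str) -> dict:
    """Return how many times faster each format is than `baseline` (values > 1 mean faster)."""
    base = times[baseline]
    return {fmt: base / t for fmt, t in times.items()}


if __name__ == "__main__":
    for fmt, factor in speedups(READ_WALL_TIME_S, "ESRI Shapefile").items():
        print(f"{fmt:>15}: {factor:5.1f}x")
```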