Skip to content

Instantly share code, notes, and snippets.

@paleolimbot
Created December 5, 2024 22:05
Show Gist options
  • Save paleolimbot/4bbfaf9dd79a306e21e59156004c7e33 to your computer and use it in GitHub Desktop.
Save paleolimbot/4bbfaf9dd79a306e21e59156004c7e33 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get some bigish data to work with:\n",
"\n",
"```shell\n",
"curl -L https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/microsoft-buildings-point.fgb.zip \\\n",
" -o microsoft-buildings-point.fgb.zip\n",
"unzip microsoft-buildings-point.fgb.zip\n",
"rm microsoft-buildings-point.fgb.zip\n",
"\n",
"pip install duckdb\n",
"```\n",
"\n",
"Getting a dev version of the geography extension. Doesn't quite work unless you have a PAT set...you can visit a [recent CI run result](https://github.com/paleolimbot/duckdb-geography/actions/runs/12187812016) to do basically the same thing.\n",
"\n",
"```shell\n",
"PLATFORM=\"osx_arm64\" # use linux_amd64_gcc4 for linux/Python\n",
"ARTIFACT_URL=$(curl -L \\\n",
" \"https://api.github.com/repos/paleolimbot/duckdb-geography/actions/artifacts?name=geography-v1.1.3-extension-$PLATFORM\" | \\\n",
" jq -r '.artifacts.[0].url')\n",
"\n",
"DOWNLOAD_URL=$(curl -L \"$ARTIFACT_URL\" | jq -r '.archive_download_url')\n",
"\n",
"curl -L \"$DOWNLOAD_URL\" \\\n",
" -H \"Authorization: Bearer <YOUR-TOKEN>\"\n",
" -o geography.duckdb_extension.zip\n",
"unzip geography.duckdb_extension.zip\n",
"\n",
"# then con.raw_sql(\"INSTALL 'geography.duckdb_extension';\") below\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set up the connection. Using ibis because the geopandas integration lets us plot this quickly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"import ibis\n",
"from ibis import _\n",
"\n",
"ibis.options.interactive = True\n",
"\n",
"# Because the geography extension isn't published yet and we'll use it!\n",
"con = ibis.duckdb.connect(allow_unsigned_extensions=True)\n",
"con.load_extension(\"spatial\")\n",
"con.load_extension(\"geography\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make the .fgb into a .parquet so this all goes faster. Takes about 2.5 min for the first go but the Parquet read after that is almost instantaneous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"con.raw_sql(\"\"\"\n",
"COPY (\n",
" SELECT geom,\n",
" FROM 'microsoft-buildings-point.fgb'\n",
") TO 'microsoft-buildings-point.parquet'\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Partitioning by S2 cell:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<duckdb.duckdb.DuckDBPyConnection at 0x1295a4df0>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shutil.rmtree(\"buildings_by_cell\", ignore_errors=True)\n",
"\n",
"# This mechanism takes a cell level as its only parameter\n",
"cell_level = 3\n",
"row_group_size = 122880\n",
"\n",
"con.raw_sql(f\"\"\"\n",
"COPY (\n",
" SELECT\n",
" geom.st_aswkb().s2_cellfromwkb().s2_cell_parent({cell_level}).s2_cell_token() as partition_cell,\n",
" geom\n",
" FROM 'microsoft-buildings-point.parquet'\n",
") TO \"buildings_by_cell\" WITH (\n",
" FORMAT PARQUET,\n",
" PARTITION_BY partition_cell,\n",
" -- Can also specify in bytes via ROW_GROUP_SIZE_BYTES\n",
" ROW_GROUP_SIZE {row_group_size}\n",
" -- Unfortunately we can't set file sizes here\n",
" -- Not implemented Error: Can't combine file rotation (e.g., ROW_GROUPS_PER_FILE) and PARTITION_BY for COPY\n",
")\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check file sizes (rather variable since DuckDB won't let us set a max and because some cells just don't have that many buildings):"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 deweydunnington staff 21M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 9.5M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 42M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 24M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 90M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 197M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 14K Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 196M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 25M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 106M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 216M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 525M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 234M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 3.0M Dec 5 15:54 data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 344M Dec 5 15:54 data_0.parquet\n"
]
}
],
"source": [
"! ls -lh buildings_by_cell/** | grep -e \".parquet\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the partitioning:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: >"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAh8AAAEBCAYAAADYacenAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAfq0lEQVR4nO3de3BU5f3H8U9QsiSQDRBCsikbJsotXKJDOgOhSpVbhiClkrYOGEULQWmkQtRiRlJFZEJ1qjIFUmppwMEMVgZqtQgDyqUqaJoQAxhioLHByQXrSDZESbic3x9t9teVANlNcs7u5v2a2ZF9ztmz3zwumw/PeZ5zQgzDMAQAAGCSHlYXAAAAuhfCBwAAMBXhAwAAmIrwAQAATEX4AAAApiJ8AAAAUxE+AACAqQgfAADAVDdaXcB3Xb58WTU1NYqIiFBISIjV5QAAgHYwDEONjY2Ki4tTjx7XHtvwu/BRU1Mjp9NpdRkAAMAHp0+f1qBBg665j1fh45lnntGKFSs82oYPH64TJ05Iks6fP6/HHntMW7duVXNzs1JTU7V+/XrFxMS0+z0iIiLcxdvtdm/KAwAAFnG5XHI6ne7f49fi9cjHqFGjtHfv3v8/wI3/f4ilS5fqb3/7m9544w1FRkbqkUce0ezZs/XBBx+0+/itp1rsdjvhAwCAANOeKRNeh48bb7xRsbGxV7Q3NDRo48aNKiws1KRJkyRJBQUFSkxM1OHDhzV+/Hhv3woAAAQhr1e7VFZWKi4uTjfddJPuvfdeVVdXS5KKi4t14cIFTZkyxb3viBEjFB8fr0OHDl31eM3NzXK5XB4PAAAQvLwKH+PGjdOmTZu0a9cu5efnq6qqSrfffrsaGxtVV1en0NBQ9e3b1+M1MTExqquru+ox8/LyFBkZ6X4w2RQAgODm1WmX6dOnu/+clJSkcePGafDgwfrzn/+ssLAwnwrIyclRdna2+3nrhBUAABCcOnSRsb59+2rYsGE6efKkYmNj1dLSorNnz3rsU19f3+YckVY2m809uZRJpgAABL8OhY9z587p1KlTcjgcSk5OVs+ePfXuu++6t1dUVKi6ulopKSkdLhQAAAQHr067PP7445o5c6YGDx6smpoaPf3007rhhhs0Z84cRUZGav78+crOzlb//v1lt9u1ePFipaSksNIF6KDKyko1NjZaXUa3FhERoaFDh1pdBhAUvAofX3zxhebMmaOvvvpK0dHRuu2223T48GFFR0dLkl566SX16NFD6enpHhcZA+C7yspKDRs2zOoyIOmzzz4jgACdIMQwDMPqIv6Xy+VSZGSkGhoamP8BSCopKVFycrK2bNmixMREq8vplsrLy5WRkaHi4mKNHTvW6nIAv+TN72+/u7dLVwv04evq6mo1NTVZXYZXevfurfj4eKvLCFjl5eU+v5ZTBQD8UbcKHwxfI5BlZGT49DpOFQDwN90qfLSOeATq8HXr0O/KlSuVkJBgdTntUlVVpdzc3IDtc3/Q+v/d2z5sfV0gj/QBCE7dKny0SkxMDOjztmlpaQFTf0lJiXJzcwO+z/0BfQggWHToOh8AAADeInwAAABTdcvTLgDgi46sPPI3rISClQgfAHAd1dXVknxfceSvWAkFqxA+AOA6Wq+ts3LlSqWlpVlcTcexEgpWI3wAQDslJCSw4gjoBEw4BQAApiJ8AAAAU3HaBQAQVAL9Hl5msHq1E+EDABA0uIdX+1m52onwAQAIGoF+Dy8z+MNqJ8IHACDocC8k/8aEUwAAYCrCBwAAMBXhAwAAmIrwAQAATEX4AAAApiJ8AAAAUxE+AACAqQgfAADAVIQPAABgKsIHAAAwFeEDAACYivABAABM1aHwsXr1aoWEhGjJkiXutjvuuEMhISEej4cffrijdQIAgCDh811ti4qKtGHDBiUlJV2xLTMzU88++6z7eXh4uK9vAwAAgoxPIx/nzp3Tvffeq1deeUX9+vW7Ynt4eLhiY2PdD7vd3uFCAQBAcPBp5CMrK0szZszQlClT9Nxzz12x/bXXXtOWLVsUGxurmTNnKjc396qjH83NzWpubnY/d7lcvpQE4CrKy8utLiHgVVVVuf9bUlJicTUd1/qZsOqzERERoaFDh1ry3vAPXoePrVu3qqSkREVFRW1unzt3rgYPHqy4uDiVlZVp2bJlqqio0Pbt29vcPy8vTytWrPC2DADXUV1dLUnKyMiwuJLgkZubq9zcXKvL6DRWfjY+++wzAkg35lX4OH36tB599FHt2bNHvXr1anOfhQsXuv88ZswYORwOTZ48WadOndLNN998xf45OTnKzs52P3e5XHI6nd6UBaANTU1NkqSVK1cqLS3N4moC25o1a/Tqq69aXUan27Fjh+Lj4019z/LycmVkZKixsdHU94V/8Sp8FBcX68yZMxo7dqy77dKlSzp48KDWrl2r5uZm3XDDDR6vGTdunCTp5MmTbYYPm80mm83mS+0A2iEhIcHj7yy81/ov9EWLFmnBggUWV9M5OPUBK3kVPiZPnqyjR496tD344IMaMWKEli1bdkXwkKTS0lJJksPh8L1KBLzWUwDwXut5+R07dnh1jv6DDz6QFDzzFKxUU1MjSYqLiyPIAZ3Aq/ARERGh0aNHe7T17t1bUVFRGj16tE6dOqXCwkKlpaUpKipKZWVlWrp0qSZOnNjmklx0H3fffbfVJQS8tiZ3t0ewzVOwUlhYmNUlAEHB5+t8tCU0NFR79+7Vyy+/rKamJjmdTqWnp2v58uWd+TYIUFu2bFFiYqLVZQSkL7/8Uv/+97+9es1f/vIXbdu2LahOFVhl586dys3NVWxsrNWlBI2uWmlj9UqezhLsp8U6HD7279/v/rPT6dSBAwc6ekgEqcTERIasTVRVVaVt27ZxqqATBPovMn9i1iqsYFjlFcwrgjp15AMAgGvp6lVYratpAnmktTusCCJ8AABM19WrsBhp9W/c1RYAAJiK8AEAAExF+AAAAKZizgcAIOgE8oX1unq5sD+s3iJ8AACCRuvVtDMzMy2upOO6erlwRERElx7/WggfAICg4XA4VFNTo9raWqtL8ZkZy4WtvogZ4QMAEFQcDkdQ3E8smJcLM+EUAACYipGP/1FZWenXV5Tz9e6mVqqqqnL/OVBqDhatd2IFAH9D+PivyspKDRs2zOoy2sXXu5taLRjutRCIuBMrAH9D+Piv1hEPf78fgC93N7VSVVWV+3buXXUvB7SNO7EC8FeEj+8I5gk+VigpKXGHj66+lwM8cZoLgL9iwikAADAV4QMAAJiK8AEAAExF+AAAAKYifAAAAFMRPgAAgKkIHwAAwFSEDwAAYCrCBwAAMBXhAwAAmIrwAQAATEX4AAAApiJ8AAAAUxE+AACAqToUPlavXq2QkBAtWbLE3Xb+/HllZWUpKipKffr0UXp6uurr6ztaJwAACBI+h4+ioiJt2LBBSUlJHu1Lly7VW2+9pTfeeEMHDhxQTU2NZs+e3eFCAQBAcPApfJw7d0733nuvXnnlFfXr18/d3tDQoI0bN+rFF1/UpEmTlJycrIKCAn344Yc6fPhwpxUNAAACl0/hIysrSzNmzNCUKVM82ouLi3XhwgWP9hEjRig+Pl6HDh1q81jNzc1yuVweDwAAELxu9PYFW7duVUlJiYqKiq7YVldXp9DQUPXt29ejPSYmRnV1dW0eLy8vTytWrPC2DAAAEKC8Ch+nT5/Wo48+qj179qhXr16dUkBOTo6ys7Pdz10ul5xOZ6cc+2pKSkquaCsvL/f4LzoH/QkA+C6vwkdxcbHOnDmjsWPHutsuXbqkgwcPau3atdq9e7daWlp09uxZj9GP+vp6xcbGtnlMm80mm83mW/VecjgckqTMzMyr7pORkWFKLQAAdFdehY/Jkyfr6NGjHm0PPvigRowYoWXLlsnpdKpnz5569913lZ6eLkmqqKhQdXW1UlJSOq9qHzkcDtXU1Ki2tvaKbeXl5crIyNCWLVuUmJhoQXXBa9++fXr88cetLgMA4Ce8Ch8REREaPXq0R1vv3r0VFRXlbp8/f76ys7PVv39/2e12LV68WCkpKRo/fnznVd0BDofDPQLSlsTERI+RHXQcp14AAP/L6wmn1/PSSy+pR48eSk9PV3Nzs1JTU7V+/frOfhsAABCgOhw+9u/f7/G8V69eWrdundatW9fRQwMAgCDEvV0AAICpCB8AAMBUhA8AAGAqwgcAADAV4QMAAJiK8AEAAExF+AAAAKYifAAAAFMRPgAAgKkIHwAAwFSEDwAAYCrCBwAAMBXhAwAAmIrwAQAATEX4AAAApiJ8AAAAUxE+AACAqQgfAADAVIQPAABgKsIHAAAwFeEDAACYivABAABMRfgAAACmInwAAABTET4AAICpCB8AAMBUhA8AAGAqwgcAADCVV+EjPz9fSUlJstvtstvtSklJ0TvvvOPefscddygkJMTj8fDDD3d60QAAIHDd6M3OgwYN0urVqzV06FAZhqHNmzdr1qxZOnLkiEaNGiVJyszM1LPPPut+TXh4eOdWDAAAAppX4WPmzJkez1etWqX8/HwdPnzYHT7Cw8MVGxvbeRUCAICg4vOcj0uXLmnr1q1qampSSkqKu/21117TgAEDNHr0aOXk5Oibb7655nGam5vlcrk8HgAAIHh5NfIhSUePHlVKSorOnz+vPn36aMeOHRo5cqQkae7cuRo8eLDi4uJUVlamZcuWqaKiQtu3b7/q8fLy8rRixQrffwIAABBQvA4fw4cPV2lpqRoaGrRt2zbNmzdPBw4c0MiRI7Vw4UL3fmPGjJHD4dDkyZN16tQp3XzzzW0eLycnR9nZ2e7nLpdLTqfThx8FAAAEAq/DR2hoqIYMGSJJSk5OVlFRkdasWaMNGzZcse+4ceMkSSdPnrxq+LDZbLLZbN6WAQAAApTX4eO7Ll++rObm5ja3lZaWSpIcDkdH3waAj6qqqlRSUmJ1GQGtqqrK/V/6smNa+xLdm1fhIycnR9OnT1d8fLwaGxtVWFio/fv3a/fu3Tp16pQKCwuVlpamqKgolZWVaenSpZo4caKSkpK6qn4AV1FXVydJys3NVW5ursXVBAf6svO0fj7RPXkVPs6cOaP7779ftbW1ioyMVFJSknbv3q2pU6fq9OnT2rt3r15++WU1NTXJ6XQqPT1dy5cv76raAVzDt99+K0latGiRFixYYHE1gS0vL0/btm3TT37yE+Xk5FhdTkD74x//qPz8fPfnE92TV+Fj48aNV93mdDp14MCBDhcEoHO1tLRYXULQuHDhgtUlBI2ampp2ncKKiIjQ0KFDTagIZurwnA+gvThfbq7GxkZJ//lHw7X+4YD2e/PNN/Xmm29aXUZQyM/PV35+frv23bFjh+Lj47u4Iv9RXl5udQldjvCBLsfcA2stWbJE9913n9VlBLSdO3cqNzdXK1euVFpamtXlBLTXX39dzz//vFevufvuu7uoGv8WERFhdQldhvCBLsfcA2u0nluPiorS2LFjrS4noLX+SzQhIYG+7KDWvmxPkCsvL1dGRoa2bNmixMREM8rzG8F+uonwAdPExcXxxW2iuLg4q0sArsqbIJeYmMh3R5Dx+d4uAAAAviB8AAAAUxE+AACAqZjzAQAw3YkTJ6679L47LDntrggfAADTDBgwQJL03HPP6bnnnmvXa4J5yWl3RfgAAJgmOjpaktq9fDbYl5x2V4QPAIDpWD7bvTHhFAAAmIqRD5jm888/594uJqqpqbG6BABoE+HjO/jl6Lvq6mo1NTVd0V5ZWSmJG5xZJSwszOoSAMAD4eO/HA6HJCkzM9PiSoLXsmXL9LOf/czqMrqN1puhxcbGWl0KAHggfPyXw+FQTU2NamtrrS4lILXeAGrlypVKSEjw2PbBBx8oPz9fY8aMYYKZibhGAgB/Rfj4Hw6Hwz0CAt+kpaW1GTDy8/MtqAYA4I8IH+hUbc2ZqaqqsqASAIC/InygU7Rnzkzv3r3NKgcA4McIH+gU15oz0zofJD4+3oLKAAD+hvCBTsOcGQBAe3CFUwAAYCrCBwAAMBXhAwAAmIrwAQAATEX4AAAApiJ8AAAAUxE+AACAqbwKH/n5+UpKSpLdbpfdbldKSoreeecd9/bz588rKytLUVFR6tOnj9LT01VfX9/pRQMAgMDlVfgYNGiQVq9ereLiYv3jH//QpEmTNGvWLB0/flyStHTpUr311lt64403dODAAdXU1Gj27NldUjgAAAhMXl3hdObMmR7PV61apfz8fB0+fFiDBg3Sxo0bVVhYqEmTJkmSCgoKlJiYqMOHD2v8+PGdVzUAAAhYPs/5uHTpkrZu3aqmpialpKSouLhYFy5c0JQpU9z7jBgxQvHx8Tp06NBVj9Pc3CyXy+XxAAAAwcvr8HH06FH16dNHNptNDz/8sHbs2KGRI0eqrq5OoaGh6tu3r8f+MTExqquru+rx8vLyFBkZ6X44nU6vfwgAABA4vA4fw4cPV2lpqT766CMtWrRI8+bN06effupzATk5OWpoaHA/Tp8+7fOxAACA//P6rrahoaEaMmSIJCk5OVlFRUVas2aN7rnnHrW0tOjs2bMeox/19fWKjY296vFsNptsNpv3lQMAgIDU4et8XL58Wc3NzUpOTlbPnj317rvvurdVVFSourpaKSkpHX0bAAAQJLwa+cjJydH06dMVHx+vxsZGFRYWav/+/dq9e7ciIyM1f/58ZWdnq3///rLb7Vq8eLFSUlJY6QIAANy8Ch9nzpzR/fffr9raWkVGRiopKUm7d+/W1KlTJUkvvfSSevToofT0dDU3Nys1NVXr16/vksIBAEBg8ip8bNy48Zrbe/XqpXXr1mndunUdKgoAAAQv7u0CAABM5fVqF8BXJSUlVpfQrVRVVVldAgC0ifCBLudwOCRJmZmZFlfSPfXu3dvqEgDAA+EDXc7hcKimpka1tbVWl9KtlJeXKyMjQ/Hx8VaXAgAeCB8whcPhcI+AAAC6NyacAgAAUxE+AACAqQgfAADAVIQPAABgKsIHAAAwFeEDAACYivABAABMRfgAAACmInwAAABTET4AAICpCB8AAMBUhA8AAGAqwgcAADAV4QMAAJiK8AEAAExF+AAAAKYifAAAAFMRPgAAgKkIHwAAwFSEDwAAYKobrS4AQNcqKSmxuoSAV1VVZXUJQFAhfABByuFwSJIyMzMtriR49O7d2+oSgKBA+ACClMPhUE1NjWpra60uJeCVl5crIyND8fHxVpcCBAWvwkdeXp62b9+uEydOKCwsTBMmTNBvfvMbDR8+3L3PHXfcoQMHDni87qGHHtLvf//7zqkYQLs5HA73CAgA+AuvJpweOHBAWVlZOnz4sPbs2aMLFy5o2rRpampq8tgvMzNTtbW17sfzzz/fqUUDAIDA5dXIx65duzyeb9q0SQMHDlRxcbEmTpzobg8PD1dsbGznVAgAAIJKh5baNjQ0SJL69+/v0f7aa69pwIABGj16tHJycvTNN99c9RjNzc1yuVweDwAAELx8nnB6+fJlLVmyRD/4wQ80evRod/vcuXM1ePBgxcXFqaysTMuWLVNFRYW2b9/e5nHy8vK0YsUKX8sAAAABxufwkZWVpWPHjun999/3aF+4cKH7z2PGjJHD4dDkyZN16tQp3XzzzVccJycnR9nZ2e7nLpdLTqfT17IAAICf8yl8PPLII3r77bd18OBBDRo06Jr7jhs3TpJ08uTJNsOHzWaTzWbzpQwAABCAvAofhmFo8eLF2rFjh/bv36+EhITrvqa0tFSSWO4HIOBxtdiOKy8vt7oE+AGvwkdWVpYKCwv15ptvKiIiQnV1dZKkyMhIhYWF6dSpUyosLFRaWpqioqJUVlampUuXauLEiUpKSuqSHwAAuhpXi+18ERERVpcAC4UYhmG0e+eQkDbbCwoK9MADD+j06dPKyMjQsWPH1NTUJKfTqbvvvlvLly+X3W5v13u4XC5FRkaqoaGh3a8BgK7Wet0idFxERISGDh1qdRnoZN78/vYqfJiB8AEAQODx5vd3h67zAQAA4C3CBwAAMBXhAwAAmIrwAQAATEX4AAAApvL58updpXXxDTeYAwAgcLT+3m7PIlq/Cx+NjY2SxP1dAAAIQI2NjYqMjLzmPn53nY/Lly+rpqZGERERV72oWTBpvZHe6dOnua6JF+g339BvvqHffEO/+SZQ+80wDDU2NiouLk49elx7VoffjXz06NHjujerC0Z2uz2gPmT+gn7zDf3mG/rNN/SbbwKx36434tGKCacAAMBUhA8AAGAqwofFbDabnn76adlsNqtLCSj0m2/oN9/Qb76h33zTHfrN7yacAgCA4MbIBwAAMBXhAwAAmIrwAQAATEX4AAAApiJ8AAAAUxE+TLJq1SpNmDBB4eHh6tu37xXbP/nkE82ZM0dOp1NhYWFKTEzUmjVrrthv//79Gjt2rGw2m4YMGaJNmzZ1ffEWul6/SdIvf/lLJScny2az6dZbb21zn7KyMt1+++3q1auXnE6nnn/++a4r2g+0p9+qq6s1Y8YMhYeHa+DAgXriiSd08eJFj3262+etLSUlJZo6dar69u2rqKgoLVy4UOfOnfPYpz192d189tlnmjVrlgYMGCC73a7bbrtN+/bt89iHfvO0f/9+hYSEtPkoKipy7xcM32eED5O0tLTopz/9qRYtWtTm9uLiYg0cOFBbtmzR8ePH9dRTTyknJ0dr165171NVVaUZM2bozjvvVGlpqZYsWaIFCxZo9+7dZv0Yprtev7X6+c9/rnvuuafNbS6XS9OmTdPgwYNVXFysF154Qc8884z+8Ic/dEXJfuF6/Xbp0iXNmDFDLS0t+vDDD7V582Zt2rRJv/71r937dMfP23fV1NRoypQpGjJkiD766CPt2rVLx48f1wMPPODepz192R3dddddunjxot577z0VFxfrlltu0V133aW6ujpJ9FtbJkyYoNraWo/HggULlJCQoO9///uSguj7zICpCgoKjMjIyHbt+4tf/MK488473c9/9atfGaNGjfLY55577jFSU1M7s0S/1J5+e/rpp41bbrnlivb169cb/fr1M5qbm91ty5YtM4YPH97JVfqfq/Xbzp07jR49ehh1dXXutvz8fMNut7v7qTt/3lpt2LDBGDhwoHHp0iV3W1lZmSHJqKysNAyjfX3Z3Xz55ZeGJOPgwYPuNpfLZUgy9uzZYxgG/dYeLS0tRnR0tPHss8+624Ll+4yRDz/W0NCg/v37u58fOnRIU6ZM8dgnNTVVhw4dMru0gHLo0CFNnDhRoaGh7rbU1FRVVFTo66+/trAy6xw6dEhjxoxRTEyMuy01NVUul0vHjx9379PdP2/Nzc0KDQ31uENnWFiYJOn999+X1L6+7G6ioqI0fPhwvfrqq2pqatLFixe1YcMGDRw4UMnJyZLot/b461//qq+++koPPviguy1Yvs8IH37qww8/1Ouvv66FCxe62+rq6jz+okpSTEyMXC6Xvv32W7NLDBhX67fWbd1Re/qEz5s0adIk1dXV6YUXXlBLS4u+/vprPfnkk5Kk2tpaSXy+2hISEqK9e/fqyJEjioiIUK9evfTiiy9q165d6tevnyT6rT02btyo1NRUjzu9B0u/ET464Mknn7zq5KDWx4kTJ7w+7rFjxzRr1iw9/fTTmjZtWhdUbq2u6rdgR791nvb25ahRo7R582b99re/VXh4uGJjY5WQkKCYmBiP0ZDuor39ZhiGsrKyNHDgQP3973/Xxx9/rB//+MeaOXOmO7R1J7783f3iiy+0e/duzZ8/36Kqu9aNVhcQyB577DGPiWdtuemmm7w65qeffqrJkydr4cKFWr58uce22NhY1dfXe7TV19fLbre7h4IDQVf027Vcrd9atwWKzuy32NhYffzxxx5t3+2TYPm8tcWbvpw7d67mzp2r+vp69e7dWyEhIXrxxRfd29vTl8Givf323nvv6e2339bXX38tu90uSVq/fr327NmjzZs368knn6TfvuO7f3cLCgoUFRWlH/3oRx7twfJ9RvjogOjoaEVHR3fa8Y4fP65JkyZp3rx5WrVq1RXbU1JStHPnTo+2PXv2KCUlpdNqMENn99v1pKSk6KmnntKFCxfUs2dPSf/pt+HDh7uHgANBZ/ZbSkqKVq1apTNnzmjgwIGS/tMndrtdI0eOdO8TDJ+3tvjSl61D23/605/Uq1cvTZ06VVL7+jJYtLffvvnmG0m6YnSoR48eunz5siT67VoMw1BBQYHuv/9+93dWq2D5PmO1i0n+9a9/GUeOHDFWrFhh9OnTxzhy5Ihx5MgRo7Gx0TAMwzh69KgRHR1tZGRkGLW1te7HmTNn3Mf45z//aYSHhxtPPPGEUV5ebqxbt8644YYbjF27dln1Y3W56/WbYRhGZWWlceTIEeOhhx4yhg0b5t6ndTb42bNnjZiYGOO+++4zjh07ZmzdutUIDw83NmzYYNWP1eWu128XL140Ro8ebUybNs0oLS01du3aZURHRxs5OTnuY3THz1tbfve73xnFxcVGRUWFsXbtWiMsLMxYs2aNe3t7+rK7+fLLL42oqChj9uzZRmlpqVFRUWE8/vjjRs+ePY3S0lLDMOi3a9m7d68hySgvL79iW7B8nxE+TDJv3jxD0hWPffv2GYbxn2WibW0fPHiwx3H27dtn3HrrrUZoaKhx0003GQUFBab/LGa6Xr8ZhmH88Ic/bHOfqqoq9z6ffPKJcdtttxk2m8343ve+Z6xevdr8H8ZE7em3zz//3Jg+fboRFhZmDBgwwHjssceMCxcueBynu33e2nLfffcZ/fv3N0JDQ42kpCTj1VdfvWKf9vRld1NUVGRMmzbN6N+/vxEREWGMHz/e2Llzp8c+9Fvb5syZY0yYMOGq24Ph+yzEMAzDrFEWAACA7jddGwAAWIrwAQAATEX4AAAApiJ8AAAAUxE+AACAqQgfAADAVIQPAABgKsIHAAAwFeEDAACYivABAABMRfgAAACm+j9VhzikKkwEAQAAAABJRU5ErkJggg==",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"con.sql(\n",
" \"\"\"\n",
" SELECT partition_cell, st_extent_agg(geom)\n",
" FROM \"buildings_by_cell/*/*.parquet\"\n",
" GROUP BY partition_cell\n",
" \"\"\"\n",
").to_pandas().plot(edgecolor=\"black\", facecolor=\"none\")\n"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<duckdb.duckdb.DuckDBPyConnection at 0x1295a4df0>"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shutil.rmtree(\"buildings_by_sort\", ignore_errors=True)\n",
"\n",
"# This mechanism takes a file size as its only parameter\n",
"file_size_bytes = 104857600 # 100 MB\n",
"row_group_size = 122880\n",
"\n",
"con.raw_sql(f\"\"\"\n",
"COPY (\n",
" SELECT\n",
" geom\n",
" FROM 'microsoft-buildings-point.parquet'\n",
" ORDER BY geom.st_aswkb().s2_cellfromwkb()\n",
") TO \"buildings_by_sort\" WITH (\n",
" FORMAT PARQUET,\n",
" -- Can also specify in bytes via ROW_GROUP_SIZE_BYTES\n",
" ROW_GROUP_SIZE {row_group_size},\n",
" FILE_SIZE_BYTES {file_size_bytes}\n",
")\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perfect file size partitioning:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 deweydunnington staff 101M Dec 5 16:01 buildings_by_sort/data_0.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 101M Dec 5 16:01 buildings_by_sort/data_1.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 102M Dec 5 16:01 buildings_by_sort/data_10.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 102M Dec 5 16:01 buildings_by_sort/data_11.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 101M Dec 5 16:01 buildings_by_sort/data_12.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 101M Dec 5 16:01 buildings_by_sort/data_13.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 101M Dec 5 16:01 buildings_by_sort/data_14.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 101M Dec 5 16:01 buildings_by_sort/data_15.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 101M Dec 5 16:01 buildings_by_sort/data_16.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 101M Dec 5 16:01 buildings_by_sort/data_17.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 100M Dec 5 16:01 buildings_by_sort/data_18.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 50M Dec 5 16:01 buildings_by_sort/data_19.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 100M Dec 5 16:01 buildings_by_sort/data_2.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 102M Dec 5 16:01 buildings_by_sort/data_3.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 100M Dec 5 16:01 buildings_by_sort/data_4.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 101M Dec 5 16:01 buildings_by_sort/data_5.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 100M Dec 5 16:01 buildings_by_sort/data_6.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 100M Dec 5 16:01 buildings_by_sort/data_7.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 100M Dec 5 16:01 buildings_by_sort/data_8.parquet\n",
"-rw-r--r-- 1 deweydunnington staff 100M Dec 5 16:01 buildings_by_sort/data_9.parquet\n"
]
}
],
"source": [
"! ls -lh buildings_by_sort/** | grep -e \".parquet\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Terrible spatial partitioning:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: >"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAh8AAAEBCAYAAADYacenAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAmXUlEQVR4nO3dfXBU13nH8R8yaEFYEiALvRSJkV8AGaxkUDtYjkMxwlAg1LWh4zFW/FIMDsWuDXaC1bBxsMyIvDQ2rWFDEwrOCIXWGjNtYyrGEMCNA7EqIYOxLGQqV/JIAtUDkkxs8Xb7R2a3WrQr7eu9d3e/n5kdae89e+6zZ1e7j84959wRhmEYAgAAMEmS1QEAAIDEQvIBAABMRfIBAABMRfIBAABMRfIBAABMRfIBAABMRfIBAABMRfIBAABMNdLqAK537do1dXR0KDU1VSNGjLA6HAAAEADDMNTX16fc3FwlJQ3dt2G75KOjo0N5eXlWhwEAAELQ3t6uSZMmDVkmqOTj+9//vjZu3Oi1berUqfroo48kSV9++aWee+457dmzR/39/VqwYIG2bdumrKysgI+RmprqCT4tLS2Y8AAAgEV6e3uVl5fn+R4fStA9H9OnT9eBAwf+v4KR/1/F2rVr9dZbb+mNN95Qenq6nnrqKT3wwAN69913A67ffaolLS2N5AMAgBgTyJCJoJOPkSNHKjs7e9D2np4e7dixQ9XV1Zo7d64kaefOnSosLNSxY8d05513BnsoAAAQh4Ke7dLS0qLc3FzdfPPNevjhh9XW1iZJqq+v1+XLlzVv3jxP2WnTpik/P19Hjx71W19/f796e3u9bgAAIH4FlXzMmjVLu3btUm1trVwul1pbW/X1r39dfX196urqUnJyssaNG+f1mKysLHV1dfmts7KyUunp6Z4bg00BAIhvQZ12Wbhwoef3oqIizZo1S5MnT9a//Mu/aMyYMSEFUF5ernXr1nnuuwesAACA+BTWImPjxo3TlClT9PHHHys7O1uXLl3ShQsXvMqcPXvW5xgRN4fD4RlcyiBTAADiX1jJx+eff64zZ84oJydHxcXFGjVqlA4ePOjZ39zcrLa2NpWUlIQdKAAAiA9BnXZ5/vnntWTJEk2ePFkdHR168cUXdcMNN+ihhx5Senq6VqxYoXXr1mnChAlKS0vT008/rZKSkriY6dLS0qK+vj6rwwBgkdTUVN12221WhwHEhaCSj08//VQPPfSQPvvsM2VmZuruu+/WsWPHlJmZKUl65ZVXlJSUpKVLl3otMhbrWlpaNGXKFKvDAGCx06dPk4AAETDCMAzD6iAG6u3tVXp6unp6emwz/qOhoUHFxcWqqqpSYWGh1eEAMFlTU5PKyspUX1+vmTNnWh0OYEvBfH/b7tou0RbK6ZOmpiaf29va2nTx4sVIhAXAxtzLBfj7LADsyM6nChOq54PTJwCARGLmqUJ6Pvxw93gEe/rE3eU68HHubRUVFSooKIhKvACs19raKqfTKSn4zw7AKu7vKLtOlEio5MOtsLAwqPO27iXkfSkoKODDCIBl7Ny1DviTkMlHMFpaWnT//fdLksrKygbt97UNQHyy6987s3AQaxI2+Qh04OnAAWYDT7G4u2I57QLEt4GnXez29+6O7b333rNt9zqs4f7u8jdI2uoes4QacOqeMrt3715PbwYAAIko0j1mDDgdhnt6bCCDx9yDdq4v72sQKoD44+8zwA74HII/Q7037DAYNSGTD7dgB576Kh9sHQBil13/3u0aF6xn1/dGWBeWAwAACFZC93wMNYXWbeBgHV+/s+IhYF9WD6oD4FtCJx/BDjplqi0Qe5iGCthPQicf0vADyBhwCsQmOwyqA+BbwicfwQzGYcApAADhY8ApAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwFckHAAAwVVjJx+bNmzVixAg9++yznm1z5szRiBEjvG7f+ta3wo0TAADEiZGhPrCurk7bt29XUVHRoH0rV67USy+95LmfkpIS6mEAAECcCann4/PPP9fDDz+sn/3sZxo/fvyg/SkpKcrOzvbc0tLSwg4UAADEh5B6PtasWaPFixdr3rx5evnllwft3717t6qqqpSdna0lS5bI6XT67f3o7+9Xf3+/535vb28oIcHmdu3apU8//dTqMEzz2Wefeb2vYb7u7m5JUmVlpTIzM8OuJxJ1RVqknmO0OBwOZWRkDNo+adIkPfbYY+YHBNsIOvnYs2ePGhoaVFdX53P/8uXLNXnyZOXm5urEiRNav369mpub9eabb/osX1lZqY0bNwYbBmLIrl279Pjjj1sdBhJUTU2NLeuKJLvGNRwSkMQVVPLR3t6uZ555Rm+//bZGjx7ts8yqVas8v99xxx3KyclRaWmpzpw5o1tuuWVQ+fLycq1bt85zv7e3V3l5ecGEBZtz93iUlpZqzpw51gZjgo6ODrlcLi1btmzY/0YvXLigX/7ylyZFBtjH+++/b3UIsFBQyUd9fb3OnTunmTNnerZdvXpV77zzjl577TX19/frhhtu8HrMrFmzJEkff/yxz+TD4XDI4XCEEjtizJw5c7Rhwwarw4i6hoYGuVwulZeXe/2t+Cv7y1/+UlVVVSosLDQpwsTQ1NSksrKysNvWXY8k271OkXqO0eAvtsrKStXU1HBaMsEFlXyUlpbq5MmTXtsef/xxTZs2TevXrx+UeEhSY2OjJCknJyf0KIE4V1hYOGyigtBEsm3t+jrZNS5pcGx2HJsC8wWVfKSmpmrGjBle28aOHauMjAzNmDFDZ86cUXV1tRYtWqSMjAydOHFCa9eu1ezZs31OyQUAAIkn5HU+fElOTtaBAwf06quv6uLFi8rLy9PSpUsToqsdAFpaWtTX12fa8Zqamrx+2om/2NwzdLq7u9XQ0GB6XLEiNTVVt912m9VhRE3Yycfhw4c9v+fl5enIkSPhVgkAMaelpUVTpkyx5NjuMSl25C+2mpqamJ2lY5bTp0/HbQIS0Z4PAEhU7h4PMwd/xvKA02XLlqm8vNzCCKW2tjZdvHjR0hh8aW1tldPpNLUXzWwkHwAQQVYM/ozFAaeZmZmWxtzS0qL777/fsuMHoq2tzbava7hIPgAACceKnqpA7du3T06n05a9MpFC8mETZg9UM1NHR4fnZyIMMAtmEKCdBwyGww7d2a2trZL+8EEeTvu66xmurkgdL5TYzDxmoPzFdvr0aUnWDzh1x2THXiO7vZZRYdhMT0+PIcno6emJeN319fWGJKOqqsqQZEgy6uvrA3rM9WXd24d7fCBOnz7tOQY3bty4cTPvtnfv3rA/wyPN/R1VVVUVch1DfUdF8vtroGC+v+n5sAE7d/9Fws9//nO5XC6tXr1aTzzxhNXhRF0wgwDtPGAwVO7nVFFRoYKCAsvicA/aCzcOdz2ShqwrUscLJTar29oXf7Ft2bJFdXV1Ki0t1Q9/+EPL4kuEUxt2RvJhI3bs/ouE3Nxcz894fH7+BPN6xuNrv2jRIkufU0NDg5xOZ9hxuOuRpIKCgmGTxEDKRJoVxwzU9bGlpaVZGM1gra2tg07/xPsaG3ZA8gEAAQpkPQ0r1tyIxXU+Dh48qOLiYpOjGczpdHqSy4HieY0NOyD5AIAAVVRUaNGiRT73WXEKzc6n7fzF9p3vfEcHDx60zWmX619Td9zxOgHALkg+ACBABQUFw57CYZ0Pb9fHNn78eM9PK2N2zygJ5DVF5CVZHQAAAEgsJB8AAMBUJB8AAMBUJB8AAMBUJB8AAMBUJB8AAMBUJB8AAMBUJB8AAMBUJB8AAMBUJB8AAMBULK8OAEFoaWnxed0P93Ld7p9msOKYgfIX2/nz5z0/r7+arJlaW1s9PwfGMVSbcrXbyCH5AIAAdXV1acqUKUOW4aq23mL1qrb+4uZqt5FB8gEAAfriiy8kyedVZLmqrbdYuartihUrdM8993i2t7a2eq52W1BQMGj7e++9F/Ur3rp7ZeIZyQcABGmoq8gGcoVZf6duYJ5x48ZJknbs2KEdO3YM2u+rN0Qyt5dp7Nixph3LbCQfAGCilpaWYU/dBIvTLub58Y9/7NVTEg3uXqP8/PyoHsdKJB8AYCJ3j0ckTpVw2iXy8fnb7j5Nk52dPWzPFobHVFsAsEBhYaFSU1OtDgPDSE1N1cyZM73GfyB89HwAQIA6Ojok+Z6GGei0V/f+Q4cO6fnnn49IXJx2CZ2/+AZuP336tFnhJIywko/NmzervLxczzzzjF599VVJ0pdffqnnnntOe/bsUX9/vxYsWKBt27YpKysrEvECgGVcLpekob/sA00EIpV4IDp+/OMfKzs7W2VlZQwOjoKQk4+6ujpt375dRUVFXtvXrl2rt956S2+88YbS09P11FNP6YEHHtC7774bdrAAYKVly5appqYmrKm27nIVFRVyOp1hjddgzEfk4xs4tsNubRpPQko+Pv/8cz388MP62c9+ppdfftmzvaenRzt27FB1dbXmzp0rSdq5c6cKCwt17Ngx3XnnnZGJGgAskJmZKSn8qbaSPGMIAi0/lEjUES3XxzZ+/HjPTzvEfH18dlwtNh6FNOB0zZo1Wrx4sebNm+e1vb6+XpcvX/baPm3aNOXn5+vo0aM+6+rv71dvb6/XDQAAxK+gez727NmjhoYG1dXVDdrX1dWl5ORkz+ItbllZWerq6vJZX2VlpTZu3BhsGAAAIEYF1fPR3t6uZ555Rrt379bo0aMjEkB5ebl6eno8t/b29ojUCwAA7Cmo5KO+vl7nzp3TzJkzNXLkSI0cOVJHjhzR3//932vkyJHKysrSpUuXdOHCBa/HnT17VtnZ2T7rdDgcSktL87oBAID4FdRpl9LSUp08edJr2+OPP65p06Zp/fr1ysvL06hRo3Tw4EEtXbpUktTc3Ky2tjaVlJRELmoAABCzgko+UlNTNWPGDK9tY8eOVUZGhmf7ihUrtG7dOk2YMEFpaWl6+umnVVJSwkwXAAAgKQornL7yyitKSkrS0qVLvRYZAwAAkCKQfBw+fNjr/ujRo7V161Zt3bo13KoBAEAc4sJyAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCQfAADAVCOtDgCIVw0NDcOWaWpq8voZD+zynCIVx8DHd3d3+60z0OO597e2toYdn13a2hd/sZ0/f96KcGAzJB9AhOXk5EiSVq5cGfBjysrKohWOZezynCIZR01NzbB1Bno8p9MZVPmh2KWtffEXW3JyssmRwE5IPoAIy8nJUUdHhzo7O4ct29TUpLKyMlVVVamwsNCE6KLPLs8pUnG465GkZcuWqaamxmedgR7PXa6iokJOpzOs+OzS1r74i62yslI1NTVKT0+3MDpYjeQDiIKcnBxPD0ggCgsLNXPmzChGZD67PKdIxpGZmTlsnYEer6CgIGLx2aWtfbk+NncbIrEx4BQAAJiK5AMAAJiK0y6ADQQyMyZW2GUGBrNdrOUvNncbIrEFlXy4XC65XC598sknkqTp06fre9/7nhYuXChJmjNnjo4cOeL1mCeffFI//elPIxMtEGdCmRkTK+wyA4PZLtbyF5vD4TA5EthJUMnHpEmTtHnzZt12220yDEOvv/667rvvPh0/flzTp0+X9IcP0ZdeesnzmJSUlMhGDMSRYGbGxAq7zMBgtou1/MX285//XC6XSxkZGRZGB6sFlXwsWbLE6/6mTZvkcrl07NgxT/KRkpKi7OzsyEUIxLlgZ8bECrvMwGC2i7Wujy03N9fCaGAXIQ84vXr1qvbs2aOLFy+qpKTEs3337t266aabNGPGDJWXl+v3v//9kPX09/ert7fX6wYAAOJX0ANOT548qZKSEn355Ze68cYbtXfvXt1+++2SpOXLl2vy5MnKzc3ViRMntH79ejU3N+vNN9/0W19lZaU2btwY+jMAAAAxJejkY+rUqWpsbFRPT49qamr06KOP6siRI7r99tu1atUqT7k77rhDOTk5Ki0t1ZkzZ3TLLbf4rK+8vFzr1q3z3O/t7VVeXl4ITwUAAMSCoJOP5ORk3XrrrZKk4uJi1dXVacuWLdq+ffugsrNmzZIkffzxx36TD4fDwahnAAASSNiLjF27dk39/f0+9zU2NkpSXA6mAwAAoQmq56O8vFwLFy5Ufn6++vr6VF1drcOHD2v//v06c+aMqqurtWjRImVkZOjEiRNau3atZs+eraKiomjFDwAAYkxQyce5c+f0yCOPqLOzU+np6SoqKtL+/ft17733qr29XQcOHNCrr76qixcvKi8vT0uXLtWGDRuiFTtizLFjx7R7926rw0CUuVfu3Ldvn6Urb0YqDnc9knT69Gm/dQZ6PHe5d999N+z47NLWvvhbwbWjo8OKcGAzQSUfO3bs8LsvLy9v0OqmgCTPeJ+33npLb731lsXRwCzuFTytFsk4Dh48OGydgR7P5XIFVX4odmlrX/ytcDpmzBiTI4GdcG0XRN3UqVMlSRUVFZ6FlRC/Wltb5XQ6LX+9IxWHux5JKi0t1cGDB33WGejx3OVWr14tl8sVVnx2aWt/xo4dq/z8fK9t+/btk9PpZDHKBEfyAdMsWrTItqswInIaGhrkdDotf70jFYe7HkmaMmWKDh486LPOQI/nLve1r31NLpcrrPjs0tbBsNvpIVgj7NkuAAAAwSD5AAAApiL5AAAApiL5AAAApmLAKUzT0NBgdQgwgXtAodUDCyMVx8DHd3d3+60z0OO59/tbByOU2Kxu62AMXDcFiYvkA1HnXl5/5cqVFkcCM/lb38FskYyjpqZm2DoDPZ57Bk0k4rNLWwdj7NixVocAC5F8IOpycnLU0dGhzs5Oq0OBCZqamlRWVqaqqioVFhbGfBzueiRp2bJlqqmp8VlnoMdzl6uoqJDT6QwrPru0dTDcMV+//gcSC8kHTJGTk8MFBhNMYWGhLdaeiGQcmZmZw9YZ6PHci4JFIj67tDUQKAacAgAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAUwWVfLhcLhUVFSktLU1paWkqKSnRf/zHf3j2f/nll1qzZo0yMjJ04403aunSpTp79mzEgwYAALErqORj0qRJ2rx5s+rr6/Vf//Vfmjt3ru677z6dOnVKkrR27Vr9+7//u9544w0dOXJEHR0deuCBB6ISOAAAiE0jgym8ZMkSr/ubNm2Sy+XSsWPHNGnSJO3YsUPV1dWaO3euJGnnzp0qLCzUsWPHdOedd0YuagAAELNCHvNx9epV7dmzRxcvXlRJSYnq6+t1+fJlzZs3z1Nm2rRpys/P19GjR/3W09/fr97eXq8bAACIX0EnHydPntSNN94oh8Ohb33rW9q7d69uv/12dXV1KTk5WePGjfMqn5WVpa6uLr/1VVZWKj093XPLy8sL+kkAAIDYEXTyMXXqVDU2Nup3v/udVq9erUcffVQffvhhyAGUl5erp6fHc2tvbw+5LgAAYH9BjfmQpOTkZN16662SpOLiYtXV1WnLli168MEHdenSJV24cMGr9+Ps2bPKzs72W5/D4ZDD4Qg+cgAAEJOCTj6ud+3aNfX396u4uFijRo3SwYMHtXTpUklSc3Oz2traVFJSEnagiaChocHqEICwNTU1ef2M9TgGPr67u9tvnYEez72/tbU17Pjs0tbBiKVYET1BJR/l5eVauHCh8vPz1dfXp+rqah0+fFj79+9Xenq6VqxYoXXr1mnChAlKS0vT008/rZKSEma6DCMnJ0eStHLlSosjASKnrKzM6hAkRTaOmpqaYesM9HhOpzOo8kOxS1sHIzU11eoQYKGgko9z587pkUceUWdnp9LT01VUVKT9+/fr3nvvlSS98sorSkpK0tKlS9Xf368FCxZo27ZtUQk8nuTk5Kijo0OdnZ1WhwKErampSWVlZaqqqlJhYWHMx+GuR5KWLVummpoan3UGejx3uYqKCjmdzrDis0tbBys1NVW33Xab1WHAQkElHzt27Bhy/+jRo7V161Zt3bo1rKDsauBpkVjs7gQQnszMTElSYWGhZs6c6bPMUPsGKigoCKr8UCJRB2CmsMd8xLrhxlkMTC58nRaJxe5OwAx2+duIZBwMjgciI+GTD8ZZAPFt7969ys/PD/nxA0+7ZGRkRCosIKEldPJRW1vr6Ub1x/3Bs2HDBk2bNs2zvbW1VU6nUxUVFZ7uUwD2+tsYO3ZsWInH9To6OiTZb7YLEGsSOvnIzMwc9jypeybKyy+/7HO/e8Q6AG/x+Lfhcrkk2W+2CzNHEGsSOvkIhL+ZKP56RIBEZ6eej0hwPx9JWr16tVwul21mu0jMHEFsIvkIQE5OjqcHZOA2yX+PCJDo4rHnIzc3V5L9ZrsAsYbkI0SszQH4FqtrTwzl0KFDev75560OA4gbJB9h8NUjAuAP4uk/egZ2ApEV9FVtAQAAwkHPBwAE6JNPPpFkn6m2CJ2/18v9uiC6SD4AYBg33XSTpP+/xITdptoidP7av6ury/N7U1OTJylpbW2N+hXIEyEhJfkAgGG4FyMcanpsd3e3/uzP/syK8BAFAwcYD0xQnE6naTO54nn9FpIPAAjQcNNjA5kBF+l1PhAaf7OyBm6X5Pl94Po1ixYtinp88b5+C8kHAERIMDPgWOfDHvy1/8CEZODvBQUFvF4RwGwXAABgKpIPAABgKpIPAABgKsZ8AECQwplqyTof9uBvnQ9f2wdOtUVkkHwAQIDc632sXLky7LpY58Me/LX/wO0Dfx87dmzUY0oEJB8AEKDMzMywLyjJVFt7CHaqrfv3/Px8S+KNNyQfABCESF1Qkqm29hDsVFtEBgNOAQCAqej5AAALfPTRR5IYcGqVYAecIrISOvmI9sWBgEQU6NVdY0kkn1N3d7ck6eWXX5bEgFOrBTvgtK2tLeoxRcJQ71k7/G2OMAzDsDqIgXp7e5Wenq6enh6lpaVFtO6GhgYVFxertraWC0ABABLa6dOnI3r9mGC+vxOy56O9vV21tbX63//9X6tDAeLOwAtwuQdVxrpoPKd4bKdY4q/9B26XFLOvkft5+JtNZfWF6xKq56Ozs1O5ubkRrRMAALuKdO/GUOj58CMnJyfsOfoAhuZv/YRYFo3nFI/tFEuCXecjFl8jq3s3hhJU8lFZWak333xTH330kcaMGaO77rpLP/jBDzR16lRPmTlz5ujIkSNej3vyySf105/+NDIRhylSc/QBDC0e16+IxnP64osvIlofIi8e38tWCyr5OHLkiNasWaM/+ZM/0ZUrV/S3f/u3mj9/vj788EOvJWdXrlypl156yXM/JSUlchEDQBxw/xMUiaXaEbpAZrukpqaaFU7CCCr5qK2t9bq/a9cuTZw4UfX19Zo9e7Zne0pKirKzsyMTIQDEIU4DWyuQ0y6FhYW2PnURy8Ia89HT0yNJmjBhgtf23bt3q6qqStnZ2VqyZImcTqff3o/+/n719/d77vf29oYTEgDEDE4DW2+o5dU51RI9IScf165d07PPPquvfe1rmjFjhmf78uXLNXnyZOXm5urEiRNav369mpub9eabb/qsp7KyUhs3bgw1DAAAEGNCTj7WrFmjDz74QL/5zW+8tq9atcrz+x133KGcnByVlpbqzJkzuuWWWwbVU15ernXr1nnu9/b2Ki8vL9SwAACAzYWUfDz11FP61a9+pXfeeUeTJk0asuysWbMkSR9//LHP5MPhcMjhcIQSBgAbi6fLF8TjkvGJjtfSWkElH4Zh6Omnn9bevXt1+PDhgFZ8a2xslCTOawIJIp5ncXAdlvhz/bVaSErMEVTysWbNGlVXV+tf//VflZqaqq6uLklSenq6xowZozNnzqi6ulqLFi1SRkaGTpw4obVr12r27NkqKiqKyhMAYC/xOIuju7ub60HFqfvvv9/ndqbXRldQy6uPGDHC5/adO3fqscceU3t7u8rKyvTBBx/o4sWLysvL0/33368NGzYEvFR6NJdXB4BQdXZ2xlVCBf+YXhuaYL6/E+raLgAAIDqC+f5OMikmAAAASSQfAADAZCQfAADAVCQfAADAVCQfAADAVGFdWC4a3JNvuMAcAACxw/29HcgkWtslH319fZLE9V0AAIhBfX19Sk9PH7KM7db5uHbtmjo6OpSamup3UbN44r6QXnt7O+uaBIF2Cw3tFhraLTS0W2hitd0Mw1BfX59yc3OVlDT0qA7b9XwkJSUNe7G6eJSWlhZTbzK7oN1CQ7uFhnYLDe0Wmlhst+F6PNwYcAoAAExF8gEAAExF8mExh8OhF198UQ6Hw+pQYgrtFhraLTS0W2hot9AkQrvZbsApAACIb/R8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8AAAAU5F8mGTTpk266667lJKSonHjxg3a//777+uhhx5SXl6exowZo8LCQm3ZsmVQucOHD2vmzJlyOBy69dZbtWvXrugHb6Hh2k2S/uZv/kbFxcVyOBz66le/6rPMiRMn9PWvf12jR49WXl6efvjDH0YvaBsIpN3a2tq0ePFipaSkaOLEifr2t7+tK1eueJVJtPebLw0NDbr33ns1btw4ZWRkaNWqVfr888+9ygTSlonm9OnTuu+++3TTTTcpLS1Nd999tw4dOuRVhnbzdvjwYY0YMcLnra6uzlMuHj7PSD5McunSJf3lX/6lVq9e7XN/fX29Jk6cqKqqKp06dUrf/e53VV5ertdee81TprW1VYsXL9Y999yjxsZGPfvss3riiSe0f/9+s56G6YZrN7e/+qu/0oMPPuhzX29vr+bPn6/Jkyervr5eP/rRj/T9739f//iP/xiNkG1huHa7evWqFi9erEuXLum3v/2tXn/9de3atUvf+973PGUS8f12vY6ODs2bN0+33nqrfve736m2tlanTp3SY4895ikTSFsmom984xu6cuWKfv3rX6u+vl5f+cpX9I1vfENdXV2SaDdf7rrrLnV2dnrdnnjiCRUUFOiP//iPJcXR55kBU+3cudNIT08PqOxf//VfG/fcc4/n/ne+8x1j+vTpXmUefPBBY8GCBZEM0ZYCabcXX3zR+MpXvjJo+7Zt24zx48cb/f39nm3r1683pk6dGuEo7cdfu+3bt89ISkoyurq6PNtcLpeRlpbmaadEfr+5bd++3Zg4caJx9epVz7YTJ04YkoyWlhbDMAJry0TT3d1tSDLeeecdz7be3l5DkvH2228bhkG7BeLSpUtGZmam8dJLL3m2xcvnGT0fNtbT06MJEyZ47h89elTz5s3zKrNgwQIdPXrU7NBiytGjRzV79mwlJyd7ti1YsEDNzc06f/68hZFZ5+jRo7rjjjuUlZXl2bZgwQL19vbq1KlTnjKJ/n7r7+9XcnKy1xU6x4wZI0n6zW9+Iymwtkw0GRkZmjp1qn7xi1/o4sWLunLlirZv366JEyequLhYEu0WiH/7t3/TZ599pscff9yzLV4+z0g+bOq3v/2t/vmf/1mrVq3ybOvq6vL6Q5WkrKws9fb26osvvjA7xJjhr93c+xJRIG3C+02aO3euurq69KMf/UiXLl3S+fPn9cILL0iSOjs7JfH+8mXEiBE6cOCAjh8/rtTUVI0ePVo/+clPVFtbq/Hjx0ui3QKxY8cOLViwwOtK7/HSbiQfYXjhhRf8Dg5y3z766KOg6/3ggw9033336cUXX9T8+fOjELm1otVu8Y52i5xA23L69Ol6/fXX9Xd/93dKSUlRdna2CgoKlJWV5dUbkigCbTfDMLRmzRpNnDhR//mf/6n33ntPf/EXf6ElS5Z4krZEEsrf7qeffqr9+/drxYoVFkUdXSOtDiCWPffcc14Dz3y5+eabg6rzww8/VGlpqVatWqUNGzZ47cvOztbZs2e9tp09e1ZpaWmeruBYEI12G4q/dnPvixWRbLfs7Gy99957Xtuub5N4eb/5EkxbLl++XMuXL9fZs2c1duxYjRgxQj/5yU88+wNpy3gRaLv9+te/1q9+9SudP39eaWlpkqRt27bp7bff1uuvv64XXniBdrvO9X+7O3fuVEZGhv78z//ca3u8fJ6RfIQhMzNTmZmZEavv1KlTmjt3rh599FFt2rRp0P6SkhLt27fPa9vbb7+tkpKSiMVghki323BKSkr03e9+V5cvX9aoUaMk/aHdpk6d6ukCjgWRbLeSkhJt2rRJ586d08SJEyX9oU3S0tJ0++23e8rEw/vNl1Da0t21/U//9E8aPXq07r33XkmBtWW8CLTdfv/730vSoN6hpKQkXbt2TRLtNhTDMLRz50498sgjns8st3j5PGO2i0n+53/+xzh+/LixceNG48YbbzSOHz9uHD9+3Ojr6zMMwzBOnjxpZGZmGmVlZUZnZ6fndu7cOU8d//3f/22kpKQY3/72t42mpiZj69atxg033GDU1tZa9bSibrh2MwzDaGlpMY4fP248+eSTxpQpUzxl3KPBL1y4YGRlZRnf/OY3jQ8++MDYs2ePkZKSYmzfvt2qpxV1w7XblStXjBkzZhjz5883GhsbjdraWiMzM9MoLy/31JGI7zdf/uEf/sGor683mpubjddee80YM2aMsWXLFs/+QNoy0XR3dxsZGRnGAw88YDQ2NhrNzc3G888/b4waNcpobGw0DIN2G8qBAwcMSUZTU9OgffHyeUbyYZJHH33UkDTodujQIcMw/jBN1Nf+yZMne9Vz6NAh46tf/aqRnJxs3HzzzcbOnTtNfy5mGq7dDMMw/vRP/9RnmdbWVk+Z999/37j77rsNh8Nh/NEf/ZGxefNm85+MiQJpt08++cRYuHChMWbMGOOmm24ynnvuOePy5cte9STa+82Xb37zm8aECROM5ORko6ioyPjFL34xqEwgbZlo6urqjPnz5xsTJkwwUlNTjTvvvNPYt2+fVxnazbeHHnrIuOuuu/zuj4fPsxGGYRhm9bIAAAAk3nBtAABgKZIPAABgKpIPAABgKpIPAABgKpIPAABgKpIPAABgKpIPAABgKpIPAABgKpIPAABgKpIPAABgKpIPAABgqv8D2BB2S7A+pPAAAAAASUVORK5CYII=",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"con.sql(\n",
" \"\"\"\n",
" SELECT filename, st_extent_agg(geom)\n",
" FROM read_parquet(\"buildings_by_sort/*.parquet\", filename = true)\n",
" GROUP BY filename\n",
" \"\"\"\n",
").to_pandas().plot(edgecolor=\"black\", facecolor=\"none\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment