@rjzamora
Last active May 18, 2023 14:58
Demo: Dask Expressions
{
"cells": [
{
"cell_type": "markdown",
"id": "9bbe3374-c6d7-4b2c-a55f-1290067c6ae5",
"metadata": {},
"source": [
"# Dask Expressions (`dask_expr`) Demo\n",
"\n",
"**Author**: Rick Zamora\n",
"\n",
"**Date**: May 18th 2023 (May 2023 Dask Demo Day)\n",
"\n",
"Some content taken from: https://github.com/mrocklin/dask-expr/blob/main/demo.ipynb"
]
},
{
"cell_type": "markdown",
"id": "da462c70-cd08-469b-bee5-d348e6c31b69",
"metadata": {},
"source": [
"## Motivation"
]
},
{
"cell_type": "markdown",
"id": "15074719-7abe-4386-bccc-f24f9f6be76a",
"metadata": {},
"source": [
"### High-level graph primer\n",
"\n",
"The `HighLevelGraph` concept was introduced to `dask/dask` a few years ago to delay low-level graph materialization until `compute`/`persist` time.\n",
"The `dask.dataframe` and `dask.array` collections now carry a `HighLevelGraph` object by default."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8b38395b-e1f8-4d3e-b5b1-79f402638579",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"dask.highlevelgraph.HighLevelGraph"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from dask.datasets import timeseries\n",
"\n",
"mean = timeseries()[\"x\"].mean()\n",
"type(mean.dask)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c457ac71-4856-4dc6-8cb2-17cb8e4754f6",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"576pt\" height=\"97pt\"\n",
" viewBox=\"0.00 0.00 576.00 96.94\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(0.255093 0.255093) rotate(0) translate(4 376)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-376 2254,-376 2254,4 -4,4\"/>\n",
"<!-- 5758737106806174655 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>5758737106806174655</title>\n",
"<g id=\"a_node1\"><a xlink:title=\"A Materialized Layer with 1 Tasks.&#10;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"1374,-372 865,-372 865,-336 1374,-336 1374,-372\"/>\n",
"<text text-anchor=\"middle\" x=\"1119.5\" y=\"-349\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">series&#45;mean&#45;#0&#45;944673f378692763b15d9a33c6730feb</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- &#45;3151588475461615459 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>&#45;3151588475461615459</title>\n",
"<g id=\"a_node2\"><a xlink:title=\"A DataFrameIO Layer with 30 Tasks.&#10;Number of Partitions: 30&#10;DataFrame Type: pandas&#10;4 DataFrame Columns: [&#39;name&#39;, &#39;id&#39;, &#39;x&#39;, &#39;y&#39;]&#10;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"1305,-52 934,-52 934,0 1305,0 1305,-52\"/>\n",
"<text text-anchor=\"middle\" x=\"1119.5\" y=\"-16\" font-family=\"Helvetica,sans-Serif\" font-size=\"40.00\">make&#45;timeseries&#45;#1</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- 9057528504350784580 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>9057528504350784580</title>\n",
"<g id=\"a_node3\"><a xlink:title=\"A Blockwise Layer with 30 Tasks.&#10;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"1223,-140 1016,-140 1016,-88 1223,-88 1223,-140\"/>\n",
"<text text-anchor=\"middle\" x=\"1119.5\" y=\"-104\" font-family=\"Helvetica,sans-Serif\" font-size=\"40.00\">getitem&#45;#2</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- &#45;3151588475461615459&#45;&gt;9057528504350784580 -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>&#45;3151588475461615459&#45;&gt;9057528504350784580</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1119.5,-52.34C1119.5,-59.79 1119.5,-68.11 1119.5,-76.13\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1116,-76.03 1119.5,-86.03 1123,-76.03 1116,-76.03\"/>\n",
"</g>\n",
"<!-- &#45;3402585408051486730 -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>&#45;3402585408051486730</title>\n",
"<g id=\"a_node4\"><a xlink:title=\"A Blockwise Layer with 30 Tasks.&#10;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"1105,-228 0,-228 0,-176 1105,-176 1105,-228\"/>\n",
"<text text-anchor=\"middle\" x=\"552.5\" y=\"-192\" font-family=\"Helvetica,sans-Serif\" font-size=\"40.00\">series&#45;sum&#45;chunk&#45;#3&#45;94b1e1a7d7b87a87de9b45fd3f731282</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- 9057528504350784580&#45;&gt;&#45;3402585408051486730 -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>9057528504350784580&#45;&gt;&#45;3402585408051486730</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1015.5,-130.77C936.24,-142.8 825.18,-159.64 731.7,-173.82\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"731.25,-170.35 721.89,-175.31 732.3,-177.27 731.25,-170.35\"/>\n",
"</g>\n",
"<!-- 2135922201381022660 -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>2135922201381022660</title>\n",
"<g id=\"a_node6\"><a xlink:title=\"A Blockwise Layer with 30 Tasks.&#10;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"2250,-228 1123,-228 1123,-176 2250,-176 2250,-228\"/>\n",
"<text text-anchor=\"middle\" x=\"1686.5\" y=\"-192\" font-family=\"Helvetica,sans-Serif\" font-size=\"40.00\">series&#45;count&#45;chunk&#45;#4&#45;008ad9d86c011b2eb6ad69c2bc8fbc7e</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- 9057528504350784580&#45;&gt;2135922201381022660 -->\n",
"<g id=\"edge6\" class=\"edge\">\n",
"<title>9057528504350784580&#45;&gt;2135922201381022660</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1223.5,-130.77C1302.76,-142.8 1413.82,-159.64 1507.3,-173.82\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1506.7,-177.27 1517.11,-175.31 1507.75,-170.35 1506.7,-177.27\"/>\n",
"</g>\n",
"<!-- &#45;8967004739189986595 -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>&#45;8967004739189986595</title>\n",
"<g id=\"a_node5\"><a xlink:title=\"A DataFrameTreeReduction Layer with 1 Tasks.&#10;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"1107.5,-300 923.5,-300 923.5,-264 1107.5,-264 1107.5,-300\"/>\n",
"<text text-anchor=\"middle\" x=\"1015.5\" y=\"-277\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">series&#45;sum&#45;agg&#45;#3</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- &#45;3402585408051486730&#45;&gt;&#45;8967004739189986595 -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>&#45;3402585408051486730&#45;&gt;&#45;8967004739189986595</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M703.78,-228.49C771.96,-239.97 850.72,-253.24 912.02,-263.57\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"911.32,-267 921.76,-265.21 912.48,-260.1 911.32,-267\"/>\n",
"</g>\n",
"<!-- &#45;8967004739189986595&#45;&gt;5758737106806174655 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>&#45;8967004739189986595&#45;&gt;5758737106806174655</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1041.48,-300.48C1054.48,-309.24 1070.41,-319.96 1084.47,-329.42\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1082.35,-332.21 1092.6,-334.89 1086.26,-326.41 1082.35,-332.21\"/>\n",
"</g>\n",
"<!-- 6114711466219745232 -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>6114711466219745232</title>\n",
"<g id=\"a_node7\"><a xlink:title=\"A DataFrameTreeReduction Layer with 1 Tasks.&#10;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"1552,-300 1357,-300 1357,-264 1552,-264 1552,-300\"/>\n",
"<text text-anchor=\"middle\" x=\"1454.5\" y=\"-277\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">series&#45;count&#45;agg&#45;#4</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- 2135922201381022660&#45;&gt;6114711466219745232 -->\n",
"<g id=\"edge7\" class=\"edge\">\n",
"<title>2135922201381022660&#45;&gt;6114711466219745232</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1610.86,-228.43C1580.49,-238.64 1545.9,-250.27 1517,-259.99\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1516.01,-256.63 1507.65,-263.13 1518.24,-263.26 1516.01,-256.63\"/>\n",
"</g>\n",
"<!-- 6114711466219745232&#45;&gt;5758737106806174655 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>6114711466219745232&#45;&gt;5758737106806174655</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M1370.83,-300.48C1323.17,-310.44 1263.32,-322.95 1213.98,-333.26\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"1213.53,-329.78 1204.46,-335.25 1214.96,-336.63 1213.53,-329.78\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x16c5a6710>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean.dask.visualize(size=\"8\")"
]
},
{
"cell_type": "markdown",
"id": "b395dea4-1041-43d0-9d1d-5e956a447820",
"metadata": {},
"source": [
"The introduction of high-level graphs has improved graph-generation performance, and made the following optimizations more effective:\n",
"\n",
"- Column projection\n",
"- High-level task fusion\n",
"\n",
"However, column projection remains limited, and the following optimizations remain elusive:\n",
"\n",
"- Predicate pushdown\n",
"- Join reordering\n",
"- Statistics-based graph rewrites\n",
"\n",
"\n",
"**Why?...**"
]
},
{
"cell_type": "markdown",
"id": "c8b486bc-570f-4d93-9506-2dc2fd8118cb",
"metadata": {},
"source": [
"### Metadata management is still separate from graph generation\n",
"\n",
"Initializing a `dask.dataframe.DataFrame` object currently requires both the task graph and metadata to be computed beforehand.\n",
"\n",
"```python\n",
"class DataFrame(_Frame):\n",
"\n",
" def __init__(\n",
" self,\n",
" # Task graph\n",
" dsk: dict | HighLevelGraph,\n",
" # Metadata\n",
" name,\n",
" meta,\n",
" divisions: tuple,\n",
" ):\n",
" ...\n",
"```\n",
"\n",
"This means that it is nearly impossible to \"regenerate\" a `HighLevelGraph` object in a general way!\n",
"\n",
"The primary goal of the Dask Expressions project is to experiment with a Dask DataFrame collection in which both metadata and graph generation are managed by a **regenerable** expression object."
]
},
{
"cell_type": "markdown",
"id": "3abe8e13-2418-495f-be9e-1f8a80675859",
"metadata": {},
"source": [
"## Dask-Expr Quick Start"
]
},
{
"cell_type": "markdown",
"id": "8b185662-2e2c-48fa-9813-f7fee8a3c90e",
"metadata": {},
"source": [
"**\"Alpha\" version**: https://github.com/mrocklin/dask-expr"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9b41e5b8-0e35-4822-99cb-097db8215fde",
"metadata": {},
"outputs": [],
"source": [
"import dask_expr as dx"
]
},
{
"cell_type": "markdown",
"id": "304c473e-0a18-4255-a6b1-2611bdb2d8ca",
"metadata": {},
"source": [
"### Creating a `dask_expr` collection\n",
"\n",
"We currently support a few different mechanisms for collection creation:\n",
"\n",
"- `from_pandas`\n",
"- `datasets.timeseries`\n",
"- `read_parquet`\n",
"- `read_csv`"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "2403e22e-8da0-47d6-b323-f1812c258039",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"<dask_expr.expr.DataFrame: expr=Timeseries(2ea1896)>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ts = dx.datasets.timeseries()\n",
"ts"
]
},
{
"cell_type": "markdown",
"id": "e89b5f56-e7fa-4052-9595-f7afe4d6f45c",
"metadata": {},
"source": [
"As with `dask.dataframe`, methods like `compute`/`persist`/`head` execute the underlying task graph."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6d5b02e4-64d6-43f1-b933-b69ede3f7435",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>id</th>\n",
" <th>x</th>\n",
" <th>y</th>\n",
" </tr>\n",
" <tr>\n",
" <th>timestamp</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2000-01-01 00:00:00</th>\n",
" <td>George</td>\n",
" <td>1024</td>\n",
" <td>0.499962</td>\n",
" <td>0.499962</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2000-01-01 00:00:01</th>\n",
" <td>Jerry</td>\n",
" <td>964</td>\n",
" <td>-0.431755</td>\n",
" <td>-0.431755</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2000-01-01 00:00:02</th>\n",
" <td>Alice</td>\n",
" <td>1045</td>\n",
" <td>-0.683054</td>\n",
" <td>-0.683054</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2000-01-01 00:00:03</th>\n",
" <td>Hannah</td>\n",
" <td>943</td>\n",
" <td>-0.284236</td>\n",
" <td>-0.284236</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2000-01-01 00:00:04</th>\n",
" <td>Bob</td>\n",
" <td>1051</td>\n",
" <td>0.777228</td>\n",
" <td>0.777228</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name id x y\n",
"timestamp \n",
"2000-01-01 00:00:00 George 1024 0.499962 0.499962\n",
"2000-01-01 00:00:01 Jerry 964 -0.431755 -0.431755\n",
"2000-01-01 00:00:02 Alice 1045 -0.683054 -0.683054\n",
"2000-01-01 00:00:03 Hannah 943 -0.284236 -0.284236\n",
"2000-01-01 00:00:04 Bob 1051 0.777228 0.777228"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ts.head()"
]
},
{
"cell_type": "markdown",
"id": "866ef2d8-782c-4d24-a1fc-bbc4f834ab0a",
"metadata": {},
"source": [
"Let's convert to Parquet for demonstration purposes."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "4eca501d-e140-4014-b18a-73ba2f50c0d8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<dask_expr.expr.DataFrame: expr=ReadParquet(47ba38c)>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ts.compute().to_parquet(\"demo_parquet\", row_group_size=1_296_000)\n",
"#df = dx.read_parquet(\"demo_parquet\", index=\"timestamp\", calculate_divisions=True, split_row_groups=True)\n",
"df = dx.read_parquet(\"demo_parquet\", blocksize=\"1MiB\")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "e4bc9497-d315-45b3-9534-7377ae54719b",
"metadata": {},
"source": [
"Note that output IO (e.g. `to_parquet`) [has not been implemented just yet](https://github.com/mrocklin/dask-expr/issues/71). "
]
},
{
"cell_type": "markdown",
"id": "e10082f7-7044-459f-b845-f258f181958c",
"metadata": {},
"source": [
"## Basic Class Structure"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "5ffd9a54-0a43-474a-8946-3b1a60e09f47",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"<dask_expr.expr.Series: expr=ReadParquet(47ba38c)['x'] + 1>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser = (df.x + 1)\n",
"ser"
]
},
{
"cell_type": "markdown",
"id": "0bd2ebf0-7d3d-4fd2-8ad6-52440f777d09",
"metadata": {},
"source": [
"`DataFrame`/`Series`/`Index` objects live in `dask_expr/collection.py`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "68270fa0-de49-4860-ba05-8faad996e9fd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dask_expr.collection.Series"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(ser)"
]
},
{
"cell_type": "markdown",
"id": "3cdd6d94-3589-444d-8b0c-ef88419f513c",
"metadata": {},
"source": [
"In `dask.dataframe`, collections hold `_meta`, `divisions`, `_name`, and `__dask_graph__`. In `dask_expr`, they hold only an `expr`, which derives all of these from the user's inputs."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "6d918536-fac4-4e7c-8b98-35b3cccda9ff",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'_expr': ReadParquet(47ba38c)['x'] + 1}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser.__dict__"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "35355ba2-3fc8-442f-baa0-09a3c1cc651c",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"ReadParquet(47ba38c)['x'] + 1"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser.expr"
]
},
{
"cell_type": "markdown",
"id": "a290cb24-9d27-40a0-b16a-178d9cd4f923",
"metadata": {},
"source": [
"Expressions have a type hierarchy that reflects user commands."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b5e2a83e-e73d-4731-a8bf-66315f471ffc",
"metadata": {},
"outputs": [],
"source": [
"expr = ser.expr"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3878fb5d-405e-4beb-abb6-ffce08627a2a",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"dask_expr.expr.Add"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(expr)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c7cda078-2825-4494-864f-7dd642b2d0b5",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[dask_expr.expr.Add,\n",
" dask_expr.expr.Binop,\n",
" dask_expr.expr.Elemwise,\n",
" dask_expr.expr.Blockwise,\n",
" dask_expr.expr.Expr,\n",
" object]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(expr).mro()"
]
},
{
"cell_type": "markdown",
"id": "f9a7b00b-4f9c-4d14-a65c-8d3912d813ff",
"metadata": {},
"source": [
"Expressions are composed of a *type* or *operation* (like `Add`) and *operands* (like `left` and `right`)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "81f2e534-ad51-4342-af45-9fb43385405a",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"ReadParquet(47ba38c)['x']"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"expr.left"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "75498d83-3e35-4140-a5df-87d0dd00b5d5",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"dask_expr.expr.Projection"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(expr.left)"
]
},
{
"cell_type": "markdown",
"id": "50de110c-33af-44e5-8325-dca4a2c848e6",
"metadata": {},
"source": [
"Since the operands of an `Expr` may include other `Expr` objects, an `Expr` typically corresponds to an expression DAG."
]
},
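{
"cell_type": "markdown",
"id": "toy-expr-dag-sketch",
"metadata": {},
"source": [
"As a rough sketch of this structure (not the actual `dask_expr` implementation), the hypothetical `ToyExpr` classes below show how an operation type plus its operands forms a graph that can be walked. All class and method names here are assumptions made for illustration.\n",
"\n",
"```python\n",
"class ToyExpr:\n",
"    # Hypothetical stand-in for an expression: an operation type plus operands\n",
"    def __init__(self, *operands):\n",
"        self.operands = operands\n",
"\n",
"    def dependencies(self):\n",
"        # Operands that are themselves expressions are the edges of the DAG\n",
"        return [op for op in self.operands if isinstance(op, ToyExpr)]\n",
"\n",
"    def walk(self):\n",
"        # Depth-first traversal over this expression and everything below it\n",
"        yield self\n",
"        for dep in self.dependencies():\n",
"            yield from dep.walk()\n",
"\n",
"\n",
"class ToyRead(ToyExpr):\n",
"    pass\n",
"\n",
"\n",
"class ToyProjection(ToyExpr):\n",
"    pass\n",
"\n",
"\n",
"class ToyAdd(ToyExpr):\n",
"    pass\n",
"\n",
"\n",
"# Toy analogue of `df.x + 1`: Add(Projection(Read, \"x\"), 1)\n",
"read = ToyRead(\"demo_parquet\")\n",
"toy = ToyAdd(ToyProjection(read, \"x\"), 1)\n",
"print([type(node).__name__ for node in toy.walk()])\n",
"```\n",
"\n",
"Shared sub-expressions (such as a single read feeding two projections) are what make the structure a DAG rather than a strict tree; this simple walk would visit such a node once per path."
]
},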
{
"cell_type": "markdown",
"id": "988abc5b-e068-496f-b72b-6f11e7d9afc8",
"metadata": {},
"source": [
"## Expression Visualization"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "a7665e2d-ab78-4fe0-b20a-5ee7790c15ad",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"ser = df.x[df.id == 1000] + 1"
]
},
{
"cell_type": "markdown",
"id": "1f5097a8-fe88-4b03-9fce-1a896ebb2171",
"metadata": {},
"source": [
"### Pretty Print (`pprint`)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "ccb03266-1eaf-4786-8834-ef69355d6cc5",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Add: right=1\n",
" Filter:\n",
" Projection: columns='x'\n",
" ReadParquet: path='demo_parquet' blocksize='1MiB' kwargs={'dtype_backend': None}\n",
" EQ: left=1000\n",
" Projection: columns='id'\n",
" ReadParquet: path='demo_parquet' blocksize='1MiB' kwargs={'dtype_backend': None}\n"
]
}
],
"source": [
"ser.pprint()"
]
},
{
"cell_type": "markdown",
"id": "398aa703-d423-423b-bda1-f22e5e45f8b4",
"metadata": {},
"source": [
"### GraphViz (`visualize`)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "129c4ab1-0e4e-40fa-8e09-ce0cb6f9df38",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"488pt\" height=\"332pt\"\n",
" viewBox=\"0.00 0.00 488.00 332.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 328)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-328 484,-328 484,4 -4,4\"/>\n",
"<!-- &#45;7254988889033553360 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>&#45;7254988889033553360</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"291.5,-324 160.5,-324 160.5,-288 291.5,-288 291.5,-324\"/>\n",
"<text text-anchor=\"middle\" x=\"226\" y=\"-301\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Add(Filter, 1)</text>\n",
"</g>\n",
"<!-- 1806835817120246363 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>1806835817120246363</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"327.5,-252 124.5,-252 124.5,-216 327.5,-216 327.5,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"226\" y=\"-229\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Filter(Projection, EQ)</text>\n",
"</g>\n",
"<!-- 1806835817120246363&#45;&gt;&#45;7254988889033553360 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>1806835817120246363&#45;&gt;&#45;7254988889033553360</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M226,-252.3C226,-259.59 226,-268.27 226,-276.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"222.5,-276.38 226,-286.38 229.5,-276.38 222.5,-276.38\"/>\n",
"</g>\n",
"<!-- 2170911916307910323 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>2170911916307910323</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"204,-180 0,-180 0,-144 204,-144 204,-180\"/>\n",
"<text text-anchor=\"middle\" x=\"102\" y=\"-157\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">EQ(1000, Projection)</text>\n",
"</g>\n",
"<!-- 2170911916307910323&#45;&gt;1806835817120246363 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>2170911916307910323&#45;&gt;1806835817120246363</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M132.97,-180.48C148.92,-189.49 168.56,-200.58 185.68,-210.24\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"183.68,-213.13 194.11,-215 187.12,-207.03 183.68,-213.13\"/>\n",
"</g>\n",
"<!-- &#45;1859449575225480863 -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>&#45;1859449575225480863</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"278,-108 14,-108 14,-72 278,-72 278,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"146\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Projection(ReadParquet, id)</text>\n",
"</g>\n",
"<!-- &#45;1859449575225480863&#45;&gt;2170911916307910323 -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>&#45;1859449575225480863&#45;&gt;2170911916307910323</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M135.12,-108.3C130.22,-116.1 124.32,-125.49 118.86,-134.17\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"115.92,-132.29 113.56,-142.61 121.84,-136.01 115.92,-132.29\"/>\n",
"</g>\n",
"<!-- &#45;8467414101651346718 -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>&#45;8467414101651346718</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"293.5,-36 158.5,-36 158.5,0 293.5,0 293.5,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"226\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;&#45;1859449575225480863 -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;&#45;1859449575225480863</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M206.22,-36.3C196.61,-44.72 184.88,-54.98 174.33,-64.21\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"172.19,-61.44 166.96,-70.66 176.79,-66.71 172.19,-61.44\"/>\n",
"</g>\n",
"<!-- &#45;119366443535666876 -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>&#45;119366443535666876</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"480,-180 222,-180 222,-144 480,-144 480,-180\"/>\n",
"<text text-anchor=\"middle\" x=\"351\" y=\"-157\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Projection(ReadParquet, x)</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;&#45;119366443535666876 -->\n",
"<g id=\"edge6\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;&#45;119366443535666876</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M248.58,-36.49C260.6,-46.31 275.28,-59.15 287,-72 304.75,-91.46 322.01,-115.88 334.12,-134.22\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"331.08,-135.97 339.47,-142.44 336.95,-132.15 331.08,-135.97\"/>\n",
"</g>\n",
"<!-- &#45;119366443535666876&#45;&gt;1806835817120246363 -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>&#45;119366443535666876&#45;&gt;1806835817120246363</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M319.78,-180.48C303.7,-189.49 283.9,-200.58 266.64,-210.24\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"265.16,-207.06 258.14,-215 268.58,-213.17 265.16,-207.06\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x2866e0c70>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser.visualize()"
]
},
{
"cell_type": "markdown",
"id": "0c83c267-0182-4c6b-8383-755e14671914",
"metadata": {},
"source": [
"## Task Graph Generation\n",
"\n",
"Collections generate a task graph by traversing the underlying expression tree, and calling the `_layer` method on each expression. This method returns a low-level task graph (**not** an HLG `Layer`)."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "63a19d51-d18c-4e08-9c43-d2abe6ebc148",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('readparquet-f9cf2f12e46cc665ff0dcd6d747ba38c',\n",
" 0): (<dask.dataframe.io.parquet.core.ParquetFunctionWrapper at 0x2866e1450>, [{'piece': ('/Users/rzamora/workspace/dask_match_dev/demo_parquet',\n",
" [0],\n",
" [])}]),\n",
" ('readparquet-f9cf2f12e46cc665ff0dcd6d747ba38c',\n",
" 1): (<dask.dataframe.io.parquet.core.ParquetFunctionWrapper at 0x2866e1450>, [{'piece': ('/Users/rzamora/workspace/dask_match_dev/demo_parquet',\n",
" [1],\n",
" [])}])}"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.expr._layer()"
]
},
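{
"cell_type": "markdown",
"id": "toy-graph-generation-sketch",
"metadata": {},
"source": [
"The traversal described above can be approximated with a small sketch. It is not the `dask_expr` code: `ToyExpr`, `layer`, `graph`, `make_partition`, and `add_one` are hypothetical names, and only the plain-dict graph format and `dask.get` come from Dask itself.\n",
"\n",
"```python\n",
"import dask\n",
"\n",
"\n",
"def make_partition(i):\n",
"    # Stand-in for reading one partition from storage\n",
"    return [i, i + 1]\n",
"\n",
"\n",
"def add_one(part):\n",
"    return [value + 1 for value in part]\n",
"\n",
"\n",
"class ToyExpr:\n",
"    # Hypothetical base class: every expression contributes its own tasks\n",
"    def __init__(self, *operands):\n",
"        self.operands = operands\n",
"\n",
"    @property\n",
"    def name(self):\n",
"        # Real names also carry a token hash; the class name is enough here\n",
"        return type(self).__name__.lower()\n",
"\n",
"    def dependencies(self):\n",
"        return [op for op in self.operands if isinstance(op, ToyExpr)]\n",
"\n",
"    def layer(self):\n",
"        raise NotImplementedError\n",
"\n",
"    def graph(self):\n",
"        # Merge this expression's tasks with those of its dependencies\n",
"        dsk = dict(self.layer())\n",
"        for dep in self.dependencies():\n",
"            dsk.update(dep.graph())\n",
"        return dsk\n",
"\n",
"\n",
"class ToyRead(ToyExpr):\n",
"    def layer(self):\n",
"        # One task per partition, keyed by (name, partition index)\n",
"        return {(self.name, i): (make_partition, i) for i in range(2)}\n",
"\n",
"\n",
"class ToyAddOne(ToyExpr):\n",
"    def layer(self):\n",
"        (child,) = self.dependencies()\n",
"        # Each output partition depends on the matching input partition\n",
"        return {(self.name, i): (add_one, (child.name, i)) for i in range(2)}\n",
"\n",
"\n",
"toy = ToyAddOne(ToyRead())\n",
"dsk = toy.graph()\n",
"print(sorted(dsk))\n",
"print(dask.get(dsk, [(toy.name, 0), (toy.name, 1)]))\n",
"```\n",
"\n",
"Because the graph is rebuilt from the expression on demand, an optimizer is free to rewrite the expression first and generate the low-level tasks only afterwards."
]
},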
{
"cell_type": "code",
"execution_count": 20,
"id": "32d06566-d9fd-49b1-a936-7164cab2ec98",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAVgAAAEFCAYAAACmfP9pAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO3deVRTZ94H8G8ISZBFgQCuCKiAoqBWRAW1Kii4d6qItnZxwemqdqZOW3s6S207nbe1o07HOtXWWpVaF6xLFUsRUUBFXABRhLKIUhcWZVMgkN/7hy+8YwFlyc0Tkt/nHM6ZJjf3+RKe+XpzcxcZEREYY4zp2noz0QkYY8xYccEyxphEuGAZY0wi5qIDtMWdO3cQHR0tOobBmzp1KqysrETHYI+QmpqKjIwM0TEMmr29PYKCgkTHaJMOWbC//PILwsLCRMcweDk5OXBzcxMdgz3C1q1b8emnn4qOYdB8fX07bMF26F0EOTk5ICL++c1PUlKS6D8NawVfX1/hc8ZQf958803Rf5526dAFyxhjhowLljHGJMIFyxhjEuGCZYwxiXDBMsaYRLhgGWNMIlywjDEmES5YxhiTCBcsY4xJhAuWMcYkwgXLGGMS4YJljDGJcMEyxphEuGB1RKvVio7AGDMwXLDtkJmZiWXLlsHV1RVqtRrTpk1DTEyM6FiMtYu7uzvCw8NFxzAKXLBtdP/+fcyYMQNff/01goOD8fLLLyMrKwvTp0/H8ePHRcdjrE2++eYb/PLLL6JjGI0OeUcDQ/Duu+/iypUrOHToECZPngwAWLZsGQYPHowXX3wROTk5ghMy1jLXr1/H3/72N5w5cwYpKSmi4xgV3oJto2+++QY+Pj4N5QoAXbt2RXBwMHJzc3H69GmB6RhrufLycmRmZqJLly4YPny46DhGhQu2DYqKinDnzp0m7xPk4eEBAEhOTtZ3LMbaZMCAAYiLi0NcXBwiIiJExzEqXLBtcOXKFQBA9+7dGz3n6ekJALh9+7ZeMzHGDA8XbBvUfwlgb2/f6DkXFxcAwN27d/WaiTFmeLhg20ClUgEASkpKGj1XWVkJALCzs9NrJsaY4eGCbYNu3boBQJNHCtSXrqOjo14zMcYMDxdsG3h4eEAmkzVZsPWHuYwYMULfsRhjBoYLtg169OiBsWPH4vjx48jOzm54XKPRICIiAj179sSwYcMEJmSMGQIu2DZauXIlNBoN5syZg8jISMTGxmL69OnIycnBxo0bIZPJREdkjAnGZ3K10aRJk7B161YsXrwYs2bNAgDY2tris88+e+jkA8aY6eKCbYe5c+di9uzZSE5OhlarxYgRIyCXy0XHYqzN+vXrByISHcNocMG2k7m5OUaOHCk6BmPMAPE+WMYYkwgXLGOMSYQLljHGJMIFyxhjEuGCZYwxiXDBMsaYRLhgGWNMIlywjDEmES5YxhiTCBcsY4xJhAuWMcYkwgXLGGMS4YJljDGJcMEyxphEuGAZY0wiXLCMMSaRDn3B7UOHDvHtsZvw3zdiZIbvzp072Llzp+gYBunKlSuiI7RLhy7Y1157TXQExtotOzsbYWFhomMYLF9fX9ER2kxGfAMenXj99dfxr3/9S3QMxtolIyMDsbGxePnll0VHMQbreR+sDqSlpWH9+vUoKCgQHYWxdomIiMD27dtFxzAaXLA6EBERAa1Wi++//150FMba5dtvv0ViYiLy8vJERzEKXLDtRETYtm0bAGDLli2C0zDWdklJSbh69SpkMhl/6aYjXLDtlJiYiOvXrwMAUlNTkZmZKTgRY20TEREBpVIJrVbLGws6wgXbTt999x0UCgUAQKlU4rvvvhOciLHW02q1iIiIQE1NDQDg0qVLSE9PF5yq4+OCbYfa2lp899130Gg0AICamhr+l591SLGxsSgsLGz4b4VCwd8p6AAXbDvExMSgpKTkocdyc3Nx7tw5QYkYa5v63QP1NBoNvvnmG/BRnO3DBdsOERERDbsH6vFuAtbR1NTUYNeuXQ27B+pdu3YNZ86cEZTKOHDBtlFVVRX27NnTsHugXv1uAq1WKygZY61z6NAhVFRUNHqcNxbajwu2jQ
4ePIh79+41+VxhYSFOnDih50SMtc327dthbt74rPmamhps3boVdXV1AlIZBy7YNtq2bRvkcnmTzykUCkREROg5EWOtV1lZiYMHDzb6JFavuLgYx44d028oI8IF2wZlZWU4fPgwamtrm3xeo9Fgx44djfZpMWZoIiMjHzlPeWOhfbhg22DPnj3Nlmu9srIyREdH6ykRY22zbds2yGSyZp/XaDTYuXMnqqur9ZjKeHDBtsG2bdsee/iKTCbji2Ywg1ZUVISYmJjH7mOtqKjA4cOH9ZTKuHTo68GKUF1dDScnJ8yePbvhsZKSEsTExGDKlCmwsrJqeNzCwgJE9MgtBMZEycjIwNNPP/3QY6mpqSgsLERgYOBDj9+6dUuf0YwGXw9WB86cOQM/Pz/k5OTAzc1NdBzG2uxPf/oTjh07hqSkJNFRjAFfD5YxxqTCBcsYYxLhgmWMMYlwwTLGmES4YBljTCJcsIwxJhEuWMYYkwgXLGOMSYQLljHGJMIFyxhjEuGCZYwxiXDBMsaYRLhgGWNMIlywjDEmES5YxhiTCBcsY4xJhAuWMcYkwgXLGGMS4YJljDGJcMEyxphEuGAZY0wiXLCMMSYRLljGGJMIFyxjjEmEC5YxxiTCBcsYYxLhgmWMMYlwwTLGmES4YBljTCLmogN0VBqNBhUVFQCAu3fvwtnZGcXFxbC1tQUA2NjYwNyc315m+CorK1FTU9Pw37a2tigsLIS5uTnMzMzQpUsXgek6NhkRkegQhqayshLp6en45ZdfkJeXh6tXr+LatWu4efMmiouLUVxcjMrKyseux8bGBmq1Gg4ODnBycoKLiwtcXV3h4uKCfv36YeDAgbCwsNDDb8RM1Y0bN5Ceno68vLyGn2vXrjXM45KSkofKtSkymQxqtbrhp3v37nB1dW34GTBgANzc3GBmxh+If2O9yRfsvXv3kJSUhISEBJw/fx4pKSnIycmBVquFQqFAr169Gkqxa9eucHBwgIODA+zt7WFpaQkAsLKyglKpRFVVFe7fvw/gQUkXFxejqKgIhYWFuH37dsMELygoQF1dHeRyOdzd3eHj4wNfX1/4+/vD19cXKpVK5FvCOqgbN24gISEBJ0+eREpKClJTU1FYWAgAsLa2bijE3r17w8HB4aHSrP+0ZWdnBwAoLy9HbW0tADSUcf1PQUFBw1wuKipqWP+gQYPg4+ODkSNHYtSoUejfv7+Ad8GgmF7BajQaJCYm4vDhwzh69CjOnz+P2tpaODs7w8/PD97e3vDx8YGPjw9cXV0hl8slyZCTk4PU1NSGn6SkJNy8eRMqlQq+vr4IDAzE5MmTMXz4cEkysI6vpKQE0dHROHz4MI4fP47c3FzI5XJ4e3tj2LBh8Pb2bvhxdHSUJEN5eTkuX76MlJQUpKWlISUlBcnJybh37x4cHBwwevRoBAcHIyQkBK6urpJkMGCmUbAVFRU4cOAA9uzZg+joaJSVlcHd3R3BwcEICAhAQEAAnJ2dRcdEdnY2EhISEB8fj59++glXr16FWq1GcHAwQkNDERISwrsUTFxeXh527dqFvXv3IikpCTKZDP7+/pgwYQICAgIwYsQI2NjYCM2o0Whw7tw5JCYmIi4uDjExMaioqED//v0xY8YMhIaGwtfXV2hGPTHegq2trcXBgwexfft2/Pjjj9BoNJgwYQJmzJiBkJAQ9O3bV3TEx7p06RIOHz6M/fv3Iz4+HtbW1pgxYwbmz5+PiRMn8j4vE1FUVIRt27Zhx44dSEpKgr29PWbOnIkpU6YgKCjI4L+EqqmpwYkTJxAVFYXIyEjk5OSgb9++mDNnDl588UV4eHiIjiiV9SAjk52dTe+88w51796dzMzMKDAwkL788ksqKioSHa1dCgoKaO3atRQQEEAAyNXVlVatWkXXr18XHY1JQKvV0s8//0xhYWGkUqmoS5cutGDBAjp8+DDV1NSIjtcuZ86coRUrVlDv3r1JJpPRk08+Sdu2baP79++LjqZr/z
aagj1x4gSFhoaSXC6n7t2701tvvUU5OTmiY0niypUr9NZbb5GTkxOZmZnRtGnT6OTJk6JjMR2orq6mLVu20MCBAwkADRs2jP7zn/9QRUWF6Gg6V1dXR9HR0RQaGkoKhYIcHR3prbfeooKCAtHRdKVjF6xWq6U9e/bQ8OHDCQAFBATQnj17qLa2VnQ0vaiurqatW7fS0KFDCQCNGzeOjhw5IjoWa4PS0lJatWoVOTk5kUqlogULFlBqaqroWHpz48YNeu+998jBwYEsLCxoyZIllJubKzpWe3Xcgj1w4AANHTqUzMzMaNasWSa/Bffzzz9TcHAwAaAxY8bQsWPHREdiLVBRUUF///vfSa1Wk62tLa1cuZJu3LghOpYwlZWVtH79eurTpw8plUp66aWX6Nq1a6JjtVXHK9hTp07RqFGjSCaT0cyZMyklJUV0JIOSkJBAgYGBBIAmTZpEaWlpoiOxJtTW1tKGDRvIycmJbGxs6N1336WSkhLRsQxGTU0Nffnll9S7d2+ysLCgFStWUGlpqehYrdVxCvb69es0f/78hp3iZ86cER3JoMXGxpKvry+Zm5vTK6+80uG/5DMmMTEx5OPjQwqFgpYvX06FhYWiIxms6upqWrduHanVauratStt3LiR6urqRMdqKcMv2NraWvrnP/9JVlZW5ObmRrt37xYdqcOoq6ujr7/+mrp160b29va0efNm0mq1omOZrFu3btHcuXMJAE2ZMoUyMjJER+owiouL6fXXXydzc3MaPnw4XbhwQXSkljDsgk1NTSU/Pz9SKpX0l7/8xRgP49CLsrIyWrZsGZmZmVFQUBBlZ2eLjmRytmzZQmq1mnr37k0HDx4UHafDunjxIvn7+5NCoaCVK1caeicYZsHW1tbSRx99RAqFgkaNGkUXL14UHckonDp1igYNGkSWlpb0xRdfiI5jEm7evElTpkwhMzMzev3116m8vFx0pA6vrq6OPv/8c7KxsaH+/fvT2bNnRUdqjuEVbF5eHo0dO5ZUKhV9+umnHWl/S4dQU1ND7777Lsnlcpo+fTrdvn1bdCSjdeDAAXJycqI+ffpQQkKC6DhGJz8/n8aPH09KpZL+8Y9/GGJXGFbB/vDDD2Rra0teXl4dZR9Lh3X8+HFycXGhbt268SFdOlZTU0NLly4lmUxGzz//fEf89rvDqKuro3/84x+kVCopMDDQ0DYYDKNga2traeXKlSSTySg8PJzu3bsnOpJJuHv3Lj399NNkbm5Oq1evFh3HKPz66680evRosra2poiICNFxTEZycjK5ubmRs7MznT59WnSceuIL9s6dOzRp0iRSqVS0ceNG0XFMjlarpY8//pjkcjmFhYXxP27tkJiYSN27dyd3d3c+/liA4uJiCgkJIZVKRZs2bRIdh0h0webm5pKXlxf17NmTkpKSREYxedHR0WRvb08jR46kW7duiY7T4ezatYs6depEU6dOpbt374qOY7Lq6uoaPg2//fbbog9LFFewFy5coJ49e5K3tzddvXpVVAz2X7KyssjDw4Pc3Nzo8uXLouN0GGvWrCEzMzMKDw8njUYjOg4jom+++YaUSiXNmjVL5KcyMQV77NgxsrGxoYkTJ/K/9gamqKiIxowZQ3Z2dnTq1CnRcQyaVqulP/7xjySTyegvf/mL6DjsN2JiYsjW1pbGjh1LZWVlIiLov2APHTpEnTp1orlz53b461oaq8rKSgoODqYuXbpQfHy86DgGSavV0tKlS0kul9OWLVtEx2HNSE9Ppx49epCvr6+I08X1W7CRkZGkVCpp0aJFJnNJwY6qqqqKnnrqKbKysqKYmBjRcQxKXV0dLViwgFQqFe3du1d0HPYYGRkZ1KtXLxo6dKi+S1Z/BXvw4EFSKpX06quvit7xzFpIo9FQWFgYWVlZ8YHy/0er1dKSJUvIwsKCDh8+LDoOa6GcnBxycXEhX19ffR6XrJ+CjYmJIQsLC3rhhRcM8WwL9gi1tbUUGhpKXbp04S
uYEdGKFStIoVDQvn37REdhrfTLL79Qjx49aNSoUfo6ZVn6gj1z5gxZW1vTvHnzeLdAB1VVVUWTJk0iBwcHyszMFB1HmPfff5/kcjnt2LFDdBTWRunp6eTg4EAhISH6OOJD2oLNzc2lbt266euXYRKqrKwkPz8/6tevn0lev/Tbb78lmUxGGzZsEB2FtdOZM2fIysqKwsPDpR5KuoItLS0lb29vGjhwIB+KZSQKCwupX79+NHz4cKqsrBQdR2/i4uJIpVLRW2+9JToK05GDBw+SXC6njz76SMphpCnY2tpamjRpEvXs2bMj30+HNeHSpUtkZ2dH8+fPFx1FL7Kzs8nOzo7mzJnD3x8YmXXr1pFMJpPySBBpCvadd94hlUrFp78aqejoaJLL5bRu3TrRUSR17949euKJJ2jw4MEmtcVuSsLDw8nGxobS09OlWL3uC3bfvn0kk8n4wi1GbtWqVaRQKCguLk50FMm8+OKLZGdnx3eAMGI1NTUUEBBAnp6eUhy+pduCvXr1Ktna2upj5zETTKvV0owZM6hnz55UXFwsOo7Obdq0iczMzPhYVxOQn59Pjo6O9Pzzz+t61f+WERFBB7RaLYKCgnDr1i0kJyejU6dOulgtM2B3797F4MGDMWzYMERGRoqOozPZ2dkYOnQoXnnlFXz88cei4zA9iIqKwpQpU7B9+3bMmzdPV6tdr7Mt2A8//JBUKhWdP39eV6tkHUBcXBzJ5XL66quvREfRCY1GQyNHjqQnnniCqqurRcdhevTKK6+Qra0t5eXl6WqVutlFkJqaSkqlkj799FNdrI51MCtWrCAbGxvKz88XHaXdVq1aRZaWlny5RhN079498vLyookTJ+rqdP727yLQarUYM2YM6urqkJCQALlcrqvNa9ZBVFdXY+jQoXB1dcWhQ4dEx2mzK1euYMiQIVi1ahXefPNN0XGYAElJSfD398dXX32FF154ob2ra/8ugs8++4yUSiXfWtvExcXFkUwmo++++050lDapq6uj0aNH0+DBg/kymibu9ddfJ3t7e7p582Z7V9W+XQTXr18na2tr+vOf/9zeIMwILFmyhLp27doh76K6ceNGMjc35+8QGJWVlVHv3r11cVRB+3YRzJ8/H4mJibh06RIsLCzauznNOrg7d+7Aw8MDCxYswP/8z/+IjtNi5eXl8PDwQFhYGNasWSM6DjMAu3fvxpw5c3D8+HGMHj26ratZ3+aCPXXqFPz9/bF79248/fTTbQ3AjMznn3+OP/7xj0hLS4OHh4foOC3y5ptvYvPmzcjMzIRarRYdhxmIcePGoaqqCidPnoRMJmvLKtpesP7+/lCpVIiNjW3Ly5mRqq2txZAhQ+Du7o69e/eKjvNYOTk5GDBgAD777DO8+uqrouMwA3LhwgX4+vri22+/xTPPPNOWVbStYA8ePIgZM2YgKSkJvr6+bRmYGbH6+XH69GkMHz5cdJxHevHFFxt2c5mbm4uOwwxMO+dH6wuWiDBixAj07NmzQ2yhMDFGjRoFOzs7gz5sKysrC15eXtiyZUtbt1CYkcvLy4Onpyc2bNiABQsWtPblrS/YyMhIhIaG4ty5cxg8eHBrB2QmIjo6GpMmTUJCQgL8/f1Fx2nSM888g5SUFKSmpvLx26xZS5YsQXR0NDIzM6FQKFrz0tYXrJ+fH3r37o3du3e3LiUzOaNHj4a9vT32798vOkojOTk58PDwwLZt2zB37lzRcZgBu3r1Ktzd3bFp0yY8//zzrXnperPWLH38+HGcOXOGz3JhLfLmm2/i4MGDuHTpkugojfzzn/+Es7MzZs+eLToKM3AuLi6YM2cOPv30U7T2K6tWbcHOnDkTJSUlOHHiRKtDMtNDRBg4cCBGjx6NL7/8UnScBnfu3EHv3r3xwQcfYNmyZaLjsA4gNTUVQ4YMwZEjRzBx4sSWvqzlW7A5OTk4ePAg3njjjbYlZCZHJpNh6dKl2Lp1K0pKSkTHabBp0yaYm5tj4cKFoq
OwDsLHxwcTJkzA2rVrW/W6Fhfspk2b0L17d8yYMaPV4Zjpmj9/PpRKJbZt2yY6CoAHW9WbNm3Cc889BxsbG9FxWAfy0ksv4fDhw8jPz2/xa1pUsLW1tdiyZQsWLlzIxwqyVrG2tkZYWBg2btwoOgoA4NixY8jMzOStV9ZqM2bMgIODA7755psWv6ZFBXvgwAHcvHmTJyVrk8WLF+PixYs4deqU6CjYtGkTRowYgSFDhoiOwjoYpVKJF154AV9//TW0Wm2LXtOigo2IiMD48ePh6urannzMRPn5+cHLywsRERFCc1RWVuKHH35oywHjjAF4cGbX1atXER8f36LlH1uwFRUV+PHHHxEWFtbucMx0hYWFYdeuXairqxOW4cCBA6ipqcHvfvc7YRlYx+bl5YVBgwZh586dLVr+sQW7f/9+aDQaPPXUU+0Ox0xXWFgYbt68KfQQv507d2LChAlwcnISloF1fGFhYdi5cydqa2sfu+xjCzYyMhITJkyAo6OjTsIx0+Tp6QkfHx9hZwBWVlYiKioKoaGhQsZnxmPOnDkoLCxs0W6CRxasRqPBzz//zIdmMZ2YPn26sIu/xMbGoqqqClOnThUyPjMeHh4e8PDwQFRU1GOXfWTBJiQkoLS0FCEhIToLx0xXSEgIcnNzkZmZqfexo6KiMHToUHTv3l3vYzPjExISgsOHDz92uUcWbFRUFDw8PNC3b1+dBWOma9SoUbC3t2/RxNS1qKgoTJ48We/jMuM0efJkpKWloaCg4JHLPbJgjx49ikmTJuk0GDNdcrkc48ePx9GjR/U6bl5eHrKzs1tzDjljj/Tkk09CqVQ+di43W7CVlZW4cOFCe274xVgjo0ePRkJCQquvStQeCQkJUCgU8PPz09uYzLh16tQJw4YNQ0JCwiOXa7Zgk5KSoNFoDPZiyaxjCggIQHFxMa5cuaK3MRMSEjBs2DB06tRJb2My41e/sfAozRZsYmIinJ2d4ezsrPNgzHQNGTIEVlZWj52YunTy5EneUGA65+/vj0uXLuHOnTvNLtNswZ47d44/UjGdUygUGDp0KM6dO6eX8aqrq5Genm7wN19kHY+fnx+0Wi0uXLjQ7DLNFmxKSgp8fHwkCcZMm7e3N1JTU/Uy1qVLl6DRaPj+cUznunfvDkdHR6SlpTW7TJMFW1lZidzcXC5YJglvb2+kpaXp5YuulJQUqFQquLu7Sz4WMz2DBg1qfcFevHgRWq0WgwYNkiwYM13e3t4oLS3FtWvXJB8rPT0dXl5efB1jJon6jYXmNFmw2dnZUCgUcHNzkywYM139+vUD8OA2RFLLzs6Gh4eH5OMw09SvXz9kZ2c3+3yTBZuXlwdnZ2e+VzyTRNeuXWFpaYnc3FzJx8rNzeXrGDPJuLm5oaioCOXl5U0+32zBmuKkTE1Nxdq1ax952AVrP5lMBhcXF+Tl5Uk+Vl5eHlxcXCQfx5DwPNaf+k/5zc3lJgs2Pz/f5CYlAMTHx2P58uW4efOm6Ch6kZiYiA8++AC3bt3S+9hubm6SF2xZWRnu3r1rcnOZ57H+1G+IXr16tcnnmyzY27dvo2vXrpKFYobhxIkTeO+993Djxg29j+3k5ITCwkJJx6hff7du3SQdh4klch5bWVnBysqq2bncZMEWFxdDrVZLGoyZNgcHBxQXF0s6RlFREQDwXGaScnBwaJhrv9VkwRYVFcHBwUHSUE1ZunQpFi1ahOvXr+PVV1996C4Kd+/exSuvvIJBgwahW7duePrpp5u8ePOxY8fw6quvwsPDA87Ozpg3bx42bNjQ6F5QZ86cQWhoKPr06YOgoCB8/vnnjY7LDAsLw0cffYTExESEhYXB0dERAwcOxD/+8Y9Gd5VsybiP+v3Onj2LKVOmoGvXrggKCsKGDRuwf/9+jBkzpuHjx/PPP4/58+c3+p0//vhjjBkz5qFbWDzu/VqyZAk2bNgAAFi4cCGWLl3a/B9GAmq1utlJqSv1Ba7vuczz2HTmMfCYjQ
X6jerqagJAP/zww2+fktyTTz5Jnp6e5OPjQwDoiSeeICKia9eukaurK1lZWdHLL79Mb7/9Ng0dOpTMzMzon//8Z8Prjx49SnK5nOzt7em1116jv/71rxQQEEAAaMWKFQ3LxcbGkqWlJdnb29PixYtpyZIlZGtrS66urgSALl26REREarWa+vbtS126dKGnnnqKVq5cSb6+vgSAFi1a1Opxm/v9jh8/TpaWltS9e3d644036LnnniM7OzsaMGDAQ3m8vLyof//+jd63hQsXEgCqrq5u8fv1ySef0KhRowgAhYWF0dq1a3XyN2yp//znP2RnZyfpGNu2bSOlUinpGE3heWw685iIaOLEibRkyZKmnvp3o4ItLS0lABQVFSV9st948sknCQAFBwfT5cuXGx5/9tlnCQCdOnWq4bHq6mqaMGECKZVKKi4uJiKi8PBwUqlUdOfOnYbl7t+/T927d3/oDzp48GCys7Oj3NzchscyMzPJ0tKy0cQEQJ999lnDcnV1dTR+/HiSyWSUnJzcqnGb+/2GDx9OarWaCgoKGh5LTk6mTp06tWlitvT9+vjjjwkAnT9/vom/hrS2bNlCnTp1knSMTZs2UefOnSUdoyk8j01nHhMRTZ8+nZ5//vmmnvp3o10ENTU1AACVStXGDeb2W7VqFfr37w8AKCkpQUREBIYPH44RI0Y0LKNUKhEeHo6amhpERkYCAP7whz/gzJkzsLW1bViupqYGtra2KCsrAwCcOnUKKSkpeOWVVx46FM3d3R3PPfdcoyy2trZYvnx5w3+bmZlh5cqVICL89NNPLR63ud/vwoULOHPmDF566SX06NGjYZlhw4YhMDCw5W/a/2nN+yWSSqVCdXW1pGPU1NRAqVRKOsaj8Dw2/nkMPMjU3FxudP5g/YKiJqajo+NDVz66cuUKiAgVFRUICwt7aNn6P3r9mRT9+/dHcXExVq9ejZMnTyIvLw9ZWVkoKytr+KNnZGQAeHDZvN8aOHBgo8fc3d0hk8maXNtSueAAABmESURBVK414zb3+2VlZQFAk6clDxkyBAcPHmz0+KO05v0SSalUQqvVora2VrLTWKurq4VtKPA8/n/GPI+BBxsL9+7da/K5Rluw9TuYRZ27/dv/Q9TvPFapVFAoFA/9qNVqPPvssw0T5ZNPPkGvXr2watUqaDQaBAUF4ZtvvkFAQEDD+kpKSgCgybPULCwsGj3W1E3yrKysHlq+JeM29/vVH97R1MWgW/qPXP3vBLTu/RKp/ner/8QkhdraWmFnI/I8/n/GPI+BB5fg1Gg0TT7XqEUVCgUANPsCfevTpw+AB/8Cb9u27aHn6urqUF5eDktLSxQWFuLtt9+Go6MjsrKyYGNj07Dchx9+2PC/68+8iIuLw+9+97uH1tfUge+//PJLo8fql/P09GzxuM2pPwj+8uXLmDlz5kPP1W+l1JPJZI2+9QXw0N0BWvp+iVZfrFJ+UlIoFA99Iy0Sz+P/Z0zzGHgwl+t787cabcHqY8uiNfr16wdHR0ccOXKkUen//e9/h52dHZKSknD16lVotVo8/fTTD02Oa9euPXRBXF9fXygUikY3K6utrUVERESj8TMzMxs+/tTbvHkzgAcffVo6bnO8vb0hl8tx5MiRhw6vuXv3LuLi4h5a1tXVFXl5eQ+9D+np6Q/9n6el75do1dXVMDMzk/STklKp5Hn8f3geS6empqb5XVG//dqrrKxM6FEEvXr1avT4pk2bCADNmTOHzp49S1lZWfTpp5+SSqWiiRMnklarpbKyMrK2tiZ7e3vav38/ZWZm0ubNm6lXr15kZ2dHnTt3poyMDCIiWr58OQGghQsX0tmzZ+ncuXM0c+ZMsrOza/Ttq0wmIy8vL4qMjKSLFy/S+++/T2ZmZjRnzhwiolaN29zv99prrxEAWrJkCSUlJVFUVBQFBgaSmZnZQ3nef/99AkDPPPMMxcbG0saNG6lfv37k4ODw0LevLXm/iIi+++47AkAvv/
wyJSUl6fiv+Wj6OIrgq6++EnYUAc9j05jHRETTpk1r9iiCRgVbU1NDAGjv3r3SJ/uN5v5wRETr1q0jCwsLAkAAyNzcnF566aWGQzWIiHbu3EnW1tYNy9jb29OWLVto9+7dZGVlRebm5kREVFVVReHh4Q3LAaDAwEDaunVro4kZFBREL7zwQsMkAUDjxo2joqKiVo/b3O9XWVlJL7300kN5xowZQ2+88cZDeSorK2ny5MkNy/Ts2ZPefvttevvttx+amC19v4qKimjkyJENv5M+bdiwgezt7SUdIyIighQKhaRjNIXnsenMYyKiwMBA+v3vf9/UU40LlojIxsaGNm3aJG2qNigrK6O4uDj68ccfKT8/v8llioqKKDo6mi5evNjwL1z941lZWQ8tm5+fTz/++CPl5OQ0uS61Wk0hISFERFRSUkI//fQTpaent3vc5ty6dYuOHj1K165dI6L//5e+fmLWu337Np0/f/6hcZrSkveLiKigoIDKyspalFFXPvjgA3J3d5d0jKioKAJApaWlko7TWjyPHzCGeUxENGTIEFq5cmVTT/27yR1garVa8vPE28LGxgZjx4595DJqtRpBQUFNPv7bc9Jbc9dcOzs7TJw4USfjNsfJyQlOTk6PXc7R0fGh0xOb05L3C0CjQ3D0QR/Xu6hff3FxMTp37izpWK3B8/gBY5jHwINLCzT33jR5LQJ9XIiDmbbi4mLJrxFQP+mlvuYBM22P2lhosmAdHR1N5lqSj9K9e3chF72p17lzZ/Ts2bPZQ0A6slu3bkn+3tZvHYm4Tqgh4XksnfLycty/f7/ZLfEmdxG4uLjg8uXLkgbrCB51MzN9WLZsGZYtWyY0g1Ryc3MxatQoScewtraGWq3Wy50TDBnPY+nU3/aouTvANLkFW3+cGmNSICLk5+fr5aaarq6uzV5tnrH2ys3NhUwma33BXr9+3WDO5mLG5caNG6iqqtJLwerj1jTMdOXl5cHJyanZs8qaLFh3d3fU1dUZzMUUmHHJzMwEAPTt21fysfr16/fQKZiM6VJmZmbDbeib0mTBenl5wdzcXPi+G2acUlNToVar9XJYjbe3NzIyMgzmlFlmXFJTU+Hj49Ps800WrIWFBdzd3blgmSTS0tIeOSl1ycfHBxqNptEFRxhrLyLCxYsX4e3t3ewyTRYs8GBipqSkSBKMmba0tLRHTkpd8vT0hEqlQmpqql7GY6bj+vXruHv3bpPXwK3XbMEOGzbMYK5Ww4xHdXU1UlJSMGzYML2Mp1Ao4OPjw3OZ6dzp06chl8ubvOh5vWYLNiAgADdv3kROTo4k4ZhpSk5ORlVVFUaPHq23MQMCApCQkKC38ZhpiI+Px5AhQx66vONvPXIL1sLCgicm06n4+Hh069at4YLK+hAQEICUlBSUl5frbUxm/BISEpq828N/a7ZgVSoVhg0bhhMnTug8GDNd8fHxet16BQB/f3/U1dXh1KlTeh2XGa/y8nJcuHAB/v7+j1yu2YIFgMDAQERFRek0GDNdNTU1OHbsWJNXa5JSjx49MGDAABw5ckSv4zLjFRMTA61Wi/Hjxz9yuUcW7OTJk3Ht2jVcunRJp+GYaTpx4gQqKiowadIkvY89efJk3lhgOhMVFQVfX9/HXpbxkQU7fPhwqNVqnphMJ6KiojBgwAC9nCL7WyEhIUhPT+frEjCdiIqKQkhIyGOXe2TByuVyhISEYN++fToLxkzX/v37MXnyZCFjjx07FjY2Njhw4ICQ8ZnxSE1NxdWrV1s0lx9ZsAAwe/ZsxMfHo6CgQCfhmGk6f/48MjMzMWfOHCHjq1QqTJ8+HTt37hQyPjMe33//PZydnTFixIjHLvvYgp08eTJsbGywZ88enYRjpmnnzp3o3bs3/Pz8hGUICwtDQkICbyywdtm1axfCwsIgk8keu+xjC1alUmHGjBn8Lz9rMyJq1aSUSnBwMGxsbLBr1y5hGVjHdvbsWWRlZbX4k9hjCxYAnn32WSQmJv
IFM1ibHD9+HNnZ2Zg/f77QHCqVCqGhodi8ebPQHKzj2rx5Mzw9PeHr69ui5VtUsBMnToSLiwu++uqrdoVjpmnjxo3w8/PT2xW0HmXx4sVITU3F6dOnRUdhHcz9+/cRERGBRYsWtfiTWIsK1szMDAsXLsTmzZtRXV3drpDMtNy9exd79+7F4sWLRUcBAIwYMQJDhgzBxo0bRUdhHcyuXbtQUVGB5557rsWvaVHBAsDChQtRWlrK+69Yq3z99deQy+WYO3eu6CgNFi1ahB07dqCkpER0FNaBfPHFF5g5cya6devW4te0uGB79uyJ0NBQfPLJJyCiNgVkpkWj0WDt2rVYvHjxI684pG8LFiyAhYUFvvjiC9FRWAdx4sQJnDp1Cm+88UarXtfiggWAP/7xj0hNTcXRo0dbNQgzTd9//z0KCgrw2muviY7yECsrKyxevBjr1q1DVVWV6DisA1i9ejX8/Pwee3GX35JRKzdHx48fD5VKxafPskciIjzxxBPw9PTEjh07RMdppKCgAH369MH69euxaNEi0XGYAbty5Qq8vLywc+dOzJo1qzUvXd/qgv3pp58QHByMhISEVrc5Mx179+7FrFmzcP78eQwePFh0nCaFh4cjJiYGGRkZUCqVouMwA/XMM8/gwoULSEtLg1wub81LW1+wADBu3DiYm5vj559/bu1LmQnQarV44okn4OHhYdAnqOTn58PDwwNr167F73//e9FxmAFKT0+Hj48PduzYgdDQ0Na+vG0Fe+zYMYwfPx6xsbEYN25ca1/OjNyOHTswf/58pKamwsvLS3ScR3rttdewb98+ZGVlwcLCQnQcZmBmzZqFX375BefPn4eZWau+sgLaWrAAMGnSJJSUlCApKaktAzMjVVVVBS8vL4wePRrffvut6DiP9euvv8LDwwPvvvsu3nnnHdFxmAGJj4/H2LFjsW/fPkyfPr0tq1jf5mZcs2YNUlJSsGXLlraughmh1atX49atW/jggw9ER2mRHj164K233sJHH32EX3/9VXQcZiC0Wi3+8Ic/YMKECW0t1weoHV555RXq2rUr3b17tz2rYUbi5s2b1LlzZ/rggw9ER2mVe/fukaurK7344ouiozADsWnTJjI3N6fU1NT2rObfbd5FAADFxcXw8PDAc889hzVr1rS95ZlRmDt3LpKSkpCeno5OnTqJjtMqO3fuxLx583Ds2DGMGTNGdBwmUHFxMQYMGIC5c+di3bp17VlV2/fB1tu8eTMWL16MEydO8GFbJuzQoUOYOnUqDhw4gGnTpomO0yYzZ85ERkYGUlJS+AsvE/bCCy8gOjoaly5dgq2tbXtW1f6CBR584XXz5k0kJyfz8YQmqLy8HAMHDsSTTz6JrVu3io7TZvn5+Rg0aBCWL1+O999/X3QcJkBsbCwCAwMRGRmJp556qr2rW9+ufbD1srOzydLSkt59911drI51MIsXLyYHBwe6ffu26Cjt9q9//YsUCgWdOXNGdBSmZ3fu3CEXFxeaPXu2rlb5b50ULBHR+vXryczMjGJjY3W1StYB7N27lwDQ999/LzqKTmi1Wpo8eTL17duXysrKRMdhevTMM8+Qk5MT3bx5U1er1F3BEhHNmDGDnJ2dqaSkRJerZQaqoKCA1Go1hYeHi46iU/W/15IlS0RHYXry7bffkkwmo4MHD+pyte07iuC3CgsL4ePjA39/f+zevVvo/ZeYtDQaDYKCgnDz5k2cO3cOVlZWoiPpVGRkJGbPno3t27dj3rx5ouMwCWVkZMDPzw8LFy7U9dFQutkH+98SEhJIoVDQxx9/rOtVMwOybNky6tSpE507d050FMmYwu9o6srLy8nLy4tGjBhBVVVVul69bncR1Fu9ejXJ5XKKjo6WYvVMsIiICJLJZLR9+3bRUSRVU1NDY8aMoX79+tGdO3dEx2E6ptVqadasWdS1a1e6du2aFENIU7BERGFhYWRvb08ZGRlSDcEEOHXqFFlaWtKyZctER9GLGz
duUI8ePSgoKIhqampEx2E69N5775G5ubmUX8xLV7D37t2jUaNGkZubmy6/lWMC5ebmUteuXWny5Mmk0WhEx9Gbs2fPkrW1NS1YsEB0FKYj9Z/C1q9fL+Uw0hUsEdGtW7eoT58+NHLkSKqsrJRyKCax4uJi6t+/Pw0dOpTKy8tFx9G7gwcPklwup1WrVomOwtopOjqaFAoFvfPOO1IPJW3BEhFlZGSQg4MDBQcHS7ETmelBWVkZ+fn5kYuLCxUUFIiOI8wXX3xBMpmMPv/8c9FRWBudPHmSrK2t6ZlnniGtViv1cNIXLBHRhQsXyM7OjmbOnMn7sTqYe/fu0bhx48jJyYkuX74sOo5wn332GclkMvryyy9FR2GtlJKSQvb29hQcHEzV1dX6GFI/BUtEFB8fT1ZWVhQWFsYl20Hcu3ePgoODSa1WU1pamug4BuPdd98luVxO27ZtEx2FtVBqaio5OTlRUFAQ3b9/X1/D6q9giYhiYmLIysqKZs6cybsLDFxZWRk9+eSTpFar6ezZs6LjGJw333yTzMzMaOPGjaKjsMdISkoie3t7GjduHFVUVOhzaP0WLNGDExFsbW1p0qRJ/MWXgSouLiY/Pz/q1q0bb7k+wl/+8heSyWS0Zs0a0VFYM+Li4qhz5840ZcoUunfvnr6H13/BEj047MXBwYH8/Pz4EC4Dk5OTQ/379ycXFxfKzMwUHcfgffLJJySTyehPf/qTPr40Ya3w/fffk4WFBYWGhuprn+tviSlYogeXOPT09CRXV1e6dOmSqBjsv5w+fZq6du1K3t7elJ+fLzpOh7Fz506ysLCgp59+WsRWEmvCmjVryMzMjJYuXUq1tbWiYogrWCKiwsJC8vf3Jzs7Ozp8+LDIKCYvIiKCLC0tacqUKSZ5nGt7HT16lOzs7GjUqFEmfSibaFVVVbRo0SKSy+WGsOtGbMESEd2/f5/mz59PZmZmtGrVKv6YpWcajYaWL19OAGjZsmUmdYaWrl26dIk8PT2pW7dudPz4cdFxTM7Vq1dp+PDh1KVLF9q/f7/oOESGULD11q1bRwqFgmbMmEHFxcWi45iE69ev09ixY8nS0tLoL9yiL6WlpfTUU0+RQqGgzz77jDcY9OTIkSPk6OhIAwcOpCtXroiOU89wCpaI6MSJE9SzZ0/q1asXxcTEiI5j1Pbs2UNqtZo8PDwoJSVFdByjotVq6aOPPiJzc3MKCQmhGzduiI5ktKqqquiNN94gmUxG8+bNM7TdW4ZVsEQPDhGaNWsWmZmZ0YoVK/R5ULBJKCsro8WLFxMACg8P1/dxgSbl5MmT1LdvX3J0dKQffvhBdByjk5qaSoMHD6bOnTvTli1bRMdpiuEVbL2vvvqKbGxsyNPTk+Li4kTHMQr79+8nZ2dncnBwoMjISNFxTEJZWRktWLCAANCcOXP4sEQdqKqqovfee4+USiX5+/tTdna26EjNMdyCJSLKz8+nadOmkUwmoyVLlvC+2TYqKCigOXPmEAB69tlnjeLurx1NVFQUubq6kp2dHW3atInq6upER+qQYmNjydPTk6ytrWnt2rWG/j4adsHW27FjB3Xr1o3UajV9/vnn/E13C92/f58+/PBDsra2JldXVz4UTrCKigp64403yNzcnHx9fSk+Pl50pA4jNzeXZs+eTQBo6tSpdPXqVdGRWqJjFCzRg29nV6xYQUqlkgYOHEgHDhwQHclg1dXV0XfffUdubm5kZWVFq1at4gPgDUhaWhoFBQWRTCajuXPnUlZWluhIBqukpIRWrlxJFhYW5OHhoeu7vkqt4xRsvaysLHrqqadIJpPRyJEj6aeffhIdyWBotVqKjIwkb29vMjMzo+eff56uX78uOhZrxg8//EAeHh5kbm5OixYtory8PNGRDEZZWRm9//77ZGtrS/b29rR69WpRp7u2R8cr2HpJSUk0efJkAkABAQG0d+9eQ98fI5mamhravn07DRkyhGQyGYWGhlJ6erroWKwFNBoNbd
68mdzc3EipVNKiRYtM+m9369Yt+vOf/0xqtZq6dOlCf/3rX6m0tFR0rLbquAVbLyEhgaZPn05mZmbk4eFBX3zxhaEdCyeZkpISWr16NfXu3ZvMzc0pLCyMLly4IDoWa4OamhrauHEjeXp6kkwmo6lTp9LPP/9sMicqpKen0+9//3vq1KkTOTg40J///Gdj+FK74xdsvcuXL1N4eDhZWFiQjY0NhYeH0+nTp0XH0jmtVkvHjh2j+fPnN/yuy5cvp9zcXNHRmA7U1dXRvn37aOzYsQSA+vXrR3//+9/p119/FR1N5yoqKmjz5s0UEBBAAMjd3Z3Wr19vTJcxNZ6CrVdcXExr1qyhQYMGEQDy8vKiv/3tbx3+difnz5+nd955h/r06UMAyM/Pj7788ksqKysTHY1JJC0tjZYuXUr29vYNZ4V9/fXXVFJSIjpam1VVVdH+/ftp/vz5ZGNjQyqVisLCwig6OtoYd/H9W0ZEBCN16tQpREREYPfu3bhx4wYGDx6MadOmISQkBKNGjYJcLhcdsVk1NTU4ceIEoqKisH//fmRmZsLFxQVz5szB/Pnz4ePjIzoi05Oqqir88MMP2LFjB44cOQKtVovAwEBMnToVISEh6Nu3r+iIj1RYWIjo6GgcOnQIP/74I0pLS+Hv74+wsDDMmzcPDg4OoiNKZb1RF2y9uro6nDhxArt378bhw4eRk5MDOzs7TJgwAaNHj0ZAQACGDh0Kc3NzYRmrq6uRnJyMhIQExMfHIzY2FhUVFejfvz+mTp2K2bNnY8SIEZDJZMIyMvFKS0uxb98+7Nu3Dz///DPKysrg7u6OwMBABAQEICAgAG5ubkIzFhUVITExEfHx8YiLi0NycjLkcjlGjx6NadOmITQ0FM7OzkIz6olpFOxvXblyBVFRUYiJiUFiYiKKi4thZWWFoUOHYtCgQRg8eDAGDRoEd3d3dO3aVadjExFu3LiBK1eu4OLFi0hLS0NqaipSUlJQVVWFbt26YfTo0QgMDERISAhcXV11Oj4zHhqNBomJiYiKisLx48eRnJyMmpoa9OjRA0OGDIGPjw+8vb0xcOBA9O3bF9bW1jodv7q6Gvn5+UhPT8fFixeRmpqK1NRUZGZmQiaTwcvLC2PGjMGkSZMQGBgIGxsbnY7fAZhmwf43IkJGRgYSExNx9uxZpKWlIS0tDaWlpQCATp06wc3NDW5ubnB0dIRarYaDgwPs7e3RqVMnWFhYAAA6d+6MsrIyAMD9+/dx//59FBcXo6ioCMXFxbh9+zZyc3ORl5eH6upqAIC9vT0GDx4Mb29vDBs2DAEBAQb/cY8ZrqqqKiQnJ+PkyZO4cOEC0tLSkJGRAY1GAwBwdHSEq6srevXqBScnJzg4OECtVsPKygpdunSBmZkZFAoFzM3Ncf/+fQAPtpjr53L9z7Vr15CXl4dff/0VWq0WMpkMbm5uDYU+YsQI+Pv7w87OTuTbYQi4YJuTn5+P7Oxs5OXlNRRjfVkWFxejpKQEVVVVDRPxv1laWsLCwgJqtbrhx8HBoaGoXV1d0a9fP/To0UPAb8ZMiUajQVZWFnJychrmcUFBAQoLCxvmcmVlJcrKylBXV9fo9ba2to3mcvfu3R+ayx4eHqa4ddoSXLC6QEQoLS2Fra2t6CiMtUtNTQ1qa2thaWkpOooxWC/uWx0jIpPJuFyZUVAqlVAqlaJjGA0z0QEYY8xYccEyxphEzAHcER2CMcaM0L3/BZb2Y1soqYSZAAAAAElFTkSuQmCC",
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.visualize(tasks=True)"
]
},
{
"cell_type": "markdown",
"id": "072f8b95-44a3-4977-97f4-d8429a837593",
"metadata": {},
"source": [
"## Optimization"
]
},
{
"cell_type": "markdown",
"id": "6d69eea0-1a6c-4996-a70a-2deb22580b8a",
"metadata": {},
"source": [
"`dask_expr` currently includes two kinds of optimization passes:\n",
"\n",
"1. **Simplify**\n",
"    - Takes multiple passes over the expression graph until nothing needs to change\n",
"    - Each expression may replace itself or its parent\n",
"    - Allows operations to be pushed down and re-ordered\n",
"    - \"Abstract\" expressions can replace themselves with other expressions\n",
"2. **Fusion**\n",
"    - Finds groups of `Blockwise` expressions and converts each group to a single `Fused` expression."
]
},
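{
"cell_type": "markdown",
"id": "sketch-simplify-fixed-point",
"metadata": {},
"source": [
"The Simplify bullets above can be illustrated with a toy sketch in plain Python. This is not how `dask_expr` is implemented: the `Read`, `Projection`, and `Add` classes and the `simplify_once` helper are hypothetical stand-ins, and only two rewrite rules are modelled. The sketch re-orders a projection below an `Add`, pushes it into the reader, and repeats passes until the expression stops changing.\n",
"\n",
"```python\n",
"from dataclasses import dataclass\n",
"from typing import Optional, Tuple\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Read:\n",
"    # Hypothetical stand-in for an IO expression such as ReadParquet.\n",
"    path: str\n",
"    columns: Optional[Tuple[str, ...]] = None\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Projection:\n",
"    # Hypothetical stand-in for column selection on another expression.\n",
"    frame: object\n",
"    columns: Tuple[str, ...]\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Add:\n",
"    # Hypothetical stand-in for an elementwise `frame + right`.\n",
"    frame: object\n",
"    right: int\n",
"\n",
"\n",
"def simplify_once(expr):\n",
"    # One bottom-up pass: simplify the child, then try to rewrite this node.\n",
"    if isinstance(expr, Read):\n",
"        return expr\n",
"    child = simplify_once(expr.frame)\n",
"    if isinstance(expr, Projection):\n",
"        if isinstance(child, Read):\n",
"            # Push the column selection down into the reader.\n",
"            return Read(child.path, expr.columns)\n",
"        if isinstance(child, Add):\n",
"            # Re-order: select columns first, then add.\n",
"            return Add(Projection(child.frame, expr.columns), child.right)\n",
"        return Projection(child, expr.columns)\n",
"    return Add(child, expr.right)\n",
"\n",
"\n",
"def simplify(expr):\n",
"    # Repeat passes until a pass returns an equal expression (a fixed point).\n",
"    while True:\n",
"        new = simplify_once(expr)\n",
"        if new == expr:\n",
"            return expr\n",
"        expr = new\n",
"\n",
"\n",
"expr = Projection(Add(Read('demo_parquet'), 1), ('x',))\n",
"print(simplify(expr))\n",
"```\n",
"\n",
"In this toy, the first pass moves the projection below `Add`, the second pushes it into `Read`, and the third finds nothing left to change, which ends the loop."
]
},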
{
"cell_type": "markdown",
"id": "790c1078-a464-498f-9f6b-704108c92cc1",
"metadata": {
"tags": []
},
"source": [
"### Simplify"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "8e5db90b-dd0e-4e30-b6ed-b2e343328c6c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Add: right=1\n",
" Filter:\n",
" Projection: columns='x'\n",
" ReadParquet: path='demo_parquet' blocksize='1MiB' kwargs={'dtype_backend': None}\n",
" EQ: left=1000\n",
" Projection: columns='id'\n",
" ReadParquet: path='demo_parquet' blocksize='1MiB' kwargs={'dtype_backend': None}\n"
]
}
],
"source": [
"ser.pprint()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "6d66995e-3042-428a-95ca-a55c1b126cb6",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Add: right=1\n",
" ReadParquet: path='demo_parquet' columns=['x'] filters=[[('id', '==', 1000)]] blocksize='1MiB' kwargs={'dtype_backend': None} _series=True\n"
]
}
],
"source": [
"ser.simplify().pprint()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "39c5c355-0e95-41fb-8367-be270765daba",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"214pt\" height=\"116pt\"\n",
" viewBox=\"0.00 0.00 214.00 116.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-112 210,-112 210,4 -4,4\"/>\n",
"<!-- 2546038844601650633 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>2546038844601650633</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"206,-108 0,-108 0,-72 206,-72 206,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"103\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Add(ReadParquet, 1)</text>\n",
"</g>\n",
"<!-- 4975111112553377444 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>4975111112553377444</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"170.5,-36 35.5,-36 35.5,0 170.5,0 170.5,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"103\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- 4975111112553377444&#45;&gt;2546038844601650633 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>4975111112553377444&#45;&gt;2546038844601650633</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M103,-36.3C103,-43.59 103,-52.27 103,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"99.5,-60.38 103,-70.38 106.5,-60.38 99.5,-60.38\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x2866e0af0>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser.simplify().visualize()"
]
},
{
"cell_type": "markdown",
"id": "ba28061b-68f7-4abd-9d2b-427643d00b7b",
"metadata": {},
"source": [
"### Fusion"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "e770d156-9c1d-4550-b754-49a606d99494",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fused(d7ebf):\n",
"| Add: right=1\n",
"| ReadParquet: path='demo_parquet' columns=['x'] filters=[[('id', '==', 1000)]] blocksize='1MiB' kwargs={'dtype_backend': None} _series=True\n"
]
}
],
"source": [
"ser.optimize().pprint()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "a8ea611c-8e41-4b25-81e2-1571fb5d33e9",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"80pt\" height=\"44pt\"\n",
" viewBox=\"0.00 0.00 80.00 44.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 40)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-40 76,-40 76,4 -4,4\"/>\n",
"<!-- 6283737263869195331 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>6283737263869195331</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"72,-36 0,-36 0,0 72,0 72,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"36\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Fused</text>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x2866e24d0>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser.optimize().visualize()"
]
},
{
"cell_type": "markdown",
"id": "70c52020-2beb-4163-ab3a-bd54218062af",
"metadata": {},
"source": [
"## Abstract Expressions\n",
"\n",
"Many operations in Dask DataFrame can be implemented in multiple ways (depending on metadata and partitioning information), or are really compositions of other operations. In `dask_expr`, we call these \"abstract\" expressions, and we rely on `simplify` to convert them to non-abstract expressions before graph-generation time."
]
},
{
"cell_type": "markdown",
"id": "b5c9d7ec-a87c-4ba2-8975-a0ed5db950b3",
"metadata": {},
"source": [
"### Mean (Simplest Example)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "ed30fe64-b9b9-43fc-a5b9-a84595ab9eb1",
"metadata": {},
"outputs": [],
"source": [
"mean = df.mean()"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "d367c398-0248-4d50-98d8-a0e8053de59d",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"207pt\" height=\"116pt\"\n",
" viewBox=\"0.00 0.00 207.00 116.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-112 203,-112 203,4 -4,4\"/>\n",
"<!-- 8462502350530249379 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>8462502350530249379</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"199,-108 0,-108 0,-72 199,-72 199,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"99.5\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Mean(ReadParquet)</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>&#45;8467414101651346718</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"167,-36 32,-36 32,0 167,0 167,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"99.5\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;8462502350530249379 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;8462502350530249379</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M99.5,-36.3C99.5,-43.59 99.5,-52.27 99.5,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"96,-60.38 99.5,-70.38 103,-60.38 96,-60.38\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x2866e1660>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean.visualize()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "2facbcc3-603a-4fe6-9d36-f261892c7cec",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"418pt\" height=\"188pt\"\n",
" viewBox=\"0.00 0.00 418.00 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-184 414,-184 414,4 -4,4\"/>\n",
"<!-- &#45;7944823574735353292 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>&#45;7944823574735353292</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"290,-180 126,-180 126,-144 290,-144 290,-180\"/>\n",
"<text text-anchor=\"middle\" x=\"208\" y=\"-157\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Div(Sum, Count)</text>\n",
"</g>\n",
"<!-- 5030698514045464466 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>5030698514045464466</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"202,-108 0,-108 0,-72 202,-72 202,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"101\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Count(ReadParquet)</text>\n",
"</g>\n",
"<!-- 5030698514045464466&#45;&gt;&#45;7944823574735353292 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>5030698514045464466&#45;&gt;&#45;7944823574735353292</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.72,-108.48C141.23,-117.32 157.81,-128.16 172.38,-137.7\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"170.06,-140.37 180.35,-142.91 173.9,-134.51 170.06,-140.37\"/>\n",
"</g>\n",
"<!-- &#45;8467414101651346718 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>&#45;8467414101651346718</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"275.5,-36 140.5,-36 140.5,0 275.5,0 275.5,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"208\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;5030698514045464466 -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;5030698514045464466</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M181.28,-36.48C167.77,-45.32 151.19,-56.16 136.62,-65.7\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"135.1,-62.51 128.65,-70.91 138.94,-68.37 135.1,-62.51\"/>\n",
"</g>\n",
"<!-- &#45;6559014336408917422 -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>&#45;6559014336408917422</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"410,-108 220,-108 220,-72 410,-72 410,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"315\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Sum(ReadParquet)</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;&#45;6559014336408917422 -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;&#45;6559014336408917422</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M234.72,-36.48C248.23,-45.32 264.81,-56.16 279.38,-65.7\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"277.06,-68.37 287.35,-70.91 280.9,-62.51 277.06,-68.37\"/>\n",
"</g>\n",
"<!-- &#45;6559014336408917422&#45;&gt;&#45;7944823574735353292 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>&#45;6559014336408917422&#45;&gt;&#45;7944823574735353292</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M288.28,-108.48C274.77,-117.32 258.19,-128.16 243.62,-137.7\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"242.1,-134.51 235.65,-142.91 245.94,-140.37 242.1,-134.51\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x16c5a67a0>"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mean.simplify().visualize()"
]
},
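{
"cell_type": "markdown",
"id": "sketch-abstract-mean-lowering",
"metadata": {},
"source": [
"The graph above shows `Mean` simplified into `Div(Sum, Count)`. The same idea can be sketched with a toy model in plain Python. The classes, the `lower` method, and the `evaluate` function below are hypothetical and are not part of the `dask_expr` API; partitions are modelled as plain tuples of numbers.\n",
"\n",
"```python\n",
"from dataclasses import dataclass\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Source:\n",
"    # Toy leaf expression: a tuple of partitions, each a tuple of numbers.\n",
"    partitions: tuple\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Sum:\n",
"    frame: object\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Count:\n",
"    frame: object\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Div:\n",
"    left: object\n",
"    right: object\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Mean:\n",
"    # Abstract: `evaluate` deliberately has no rule for Mean.\n",
"    frame: object\n",
"\n",
"    def lower(self):\n",
"        # Replace this expression with a composition of concrete ones.\n",
"        return Div(Sum(self.frame), Count(self.frame))\n",
"\n",
"\n",
"def evaluate(expr):\n",
"    # Only concrete expressions know how to run.\n",
"    if isinstance(expr, Source):\n",
"        return expr.partitions\n",
"    if isinstance(expr, Sum):\n",
"        return sum(sum(part) for part in evaluate(expr.frame))\n",
"    if isinstance(expr, Count):\n",
"        return sum(len(part) for part in evaluate(expr.frame))\n",
"    if isinstance(expr, Div):\n",
"        return evaluate(expr.left) / evaluate(expr.right)\n",
"    raise TypeError(f'{type(expr).__name__} is abstract; lower it first')\n",
"\n",
"\n",
"src = Source(((1.0, 2.0), (3.0, 4.0, 5.0)))\n",
"lowered = Mean(src).lower()\n",
"print(lowered)\n",
"print(evaluate(lowered))\n",
"```\n",
"\n",
"Because `Sum` and `Count` each reduce partition by partition, the lowered form needs no special handling for `Mean` at graph-generation time; calling `evaluate` on an un-lowered `Mean` raises a `TypeError` in this toy."
]
},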
{
"cell_type": "markdown",
"id": "ad8228c0-6a53-4417-b061-7b256af01fe4",
"metadata": {},
"source": [
"### Merge"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "e304d3da-990f-4d19-a46c-413404f0074a",
"metadata": {},
"outputs": [],
"source": [
"merged = df[[\"id\", \"x\"]].merge(ts[[\"id\", \"y\"]], on=\"id\")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "fcb7c281-ea0d-4c9a-acbb-a2a3a4bcb65b",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"624pt\" height=\"188pt\"\n",
" viewBox=\"0.00 0.00 624.00 188.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-184 620,-184 620,4 -4,4\"/>\n",
"<!-- &#45;3505828150904528709 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>&#45;3505828150904528709</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"440,-180 164,-180 164,-144 440,-144 440,-180\"/>\n",
"<text text-anchor=\"middle\" x=\"302\" y=\"-157\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Merge(Projection, Projection)</text>\n",
"</g>\n",
"<!-- &#45;6591904313663984018 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>&#45;6591904313663984018</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"288,-108 0,-108 0,-72 288,-72 288,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"144\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Projection(Timeseries, [&#39;id&#39;, &#39;y&#39;])</text>\n",
"</g>\n",
"<!-- &#45;6591904313663984018&#45;&gt;&#45;3505828150904528709 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>&#45;6591904313663984018&#45;&gt;&#45;3505828150904528709</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M183.46,-108.48C204.37,-117.75 230.26,-129.22 252.51,-139.07\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"250.82,-142.15 261.38,-143.01 253.66,-135.75 250.82,-142.15\"/>\n",
"</g>\n",
"<!-- 643655380795912720 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>643655380795912720</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"200.5,-36 87.5,-36 87.5,0 200.5,0 200.5,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"144\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Timeseries</text>\n",
"</g>\n",
"<!-- 643655380795912720&#45;&gt;&#45;6591904313663984018 -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>643655380795912720&#45;&gt;&#45;6591904313663984018</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M144,-36.3C144,-43.59 144,-52.27 144,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"140.5,-60.38 144,-70.38 147.5,-60.38 140.5,-60.38\"/>\n",
"</g>\n",
"<!-- &#45;2663861881151484105 -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>&#45;2663861881151484105</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"616,-108 306,-108 306,-72 616,-72 616,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"461\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Projection(ReadParquet, [&#39;id&#39;, &#39;x&#39;])</text>\n",
"</g>\n",
"<!-- &#45;2663861881151484105&#45;&gt;&#45;3505828150904528709 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>&#45;2663861881151484105&#45;&gt;&#45;3505828150904528709</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M421.29,-108.48C400.25,-117.75 374.19,-129.22 351.81,-139.07\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"350.61,-135.78 342.87,-143.01 353.43,-142.18 350.61,-135.78\"/>\n",
"</g>\n",
"<!-- &#45;8467414101651346718 -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>&#45;8467414101651346718</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"528.5,-36 393.5,-36 393.5,0 528.5,0 528.5,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"461\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;&#45;2663861881151484105 -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;&#45;2663861881151484105</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M461,-36.3C461,-43.59 461,-52.27 461,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"457.5,-60.38 461,-70.38 464.5,-60.38 457.5,-60.38\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x2866e3520>"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged.visualize()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "f36b19b4-9c77-4578-96c0-16addc83e25d",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"912pt\" height=\"332pt\"\n",
" viewBox=\"0.00 0.00 912.00 332.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 328)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-328 908,-328 908,4 -4,4\"/>\n",
"<!-- 6649440095151428534 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>6649440095151428534</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"634,-324 269,-324 269,-288 634,-288 634,-324\"/>\n",
"<text text-anchor=\"middle\" x=\"451.5\" y=\"-301\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">BlockwiseMerge(Projection, Projection)</text>\n",
"</g>\n",
"<!-- 9018757749234557260 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>9018757749234557260</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"442.5,-252 150.5,-252 150.5,-216 442.5,-216 442.5,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"296.5\" y=\"-229\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Projection(DiskShuffle, [&#39;id&#39;, &#39;y&#39;])</text>\n",
"</g>\n",
"<!-- 9018757749234557260&#45;&gt;6649440095151428534 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>9018757749234557260&#45;&gt;6649440095151428534</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M335.21,-252.48C355.63,-261.71 380.89,-273.11 402.65,-282.94\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"401.08,-286.07 411.63,-287 403.96,-279.69 401.08,-286.07\"/>\n",
"</g>\n",
"<!-- &#45;7124541912327084677 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>&#45;7124541912327084677</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"443,-180 0,-180 0,-144 443,-144 443,-180\"/>\n",
"<text text-anchor=\"middle\" x=\"221.5\" y=\"-157\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">DiskShuffle(AssignPartitioningIndex, _partitions)</text>\n",
"</g>\n",
"<!-- &#45;7124541912327084677&#45;&gt;9018757749234557260 -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>&#45;7124541912327084677&#45;&gt;9018757749234557260</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M240.04,-180.3C248.96,-188.63 259.83,-198.78 269.64,-207.93\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"267.11,-210.36 276.81,-214.62 271.89,-205.24 267.11,-210.36\"/>\n",
"</g>\n",
"<!-- &#45;8633293669623547828 -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>&#45;8633293669623547828</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"389.5,-108 53.5,-108 53.5,-72 389.5,-72 389.5,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"221.5\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">AssignPartitioningIndex(Timeseries)</text>\n",
"</g>\n",
"<!-- &#45;8633293669623547828&#45;&gt;&#45;7124541912327084677 -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>&#45;8633293669623547828&#45;&gt;&#45;7124541912327084677</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M221.5,-108.3C221.5,-115.59 221.5,-124.27 221.5,-132.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"218,-132.38 221.5,-142.38 225,-132.38 218,-132.38\"/>\n",
"</g>\n",
"<!-- &#45;1030092915665515506 -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>&#45;1030092915665515506</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"278,-36 165,-36 165,0 278,0 278,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"221.5\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Timeseries</text>\n",
"</g>\n",
"<!-- &#45;1030092915665515506&#45;&gt;&#45;8633293669623547828 -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>&#45;1030092915665515506&#45;&gt;&#45;8633293669623547828</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M221.5,-36.3C221.5,-43.59 221.5,-52.27 221.5,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"218,-60.38 221.5,-70.38 225,-60.38 218,-60.38\"/>\n",
"</g>\n",
"<!-- 429733411555409660 -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>429733411555409660</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"790.5,-252 498.5,-252 498.5,-216 790.5,-216 790.5,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"644.5\" y=\"-229\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Projection(DiskShuffle, [&#39;id&#39;, &#39;x&#39;])</text>\n",
"</g>\n",
"<!-- 429733411555409660&#45;&gt;6649440095151428534 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>429733411555409660&#45;&gt;6649440095151428534</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M596.3,-252.48C570.18,-261.96 537.69,-273.74 510.12,-283.74\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"509.05,-280.4 500.85,-287.1 511.44,-286.98 509.05,-280.4\"/>\n",
"</g>\n",
"<!-- 3077170189944401231 -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>3077170189944401231</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"904,-180 461,-180 461,-144 904,-144 904,-180\"/>\n",
"<text text-anchor=\"middle\" x=\"682.5\" y=\"-157\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">DiskShuffle(AssignPartitioningIndex, _partitions)</text>\n",
"</g>\n",
"<!-- 3077170189944401231&#45;&gt;429733411555409660 -->\n",
"<g id=\"edge6\" class=\"edge\">\n",
"<title>3077170189944401231&#45;&gt;429733411555409660</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M673.11,-180.3C668.92,-188.02 663.89,-197.29 659.22,-205.89\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"656.2,-204.11 654.51,-214.57 662.35,-207.45 656.2,-204.11\"/>\n",
"</g>\n",
"<!-- 7484891547605254188 -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>7484891547605254188</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"861.5,-108 503.5,-108 503.5,-72 861.5,-72 861.5,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"682.5\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">AssignPartitioningIndex(ReadParquet)</text>\n",
"</g>\n",
"<!-- 7484891547605254188&#45;&gt;3077170189944401231 -->\n",
"<g id=\"edge7\" class=\"edge\">\n",
"<title>7484891547605254188&#45;&gt;3077170189944401231</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M682.5,-108.3C682.5,-115.59 682.5,-124.27 682.5,-132.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"679,-132.38 682.5,-142.38 686,-132.38 679,-132.38\"/>\n",
"</g>\n",
"<!-- 6569651679040712417 -->\n",
"<g id=\"node9\" class=\"node\">\n",
"<title>6569651679040712417</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"750,-36 615,-36 615,0 750,0 750,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"682.5\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- 6569651679040712417&#45;&gt;7484891547605254188 -->\n",
"<g id=\"edge8\" class=\"edge\">\n",
"<title>6569651679040712417&#45;&gt;7484891547605254188</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M682.5,-36.3C682.5,-43.59 682.5,-52.27 682.5,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"679,-60.38 682.5,-70.38 686,-60.38 679,-60.38\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x2866e3a60>"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged.simplify().visualize()"
]
},
{
"cell_type": "markdown",
"id": "1ba72dc9-83c6-4626-adb8-baf9fedd5414",
"metadata": {},
"source": [
"### Groupby"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "47bbf04e-7064-4fb1-b763-dd949d524a50",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<dask_expr.groupby.GroupBy at 0x2866e3a90>"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gb = df.groupby([\"id\", \"name\"])\n",
"gb"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "a6d5c92e-03c6-4662-ba23-ee203ece054c",
"metadata": {},
"outputs": [],
"source": [
"agg = gb.agg({\"x\": [\"mean\", \"count\"], \"y\": [\"min\", \"max\"]})"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "45cbc021-4568-4f92-97a3-3b2f98dac915",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"343pt\" height=\"116pt\"\n",
" viewBox=\"0.00 0.00 343.00 116.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-112 339,-112 339,4 -4,4\"/>\n",
"<!-- &#45;4797283925098725576 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>&#45;4797283925098725576</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"335,-108 0,-108 0,-72 335,-72 335,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"167.5\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">GroupbyAggregation(ReadParquet)</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>&#45;8467414101651346718</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"235,-36 100,-36 100,0 235,0 235,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"167.5\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;&#45;4797283925098725576 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;&#45;4797283925098725576</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M167.5,-36.3C167.5,-43.59 167.5,-52.27 167.5,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"164,-60.38 167.5,-70.38 171,-60.38 164,-60.38\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x2866e3b80>"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agg.visualize()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "45c1875d-6184-4cf8-b0ea-3f8721f2e51f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe thead tr:last-of-type th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th colspan=\"2\" halign=\"left\">x</th>\n",
" <th colspan=\"2\" halign=\"left\">y</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th>mean</th>\n",
" <th>count</th>\n",
" <th>min</th>\n",
" <th>max</th>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1024</th>\n",
" <th>George</th>\n",
" <td>-0.007272</td>\n",
" <td>911</td>\n",
" <td>-0.999357</td>\n",
" <td>0.999042</td>\n",
" </tr>\n",
" <tr>\n",
" <th>964</th>\n",
" <th>Jerry</th>\n",
" <td>0.030720</td>\n",
" <td>663</td>\n",
" <td>-0.999137</td>\n",
" <td>0.999412</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1045</th>\n",
" <th>Alice</th>\n",
" <td>-0.020831</td>\n",
" <td>474</td>\n",
" <td>-0.999746</td>\n",
" <td>0.999464</td>\n",
" </tr>\n",
" <tr>\n",
" <th>943</th>\n",
" <th>Hannah</th>\n",
" <td>-0.003769</td>\n",
" <td>262</td>\n",
" <td>-0.999948</td>\n",
" <td>0.999229</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1051</th>\n",
" <th>Bob</th>\n",
" <td>0.031584</td>\n",
" <td>361</td>\n",
" <td>-0.991676</td>\n",
" <td>0.996783</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" x y \n",
" mean count min max\n",
"id name \n",
"1024 George -0.007272 911 -0.999357 0.999042\n",
"964 Jerry 0.030720 663 -0.999137 0.999412\n",
"1045 Alice -0.020831 474 -0.999746 0.999464\n",
"943 Hannah -0.003769 262 -0.999948 0.999229\n",
"1051 Bob 0.031584 361 -0.991676 0.996783"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agg.head()"
]
},
{
"cell_type": "markdown",
"id": "48706f4e-7cf9-4f8e-9823-f9cc73111bd0",
"metadata": {},
"source": [
"## Next Steps\n",
"\n",
"- Fill in API and sand down rough edges\n",
" - Testing / Benchmarking\n",
"- cuDF-backend support (mostly there)\n",
"- Partition statistics (in progress)\n",
"- Distributed protocol and annotations\n",
"- Adoption plan for Dask DataFrame\n",
"\n",
"\n",
"**Open Issues**: https://github.com/mrocklin/dask-expr/issues"
]
},
{
"cell_type": "markdown",
"id": "be4fdae2-2111-42a4-8ed6-27a7496fdf45",
"metadata": {},
"source": [
"### Hypothetical Upstream Adoption Plan?\n",
"\n",
"- Add distinct `dask.dataframe.expr.DataFrame`/`Series`/`Index`/`Scalar` collections upstream\n",
"- Allow users to \"opt in\" to `expr` collections at IO/creation time in `dask.dataframe`\n",
"- Automatically fall back from `expr` to legacy collections when an unsupported API is used"
]
},
{
"cell_type": "markdown",
"id": "8a6ec87f-cd84-4827-9a7d-6633834aa66c",
"metadata": {},
"source": [
"## Other Fun Features"
]
},
{
"cell_type": "markdown",
"id": "36e17f40-8790-49a6-81a5-1f1c1acf9c3b",
"metadata": {},
"source": [
"### Partition Statistics (PR \"in progress\")"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "928df0ac-0b8f-401e-ad65-32deb654df07",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1296000, 1296000)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df._lengths"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "607e780f-ddb3-4ef3-bbc9-64721b03b290",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1296000, 1296000)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2 = df[\"id\"] + 1\n",
"df2._lengths"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "8f23eb26-cae6-4ea6-90f7-49ede55d2d70",
"metadata": {
"jupyter": {
"source_hidden": true
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"(1296000,)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2.partitions[0]._lengths"
]
},
{
"cell_type": "markdown",
"id": "78a67f4f-ee85-4085-be37-077cb2371864",
"metadata": {},
"source": [
"### Shuffling"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "588f066d-f4af-4c8a-b11d-a7438e7e7583",
"metadata": {},
"outputs": [],
"source": [
"shuffled = df.shuffle(\"id\", backend=\"disk\")"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "745d92e9-d3ca-4adf-91a0-35e8bf656da8",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"245pt\" height=\"116pt\"\n",
" viewBox=\"0.00 0.00 245.00 116.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-112 241,-112 241,4 -4,4\"/>\n",
"<!-- 3798777802873758013 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>3798777802873758013</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"237,-108 0,-108 0,-72 237,-72 237,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"118.5\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Shuffle(ReadParquet, id)</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>&#45;8467414101651346718</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"186,-36 51,-36 51,0 186,0 186,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"118.5\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;3798777802873758013 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;3798777802873758013</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M118.5,-36.3C118.5,-43.59 118.5,-52.27 118.5,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"115,-60.38 118.5,-70.38 122,-60.38 115,-60.38\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x2a6b88d00>"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shuffled.visualize()"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "eccf4ca5-664d-4d8b-9234-d219235dbdeb",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"451pt\" height=\"260pt\"\n",
" viewBox=\"0.00 0.00 451.00 260.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-256 447,-256 447,4 -4,4\"/>\n",
"<!-- &#45;1846022950123296251 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>&#45;1846022950123296251</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"415.5,-252 27.5,-252 27.5,-216 415.5,-216 415.5,-252\"/>\n",
"<text text-anchor=\"middle\" x=\"221.5\" y=\"-229\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Projection(DiskShuffle, [&#39;name&#39;, &#39;id&#39;, &#39;x&#39;, &#39;y&#39;])</text>\n",
"</g>\n",
"<!-- &#45;841283058254229872 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>&#45;841283058254229872</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"443,-180 0,-180 0,-144 443,-144 443,-180\"/>\n",
"<text text-anchor=\"middle\" x=\"221.5\" y=\"-157\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">DiskShuffle(AssignPartitioningIndex, _partitions)</text>\n",
"</g>\n",
"<!-- &#45;841283058254229872&#45;&gt;&#45;1846022950123296251 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>&#45;841283058254229872&#45;&gt;&#45;1846022950123296251</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M221.5,-180.3C221.5,-187.59 221.5,-196.27 221.5,-204.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"218,-204.38 221.5,-214.38 225,-204.38 218,-204.38\"/>\n",
"</g>\n",
"<!-- &#45;8324597626502438306 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>&#45;8324597626502438306</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"400.5,-108 42.5,-108 42.5,-72 400.5,-72 400.5,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"221.5\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">AssignPartitioningIndex(ReadParquet)</text>\n",
"</g>\n",
"<!-- &#45;8324597626502438306&#45;&gt;&#45;841283058254229872 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>&#45;8324597626502438306&#45;&gt;&#45;841283058254229872</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M221.5,-108.3C221.5,-115.59 221.5,-124.27 221.5,-132.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"218,-132.38 221.5,-142.38 225,-132.38 218,-132.38\"/>\n",
"</g>\n",
"<!-- &#45;8467414101651346718 -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>&#45;8467414101651346718</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"289,-36 154,-36 154,0 289,0 289,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"221.5\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;&#45;8324597626502438306 -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;&#45;8324597626502438306</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M221.5,-36.3C221.5,-43.59 221.5,-52.27 221.5,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"218,-60.38 221.5,-70.38 225,-60.38 218,-60.38\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x16c6eaf20>"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shuffled.simplify().visualize()"
]
},
{
"cell_type": "markdown",
"id": "e27d45d9-58c3-47ad-b281-336341525063",
"metadata": {},
"source": [
"### Culling (Partition Filtering)\n",
"\n",
"Culling is no longer implemented as a distinct optimization pass. Instead, operations like `head` inject a `Partitions` operation, which leads to culling through the following `simplify` actions:\n",
"\n",
"1. `Partitions` will be pushed down through all `Blockwise` expressions\n",
"2. Non-`Blockwise` expressions can \"absorb\" the `Partitions` operation by inheriting from a `PartitionsFiltered` mix-in class"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "73e4b4ef-2b26-465f-b083-74c5013e88fe",
"metadata": {},
"outputs": [],
"source": [
"head = (ts + 1)[\"x\"].head(compute=False)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "ced56426-de9e-4cca-b235-9367eff270ba",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Head:\n",
" Projection: columns='x'\n",
" Add: right=1\n",
" Timeseries: seed=1547633919\n"
]
}
],
"source": [
"head.pprint()"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "cd0c2be6-57ad-4ce6-9522-196624a63260",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Add: right=1\n",
" Projection: columns='x'\n",
" BlockwiseHead:\n",
" Timeseries: seed=1547633919 _partitions=[0]\n"
]
}
],
"source": [
"head.simplify().pprint()"
]
},
{
"cell_type": "markdown",
"id": "1dd6565c-3d8d-4dd5-b5e8-03b40fd2767a",
"metadata": {},
"source": [
"In this case, the optimized task graph is fused into a single task:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "733281d3-187e-484b-8551-c6a888bc4b01",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of tasks in ts: 30\n",
"Number of tasks in head: 1\n"
]
}
],
"source": [
"print(f\"Number of tasks in ts: {len(ts.optimize().dask)}\")\n",
"print(f\"Number of tasks in head: {len(head.optimize().dask)}\")"
]
},
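{
"cell_type": "markdown",
"id": "5e1c0a7d-2b4f-4c3e-9a61-7d2f8b0c4e11",
"metadata": {},
"source": [
"The two `simplify` rules listed for culling can be sketched with a toy expression tree. This is a minimal, self-contained illustration, not the `dask_expr` implementation: the `Source`, `Blockwise`, and `Partitions` classes and the `simplify` function below are hypothetical stand-ins written only for this example.\n",
"\n",
"```python\n",
"from dataclasses import dataclass\n",
"from typing import Optional\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Source:\n",
"    # Toy IO expression; `_partitions` records which partitions to produce\n",
"    npartitions: int\n",
"    _partitions: Optional[tuple] = None\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Blockwise:\n",
"    # Toy partition-wise operation, such as an addition or a projection\n",
"    name: str\n",
"    child: object\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Partitions:\n",
"    # Toy selection of a subset of the child's partitions\n",
"    child: object\n",
"    partitions: tuple\n",
"\n",
"\n",
"def simplify(expr):\n",
"    if isinstance(expr, Blockwise):\n",
"        return Blockwise(expr.name, simplify(expr.child))\n",
"    if isinstance(expr, Partitions):\n",
"        child = simplify(expr.child)\n",
"        if isinstance(child, Blockwise):\n",
"            # Rule 1: push the selection below the partition-wise operation\n",
"            pushed = Partitions(child.child, expr.partitions)\n",
"            return Blockwise(child.name, simplify(pushed))\n",
"        if isinstance(child, Source):\n",
"            # Rule 2: the source absorbs the selection\n",
"            current = child._partitions\n",
"            if current is None:\n",
"                current = tuple(range(child.npartitions))\n",
"            selected = tuple(current[i] for i in expr.partitions)\n",
"            return Source(child.npartitions, selected)\n",
"        return Partitions(child, expr.partitions)\n",
"    return expr\n",
"\n",
"\n",
"# Select partition 0 of a 30-partition source behind two blockwise operations\n",
"expr = Partitions(Blockwise('Projection', Blockwise('Add', Source(30))), (0,))\n",
"print(simplify(expr))\n",
"```\n",
"\n",
"In this toy, the selection ends up stored on the source itself, which is analogous to the `_partitions=[0]` annotation on `Timeseries` in the `pprint` output above. Only the selected partition then needs to be generated."
]
},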
{
"cell_type": "markdown",
"id": "756f0314-7152-4a36-9802-689c7beb628d",
"metadata": {},
"source": [
"### Repartition\n",
"\n",
"\n",
"**NOTE**: `repartition` currently supports repartitioning by partition count (`npartitions`) or by `divisions`."
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "83b7c5d9-783e-411e-b445-3a524246364d",
"metadata": {},
"outputs": [],
"source": [
"repartitioned = df.repartition(npartitions=10)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "7223394b-6d16-4a32-b3de-40c9c433d472",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"253pt\" height=\"116pt\"\n",
" viewBox=\"0.00 0.00 253.00 116.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-112 249,-112 249,4 -4,4\"/>\n",
"<!-- 8177942315209403173 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>8177942315209403173</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"245,-108 0,-108 0,-72 245,-72 245,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"122.5\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">Repartition(ReadParquet)</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>&#45;8467414101651346718</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"190,-36 55,-36 55,0 190,0 190,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"122.5\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;8177942315209403173 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;8177942315209403173</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M122.5,-36.3C122.5,-43.59 122.5,-52.27 122.5,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"119,-60.38 122.5,-70.38 126,-60.38 119,-60.38\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x16c70ca30>"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"repartitioned.visualize()"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "82dfaf50-a16d-487d-af1b-71cc9d024aaf",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<!-- Generated by graphviz version 7.1.0 (0)\n",
" -->\n",
"<!-- Pages: 1 -->\n",
"<svg width=\"333pt\" height=\"116pt\"\n",
" viewBox=\"0.00 0.00 333.00 116.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-112 329,-112 329,4 -4,4\"/>\n",
"<!-- &#45;1607244477355053694 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>&#45;1607244477355053694</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"325,-108 0,-108 0,-72 325,-72 325,-108\"/>\n",
"<text text-anchor=\"middle\" x=\"162.5\" y=\"-85\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">RepartitionDivisions(ReadParquet)</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>&#45;8467414101651346718</title>\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"230,-36 95,-36 95,0 230,0 230,-36\"/>\n",
"<text text-anchor=\"middle\" x=\"162.5\" y=\"-13\" font-family=\"Helvetica,sans-Serif\" font-size=\"20.00\">ReadParquet</text>\n",
"</g>\n",
"<!-- &#45;8467414101651346718&#45;&gt;&#45;1607244477355053694 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>&#45;8467414101651346718&#45;&gt;&#45;1607244477355053694</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M162.5,-36.3C162.5,-43.59 162.5,-52.27 162.5,-60.46\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"159,-60.38 162.5,-70.38 166,-60.38 159,-60.38\"/>\n",
"</g>\n",
"</g>\n",
"</svg>\n"
],
"text/plain": [
"<graphviz.graphs.Digraph at 0x2a6b8a470>"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"repartitioned.simplify().visualize()"
]
},
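{
"cell_type": "markdown",
"id": "c94b7e20-6a1d-4f58-8b3e-0d5a9e7f2c36",
"metadata": {},
"source": [
"In the graphs above, `simplify` rewrites `Repartition` into `RepartitionDivisions`. One way a target partition count could be translated into divisions, assuming the existing divisions are known and numeric, is linear interpolation. The sketch below only illustrates that idea: `interpolate_divisions` is a hypothetical helper, and nothing here is taken from the `dask_expr` source.\n",
"\n",
"```python\n",
"def interpolate_divisions(divisions, npartitions):\n",
"    # Return `npartitions + 1` boundaries interpolated from numeric divisions\n",
"    old_n = len(divisions) - 1\n",
"    new_divisions = []\n",
"    for i in range(npartitions + 1):\n",
"        # Position of the i-th new boundary, measured in old partitions\n",
"        pos = i * old_n / npartitions\n",
"        lo = min(int(pos), old_n - 1)\n",
"        frac = pos - lo\n",
"        width = divisions[lo + 1] - divisions[lo]\n",
"        new_divisions.append(divisions[lo] + frac * width)\n",
"    return new_divisions\n",
"\n",
"\n",
"# Split two partitions with boundaries 0, 50, 100 into four\n",
"print(interpolate_divisions([0, 50, 100], 4))\n",
"```\n",
"\n",
"The first and last boundaries are preserved, so the repartitioned collection would cover the same index range as the original."
]
},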
{
"cell_type": "code",
"execution_count": null,
"id": "c7827ba3-b2a4-4fad-8a0d-df914ef9fb04",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "24c3eb3e-f665-46b5-a03c-baadb5b0b347",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}