import os
import dask
import s3fs
dask.config.set({"num_workers": 2})
dask.config.set({"scheduler": "threads"})
fs = s3fs.S3FileSystem(anon=True)
paths = fs.glob("s3://ursa-labs-taxi-data/2009/**.parquet")
import contextlib
import os
import subprocess
import time
import s3fs
BUCKET = "ursa-labs-taxi-data"
KEY = "2009/01/data.parquet"
URL = f"s3://{BUCKET}/{KEY}"

Notes on High-Level-Graph-ication

The essential idea behind a high-level graph (HLG) is this: it's a lazy mapping which can produce low-level Dask task graphs on demand. Until these low-level tasks are produced (a step called "materialization"), HLGs offer several advantages:

  1. They allow for higher level reasoning about graph structure, including optimizations that would be challenging or impossible once the graph is represented by many low-level tasks.
  2. They can be used to produce only the keys a full computation actually needs. That is, later operations like slicing can feed back into previous HLG layers so that those layers skip producing tasks which won't be needed (called HLG culling). This can save significant time and memory.
  3. They can be much cheaper to serialize and communicate than low-level task graphs.
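
The laziness and culling described above can be illustrated with a toy, pure-Python sketch. This is not Dask's actual `Layer` API: `ToyLazyLayer`, its `cull` method, and `load_partition` are hypothetical names invented for illustration. The sketch only assumes that a layer behaves like a mapping from keys to tasks, and that it can be narrowed to a subset of keys before any tasks are generated.

```python
from collections.abc import Mapping


class ToyLazyLayer(Mapping):
    """Hypothetical stand-in for an HLG layer (not Dask's real API).

    Maps keys of the form (name, i) to task tuples (func, i), but only
    builds those tuples the first time they are looked up or iterated.
    """

    def __init__(self, name, func, npartitions, keep=None):
        self.name = name
        self.func = func
        self.npartitions = npartitions
        # Partition indices this layer will produce; None means all of them.
        self.keep = set(range(npartitions)) if keep is None else set(keep)
        self.materialized = False
        self._tasks = None

    def cull(self, indices):
        """Return a new, still-lazy layer restricted to the given partitions."""
        return ToyLazyLayer(
            self.name, self.func, self.npartitions, self.keep & set(indices)
        )

    def _materialize(self):
        # Build the low-level task dict once, on first access.
        if self._tasks is None:
            self._tasks = {(self.name, i): (self.func, i) for i in sorted(self.keep)}
            self.materialized = True
        return self._tasks

    def __getitem__(self, key):
        return self._materialize()[key]

    def __iter__(self):
        return iter(self._materialize())

    def __len__(self):
        # The number of tasks is known without materializing them.
        return len(self.keep)


def load_partition(i):
    """Hypothetical loader: pretend partition i holds three integers."""
    return list(range(i * 3, i * 3 + 3))


layer = ToyLazyLayer("load", load_partition, npartitions=1000)
culled = layer.cull([0, 1])  # a later slice only needs two partitions
tasks = dict(culled)         # materialization happens here, for two tasks only
```

In this sketch the original 1000-partition layer is never materialized; only the culled copy generates tasks, which is the time and memory saving the second point refers to. A description like `(name, func, npartitions, keep)` is also far smaller to send over the wire than a thousand task tuples, which is the intuition behind the third point.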

However, HLG Layers have proven difficult to write. Broadly speaking, these difficulties have been for two reasons: algorithmic (specifically regarding culling) and serializability.

Cu

import io
import os
import re
import string
import zipfile
import altair
import altair_saver
import junitparser
import pandas
import time
import subprocess
import dask
from dask.distributed import Client, wait
# sp = subprocess.Popen(["viztracer", "-m", "distributed.cli.dask_scheduler"])
sp = subprocess.Popen(
    ["viztracer", "-m", "distributed.cli.dask_scheduler", "-o", "results.json"]
)
@ian-r-rose
ian-r-rose / ibis-postgis.ipynb
Last active June 12, 2019 19:43
Demonstration notebook using Ibis and PostGIS
@ian-r-rose
ian-r-rose / issue_10062.ipynb
Created October 30, 2015 02:01
Issue 10062