@avriiil
Created January 18, 2022 10:48
Gist to write large Parquet files to S3 on an M1 Mac (avoiding blosc issues)
# ...spin up cluster, connect a Dask Client, etc.

import dask.datasets

# Run the write from inside a submitted task via client.submit()
# (works around blosc compression issues on M1 Macs).
def submit_jobs():
    from distributed import get_client

    with get_client() as client:
        # 15 years of synthetic 10-second records, one partition per month
        large = dask.datasets.timeseries(
            start="2000", end="2015", freq="10s", partition_freq="1M"
        )
        large.to_parquet(
            "s3://coiled-datasets/dask-merge/large.parquet",
            engine="fastparquet",
        )

client.submit(submit_jobs).result()
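
The gist elides the "spin up cluster" step. As a minimal sketch of what could precede the snippet (an assumption, not part of the original: a bare distributed.Client starting a local cluster stands in for whatever cluster you actually use, e.g. a Coiled cluster):

# Hypothetical setup for the elided cluster step — any Dask cluster
# works; Client() with no arguments starts a local one.
from distributed import Client

client = Client()

The point of wrapping the work in submit_jobs and running it through client.submit() is that the graph is built and the Parquet write is driven from a worker process rather than the local M1 machine, which is presumably how it sidesteps the blosc issues named in the description.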