@BalazsHoranyi
Created May 14, 2018 20:37
import dask.array as da
from dask import compute


def to_dask_array(df):
    """Convert a dask DataFrame into a dask array with known chunk shapes."""
    # https://stackoverflow.com/questions/37444943/dask-array-from-dataframe?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
    partitions = df.to_delayed()
    shapes = [part.values.shape for part in partitions]
    # Use the dtype of the NumPy block; DataFrame.dtypes is a per-column Series.
    dtype = partitions[0].values.dtype
    results = compute(dtype, *shapes)  # trigger computation to find the shapes
    dtype, shapes = results[0], results[1:]
    chunks = [da.from_delayed(part.values, shape, dtype)
              for part, shape in zip(partitions, shapes)]
    return da.concatenate(chunks, axis=0)


# df is assumed to be an existing dask DataFrame containing these columns.
interactions = to_dask_array(df[['user_id', 'repo_id', 'created_at']])
da.to_npy_stack('interactions', interactions)