Skip to content

Instantly share code, notes, and snippets.

@CMCDragonkai
Last active October 25, 2019 06:02
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save CMCDragonkai/80c6a01bdae85709afb2178cc43d07a7 to your computer and use it in GitHub Desktop.
Save CMCDragonkai/80c6a01bdae85709afb2178cc43d07a7 to your computer and use it in GitHub Desktop.
Dask Ideas #python #dask

Dask Ideas

  • dd.Series is linearly partitioned into smaller pd.Series
  • dd.Series.apply and dd.Series.map are the same and executes a function on each element
  • dd.Series.map_partition maps a function to partitions of the series
  • dd.Series.reduction chunks each partition, then aggregates each partition
  • pd.Series is backed by a Numpy array with additional metadata

Series Reduction

When chunk returns a pd.Series. The aggregate will receive a pd.DataFrame. This DataFrame will have columns equal to the indexes of the Series structure. Visualize this as a transposition of the Series as a column vector to a row vector of the DataFrame. Remember this still happens if your chunk function is just the identity function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment