Quick comparison between `pandas` and `dask` groupby functionality.
mrocklin commented Aug 3, 2015

Nice comparison.

If your data fits in memory then you should almost always just use Pandas. Full groupby-applies like `df.groupby(...).apply(func)` are hard to do in parallel and require a full dataset shuffle. Dask (or any parallel library) should perform well on groupby-reductions, at least for standard reductions like `df.groupby(...).col.mean()`.
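
To make the distinction concrete, here is a rough sketch in plain Python (not dask's actual implementation) of why a reduction like a grouped mean parallelizes easily: each partition can be summarized independently into per-key sums and counts, and only those small summaries need to be merged. The `partial_sums` and `combine` helpers and the sample `partitions` are hypothetical names and data made up for illustration.

```python
from collections import defaultdict


def partial_sums(partition):
    """Per-partition step: map each key to [sum, count] using only local rows."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in partition:
        acc[key][0] += value
        acc[key][1] += 1
    return dict(acc)


def combine(partials):
    """Merge the small per-partition summaries, then finish the mean per key."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, n) in part.items():
            total[key][0] += s
            total[key][1] += n
    return {key: s / n for key, (s, n) in total.items()}


# Hypothetical data: (key, value) rows split across two partitions.
partitions = [
    [("a", 1.0), ("b", 2.0), ("a", 3.0)],
    [("b", 6.0), ("a", 5.0)],
]

# Each partial_sums call could run on a different worker; no rows move between them.
group_means = combine(partial_sums(p) for p in partitions)
print(group_means)
```

An arbitrary `apply(func)` has no such decomposition in general: `func` needs to see every row of a group at once, so rows sharing a key must first be moved onto the same partition, which is the shuffle.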

spott commented Feb 8, 2018

If I understand this correctly, you are comparing a pandas groupby against dask converting the data from pandas and then doing a groupby.

Is this really a fair test?

