If your data fits in memory then you should almost always just use Pandas. Full groupby-applies like df.groupby(...).apply(func) are hard to do in parallel and require a full dataset shuffle. Dask (or any parallel library) should perform about as well as pandas for standard groupby-reductions like df.groupby(...).col.mean(), since those can be computed per-partition and then combined without a shuffle.
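A sketch of the distinction, using plain pandas on a small hypothetical frame (the same API shapes carry over to dask.dataframe): a named reduction can be computed per group-chunk and combined, while an arbitrary apply needs every row of a group in one place.

```python
import pandas as pd

# Small frame standing in for a larger dataset (hypothetical data).
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "val": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Groupby-reduction: a standard aggregation. A parallel engine can
# compute partial sums/counts per partition and combine them cheaply.
means = df.groupby("key")["val"].mean()

# Groupby-apply: an arbitrary function over each full group. A parallel
# engine must first shuffle all rows of each group onto one worker
# before func can run, which is why this pattern is expensive.
demeaned = df.groupby("key")["val"].apply(lambda s: s - s.mean())
```

Here `means` is `{a: 3.0, b: 3.0}`; `demeaned` subtracts each group's mean from its rows.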
If I understand this correctly, you are comparing a plain pandas groupby against converting the DataFrame from pandas to Dask and then doing the groupby there.
Is that really a fair test, given the conversion overhead?