@karpanGit
Created November 3, 2019 19:23
pandas: timing different aggregation implementations
# Time built-in and custom-written aggregation functions
import pandas as pd
import numpy as np

N = 1000000
# One numeric column and a two-valued grouping key (half 'a', half 'b')
df = pd.DataFrame({'a': np.random.randn(N), 'key1': ['a'] * (N // 2) + ['b'] * (N // 2)})

def aggrTest1():
    # built-in groupby sum
    res = df.groupby('key1').sum()

def aggrTest2():
    # custom lambda that calls the Series sum method
    res = df.groupby('key1').agg(lambda x: x.sum())

def aggrTest3():
    # custom lambda that converts each group to a list and uses Python's sum
    res = df.groupby('key1').agg(lambda x: sum(x.to_list()))

# Timings measured in IPython:
# %timeit aggrTest1()
# 44.3 ms ± 1.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# %timeit aggrTest2()
# 72.6 ms ± 2.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# %timeit aggrTest3()
# 101 ms ± 6.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
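
The timings above rely on IPython's %timeit magic. A rough sketch of the same comparison with the standard-library timeit module might look like the following. The helper names (make_frame, builtin_sum, lambda_sum, python_sum, time_all), the fixed seed, and the smaller frame size are assumptions made for illustration and are not part of the original gist. The sketch also checks that the three variants agree before timing them.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n, seed=0):
    # Hypothetical helper: same layout as the gist's df, but seeded and sized by n.
    rng = np.random.default_rng(seed)
    half = n // 2
    return pd.DataFrame({
        'a': rng.standard_normal(2 * half),
        'key1': ['a'] * half + ['b'] * half,
    })


def builtin_sum(df):
    # Built-in groupby sum.
    return df.groupby('key1').sum()


def lambda_sum(df):
    # Custom lambda delegating to Series.sum.
    return df.groupby('key1').agg(lambda x: x.sum())


def python_sum(df):
    # Custom lambda using Python's built-in sum on a plain list.
    return df.groupby('key1').agg(lambda x: sum(x.to_list()))


def time_all(df, repeat=3, number=5):
    # Return the best per-call time in seconds for each variant.
    results = {}
    for fn in (builtin_sum, lambda_sum, python_sum):
        best = min(timeit.repeat(lambda: fn(df), repeat=repeat, number=number))
        results[fn.__name__] = best / number
    return results


df_small = make_frame(10_000)

# All three variants should produce the same per-group sums (up to float rounding).
reference = builtin_sum(df_small)['a'].to_numpy()
assert np.allclose(reference, lambda_sum(df_small)['a'].to_numpy())
assert np.allclose(reference, python_sum(df_small)['a'].to_numpy())

timings = time_all(df_small)
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

Absolute numbers will depend on the machine, the pandas version, and the frame size, so they are not expected to match the figures quoted above.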