@MarcoGorelli
Last active June 17, 2024 08:48
add_lags timing
import polars as pl
import numpy as np
from sklego.pandas_utils import add_lags

# Seeded generator so the benchmark data is reproducible
rng = np.random.default_rng(1)
N = 10_000_000  # number of rows

# Three integer columns with values in [0, 10)
a = rng.integers(0, 10, size=N)
b = rng.integers(0, 10, size=N)
c = rng.integers(0, 10, size=N)
df = pl.DataFrame({'a': a, 'b': b, 'c': c})
In [2]: results = %timeit -o add_lags(df, ['a', 'b', 'c'], [1,2,3,4,5])
   ...: results.best
10.8 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Out[2]: 0.009629685399995652
In [3]: results = %timeit -o add_lags(df.to_pandas(), ['a', 'b', 'c'], [1,2,3,4,5])
   ...: results.best
1.47 s ± 70.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Out[3]: 1.375590296000155
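
To put the two timings side by side, the ratio can be worked out directly from the numbers reported above. This is only a small sketch of the arithmetic: the variable names are made up for illustration, and the values are copied from the `%timeit` output in this session, so they apply to this machine and this run only. Note that the pandas call also includes the cost of `df.to_pandas()` inside the timed statement.

```python
# Timings copied from the session above (seconds); names are illustrative only.
polars_best = 0.009629685399995652   # Out[2]: results.best for the Polars frame
pandas_best = 1.375590296000155      # Out[3]: results.best for the pandas frame

polars_mean = 10.8e-3                # 10.8 ms mean per loop
pandas_mean = 1.47                   # 1.47 s mean per loop

# Ratio of pandas time to Polars time, for best and mean runs
best_ratio = pandas_best / polars_best
mean_ratio = pandas_mean / polars_mean

print(f"best-run ratio: {best_ratio:.1f}x")
print(f"mean-run ratio: {mean_ratio:.1f}x")
```

On these figures the Polars path comes out roughly 140 times faster, by either measure.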