@nagomiso
Created February 10, 2023 03:56
Speed up loops of DataFrame.groupby()
# base
for _, g in df.groupby("foo_column"):
    do_something(g)
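# fast implementation
import numpy as np

col2idx = {col: idx for idx, col in enumerate(df.columns)}
array = df.to_numpy()
# Sort rows by the key first: np.split only yields correct groups when equal keys are contiguous.
array = array[np.argsort(array[:, col2idx["foo_column"]], kind="stable")]
# return_index gives the first row of each key in the sorted array, i.e. the group boundaries.
_, indices = np.unique(array[:, col2idx["foo_column"]], return_index=True)
grouped_array = np.split(array, indices[1:])
for g in grouped_array:
    # :warning: g is an ndarray, not a DataFrame
    do_something(g)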
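
A minimal, self-contained sketch of the split approach above, checked against `groupby` on a toy frame. The helper name `split_by_column` and the sample data are illustrative assumptions, not part of the original gist. The sketch sorts by the key before splitting, on the assumption that the input rows may not already be ordered by `foo_column`.

```python
import numpy as np
import pandas as pd


def split_by_column(df, key):
    """Return a list of ndarrays, one per distinct value of df[key].

    Hypothetical helper for illustration; assumes `key` is a unique column name.
    """
    key_idx = df.columns.get_loc(key)
    array = df.to_numpy()
    # A stable sort keeps the original row order within each group.
    array = array[np.argsort(array[:, key_idx], kind="stable")]
    # The first index of each distinct key in the sorted array marks a group boundary.
    _, starts = np.unique(array[:, key_idx], return_index=True)
    return np.split(array, starts[1:])


# Toy frame with the keys deliberately out of order.
df = pd.DataFrame({"foo_column": [2, 1, 2, 1, 3], "value": [10, 20, 30, 40, 50]})

groups = split_by_column(df, "foo_column")
# Reference result from the plain groupby loop (groups come back sorted by key).
expected = [g.to_numpy() for _, g in df.groupby("foo_column")]
```

Note that `df.to_numpy()` on a frame with mixed column dtypes produces an `object` array, which can erase much of the speed advantage, so the approach fits best when the columns share a numeric dtype.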