Created
February 10, 2023 03:56
Speed up loops of DataFrame.groupby()
# base
for _, g in df.groupby("foo_column"):
    do_something(g)
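# fast implementation
import numpy as np

col2idx = {col: idx for idx, col in enumerate(df.columns)}
array = df.to_numpy()
# np.split only yields one chunk per group when equal keys are contiguous,
# so sort the rows by the key column first (stable sort keeps in-group order).
order = np.argsort(array[:, col2idx["foo_column"]], kind="stable")
array = array[order]
# index of the first row of each group in the sorted array
_, indices = np.unique(array[:, col2idx["foo_column"]], return_index=True)
grouped_array = np.split(array, indices[1:])
for g in grouped_array:
    # :warning: g is an ndarray, not a DataFrame
    do_something(g)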
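
As a quick sanity check, the sketch below wraps the NumPy approach in a helper and compares its chunks with what `df.groupby()` yields on a tiny frame. The helper name `split_by_column` and the sample data are made up for illustration; the check assumes the key column is sortable and has no missing values.

```python
import numpy as np
import pandas as pd


def split_by_column(df, column):
    """Return a list with one ndarray per distinct value of `column`."""
    array = df.to_numpy()
    col_idx = df.columns.get_loc(column)  # assumes unique column names
    # Stable sort so equal keys become contiguous and keep their original order.
    order = np.argsort(array[:, col_idx], kind="stable")
    array = array[order]
    # First-row index of each distinct key in the sorted array.
    _, starts = np.unique(array[:, col_idx], return_index=True)
    return np.split(array, starts[1:])


# Hypothetical sample data with an unsorted key column.
df = pd.DataFrame({"foo_column": [2, 1, 2, 1, 3], "value": [10, 20, 30, 40, 50]})

chunks = split_by_column(df, "foo_column")
expected = [g.to_numpy() for _, g in df.groupby("foo_column")]

for chunk, ref in zip(chunks, expected):
    assert np.array_equal(chunk, ref)
```

The chunks come out in ascending key order, which matches the default `sort=True` behaviour of `groupby()`. Since each chunk is a plain ndarray, columns have to be addressed by position rather than by name.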