
@Susensio
Created December 19, 2022 18:13
Pandas groupby: get the most frequent value in each group
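import numpy as np
import pandas as pd

# groupby_column: column to group by; null_column: column (possibly containing NaN)
# whose most frequent value is wanted in each group.
# Straightforward way: take the mode of each group in a Python lambda. This is slow
# because the lambda runs once per group.
df.groupby(groupby_column)[null_column].agg(lambda x: x.mode().iat[0])
# If a group can be entirely NaN, mode() returns an empty Series and .iat[0] raises
# IndexError, so return NaN for those groups instead:
df.groupby(groupby_column)[null_column].agg(lambda x: x.mode().iat[0] if not x.isnull().all() else np.nan)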
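# Faster way: count every (group, value) pair in one vectorized value_counts call.
# With sort=True the counts are in descending order within each group, so the first
# row of each group holds its most frequent value. dropna=False counts NaN as a value,
# so a group whose most common entry is NaN yields NaN.
(df
.groupby(groupby_column)[null_column]
.value_counts(sort=True, dropna=False)
.reset_index(name='Counts')  # columns: groupby_column, null_column, Counts
.drop_duplicates(subset=groupby_column, keep='first')  # keep the top row per group
.set_index(groupby_column)
.drop(columns='Counts')
.squeeze()  # one-column DataFrame -> Series (a scalar if there is only one group)
)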
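A self-contained sketch that wraps both approaches so their NaN handling can be compared on toy data. The helper names `most_frequent_slow` / `most_frequent_fast`, the columns `store` / `item`, the `dropna` switch and the final `reindex` are illustrative assumptions, not part of the original snippet. With `dropna=True` NaN is skipped (matching the mode-based lambda), and the `reindex` restores all-NaN groups, which would otherwise drop out of the value_counts result.

```python
import numpy as np
import pandas as pd


def most_frequent_slow(df, by, col):
    """Hypothetical helper: per-group mode via a Python lambda (NaN skipped unless the group is all NaN)."""
    return df.groupby(by)[col].agg(
        lambda x: x.mode().iat[0] if not x.isnull().all() else np.nan
    )


def most_frequent_fast(df, by, col, dropna=False):
    """Hypothetical helper: per-group most frequent value via one vectorized value_counts pass."""
    counts = (
        df.groupby(by)[col]
        .value_counts(sort=True, dropna=dropna)  # descending counts within each group
        .reset_index(name="Counts")              # columns: by, col, Counts
    )
    # The first row of each group holds its most frequent value.
    top = counts.drop_duplicates(subset=by, keep="first").set_index(by)[col]
    # With dropna=True, all-NaN groups have no rows; reindex restores them as NaN.
    return top.reindex(df.groupby(by).size().index)


# Toy data (hypothetical): in group "b" NaN outnumbers "z"; group "c" is all NaN.
df = pd.DataFrame({
    "store": ["a", "a", "a", "b", "b", "b", "c", "c"],
    "item": ["x", "x", "y", "z", np.nan, np.nan, np.nan, np.nan],
})

slow = most_frequent_slow(df, "store", "item")
fast_keep_nan = most_frequent_fast(df, "store", "item")               # NaN counted as a value
fast_skip_nan = most_frequent_fast(df, "store", "item", dropna=True)  # NaN ignored
print(pd.DataFrame({"slow": slow, "fast_keep_nan": fast_keep_nan, "fast_skip_nan": fast_skip_nan}))
```

Selecting `[col]` instead of `.drop(columns='Counts').squeeze()` keeps the result a Series even when there is only one group.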