@ricgu8086
Created December 4, 2021 19:57
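import pandas as pd


def group_others(serie: pd.Series,
                 min_threshold: int) -> pd.Series:
    """
    Find categorical values with little representation (fewer than
    `min_threshold` occurrences) and group them under the category
    "OTHERS". Fewer distinct categories mitigate the curse of
    dimensionality (e.g. fewer one-hot columns), which helps reduce
    the risk of overfitting.
    """
    # Count occurrences of each category once and reuse the result
    counts = serie.value_counts()
    # Categories that appear fewer than min_threshold times
    other_group = list(counts[counts < min_threshold].index)
    # Relabel every rare category as "OTHERS"
    return serie.replace(other_group, "OTHERS")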
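A minimal usage sketch, assuming pandas is installed. The toy `colors` Series and the threshold of 2 are hypothetical illustration data. The function body is repeated so the snippet runs on its own; it shows how grouping rare categories shrinks the number of one-hot columns that `pd.get_dummies` produces, which is the dimensionality reduction the docstring refers to.

```python
import pandas as pd


def group_others(serie: pd.Series, min_threshold: int) -> pd.Series:
    # Same logic as the gist's function: categories seen fewer than
    # min_threshold times are relabelled "OTHERS"
    counts = serie.value_counts()
    other_group = list(counts[counts < min_threshold].index)
    return serie.replace(other_group, "OTHERS")


# Hypothetical toy data: "red" and "blue" are frequent, the rest are rare
colors = pd.Series(["red", "red", "red", "blue", "blue", "green", "black"])

grouped = group_others(colors, min_threshold=2)
print(list(grouped))  # ['red', 'red', 'red', 'blue', 'blue', 'OTHERS', 'OTHERS']

# One-hot encoding: the number of dummy columns shrinks after grouping
print(pd.get_dummies(colors).shape[1])   # 4 (black, blue, green, red)
print(pd.get_dummies(grouped).shape[1])  # 3 (OTHERS, blue, red)
```

Note that "green" and "black" each occur once, so with `min_threshold=2` both fall below the threshold and collapse into a single "OTHERS" column.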