@mzaradzki
Last active July 3, 2017 12:08
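# Replace infrequent values in each categorical column with one "<column>_rare" label.
# Assumes dfX (a pandas DataFrame) and categories (a list of column names) are defined earlier.
for col in categories:
    # Absolute occurrence count of each distinct value, most frequent first
    cs = dfX[col].value_counts(normalize=False, sort=True, ascending=False)
    rare_values = cs[cs < 40].index  # Threshold = 40 occurrences
    if len(rare_values) > 0:
        print('Trim values : ', col, len(rare_values))
        dfX.loc[dfX[col].isin(rare_values), col] = col + '_rare'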
# Output :
# Trim values : funder 1730
# Trim values : installer 1982
# Trim values : wpt_name 37344
# Trim values : num_private 59
# Trim values : scheme_name 2565
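
The same trimming idea can be tried on a small, self-contained example. The sketch below is not part of the original gist: the function name `trim_rare_values`, the toy DataFrame, and the threshold of 2 are all made up for illustration. It works on a copy instead of modifying the frame in place, and it casts the column to `object` before writing the label, on the assumption that a numeric column (such as `num_private` above) may otherwise not accept a string value cleanly.

```python
import pandas as pd


def trim_rare_values(df, columns, threshold):
    """Return a copy of df in which values seen fewer than `threshold`
    times in each listed column are replaced by '<column>_rare'.

    Hypothetical helper, written only to illustrate the loop above.
    """
    out = df.copy()
    for col in columns:
        counts = out[col].value_counts()
        rare = counts[counts < threshold].index
        if len(rare) > 0:
            # Cast to object so a string label can also be stored in numeric columns
            out[col] = out[col].astype(object)
            out.loc[out[col].isin(rare), col] = col + '_rare'
    return out


# Toy data, invented for this example
toy = pd.DataFrame({'funder': ['gov', 'gov', 'gov', 'ngo', 'ngo', 'acme', 'zed']})
trimmed = trim_rare_values(toy, ['funder'], threshold=2)
print(trimmed['funder'].tolist())
```

Here `acme` and `zed` each occur once, so both fall under the threshold and are merged into `funder_rare`, while `gov` and `ngo` are kept.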