@mzaradzki
Last active July 3, 2017 12:08
# Replace values seen fewer than 40 times in each categorical column with '<col>_rare'
for col in categories:
    cs = dfX[col].value_counts(normalize=False, sort=True, ascending=False)
    rare_values = [k for k in cs.keys() if cs[k] < 40]  # Threshold = 40 occurrences
    if len(rare_values) > 0:
        print('Trim values : ', col, len(rare_values))
        dfX.loc[dfX[col].isin(rare_values), col] = col + '_rare'
# Output :
# Trim values : funder 1730
# Trim values : installer 1982
# Trim values : wpt_name 37344
# Trim values : num_private 59
# Trim values : scheme_name 2565
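
The snippet above depends on `dfX` and `categories` being defined elsewhere. As a minimal, self-contained sketch of the same rare-value trimming, the version below wraps the loop in a helper and runs it on a toy frame. The function name `trim_rare_values`, the `threshold` parameter, and the toy data are assumptions made for illustration and are not part of the original gist.

```python
import pandas as pd


def trim_rare_values(df, columns, threshold=40):
    """Return a copy of df where values seen fewer than `threshold` times
    in each listed column are replaced by '<col>_rare', plus a dict that
    maps each trimmed column to its number of distinct rare values."""
    out = df.copy()
    trimmed = {}
    for col in columns:
        counts = out[col].value_counts()
        rare = counts[counts < threshold].index  # labels below the threshold
        if len(rare) > 0:
            trimmed[col] = len(rare)
            out.loc[out[col].isin(rare), col] = col + "_rare"
    return out, trimmed


# Hypothetical toy data: 'a' is frequent, 'b' and 'c' are rare at threshold=3.
toy = pd.DataFrame({"funder": ["a"] * 5 + ["b"] * 2 + ["c"]})
result, trimmed = trim_rare_values(toy, ["funder"], threshold=3)
print(trimmed)
print(result["funder"].value_counts().to_dict())
```

Working on a copy leaves the input frame untouched, which makes it easier to compare thresholds. The toy column holds strings; a numeric column such as `num_private` may need a cast to `object` (or `str`) first, since writing a string label into a numeric column can trigger dtype warnings in recent pandas versions.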