Skip to content

Instantly share code, notes, and snippets.

@mzaradzki
Created July 3, 2017 10:08
Show Gist options
  • Save mzaradzki/d0ddeba8b15b26608b78a6c42b0674c5 to your computer and use it in GitHub Desktop.
Save mzaradzki/d0ddeba8b15b26608b78a6c42b0674c5 to your computer and use it in GitHub Desktop.
for col in numerical_columns: # Check frequency of most common values
cs = dfX[col].value_counts(normalize=True, sort=True, ascending=False)
for k in cs.keys()[0:5]:
print( col, k, int(cs[k]*1000)/10. )
# Output :
# amount_tsh 0.0 70.0 # 70% of the record are 0
# amount_tsh 500.0 5.2
# amount_tsh 50.0 4.1
# amount_tsh 1000.0 2.5
# amount_tsh 20.0 2.4
# population 0 36.0 # 36% of the record are 0
# population 1 11.8 # 12% of the record are 1
# population 200 3.2
# population 150 3.1
# population 250 2.8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment