@liannewriting
Last active January 23, 2020 16:31
data_cleaning_202001
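# Assumes df is a pandas DataFrame loaded earlier in the workflow.
num_rows = len(df.index)
low_information_cols = []  # columns where a single value dominates

for col in df.columns:
    # Frequency of each value (NaN included), most frequent first.
    cnts = df[col].value_counts(dropna=False)
    # Share of rows taken by the most frequent value.
    top_pct = (cnts / num_rows).iloc[0]

    # Flag columns where one value covers more than 95% of rows.
    if top_pct > 0.95:
        low_information_cols.append(col)
        print('{0}: {1:.5f}%'.format(col, top_pct * 100))
        print(cnts)
        print()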
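The check above can be wrapped in a reusable function. The following is a minimal sketch: the function name `find_low_information_cols`, the `threshold` parameter, and the toy DataFrame are all hypothetical illustrations, not part of the original gist. It also assumes that dropping the flagged columns is the desired next step, which may not hold for every dataset.

```python
import numpy as np
import pandas as pd


def find_low_information_cols(df: pd.DataFrame, threshold: float = 0.95) -> list:
    """Hypothetical helper: return columns whose most frequent value
    (NaN counted as a value) covers more than `threshold` of the rows."""
    num_rows = len(df.index)
    if num_rows == 0:
        return []  # no rows, so there is nothing to measure
    low_information_cols = []
    for col in df.columns:
        # value_counts sorts in descending order, so iloc[0] is the most frequent value.
        top_pct = df[col].value_counts(dropna=False).iloc[0] / num_rows
        if top_pct > threshold:
            low_information_cols.append(col)
    return low_information_cols


# Toy data (made up for illustration):
# 'constant' is always 'a', 'mostly_nan' is NaN in 39 of 40 rows,
# and 'varied' holds 40 distinct values.
df = pd.DataFrame({
    "constant": ["a"] * 40,
    "mostly_nan": [np.nan] * 39 + [1.0],
    "varied": range(40),
})

cols = find_low_information_cols(df)
print(cols)  # ['constant', 'mostly_nan']

# One possible follow-up: drop the flagged columns.
df_clean = df.drop(columns=cols)
print(list(df_clean.columns))  # ['varied']
```

Because `dropna=False` is passed, a column that is almost entirely missing is flagged just like a near-constant one. Calling `value_counts(dropna=False, normalize=True)` would produce the same proportions without dividing by `num_rows` manually.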