@liannewriting
Last active January 27, 2020 16:33
data_cleaning_202001
import numpy as np

# assumes `df` is an existing pandas DataFrame
# first create a missing indicator for each feature with missing data
for col in df.columns:
    missing = df[col].isnull()
    num_missing = np.sum(missing)

    if num_missing > 0:
        print('created missing indicator for: {}'.format(col))
        df['{}_ismissing'.format(col)] = missing

# then, based on the indicators, count the missing values in each row
ismissing_cols = [col for col in df.columns if 'ismissing' in col]
df['num_missing'] = df[ismissing_cols].sum(axis=1)

# plot how many rows have 0, 1, 2, ... missing values;
# sort_index() orders the bars by that count and avoids relying on the
# column names produced by reset_index(), which changed in pandas 2.0
df['num_missing'].value_counts().sort_index().plot.bar()
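
The snippet above expects a DataFrame named df to exist already. As a rough, self-contained sketch of the same idea, the example below runs the indicator and row-count steps on a small made-up table. The column names (price, rooms, city) and the helper add_missing_indicators are hypothetical and only serve the illustration; the plot is left out so the result can be inspected as plain numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
toy = pd.DataFrame({
    "price": [100.0, np.nan, 250.0, np.nan],
    "rooms": [2.0, 3.0, np.nan, np.nan],
    "city": ["A", "B", "C", "D"],
})


def add_missing_indicators(frame):
    """Return a copy with <col>_ismissing flags and a per-row num_missing count."""
    out = frame.copy()
    for col in frame.columns:
        missing = frame[col].isnull()
        # Only columns that actually contain missing values get an indicator.
        if missing.any():
            out["{}_ismissing".format(col)] = missing
    ismissing_cols = [c for c in out.columns if c.endswith("_ismissing")]
    # Summing boolean flags across columns counts the missing values per row.
    out["num_missing"] = out[ismissing_cols].sum(axis=1)
    return out


flagged = add_missing_indicators(toy)

# Number of rows for each count of missing values, ordered by that count.
counts = flagged["num_missing"].value_counts().sort_index()
print(counts)
```

Calling counts.plot.bar() on this result should give the same kind of bar chart as the last line of the snippet, provided matplotlib is installed.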