@EdMan1022
Created September 1, 2017 21:24
Create a DataFrame with null-indicator columns that mark, for each column, the indices where its values are null
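import numpy as np
import pandas as pd

# Example frame with one missing value in 'apple' and one in 'pear'.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
df = pd.DataFrame(data=[[3., 1., np.nan], [np.nan, 3., 2.], [4., 1., 3.]], index=[0, 1, 2],
                  columns=['apple', 'carrot', 'pear'])


def null_only_categorical_func(data):
    """
    Take a series, and return values of 1. where the series is null
    :param data: (pandas Series) input column
    :return: (pandas Series) column of 1s for indices where data is null, 0s elsewhere
    """
    cat_column = pd.Series(0., index=data.index)
    cat_column[data.isnull()] = 1.
    return cat_column


# Build one indicator column per original column.
cat_df = df.apply(null_only_categorical_func, axis=0)

# Drop indicator columns for columns that never had a null.
trimmed_cat_df = cat_df.drop(cat_df.columns[cat_df.sum() == 0.], axis=1)

# Mark indicator columns with a trailing '=' so they sort next to their source column.
trimmed_cat_df.columns = trimmed_cat_df.columns + '='

# Fill nulls with 0. and place each indicator column beside its source column.
output = pd.concat([df.fillna(0.), trimmed_cat_df], axis=1).sort_index(axis=1)
print(output)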
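
For comparison, the same result can probably be reached without a per-column apply. The sketch below is not part of the original gist: the function name add_null_indicators and its suffix parameter are hypothetical, and it assumes string column labels, as in the example frame. It builds the boolean null mask once with isnull(), keeps only the columns that contain at least one null, and converts the mask to 0./1. floats.

```python
import numpy as np
import pandas as pd


def add_null_indicators(frame, suffix="="):
    """Hypothetical helper: fill nulls with 0. and add a 0./1. indicator
    column for every column that contained at least one null."""
    mask = frame.isnull()
    # Keep only columns that actually had a null somewhere.
    mask = mask.loc[:, mask.any(axis=0)]
    # Convert booleans to floats and tag the column names with the suffix.
    indicators = mask.astype(float).add_suffix(suffix)
    # Sorting the columns places each indicator next to its source column.
    return pd.concat([frame.fillna(0.), indicators], axis=1).sort_index(axis=1)


df = pd.DataFrame(
    data=[[3., 1., np.nan], [np.nan, 3., 2.], [4., 1., 3.]],
    index=[0, 1, 2],
    columns=['apple', 'carrot', 'pear'],
)
result = add_null_indicators(df)
print(result)
```

Working on the mask directly avoids allocating a fresh Series per column, which may matter on wide frames; for a frame this small the two approaches should behave the same.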