@decorouz
Created October 31, 2023 23:54
Analyzing Missing Data in Pandas DataFrame: A Practical Example
import numpy as np
import pandas as pd
rng = np.random.default_rng(1)
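# Draw values from the standard normal distribution (mean 0, variance 1)
data = rng.standard_normal((127, 5))
# Mask: each cell is 0 with probability 0.7 and NaN with probability 0.3,
# so roughly 30% of the values will be missing
missing = rng.choice([0, np.nan], p=[0.7, 0.3], size=data.shape)
data += missing  # x + 0 leaves x unchanged, while x + NaN gives NaN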
# Create the DataFrame
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3', 'col4', 'col5'])
# Print the share of missing values in each column
for col in df.columns:
    template = f"Column '{col}' has {df[col].isna().mean():.2%} missing values"
    print(template)
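The loop above reports one percentage per column. A minimal sketch of a more compact summary follows. It rebuilds the same data with the same seed so it runs on its own, then collects per-column counts and percentages into one table using pandas' `DataFrame.isna()`, `any()` and `dropna()`. It also looks at missingness by row. The names `summary`, `rows_with_missing`, `expected` and `complete` are illustrative and not part of the original gist. The `1 - 0.7**5` figure assumes every cell is masked independently with probability 0.3, which matches how `rng.choice` draws the mask here.

```python
import numpy as np
import pandas as pd

# Rebuild the example data: 127 x 5 standard-normal values, with each
# cell independently set to NaN with probability 0.3 (same seed as above).
rng = np.random.default_rng(1)
data = rng.standard_normal((127, 5))
data += rng.choice([0, np.nan], p=[0.7, 0.3], size=data.shape)
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3', 'col4', 'col5'])

# isna() returns a boolean DataFrame: summing counts the missing cells
# per column, and the mean gives the fraction of missing cells.
summary = pd.DataFrame({
    'n_missing': df.isna().sum(),
    'pct_missing': df.isna().mean() * 100,
})
print(summary.round(2))

# Row-level view: share of rows with at least one missing value.
rows_with_missing = df.isna().any(axis=1).mean()
# Assuming independent masking at p = 0.3 across 5 columns, a row is
# complete with probability 0.7**5, so the expected share is 1 - 0.7**5.
expected = 1 - 0.7 ** 5
print(f"Rows with any missing value: {rows_with_missing:.2%} "
      f"(expected about {expected:.2%})")

# Complete cases: rows with no missing values at all.
complete = df.dropna()
print(f"Complete rows: {len(complete)} of {len(df)}")
```

With five columns each about 30% missing, only around 17% of rows are expected to be complete. Dropping every incomplete row (`dropna()`) would therefore discard most of this dataset, which is worth checking before choosing between deletion and imputation.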