Created October 31, 2023 23:54
Analyzing Missing Data in Pandas DataFrame: A Practical Example
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Draw values from a standard normal distribution (mean 0, variance 1)
data = rng.standard_normal((127, 5))

# Build a mask of 0 and NaN; adding NaN to a cell turns it into NaN,
# so each cell has a 30% chance of becoming missing
missing = rng.choice([0, np.nan], p=[0.7, 0.3], size=data.shape)
data += missing

# Create the DataFrame
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3', 'col4', 'col5'])

# Print the share of missing values in each column
for col in df.columns:
    template = f"Column '{col}' has {np.isnan(df[col]).mean():.2%} missing values"
    print(template)
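
The per-column loop above can also be written with pandas' own missing-value helpers. The following is a minimal sketch, not part of the original gist: it rebuilds the same kind of DataFrame so that it runs on its own, and the name `missing_share` is an illustrative choice. `DataFrame.isna()` returns a boolean frame, and taking `.mean()` of it gives the fraction of missing cells in every column at once.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
data = rng.standard_normal((127, 5))
# Same masking approach as above: adding NaN makes a cell missing (about 30% of cells)
data += rng.choice([0, np.nan], p=[0.7, 0.3], size=data.shape)
df = pd.DataFrame(data, columns=[f"col{i}" for i in range(1, 6)])

# Fraction of missing values per column, computed for all columns in one call
missing_share = df.isna().mean()  # hypothetical variable name, for illustration

for col, share in missing_share.items():
    print(f"Column '{col}' has {share:.2%} missing values")
```

Because the mask is random, the observed share in each column will be close to 30% rather than exactly 30%. Keeping the result as a Series also makes it easy to sort or filter, for example `missing_share.sort_values(ascending=False)`.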