Last active May 25, 2023 11:13
Dataset Overview of the Adult Census Dataset (Medium).
import pandas as pd

# Load the data, treating '?' as a missing value
df = pd.read_csv('data/adult.csv', na_values='?')

# Dataset Overview
df.head()  # preview a sample
df.shape  # number of observations and features
# (32561, 15)
df.dtypes  # data types
# age          int64
# workclass    object
# fnlwgt       int64
# education    object
# (...)
df[df.duplicated()]  # check duplicated rows
df.isna().sum()  # missing values per feature
# age          0
# workclass    1836
# fnlwgt       0
# (...)
df.isna().sum().sum()  # number of missing cells
round(df.isna().sum().sum() / df.size * 100, 1)  # percentage of missing cells
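
The checks above can also be gathered into one reusable helper. The following is a minimal sketch, not part of the original gist: the `overview` function, the `TOY_CSV` sample and its values are hypothetical, chosen so the snippet runs without `data/adult.csv`.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for data/adult.csv (not the real dataset).
# '?' marks a missing value, as in the loading step above.
TOY_CSV = """age,workclass,education
39,State-gov,Bachelors
50,?,Bachelors
38,Private,HS-grad
38,Private,HS-grad
28,?,?
"""


def overview(df: pd.DataFrame) -> dict:
    """Collect the overview figures into a single dictionary."""
    missing_cells = int(df.isna().sum().sum())
    return {
        "shape": df.shape,                                # (rows, columns)
        "duplicated_rows": int(df.duplicated().sum()),    # repeats of an earlier row
        "missing_per_feature": df.isna().sum().to_dict(),
        "missing_cells": missing_cells,
        "missing_pct": round(missing_cells / df.size * 100, 1),
    }


toy = pd.read_csv(io.StringIO(TOY_CSV), na_values="?")
summary = overview(toy)
print(summary)
```

Passing the real `df` to `overview` instead of `toy` should give the same figures as the individual calls, with the duplicates reported as a count rather than as the duplicated rows themselves.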