Miriam Seoane Santos miriamspsantos

miriamspsantos / adult_profiling_report.py
Created May 25, 2023 11:30
Profiling Report for the Adult Census Income Dataset (Medium)
# Make the necessary imports
import pandas as pd
from ydata_profiling import ProfileReport
# Load the data
df = pd.read_csv('data/adult.csv', na_values='?')
# Generate the report
profile = ProfileReport(df, title="Adult Census Profile")
miriamspsantos / adult_categories.py
Created May 25, 2023 11:15
Adult Dataset: Number and frequency of existing categories (Medium).
# Categorical columns of the Adult dataset
cat_cols = ['workclass', 'education', 'education.num',
            'marital.status', 'occupation', 'relationship', 'race',
            'sex', 'native.country', 'income']
# Print the number of rows in each category, column by column
for col in cat_cols:
    categories = df.groupby(col).size()
    print(categories)
#workclass
#Federal-gov 960
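
The per-category counts can also be reported together with relative frequencies. The following is a minimal sketch, not part of the gist: it assumes a small hypothetical DataFrame (`toy`) in place of the Adult data, and uses `value_counts`, which, like `groupby(col).size()`, skips missing values by default.

```python
import pandas as pd

# Hypothetical toy data standing in for df; the values are illustrative only.
toy = pd.DataFrame({
    "workclass": ["Private", "Private", "State-gov", "Private"],
    "sex": ["Male", "Female", "Female", "Male"],
})

summaries = {}
for col in toy.columns:
    counts = toy[col].value_counts()               # absolute count per category
    freqs = toy[col].value_counts(normalize=True)  # share of non-missing rows
    # Both Series are indexed by category, so they align row by row.
    summaries[col] = pd.DataFrame({"count": counts, "frequency": freqs})
    print(summaries[col])
```

Keeping the counts and frequencies side by side makes rare categories easier to spot than raw counts alone.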
miriamspsantos / adult_dataset_overview.py
Last active May 25, 2023 11:13
Dataset Overview of the Adult Census Dataset (Medium).
import pandas as pd
# Load the data
df = pd.read_csv('data/adult.csv', na_values='?')
# Dataset Overview
df.head() # preview a sample
df.shape # number of observations and features
# (32561, 15)
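
To illustrate what `na_values='?'` does when loading, here is a small sketch under stated assumptions: the three-row CSV below is hypothetical and only mimics the column style of the dataset, and `sample` and `missing_per_column` are names introduced for this example.

```python
import io
import pandas as pd

# Hypothetical three-row CSV; the real file is data/adult.csv.
raw = io.StringIO(
    "age,workclass,occupation\n"
    "39,State-gov,Adm-clerical\n"
    "54,?,?\n"
    "28,Private,?\n"
)

# '?' entries are parsed as NaN instead of being kept as literal strings.
sample = pd.read_csv(raw, na_values='?')

missing_per_column = sample.isna().sum()  # number of NaN values per column
print(sample.shape)
print(missing_per_column)
```

Without `na_values='?'`, the question marks would be treated as an ordinary category and would not show up in missing-value counts.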