@patrickbrus
Created May 28, 2021 13:51
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# read in data from csv file
df = pd.read_csv(r"data\healthcare-dataset-stroke-data.csv")
print(df.head()) # helpful as first dive into data and features
# print summary statistics of the numerical columns (outside a notebook, a bare df.describe() shows nothing)
print(df.describe())
# call df.info to get data types and count of null values per column
df.info()
# check the number of unique values per column; a column with only one unique value carries no information and can be dropped
for column in df.columns:
    n_unique = df[column].nunique(dropna=False)  # counts NaN as a value, like unique()
    print(f"Column {column} contains {n_unique} unique values. {100 * n_unique / df[column].shape[0]}% of total data.\n")
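
# A minimal sketch of the drop step that the comment above mentions but the
# loop itself does not perform. drop_constant_columns is a hypothetical helper
# name (an assumption, not part of the original gist). It treats NaN as a
# value, so a column that is entirely NaN also counts as constant.
import pandas as pd


def drop_constant_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame without columns that hold a single unique value."""
    # number of distinct values per column, with NaN counted as its own value
    unique_counts = frame.nunique(dropna=False)
    # keep only the columns that have more than one distinct value
    return frame.loc[:, unique_counts > 1].copy()


# Possible usage on the DataFrame loaded above:
#   df = drop_constant_columns(df)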