@SaraM92
Created September 29, 2020 19:36
#Import pandas to deal with datasets and NumPy for np.where / np.size
import numpy as np
import pandas as pd
#Dataset source
#https://www.kaggle.com/dipam7/student-grade-prediction
df = pd.read_csv('student-mat.csv')
df.head(3)
#grade_A is 1 if the student achieved 80% or higher as a final score
#(G3 is graded on a 0-20 scale, so G3*5 converts it to a percentage)
df['grade_A'] = np.where(df['G3']*5 >= 80, 1, 0)
#high_absenses is 1 if the student missed 10 or more classes
df['high_absenses'] = np.where(df['absences'] >= 10, 1, 0)
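#helper column of ones so each student can be counted in the pivot table
#(the raw dataset has no 'count' column, so it must be created first)
df['count'] = 1
#keep only the columns we care about
df = df[['grade_A','high_absenses','count']]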
df.head()
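#contingency table: rows are grade_A, columns are high_absenses,
#and each cell holds the number of students in that combination
pd.pivot_table(
    df,
    values='count',
    index=['grade_A'],
    columns=['high_absenses'],
    aggfunc=np.size,
    fill_value=0
)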
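
The same kind of contingency table can probably be built without the helper `count` column by using `pd.crosstab`. The sketch below is only an illustration under stated assumptions: `toy` is a hypothetical six-row frame standing in for the two flag columns, not data from `student-mat.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for the grade_A / high_absenses flags;
# these values are made up for illustration only.
toy = pd.DataFrame({
    'grade_A':       [0, 0, 0, 1, 1, 0],
    'high_absenses': [0, 1, 0, 0, 0, 1],
})

# crosstab counts the rows for each (grade_A, high_absenses) pair directly,
# so no column of ones is needed.
contingency = pd.crosstab(toy['grade_A'], toy['high_absenses'])
print(contingency)
```

On the real data, the equivalent call would be `pd.crosstab(df['grade_A'], df['high_absenses'])`.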