@sananand007
Last active May 31, 2018 15:41
Kaggle_titanic_dataset [Medium-3]
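import pandas as pd
from sklearn.model_selection import train_test_split

# Bin Fare and Age into quantile-based classes using pd.qcut()
# Note: train and test are binned independently, so their bin edges may differ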
df_train_filt2['Fareclass']=pd.qcut(df_train_filt2['Fare'], 4, labels=[1,2,3,4])
df_train_filt2['Ageclass']=pd.qcut(df_train_filt2['Age'], 5, labels=[1,2,3,4,5])
df_test_filt1['Fareclass']=pd.qcut(df_test_filt1['Fare'], 4, labels=[1,2,3,4])
df_test_filt1['Ageclass']=pd.qcut(df_test_filt1['Age'], 5, labels=[1,2,3,4,5])
df_train_filt2.drop(['Fare','Age'], axis=1, inplace=True)
df_test_filt1.drop(['Fare','Age'], axis=1, inplace=True)
# One-hot encode the categorical columns, since the model cannot train on string values
df_train_filt2=pd.get_dummies(df_train_filt2,
    columns=['Sex','Embarked','Pclass','Ageclass','Fareclass'], drop_first=True)
df_test_filt1=pd.get_dummies(df_test_filt1,
    columns=['Sex','Embarked','Pclass','Ageclass','Fareclass'], drop_first=True)
# Get the Features and Labels
y=df_train_filt2['Survived']
X=df_train_filt2.iloc[:,2:] # all columns except the first two (presumably PassengerId and Survived)
Passenger_id = df_test_filt1['PassengerId']
df_test_filt1.drop(labels=['PassengerId'], inplace=True, axis=1)
# Split the labelled training data 80/20 into a training part and a held-out part
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=.20, random_state=1)
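
The snippet above bins and encodes the train and test frames independently, so the two can end up with different bin edges or different dummy columns. A minimal sketch of one way to keep them consistent follows, assuming the bin edges and category lists should come from the training data only. The frames `train` and `test` and their values are toy stand-ins, not the gist's data, and only `Fare` and `Embarked` are shown.

```python
import pandas as pd

# Toy stand-ins for the train and test frames (hypothetical values).
train = pd.DataFrame({
    'Fare': [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 51.86, 21.08],
    'Embarked': ['S', 'C', 'S', 'S', 'Q', 'Q', 'S', 'S'],
})
test = pd.DataFrame({
    'Fare': [7.83, 7.00, 9.69, 60.00],
    'Embarked': ['Q', 'S', 'Q', 'S'],  # no 'C' in this toy test set
})

# Learn the quartile edges on the training data only.
train['Fareclass'], fare_edges = pd.qcut(
    train['Fare'], 4, labels=[1, 2, 3, 4], retbins=True)

# Open up the outer edges so test fares outside the training range still get a class.
shared_edges = [-float('inf'), *fare_edges[1:-1], float('inf')]

# Reuse the training edges on the test data with pd.cut().
test['Fareclass'] = pd.cut(test['Fare'], bins=shared_edges, labels=[1, 2, 3, 4])

# Fix the category list from the training data so both frames
# produce the same dummy columns, even when a value is absent from one of them.
embarked_cats = sorted(train['Embarked'].unique())
train['Embarked'] = pd.Categorical(train['Embarked'], categories=embarked_cats)
test['Embarked'] = pd.Categorical(test['Embarked'], categories=embarked_cats)

encode_cols = ['Embarked', 'Fareclass']
train_enc = pd.get_dummies(train.drop(columns='Fare'), columns=encode_cols, drop_first=True)
test_enc = pd.get_dummies(test.drop(columns='Fare'), columns=encode_cols, drop_first=True)

print(list(train_enc.columns))
print(list(test_enc.columns))
```

Without the shared category list, `drop_first=True` would drop `C` in the training frame but `Q` in this toy test frame, and the two encodings would no longer line up column for column.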