@ogyalcin
Created August 2, 2018 11:44
Preparing the Train Dataset
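import pandas as pd # `train` is assumed to be a DataFrame that was loaded earlier
train['Age'] = train['Age'].fillna(train['Age'].median()) # Imputing missing Age values with the median
train['Embarked'] = train['Embarked'].fillna(train['Embarked'].value_counts().index[0]) # Imputing missing Embarked values with the most frequent value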
d = {1:'1st',2:'2nd',3:'3rd'} #Creating a dictionary to convert Passenger Class from 1,2,3 to 1st,2nd,3rd.
train['Pclass'] = train['Pclass'].map(d) #Mapping the column based on the dictionary
train.drop(['PassengerId','Name','Ticket','Cabin'], axis=1, inplace=True) # Dropping Unnecessary Columns
categorical_vars = train[['Pclass','Sex','Embarked']] # Getting Dummies of Categorical Variables
dummies = pd.get_dummies(categorical_vars,drop_first=True)
train = train.drop(['Pclass','Sex','Embarked'],axis=1) #Dropping the Original Categorical Variables to avoid duplicates
train = pd.concat([train,dummies],axis=1) #Now, concat the new dummy variables
train.head() #Check the clean version of the train data.
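The steps above can be tried end to end without the original data file. The following is a minimal sketch, assuming a tiny made-up frame (`toy`) with the same column names; the rows are hypothetical and only illustrate the shape of the result, and the `prepare` helper is an illustrative wrapper that is not part of the original snippet.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 26.0, 35.0],
    "Ticket": ["t1", "t2", "t3", "t4"],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

def prepare(df):
    """Apply the same cleaning steps as above and return a new DataFrame."""
    df = df.copy()  # leave the caller's frame untouched
    # Impute missing values: median for Age, most frequent value for Embarked.
    df["Age"] = df["Age"].fillna(df["Age"].median())
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].value_counts().index[0])
    # Turn the numeric class into labels so it is treated as categorical.
    df["Pclass"] = df["Pclass"].map({1: "1st", 2: "2nd", 3: "3rd"})
    # Drop identifier-like columns that are not used as features here.
    df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])
    # One-hot encode the categorical columns, dropping the first level of each.
    cat_cols = ["Pclass", "Sex", "Embarked"]
    dummies = pd.get_dummies(df[cat_cols], drop_first=True)
    return pd.concat([df.drop(columns=cat_cols), dummies], axis=1)

clean = prepare(toy)
print(clean.head())
```

Returning a new frame instead of modifying `train` in place makes it straightforward to apply the same function to a test set later.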