@dradecic
Created September 1, 2019 15:59
rfecv_2_cleaning
import pandas as pd

# Assumes `data` is the Titanic DataFrame, loaded elsewhere. Drop ID-like columns.
data = data.drop(['Ticket', 'PassengerId'], axis=1)

# Encode sex as 0/1 (assign back; chained inplace replace is unreliable in recent pandas).
gender_mapper = {'male': 0, 'female': 1}
data['Sex'] = data['Sex'].map(gender_mapper)

# Extract the title from "Surname, Title. Given names" and flag the uncommon ones.
title = data['Name'].apply(lambda x: x.split(',')[1].strip().split(' ')[0])
data['Title_Unusual'] = (~title.isin(['Mr.', 'Miss.', 'Mrs.'])).astype(int)
data = data.drop('Name', axis=1)

# Flag whether a cabin was recorded, then drop the raw column.
data['Cabin_Known'] = data['Cabin'].notna().astype(int)
data = data.drop('Cabin', axis=1)

# One-hot encode the port of embarkation, dropping the first level.
emb_dummies = pd.get_dummies(data['Embarked'], drop_first=True, prefix='Embarked')
data = pd.concat([data, emb_dummies], axis=1)
data = data.drop('Embarked', axis=1)

# Fill missing ages with the mean age truncated to an integer.
data['Age'] = data['Age'].fillna(int(data['Age'].mean()))
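
The snippet above assumes `data` already exists. As a rough way to try the same steps in isolation, the sketch below wraps them in a function and runs it on three made-up rows. The function name `clean_titanic` and the sample rows are hypothetical and not part of the original gist; only the column names follow the Titanic dataset layout the gist relies on.

```python
import numpy as np
import pandas as pd


def clean_titanic(df):
    """Apply the gist's cleaning steps and return a new DataFrame (hypothetical helper)."""
    # drop() returns a new object, so the caller's frame is left untouched.
    df = df.drop(['Ticket', 'PassengerId'], axis=1)
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})

    # Title is the first token after the comma in "Surname, Title. Given names".
    title = df['Name'].apply(lambda x: x.split(',')[1].strip().split(' ')[0])
    df['Title_Unusual'] = (~title.isin(['Mr.', 'Miss.', 'Mrs.'])).astype(int)
    df['Cabin_Known'] = df['Cabin'].notna().astype(int)

    emb = pd.get_dummies(df['Embarked'], drop_first=True, prefix='Embarked')
    df = pd.concat([df.drop(['Name', 'Cabin', 'Embarked'], axis=1), emb], axis=1)

    # Same imputation as the gist: mean age truncated to an integer.
    df['Age'] = df['Age'].fillna(int(df['Age'].mean()))
    return df


# Made-up rows for illustration only; these are not real passenger records.
sample = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Survived': [0, 1, 1],
    'Pclass': [3, 1, 2],
    'Name': ['Doe, Mr. John', 'Roe, Mrs. Jane (Ann)', 'Poe, Dr. Alex'],
    'Sex': ['male', 'female', 'male'],
    'Age': [22.0, np.nan, 41.0],
    'Ticket': ['A1', 'B2', 'C3'],
    'Cabin': [np.nan, 'C85', np.nan],
    'Embarked': ['S', 'C', 'Q'],
})

cleaned = clean_titanic(sample)
print(cleaned)
```

Returning a new frame instead of mutating `data` in place makes it easier to apply the same steps to a separate test set, if one is used.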