@tommydangerous
Created June 11, 2021 05:38
encode_values.py
from sklearn.preprocessing import OneHotEncoder

# `df` is assumed to be an existing pandas DataFrame containing these columns
categorical_columns = ['Pclass', 'Sex', 'Embarked', 'cabin_letter']
# Categories unseen during fit are encoded as all zeros instead of raising
categorical_encoder = OneHotEncoder(handle_unknown='ignore')
categorical_encoder.fit(df[categorical_columns])

# Add the new columns to the data
new_column_names = []
for idx, cat_column_name in enumerate(categorical_columns):
    # categories_[idx] holds the categories learned for the idx-th column
    values = categorical_encoder.categories_[idx]
    new_column_names += [f'{cat_column_name}_{value}' for value in values]

df.loc[:, new_column_names] = \
    categorical_encoder.transform(df[categorical_columns]).toarray()
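
The snippet above depends on a `df` defined elsewhere. As a minimal, self-contained sketch of the same idea, the example below builds a small made-up DataFrame (the rows and the `Age` column are illustrative assumptions, not the original data) and attaches the encoded columns with `pd.concat` rather than the `.loc` assignment used above. It also shows how `handle_unknown='ignore'` treats a category that was not seen during `fit`.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real `df`
df = pd.DataFrame({
    'Pclass': [1, 3, 3, 2],
    'Sex': ['female', 'male', 'male', 'female'],
    'Age': [38.0, 22.0, 35.0, 27.0],
})

categorical_columns = ['Pclass', 'Sex']
categorical_encoder = OneHotEncoder(handle_unknown='ignore')
categorical_encoder.fit(df[categorical_columns])

# Build one output column name per (column, category) pair
new_column_names = []
for idx, cat_column_name in enumerate(categorical_columns):
    values = categorical_encoder.categories_[idx]
    new_column_names += [f'{cat_column_name}_{value}' for value in values]

# transform() returns a sparse matrix; densify it and wrap it in a DataFrame
encoded = pd.DataFrame(
    categorical_encoder.transform(df[categorical_columns]).toarray(),
    columns=new_column_names,
    index=df.index,
)
df = pd.concat([df, encoded], axis=1)

# A Pclass value never seen during fit yields zeros for every Pclass_* column
unseen = pd.DataFrame({'Pclass': [4], 'Sex': ['male']})
unseen_encoded = categorical_encoder.transform(unseen).toarray()
```

Building the encoded frame with an explicit `index=df.index` keeps the new columns aligned with the original rows even when `df` does not use a default integer index.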