@KonovalovaDS
Last active October 4, 2023 16:21
Encoding categorical features
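import pandas as pd
import category_encoders as ce

# 'XXX', 'YYY', 'ZZZ' are placeholder column names of an existing DataFrame df;
# substitute a different set of columns for each encoding.
# Ordinal Encoding: map each category to an integer code
ord_columns = ['XXX', 'YYY', 'ZZZ']
ord_enc = ce.OrdinalEncoder(cols=ord_columns)
df[ord_columns] = ord_enc.fit_transform(df[ord_columns])
# One-Hot Encoding: one indicator column per category (original columns are dropped)
ohe_columns = ['XXX', 'YYY', 'ZZZ']
df = pd.get_dummies(df, columns=ohe_columns)
# Binary Encoding: category codes written as binary digits, one column per digit
bin_columns = ['XXX', 'YYY', 'ZZZ']
bin_encoder = ce.BinaryEncoder(cols=bin_columns)
df_bin = bin_encoder.fit_transform(df[bin_columns])
# Replace the original columns with their encoded versions
df = pd.concat([df.drop(columns=bin_columns), df_bin], axis=1)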
@KonovalovaDS

Use one-hot encoding when a feature has up to roughly 15-20 unique values, and no more than 20.
Use binary encoding when a feature has more than 20 unique values.
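
The rule of thumb above can be sketched as a small helper that sorts columns by their number of unique values. This is only an illustration: the function name `split_by_cardinality`, the `threshold` parameter, and the toy data are assumptions made for the example, not part of the gist.

```python
import pandas as pd


def split_by_cardinality(df, columns, threshold=20):
    """Split columns into (one_hot_cols, binary_cols) by unique-value count.

    Hypothetical helper: columns with at most `threshold` unique values go to
    one-hot encoding, the rest to binary encoding.
    """
    one_hot_cols, binary_cols = [], []
    for col in columns:
        # nunique() ignores missing values by default
        if df[col].nunique() <= threshold:
            one_hot_cols.append(col)
        else:
            binary_cols.append(col)
    return one_hot_cols, binary_cols


# Toy data (assumed for illustration): 'color' has 3 unique values, 'city' has 30.
demo = pd.DataFrame({
    "color": ["red", "green", "blue"] * 10,
    "city": [f"city_{i}" for i in range(30)],
})

one_hot_cols, binary_cols = split_by_cardinality(demo, ["color", "city"])

# Low-cardinality columns are one-hot encoded with pandas.
demo_ohe = pd.get_dummies(demo, columns=one_hot_cols)
```

The columns returned in `binary_cols` could then be passed to `ce.BinaryEncoder(cols=binary_cols)` as in the gist. The threshold of 20 is a heuristic and may need tuning for a given dataset and model.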
