Last active
October 4, 2023 16:21
Encoding categorical features
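import pandas as pd
import category_encoders as ce

# The three encodings below are alternatives: apply only one of them to a given column.
# 'XXX', 'YYY', 'ZZZ' are placeholders for the categorical columns of df.

# Ordinal Encoding
ord_enc = ce.OrdinalEncoder()
df[['XXX', 'YYY', 'ZZZ']] = ord_enc.fit_transform(
    df[['XXX', 'YYY', 'ZZZ']])

# One-Hot Encoding
ohe_columns = ['XXX', 'YYY', 'ZZZ']
df = pd.get_dummies(df, columns=ohe_columns)

# Binary Encoding
bin_encoder = ce.BinaryEncoder(cols=['XXX', 'YYY', 'ZZZ'])
df_bin = bin_encoder.fit_transform(df[['XXX', 'YYY', 'ZZZ']])
# Drop the original columns so they are not kept alongside their encoded versions
df = pd.concat([df.drop(columns=['XXX', 'YYY', 'ZZZ']), df_bin], axis=1)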
Use one-hot encoding when a feature has up to about 15-20 unique values, and never more than 20.
Use binary encoding when a feature has more than 20 unique values.
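The rule of thumb above can be sketched as a small helper that counts the unique values in each column and suggests an encoding. This is only an illustration: the function name `suggest_encoding`, the parameter `max_one_hot`, and the toy data are assumptions, not part of the gist, and the cutoff of 20 is taken directly from the comment above.

```python
import pandas as pd


def suggest_encoding(df, columns, max_one_hot=20):
    """Suggest 'one-hot' or 'binary' per column from its number of unique values.

    Hypothetical helper: max_one_hot=20 mirrors the rule of thumb in the comment.
    """
    suggestions = {}
    for col in columns:
        n_unique = df[col].nunique()  # NaN values are not counted
        suggestions[col] = 'one-hot' if n_unique <= max_one_hot else 'binary'
    return suggestions


# Toy data (made up for illustration)
toy = pd.DataFrame({
    'color': ['red', 'green', 'blue'] * 10,        # 3 unique values
    'user_id': [f'u{i}' for i in range(30)],       # 30 unique values
})

suggestions = suggest_encoding(toy, ['color', 'user_id'])
print(suggestions)
```

Columns flagged as 'one-hot' could then be passed to `pd.get_dummies`, and those flagged as 'binary' to `ce.BinaryEncoder`, as in the snippet above.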