Created
February 25, 2017 17:34
This function takes a DataFrame with any number of categorical (text) columns and encodes each one with scikit-learn's LabelEncoder. It returns the transformed DataFrame and a dictionary holding the fitted LabelEncoder for each text column, so that an inverse transform can be done later. Note that the input DataFrame is modified in place.
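import pandas as pd
from sklearn.preprocessing import LabelEncoder

def text_class_encoder(df):
    # Collect the names of the text (object dtype) columns.
    dtypes = pd.DataFrame(df.dtypes)
    text_cols = list(dtypes[dtypes.iloc[:, 0] == 'object'].index)
    # Fit one LabelEncoder per text column, keyed by column name.
    label_encoder_dict = {}
    for col in text_cols:
        label_encoder_dict[col] = LabelEncoder()
        label_encoder_dict[col].fit(df[col])
        # Replace the text values with integer codes (modifies df in place).
        df[col] = label_encoder_dict[col].transform(df[col])
    return df, label_encoder_dict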
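The description mentions doing an inverse transform later but does not show one. The following is a minimal sketch of that round trip, not part of the original gist: the helper `decode_text_columns` and the sample column names are hypothetical, and the encoder function is repeated only so the example runs on its own. A copy of the DataFrame is passed in because the function overwrites the columns it encodes.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def text_class_encoder(df):
    # Same logic as the gist's function, repeated so this sketch is self-contained.
    dtypes = pd.DataFrame(df.dtypes)
    text_cols = list(dtypes[dtypes.iloc[:, 0] == 'object'].index)
    label_encoder_dict = {}
    for col in text_cols:
        label_encoder_dict[col] = LabelEncoder()
        label_encoder_dict[col].fit(df[col])
        df[col] = label_encoder_dict[col].transform(df[col])
    return df, label_encoder_dict


def decode_text_columns(df, label_encoder_dict):
    # Hypothetical helper: map integer codes back to the original labels.
    decoded = df.copy()
    for col, encoder in label_encoder_dict.items():
        decoded[col] = encoder.inverse_transform(decoded[col])
    return decoded


# Sample data (made up for illustration). dtype=object is set explicitly so
# the text column is detected even if pandas would infer a string dtype.
original = pd.DataFrame({
    'color': pd.Series(['red', 'blue', 'red'], dtype='object'),
    'size': [1, 2, 3],
})

# Pass a copy so `original` keeps its text values.
encoded, encoders = text_class_encoder(original.copy())
decoded = decode_text_columns(encoded, encoders)
```

LabelEncoder assigns codes in sorted label order, so here `blue` becomes 0 and `red` becomes 1. Numeric columns such as `size` are left untouched in both directions.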