
@usmcamp0811
Created February 25, 2017 17:34
This function takes a DataFrame with any number of categorical text (object-dtype) columns and encodes each one with scikit-learn's LabelEncoder. It returns the transformed DataFrame and a dictionary holding the fitted LabelEncoder for each text column, so that an inverse transform can be done later.
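import pandas as pd
from sklearn.preprocessing import LabelEncoder


def text_class_encoder(df):
    """Label-encode every object-dtype column of df.

    Returns the encoded DataFrame and a dict mapping each column name to its
    fitted LabelEncoder, so the encoding can be inverted later.
    """
    df = df.copy()  # avoid mutating the caller's DataFrame
    text_cols = df.select_dtypes(include='object').columns
    label_encoder_dict = {}
    for col in text_cols:
        label_encoder_dict[col] = LabelEncoder()
        df[col] = label_encoder_dict[col].fit_transform(df[col])
    return df, label_encoder_dict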
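The returned dictionary is what makes the later inverse transform possible. Below is a minimal sketch of that reverse step. text_class_decoder is a hypothetical helper name (it is not part of the original gist), and the usage builds the encoder dictionary by hand on a toy DataFrame so the example runs on its own. It assumes every encoder in the dictionary was fitted on the column it is applied to.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def text_class_decoder(df, label_encoder_dict):
    """Hypothetical inverse of text_class_encoder: map integer codes back to labels.

    Assumes label_encoder_dict maps column names to LabelEncoders that were
    fitted on those same columns.
    """
    df = df.copy()  # leave the encoded frame untouched
    for col, encoder in label_encoder_dict.items():
        df[col] = encoder.inverse_transform(df[col])
    return df


# Usage: build an encoder dict the same way text_class_encoder does, then invert it.
raw = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
encoders = {"color": LabelEncoder().fit(raw["color"])}  # fit() returns the encoder itself
encoded = raw.copy()
encoded["color"] = encoders["color"].transform(raw["color"])
print(encoded["color"].tolist())  # classes_ are sorted, so blue -> 0 and red -> 1
decoded = text_class_decoder(encoded, encoders)
print(decoded["color"].tolist())
```

Because LabelEncoder sorts its classes_ before assigning codes, the integer assigned to a label depends only on the set of labels seen during fitting, not on the order in which they appear in the column.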