@SehgalDivij
Created December 26, 2017 18:15
Custom encoder to label-encode multiple columns of a dataset at once
"""
This snippet is not mine. It has simply been taken from a StackOverflow Answer by PriceHardman in the following link.
(Scroll one answer up from the answer this link takes you to)
https://stackoverflow.com/questions/24458645/label-encoding-across-multiple-columns-in-scikit-learn#31939145
To future me and anyone who comes across this:
- Run this on the entire dataset before splitting the dataset for consistency in the encodings.
"""
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # list of column names to encode

    def fit(self, X, y=None):
        return self  # nothing is learned here; encoders are fitted inside transform()

    def transform(self, X):
        '''
        Transforms columns of X specified in self.columns using
        LabelEncoder(). If no columns specified, transforms all
        columns in X.
        '''
        output = X.copy()
        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            # DataFrame.iteritems() was removed in pandas 2.0; items() replaces it.
            for colname, col in output.items():
                output[colname] = LabelEncoder().fit_transform(col)
        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)
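
A minimal usage sketch, assuming pandas is installed. The toy DataFrame and its column names are made up for illustration, and the class is repeated in condensed form only so the snippet runs on its own. It also shows why the note at the top suggests encoding before splitting: every call refits a fresh LabelEncoder, so a slice encoded separately can assign different codes to the same label.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoder:
    """Condensed copy of the class above so this sketch is self-contained."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        output = X.copy()
        cols = self.columns if self.columns is not None else output.columns
        for col in cols:
            output[col] = LabelEncoder().fit_transform(output[col])
        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry", "banana"],
    "color": ["red", "yellow", "red", "yellow"],
    "weight": [150, 120, 10, 130],
})

encoder = MultiColumnLabelEncoder(columns=["fruit", "color"])

# Encoding the whole frame gives one mapping per column;
# "weight" is not listed, so it passes through unchanged.
encoded = encoder.fit_transform(df)
fruit_codes = encoded["fruit"].tolist()

# Encoding a slice on its own refits on that slice only, so "banana"
# may receive a different code than it did in the full frame.
tail = encoder.fit_transform(df.iloc[1:])
tail_fruit_codes = tail["fruit"].tolist()

print(fruit_codes, tail_fruit_codes)
```

If the same mapping has to be reused on unseen data later, one option is to keep a fitted LabelEncoder per column instead of creating a new one on each call; that would be a change to the class above, not something it does today.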
Wbec commented Sep 18, 2018

If you use the link https://stackoverflow.com/a/30267328/ you will be directed straight to the answer, no scrolling needed.
