title: Convert Pandas Categorical Data For scikit-learn
author: Damian Mingle
date: 06/08/2018

Preliminaries

# Bring in libraries
from sklearn import preprocessing
import pandas as pd

Construct a DataFrame

# Create the data
raw_data = {'clinical_trial': [1, 2, 1, 2, 2],
            'observation': [1, 2, 3, 1, 1],
            'protocol': [0, 1, 0, 1, 0],
            'outcome': ['excellent', 'poor', 'normal', 'poor', 'excellent']}

# Build a DataFrame from the dictionary, keeping the columns in this order
df = pd.DataFrame(raw_data, columns=['clinical_trial', 'observation', 'protocol', 'outcome'])
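For comparison, pandas has its own categorical dtype. The sketch below is a pandas-only alternative, not part of the scikit-learn workflow in this post. It converts the outcome column with astype('category'). Because pandas sorts the inferred categories, the integer codes for this data happen to match what LabelEncoder produces in the steps below.

```python
import pandas as pd

# Rebuild the same data used in this post
raw_data = {'clinical_trial': [1, 2, 1, 2, 2],
            'observation': [1, 2, 3, 1, 1],
            'protocol': [0, 1, 0, 1, 0],
            'outcome': ['excellent', 'poor', 'normal', 'poor', 'excellent']}
df = pd.DataFrame(raw_data)

# Convert the string column to pandas' categorical dtype;
# the categories are inferred from the unique values and sorted
outcome_cat = df['outcome'].astype('category')

categories = list(outcome_cat.cat.categories)  # ['excellent', 'normal', 'poor']
codes = outcome_cat.cat.codes.tolist()          # [0, 2, 1, 2, 0]
print(categories, codes)
```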

Fit The Label Encoder

# Create a label encoder object 
le = preprocessing.LabelEncoder()

# Fit the encoder object (le) to a pandas column containing categorical data
le.fit(df['outcome'])
LabelEncoder()
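The fit and transform steps can also be combined. Here is a minimal sketch that uses only the outcome column from the data above. LabelEncoder's fit_transform method learns the classes and returns the encoded labels in a single call.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'outcome': ['excellent', 'poor', 'normal', 'poor', 'excellent']})

le = preprocessing.LabelEncoder()
# fit_transform learns the classes and encodes the column in one call,
# equivalent to le.fit(...) followed by le.transform(...)
encoded = le.fit_transform(df['outcome'])
print(encoded)  # [0 2 1 2 0]
```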

View The Labels

# Display labels
list(le.classes_)
['excellent', 'normal', 'poor']
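Each label's integer code is its position in le.classes_, which is kept in sorted order. The small sketch below builds a lookup dictionary from that ordering. The name mapping is only illustrative and is not a LabelEncoder attribute.

```python
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(['excellent', 'poor', 'normal', 'poor', 'excellent'])

# A label's code is its index in le.classes_ (sorted order);
# str() turns NumPy string scalars into plain Python strings
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'excellent': 0, 'normal': 1, 'poor': 2}
```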

Transform Categories Into Integers

# Apply the label encoder object to a pandas column
le.transform(df['outcome']) 
array([0, 2, 1, 2, 0], dtype=int64)
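In practice the codes are usually stored alongside the data. The sketch below writes them to a new column called outcome_code, a hypothetical name chosen for this example. It also shows that a label the encoder never saw during fitting raises a ValueError rather than being assigned a new code.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'outcome': ['excellent', 'poor', 'normal', 'poor', 'excellent']})
le = preprocessing.LabelEncoder().fit(df['outcome'])  # fit returns the encoder

# Store the integer codes next to the original strings
# ('outcome_code' is a hypothetical column name for this sketch)
df['outcome_code'] = le.transform(df['outcome'])

# A label not seen during fit cannot be encoded and raises a ValueError
try:
    le.transform(['good'])
    unseen_error = False
except ValueError:
    unseen_error = True

print(df)
print(unseen_error)  # True
```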

Transform Integers Into Categories

# Map integer codes back to the original category names
list(le.inverse_transform([2, 0, 2]))
['poor', 'excellent', 'poor']
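Because inverse_transform undoes transform, encoding the labels and then decoding them should recover the originals. The sketch below checks that round trip on the same outcome values. It also shows that an integer with no matching class (here 3, since only codes 0 to 2 exist) is rejected with a ValueError.

```python
import numpy as np
from sklearn import preprocessing

outcomes = ['excellent', 'poor', 'normal', 'poor', 'excellent']
le = preprocessing.LabelEncoder().fit(outcomes)

# Encode, then decode: the round trip should return the original labels
codes = le.transform(outcomes)
restored = le.inverse_transform(codes).tolist()
print(restored)  # ['excellent', 'poor', 'normal', 'poor', 'excellent']

# Codes outside range(len(le.classes_)) cannot be decoded
try:
    le.inverse_transform([3])
    bad_code_error = False
except ValueError:
    bad_code_error = True

print(bad_code_error)  # True
```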