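from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

# Load the iris data and mark roughly 75% of the rows as training rows
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)  # pd.Factor in the original; removed from later pandas
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75

train, test = df[df['is_train']], df[~df['is_train']]

# Fit on the four measurement columns, using integer codes for the species labels
features = df.columns[:4]
clf = RandomForestClassifier(n_jobs=2)
y, _ = pd.factorize(train['species'])
clf.fit(train[features], y)

# Map predicted codes back to species names and compare with the actual labels
preds = iris.target_names[clf.predict(test[features])]
pd.crosstab(test['species'], preds, rownames=['actual'], colnames=['preds'])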
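
A side note on the is_train column: it is drawn from an unseeded np.random.uniform, so the train/test rows (and therefore the crosstab) differ on every run. The following is only a minimal sketch of one way to make the split repeatable, assuming a fixed seed is acceptable; make_train_mask, train_fraction and seed are hypothetical names that do not appear in the gist.

```python
import numpy as np

def make_train_mask(n_rows, train_fraction=0.75, seed=0):
    """Return a boolean mask; True marks rows assigned to the training set."""
    rng = np.random.default_rng(seed)  # seeded generator, so reruns give the same split
    return rng.uniform(0, 1, n_rows) <= train_fraction

mask_a = make_train_mask(150)
mask_b = make_train_mask(150)
# Same seed, same mask: the train/test rows no longer change between runs.
print(bool((mask_a == mask_b).all()), int(mask_a.sum()))
```

The mask could then stand in for the is_train column, e.g. df['is_train'] = make_train_mask(len(df)).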

AlexMikhalev commented Jun 11, 2013

I had to change the preds line to:
preds = iris.target_names[clf.predict(test[features]).astype(int)]
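
For anyone wondering why the cast helps: NumPy only accepts integer (or boolean) arrays as indices, so if predict hands back float-typed codes, the lookup into target_names fails until they are cast. The comment does not say which dtype predict returned, so the float array below is an assumption made purely for illustration, and names is a stand-in for iris.target_names.

```python
import numpy as np

names = np.array(['setosa', 'versicolor', 'virginica'])  # stand-in for iris.target_names
float_preds = np.array([0.0, 2.0, 1.0])  # hypothetical float-typed predictions

try:
    names[float_preds]  # recent NumPy rejects non-integer index arrays
    float_index_ok = True
except IndexError:
    float_index_ok = False

# Casting to int turns the predictions into valid positional indices.
labels = names[float_preds.astype(int)]
print(float_index_ok, labels.tolist())
```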


nettrom commented Jan 31, 2014

Pandas' Factor no longer exists; it is now called Categorical. Changing line 9 to the following works with Pandas 0.13:

df['species'] = pd.Categorical(iris.target, iris.target_names)


shuckle16 commented Feb 12, 2016

Agreed ^ pd.Factor doesn't work. As of 2/12/2016 it's:
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
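
To show what from_codes does here, a small sketch with made-up codes (the lists below are illustrative, not the iris data): each integer is treated as a position in the category list. As far as I can tell, in current pandas the plain pd.Categorical constructor treats its first argument as values rather than codes, so passing integers together with string categories yields missing values instead.

```python
import pandas as pd

codes = [0, 2, 1, 0]  # illustrative integer codes
categories = ['setosa', 'versicolor', 'virginica']

# from_codes: each integer is a position in `categories`
species = pd.Categorical.from_codes(codes, categories)
print(list(species))

# Plain constructor: the integers are treated as values, and none of them
# is one of the string categories, so every entry ends up missing.
as_values = pd.Categorical(codes, categories=categories)
print(bool(pd.isna(as_values).all()))
```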


formatkaka commented Aug 25, 2016

Please accept this fork as Factor is no longer supported.
rf_iris fork
