@socratesk
Created October 11, 2018 16:59
# import libraries
import numpy as np
import pandas as pd
import eli5
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from eli5.sklearn import PermutationImportance
# load data file
train = pd.read_csv('50-Startups.csv')
# perform one-hot encoding for categorical variable
trainDummies = pd.get_dummies(train['State'], prefix = 'state')
# combine original and one-hot encoded dataframes together
train = pd.concat([train, trainDummies], axis=1)
# extract dependent (target) variable
y = train.Profit
# extract independent features: drop the target, the raw categorical column,
# and one dummy column (the dropped state becomes the baseline category)
X = train.drop(["Profit", "State", "state_New York"], axis=1)
# split train and validation datasets
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=5)
# create Random Forest Regressor model and fit it on the training split
rf_model = RandomForestRegressor(random_state=0).fit(train_X, train_y)
# compute Permutation Importance using Random Forest Regressor model on validation dataset
permImportance = PermutationImportance(rf_model, random_state=0).fit(val_X, val_y)
# display computed feature weights (renders as an HTML table in a Jupyter notebook)
eli5.show_weights(permImportance, feature_names = X.columns.tolist())
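
The permutation-importance step above can be illustrated with a minimal standard-library sketch. This is not eli5's implementation. It assumes the score is R^2 and that importance is the mean drop in that score when one column is shuffled. The names `r2_score`, `permutation_importance`, `toy_predict`, `toy_rows`, and `toy_y` are hypothetical and exist only for this illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination (R^2) for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, rows, y, n_iter=5, seed=0):
    """Mean drop in R^2 per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_iter):
            # Shuffle only column j, leaving every other column untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = r2_score(y, [predict(r) for r in shuffled])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_iter)
    return importances


# Hypothetical toy "model": uses feature 0 and ignores feature 1.
def toy_predict(row):
    return 3.0 * row[0]


toy_rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
toy_y = [toy_predict(r) for r in toy_rows]
toy_importances = permutation_importance(toy_predict, toy_rows, toy_y)
```

In this toy setup, shuffling the ignored feature leaves the score unchanged, so its importance is zero, while shuffling the feature the model relies on lowers the score. The same reasoning explains why the weights in the script are computed on the validation split: they measure how much the fitted model's held-out score depends on each column.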