@finlytics-hub
Created July 7, 2020 07:16
Practical demonstration of SimpleImputer with CV and for loop
# import required libraries
import numpy as np  # needed for np.mean when reporting the scores
from sklearn.ensemble import RandomForestClassifier # can be any classifier of your choice
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
# define a list of all strategies to be evaluated
strategies = ['mean', 'median', 'most_frequent', 'constant']
# for loop to evaluate all the strategies
for s in strategies:
    # define modeling pipeline. fill_value is only used by the 'constant' strategy; replace 0 with the constant you want
    pipeline = Pipeline(steps=[('i', SimpleImputer(strategy=s, fill_value=0)), ('m', RandomForestClassifier())])
    # define cross-validation criteria
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # fit and evaluate the model defined in pipeline with cross-validation as defined in cv
    # (X and y, the feature matrix and target, must already be defined)
    scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv)
    # print the mean accuracy score
    print('%s: %.3f' % (s, np.mean(scores)))
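
The snippet above expects X and y to exist already. A minimal end-to-end sketch follows, assuming a synthetic dataset from make_classification with roughly 10% of the values knocked out at random. The dataset, the missing-value rate, the smaller forest (n_estimators=20) and the lighter CV settings are illustrative assumptions chosen to keep the run short; they are not part of the original gist.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# ASSUMPTION: synthetic stand-in data, not the dataset used by the gist author
X, y = make_classification(n_samples=200, n_features=8, random_state=1)

# ASSUMPTION: blank out about 10% of the entries so the imputer has work to do
rng = np.random.default_rng(1)
X[rng.random(X.shape) < 0.1] = np.nan

# lighter CV than the 10x3 scheme above, purely to keep the example quick
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=1)

results = {}
for s in ['mean', 'median', 'most_frequent', 'constant']:
    # imputation sits inside the pipeline, so it is refit on each training fold
    pipeline = Pipeline(steps=[
        ('i', SimpleImputer(strategy=s, fill_value=0)),
        ('m', RandomForestClassifier(n_estimators=20, random_state=1)),
    ])
    scores = cross_val_score(pipeline, X, y, scoring='accuracy', cv=cv)
    results[s] = np.mean(scores)
    print('%s: %.3f' % (s, results[s]))
```

Keeping the imputer inside the Pipeline means its statistics are learned from the training folds only, which avoids leaking information from the held-out fold into the imputed values.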