@uberwach
Created October 27, 2015 13:11
Evaluation Lab
# Solution Exercise 2.2
import numpy as np
from sklearn import grid_search, metrics
from sklearn.cross_validation import LeaveOneOut, StratifiedKFold
from sklearn.svm import SVC

# X, y come from the lab setup (iris: 150 samples, class 2 = virginica);
# plot_roc_curves is a plotting helper defined earlier in the lab.

# Binarize the target: 1 for virginica, 0 for everything else
virgin_idx = y == 2
y_virgin = np.zeros(len(y))
y_virgin[virgin_idx] = 1

parameters = {'kernel': ('linear', 'rbf'), 'C': [0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100]}
svc = SVC()

# a) model selection with stratified 2-fold CV
clf = grid_search.GridSearchCV(svc, parameters, scoring='accuracy', cv=StratifiedKFold(y_virgin, 2))
clf.fit(X, y_virgin)
best_clf_a = clf.best_estimator_
y_score = best_clf_a.decision_function(X)
fpr_a, tpr_a, _ = metrics.roc_curve(y_virgin, y_score, pos_label=1)
print('Best Classifier (2-fold): {}'.format(best_clf_a))
print('Best Accuracy (2-fold): {}'.format(clf.best_score_))

# b) model selection with leave-one-out CV (one fold per sample)
clf = grid_search.GridSearchCV(svc, parameters, scoring='accuracy', cv=LeaveOneOut(150))
clf.fit(X, y_virgin)
best_clf_b = clf.best_estimator_
y_score = best_clf_b.decision_function(X)
fpr_b, tpr_b, _ = metrics.roc_curve(y_virgin, y_score, pos_label=1)
plot_roc_curves([fpr_a, fpr_b], [tpr_a, tpr_b])
print('Best Classifier (LOOCV): {}'.format(best_clf_b))
print('Best Accuracy (LOOCV): {}'.format(clf.best_score_))
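
Side note, not part of the original lab: scikit-learn 0.18+ merged the grid_search and cross_validation modules into model_selection, so on newer versions the same selection would read:

from sklearn.model_selection import GridSearchCV, LeaveOneOut, StratifiedKFold

clf = GridSearchCV(svc, parameters, scoring='accuracy',
                   cv=StratifiedKFold(n_splits=2))  # a) stratified 2-fold
clf = GridSearchCV(svc, parameters, scoring='accuracy',
                   cv=LeaveOneOut())                # b) LOOCV, no sample count needed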
@uberwach (Author) commented:

Note that it is wiser to treat the kernel as a hyper-parameter and use 3-stage validation. This essentially means (see the sketch below):

Split the data into X_train and X_holdout.
Learn the best hyper-parameters (i.e. the kernel) via CV on X_train only.
Then fit the normal model parameters on X_train and evaluate on X_holdout.
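
A minimal sketch of this scheme, assuming the newer sklearn.model_selection API; the 25% holdout size, 5-fold CV, and random_state are illustrative choices, not from the lab:

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# 1) split off a holdout set that the grid search never sees
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y_virgin, test_size=0.25, stratify=y_virgin, random_state=0)

# 2) learn the hyper-parameters (kernel, C) via CV on X_train only
search = GridSearchCV(SVC(), parameters, scoring='accuracy', cv=5)
search.fit(X_train, y_train)

# 3) GridSearchCV refits the best model on all of X_train (refit=True by default);
#    report its score on the untouched holdout set
print('Holdout accuracy: {}'.format(search.score(X_holdout, y_holdout)))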
