Evaluation Lab
# Solution Exercise 2.2
# Assumes X and y (the iris data) have been loaded earlier in the lab.
import numpy as np
from sklearn import metrics
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, LeaveOneOut, StratifiedKFold

# Reduce to a binary problem: virginica (class 2) vs. the rest
virgin_idx = y == 2
y_virgin = np.zeros(len(y))
y_virgin[virgin_idx] = 1

parameters = {'kernel': ('linear', 'rbf'),
              'C': [0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100]}
svc = SVC()

# a) Grid search with stratified 2-fold cross-validation
clf = GridSearchCV(svc, parameters, scoring='accuracy', cv=StratifiedKFold(2))
clf.fit(X, y_virgin)
best_clf_a = clf.best_estimator_
y_score = best_clf_a.decision_function(X)
fpr_a, tpr_a, _ = metrics.roc_curve(y_virgin, y_score, pos_label=1)
print('Best Classifier (2-fold): {}'.format(best_clf_a))
print('Best Accuracy (2-fold): {}'.format(clf.best_score_))

# b) Grid search with leave-one-out cross-validation
clf = GridSearchCV(svc, parameters, scoring='accuracy', cv=LeaveOneOut())
clf.fit(X, y_virgin)
best_clf_b = clf.best_estimator_
y_score = best_clf_b.decision_function(X)
fpr_b, tpr_b, _ = metrics.roc_curve(y_virgin, y_score, pos_label=1)
plot_roc_curves([fpr_a, fpr_b], [tpr_a, tpr_b])
print('Best Classifier (LOOCV): {}'.format(best_clf_b))
print('Best Accuracy (LOOCV): {}'.format(clf.best_score_))
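The helper plot_roc_curves is defined elsewhere in the lab and not shown in this gist. A minimal matplotlib sketch consistent with how it is called above might look like this (the signature and the optional labels parameter are assumptions):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this also runs headless
import matplotlib.pyplot as plt

def plot_roc_curves(fprs, tprs, labels=None):
    """Draw one ROC curve per (fpr, tpr) pair on a shared set of axes."""
    fig, ax = plt.subplots()
    for i, (fpr, tpr) in enumerate(zip(fprs, tprs)):
        name = labels[i] if labels else 'curve {}'.format(i)
        ax.plot(fpr, tpr, label=name)
    # Dashed diagonal marks the performance of random guessing
    ax.plot([0, 1], [0, 1], linestyle='--', color='grey', label='chance')
    ax.set_xlabel('False positive rate')
    ax.set_ylabel('True positive rate')
    ax.legend(loc='lower right')
    return ax
```

Returning the axes object makes the helper easy to compose with further styling or saving via fig.savefig.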
Note that it is wiser to treat the kernel as a hyper-parameter and use three-stage validation. This essentially means:
1. Split the data into X_train and X_holdout.
2. Learn the best hyper-parameters (i.e. the kernel and C) via cross-validation on X_train only.
3. Then fit the normal model parameters on X_train and evaluate once on X_holdout.
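The three steps above can be sketched as follows; this is a minimal illustration on the iris data, not the exercise's prescribed solution (the split ratio and random_state are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
y_virgin = (y == 2).astype(int)  # virginica vs. rest, as in the exercise

# 1) Split off a holdout set that model selection never sees
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y_virgin, test_size=0.3, stratify=y_virgin, random_state=0)

# 2) Choose kernel and C by cross-validation on the training part only
parameters = {'kernel': ('linear', 'rbf'),
              'C': [0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100]}
clf = GridSearchCV(SVC(), parameters, scoring='accuracy', cv=5)
clf.fit(X_train, y_train)

# 3) GridSearchCV refits the best model on all of X_train;
#    evaluate it exactly once on the untouched holdout set
holdout_acc = clf.score(X_holdout, y_holdout)
print('Best params:', clf.best_params_)
print('Holdout accuracy: {:.3f}'.format(holdout_acc))
```

The holdout score is an honest estimate of generalization because neither the kernel choice nor C was tuned on X_holdout.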