sklearn's GCV is actually formally correct, i.e. equivalent to standard MSE under leave-one-out, BUT in a setting (fit_intercept=False, data pre-centered) that makes the result wrong, at least when n_features >= n_samples - 1.
"""
Experiment on gcv to understand the issue:
It is indeed equivalent to leave-one-out selection with mean_squared error
but in a setting without fit_intercept, and with y and X initially centered.
In theory, this is correct.
Author: Bertrand Thirion, 2016
"""
import numpy as np
from sklearn import linear_model
# NB: this targets the scikit-learn 0.17-era API (2016); the
# cross_validation and grid_search modules were later merged into
# model_selection.
from sklearn.cross_validation import LeaveOneOut
from sklearn.grid_search import GridSearchCV
from sklearn.utils.testing import assert_array_almost_equal
# set the parameters
alphas = np.logspace(-5, 5, 6)
n_samples, n_features = 10, 20

# generate the data
np.random.seed(1)
y, X = np.random.randn(n_samples), np.random.randn(n_samples, n_features)
# center X and y once, globally, as RidgeCV's GCV path does internally
X, y, _, _, _ = linear_model.Ridge._center_data(X, y, True)
for n_features_ in [5, 20]:
    X_ = X[:, :n_features_]

    # sklearn's GCV; with a scorer given, cv_values_ holds the
    # leave-one-out predictions for each alpha
    gcv = linear_model.RidgeCV(
        alphas=alphas, store_cv_values=True, gcv_mode='svd',
        scoring='mean_squared_error').fit(X_, y)

    # explicit leave-one-out grid search, without intercept,
    # on the pre-centered data
    gs = GridSearchCV(linear_model.Ridge(fit_intercept=False),
                      param_grid={'alpha': alphas},
                      cv=LeaveOneOut(n_samples),
                      scoring='mean_squared_error')
    gs.fit(X_, y)

    # negated mean squared error per alpha, from both procedures
    score_gs = np.array([x[1] for x in gs.grid_scores_])
    score_gcv = - np.mean((gcv.cv_values_.T - y) ** 2, 1)
    assert_array_almost_equal(score_gs, score_gcv)
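
To see why the pre-centered setting is suspect, here is a minimal sketch appended to the script above (it reuses the imports and n_samples / n_features; the two protocol names and the choice alpha=1. are mine, for illustration only): centering X and y with statistics of the full dataset lets each held-out sample influence its own training fold through the means, whereas fitting an intercept per training fold does not. The two leave-one-out MSE estimates differ, which is the sense in which the score above can be misleading, at least when n_features >= n_samples - 1.

# Sketch: pre-centered LOO vs. proper LOO (illustrative protocol names)
rng = np.random.RandomState(0)
X_raw = rng.randn(n_samples, n_features)
y_raw = rng.randn(n_samples)
X_cen, y_cen = X_raw - X_raw.mean(0), y_raw - y_raw.mean()

alpha = 1.  # arbitrary illustration value
mse_precentered, mse_proper = 0., 0.
for train, test in LeaveOneOut(n_samples):
    # pre-centered protocol, as in the GCV setting above: center once,
    # globally, then fit without an intercept on each training fold
    ridge = linear_model.Ridge(alpha=alpha, fit_intercept=False)
    ridge.fit(X_cen[train], y_cen[train])
    mse_precentered += float((ridge.predict(X_cen[test]) - y_cen[test]) ** 2)
    # proper protocol: the intercept is refit on each training fold,
    # so the held-out sample never touches its own training data
    ridge = linear_model.Ridge(alpha=alpha, fit_intercept=True)
    ridge.fit(X_raw[train], y_raw[train])
    mse_proper += float((ridge.predict(X_raw[test]) - y_raw[test]) ** 2)

print(mse_precentered / n_samples, mse_proper / n_samples)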