@zyxue
Last active September 27, 2017 15:14
Demonstration of sklearn GridSearchCV spawning multiple threads on Linux
# Related Stack Overflow question: https://stackoverflow.com/questions/46351157/why-gridsearchcv-in-scikit-learn-spawn-so-many-threads
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
# Log-spaced values of C from 1e-2 up to (but not including) 1e2
Cs = 10 ** np.arange(-2, 2, 0.1)
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
clf = LogisticRegression()
gs = GridSearchCV(
    clf,
    param_grid={'C': Cs, 'penalty': ['l1'],
                'tol': [1e-10], 'solver': ['liblinear']},
    cv=skf,
    scoring='neg_log_loss',
    n_jobs=5,
    verbose=1,
    refit=True)
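# Two well-separated 1-D classes with N samples each:
# class 0 is drawn from [0, 1), class 1 from [3, 4)
N = 500000
Xs = np.concatenate([
    np.random.random(N),
    3 + np.random.random(N)
]).reshape(-1, 1)
ys = np.concatenate([
    np.zeros(N),
    np.ones(N)
])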
gs.fit(Xs, ys)
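
To put a number on the threads this script spawns, one option on Linux is to read the `Threads:` field of `/proc/<pid>/status`. The sketch below is not part of the original gist: `count_os_threads` is a hypothetical helper name, and it assumes a Linux-style `/proc` filesystem. Where `/proc` is missing it falls back to Python's own thread count, which only covers the current process and misses threads started by native libraries.

```python
import os
import threading


def count_os_threads(pid=None):
    """Return the number of OS-level threads of a process (hypothetical helper).

    Reads the "Threads:" field of /proc/<pid>/status, which exists on Linux.
    If that file cannot be read, falls back to threading.active_count(),
    which only describes the current process and only Python-level threads.
    """
    pid = os.getpid() if pid is None else pid
    status_path = "/proc/{}/status".format(pid)
    try:
        with open(status_path) as fh:
            for line in fh:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        # No /proc here (or no permission): use the fallback below.
        pass
    return threading.active_count()


if __name__ == "__main__":
    print("threads in this process:", count_os_threads())
```

Calling `count_os_threads()` just before and just after `gs.fit(Xs, ys)` reports the parent process only. With `n_jobs=5` the worker processes have their own PIDs, which would have to be looked up separately (for example with `ps` or `top`) and passed in as `pid`.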