Created
January 30, 2012 16:18
Scikit-learn rocks the cluster!
import numpy as np
from IPython.parallel import Client
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import KFold
from sklearn.svm import SVC
from sklearn import datasets
from sklearn.preprocessing import Scaler
from sklearn.utils import shuffle

digits = datasets.fetch_mldata("MNIST original")
X, y = digits.data, digits.target
X, y = shuffle(X, y)
X = Scaler().fit_transform(X)

params = dict(C=10. ** np.arange(-3, 3), gamma=10. ** np.arange(-3, 3))

rc = Client(profile='sge')
view = rc.load_balanced_view()

grid = GridSearchCV(SVC(), param_grid=params, cv=KFold(len(y), 4), n_jobs=view)
grid.fit(X, y)
print(grid.grid_scores_)
Actually you could directly do a hasattr(workers, "map_sync") so that you would not need to do view.block = True before passing the view to the GridSearchCV.
Both good points :)
Didn't know about the "map_sync".
The thing about renaming "n_jobs" is that this name is hardcoded into sklearn. And I didn't have to touch sklearn to do this (yet) ;)
Which version of sklearn are you using? I tried running a similar example and got the following error:
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py in _repopulate_pool(self)
186 for use after reaping workers which have exited.
187 """
--> 188 for i in range(self._processes - len(self._pool)):
189 w = self.Process(target=worker,
190 args=(self._inqueue, self._outqueue,
TypeError: unsupported operand type(s) for -: 'LoadBalancedView' and 'int'
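A likely reading of this traceback, sketched below: the value passed as n_jobs ended up as the process count of a multiprocessing pool, which subtracts an integer from it. The snippet reproduces the same kind of error with a stand-in class. FakeView and count_missing_workers are hypothetical names used only for illustration; they are not part of IPython or scikit-learn.

```python
class FakeView:
    """Hypothetical stand-in for a LoadBalancedView; it defines no arithmetic."""


def count_missing_workers(processes, pool):
    # Same expression as line 188 of pool.py in the traceback above.
    return processes - len(pool)


# An integer worker count behaves as the pool expects.
print(count_missing_workers(4, []))

# A view-like object cannot be subtracted from, so a TypeError is raised.
try:
    count_missing_workers(FakeView(), [])
except TypeError as exc:
    message = str(exc)
    print(message)
```

So the trick in the gist presumably only works with a version where n_jobs is not forwarded to multiprocessing as a number.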
Just as @arnaudsj, I would find it very interesting to find out how to do that with the current version of scikit-learn.
Any news on this?
I really think n_jobs should be renamed workers (with backward compat). Also the implementation could do a hasattr(workers, "map") in addition to hasattr(workers, "__call__") so that you could pass the view directly.
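A minimal sketch of the duck-typed dispatch suggested in this thread, assuming a hypothetical helper run_tasks with a workers argument. This is not scikit-learn code. ThreadPoolExecutor is used only as a stand-in for an object with a map method, and treating a callable as workers(func, items) is an assumption about what the "__call__" check was meant to allow.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(func, items, workers=1):
    """Apply func to every item, dispatching on what `workers` can do.

    Hypothetical helper for illustration only.
    """
    if hasattr(workers, "map_sync"):
        # e.g. an IPython parallel view: blocking map, no view.block = True needed
        return list(workers.map_sync(func, items))
    if hasattr(workers, "map"):
        # any pool-like object that exposes map(func, iterable)
        return list(workers.map(func, items))
    if callable(workers):
        # assumed convention: a map-like callable invoked as workers(func, items)
        return list(workers(func, items))
    # otherwise treat workers as an integer worker count
    with ThreadPoolExecutor(max_workers=int(workers)) as pool:
        return list(pool.map(func, items))


def square(x):
    return x * x


# Pass a pool-like object directly, as one would pass the view.
with ThreadPoolExecutor(max_workers=2) as pool:
    print(run_tasks(square, range(5), workers=pool))

# The builtin map is callable, so it goes through the callable branch.
print(run_tasks(square, range(5), workers=map))

# A plain integer keeps the old n_jobs-style behaviour.
print(run_tasks(square, range(5), workers=2))
```

Checking for map_sync first means a view would be used in blocking mode without setting view.block = True beforehand.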