@amueller
Created January 30, 2012 16:18
Scikit-learn rocks the cluster!
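# Run a scikit-learn grid search on an IPython.parallel cluster.
import numpy as np
from IPython.parallel import Client
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import KFold
from sklearn.svm import SVC
from sklearn import datasets
from sklearn.preprocessing import Scaler
from sklearn.utils import shuffle

# Load MNIST, shuffle it, and standardize the features.
digits = datasets.fetch_mldata("MNIST original")
X, y = digits.data, digits.target
X, y = shuffle(X, y)
X = Scaler().fit_transform(X)

# 6 x 6 grid over C and gamma, each ranging from 1e-3 to 1e2.
params = dict(C=10. ** np.arange(-3, 3), gamma=10. ** np.arange(-3, 3))

# Connect using the IPython profile named 'sge' and get a load-balanced view.
rc = Client(profile='sge')
view = rc.load_balanced_view()

# Pass the view in place of an integer n_jobs, with 4-fold cross-validation.
grid = GridSearchCV(SVC(), param_grid=params, cv=KFold(len(y), 4), n_jobs=view)
grid.fit(X, y)
print(grid.grid_scores_)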
@ogrisel
ogrisel commented Jan 30, 2012

I really think n_jobs should be renamed workers (with backward compat). Also, the implementation could do a hasattr(workers, "map") check in addition to hasattr(workers, "__call__"), so that you could pass the view directly.

@ogrisel
ogrisel commented Jan 30, 2012

Actually, you could check hasattr(workers, "map_sync") directly, so that you would not need to set view.block = True before passing the view to GridSearchCV.
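
The duck-typed check suggested in the two comments above could look roughly like the sketch below. Everything in it is an assumption made for illustration: resolve_map and FakeView are hypothetical names, not part of scikit-learn, joblib, or IPython, and the integer branch simply runs serially where a real implementation would start a process pool.

```python
def resolve_map(workers):
    """Return a map-like callable for `workers` (hypothetical helper)."""
    # Prefer a blocking map, so the caller does not have to set view.block = True.
    if hasattr(workers, "map_sync"):
        return workers.map_sync
    # Otherwise accept any object with a map method; on a non-blocking
    # view this may return an asynchronous result instead of a list.
    if hasattr(workers, "map"):
        return workers.map
    # Plain integer: stand-in for the usual n_jobs path (serial here).
    if isinstance(workers, int):
        return lambda func, *iterables: list(map(func, *iterables))
    raise TypeError("workers must be an int or provide map/map_sync")


class FakeView:
    """Stand-in for a cluster view; only map_sync is implemented."""

    def map_sync(self, func, *iterables):
        return [func(*args) for args in zip(*iterables)]


def square(x):
    return x * x


results = resolve_map(FakeView())(square, [1, 2, 3])
print(results)
```

Checking for map_sync first means a view can be passed as-is, while integers keep working as before.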

@amueller
Author

Both good points :)
Didn't know about "map_sync".

@amueller
Author

The thing about renaming "n_jobs" is that this name is hardcoded into sklearn. And I didn't have to touch sklearn to do this (yet) ;)

@arnaudsj
arnaudsj commented Dec 3, 2012

Which version of sklearn are you using? I tried running a similar example and got the following error:

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py in _repopulate_pool(self)
    186         for use after reaping workers which have exited.
    187         """
--> 188         for i in range(self._processes - len(self._pool)):
    189             w = self.Process(target=worker,
    190                              args=(self._inqueue, self._outqueue,

TypeError: unsupported operand type(s) for -: 'LoadBalancedView' and 'int'
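
Judging from the traceback alone, the view seems to have reached multiprocessing's Pool as its process count, where it is used in integer arithmetic. A minimal sketch of that failure, assuming a stand-in class rather than the real IPython view:

```python
class LoadBalancedView:
    """Stand-in for IPython's view class; only the type name matters here."""


processes = LoadBalancedView()  # assumed: what n_jobs=view ends up as
pool = []                       # no worker processes started yet

try:
    # Mirrors the subtraction on line 188 of pool.py in the traceback.
    range(processes - len(pool))
except TypeError as exc:
    message = str(exc)
    print(message)
```

This only reproduces the arithmetic error; it does not show which scikit-learn or joblib code path passed the view through.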

@sstugk
sstugk commented Mar 21, 2015

Like @arnaudsj, I would be very interested to find out how to do this with the current version of scikit-learn.

@ilemhadri

Any news on this?
