@GaelVaroquaux
Last active June 21, 2016 03:31
Notes from Pydata Paris discussion on scikit-learn

Notes on scikit-learn round table

Q: What possible additions to scikit-learn are important to you?

  • Xavier Dupré (Microsoft): keep .fit in the API, but let X be a stream, for example from Spark, in a way that is transparent for the user. Gaël: X would be indexable, with len(X) giving n_samples; is that good enough? Answer: X would be accessed through a sequential iterator.
  • Jean-François Puget (IBM): IBM is betting on Spark company-wide. Most machine learning applications have small data, but some don't. How can the bridge between scikit-learn and Spark get better? How can scikit-learn be used in a distributed environment? Not all algorithms can work out-of-core; some need a distributed algorithm.
  • Jean-Paul Smet (Nexedi): Nexedi is an example company. Wendelin.core helps us remove the overhead and enables out-of-core computing. Next step of the story in a year.
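
To make the first point concrete, here is a minimal sketch of what "keep .fit, but X can be a stream" could look like. It is not scikit-learn code: `StreamingScaler` and `row_stream` are hypothetical names invented for illustration, and the generator merely stands in for rows arriving from an external source such as Spark. The only assumption made about X is that it can be iterated over once, sequentially.

```python
import math


class StreamingScaler:
    """Hypothetical estimator: learns per-feature mean and std in one pass."""

    def fit(self, X):
        # X may be a list, a generator, or any iterable of equal-length rows.
        # Only sequential iteration is used, never len(X) or X[i].
        n = 0
        mean = None
        m2 = None
        for row in X:
            if mean is None:
                mean = [0.0] * len(row)
                m2 = [0.0] * len(row)
            n += 1
            # Welford's online update of the running mean and squared deviations.
            for j, value in enumerate(row):
                delta = value - mean[j]
                mean[j] += delta / n
                m2[j] += delta * (value - mean[j])
        if n == 0:
            raise ValueError("X yielded no samples")
        self.n_samples_seen_ = n
        self.mean_ = mean
        self.std_ = [math.sqrt(s / n) for s in m2]
        return self


def row_stream():
    """Hypothetical stand-in for rows streamed from an external system."""
    for row in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
        yield row


# The same .fit call works on in-memory data and on a stream.
in_memory = StreamingScaler().fit([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
streamed = StreamingScaler().fit(row_stream())
```

This only works because the statistic can be computed in a single pass. An algorithm that needs random access or several passes over X could not consume a plain iterator this way, which is one reading of the remark that not all algorithms can work out-of-core.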

Q: How do we make scikit-learn sustainable?

  • Fabian Mangeant (Airbus): Big industrial companies like Airbus need a few years of visibility. They can help with funding, but in exchange for stability and guarantees of long-term support for some releases.
  • Jean-François Puget (IBM): Some companies (maybe IBM) are interested in having scikit-learn scale better on clouds and distributed systems. They might contribute expertise and platforms for experimentation.
  • Jean-Paul Smet (Nexedi): It seems that you have a hiring problem. Do you need money, or people?

Notes taken by Loic Esteve and Gaël Varoquaux. We hope that we haven't distorted too much what was said, or forgotten anything important.

@datnamer

Have you seen this?

Adapts spark to ooc and distributed.

https://github.com/jcrist/dask-learn

@amueller

I think we wouldn't have a hiring problem if we had enough money.

@nabilm

nabilm commented Jun 21, 2016

I assume you already saw this:
https://github.com/databricks/spark-sklearn
