Notes from reproducible machine learning discussion at ICML 2015 MLOSS workshop

Introductory quote:

"Machine learning people use hugely complex algorithms on trivially simple datasets. Biology does trivially simple algorithms on hugely complex datasets."

Concepts of reproducible science

  • Replicability

    Note: replicability alone may hinder science if it does not come with reusability, because it pushes everyone to do the same things

  • Reproducibility

    Reimplementation

  • Reusability: allows you to do things that the original creator did not have in mind

  • Education: replication by running code but unreadable code (probably going in the

The last two are more valuable for science, but require more work

Roadblocks

  • Manpower

    We cannot achieve reusability and high quality for everything

  • Computing power

  • Data

    => Solution: create a curated, tractable experiment (data + problem)

  • Incentives problem

    What do we want from the way science moves forward (many incremental ideas in a conference)?

Solutions

  • Curated (maintained) libraries
  • Versioning of code and tags
  • Docker... but that's replication, not reuse
  • Changing incentives
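
As a rough illustration of the "versioning of code and tags" point, here is a minimal sketch of recording which code, environment, data, and seed produced a result. It was not presented at the workshop; the function names (`git_commit`, `package_versions`, `run_manifest`) and the manifest fields are hypothetical choices, and it assumes the experiment lives in a git checkout.

```python
import hashlib
import json
import platform
import subprocess
import sys
from importlib import metadata


def git_commit():
    """Return the current git commit hash, or None outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def package_versions(names):
    """Map each package name to its installed version (None if not installed)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def run_manifest(packages, seed, data_bytes=b""):
    """Collect what identifies a run: code version, environment, data, seed.

    The field names are illustrative assumptions, not a standard format.
    """
    return {
        "git_commit": git_commit(),
        "python": platform.python_version(),
        "packages": package_versions(packages),
        "seed": seed,
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
    }


if __name__ == "__main__":
    # Hypothetical usage: store this JSON next to the experiment's results.
    manifest = run_manifest(["numpy"], seed=0, data_bytes=b"toy dataset")
    json.dump(manifest, sys.stdout, indent=2, sort_keys=True)
```

Like Docker, a manifest of this kind mainly supports replication: it pins down what was run, but it does not by itself make the code reusable.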