@jvns
Last active August 29, 2015 14:05

machine learning mistakes

This isn't an academic machine learning conference. This isn't a business conference. Nobody will tell you "Big data is at the foundation of all the megatrends happening today". This is a conference for people who work with data all day and have stories to tell about it.

This is a conference where we talk about what you do when you have a dataset and a business problem you're trying to solve, and

  • you're not sure how to map your actual business problem to a metric
  • you have lots of data but 80% of it is unlabelled, and you're not sure how to handle that
  • you start with a simple model and it does pretty well, but you're not sure whether it could do even better

We'll talk about what goes wrong when doing machine learning instead of how smart we all are and how awesome all of our algorithms are.

We want to hear your stories! Tell us:

  • an in-depth explanation of how you chose how to evaluate a model
  • a model that you thought was good, but turned out to have an important mistake
  • when you've found it's appropriate to use extremely simple models, and when something more complicated has been appropriate
  • a simple method that works much better than you expected it to
  • a problem that you thought you could use machine learning to solve, but then it really didn't work
  • anything that took you five years to learn
  • small data! medium data!
  • if you actually pretty much always use decision trees / logistic regression
  • missing data and messy data
  • a project that was a total failure
  • your favorite graphs to look at. Lots of graphs.
  • if you think carefully about the statistical assumptions your model makes, or if you treat it as a black box. (and why!)
  • a project you worked on for 6 months and it was a disaster

Let's learn from some of each other's mistakes so we don't have to make them all on our own. Let's talk about how ML is messy, about the compromises we make, and about how to end up with a reasonable approach.

what I'd like to know:

does this conference already exist? can I go to it? =D

@brainwane

BTW what proportion of the projects people want to share case studies about are under some kind of super onerous nondisclosure agreement? :(

Would a bunch of workshops working through The Bad Data Handbook-type exercises be along the lines of sessions you'd want at this convening?

@brainwane

(And The Command Line Murders: Teaching the Terminal with a Detective Noir concentrates on small/medium/missing/messy data, by the way.)
