@philipcmonk
Created September 8, 2017 18:47

Today, Numerai is open sourcing our originality, concordance, and consistency criteria. Their code may be found here.

In Numerai's tournament, in order for a submission to be eligible for payout, it must be original, concordant, and consistent. They are described on the rules page:

Originality is a measure of whether a set of predictions is uncorrelated with predictions already submitted. Numerai wants to encourage new models over duplicate submissions.
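To make the idea concrete, here is a minimal sketch of an originality-style check. It is not Numerai's actual implementation: the use of Pearson correlation and the 0.95 threshold are assumptions chosen only to illustrate the concept.

```python
import numpy as np

def is_original(new_preds, existing_preds, threshold=0.95):
    """Hypothetical sketch of an originality check.

    Flags a submission as unoriginal if its absolute Pearson
    correlation with any previously submitted prediction vector
    exceeds `threshold`. Both the correlation measure and the
    threshold value are illustrative assumptions, not Numerai's
    actual parameters.
    """
    for prior in existing_preds:
        corr = np.corrcoef(new_preds, prior)[0, 1]
        if abs(corr) >= threshold:
            return False
    return True
```

Checking one new vector against every prior submission is also why a naive implementation scales poorly as the number of submissions grows, which relates to the performance pain point discussed below.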

Concordance is a measure of whether predictions on the validation set, test set, and live set appear to be generated by the same model. A data scientist who submits perfect answers on the validation set is unlikely to achieve concordance.
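One rough way to illustrate the idea (again, an assumption for illustration, not Numerai's actual method) is to ask whether the three sets of predictions look like draws from the same distribution, for example with pairwise two-sample Kolmogorov-Smirnov tests. The significance level here is arbitrary.

```python
from scipy.stats import ks_2samp

def looks_concordant(val_preds, test_preds, live_preds, alpha=0.05):
    """Hypothetical sketch of a concordance-style check.

    Treats the three prediction sets as concordant if no pairwise
    two-sample KS test rejects the hypothesis that they come from
    the same distribution. Numerai's real check differs; this only
    illustrates why hand-tuned validation answers fail: they shift
    the validation distribution away from test and live.
    """
    pairs = [
        (val_preds, test_preds),
        (val_preds, live_preds),
        (test_preds, live_preds),
    ]
    return all(ks_2samp(a, b).pvalue >= alpha for a, b in pairs)
```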

Consistency measures the percentage of eras in which a model achieves a logloss below -ln(0.5) = ln 2 ≈ 0.693, the logloss of always predicting 0.5. Numerai wants models that work well consistently across eras. Only models with consistency above 75% are considered consistent.
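The consistency computation follows directly from the definition above: per-era logloss compared against the -ln(0.5) benchmark, with the 75% threshold. This sketch follows that definition; the clipping epsilon is an implementation assumption to avoid log(0).

```python
import numpy as np

def consistency(y_true_by_era, y_pred_by_era):
    """Fraction of eras whose logloss beats random guessing,
    i.e. logloss < -ln(0.5) = ln 2 ~= 0.693."""
    benchmark = -np.log(0.5)
    good = 0
    for y_true, y_pred in zip(y_true_by_era, y_pred_by_era):
        # Clip to avoid log(0); the epsilon is an assumption.
        p = np.clip(y_pred, 1e-15, 1 - 1e-15)
        logloss = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
        if logloss < benchmark:
            good += 1
    return good / len(y_true_by_era)

def is_consistent(y_true_by_era, y_pred_by_era):
    """Per the rules above, consistency must exceed 75%."""
    return consistency(y_true_by_era, y_pred_by_era) > 0.75
```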

Rationale

We are open sourcing this for four reasons:

  • We believe transparency is critical to trust. You don't have to take our word that our vague descriptions of our checks are accurate -- you can check the code yourself. It's important that our data scientists understand what they're being graded on.
  • If data scientists understand these checks, they will be better able to create predictions that don't fall afoul of them, and are therefore more useful to Numerai. In general, we would like data scientists to be able to focus more on improving their models and less on passing these checks.
  • Our algorithms are imperfect, and sometimes gameable. We would like to improve them so that they are simultaneously more effective at filtering bad submissions and more permissive of genuinely good submissions.
  • Our code isn't as clean or performant as we would like. For example, a pain point for many of our users is that the originality score takes a long time to compute.

One objection is that botters may be able to game these checks now that they can read the source code. However, security through obscurity isn't effective beyond a certain scale. Our tournament attracts sufficient interest that, if there is a way to game it, bad actors will put significant resources into finding it whether or not the code is public. We believe the improvements gained by open sourcing the code will do more to thwart bad actors than any advantage they gain from being able to analyze it.

Helping out

The number one reason why organizations don't open source code they don't want to keep secret is that most internal code isn't "fit for public eyes". They delay until they can clean it up, document it, and make it look respectable, which often never happens.

We've chosen instead to be very vulnerable and release it with essentially zero changes. We're not claiming the code is perfect, or even very great. It has a lot of room for cleanup, documentation, and improvement. If you see something you don't like, let us know by creating issues in the repository. If you can fix it (or someone else's issue), be bold, and send in pull requests.

Numerai is built on the idea that a large group of disparate data scientists will produce better results than any small team, so we believe that you can make these checks great.

We also believe in compensating the data scientists who improve tournament results. As a result, we will provide bounties for some tasks. Bounties will be paid out on a first-come, first-served basis, and will be denominated in Numeraire.

Examples of some initial bounties:

  • Document each check.
  • Improve the originality and concordance checks according to the suggestions in this document.
  • Create a process for easy benchmarking of speed.
  • Speed up the code.

Over time, new bounties will be added, and existing bounties may be modified. All issues that currently have a bounty on them will be tagged on the issues page. The rules for the bounty program are specified, in Q+A form, here.

To recap:

  • Read the code.
  • Think something should be changed? Submit an issue.
  • Able to fix an issue? Make the change, and submit a pull request.
  • Want to earn some extra Numeraire? Fix an issue that has a bounty on it.