@twiecki
Created January 15, 2016 11:00
Thomas Wiecki: Strata Hadoop World London 2016 submission (accepted)
All that glitters is not gold: Comparing backtest and out-of-sample performance of 800,000 trading algorithms
“Past performance is no guarantee of future returns.” This cautionary message will ring true for many investors. When automated trading strategies are developed and evaluated using backtests on historical pricing data, there is always a tendency, intentional or not, to overfit to the past. As a result, strategies that show fantastic performance on historical data often flounder when deployed with real capital.
Quantopian is an online platform that allows users to develop, backtest, and trade algorithmic investing strategies. By pooling all strategies developed on our platform, we constructed a large and unique dataset of over 800,000 trading algorithms. Although we do not have access to the source code, we do have each algorithm's returns and portfolio allocations, as well as the time it was last edited. This allows us to compare performance over the period the author had access to, and could potentially have overfit to, with truly out-of-sample performance accumulated since then. In this talk, I will shed light on the prevalence of backtest overfitting and debunk several common myths in quantitative finance based on these empirical findings. Moreover, I'll show how I trained a machine learning classifier on this dataset to predict whether an algorithm is overfit and how its future performance is likely to unfold.
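To make the in-sample versus out-of-sample comparison concrete, here is a minimal sketch of how an algorithm's returns could be split at its last-edit time and scored on each side. The abstract does not name a performance metric. The sketch assumes the annualized Sharpe ratio (zero risk-free rate, 252 trading days per year) and daily returns stored as `(date, return)` pairs. All function names are hypothetical illustrations, not Quantopian's API or the talk's actual code. The final helper computes the Pearson correlation between in-sample and out-of-sample Sharpe ratios across many algorithms, which is one simple way to ask how well backtest performance predicts live performance.

```python
import math
from datetime import date
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of periodic returns (assumes a zero risk-free rate)."""
    if len(returns) < 2:
        return float("nan")  # not enough data for a sample standard deviation
    sd = stdev(returns)
    if sd == 0:
        return float("nan")  # undefined for constant returns
    return mean(returns) / sd * math.sqrt(periods_per_year)


def split_at_last_edit(dated_returns, last_edit):
    """Split (date, return) pairs into in-sample (on or before last_edit)
    and out-of-sample (strictly after last_edit) return lists."""
    in_sample = [r for d, r in dated_returns if d <= last_edit]
    out_of_sample = [r for d, r in dated_returns if d > last_edit]
    return in_sample, out_of_sample


def is_oos_sharpe(dated_returns, last_edit):
    """Return (in-sample Sharpe, out-of-sample Sharpe) for one algorithm."""
    is_ret, oos_ret = split_at_last_edit(dated_returns, last_edit)
    return sharpe_ratio(is_ret), sharpe_ratio(oos_ret)


def is_oos_correlation(sharpe_pairs):
    """Pearson correlation between in-sample and out-of-sample Sharpe ratios
    across many algorithms, given a list of (is_sharpe, oos_sharpe) pairs."""
    xs = [p[0] for p in sharpe_pairs]
    ys = [p[1] for p in sharpe_pairs]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy returns for a single hypothetical algorithm, last edited on 2015-01-02.
    toy = [
        (date(2015, 1, 1), 0.01),
        (date(2015, 1, 2), 0.02),
        (date(2015, 1, 3), -0.01),
        (date(2015, 1, 4), 0.00),
    ]
    print(is_oos_sharpe(toy, date(2015, 1, 2)))
```

In this toy example the in-sample Sharpe ratio is strongly positive and the out-of-sample one is negative, the pattern that overfitting produces. A real analysis would need far longer return series and many algorithms before the correlation means anything.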