@choiway

choiway/blog.md Secret

Created August 12, 2018 21:35
Using Genetic Algorithms To Search For Trading Signals

Wayne H. Choi
waynechoi@gmail.com

Motivation

When applying machine learning to financial markets, most quantitative research focuses on optimization algorithms for user-defined features. This approach raises two issues. One is that the user needs deep domain knowledge in order to generate and define features. The second, related issue occurs when the act of gaining domain knowledge prejudices the researcher about what features may or may not work. In this walkthrough, we consider an alternative approach where we start with the assumption that the researcher has no knowledge of what works, but that somewhere in the entire universe of trading strategies there is a strategy that does work. This assumption frames the problem as a search problem as opposed to an optimization problem. Now the question for the researcher is how to approximate the entire universe of trading strategies, which may be both intellectually and computationally impossible to generate. For similar problems, genetic algorithms have been used with good results. The following is a walkthrough of using a genetic algorithm in conjunction with grammatical evolution to generate trading strategies and find the fittest one.

Solution Walkthrough

In biology, a genotype is the set of genes an organism carries, while a phenotype is its observable characteristics.

Implementations of genetic algorithms generally start with a randomly generated pool of genotypes that are scored for fitness. Once scored, you generate a new pool from the fittest genotypes of the initial pool, pick random pairs from that pool, mate the picked pairs and mutate the resulting genotypes. Each time we run this process we create a new generation of genotypes that tends to be fitter than the previous generation. The final result is a pool of genotypes that can be ranked by score, where the highest ranked genotype is the best solution found for the problem.

In order to use genetic algorithms, we need to be able to map our genotypes to trading strategies. Here we use grammatical evolution to map our genotypes to program fragments that execute the backtest of the strategy. Grammatical evolution allows us to evolve trading strategies based on our specified grammar and domain knowledge.

The following is a walkthrough of how we take an array (genotype) like [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] to a trading strategy represented as strategy-a:AAPL:12::strategy-b:GOOG:256 (phenotype).

Generate genotypes

We randomly generate 10,000 genotypes as an array of integers with length 12. This initial pool is used to generate the first generation of phenotypes.
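A minimal sketch of this step is shown here. The pool size and genotype length come from the text; the upper bound of 255 on each integer is an assumption, since the walkthrough does not state the range it draws from.

```python
import random

POOL_SIZE = 10_000     # from the text
GENOTYPE_LENGTH = 12   # from the text
MAX_GENE = 255         # assumption: the integer range is not stated in the text


def random_genotype(rng, length=GENOTYPE_LENGTH, max_gene=MAX_GENE):
    """Return one genotype: a list of `length` integers in [0, max_gene]."""
    return [rng.randint(0, max_gene) for _ in range(length)]


def initial_pool(rng, size=POOL_SIZE):
    """Return the first-generation pool of random genotypes."""
    return [random_genotype(rng) for _ in range(size)]


rng = random.Random(42)  # seeded only so the sketch is reproducible
pool = initial_pool(rng)
print(len(pool), pool[0])
```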

Generate Strategies (Phenotypes)

For this example we make the following assumptions:

https://gist.github.com/639dfc30c76a6f1c6ab703d016f672c2

The target ticker is the security for which we are trying to find winning trading strategies. Ticker symbols represent the pool of securities that we'll use in our trading strategies. Statements represent the building blocks from which we generate our trading strategies. Strategy fragments represent our trading strategies, with parameters separated by colons and enclosed in angle brackets. The genotype represents the sample array, while <code> initializes our phenotype.

To generate a trade fragment we need four integers, so our genotype of 12 integers can generate a strategy made up of up to 3 trade fragments. To illustrate how we apply grammatical evolution, we start with the first four integers of the assumed genotype, or [71, 20, 84, 226].

Map Statement

https://gist.github.com/cd0c2573ffba0ae56f99209d2b692db4

We initialize our strategy phenotype with <code>. The rule for statements is that <code> can be replaced with either ::<code> or <stmnt>::<code>, based on the first integer of our genotype. In this example, we take the first integer, 71, and calculate 71 % 2 = 1, since the statements array has two items. Because the result is 1, we replace <code> with <stmnt>::<code>, the second item in the statements array.

The string, <stmnt>, is what we use to encode the actual strategy fragment. The string <stmnt>::<code> allows for multiple strategy fragments which would make up the overall strategy.
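The statement rule can be sketched as a single string replacement. The `STATEMENTS` list holds the two productions named in the text; the function name `map_statement` is a hypothetical helper, not taken from the author's gist.

```python
# The two productions named in the text, in the order the text gives them.
STATEMENTS = ["::<code>", "<stmnt>::<code>"]


def map_statement(phenotype, gene):
    """Replace the first <code> with the production at gene % len(STATEMENTS)."""
    choice = STATEMENTS[gene % len(STATEMENTS)]
    return phenotype.replace("<code>", choice, 1)


print(map_statement("<code>", 71))  # 71 % 2 == 1 -> "<stmnt>::<code>"
```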

Map Strategy

https://gist.github.com/09d26ce6f8fc6c98169b332fe83113e8

If <stmnt> exists, we replace it with a strategy fragment, which includes the code for the trading strategy, a ticker symbol for the security that the strategy will be applied to and a numeric parameter. To pick the strategy fragment, we take the next integer in the genotype modulo the number of strategy fragments and use the result as an index into the strategies list.
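As a rough illustration, the same modulo rule applied to strategy fragments might look like this. The fragment names `strategy-a` and `strategy-b` and the helper `map_strategy` are hypothetical placeholders; the actual list is in the author's gist.

```python
# Hypothetical strategy fragments with <ticker> and <param> placeholders.
STRATEGIES = ["strategy-a:<ticker>:<param>", "strategy-b:<ticker>:<param>"]


def map_strategy(phenotype, gene):
    """Replace the first <stmnt>, if present, with the fragment at gene % len(STRATEGIES)."""
    if "<stmnt>" not in phenotype:
        return phenotype
    fragment = STRATEGIES[gene % len(STRATEGIES)]
    return phenotype.replace("<stmnt>", fragment, 1)


# Second integer of the example genotype: 20 % 2 == 0 selects the first fragment.
print(map_strategy("<stmnt>::<code>", 20))  # strategy-a:<ticker>:<param>::<code>
```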

Map Ticker

https://gist.github.com/daba00db3ea0e73cd8671dc669208e1a

If <ticker> exists, we replace it with a ticker symbol from the ticker symbols list.
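A short sketch of the ticker step, assuming the ticker is also chosen by taking the integer modulo the length of the list (the text does not spell this out for tickers). The `TICKERS` list and the helper name are hypothetical.

```python
TICKERS = ["AAPL", "GOOG", "MSFT"]  # hypothetical ticker pool


def map_ticker(phenotype, gene):
    """Replace the first <ticker>, if present, with the symbol at gene % len(TICKERS)."""
    if "<ticker>" not in phenotype:
        return phenotype
    return phenotype.replace("<ticker>", TICKERS[gene % len(TICKERS)], 1)


# Third integer of the example genotype: 84 % 3 == 0 selects "AAPL".
print(map_ticker("strategy-a:<ticker>:<param>::<code>", 84))
```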

Map Parameter

https://gist.github.com/4eea10eeb271723c98a14fe207d10511

If <param> exists, we replace it with the value of the integer in the genotype. Because the integer is used directly as the parameter, careful consideration should be given to the range of integers from which we randomly pick.
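Putting the four mapping steps together, a whole genotype can be turned into a phenotype four integers at a time. This is a sketch under several assumptions: the strategy and ticker lists are hypothetical, the last eight integers of the sample genotype are made up for illustration, and the cleanup at the end (dropping the leftover <code> marker and any empty fragments) is a guess at how the final string is produced.

```python
STATEMENTS = ["::<code>", "<stmnt>::<code>"]   # productions named in the text
STRATEGIES = ["strategy-a:<ticker>:<param>",   # hypothetical fragments
              "strategy-b:<ticker>:<param>"]
TICKERS = ["AAPL", "GOOG", "MSFT"]             # hypothetical ticker pool


def genotype_to_phenotype(genotype):
    """Map a genotype to a phenotype string, consuming four integers per fragment."""
    phenotype = "<code>"
    for i in range(0, len(genotype) - 3, 4):
        g_stmt, g_strat, g_tick, g_param = genotype[i:i + 4]
        phenotype = phenotype.replace("<code>", STATEMENTS[g_stmt % len(STATEMENTS)], 1)
        if "<stmnt>" in phenotype:
            phenotype = phenotype.replace("<stmnt>", STRATEGIES[g_strat % len(STRATEGIES)], 1)
        if "<ticker>" in phenotype:
            phenotype = phenotype.replace("<ticker>", TICKERS[g_tick % len(TICKERS)], 1)
        if "<param>" in phenotype:
            # The parameter is the raw integer, as described in the text.
            phenotype = phenotype.replace("<param>", str(g_param), 1)
    # Assumption: strip the leftover <code> marker and drop empty fragments.
    fragments = [f for f in phenotype.replace("<code>", "").split("::") if f]
    return "::".join(fragments)


sample = [71, 20, 84, 226, 10, 3, 5, 9, 33, 1, 4, 50]
print(genotype_to_phenotype(sample))  # strategy-a:AAPL:226::strategy-b:GOOG:50
```

In this sample the second group of four starts with 10, and 10 % 2 = 0 selects ::<code>, so that group contributes no fragment and the strategy ends up with two fragments rather than three.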

Run trading strategies

For each phenotype, we decode the strategy and run a backtest. We'll use the results of the backtest as metadata to help us rank the fitness of the phenotype and the probability that it will mate for the next generation.
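Decoding can be sketched as splitting the phenotype string on its separators. The backtest itself is not described in this walkthrough, so it is left out here; `Fragment`, `decode` and `pct_winners` are hypothetical names, and `pct_winners` assumes the backtest returns a list of per-trade profits.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    strategy: str
    ticker: str
    param: int


def decode(phenotype):
    """Split 'name:ticker:param::name:ticker:param' into a list of Fragment tuples."""
    fragments = []
    for part in phenotype.split("::"):
        if not part:
            continue  # skip empty fragments
        name, ticker, param = part.split(":")
        fragments.append(Fragment(name, ticker, int(param)))
    return fragments


def pct_winners(trade_pnls):
    """Fraction of trades with a positive profit; 0.0 when there are no trades."""
    if not trade_pnls:
        return 0.0
    return sum(1 for pnl in trade_pnls if pnl > 0) / len(trade_pnls)


print(decode("strategy-a:AAPL:12::strategy-b:GOOG:256"))
print(pct_winners([0.5, -0.2, 0.1, 0.3]))  # 3 of 4 trades win -> 0.75
```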

Rank Strategies

In determining how to rank the phenotypes, our primary goal is to ensure that fitter phenotypes have a higher chance of mating. We rank the phenotypes by their percentage of winning trades. In generating the mating population, we use each phenotype's rank as a proxy for fitness and use that rank to determine how many copies of the phenotype go into the pool. By way of example, if we limit the mating pool to the 500 fittest phenotypes, the pool will hold 125,250 entries (500 + 499 + ... + 1), of which 500 are copies of the fittest phenotype. As a result, the fittest phenotype has roughly a 0.4% chance (500 / 125,250) of being drawn each time we randomly select a phenotype for the next generation.

https://gist.github.com/d9b16c98c4ed28adf3705f9156a3c10f
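The rank weighting can be sketched as follows, assuming the input is already sorted best first and that the entry at rank r (1 = best) receives limit - r + 1 copies. The function name and the stand-in genotypes are hypothetical.

```python
def build_mating_pool(ranked_genotypes, limit=500):
    """Return a pool where the entry at rank r (1 = best) appears n - r + 1 times."""
    top = ranked_genotypes[:limit]
    n = len(top)
    pool = []
    for rank, genotype in enumerate(top, start=1):
        pool.extend([genotype] * (n - rank + 1))
    return pool


# Stand-in genotypes, assumed to be sorted from fittest to least fit.
ranked = [[i] * 12 for i in range(1000)]
pool = build_mating_pool(ranked)
print(len(pool))                           # 500 * 501 / 2 = 125250
print(pool.count(ranked[0]) / len(pool))   # 500 / 125250
```

Drawing uniformly from this pool is then equivalent to rank-proportional selection among the top 500.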

Crossover

To perform the crossover, we randomly pick two genotypes from the mating pool and then pick a random slice point in the genotype. In our example, since each genotype is an array of length 12, we pick a random number between 0 and 11. We keep the part of the first genotype to the left of the slice point and the part of the second genotype to the right of it, then splice the two parts together to generate the offspring.

https://gist.github.com/47178856e94832b15e786d330eeb4463
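A minimal single-point crossover along these lines is sketched here. The helper name is hypothetical, and the choice that the gene at the slice point comes from the second parent is an assumption.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Join the left part of parent_a to the right part of parent_b at a random point."""
    point = rng.randint(0, len(parent_a) - 1)  # 0..11 for length-12 genotypes
    return parent_a[:point] + parent_b[point:]


rng = random.Random(0)  # seeded only so the sketch is reproducible
a = [1] * 12
b = [2] * 12
child = crossover(a, b, rng)
print(child)
```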

Mutation

In order to maintain diversity, we cycle through each offspring array and mutate each integer with a fixed probability. Traditionally, the mutation probability is set low, usually below 5%, so that the algorithm keeps working toward higher fitness. However, since we're working with a probabilistic outcome as opposed to a discrete outcome, we raise the mutation probability to 20% to increase diversity. When a mutation is triggered, we either increment or decrement the integer by 1, each with 50% probability.
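The mutation step might be sketched like this. The 20% rate and the plus or minus 1 step come from the text; the helper name is hypothetical, and the sketch does not clamp values, so a gene could drift below zero or above the initial range.

```python
import random

MUTATION_RATE = 0.20  # from the text


def mutate(genotype, rng, rate=MUTATION_RATE):
    """Return a copy in which each gene moves by +1 or -1 with probability `rate`."""
    mutated = []
    for gene in genotype:
        if rng.random() < rate:
            gene += 1 if rng.random() < 0.5 else -1
        mutated.append(gene)
    return mutated


rng = random.Random(7)  # seeded only so the sketch is reproducible
original = [71, 20, 84, 226, 10, 3, 5, 9, 33, 1, 4, 50]
print(mutate(original, rng))
```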

Rinse / Repeat

Once we have our new pool of genotypes, we generate strategies from it, backtest and rank them, and breed the next pool. In this example we run three generations.
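The full cycle can be sketched as one function applied three times. Everything here is illustrative: `toy_fitness` is a placeholder for the backtest score, the pool and limit sizes are shrunk so the sketch runs quickly, and keeping the pool size constant between generations is an assumption.

```python
import random


def next_generation(pool, fitness, rng, limit=500, rate=0.20):
    """Rank the pool, build a rank-weighted mating pool, then cross over and mutate."""
    ranked = sorted(pool, key=fitness, reverse=True)[:limit]
    n = len(ranked)
    # The entry at rank r (1 = best) appears n - r + 1 times in the mating pool.
    mating = [g for rank, g in enumerate(ranked, start=1) for _ in range(n - rank + 1)]
    children = []
    for _ in range(len(pool)):  # assumption: pool size stays constant
        a, b = rng.choice(mating), rng.choice(mating)
        point = rng.randint(0, len(a) - 1)
        child = a[:point] + b[point:]
        child = [g + rng.choice((-1, 1)) if rng.random() < rate else g for g in child]
        children.append(child)
    return children


def toy_fitness(genotype):
    """Placeholder score; a real run would decode and backtest the strategy."""
    return sum(genotype)


rng = random.Random(1)  # seeded only so the sketch is reproducible
pool = [[rng.randint(0, 255) for _ in range(12)] for _ in range(200)]
for generation in range(3):
    pool = next_generation(pool, toy_fitness, rng, limit=50)
    print(generation + 1, max(toy_fitness(g) for g in pool))
```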

Results

The following table summarizes the results of this walkthrough when applied to two years of hourly Bitcoin data from different exchanges.

| Strategy | Kelly | Cum Pnl | W Kelly | Total Trades | Generation | Pct Winners |
| --- | --- | --- | --- | --- | --- | --- |
| stdevd:krakenUSD:200::abovema:btcnCNY:70 | 26.394604 | 0.79026383 | 0.72086895 | 574 | 1 | 0.6620209 |
| stdevd:krakenUSD:94::abovema:btceUSD:220 | 17.309988 | 0.5761709 | 0.5373664 | 649 | 1 | 0.6533128 |
| abovema:btcnCNY:250::stdevd:krakenUSD:87 | 17.297611 | 0.8965364 | 0.5966406 | 721 | 3 | 0.6518724 |
| abovema:coinbaseEUR:221::stdevd:krakenUSD:82 | 6.5052958 | 0.60363126 | 0.32287654 | 1041 | 3 | 0.6311239 |
| stdevd:krakenUSD:200::abovema:bitstampUSD:195 | 9.935446 | 0.6481855 | 0.41496494 | 875 | 3 | 0.63085717 |
| stdevd:krakenUSD:118::abovema:coinbaseUSD:227 | 6.5914254 | 0.54767185 | 0.3035028 | 964 | 2 | 0.6255187 |
| stdevd:krakenUSD:182 | 7.4401746 | 1.6357833 | 0.62182724 | 1658 | 3 | 0.6206273 |
| stdevd:krakenUSD:214 | 6.7178283 | 1.5198021 | 0.5525266 | 1629 | 1 | 0.62062615 |
| stdevd:krakenUSD:142 | 7.404855 | 1.675344 | 0.629923 | 1691 | 3 | 0.62034297 |

These results are raw: they haven't been tested for randomness and do not include trading costs.
