choiway/blog.md Secret

## blog.md

      
    Raw
  

              blog.md
            
          
    Using Genetic Algorithms To Search For Trading Signals

Wayne H. Choi

waynechoi@gmail.com
Motivation

When applying machine learning to financial markets most quantitative research focus on optimization algorithms for user defined features.  This approach raises two issues. One is that the user needs to have deep domain knowledge in order to generate and define features. The second, related issue occurs when the act of gaining domain knowledge prejudices the researcher about what features may or may not work. In this walkthrough, we consider an alternative approach where we start with the assumption that the researcher has no knowledge of what works but in the entire universe of trading strategies there is a strategy that does work. This assumption frames the problem as a search problem as opposed to an optimization problem. Now the question for the researcher is how to approximate the entire universe of trading strategies which may be both intellectually and computationally impossible to generate. For similar problems, genetic algorithms have been used with good results. The following is a walk through of using a genetic algorithm in conjunction with grammatical evolution to generate trading strategies and find the fittest one.
Solution Walkthrough


In biology a genotype is the set of genes an organism carries around while a phenotype is its observable characteristics.

The implementation of genetic algorithms generally start with a randomly generated pool of genotypes that are scored for fitness. Once scored, you generate a new pool of the fittest genotypes from the initial pool, pick random pairs from the pool, mate the picked pairs and mutate the mated genotype. Each time we run this process we create a new generation of genotypes that are fitter than the previous generation. The final result is a pool of genotypes that can be ranked by score where the highest ranked genotype is the best solution to the problem.
In order to use genetic algorithms we need to be able to map our genotypes to trading strategies. Here we use grammatical evolution to map our genotypes or program fragments that execute the backtest of the strategy. Grammatical evolution allows us to evolve trading strategies based on our specified grammar and domain knowledge.
The following is a walkthrough of how we take an array (genotype) like [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] to a trading strategy represented as strategy-a:AAPL:12::strategy-b:GOOG:256 (phenotype).
Generate genotypes

We randomly generate 10,000 genotypes as an array of integers with length 12. This initial pool is used to generate the first generation of phenotypes.
Generate Strategies (Phenotypes)

For this example we make the following assumptions
https://gist.github.com/639dfc30c76a6f1c6ab703d016f672c2
The target ticker is the security for which we are trying to find winning trading strategies for. Tickers symbols represent the pool of securities that we'll use in our trading strategies. Statements represent building blocks from which to generate our trading strategies. Strategy fragments represent our trading strategies with parameters separated by colons and braced between arrow braces. The genotype represents the sample array while <code> initializes our phenotype.
To generate a trade fragment we need four integers so our genotype of 12 integers can generate a strategy made up of 3 different trade fragments. To illustrate how we apply grammatical evolution we start with the first four integers of the assumed genotype or [70, 20, 84, 226].
Map Statement

https://gist.github.com/cd0c2573ffba0ae56f99209d2b692db4
We initialize our strategy phenotype with <code>. The rule for statement is that <code> can be replaced with either ::<code> or <stmnt>::<code> based on the first integer of our genotype. In this example, we take the first integer 71 and calculate 71 % 2 = 1 since length of statements is two. Since 71 % 2 leaves us with 1 we replace <code> with <stmnt>::<code> or the second item in the statements array.
The string, <stmnt>, is what we use to encode the actual strategy fragment. The string <stmnt>::<code> allows for multiple strategy fragments which would make up the overall strategy.
Map Strategy

https://gist.github.com/09d26ce6f8fc6c98169b332fe83113e8
If <stmnt> exists, we replace it with a strategy fragment which includes the code for the trading strategy, a ticker symbol for the security that the strategy will be applied to and a numeric parameter. In picking the strategy fragment, we take the integer in the genotype and take the modulus of the length of the strategies.
Map Ticker

https://gist.github.com/daba00db3ea0e73cd8671dc669208e1a
If <ticker> exists, we replace it with a ticker symbol from the ticker symbols list.
Map Parameter

https://gist.github.com/4eea10eeb271723c98a14fe207d10511
If <param> exists, we replace it with the value of the integer in the genotype. Here we just use the value of the integer. Since we are just using the value of the integer, careful consideration should be given to the range of integers from which we randomly pick.
Run trading strategies

For each phenotype, we decode the strategy and run a backtest. We'll use the results of the backtest as metadata to help us rank the fitness of the phenotype and the probability that it will mate for the next generation.
Rank Strategies

In determining how to rank the phenotypes our primary goal is to ensure that fitter phenotypes have a higher chance of mating. We chose to rank the phenotypes in rank order based on the percentage of winning trades. In generating the mating population we use the phenotypes rank as a proxy for fitness and then use that rank to determine each phenotypes population in the pool. By way of example, if we limit the potential phenotypes for the mating pool to the 500 fittest phenotypes, the mating pool will have 125,250 phenotypes of which 500 will be the fittest phenotype. As a result, the fittest phenotype will have a 0.04% chance of being selected when randomly selecting phenotypes for the next generation.
https://gist.github.com/d9b16c98c4ed28adf3705f9156a3c10f
Crossover

To perform the crossover we randomly pick two genotypes from the mating pool then pick a random slice point in the genotype. In our example, since each genotype is an array of length 12 we pick a random number between 0 and 11. We slice the genotype to the left of the slice point of the first genotype that was picked and slice to the right of the slice point of the second genotype we picked. We then splice the sliced genotypes to generate the offspring.
https://gist.github.com/47178856e94832b15e786d330eeb4463
Mutation

In order to maintain diversity, we cycle through each offspring array and probabilistically mutate each integer. A higher probability results in a higher probability of mutating that integer. Traditionally, in order to ensure that the algorithm is working toward higher fitness the probability of mutation is usually set low; less than 5%. However, since we're working with a probabilistic outcome as opposed to a discrete outcome we increase the probability of mutation to 20% to increase diversity. If a mutation is triggered we either increment the integer by 1 or decrement it by 1 based on a 50% probability.
Rinse / Repeat

Once we have our new pool, we generate strategies and create a new pool. In this example we run three generations.
Results

The following table summarizes the results of this walkthrough when applied to two years of hourly Bitcoin data from different exchanges.


Strategy
Kelly
Cum Pnl
W Kelly
Total Trades
Generation
Pct Winners


stdevd:krakenUSD:200::abovema:btcnCNY:70
26.394604
0.79026383
0.72086895
574
1
0.6620209


stdevd:krakenUSD:94::abovema:btceUSD:220
17.309988
0.5761709
0.5373664
649
1
0.6533128


abovema:btcnCNY:250::stdevd:krakenUSD:87
17.297611
0.8965364
0.5966406
721
3
0.6518724


abovema:coinbaseEUR:221::stdevd:krakenUSD:82
6.5052958
0.60363126
0.32287654
1041
3
0.6311239


stdevd:krakenUSD:200::abovema:bitstampUSD:195
9.935446
0.6481855
0.41496494
875
3
0.63085717


stdevd:krakenUSD:118::abovema:coinbaseUSD:227
6.5914254
0.54767185
0.3035028
964
2
0.6255187


stdevd:krakenUSD:182
7.4401746
1.6357833
0.62182724
1658
3
0.6206273


stdevd:krakenUSD:214
6.7178283
1.5198021
0.5525266
1629
1
0.62062615


stdevd:krakenUSD:142
7.404855
1.675344
0.629923
1691
3
0.62034297


These results are raw and haven't been tested for randomness and does not include trading costs.
Strategy	Kelly	Cum Pnl	W Kelly	Total Trades	Generation	Pct Winners
stdevd:krakenUSD:200::abovema:btcnCNY:70	26.394604	0.79026383	0.72086895	574	1	0.6620209
stdevd:krakenUSD:94::abovema:btceUSD:220	17.309988	0.5761709	0.5373664	649	1	0.6533128
abovema:btcnCNY:250::stdevd:krakenUSD:87	17.297611	0.8965364	0.5966406	721	3	0.6518724
abovema:coinbaseEUR:221::stdevd:krakenUSD:82	6.5052958	0.60363126	0.32287654	1041	3	0.6311239
stdevd:krakenUSD:200::abovema:bitstampUSD:195	9.935446	0.6481855	0.41496494	875	3	0.63085717
stdevd:krakenUSD:118::abovema:coinbaseUSD:227	6.5914254	0.54767185	0.3035028	964	2	0.6255187
stdevd:krakenUSD:182	7.4401746	1.6357833	0.62182724	1658	3	0.6206273
stdevd:krakenUSD:214	6.7178283	1.5198021	0.5525266	1629	1	0.62062615
stdevd:krakenUSD:142	7.404855	1.675344	0.629923	1691	3	0.62034297