BKKMLMEETUP: Q-Learning for Trading

Q-Learning for algorithmic trading

Q-Learning background

by Konpat

Q-Learning is a reinforcement learning algorithm. It does not require a model or a full understanding of the nature of its environment; it learns by trial and error, getting better over time, and has been proven to be asymptotically optimal.

  • you first need to understand the Markov Decision Process (MDP), which is a graph consisting of (states, actions, rewards), denoted {S}, {A}, {R}
  • State (S)
  • Action (A) is a function of state: A(S) => the set of available actions
  • Reward (R) is a function of state and action: R(S, A)
  • they can be non-deterministic, i.e. probability distributions!
  • that means taking an action might not always get you to the same destination state.
  • Q-value (Q) is the discounted expected reward for a given state-action pair.
  • By "discount" it means a reward counts for less the further away it is in the future ... this way you can "favor" the "faster" way.
  • Q-learning tries its best to predict this Q-value (a minimal update-rule sketch follows this list).
  • Upon finding the Q-value for every (state, action) pair you can easily get the "best known policy", but it is, most of the time, not the optimal policy.
  • It can deal with "uncertainty"
  • It can deal with delayed gratification.
  • It has been proven mathematically that if you let it learn long enough it will be able to predict the "true" Q-value and thus get the "true" best policy possible.
  • Normally, every reinforcement learning algorithm works on a "time" scale.
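
To make the update concrete, here is a minimal tabular Q-learning sketch. It assumes the OpenAI Gym environment convention (reset/step); alpha, gamma, and epsilon are illustrative values of mine, not from the talk.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)] -> discounted expected reward

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy: explore sometimes, otherwise act greedily
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = max(range(env.action_space.n),
                             key=lambda a: Q[(state, a)])

            next_state, reward, done, _ = env.step(action)

            # TD update: nudge Q toward reward + discounted best future value
            best_next = max(Q[(next_state, a)]
                            for a in range(env.action_space.n))
            Q[(state, action)] += alpha * (
                reward + gamma * best_next - Q[(state, action)])
            state = next_state

    return Q  # acting greedily w.r.t. Q gives the "best known policy"
```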

The talk begins ...

by David Samuel

Note: It was quite hard to follow the talk because of my limited English listening skills, so I missed a good deal of it.

History

  1. "outcry" trading pits ... phone by broker
  2. from...simple rules - simple trading simple algorithmic
  3. to...more complex low latency trading (high frequency trading, HFT) and machine learning and statisctical lagorithms

Basics of trading (bid vs ask)

Imagine the SET (Stock Exchange of Thailand) market if you are familiar with it. This part is reasonably easy to follow.

Algo vs human system trading

  1. algo = predict the next tick
  2. human = predict a strategy ! (much more complex plays)

Human and algorithmic trading can coexist because they operate on different time scales ... algos try to be "fast" and "high frequency" ... humans "cannot be as fast" but they can plan more for the long play.

Reinforcement learning examples

Note: a Q-learning summary is at the top of this note!

  1. https://github.com/karpathy/reinforcejs
  2. epsilon comes from the epsilon-greedy algorithm, which determines "how much you want your bot to commit to exploration"; this is part of the "exploration-exploitation" dilemma (it also appears in the Q-learning sketch above).
  3. As for training, you can let your bot learn in different environments.
  4. By stepping up the "hardness" of each environment, your bot will adapt faster to the harshest environment.
  5. For example, if you want your bot to learn the "stock going-up-and-down pattern", your best bet is not to throw in a lot of time series and let it learn, but to step up the complexity of the environment it lives in.
  6. You can start more humbly by letting it learn as follows (see the curriculum sketch after this list):
    1. fixed-length up-and-down patterns.
    2. Poisson up-and-down patterns (introducing some unexpected changes).
    3. introducing more unexpected patterns.
    4. finally, letting it learn with actual stock prices.
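
A small sketch of what such a curriculum of synthetic price series could look like, easiest stage first. The generator names and parameters are my own invention for illustration, not from the talk.

```python
import numpy as np

def fixed_pattern(n=200, period=10):
    # stage 1: deterministic up-and-down waves of fixed length
    steps = np.where(np.arange(n) % (2 * period) < period, 1, -1)
    return np.cumsum(steps)

def poisson_pattern(n=200, period=10, flip_rate=0.05, seed=None):
    # stage 2: the same waves with occasional unexpected reversals
    rng = np.random.default_rng(seed)
    steps = np.where(np.arange(n) % (2 * period) < period, 1, -1)
    flips = rng.random(n) < flip_rate
    return np.cumsum(np.where(flips, -steps, steps))

# stage 3 would add more irregular patterns; stage 4 is real stock prices
curriculum = [fixed_pattern(), poisson_pattern(seed=0)]
```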

Deep Reinforcement Learning (DQN)

  1. how do you define the loss function for the Q-learning algorithm? DeepMind's Atari paper shows us a way to do it (a sketch follows this list).
  2. define the topology of the neural network
  3. Read this: https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
  4. and the paper from DeepMind: https://arxiv.org/abs/1312.5602
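
A minimal sketch of that DQN-style loss: regress Q(s, a) toward the TD target r + gamma * max_a' Q(s', a') with squared error. The network shape, optimizer, and batch handling are my own illustrative assumptions, written with keras since the talk mentions it.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

n_state_features, n_actions, gamma = 4, 3, 0.95

# Q-network: maps a state vector to one Q-value per action
model = Sequential([
    Dense(32, activation='relu', input_shape=(n_state_features,)),
    Dense(n_actions, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')  # mean squared TD error

def fit_batch(states, actions, rewards, next_states, dones):
    # TD target: r + gamma * max_a' Q(s', a'); no bootstrapping at episode end
    q_next = model.predict(next_states).max(axis=1)
    targets = model.predict(states)  # keep other actions' values unchanged
    rows = np.arange(len(actions))
    targets[rows, actions] = rewards + gamma * q_next * (1 - dones)
    # only the taken action's Q-value carries a nonzero error
    model.fit(states, targets, epochs=1, verbose=0)
```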

For Trading (short, flat, long decisions)

  1. define the MDP
  2. Implementation side: using ReLU in this case; keras does the job!
  3. MDP: positions modeled as short <=> flat <=> long (see the sketch after this list)
  4. where to start? https://github.com/hackthemarket/gym-trading as a framework, let's say!
  5. OpenAI has a lot to offer for reinforcement learning: https://gym.openai.com/
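
To make the short <=> flat <=> long MDP concrete, here is a small sketch of how positions might map to per-tick rewards. The action encoding and the P&L-as-reward definition are my own illustrative choices, not from the talk or from gym-trading.

```python
import numpy as np

# Each action picks a target position; the reward is the P&L of
# holding that position over one price tick.
SHORT, FLAT, LONG = -1, 0, 1

def episode_rewards(prices, positions):
    """Per-step reward: position held times the next price change."""
    prices = np.asarray(prices, dtype=float)
    positions = np.asarray(positions[:len(prices) - 1])
    return positions * np.diff(prices)

# e.g. long while the price rises, short while it falls -> positive rewards
print(episode_rewards([100, 101, 103, 102], [LONG, LONG, SHORT]))
# [ 1.  2.  1.]
```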