BKKMLMEETUP: Q-Learning for Trading

Q-Learning for algorithmic trading

Q-Learning background

by Konpat

Q-Learning is a reinforcement learning algorithm. It does not require a model or a full understanding of the nature of its environment; it learns by trial and error, getting better over time, and has been proven to be asymptotically optimal.

  • you first need to understand the Markov Decision Process (MDP), which is a graph consisting of (states, actions, rewards), denoted {S}, {A}, {R}
  • State (S)
  • Action (A) is a function of state: A(S) => the set of available actions
  • Reward (R) is a function of state and action: R(S, A)
  • they can be non-deterministic, i.e. probability distributions!
  • that means taking an action might not always get you to the same destination state.
  • Q-value (Q) is the discounted expected reward for a given state-action pair.
  • By "discount" it means a reward counts for less the further away it is in the future ... this way you can "favor" the "faster" way.
  • Q-learning tries its best to predict this Q-value (a minimal update-rule sketch follows this list).
  • Upon finding the Q-value for every (state, action) pair you can easily get the "best known policy", but it is, most of the time, not the optimal policy.
  • It can deal with "uncertainty"
  • It can deal with delayed gratification.
  • It has been proven mathematically that if you let it learn long enough it will be able to predict the "true" Q-value and thus get the "true" best policy possible.
  • Normally, every reinforcement learning algorithm works on a "time" scale.
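
To make the update concrete, here is a minimal tabular Q-learning sketch. It assumes the OpenAI Gym environment convention (reset/step); alpha, gamma, and epsilon are illustrative values of mine, not from the talk.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)] -> discounted expected reward

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy: explore sometimes, otherwise act greedily
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = max(range(env.action_space.n),
                             key=lambda a: Q[(state, a)])

            next_state, reward, done, _ = env.step(action)

            # TD update: nudge Q toward reward + discounted best future value
            best_next = max(Q[(next_state, a)]
                            for a in range(env.action_space.n))
            Q[(state, action)] += alpha * (
                reward + gamma * best_next - Q[(state, action)])
            state = next_state

    return Q  # acting greedily w.r.t. Q gives the "best known policy"
```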

The talk begins ...

by David Samuel

Note: It was quite hard to follow the talk because of my limited English listening skills, so I missed a good deal of it.

History

  1. "outcry" trading pits ... phone by broker
  2. from...simple rules - simple trading simple algorithmic
  3. to...more complex low latency trading (high frequency trading, HFT) and machine learning and statisctical lagorithms

Basics of trading (bid vs ask)

Imagine the SET (Stock Exchange of Thailand) market if you are familiar with it. This part is reasonably easy to follow.

Algo vs human system trading

  1. algo = predict the next tick
  2. human = predict a strategy ! (much more complex plays)

Human and algorithmic trading can coexist because they operate on different time scales ... algos try to be "fast" and "high frequency" ... humans "cannot be as fast" but they can plan more for the long play.

Reinforcement learning examples

Note: a Q-learning summary is at the top of this note!

  1. https://github.com/karpathy/reinforcejs
  2. epsilon comes from the epsilon-greedy algorithm, which determines "how much you want your bot to commit to exploration"; this is part of the "exploration-exploitation" dilemma (it also appears in the Q-learning sketch above).
  3. As for training, you can let your bot learn in different environments.
  4. By stepping up the "hardness" of each environment, your bot will adapt faster to the harshest environment.
  5. For example, if you want your bot to learn the "stock going-up-and-down pattern", your best bet is not to throw in a lot of time series and let it learn, but to step up the complexity of the environment it lives in.
  6. You can start more humbly by letting it learn as follows (see the curriculum sketch after this list):
    1. fixed-length up-and-down patterns.
    2. Poisson up-and-down patterns (introducing some unexpected changes).
    3. introducing more unexpected patterns.
    4. finally, letting it learn with actual stock prices.
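
A small sketch of what such a curriculum of synthetic price series could look like, easiest stage first. The generator names and parameters are my own invention for illustration, not from the talk.

```python
import numpy as np

def fixed_pattern(n=200, period=10):
    # stage 1: deterministic up-and-down waves of fixed length
    steps = np.where(np.arange(n) % (2 * period) < period, 1, -1)
    return np.cumsum(steps)

def poisson_pattern(n=200, period=10, flip_rate=0.05, seed=None):
    # stage 2: the same waves with occasional unexpected reversals
    rng = np.random.default_rng(seed)
    steps = np.where(np.arange(n) % (2 * period) < period, 1, -1)
    flips = rng.random(n) < flip_rate
    return np.cumsum(np.where(flips, -steps, steps))

# stage 3 would add more irregular patterns; stage 4 is real stock prices
curriculum = [fixed_pattern(), poisson_pattern(seed=0)]
```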

Deep Reinforcement Learning (DQN)

  1. how do you define the loss function for the Q-learning algorithm? DeepMind's Atari paper shows us a way to do it (a sketch follows this list).
  2. define the topology of the neural network
  3. Read this: https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
  4. and the paper from DeepMind: https://arxiv.org/abs/1312.5602
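
A minimal sketch of that DQN-style loss: regress Q(s, a) toward the TD target r + gamma * max_a' Q(s', a') with squared error. The network shape, optimizer, and batch handling are my own illustrative assumptions, written with keras since the talk mentions it.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

n_state_features, n_actions, gamma = 4, 3, 0.95

# Q-network: maps a state vector to one Q-value per action
model = Sequential([
    Dense(32, activation='relu', input_shape=(n_state_features,)),
    Dense(n_actions, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')  # mean squared TD error

def fit_batch(states, actions, rewards, next_states, dones):
    # TD target: r + gamma * max_a' Q(s', a'); no bootstrapping at episode end
    q_next = model.predict(next_states).max(axis=1)
    targets = model.predict(states)  # keep other actions' values unchanged
    rows = np.arange(len(actions))
    targets[rows, actions] = rewards + gamma * q_next * (1 - dones)
    # only the taken action's Q-value carries a nonzero error
    model.fit(states, targets, epochs=1, verbose=0)
```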

For Trading (short, flat, long decisions)

  1. define the MDP
  2. Implementation side: using ReLU in this case; keras does the job!
  3. MDP: positions modeled as short <=> flat <=> long (see the sketch after this list)
  4. where to start? https://github.com/hackthemarket/gym-trading as a framework, let's say!
  5. OpenAI has a lot to offer for reinforcement learning: https://gym.openai.com/
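
To make the short <=> flat <=> long MDP concrete, here is a small sketch of how positions might map to per-tick rewards. The action encoding and the P&L-as-reward definition are my own illustrative choices, not from the talk or from gym-trading.

```python
import numpy as np

# Each action picks a target position; the reward is the P&L of
# holding that position over one price tick.
SHORT, FLAT, LONG = -1, 0, 1

def episode_rewards(prices, positions):
    """Per-step reward: position held times the next price change."""
    prices = np.asarray(prices, dtype=float)
    positions = np.asarray(positions[:len(prices) - 1])
    return positions * np.diff(prices)

# e.g. long while the price rises, short while it falls -> positive rewards
print(episode_rewards([100, 101, 103, 102], [LONG, LONG, SHORT]))
# [ 1.  2.  1.]
```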