by Konpat
Q-Learning is a reinforcement learning algorithm. It is model-free: it does not need a model of its environment or a full understanding of how the environment behaves. Instead it learns by trial and error and gets better over time. It has been proven to converge to the optimal action values asymptotically, provided every state-action pair keeps being visited and the learning rate decays appropriately.
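To make "learning by trial and error" concrete, here is a minimal sketch of tabular Q-Learning in Python. Everything in it is an assumption made for illustration rather than part of the original notes: the 5-state chain environment, the `step` and `q_learning` helpers, and the values chosen for `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration rate). The agent never looks inside `step`; it only sees the next state and the reward it gets back.

```python
import random

def step(state, action):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.
    Reaching state 4 gives a reward of 1 and ends the episode."""
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0
    done = next_state == 4
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, all zeros at first.
    q = [[0.0, 0.0] for _ in range(5)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random sometimes (and on ties),
            # otherwise take the action with the higher current estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Target: reward plus the discounted best estimate of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            # Move the old estimate a small step (alpha) toward the target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
# Greedy action for each non-terminal state (0 = left, 1 = right).
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy setting the learned table should end up preferring "move right" in every state, even though the agent was never told how the chain works.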
- First you need to understand the Markov Decision Process (MDP), which can be viewed as a graph built from states, actions, and rewards, denoted {S}, {A}, {R}
- State (S)