JayGwod/note1.md

## note1.md

      
Introduction to Reinforcement Learning

What is artificial intelligence?

Definition of intelligence: the ability to learn to make decisions to achieve goals
What is reinforcement learning?

People and animals learn by interacting with their environment
Reinforcement learning is based on the reward hypothesis:

Any goal can be formalized as the outcome of maximizing a cumulative reward

There are distinct reasons to learn:

Find solutions
Adapt online, deal with unforeseen circumstances

Agent and Environment

At each step $t$ the agent:

Receives observation $O_t$ (and reward $R_t$)
Executes action $A_t$


The environment:

Receives action $A_t$

Emits observation $O_{t+1}$ (and reward $R_{t+1}$)
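
The interaction loop described above can be sketched in a few lines of Python. This is only an illustration, not something from the lecture: `CoinFlipEnvironment`, `RandomAgent`, and `run_episode` are hypothetical names, and the convention that the first observation is `None` with a reward of `0.0` is an assumption made to get the loop started.

```python
import random


class CoinFlipEnvironment:
    """Hypothetical toy environment: the agent guesses a hidden coin (0 or 1)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.coin = self.rng.randint(0, 1)

    def step(self, action):
        # Receives action A_t, emits observation O_{t+1} and reward R_{t+1}.
        reward = 1.0 if action == self.coin else 0.0
        observation = self.coin  # reveal the coin the agent just guessed
        self.coin = self.rng.randint(0, 1)  # flip a new coin for the next step
        return observation, reward


class RandomAgent:
    """Hypothetical agent that ignores its inputs and acts uniformly at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, observation, reward):
        # Receives O_t (and R_t), executes A_t.
        return self.rng.randint(0, 1)


def run_episode(env, agent, num_steps):
    # Assumed start-up convention: no observation and zero reward at t = 0.
    observation, reward = None, 0.0
    trajectory = []
    for _ in range(num_steps):
        action = agent.act(observation, reward)
        trajectory.append((observation, reward, action))  # (O_t, R_t, A_t)
        observation, reward = env.step(action)            # (O_{t+1}, R_{t+1})
    return trajectory


trajectory = run_episode(CoinFlipEnvironment(), RandomAgent(), num_steps=5)
for observation, reward, action in trajectory:
    print(observation, reward, action)
```

Each tuple in `trajectory` holds what the agent saw and did at one step, so the list as a whole is a record of the interaction.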

Rewards


A reward $R_t$ is a scalar feedback signal
It indicates how well the agent is doing at step $t$, and so defines the goal
The agent's job is to maximize cumulative reward
$$ G_t = R_{t+1} + R_{t+2} + R_{t+3} + \dots $$
We call this the return
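
As a small worked example of the return, the sketch below computes $G_t$ for every step of a short reward sequence. It assumes the episode ends after finitely many steps, so the sum is finite, and it uses no discounting, matching the formula above. The function name `returns_per_step` and the reward values are made up for illustration.

```python
def returns_per_step(rewards):
    """Compute G_t for every t, where rewards[k] holds R_{k+1}.

    Assumes a finite episode, so G_t = R_{t+1} + ... + R_T.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the sum already computed for G_{t+1}.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        returns[t] = running
    return returns


rewards = [1.0, 0.0, 2.0]          # R_1, R_2, R_3 (illustrative values)
print(returns_per_step(rewards))   # [3.0, 2.0, 2.0], i.e. G_0, G_1, G_2
```

Working backwards reflects the identity $G_t = R_{t+1} + G_{t+1}$, which follows directly from the definition of the return.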


Agent State


The history is the full sequence of observations, actions, and rewards
This history is used to construct the agent state $S_t$
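
One way to picture the agent state is as a function applied to the history. The sketch below is a hypothetical choice of that function, not one given in the notes: it keeps only the `k` most recent observations. The `Step` record and the name `last_k_observations_state` are assumptions for illustration.

```python
from collections import namedtuple

# One entry of the history: what the agent observed, was rewarded, and did.
Step = namedtuple("Step", ["observation", "reward", "action"])


def last_k_observations_state(history, k=2):
    """Hypothetical state function: S_t is the k most recent observations."""
    return tuple(step.observation for step in history[-k:])


# Illustrative history of three steps.
history = [
    Step(observation=0, reward=0.0, action=1),
    Step(observation=1, reward=1.0, action=0),
    Step(observation=1, reward=0.0, action=1),
]

state = last_k_observations_state(history)
print(state)  # (1, 1)
```

Other state functions are equally possible, for example the full history itself or a running summary. The only requirement shown here is that the state is computed from the history.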


Full observability

Suppose the agent sees the full environment state

observation = environment state
The agent state could just be this observation:
$$ S_t = O_t = \text{environment state} $$