# JayGwod/note1.md

Last active April 22, 2022 13:37
[DeepMind x UCL RL Lecture Series] #ReinforcementLearning, #RL

# Introduction to Reinforcement Learning

## What is artificial intelligence?

Definition of intelligence: To be able to learn to make decisions to achieve goals

## What is reinforcement learning?

People and animals learn by interacting with their environment

Reinforcement learning is based on the reward hypothesis:

Any goal can be formalized as the outcome of maximizing a cumulative reward

There are distinct reasons to learn:

1. Find solutions
2. Adapt online, deal with unforeseen circumstances

## Agent and Environment

At each step \$t\$ the agent:

• Receives observation \$O_t\$ (and reward \$R_t\$)
• Executes action \$A_t\$

The environment:

• Emits observation \$O_{t+1}\$ (and reward \$R_{t+1}\$)
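
The interaction loop above can be sketched in a few lines of Python. This is only an illustration: `ParityEnv`, `run`, and the `reset`/`step` method names are hypothetical and not taken from the lecture. The toy environment emits a step counter as its observation and gives reward 1.0 when the action equals the counter's parity.

```python
class ParityEnv:
    """Hypothetical toy environment, for illustration only."""

    def __init__(self):
        self.counter = 0

    def reset(self):
        # Start a new interaction and emit the first observation.
        self.counter = 0
        return self.counter

    def step(self, action):
        # Reward the action chosen for the current observation,
        # then emit the next observation together with that reward.
        reward = 1.0 if action == self.counter % 2 else 0.0
        self.counter += 1
        return self.counter, reward


def run(env, policy, num_steps):
    """Run the agent-environment loop and record (observation, action, reward)."""
    observation = env.reset()
    trajectory = []
    for _ in range(num_steps):
        action = policy(observation)                  # agent executes an action
        next_observation, reward = env.step(action)   # environment responds
        trajectory.append((observation, action, reward))
        observation = next_observation
    return trajectory
```

For example, `run(ParityEnv(), lambda o: o % 2, 3)` records three steps in which the policy always matches the parity and so always receives reward 1.0.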

## Rewards

• A reward \$R_t\$ is a scalar feedback signal
• Indicates how well the agent is doing at step \$t\$ -- defines the goal
• The agent's job is to maximize cumulative reward \$\$ G_t = R_{t+1} + R_{t+2} + R_{t+3} + \dots \$\$
• We call this the return
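
As a small worked example of the return, the sketch below sums the rewards of a finite episode. The function names and the list layout (`rewards[k]` holds \$R_{k+1}\$) are assumptions made for illustration. The second function uses the recursion \$G_t = R_{t+1} + G_{t+1}\$, which follows directly from the definition.

```python
def undiscounted_return(rewards, t=0):
    """Return G_t for a finite episode, where rewards[k] holds R_{k+1}."""
    return sum(rewards[t:])


def all_returns(rewards):
    """Compute G_t for every t in one backward pass using G_t = R_{t+1} + G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for k in reversed(range(len(rewards))):
        running += rewards[k]
        returns[k] = running
    return returns
```

With rewards `[1.0, 0.0, 2.0]`, the return from the first step is 3.0, and the returns from each step in turn are `[3.0, 2.0, 2.0]`.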

## Agent State

• The history is the full sequence of observations, actions, rewards
• This history is used to construct the agent state \$S_t\$
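
One way to picture this is a function that maps the history to a state. The sketch below is a hypothetical choice, not the lecture's method: it keeps the `k` most recent observations as the agent state. The `Step` record and the function name are likewise assumptions.

```python
from collections import namedtuple

# One entry of the history: the action taken, then the reward and
# observation received in response.
Step = namedtuple("Step", ["action", "reward", "observation"])


def last_k_observations(history, k=2):
    """Hypothetical state function: the agent state is the k most recent observations."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return tuple(step.observation for step in history[-k:])


history = [
    Step(action=None, reward=None, observation="o0"),  # first observation, no action yet
    Step(action="a0", reward=0.0, observation="o1"),
    Step(action="a1", reward=1.0, observation="o2"),
]
state = last_k_observations(history, k=2)
```

Here `state` is `("o1", "o2")`. Other state functions are possible; the only requirement in this sketch is that the state is computed from the history.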

## Full observability

Suppose the agent sees the full environment state

• observation = environment state
• The agent state could just be this observation: \$\$ S_t = O_t = \text{environment state} \$\$