@JayGwod
Last active April 22, 2022 13:37
[DeepMind x UCL RL Lecture Series] #reinforcement-learning, #RL

Introduction to Reinforcement Learning

What is artificial intelligence?

Definition of intelligence: To be able to learn to make decisions to achieve goals

What is reinforcement learning?

People and animals learn by interacting with their environment

Reinforcement learning is based on the reward hypothesis:

Any goal can be formalized as the outcome of maximizing a cumulative reward

There are distinct reasons to learn:

  1. Find solutions
  2. Adapt online, deal with unforeseen circumstances

Agent and Environment

At each step $t$ the agent:

  • Receives observation $O_t$ (and reward $R_t$)
  • Executes action $A_t$

The environment:

  • Receives action $A_t$
  • Emits observation $O_{t+1}$ (and reward $R_{t+1}$)
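
The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative sketch: `CountingEnvironment`, `ParityAgent`, and `run_episode` are hypothetical names invented here, and the toy reward rule (reward 1.0 when the action matches the parity of a step counter) is an assumption, not something from the lecture.

```python
class CountingEnvironment:
    """Hypothetical toy environment: the observation is a step counter,
    and the reward is 1.0 when the action equals the counter's parity."""

    def __init__(self):
        self.counter = 0

    def reset(self):
        self.counter = 0
        return self.counter  # initial observation

    def step(self, action):
        # Receives A_t, then emits O_{t+1} and R_{t+1}.
        reward = 1.0 if action == self.counter % 2 else 0.0
        self.counter += 1
        return self.counter, reward


class ParityAgent:
    """Hypothetical agent that acts on the latest observation only."""

    def act(self, observation):
        return observation % 2


def run_episode(env, agent, num_steps):
    observation = env.reset()
    rewards = []
    for _ in range(num_steps):
        action = agent.act(observation)         # agent executes A_t
        observation, reward = env.step(action)  # environment emits O_{t+1}, R_{t+1}
        rewards.append(reward)
    return rewards


rewards = run_episode(CountingEnvironment(), ParityAgent(), num_steps=5)
print(rewards)  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

The agent here happens to pick the rewarded action at every step, so each of the five rewards is 1.0; a learning agent would have to discover such a rule from the reward signal.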

Rewards

  • A reward $R_t$ is a scalar feedback signal
  • It indicates how well the agent is doing at step $t$, and so defines the goal
  • The agent's job is to maximize cumulative reward $$ G_t = R_{t+1} + R_{t+2} + R_{t+3} + \dots $$
  • We call this the return
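
As a small worked example of the return, the sketch below computes $G_t$ for every step of a short reward sequence. It assumes a finite episode and no discounting (the sum in the notes is open-ended), and `returns_from_each_step` is a hypothetical helper name. It uses the recursion $G_t = R_{t+1} + G_{t+1}$, which follows directly from the definition of the sum.

```python
def returns_from_each_step(rewards):
    """rewards[k] holds R_{k+1}; the result's entry k is G_k.

    Works backwards using G_t = R_{t+1} + G_{t+1}, with the return
    after the final reward taken to be 0 (finite-episode assumption).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for k in reversed(range(len(rewards))):
        running += rewards[k]
        returns[k] = running
    return returns


print(returns_from_each_step([1.0, 0.0, 2.0]))  # [3.0, 2.0, 2.0]
```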

Agent State

  • The history is the full sequence of observations, actions, and rewards
  • This history is used to construct the agent state $S_t$
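
One way to picture the step from history to agent state is a function that compresses the history into something smaller. The sketch below is an assumption for illustration only: the `Step` record and the "keep the last k observations" rule are hypothetical choices, not the construction used in the lecture.

```python
from collections import namedtuple

# One entry of the history: what was observed, done, and received at a step.
Step = namedtuple("Step", ["observation", "action", "reward"])


def agent_state_last_k(history, k=2):
    """Hypothetical state construction: keep only the last k observations."""
    return tuple(step.observation for step in history[-k:])


history = [Step(0, 1, 0.0), Step(1, 0, 1.0), Step(2, 1, 0.0)]
state = agent_state_last_k(history)
print(state)  # (1, 2)
```

With `k=1` this reduces to using the latest observation as the state, which is sufficient only when that observation already carries all the information the agent needs.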

Full observability

Suppose the agent sees the full environment state

  • observation = environment state
  • The agent state could just be this observation: $$ S_t = O_t = \text{environment state} $$