@Paulescu
Last active January 10, 2022 16:43
import random


def train(n_episodes: int):
    """
    Pseudo-code of a Reinforcement Learning agent training loop
    """
    # Python object that wraps all environment logic. Typically you will
    # be using OpenAI gym here.
    env = load_env()

    # Python object that wraps all agent policy (or value function)
    # parameters, and action generation methods.
    agent = get_rl_agent()

    for episode in range(n_episodes):
        # random start of the environment
        state = env.reset()

        # epsilon is a parameter that controls the exploitation-exploration trade-off.
        # it is good practice to set a decaying value for epsilon
        epsilon = get_epsilon(episode)

        done = False
        while not done:
            if random.uniform(0, 1) < epsilon:
                # Explore action space
                action = env.action_space.sample()
            else:
                # Exploit learned values (or policy)
                action = agent.get_best_action(state)

            # environment transitions to next state and maybe rewards the agent.
            next_state, reward, done, info = env.step(action)

            # adjust agent parameters. We will see how later in the course.
            agent.update_parameters(state, action, reward, next_state)

            state = next_state
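
To make the pseudo-code concrete, here is a minimal runnable sketch of the same loop. It is not the course's actual agent: it swaps the generic agent for a tabular Q-learning update, and the names train_q_learning, get_epsilon, alpha, gamma, eps_min, and decay are hypothetical choices introduced here. It assumes the pre-0.26 OpenAI gym API (env.reset() returning an observation and env.step() returning a 4-tuple), matching the pseudo-code above.

import random

import numpy as np
import gym


def get_epsilon(episode: int, eps_min: float = 0.01, decay: float = 0.995) -> float:
    # Hypothetical schedule: exponential decay towards a floor, so early
    # episodes explore heavily and later ones mostly exploit.
    return max(eps_min, decay ** episode)


def train_q_learning(n_episodes: int, alpha: float = 0.1, gamma: float = 0.99):
    env = gym.make("FrozenLake-v1")

    # The "agent" here is just a table of action-value estimates,
    # one row per state and one column per action.
    q_table = np.zeros((env.observation_space.n, env.action_space.n))

    for episode in range(n_episodes):
        state = env.reset()
        epsilon = get_epsilon(episode)

        done = False
        while not done:
            if random.uniform(0, 1) < epsilon:
                # Explore action space
                action = env.action_space.sample()
            else:
                # Exploit learned values
                action = int(np.argmax(q_table[state]))

            next_state, reward, done, info = env.step(action)

            # One-step temporal-difference update of the action-value estimate.
            td_target = reward + gamma * np.max(q_table[next_state])
            q_table[state, action] += alpha * (td_target - q_table[state, action])

            state = next_state

    return q_table

Calling train_q_learning(10_000) returns the learned Q-table; greedily following np.argmax over its rows gives the resulting policy.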
@dmitry-kabanov

I am reading your hands-on series now. Looks great!

I think

    # epsilon is a parameter that controls the exploitation-exploration trade-off.
    # it is good practice to set a decaying value for epsilon

should be replaced with

    # epsilon is a parameter that controls the exploration-exploitation trade-off.
    # it is good practice to set a decaying value for epsilon

This will make it easier to understand, as it will be more consistent with the actual code in lines 28-33.
