@icyblade
Created January 9, 2017 08:00
Solution of FrozenLake-v0 using a Q-table
#! coding: utf8
# Tabular Q-learning solution of FrozenLake-v0.
import os

import gym
import numpy as np
from gym import wrappers

env = gym.make('FrozenLake-v0')
os.system('rm -rf /tmp/frozenlake_v0_q_table')  # clear previous monitor output
env = wrappers.Monitor(env, '/tmp/frozenlake_v0_q_table')

nb_epoch = 10000
Q = np.zeros([
    env.observation_space.n,
    env.action_space.n,
])  # zero initialization
lr = 0.7  # learning rate
gamma = 0.99  # discount factor

rewards = []
for epoch in range(nb_epoch):  # xrange in the original Python 2 code
    observation_previous = env.reset()
    r = 0
    while True:
        # Greedy action over Q plus decaying random noise for exploration.
        action = np.argmax(
            Q[observation_previous, :] +
            np.random.randn(1, env.action_space.n) * (1.0 / (epoch + 1))
        )
        observation, reward, done, info = env.step(action)
        # Temporal-difference error of the Q-learning update.
        td_error = (
            reward + gamma * np.max(Q[observation, :]) -
            Q[observation_previous, action]
        )
        Q[observation_previous, action] += lr * td_error
        r += reward
        observation_previous = observation
        if done:
            break
    rewards.append(r)
env.close()  # finalize the monitor once, after all episodes
print(np.mean(rewards))
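The core of the loop is the tabular Q-learning update, `Q[s, a] += lr * (reward + gamma * max(Q[s', :]) - Q[s, a])`. A minimal sketch of a single update step, using a hypothetical 2-state, 2-action MDP so it runs without gym:

```python
import numpy as np

lr = 0.7      # learning rate, same value as above
gamma = 0.99  # discount factor, same value as above

Q = np.zeros([2, 2])  # Q[state, action], zero-initialized

# Hypothetical transition for illustration: taking action 0 in state 0
# yields reward 1.0 and lands in state 1.
s, a, reward, s_next = 0, 0, 1.0, 1

# Temporal-difference error: observed one-step return minus current estimate.
td_error = reward + gamma * np.max(Q[s_next, :]) - Q[s, a]
Q[s, a] += lr * td_error

print(Q[0, 0])  # 0.7, i.e. lr * (1.0 + 0.99 * 0 - 0)
```

With a zero-initialized table the bootstrap term `gamma * max(Q[s', :])` contributes nothing at first, so early updates simply move each entry a fraction `lr` of the way toward the observed reward; later episodes propagate value backwards through the table.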
@MartinThoma

What is the effect of

env = wrappers.Monitor(env, '/tmp/frozenlake_v0_q_table')
