@rbrigden
Last active April 25, 2018 05:03
Template for solving question 4 of quiz 13
import numpy as np
import copy
# NOTE: action index a = 0 is a(+), a = 1 is a(-)
gamma = 0.9
action_space = 2
state_space = 4
eps = 1e-9
# Action-conditioned reward function, indexed as R[s, a]
R = np.array([[0, 0],
              [-1, -1],
              [-1, -1],
              [-1, -1]])
# Policy, indexed as pi[s, a]: probability of taking action a in state s
pi = np.array([[1./6, 5./6],
               [1./6, 5./6],
               [1./6, 5./6],
               [1./6, 5./6]])
# Values
V = np.zeros((state_space, 1))
# Action Values
Q = np.zeros((state_space, action_space))
# Transition function, indexed as T[a, s, s']: each row s is a
# distribution over next states s' under action a
T = np.zeros((action_space, state_space, state_space))
T[0, :, :] = np.array([[1, 0, 0, 0],
                       [0, 0, 1, 0],
                       [0, 0, 0, 1],
                       [0, 0, 1, 0]])
T[1, :, :] = np.array([[1, 0, 0, 0],
                       [1, 0, 0, 0],
                       [0, 1, 0, 0],
                       [0, 0, 1, 0]])
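
The template stops once the MDP is defined. One way to fill in the rest, offered as a minimal sketch and not as the intended quiz solution, is iterative policy evaluation with the Bellman expectation backup. The helper names `q_from_v` and `policy_evaluation` and the `max_iters` parameter are hypothetical. The sketch assumes `R[s, a]`, `pi[s, a]`, and `T[a, s, s']` are indexed as in the arrays above, and it keeps `V` as a 1-D vector instead of the `(state_space, 1)` column used in the template.

```python
import numpy as np


def q_from_v(V, R, T, gamma):
    """Action values implied by V: Q[s, a] = R[s, a] + gamma * sum_s' T[a, s, s'] * V[s']."""
    # T has shape (A, S, S) and V has shape (S,), so T @ V has shape (A, S).
    # Transpose to (S, A) to match the layout of R.
    return R + gamma * (T @ V).T


def policy_evaluation(R, T, pi, gamma, eps=1e-9, max_iters=10_000):
    """Iterate V <- sum_a pi[s, a] * Q[s, a] until the largest change is below eps."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        Q = q_from_v(V, R, T, gamma)
        V_new = (pi * Q).sum(axis=1)  # expectation over actions under pi
        converged = np.max(np.abs(V_new - V)) < eps
        V = V_new
        if converged:
            break
    return V
```

With the arrays defined above, `V = policy_evaluation(R, T, pi, gamma, eps)` followed by `Q = q_from_v(V, R, T, gamma)` would give the state and action values under `pi`; `V.reshape(-1, 1)` recovers the column shape used in the template if that is needed.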