@rbrigden rbrigden/quiz13.py
Last active Apr 25, 2018
Template for solving question 4 of quiz 13
import numpy as np
import copy
# NOTE: a = 1 is a(-), a = 0 is a(+)
gamma = 0.9
action_space = 2
state_space = 4
eps = 1e-9
# Action-conditioned reward function, indexed as R[state, action]
R = np.array([[0, 0],
              [-1, -1],
              [-1, -1],
              [-1, -1]])
# Policy, indexed as pi[state, action]; each row sums to 1
pi = np.array([[1./6, 5./6],
               [1./6, 5./6],
               [1./6, 5./6],
               [1./6, 5./6]])
# Values
V = np.zeros((state_space, 1))
# Action Values
Q = np.zeros((state_space, action_space))
# Transition function, indexed as T[action, state, next_state]; each row sums to 1
T = np.zeros((action_space, state_space, state_space))
T[0, :, :] = np.array([[1, 0, 0, 0],
                       [0, 0, 1, 0],
                       [0, 0, 0, 1],
                       [0, 0, 1, 0]])
T[1, :, :] = np.array([[1, 0, 0, 0],
                       [1, 0, 0, 0],
                       [0, 1, 0, 0],
                       [0, 0, 1, 0]])
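
The template stops after defining the MDP, so the following is one possible way to fill it in, not necessarily the solution the quiz intends. It is a minimal sketch of iterative policy evaluation, assuming `T[a, s, s']` is the probability of moving from state `s` to `s'` under action `a`, and that `eps` is meant as a convergence threshold. The function name `evaluate_policy` is hypothetical, and the arrays are repeated so the block runs on its own.

```python
import numpy as np

gamma = 0.9
eps = 1e-9  # assumed here to be the convergence threshold

# Same MDP as in the template above (a = 0 is a(+), a = 1 is a(-)).
R = np.array([[0, 0], [-1, -1], [-1, -1], [-1, -1]], dtype=float)
pi = np.tile(np.array([1. / 6, 5. / 6]), (4, 1))
T = np.zeros((2, 4, 4))
T[0] = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
T[1] = np.array([[1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 1, 0]])


def evaluate_policy(pi, R, T, gamma, eps):
    """Iterate the Bellman expectation backup until V changes by less than eps."""
    V = np.zeros((R.shape[0], 1))
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' T[a, s, s'] * V[s']
        # (T @ V) has shape (actions, states, 1); drop the last axis and transpose.
        Q = R + gamma * (T @ V)[:, :, 0].T
        # V[s] = sum_a pi[s, a] * Q[s, a]
        V_new = np.sum(pi * Q, axis=1, keepdims=True)
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, Q
        V = V_new


V, Q = evaluate_policy(pi, R, T, gamma, eps)
print(V.ravel())
print(Q)
```

Because `gamma < 1`, the backup is a contraction and the loop terminates. State 0 is absorbing with zero reward, so its value stays at 0, while the other states accumulate a discounted cost of -1 per step until state 0 is reached.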