@robsannaa
Created November 29, 2019 12:34
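```python
import numpy as np

# Assumes R (12x12 rewards), Q (12x12 Q-table), gamma and alpha are defined earlier.
for i in range(1000):
    current_state = np.random.randint(0, 12)  # random start state
    # Playable actions: transitions with positive reward in R (assumes at least one per row)
    playable_actions = []
    for j in range(12):
        if R[current_state, j] > 0:
            playable_actions.append(j)
    next_state = np.random.choice(playable_actions)  # explore at random
    # Temporal difference: reward + discounted best next value - current estimate
    TD = (R[current_state, next_state]
          + gamma * Q[next_state, np.argmax(Q[next_state])]
          - Q[current_state, next_state])
    Q[current_state, next_state] = Q[current_state, next_state] + alpha * TD
```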
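The loop above is a random-exploration Q-learning update. Below is a minimal self-contained sketch of the same update, assuming a hypothetical 4-state chain (states 0-1-2-3, with state 3 as the goal) instead of the gist's 12-state reward matrix, and assuming gamma = 0.75 and alpha = 0.9 (the gist does not show its values). The helpers `train_q` and `greedy_route` are hypothetical names. It uses `np.random.default_rng` for reproducibility rather than the legacy `np.random` functions, uses `Q[s2].max()` (equal to `Q[s2, np.argmax(Q[s2])]`), and skips states with no playable action instead of raising.

```python
import numpy as np

def train_q(R, gamma=0.75, alpha=0.9, episodes=1000, seed=0):
    """Random-exploration Q-learning (a sketch; gamma/alpha values are assumptions).

    R[s, s2] > 0 marks s2 as reachable from s, with reward R[s, s2].
    """
    rng = np.random.default_rng(seed)
    n = R.shape[0]
    Q = np.zeros_like(R, dtype=float)
    for _ in range(episodes):
        s = rng.integers(0, n)                 # random start state
        actions = np.flatnonzero(R[s] > 0)     # playable transitions
        if actions.size == 0:
            continue                           # dead end: nothing to update
        s2 = rng.choice(actions)               # explore uniformly at random
        td = R[s, s2] + gamma * Q[s2].max() - Q[s, s2]
        Q[s, s2] += alpha * td
    return Q

def greedy_route(Q, start, goal, max_steps=20):
    """Follow argmax Q from start (assumes reachable actions have positive Q)."""
    route = [start]
    while route[-1] != goal and len(route) <= max_steps:
        route.append(int(np.argmax(Q[route[-1]])))
    return route

# Hypothetical chain 0-1-2-3: neighbours get reward 1, entering the goal gets 1000.
R = np.zeros((4, 4))
for a, b in [(0, 1), (1, 2), (2, 3)]:
    R[a, b] = R[b, a] = 1
R[2, 3] = R[3, 3] = 1000  # reaching or staying at the goal (state 3)

Q = train_q(R)
print(greedy_route(Q, start=0, goal=3))
```

With these assumed values the Q-table settles near its fixed point (for example Q[3, 3] approaches 1000 / (1 - 0.75) = 4000), so the greedy route from state 0 should be `[0, 1, 2, 3]`.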