@MikeShi42
Created September 18, 2018 03:23
    for _ in range(5000):
        observations += [observation.tolist()]  # Record the observations for normalization and replay
        if done:  # If the simulation was over last iteration, exit loop
            break
        # Pick an action according to the policy matrix
        outcome = np.dot(policy, observation)
        action = 1 if outcome > 0 else 0
        # Make the action, record reward
        observation, reward, done, info = env.step(action)
        score += reward
    return score, observations
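
The fragment above relies on names defined outside it (`env`, `policy`, `observation`, `observations`, `done`, `score`). As a rough, dependency-free sketch of how the loop might sit inside a complete function, the block below wraps it in a hypothetical `run_episode` and drives it with a hypothetical `ToyEnv` stub. Both names, the `reset()` call, and the `max_steps` parameter are assumptions for illustration and are not part of the gist. `ToyEnv` only mimics the four-value `step` return used above and has no CartPole physics, and a plain-Python sum of products stands in for `np.dot` so the sketch runs without NumPy.

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (no real physics)."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.t = 0
        return [0.0, 0.0, 0.0, 1.0]

    def step(self, action):
        # Return (observation, reward, done, info), the same shape the loop unpacks.
        self.t += 1
        observation = [0.0, 0.0, 0.0, 1.0 if action == 1 else -1.0]
        done = self.t >= self.max_steps
        return observation, 1.0, done, {}


def run_episode(env, policy, max_steps=5000):
    """Run one episode with a linear policy; return (score, observations)."""
    observation = env.reset()  # Assumed initialization, not shown in the fragment
    done = False
    score = 0.0
    observations = []
    for _ in range(max_steps):
        # Record the observation before the done check, as the fragment does,
        # so the terminal observation is recorded too.
        observations.append(list(observation))
        if done:
            break
        # Dot product of policy weights and observation (stand-in for np.dot)
        outcome = sum(w * x for w, x in zip(policy, observation))
        action = 1 if outcome > 0 else 0
        observation, reward, done, info = env.step(action)
        score += reward
    return score, observations


score, observations = run_episode(ToyEnv(max_steps=10), [0.0, 0.0, 0.0, 1.0])
```

Because the observation is recorded before the `done` check, an episode that ends after n steps yields n + 1 recorded observations: the initial one plus one per step.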