@MikeShi42
Created September 18, 2018 02:50
CartPole Main.py Checkpoint 1
import gym
import numpy as np

env = gym.make('CartPole-v1')


def play(env, policy):
    observation = env.reset()
    done = False
    score = 0
    observations = []
    for _ in range(5000):
        observations += [observation.tolist()]  # Record the observations for normalization and replay
        if done:  # If the simulation was over last iteration, exit loop
            break
        # Pick an action according to the policy matrix
        outcome = np.dot(policy, observation)
        action = 1 if outcome > 0 else 0
        # Make the action, record reward
        observation, reward, done, info = env.step(action)
        score += reward
    return score, observations
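
The decision rule inside play() can be tried on its own, without running the simulator. The sketch below is illustrative only: the helper name choose_action, the hand-picked weights, and the sample observations are assumptions and are not part of the gist. It relies on the CartPole observation layout [cart position, cart velocity, pole angle, pole angular velocity], with action 0 pushing the cart left and action 1 pushing it right.

```python
import numpy as np


def choose_action(policy, observation):
    """Hypothetical helper: the same linear decision rule used in play()."""
    outcome = np.dot(policy, observation)
    return 1 if outcome > 0 else 0


# Hand-picked weights (an assumption, not from the gist): ignore the cart
# state and push toward the side the pole is leaning and rotating.
policy = np.array([0.0, 0.0, 1.0, 1.0])

# Made-up observations: [cart position, cart velocity, pole angle, pole angular velocity]
leaning_right = np.array([0.0, 0.0, 0.05, 0.1])
leaning_left = np.array([0.0, 0.0, -0.05, -0.1])

action_right = choose_action(policy, leaning_right)  # dot product is 0.15, so action 1
action_left = choose_action(policy, leaning_left)    # dot product is -0.15, so action 0
print(action_right, action_left)
```

Because the policy is just a weight vector with one entry per observation component, any candidate with four weights can be passed to play(env, policy) and compared by the score it returns.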