@tanzhenyu
Created August 23, 2019 15:50
ppo main loop
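# ppo() is defined elsewhere; it returns the trained model and its environment.
model, env = ppo()
obs = env.reset()
reward = 0  # running return of the current episode
while True:
    # The model expects a batch dimension, so reshape the observation to (1, obs_dim).
    action, _, _ = model.get_pi_logpi_vf(obs.reshape(1, -1))
    # Drop the batch dimension before passing the action to the environment.
    obs, r, d, _ = env.step(action.numpy()[0])
    reward += r
    env.render()
    if d:
        # Episode finished: report the return, then start a new episode.
        print('episode reward {}'.format(reward))
        reward = 0
        obs = env.reset()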
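
The loop above cannot be run on its own because `ppo()` is defined elsewhere. Below is a minimal sketch of the same rollout pattern that can be run in isolation. `StubEnv`, `StubModel`, and `run_episodes` are hypothetical stand-ins, not part of the original gist: the stub environment only mimics the `reset`/`step` calls and the 4-tuple return used above, and the stub model returns a fixed action. The sketch also stops after a fixed number of episodes instead of looping forever, and omits rendering.

```python
class StubEnv:
    """Hypothetical environment exposing reset() and a 4-tuple step()."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0]

    def step(self, action):
        self.t += 1
        obs = [float(self.t), 0.0]
        reward = 1.0  # constant reward, purely for illustration
        done = self.t >= self.episode_length
        return obs, reward, done, {}


class StubModel:
    """Hypothetical policy returning (actions, log-probs, values) per batch row."""

    def get_pi_logpi_vf(self, obs_batch):
        n = len(obs_batch)
        return [0] * n, [0.0] * n, [0.0] * n


def run_episodes(model, env, num_episodes):
    """Roll out the policy and return the list of per-episode returns."""
    returns = []
    obs = env.reset()
    episode_reward = 0.0
    while len(returns) < num_episodes:
        # Wrap the observation in a list to form a batch of size one.
        action, _, _ = model.get_pi_logpi_vf([obs])
        obs, r, done, _ = env.step(action[0])
        episode_reward += r
        if done:
            returns.append(episode_reward)
            episode_reward = 0.0
            obs = env.reset()
    return returns


print(run_episodes(StubModel(), StubEnv(episode_length=3), num_episodes=2))
# prints [3.0, 3.0]
```

Collecting the returns in a list, rather than only printing them, makes it easy to average over several evaluation episodes.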