
@tilarids
Created July 1, 2016 21:56

Vanilla policy gradients with a ValueFunction that estimates the value of a specific state (I use the current observation, the previous observation, and the previous action as the state). The same algorithm works fine without the ValueFunction if you don't stop learning at step 200 and instead keep training beyond it. However, OpenAI Gym's monitor ends the episode at step 200, so you can't use the monitor while training on episodes longer than 200 steps.
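Since the gist's code isn't reproduced here, the sketch below is only a minimal illustration of the approach the description outlines: REINFORCE with a learned value-function baseline, where the state fed to both the policy and the value function is the concatenation of the current observation, the previous observation, and the previous action. All names (`make_state`, `theta`, `w`, the learning rates) are hypothetical, and it assumes the classic 4-tuple `env.step` API that Gym used in 2016.

```python
# Hypothetical sketch: REINFORCE with a linear softmax policy and a linear
# value-function baseline over the augmented state described above.
import numpy as np
import gym

env = gym.make("CartPole-v0")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n
state_dim = 2 * obs_dim + 1                 # current obs + previous obs + previous action

theta = np.zeros((state_dim, n_actions))    # policy weights (softmax over linear scores)
w = np.zeros(state_dim)                     # value-function weights
alpha_pi, alpha_v, gamma = 0.01, 0.05, 0.99

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def make_state(obs, prev_obs, prev_action):
    # Augmented state: [current observation, previous observation, previous action].
    return np.concatenate([obs, prev_obs, [prev_action]])

for episode in range(2000):
    obs = env.reset()
    prev_obs, prev_action = np.zeros(obs_dim), 0
    states, actions, rewards = [], [], []
    done = False
    while not done:
        s = make_state(obs, prev_obs, prev_action)
        probs = softmax(theta.T @ s)
        a = np.random.choice(n_actions, p=probs)
        next_obs, r, done, _ = env.step(a)   # classic Gym API (4-tuple)
        states.append(s); actions.append(a); rewards.append(r)
        prev_obs, prev_action, obs = obs, a, next_obs

    # Discounted return for every timestep of the episode.
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    # Policy-gradient update with the value estimate as baseline;
    # the value function is regressed toward the observed returns.
    for s, a, G in zip(states, actions, returns):
        baseline = w @ s
        advantage = G - baseline
        probs = softmax(theta.T @ s)
        grad_log = -np.outer(s, probs)       # d log pi(a|s) / d theta, softmax policy
        grad_log[:, a] += s
        theta += alpha_pi * advantage * grad_log
        w += alpha_v * (G - baseline) * s
```

The baseline only reduces the variance of the gradient estimate; subtracting the value estimate from the return does not bias the policy gradient, which is consistent with the note above that the algorithm also works without the ValueFunction when episodes are allowed to run past 200 steps.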

Reproducing:
