@tilarids
Created August 30, 2016 02:17
TRPO (Trust Region Policy Optimization, described in http://arxiv.org/abs/1502.05477) with an additional neural network that predicts the value, which is used for advantage calculation.
More details and steps to reproduce: https://github.com/tilarids/reinforcement_learning_playground
Commit used to produce the result: https://github.com/tilarids/reinforcement_learning_playground/commit/df2b1c68735f31c6ed2b943a1e0309385b53cd0e
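
To illustrate how a value prediction can feed into the advantage calculation, here is a minimal sketch. It assumes the simplest estimator (discounted return minus predicted value); the gist does not say which estimator the linked commit uses. The function names, the `gamma` default, and the plain-list inputs are all hypothetical and chosen only for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted reward-to-go for each step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards, value_predictions, gamma=0.99):
    """Advantage = observed discounted return minus the predicted value.

    `value_predictions` stands in for the output of the value network
    (assumed here to be one number per time step).
    """
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, value_predictions)]
```

A positive advantage means the episode went better from that step than the value network expected, so the policy update favours the action taken there; a negative one does the opposite.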