@tilarids tilarids/README
Created Aug 30, 2016

TRPO (Trust Region Policy Optimization, described in http://arxiv.org/abs/1502.05477) with an additional neural network that predicts state values. The predicted values are used for advantage calculation.
More details and steps to reproduce: https://github.com/tilarids/reinforcement_learning_playground
Commit used to produce the result: https://github.com/tilarids/reinforcement_learning_playground/commit/df2b1c68735f31c6ed2b943a1e0309385b53cd0e
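
As a rough illustration of how a predicted value can enter the advantage calculation, here is a minimal sketch. It assumes the advantage is the discounted return minus the predicted value of the state; the linked repository may compute it differently. The function names, the discount factor gamma, and the plain-list inputs are hypothetical and not taken from the repository.

```python
def discounted_returns(rewards, gamma=0.99):
    """Discounted sum of future rewards at every timestep of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards, predicted_values, gamma=0.99):
    """Advantage estimate (assumed form): discounted return minus predicted value.

    predicted_values stands in for the output of the value network,
    one number per timestep.
    """
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, predicted_values)]


if __name__ == "__main__":
    # Toy episode: three steps with reward 1, value network predicting 1.0 each time.
    print(advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5))
```

A positive advantage means the episode did better from that state than the value network expected, so the policy update would make the action taken there more likely.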