tilarids/README

## README
TRPO (Trust Region Policy Optimization, described in http://arxiv.org/abs/1502.05477) with an additional neural network that predicts the value function. The predicted values are used to calculate advantages.
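
As an illustration of how a value prediction can feed into the advantage calculation, here is a minimal sketch in plain Python. It assumes the simplest estimator, the discounted reward-to-go minus the predicted value. The README does not say which estimator the repository uses, so the function names, the `gamma` default, and the estimator itself are assumptions made for illustration only.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted reward-to-go for every step of a single episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(
    rewards: Sequence[float],
    predicted_values: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Advantage estimate: empirical return minus the predicted value.

    `predicted_values` stands in for the output of the value network
    (hypothetical here; any per-step value estimate works).
    """
    if len(rewards) != len(predicted_values):
        raise ValueError("rewards and predicted_values must have equal length")
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, predicted_values)]


if __name__ == "__main__":
    # Toy episode: three steps with reward 1 and a constant value prediction.
    print(advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5))
```

A positive advantage means the episode did better from that step onward than the value network predicted, so the policy update makes the action taken there more likely.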

More details and steps to reproduce: https://github.com/tilarids/reinforcement_learning_playground
Commit used to produce the result: https://github.com/tilarids/reinforcement_learning_playground/commit/df2b1c68735f31c6ed2b943a1e0309385b53cd0e