wojzaremba/gist:0cac4286be1b8101cc75a3edd25a7d1c

## gistfile1.txt
It's TRPO with a neural network as the value function.

It takes the current observation, the previous observation, and the previous action as input.
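As a rough illustration of that input layout, here is a minimal sketch that concatenates the three pieces into one vector. It assumes discrete observations and actions encoded as one-hot vectors; the names `one_hot` and `build_input` and the sizes used are hypothetical and are not taken from the repository, which may encode its inputs differently.

```python
import numpy as np


def one_hot(index, size):
    """Return a float32 vector of length `size` with a 1 at `index`."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec


def build_input(obs, prev_obs, prev_action, n_obs, n_actions):
    """Concatenate current observation, previous observation, and previous action.

    Assumption: all three are integer ids, encoded here as one-hot vectors.
    """
    return np.concatenate([
        one_hot(obs, n_obs),
        one_hot(prev_obs, n_obs),
        one_hot(prev_action, n_actions),
    ])


# Hypothetical sizes: 4 possible observations, 3 possible actions.
x = build_input(obs=2, prev_obs=0, prev_action=1, n_obs=4, n_actions=3)
print(x.shape)  # (11,)
```

At the first step of an episode there is no previous observation or action, so some placeholder (for example a zero vector) would be needed; how the repository handles that case is not stated here.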

https://github.com/wojzaremba/trpo (commit_id a95620a26b45a930c0015f29cf4f53b9762f34b7)

Execute run.py to start 4 screen sessions that reproduce the results on: "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"
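For readers unfamiliar with `screen`, the sketch below shows one way to build a detached session per environment. It is not the repository's run.py: the training script name `main.py` and its `--env` flag are hypothetical placeholders, and only the `screen -dmS <name> <command>` form (start a detached, named session) is standard `screen` usage. The code only prints the commands and does not launch anything.

```python
import shlex

ENVS = ["Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"]


def screen_command(env_name, script="main.py"):
    """Build the argv for a detached screen session named after the environment.

    Assumption: `script` and the `--env` flag are placeholders, not the
    repository's actual entry point or options.
    """
    inner = f"python {script} --env {env_name}"
    # -d -m starts the session detached; -S gives it a name.
    return ["screen", "-dmS", env_name, "bash", "-c", inner]


commands = [screen_command(env) for env in ENVS]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually start the sessions, each argv list could be passed to `subprocess.run(cmd, check=True)`, and a running session can then be inspected with `screen -r <name>`.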