Wojciech Zaremba wojzaremba

## gist:3f901fcd50d7aa7a38b81aa16cbd899c
This repo implements recurrent neural network that optimizes TRPO loss function. Moreover, we use
a neural network as value function.

https://github.com/wojzaremba/trpo_rnn , commit_id da6fb44bd2980cd26dd057aff01f55a533a742fa

Execute run.py to start 4 sessions of screen that reproduce results on: "Copy-v0", "DuplicatedInput-v0",
"ReversedAddition-v0", "ReversedAddition3-v0"

## gist:0cac4286be1b8101cc75a3edd25a7d1c
It's TRPO with neural network as value function.

It takes a current observation, previous observation, and previous action as the input.

https://github.com/wojzaremba/trpo , commit_id a95620a26b45a930c0015f29cf4f53b9762f34b7

Execute run.py to start 4 sessions of screen that reproduce results on: "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"

## TRPO with prev
Init

## gist:b474cfa2fbc93ddf3f8d0ea2ff76f362
git : https://github.com/wojzaremba/trpo  commit : 6bb9fe32d5bb3413cd76e60518d49c58b2716ad1
is an implementation of TRPO with poor man memory. It concatenates last observation and state to the current observation.

It allows to solve tasks that require very short memory (e.g. reverse). Execute this script on 3 tasks Copy-v0,
DuplicatedInput-v0 and Reverse-v0) by calling:

python run.py

It starts 3 screen instances. Training takes ~1 min.

## gist:0edf6c8a9c292b32ba196c90f91126d7
Run https://github.com/wojzaremba/trpo/blob/master/main_copy.py

from commit : 1501754fc6e18615487fae87d2b6d58d47ca4c95

## gist:637876a7c3e59e2cd22aebe35e17a835
Run https://github.com/wojzaremba/trpo/blob/master/main_duplicated.py

from commit : 1501754fc6e18615487fae87d2b6d58d47ca4c95

## vanilla_trpo
Code is located at : https://github.com/wojzaremba/trpo . This solution is based on commit : 5d86623abeb5759de155f495789bbb4afd74aae5

It takes < 1 min. on CPU to get this results.

Just run:
python main.py
	This repo implements recurrent neural network that optimizes TRPO loss function. Moreover, we use
	a neural network as value function.

	https://github.com/wojzaremba/trpo_rnn , commit_id da6fb44bd2980cd26dd057aff01f55a533a742fa

	Execute run.py to start 4 sessions of screen that reproduce results on: "Copy-v0", "DuplicatedInput-v0",
	"ReversedAddition-v0", "ReversedAddition3-v0"
	It's TRPO with neural network as value function.

	It takes a current observation, previous observation, and previous action as the input.

	https://github.com/wojzaremba/trpo , commit_id a95620a26b45a930c0015f29cf4f53b9762f34b7

	Execute run.py to start 4 sessions of screen that reproduce results on: "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"
	git : https://github.com/wojzaremba/trpo commit : 6bb9fe32d5bb3413cd76e60518d49c58b2716ad1
	is an implementation of TRPO with poor man memory. It concatenates last observation and state to the current observation.

	It allows to solve tasks that require very short memory (e.g. reverse). Execute this script on 3 tasks Copy-v0,
	DuplicatedInput-v0 and Reverse-v0) by calling:

	python run.py

	It starts 3 screen instances. Training takes ~1 min.
	Run https://github.com/wojzaremba/trpo/blob/master/main_copy.py

	from commit : 1501754fc6e18615487fae87d2b6d58d47ca4c95
	Code is located at : https://github.com/wojzaremba/trpo . This solution is based on commit : 5d86623abeb5759de155f495789bbb4afd74aae5

	It takes < 1 min. on CPU to get this results.

	Just run:
	python main.py