
@wojzaremba
Created April 27, 2016 02:34
This repo implements a recurrent neural network that optimizes the TRPO loss function. Moreover, we use
a neural network as the value function.
https://github.com/wojzaremba/trpo_rnn , commit da6fb44bd2980cd26dd057aff01f55a533a742fa
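For context, TRPO maximizes an importance-weighted surrogate objective subject to a KL trust-region constraint. A minimal PyTorch sketch of what a recurrent policy plus that surrogate can look like (all class and function names below are illustrative assumptions, not taken from the repo):

```python
# Illustrative sketch only -- names (GRUPolicy, trpo_surrogate, ...) are
# assumptions, not identifiers from the trpo_rnn repo.
import torch
import torch.nn as nn

class GRUPolicy(nn.Module):
    """Recurrent policy: the GRU hidden state summarizes the observation history."""
    def __init__(self, n_obs, n_actions, hidden=64):
        super().__init__()
        self.cell = nn.GRUCell(n_obs, hidden)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):                  # obs_seq: (T, n_obs) one-hot floats
        h = torch.zeros(self.cell.hidden_size)
        logits = []
        for o in obs_seq:                        # step the RNN through the episode
            h = self.cell(o, h)
            logits.append(self.head(h))
        return torch.stack(logits)               # (T, n_actions)

def trpo_surrogate(logits, old_logp, actions, advantages):
    """Importance-weighted objective TRPO maximizes, subject (separately)
    to a mean-KL trust-region constraint between new and old policies."""
    logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
    ratio = torch.exp(logp - old_logp)           # pi_theta(a|h) / pi_theta_old(a|h)
    return (ratio * advantages).mean()
```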
Execute run.py to start four screen sessions that reproduce results on: "Copy-v0", "DuplicatedInput-v0",
"ReversedAddition-v0", "ReversedAddition3-v0".

joschu commented Apr 27, 2016

I reproduced the positive results on Copy-v0 and DuplicatedInput-v0 and the negative results on the other two.

rbrigden commented Sep 6, 2017

How do you represent your state? Given that the environment returns Discrete(N) observations, I assume you are concatenating these observations, perhaps bundling them with the most recent action? I am working on solving these environments and would appreciate learning about your approach and experimentation.
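For concreteness, here is the kind of encoding I have in mind (a common scheme, not necessarily what this repo does):

```python
# A common encoding (my assumption, not necessarily what this repo does):
# one-hot the current Discrete(N) observation and append a one-hot of the
# most recent action, letting the RNN carry the rest of the history.
import numpy as np

def encode(obs, prev_action, n_obs, n_actions):
    x = np.zeros(n_obs + n_actions, dtype=np.float32)
    x[obs] = 1.0                           # one-hot current observation
    if prev_action is not None:
        x[n_obs + prev_action] = 1.0       # one-hot previous action (None at t=0)
    return x
```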
