@wojzaremba
Created April 25, 2016 22:58
This is TRPO with a neural network as the value function.
The input is the current observation, the previous observation, and the previous action.
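The gist does not say how these three inputs are encoded. As a minimal sketch, assuming discrete observations and a tuple of discrete action components that are each one-hot encoded and concatenated (the helper names `one_hot` and `build_input` and the sizes in the usage note are hypothetical, not taken from the repository), the input vector could be assembled like this:

```python
def one_hot(index, size):
    """Return a list of `size` floats with a 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index %d out of range for size %d" % (index, size))
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_input(obs, prev_obs, prev_action, n_obs, action_sizes):
    """Concatenate one-hot encodings of the three inputs named in the gist.

    Assumption: `obs` and `prev_obs` are integers in [0, n_obs), and
    `prev_action` is a tuple of integers whose i-th component lies in
    [0, action_sizes[i]). The repository may encode its inputs differently.
    """
    features = one_hot(obs, n_obs) + one_hot(prev_obs, n_obs)
    for component, size in zip(prev_action, action_sizes):
        features += one_hot(component, size)
    return features
```

For example, `build_input(2, 0, (1, 0, 3), n_obs=6, action_sizes=(2, 2, 5))` yields a vector of length 6 + 6 + 2 + 2 + 5 = 21 with exactly five entries set to 1.0. One-hot encoding is only one option; a learned embedding of the indices would be another.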
Code: https://github.com/wojzaremba/trpo , commit a95620a26b45a930c0015f29cf4f53b9762f34b7
Run run.py to start four screen sessions that reproduce the results on "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", and "RepeatCopy-v0".
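For readers who prefer not to use screen, the four runs can also be started as ordinary child processes. The sketch below is not the repository's run.py; it assumes a hypothetical training script that accepts the environment id as its only positional argument, so the script path and argument layout would need to be adapted to the actual code.

```python
import subprocess
import sys

# Environment ids listed in the gist description.
ENVS = ["Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"]


def build_commands(train_script, envs=ENVS):
    """Build one argv list per environment.

    Assumption: `train_script` takes the environment id as its single
    positional argument. This CLI is hypothetical, not the repository's.
    """
    return [[sys.executable, train_script, env] for env in envs]


def run_all(commands):
    """Start every command in parallel and return the exit codes in order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```

Calling `run_all(build_commands("path/to/train_script.py"))` would launch the four runs side by side without touching any existing screen sessions; the path here is a placeholder.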

joschu commented Apr 27, 2016

Copy: reproduced
DuplicatedInput: reproduced
Reverse: stuck at return of 1.3
RepeatCopy: plateaus at return of 13

So I'll mark the first 2 as verified. Maybe there's some randomness affecting the last two?

Also, your file run.py did not work for me and it looks like a dangerous script to run because it mucks around with screen, whereas I'm working in screen on my machine. Can you provide a cleaner command for running all of your scripts?


rbrigden commented Sep 5, 2017

Can you describe how you preprocessed the input? Is the state/action index fed into an embedding?
