@wojzaremba
Created April 25, 2016 22:58
It's TRPO with a neural network as the value function.
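In TRPO, a learned value function is usually used as a baseline: advantages are estimated as discounted returns minus the value prediction, and the value network is then refit on the new returns. The gist does not describe the network or how it is trained, so the sketch below is a generic NumPy illustration under assumed choices (one tanh hidden layer of 32 units, full-batch gradient descent on mean squared error, Monte Carlo returns). `MLPValueFunction`, `discounted_returns`, and `advantages` are hypothetical names, and the original repository may fit its value network differently.

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for a single episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

class MLPValueFunction:
    """Hypothetical one-hidden-layer tanh network V(s), fit by plain
    full-batch gradient descent on 0.5 * mean((V(s) - G)^2)."""

    def __init__(self, obs_dim, hidden=32, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(obs_dim), size=(obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def predict(self, X):
        h = np.tanh(X @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).ravel()

    def fit(self, X, y, steps=500):
        n = X.shape[0]
        for _ in range(steps):
            h = np.tanh(X @ self.W1 + self.b1)            # (n, hidden)
            err = (h @ self.W2 + self.b2).ravel() - y     # (n,)
            g_out = (err / n)[:, None]                    # dLoss/dV, shape (n, 1)
            g_h = (g_out @ self.W2.T) * (1.0 - h ** 2)    # backprop through tanh
            self.W2 -= self.lr * (h.T @ g_out)
            self.b2 -= self.lr * g_out.sum(axis=0)
            self.W1 -= self.lr * (X.T @ g_h)
            self.b1 -= self.lr * g_h.sum(axis=0)
        return 0.5 * np.mean((self.predict(X) - y) ** 2)

def advantages(vf, X, rewards, gamma=0.99):
    """Advantage estimate A_t = G_t - V(s_t) using the current baseline."""
    returns = discounted_returns(rewards, gamma)
    return returns - vf.predict(X), returns

# Usage with synthetic data (4-dim inputs are a hypothetical stand-in).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
rewards = rng.normal(size=50)
vf = MLPValueFunction(obs_dim=4)
adv, returns = advantages(vf, X, rewards)       # use the baseline first
loss_before = 0.5 * np.mean((vf.predict(X) - returns) ** 2)
loss_after = vf.fit(X, returns)                 # then refit it on the returns
```

Computing advantages before refitting keeps the baseline independent of the returns it is subtracted from in the same batch, which is the usual ordering in TRPO-style training loops.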
The network takes the current observation, the previous observation, and the previous action as input.
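The gist does not say how these discrete inputs are encoded (this is exactly what the comment below asks). One common option is to one-hot encode each symbol and concatenate the pieces; a learned embedding is an alternative. The sketch below assumes one-hot encoding, a tuple-valued action with one discrete index per component, and all-zero blocks at the first step when no previous observation or action exists. `build_input` and the sizes in the usage example are hypothetical, not taken from the repository or from the environments' actual spaces.

```python
import numpy as np

def one_hot(index, size):
    v = np.zeros(size, dtype=np.float32)
    v[index] = 1.0
    return v

def build_input(obs, prev_obs, prev_action, obs_size, action_sizes):
    """Hypothetical encoder: concatenate one-hot(current obs),
    one-hot(previous obs), and a one-hot block per component of the
    previous (tuple-valued) action. None means "no previous step" and
    is encoded as all zeros."""
    parts = [one_hot(obs, obs_size)]
    parts.append(np.zeros(obs_size, dtype=np.float32) if prev_obs is None
                 else one_hot(prev_obs, obs_size))
    if prev_action is None:
        parts.append(np.zeros(sum(action_sizes), dtype=np.float32))
    else:
        for a, n in zip(prev_action, action_sizes):
            parts.append(one_hot(a, n))
    return np.concatenate(parts)

# Hypothetical sizes: 6 observation symbols, action = (2, 2, 5) choices.
x = build_input(3, 0, (1, 0, 4), obs_size=6, action_sizes=(2, 2, 5))
x0 = build_input(2, None, None, obs_size=6, action_sizes=(2, 2, 5))
```

Both calls produce vectors of the same length (6 + 6 + 2 + 2 + 5 = 21), so the first step of an episode can be fed to the same network as every later step.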
Code: https://github.com/wojzaremba/trpo (commit a95620a26b45a930c0015f29cf4f53b9762f34b7)
Run run.py to start four screen sessions that reproduce the results on "Copy-v0", "DuplicatedInput-v0", "Reverse-v0", and "RepeatCopy-v0".
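The contents of run.py are not shown here, so the following is only a guess at the general pattern: one detached, named GNU screen session per environment, each running a training script. The script name `main.py` and its `--env` flag are hypothetical placeholders, not the repository's actual interface. The `screen -dmS NAME ...` form (start detached, give the session a name) is standard screen usage. By default the sketch only prints the commands.

```python
import shlex
import subprocess

ENVS = ["Copy-v0", "DuplicatedInput-v0", "Reverse-v0", "RepeatCopy-v0"]

def screen_commands(envs, script="main.py"):
    """Build one detached, named screen session command per environment.
    `script` and its `--env` flag are hypothetical placeholders."""
    cmds = []
    for env in envs:
        inner = f"python {shlex.quote(script)} --env {shlex.quote(env)}"
        # -d -m: start the session detached; -S: name it after the env.
        cmds.append(["screen", "-dmS", env, "bash", "-c", inner])
    return cmds

def launch(envs, dry_run=True):
    """Print the commands (dry run) or actually start the sessions."""
    cmds = screen_commands(envs)
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds

cmds = launch(ENVS, dry_run=True)
```

Naming each session after its environment makes it easy to reattach to a specific run later with `screen -r Copy-v0`.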
@rbrigden

rbrigden commented Sep 5, 2017

Can you describe how you preprocessed the input? Is the state/action index fed into an embedding?