@sygi sygi/
Last active Aug 13, 2017

The solution is an adaptation of DQN (as described in the NIPS paper) to the Atari RAM state.

The code can be found at commit d8eeacb4b6b836c71c11231e9c2ba6e029f7eea1. It is based on Nathan Sprague's DQN implementation.
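To illustrate the core idea of running DQN directly on RAM: the Q-network maps the console's 128-byte RAM vector to one Q-value per action. The sketch below is a minimal, hypothetical numpy forward pass, not the actual `big_ram` architecture (layer sizes and initialization here are made up for illustration; the real network is defined in the paper and repository):

```python
import numpy as np

RAM_SIZE = 128    # the Atari 2600 exposes 128 bytes of RAM
N_ACTIONS = 18    # full Atari action set (game-dependent in practice)
HIDDEN = 128      # hypothetical hidden-layer width

rng = np.random.default_rng(0)

# Hypothetical two-layer Q-network weights; the real "big_ram"
# architecture differs (see the paper).
W1 = rng.normal(scale=0.1, size=(RAM_SIZE, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(ram_bytes):
    """Map a 128-byte RAM state to one Q-value per action."""
    x = ram_bytes.astype(np.float32) / 255.0  # scale bytes to [0, 1]
    h = np.maximum(0.0, x @ W1 + b1)          # ReLU hidden layer
    return h @ W2 + b2

ram = rng.integers(0, 256, size=RAM_SIZE)     # stand-in for an observed state
q = q_values(ram)
greedy_action = int(np.argmax(q))             # epsilon-greedy would add exploration
```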

To run the code you should execute:

python -e 100 --env-name GAME_NAME-ram-v0 --network-type big_ram (--results-path PATH_FOR_GYM_EVALUATIONS)

The network architecture and the other things we tried are explained in our paper. The differences in the results come from the following facts:

  • the Gym environment is a bit more challenging, with a stochastic number of steps for which each action is executed
  • in the paper we report the best average episode score within one epoch (of 10 000 steps), instead of the best average score over 100 episodes
  • Gym chooses a random seed, so there's some randomness in the process
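The second point is easy to misread, so here is a small sketch of the two scoring metrics side by side. It is purely illustrative: epochs are simplified to fixed episode counts rather than the paper's 10 000-step windows, and the function names are made up:

```python
import numpy as np

def best_epoch_average(scores, episodes_per_epoch):
    """Best mean episode score over non-overlapping epochs (simplified
    version of the paper's metric: real epochs are 10 000 steps)."""
    means = [np.mean(scores[i:i + episodes_per_epoch])
             for i in range(0, len(scores), episodes_per_epoch)]
    return max(means)

def best_100_episode_average(scores):
    """Best mean over any 100 consecutive episodes (the metric used
    for the Gym evaluations)."""
    if len(scores) < 100:
        return np.mean(scores)
    return max(np.mean(scores[i:i + 100]) for i in range(len(scores) - 99))

# Toy run with monotonically improving scores: the short-epoch metric
# picks out the best recent stretch, the 100-episode window averages
# over a much longer history.
scores = list(range(200))
```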

In this experiment I used a frameskip of 2 (meaning that one action is executed 4-8 times) for all of the games except Seaquest, where I set the frameskip to 10 (to simulate the frameskip of 30 we used in the paper at some point). All the remaining hyperparameters are set in the file.
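The "4-8 times" arithmetic follows from the stochastic repeats mentioned above: if the environment itself repeats each underlying action a random 2-4 frames, an agent-side frameskip multiplies that range. A quick simulation of this composition (the 2-4 range is an assumption about the environment's behavior; `effective_frames` is a made-up helper):

```python
import random

def effective_frames(agent_frameskip, trials=10_000, seed=0):
    """Simulate the total emulator frames per agent action when the
    environment repeats each underlying action a random 2-4 frames."""
    rng = random.Random(seed)
    samples = [sum(rng.randint(2, 4) for _ in range(agent_frameskip))
               for _ in range(trials)]
    return min(samples), max(samples)

lo, hi = effective_frames(2)        # frameskip 2  -> 4 to 8 frames
lo10, hi10 = effective_frames(10)   # frameskip 10 -> 20 to 40, ~30 on average
```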

I encourage you to play with the code: run it on the untested games, change the network architecture, hyperparameters, etc. I bet our results are far from optimal for DQN+RAM.
