@sygi /dqn-ram-v0.md
Last active Jun 13, 2016

The solution is an adaptation of DQN (as described in the NIPS paper) to the Atari RAM state.

The code can be found at https://github.com/sygi/deep_q_rl/tree/gym-only-ram, commit d8eeacb4b6b836c71c11231e9c2ba6e029f7eea1. It is based on Nathan Sprague's DQN implementation.

To run the code, execute:

python run_gym.py -e 100 --env-name GAME_NAME-ram-v0 --network-type big_ram (--results-path PATH_FOR_GYM_EVALUATIONS)
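For reference, the observation these -ram-v0 environments return is the raw 128-byte Atari 2600 RAM. A minimal sketch of inspecting it (the environment name below is only an example, not one of the games reported here):

```python
import gym

# Any Atari RAM environment works here; Breakout-ram-v0 is just an example.
env = gym.make("Breakout-ram-v0")
obs = env.reset()
print(obs.shape)  # (128,) -- the full 128-byte Atari 2600 RAM

done = False
while not done:
    action = env.action_space.sample()  # random policy, just to step the env
    obs, reward, done, info = env.step(action)
env.close()
```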

The network architecture and the other things we tried are explained in our paper. The differences in the results come from the following facts:

  • the Gym environment is a bit more challenging, with a stochastic number of frames for which each action is repeated
  • in the paper we report the best average episode score within one epoch (of 10 000 steps), instead of the best average over 100 episodes (see the sketch after this list)
  • the Gym chooses a random seed, so there is some randomness in the process
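To make the second point concrete, here is a rough sketch of the two metrics; the function names and the step-based grouping into epochs are only illustrative, not taken from the actual evaluation code:

```python
import numpy as np

def best_epoch_average(scores, lengths, steps_per_epoch=10000):
    # Paper-style metric: best average episode score within a single epoch,
    # where epochs are delimited by the cumulative number of environment steps.
    averages, epoch_scores, steps = [], [], 0
    for score, length in zip(scores, lengths):
        epoch_scores.append(score)
        steps += length
        if steps >= steps_per_epoch:
            averages.append(np.mean(epoch_scores))
            epoch_scores, steps = [], 0
    if epoch_scores:
        averages.append(np.mean(epoch_scores))
    return max(averages)

def best_100_episode_average(scores):
    # Gym-style metric: best average over any 100 consecutive episodes.
    return max(np.mean(scores[i:i + 100]) for i in range(len(scores) - 99))
```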

In this experiment I used a frameskip of 2 (meaning that one action is executed for 4-8 frames) for all of the games except Seaquest, where I set the frameskip to 10 (to simulate the frameskip of 30 we used in the paper at some point). All the remaining hyperparameters are set in the run_gym.py file.
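The frameskip here is applied on top of Gym's own action repeat: the -ram-v0 environments already repeat each action for a random 2-4 emulator frames, so an agent-level skip of 2 yields 4-8 frames per chosen action. A rough sketch of such a wrapper (names are illustrative, not the actual code from the repository):

```python
def step_with_skip(env, action, skip=2):
    # Repeat the chosen action `skip` times at the agent level and sum rewards.
    # With Gym's internal 2-4 frame repeat, skip=2 means 4-8 emulator frames
    # per decision; skip=10 roughly mimics the frameskip-30 setting from the paper.
    total_reward, done, info = 0.0, False, {}
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info
```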

I encourage you to play with the code, run it on the untested games, change the network architecture, hyperparameters, etc. I bet our results are far from optimal for DQN+RAM.
