The solution is an adaptation of DQN (as described in the NIPS paper) to the RAM state.
The code can be found at https://github.com/sygi/deep_q_rl/tree/gym-only-ram, commit d8eeacb4b6b836c71c11231e9c2ba6e029f7eea1. It is based on Nathan Sprague's DQN implementation.
To run the code, execute:
python run_gym.py -e 100 --env-name GAME_NAME-ram-v0 --network-type big_ram (--results-path PATH_FOR_GYM_EVALUATIONS)
The network architecture and the other things we've tried are explained in our paper. The differences in the results come from the following facts:
- the Gym environment is a bit more challenging: each action is executed for a stochastic number of steps
- in the paper we report the best average episode score within one epoch (of 10,000 steps), rather than the best average score over 100 episodes
- the Gym chooses a random seed, so there is some randomness in the process
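To illustrate the first point, the `-v0` Atari environments in Gym repeat each chosen action for a random number of frames rather than a fixed one. A minimal sketch of that behavior, assuming the repeat count is sampled uniformly from {2, 3, 4} (the `step_fn` here is a toy stand-in for a single-frame environment step, not part of the actual code):

```python
import random

def stochastic_step(step_fn, action, repeat_range=(2, 4)):
    """Repeat `action` a random number of times, roughly as Gym's -v0
    Atari environments do, and accumulate the reward along the way."""
    n_repeats = random.randint(*repeat_range)  # inclusive on both ends
    total_reward = 0.0
    for _ in range(n_repeats):
        total_reward += step_fn(action)
    return total_reward, n_repeats

# toy single-frame environment: every frame yields reward 1
reward, n = stochastic_step(lambda a: 1.0, action=0)
assert 2 <= n <= 4 and reward == float(n)
```

Because the agent cannot control how long an action lasts, the returns are noisier than with a deterministic frameskip.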
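The second point, the difference in the reported metric, can be sketched as follows. This is only an illustration of the idea: I'm assuming the 100-episode average is taken over a sliding window of consecutive episodes, which may differ in detail from how Gym actually scores a run.

```python
def best_100_episode_average(episode_scores):
    """Best average over any window of 100 consecutive episodes
    (a sketch of the Gym-style metric, not Gym's actual code)."""
    window = 100
    if len(episode_scores) < window:
        return sum(episode_scores) / len(episode_scores)
    return max(sum(episode_scores[i:i + window]) / window
               for i in range(len(episode_scores) - window + 1))

# toy run: the later episodes score higher, so the best window is the last one
scores = [1.0] * 50 + [3.0] * 100
assert best_100_episode_average(scores) == 3.0
```

Averaging over 100 whole episodes smooths out lucky runs, so it generally reads lower than the per-epoch averages reported in the paper.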
In this experiment I used a frameskip of 2 (meaning that one action is executed 4-8 times) for all of the games except Seaquest, where I set the frameskip to 10 (to simulate the frameskip of 30 we used in the paper at some point). All the remaining hyperparameters are set in the run_gym.py file.
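The arithmetic behind "one action is executed 4-8 times" can be sketched like this: an agent-level frameskip of 2 stacks on top of Gym's stochastic per-step repeat, which I'm assuming here is drawn uniformly from {2, 3, 4}.

```python
import random

def effective_repeats(agent_frameskip, gym_repeat_range=(2, 4)):
    """How many emulator frames one agent-level action spans when an
    agent frameskip is stacked on Gym's stochastic action repeat."""
    return sum(random.randint(*gym_repeat_range)   # Gym's random repeat
               for _ in range(agent_frameskip))    # per agent-level step

# with the agent frameskip of 2 used here, one action spans 4-8 frames
samples = [effective_repeats(2) for _ in range(1000)]
assert min(samples) >= 4 and max(samples) <= 8
```

By the same logic, the frameskip of 10 used for Seaquest gives roughly 20-40 emulator frames per action, around the frameskip of 30 from the paper.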
I encourage you to play with the code: run it on the untested games, change the network architecture, tweak the hyperparameters, etc. I bet our results are far from optimal for DQN+RAM.