The solution is an adaptation of DQN (as described in the NIPS paper) to the RAM state.
The code can be found at https://github.com/sygi/deep_q_rl/tree/gym-only-ram, commit d8eeacb4b6b836c71c11231e9c2ba6e029f7eea1. It is based on Nathan Sprague's DQN implementation.
To run the code, execute:
python run_gym.py -e 100 --env-name GAME_NAME-ram-v0 --network-type big_ram (--results-path PATH_FOR_GYM_EVALUATIONS)
The network architecture and the other things we've tried are explained in our paper. The differences in the results come from the following facts:
- the Gym environment is a bit more challenging: each action is executed for a stochastic number of steps
- in the paper we report the best average episode score within one epoch (of 10,000 steps), rather than the best average score over 100 episodes
- the Gym chooses a random seed, so there is some randomness in the process
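To illustrate the first point, the `-v0` Atari environments in Gym repeat each chosen action for a random number of frames rather than a fixed one. A minimal sketch of that behavior, assuming the repeat count is sampled uniformly from {2, 3, 4} (the `step_fn` here is a toy stand-in for a single-frame environment step, not part of the actual code):

```python
import random

def stochastic_step(step_fn, action, repeat_range=(2, 4)):
    """Repeat `action` a random number of times, roughly as Gym's -v0
    Atari environments do, and accumulate the reward along the way."""
    n_repeats = random.randint(*repeat_range)  # inclusive on both ends
    total_reward = 0.0
    for _ in range(n_repeats):
        total_reward += step_fn(action)
    return total_reward, n_repeats

# toy single-frame environment: every frame yields reward 1
reward, n = stochastic_step(lambda a: 1.0, action=0)
assert 2 <= n <= 4 and reward == float(n)
```

Because the agent cannot control how long an action lasts, the returns are noisier than with a deterministic frameskip.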
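The second point, the difference in the reported metric, can be sketched as follows. This is only an illustration of the idea: I'm assuming the 100-episode average is taken over a sliding window of consecutive episodes, which may differ in detail from how Gym actually scores a run.

```python
def best_100_episode_average(episode_scores):
    """Best average over any window of 100 consecutive episodes
    (a sketch of the Gym-style metric, not Gym's actual code)."""
    window = 100
    if len(episode_scores) < window:
        return sum(episode_scores) / len(episode_scores)
    return max(sum(episode_scores[i:i + window]) / window
               for i in range(len(episode_scores) - window + 1))

# toy run: the later episodes score higher, so the best window is the last one
scores = [1.0] * 50 + [3.0] * 100
assert best_100_episode_average(scores) == 3.0
```

Averaging over 100 whole episodes smooths out lucky runs, so it generally reads lower than the per-epoch averages reported in the paper.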
In this experiment I used a frameskip of 2 (meaning that one action is executed 4-8 times) for all of the games except Seaquest, where I set the frameskip to 10 (to simulate the frameskip of 30 we used in the paper at some point). All the remaining hyperparameters are set in the run_gym.py file.
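The arithmetic behind "one action is executed 4-8 times" can be sketched like this: an agent-level frameskip of 2 stacks on top of Gym's stochastic per-step repeat, which I'm assuming here is drawn uniformly from {2, 3, 4}.

```python
import random

def effective_repeats(agent_frameskip, gym_repeat_range=(2, 4)):
    """How many emulator frames one agent-level action spans when an
    agent frameskip is stacked on Gym's stochastic action repeat."""
    return sum(random.randint(*gym_repeat_range)   # Gym's random repeat
               for _ in range(agent_frameskip))    # per agent-level step

# with the agent frameskip of 2 used here, one action spans 4-8 frames
samples = [effective_repeats(2) for _ in range(1000)]
assert min(samples) >= 4 and max(samples) <= 8
```

By the same logic, the frameskip of 10 used for Seaquest gives roughly 20-40 emulator frames per action, around the frameskip of 30 from the paper.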
I encourage you to play with the code: run it on the untested games, change the network architecture, tweak the hyperparameters, etc. I bet our results are far from optimal for DQN+RAM.