The code used to obtain these results can be found at https://github.com/NervanaSystems/simple_dqn, commit 31a92a9.
This code runs with Neon commit 344372b. Training and test scripts are included in scripts.txt above.
Note that for training, the screen width and screen height must be specified as (40, 52).
Default training parameters are used as set in src/main.py. This model was trained for 77 epochs, which takes roughly 15 hours on a Titan X GPU.
- learning_rate=0.00025: Learning rate
- discount_rate=0.99: Discount rate for future rewards
- batch_size=32: Batch size for the neural network
- optimizer=rmsprop: Network optimization algorithm
- decay_rate=0.95: Decay rate for the RMSProp algorithm
- clip_error=1: Clip the error term in the update
- train_steps=250000: Training steps per epoch
- epochs=77: Number of epochs to run
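Assuming these parameter names correspond to command-line flags parsed in src/main.py (the flag spellings and the ROM path below are illustrative, not taken from the original submission), a training run with the settings above might look like:

```shell
# Hypothetical invocation of simple_dqn's training script with the
# parameters listed above. Flag names are assumed to match the
# argparse options in src/main.py; substitute the actual ROM file.
./train.sh roms/<game>.bin \
  --screen_width 40 --screen_height 52 \
  --learning_rate 0.00025 --discount_rate 0.99 \
  --batch_size 32 --optimizer rmsprop --decay_rate 0.95 \
  --clip_error 1 --train_steps 250000 --epochs 77
```

Only the screen dimensions differ from the defaults; the remaining flags simply make the defaults explicit.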
I finished reproducing this result (Gym link).
My machine has 16 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz cores and 64 GB of memory, running Ubuntu 14.04 (trusty) with a Titan Z GPU and NVIDIA driver 352.39. One epoch of training on this machine takes 27 minutes, so 77 epochs of training add up to roughly 35 hours. Here is my output:
After uploading to Gym we also get the standard deviation: the best 100-episode average reward was 73.65 ± 4.58.
This is slightly lower than the 86.95 ± 5.10 reported by the original submission.
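The statistic quoted here, the best 100-episode average reward with its standard deviation, can be sketched as below. This is a minimal re-implementation for illustration, assuming a plain Python list of per-episode rewards; the function name is hypothetical and not part of simple_dqn or Gym.

```python
import numpy as np

def best_100_episode_average(rewards):
    """Return (mean, std) of the best 100-episode window.

    `rewards` is a per-episode reward list from an evaluation run.
    This mirrors the "best 100-episode average reward" figure that
    the Gym scoreboard reported (hypothetical re-implementation).
    """
    best_mean, best_std = float("-inf"), 0.0
    for i in range(len(rewards) - 99):
        window = np.asarray(rewards[i:i + 100])
        if window.mean() > best_mean:
            best_mean, best_std = window.mean(), window.std()
    return best_mean, best_std
```

For example, if the last 100 episodes all scored 2.0, the function returns (2.0, 0.0) for that window regardless of earlier, weaker episodes.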
I also ran the code until epoch 98 instead of the reported 77. With that model checkpoint I obtained an average reward of 119.8 (submission here).