Generated using the DQN+CTS implementation from https://github.com/steveKapturowski/tensorflow-rl. Trained for 60M agent steps with 16 parallel agents, with each agent's final epsilon value sampled from {0.1, 0.5, 0.01} as in the asynchronous Q-learning method from "Asynchronous Methods for Deep Reinforcement Learning".
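For concreteness, here is a minimal sketch of that per-agent epsilon schedule, assuming the linear annealing described in the paper; the annealing horizon and the sampling probabilities (0.4, 0.3, 0.3 over final epsilons 0.1, 0.01, 0.5) are taken from the paper, not from this note, and are assumptions about this particular run:

```python
import numpy as np

# Sketch of the exploration schedule, assuming the setup from
# "Asynchronous Methods for Deep Reinforcement Learning": each of the
# 16 actors anneals epsilon from 1.0 down to a final value sampled from
# {0.1, 0.01, 0.5} with probabilities 0.4, 0.3, 0.3 (per the paper).
FINAL_EPSILONS = [0.1, 0.01, 0.5]
FINAL_EPSILON_PROBS = [0.4, 0.3, 0.3]
ANNEAL_STEPS = 1_000_000  # assumed annealing horizon, not stated above

def sample_final_epsilon(rng=np.random):
    """Draw the final epsilon value for one actor thread."""
    return rng.choice(FINAL_EPSILONS, p=FINAL_EPSILON_PROBS)

def epsilon_at(step, final_epsilon, start=1.0):
    """Linearly anneal epsilon from `start` to `final_epsilon`."""
    frac = min(step / ANNEAL_STEPS, 1.0)
    return start + frac * (final_epsilon - start)

# One final epsilon per agent thread.
final_epsilons = [sample_final_epsilon() for _ in range(16)]
```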
Evaluation was produced using an epsilon of 0.01, and Q-learning updates continued to be performed every 16 agent steps with a learning rate of 4e-7 to aid exploration. Since I neglected to save the RMSProp variables, performance was degraded relative to the mean scores of roughly 2200-2500 observed for the 0.01-epsilon agents during the main training phase.
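A rough sketch of that evaluation procedure follows: act epsilon-greedily with epsilon = 0.01 while still applying a Q-learning update every 16 agent steps at the small 4e-7 learning rate. The `env` and `agent` interfaces here are hypothetical stand-ins, not the actual tensorflow-rl classes:

```python
import numpy as np

# Evaluation with continued low-learning-rate updates, as described above.
# `env` follows a gym-style API; `agent.q_values` and `agent.train_step`
# are hypothetical placeholders for the real implementation's methods.
EVAL_EPSILON = 0.01
UPDATE_INTERVAL = 16   # perform a Q-learning update every 16 agent steps
LEARNING_RATE = 4e-7

def evaluate_episode(env, agent, rng=np.random):
    obs = env.reset()
    total_reward, step, done = 0.0, 0, False
    while not done:
        # Epsilon-greedy action selection at evaluation time.
        if rng.random() < EVAL_EPSILON:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(agent.q_values(obs)))
        next_obs, reward, done, _ = env.step(action)
        # Keep training slowly during evaluation, every 16th step.
        if step % UPDATE_INTERVAL == 0:
            agent.train_step(obs, action, reward, next_obs, done,
                             learning_rate=LEARNING_RATE)
        total_reward += reward
        obs = next_obs
        step += 1
    return total_reward
```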