Generated using the dqn+cts implementation from https://github.com/steveKapturowski/tensorflow-rl. Trained for 80M agent steps with 16 agents, with each agent's final epsilon value sampled from (.1, .5, .01) as in the asynchronous Q-learning method from Asynchronous Methods for Deep Reinforcement Learning.
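Concretely, the per-agent epsilon sampling looks roughly like the sketch below. This is only an illustration of the setup described above, not the repo's actual code, and it assumes uniform sampling since no mixing probabilities are stated:

```python
import random

# Candidate final epsilon values from the description above.
FINAL_EPSILON_CHOICES = (0.1, 0.5, 0.01)

# Each of the 16 actor threads draws its own final epsilon once at startup.
# Uniform sampling is an assumption here; the probabilities aren't stated.
final_epsilons = [random.choice(FINAL_EPSILON_CHOICES) for _ in range(16)]
```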
Double DQN parameters match Deep Reinforcement Learning with Double Q-learning, except for the replay buffer size and the per-thread epsilon annealing schedule, which were both set to 400k. Pseudocount/CTS density model parameters should match those of Unifying Count-Based Exploration and Intrinsic Motivation.
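As a rough illustration of that annealing schedule, here is a sketch of linear epsilon annealing over 400k steps toward a thread's sampled final value. The function and parameter names are hypothetical, not taken from the codebase:

```python
def annealed_epsilon(step, final_epsilon, initial_epsilon=1.0,
                     annealing_steps=400_000):
    """Linearly anneal epsilon from initial_epsilon down to this
    thread's sampled final_epsilon over annealing_steps agent steps,
    then hold it constant. Sketch only; names are hypothetical."""
    if step >= annealing_steps:
        return final_epsilon
    frac = step / annealing_steps
    return initial_epsilon + frac * (final_epsilon - initial_epsilon)
```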
Evaluation was performed with an epsilon of .01.
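For reference, standard epsilon-greedy action selection at that evaluation epsilon would look roughly like this (a sketch, not the repo's evaluation code):

```python
import numpy as np

rng = np.random.default_rng()

def select_eval_action(q_values, epsilon=0.01):
    # With probability epsilon, take a uniformly random action;
    # otherwise act greedily with respect to the Q-values.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```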
Hi there, I am hoping to reproduce this result with your code! In terms of the command to run, so far I've got:
```
python main.py MontezumaRevenge-v0 --alg_type dqn-cts -n 16 --epsilon_annealing_steps=400000 --replay_size=400000
```
Am I missing anything? For example, how can I set the final epsilon values to sample from, or is that already built into the codebase? Thanks!