Music Li pkumusic

Sample from the policy distribution during evaluation
Epsilon-greedy policy (epsilon = 0.05) during evaluation
Dueling DQN
Double DQN
Nature Deep Q-Network
Linear Double DQN
Linear DQN with memory and target network
Linear DQN without memory or target network
A3C with joint density model
4.77M steps
Score: 2500. Very stable at holding the score!
Outperforms A3C+ in the paper (142.5)
Same speed as the original A3C: fast, 11 iter/s.
Params: Nature,
ScheduledHyperParamSetter('learning_rate', [(80, 0.0003), (120, 0.0001)]),
ScheduledHyperParamSetter('entropy_beta', [(80, 0.005)]),
ScheduledHyperParamSetter('explore_factor', [(80, 2), (100, 3), (120, 4), (140, 5)]),
Could be improved by testing other params. (Next step)
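To make the schedules above easier to read, here is a minimal sketch of how a list of (epoch, value) pairs resolves to a value at a given epoch. It assumes step-wise behaviour (each value holds from its epoch until the next listed one, with no interpolation). The helper name `value_at` and the `initial` argument are hypothetical and not part of any library; the notes do not state the initial values, so they are left as `None` here.

```python
from bisect import bisect_right


def value_at(schedule, epoch, initial=None):
    """Return the value in effect at `epoch` for a list of (epoch, value) pairs.

    Assumption: step-wise schedule, sorted by epoch. Before the first listed
    epoch the (unknown) initial value is returned unchanged.
    """
    epochs = [e for e, _ in schedule]
    # Number of schedule entries whose epoch is <= the queried epoch.
    idx = bisect_right(epochs, epoch)
    return initial if idx == 0 else schedule[idx - 1][1]


# Schedules copied from the notes above.
learning_rate = [(80, 0.0003), (120, 0.0001)]
entropy_beta = [(80, 0.005)]
explore_factor = [(80, 2), (100, 3), (120, 4), (140, 5)]
```

For example, `value_at(explore_factor, 130)` looks up the entry for epoch 120, and `value_at(learning_rate, 79)` falls back to `initial` because no entry has been reached yet.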
O-DDQN- Object image swap input
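The two evaluation-time policies named at the top of this list differ only in how an action is picked from the network output. The following is a rough sketch of both, not the code from the gists themselves; the function names and the plain-list inputs are assumptions made for illustration.

```python
import random


def epsilon_greedy_action(q_values, epsilon=0.05, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sample_from_policy(probs, rng=random):
    """Sample an action index in proportion to the policy's action probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes an evaluation run reproducible.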