Used normalized advantage functions (NAF) from this paper:
Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
http://arxiv.org/abs/1603.00748
Refer to code for hyperparameters.
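NAF makes the greedy action available in closed form by forcing the advantage term to be quadratic in the action. A minimal NumPy sketch of that decomposition (names are illustrative, not the ones used in the code):

import numpy as np

def naf_q(v, mu, L, action):
    # Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)), where
    # P = L L^T (kept positive definite in the paper by exponentiating
    # L's diagonal), so argmax_a Q(s, a) = mu(s) by construction.
    P = L @ L.T
    diff = action - mu
    return v - 0.5 * diff @ P @ diff

The network outputs v, mu, and the lower-triangular entries of L for each state; exploration then just adds noise around mu.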
Used dueling network architecture with Q-learning, as outlined in this paper:
Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581
Refer to code for hyperparameter values.
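The dueling head recombines a state-value stream and an advantage stream into Q-values; a minimal NumPy sketch of the aggregation (equation (9) in the paper), not the training code:

import numpy as np

def dueling_q(v, advantages):
    # Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')); subtracting the
    # mean advantage keeps the V and A streams identifiable.
    return v + advantages - advantages.mean()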
Used dueling network architecture with Q-learning, as outlined in this paper:
Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581
Command line:
python duel.py CartPole-v0 --gamma 0.995
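The --gamma flag sets the discount factor in the Q-learning target. A sketch of the standard one-step target it enters into (an assumption about duel.py's internals, not its actual code):

def q_target(reward, next_q_values, done, gamma=0.995):
    # y = r + gamma * max_a' Q(s', a'); no bootstrap on terminal steps.
    return reward + (0.0 if done else gamma * max(next_q_values))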
./train.sh Pong-v0 --environment gym
Check results/Pong-v0.csv for the best-performing epoch (in my case it was 81).
./test_gym.sh snapshots/Pong-v0_81.pkl (replace 81 with your best epoch)
./upload_gym.sh results/Pong-v0 --api_key <your_key> to upload the results.
The Simple DQN implementation uses the network architecture and hyperparameters from the DeepMind Nature paper.
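Picking the best epoch can be scripted; a sketch assuming the results CSV has epoch and average_reward columns (the actual column names may differ, check the file's header):

import csv

def best_epoch(path="results/Pong-v0.csv", score_col="average_reward"):
    # Column names here are assumptions about the Simple DQN CSV format.
    with open(path) as f:
        rows = list(csv.DictReader(f))
    return int(max(rows, key=lambda row: float(row[score_col]))["epoch"])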
Used normalized advantage functions (NAF) from this paper:
Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
http://arxiv.org/abs/1603.00748
The command line used was:
python naf.py InvertedPendulum-v1 --batch_norm --optimizer_lr 0.0001 --noise fixed --noise_scale 0.01 --tau 1 --l2_reg 0.001 --batch_size 1000
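Two of these flags map onto standard machinery: --tau controls the soft target-network update, and --noise fixed with --noise_scale 0.01 adds constant-scale Gaussian exploration noise. A sketch under those assumptions (with --tau 1 the update reduces to a plain copy of the online weights):

import numpy as np

def soft_update(online, target, tau=1.0):
    # theta_target <- tau * theta_online + (1 - tau) * theta_target
    return [tau * o + (1.0 - tau) * t for o, t in zip(online, target)]

def noisy_action(mu, noise_scale=0.01):
    # Fixed-scale Gaussian noise around the greedy action mu(s).
    return mu + noise_scale * np.random.randn(*np.shape(mu))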
Used dueling network architecture with Q-learning, as outlined in this paper:
Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581
Refer to code for hyperparameter values.
./train.sh Breakout-v0 --environment gym
Check results/Breakout-v0.csv for the best-performing epoch (in my case it was 61).
./test_gym.sh snapshots/Breakout-v0_61.pkl (replace 61 with your best epoch)
./upload_gym.sh results/Breakout-v0 --api_key <your_key> to upload the results.
The Simple DQN implementation uses the network architecture and hyperparameters from the DeepMind Nature paper.
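The Nature architecture mentioned above is three convolutional layers followed by one hidden fully connected layer. A PyTorch sketch for illustration (Simple DQN's own implementation is not in PyTorch, so this is not its code):

import torch.nn as nn

class NatureDQN(nn.Module):
    # Mnih et al. 2015: four stacked 84x84 grayscale frames in,
    # one Q-value per action out.
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        return self.net(x / 255.0)  # scale raw pixels to [0, 1]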
Used normalized advantage functions (NAF) from this paper:
Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
http://arxiv.org/abs/1603.00748
The command line used was:
python naf.py Pendulum-v0 --l2_reg 0.001
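--l2_reg adds an L2 penalty on the network weights to the training loss; a sketch (exactly which parameters naf.py regularizes is an assumption):

def l2_penalty(weights, l2_reg=0.001):
    # Adds l2_reg * sum_i ||W_i||^2 to the loss being minimized.
    return l2_reg * sum((w ** 2).sum() for w in weights)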
Used dueling network architecture with Q-learning, as outlined in this paper:
Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581
Command line:
python duel.py Acrobot-v0
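For context, evaluating any of these agents follows the standard (pre-0.26) gym episode loop; the random action below is a stand-in for the trained greedy policy, which duel.py handles internally:

import gym

env = gym.make("Acrobot-v0")            # old environment ID from this writeup
obs = env.reset()
done, total_return = False, 0.0
while not done:
    action = env.action_space.sample()  # stand-in for argmax_a Q(obs, a)
    obs, reward, done, info = env.step(action)
    total_return += reward
print("episode return:", total_return)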
Used dueling network architecture with Q-learning, as outlined in this paper:
Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581
Command line:
python duel.py MountainCar-v0
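All of the agents above also rely on experience replay; a minimal uniform replay buffer sketch (capacity and batch size are generic defaults, not values from any of these repos):

import random
from collections import deque

class ReplayBuffer:
    # Uniform experience replay: oldest transitions fall off the end.
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)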