  1. Install simple_dqn.
  2. Run `./train.sh Breakout-v0 --environment gym`.
  3. Check `results/Breakout-v0.csv` for the best performing epoch (in my case it was 61); a small helper script for this step is sketched after the list.
  4. Run `./test_gym.sh snapshots/Breakout-v0_61.pkl` (replace 61 with your best epoch).
  5. Optional: run `./upload_gym.sh results/Breakout-v0 --api_key <your_key>` to upload the results.
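
If you would rather not eyeball the CSV in step 3, a few lines of Python can pick the best epoch for you. This is only a sketch: the column names `epoch` and `average_reward` are assumptions about the results file's header, so check your own results/Breakout-v0.csv and adjust them if simple_dqn names the columns differently.

```python
import csv

# Assumed column names -- verify against the header row of your results CSV.
EPOCH_COL = "epoch"
REWARD_COL = "average_reward"

def best_epoch(path):
    """Return (epoch, reward) for the row with the highest average reward."""
    with open(path) as f:
        rows = list(csv.DictReader(f))
    best = max(rows, key=lambda row: float(row[REWARD_COL]))
    return int(best[EPOCH_COL]), float(best[REWARD_COL])

if __name__ == "__main__":
    epoch, reward = best_epoch("results/Breakout-v0.csv")
    print("Best epoch: %d (average reward %.2f)" % (epoch, reward))
```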

The Simple DQN implementation uses the network architecture and hyperparameters from the DeepMind Nature paper.
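
For reference, the Nature-paper network is three convolutional layers over a stack of four 84x84 frames, a 512-unit fully connected layer, and a linear output with one Q-value per action. The sketch below writes that shape out in Keras purely for illustration; simple_dqn itself is not a Keras project, so treat this as a description of the architecture rather than its actual code.

```python
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten

def nature_dqn(num_actions, input_shape=(84, 84, 4)):
    """DQN architecture from Mnih et al. (Nature, 2015):
    three conv layers, a 512-unit dense layer, linear Q-value output."""
    return Sequential([
        Conv2D(32, 8, strides=4, activation="relu", input_shape=input_shape),
        Conv2D(64, 4, strides=2, activation="relu"),
        Conv2D(64, 3, strides=1, activation="relu"),
        Flatten(),
        Dense(512, activation="relu"),
        Dense(num_actions),  # one Q-value per action, linear activation
    ])
```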

Used normalized advantage functions (NAF) from this paper:

Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
http://arxiv.org/abs/1603.00748

The command line used was:

python naf.py Pendulum-v0 --l2_reg 0.001
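
NAF makes continuous-action Q-learning tractable by restricting the advantage to a quadratic in the action: Q(s, a) = V(s) + A(s, a) with A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)), where P(s) = L(s) L(s)^T is positive definite because L(s) is lower triangular with a positive diagonal. Since the quadratic is maximised exactly at a = mu(s), the greedy action comes for free. A small NumPy sketch of that formula (illustrative only, not the code in naf.py):

```python
import numpy as np

def naf_q_value(action, value, mu, L):
    """Q(s,a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)), with P = L L^T."""
    P = L @ L.T
    diff = action - mu
    return value - 0.5 * diff @ P @ diff

# Toy 2-dimensional action space; the numbers are made up for illustration.
L = np.tril(np.random.rand(2, 2)) + np.eye(2)   # lower triangular, positive diagonal
mu = np.array([0.1, -0.3])                      # the network's proposed action
print(naf_q_value(np.zeros(2), value=1.5, mu=mu, L=L))
print(naf_q_value(mu, value=1.5, mu=mu, L=L))   # maximised at a = mu(s): prints 1.5
```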

Used normalized advantage functions (NAF) from this paper:

Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
http://arxiv.org/abs/1603.00748

The command line used was:

python naf.py InvertedPendulum-v1 --batch_norm --optimizer_lr 0.0001 --noise fixed --noise_scale 0.01 --tau 1 --l2_reg 0.001 --batch_size 1000

Used the dueling network architecture with Q-learning, as outlined in this paper:

Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581

Command line:

python duel.py CartPole-v0 --gamma 0.995
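
The dueling head splits the network into a state-value stream V(s) and an advantage stream A(s, ·), then recombines them as Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')); subtracting the mean advantage keeps V and A identifiable. A tiny NumPy sketch of that aggregation step (illustrative, not duel.py's code):

```python
import numpy as np

def dueling_q(value, advantages):
    """Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a)), as in Wang et al. (2016)."""
    return value + (advantages - advantages.mean())

# CartPole-v0 has two discrete actions; the numbers below are made up.
print(dueling_q(value=1.2, advantages=np.array([0.4, -0.1])))
```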

Used the dueling network architecture with Q-learning, as outlined in this paper:

Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581

Command line:

python duel.py Acrobot-v0

Used the dueling network architecture with Q-learning, as outlined in this paper:

Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581

Command line:

python duel.py MountainCar-v0

Used the dueling network architecture with Q-learning, as outlined in this paper:

Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581

Refer to the code for hyperparameter values.

Used normalized advantage functions (NAF) from this paper:

Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
http://arxiv.org/abs/1603.00748

Refer to the code for hyperparameter values.

Used the dueling network architecture with Q-learning, as outlined in this paper:

Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
http://arxiv.org/abs/1511.06581

Refer to the code for hyperparameter values.

  1. Install simple_dqn.
  2. Run `./train.sh Pong-v0 --environment gym`.
  3. Check `results/Pong-v0.csv` for the best performing epoch (in my case it was 81).
  4. Run `./test_gym.sh snapshots/Pong-v0_81.pkl` (replace 81 with your best epoch).
  5. Optional: run `./upload_gym.sh results/Pong-v0 --api_key <your_key>` to upload the results.

The Simple DQN implementation uses the network architecture and hyperparameters from the DeepMind Nature paper.