
Alvaro avalcarce

  • Nokia Bell-Labs
  • Paris
@avalcarce
avalcarce / README.md
Last active September 24, 2017 17:11
RL DQN solution for MountainCar-v0, CartPole-v0 and CartPole-v1 on OpenAI's Gym

Synopsis

This is a Deep Reinforcement Learning solution to some classic control problems. I've used it to solve the MountainCar-v0, CartPole-v0 and [CartPole-v1](https://gym.openai.com/envs/CartPole-v1) problems in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. The code is fundamentally a translation of necnec's Theano & Lasagne algorithm to Tensorflow. I've run it on Python 3.5 under Windows 7.
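The gist itself holds the full agent; as a rough illustration of the kind of Tensorflow value-function model involved, here is a minimal sketch of a Q-network and its TD loss, written against the TensorFlow 1.x graph API. The layer sizes, scope name and function names are illustrative assumptions, not the gist's actual code.

```python
import tensorflow as tf

def build_q_network(state_dim, n_actions, hidden1=64, hidden2=64, scope="q_net"):
    """Illustrative Q-network: maps a state vector to one Q-value per action."""
    with tf.variable_scope(scope):
        states = tf.placeholder(tf.float32, [None, state_dim], name="states")
        h1 = tf.layers.dense(states, hidden1, activation=tf.nn.relu)
        h2 = tf.layers.dense(h1, hidden2, activation=tf.nn.relu)
        q_values = tf.layers.dense(h2, n_actions, activation=None)
    return states, q_values

def build_training_ops(q_values, n_actions, learning_rate=1e-3):
    """Squared TD-error loss against externally computed targets."""
    actions = tf.placeholder(tf.int32, [None], name="actions")
    targets = tf.placeholder(tf.float32, [None], name="targets")
    # Q-value of the action actually taken in each transition.
    q_taken = tf.reduce_sum(q_values * tf.one_hot(actions, n_actions), axis=1)
    loss = tf.reduce_mean(tf.square(targets - q_taken))
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    return actions, targets, loss, train_op
```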

References

  1. Deep Learning tutorial, David Silver, Google DeepMind.
  2. necnec's algorithm
@avalcarce
avalcarce / README.md
Last active March 27, 2018 14:53
Solving MountainCar-v0

Synopsis

This is a Deep Reinforcement Learning solution to some classic control problems. I've used it to solve the MountainCar-v0, CartPole-v0 and [CartPole-v1](https://gym.openai.com/envs/CartPole-v1) problems in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

Some of the hyperparameters used in the main.py script to solve MountainCar-v0 have been obtained partly through exhaustive search and partly via Bayesian optimization with Scikit-Optimize (a sketch of such a search follows the list). The optimized hyperparameters and their values are:

  • Size of 1st fully connected layer: 198
  • Size of 2nd fully connected layer: 96
  • Learning rate: 2.33E-4
  • Period (in steps) for the update of the target network parameters as per the DQN algorithm: 999
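The gist lists only the optimized values; for context, below is a minimal sketch of how such a search can be driven with Scikit-Optimize's gp_minimize. The search ranges and the train_and_evaluate helper are assumptions for illustration, not the actual objective used in main.py.

```python
from skopt import gp_minimize
from skopt.space import Integer, Real

def train_and_evaluate(fc1_size, fc2_size, learning_rate, target_update_period):
    """Hypothetical stand-in for the gist's training loop: train a DQN agent
    with these hyperparameters and return its average evaluation reward."""
    raise NotImplementedError("plug in the actual training run here")

def objective(params):
    """Score one candidate hyperparameter vector; gp_minimize minimizes this."""
    fc1_size, fc2_size, learning_rate, target_update_period = params
    avg_reward = train_and_evaluate(fc1_size, fc2_size, learning_rate, target_update_period)
    return -avg_reward  # negate the reward so that minimizing improves it

search_space = [
    Integer(32, 256),                       # size of the 1st fully connected layer
    Integer(32, 256),                       # size of the 2nd fully connected layer
    Real(1e-5, 1e-2, prior="log-uniform"),  # learning rate
    Integer(100, 2000),                     # target network update period (steps)
]

result = gp_minimize(objective, search_space, n_calls=50, random_state=0)
print("best hyperparameters:", result.x, "best score:", result.fun)
```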
@avalcarce
avalcarce / README.md
Created February 24, 2017 14:06
Solving CartPole-v0 with DQN

Synopsis

This is a Deep Reinforcement Learning solution to the CartPole-v0 environment in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

Some of the hyperparameters used in the main.py script have been obtained via Bayesian optimization with Scikit-Optimize. The optimized hyperparameters and their values are:

  • Size of 1st fully connected layer: 208
  • Size of 2nd fully connected layer: 71
  • Learning rate: 1.09E-3
  • Period (in steps) for the update of the target network parameters as per the DQN algorithm: 800 (see the sketch after this list)
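For context on the last item: DQN keeps a periodically refreshed copy of the online network (the target network) and uses it to compute TD targets. A minimal TensorFlow 1.x sketch of that copy, assuming the two networks live in variable scopes named q_net and target_net (illustrative names, not necessarily those used in the gist):

```python
import tensorflow as tf

def make_target_update_op(online_scope="q_net", target_scope="target_net"):
    """Build an op that copies every online-network variable into the target network."""
    online_vars = sorted(
        tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=online_scope),
        key=lambda v: v.name)
    target_vars = sorted(
        tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=target_scope),
        key=lambda v: v.name)
    return tf.group(*[t.assign(o) for o, t in zip(online_vars, target_vars)])

# In the training loop, refresh the target network every `update_period` steps:
#     if step % update_period == 0:
#         sess.run(target_update_op)
```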
@avalcarce
avalcarce / README.md
Created February 27, 2017 09:27
Solving MountainCar-v0 with DQN in the least possible number of learning episodes for a minimum average reward of -110.

Synopsis

This is a Deep Reinforcement Learning solution to some classic control problems. I've used it to solve the MountainCar-v0, CartPole-v0 and [CartPole-v1](https://gym.openai.com/envs/CartPole-v1) problems in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

Some of the hyperparameters used in the main.py script to solve MountainCar-v0 have been obtained via Bayesian optimization with Scikit-Optimize. The optimized hyperparameters and their values are:

  • Size of 1st fully connected layer: 47
  • Size of 2nd fully connected layer: 197
  • Epsilon (as in epsilon-greedy exploration) decay factor: 0.8513032459 (see the sketch after this list)
  • Minimum epsilon: 1.872686e-05
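The epsilon values above control an epsilon-greedy exploration schedule. A minimal sketch of the usual multiplicative decay with a floor follows; whether the decay is applied per step or per episode is an assumption here, not something the gist listing states.

```python
import random
import numpy as np

def epsilon_greedy_action(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

epsilon = 1.0
epsilon_decay = 0.8513032459  # decay factor listed above
epsilon_min = 1.872686e-05    # minimum epsilon listed above

for episode in range(400):
    # ... run one episode, selecting actions with epsilon_greedy_action(q, epsilon) ...
    epsilon = max(epsilon_min, epsilon * epsilon_decay)  # decay once per episode (assumed)
```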
@avalcarce
avalcarce / README.md
Created March 3, 2017 14:43
Solving CartPole-v0 with DQN and Prioritized Experience Replay

Synopsis

This is a Deep Reinforcement Learning solution to the CartPole-v0 environment in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

The algorithm is a Deep Q Network (DQN) with Prioritized Experience Replay (PER). Most hyperparameters have been chosen by hand based on past experience; however, the learning rate, the prioritization exponent alpha and the initial importance sampling exponent beta0 have been obtained via Bayesian optimization with Scikit-Optimize.
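For context on alpha and beta0: in PER, a transition with priority p_i is sampled with probability proportional to p_i^alpha, and the induced bias is corrected with importance-sampling weights (N * P(i))^(-beta), where beta is annealed from beta0 towards 1. A minimal NumPy sketch of that sampling step, not the gist's implementation:

```python
import numpy as np

def per_sample(priorities, batch_size, alpha, beta):
    """Sample transition indices proportionally to priority**alpha and
    return the corresponding importance-sampling weights."""
    priorities = np.asarray(priorities, dtype=np.float64)
    probs = priorities ** alpha
    probs /= probs.sum()
    indices = np.random.choice(len(priorities), size=batch_size, p=probs)
    weights = (len(priorities) * probs[indices]) ** (-beta)
    weights /= weights.max()  # normalize by the maximum weight for stability
    return indices, weights

# Example: five stored transitions with TD-error-based priorities.
indices, weights = per_sample([0.5, 2.0, 0.1, 1.0, 3.0], batch_size=3, alpha=0.6, beta=0.4)
```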

The hyperparameters are:

@avalcarce
avalcarce / README.md
Created March 3, 2017 17:02
Solving MountainCar-v0 with DQN and Prioritized Experience Replay

Synopsis

This is a Deep Reinforcement Learning solution to the MountainCar-v0 environment in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

The algorithm is a Deep Q Network (DQN) with Prioritized Experience Replay (PER). Most hyperparameters have been chosen by hand based on past experience; however, the learning rate, the prioritization exponent alpha and the initial importance sampling exponent beta0 have been obtained via Bayesian optimization with Scikit-Optimize.

The hyperparameters are:

@avalcarce
avalcarce / README.md
Created March 6, 2017 12:09
Solving CartPole-v1 with DQN and Prioritized Experience Replay

Synopsis

This is a Deep Reinforcement Learning solution to the CartPole-v1 environment in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

The algorithm is a Deep Q Network (DQN) with Prioritized Experience Replay (PER). Most hyperparameters have been chosen by hand based on past experience; however, the learning rate, the prioritization exponent alpha and the initial importance sampling exponent beta0 have been obtained via Bayesian optimization with Scikit-Optimize.

The hyperparameters are:

@avalcarce
avalcarce / README.md
Created March 7, 2017 13:06
Solving MountainCar-v0 with Double DQN and Prioritized Experience Replay (with proportional prioritization)

Synopsis

This is a Deep Reinforcement Learning solution to the MountainCar-v0 environment in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

The algorithm is a Double Deep Q Network (DQN) with Prioritized Experience Replay (PER), where the proportional prioritization variant has been implemented. Most hyperparameters have been chosen by hand based on several experiments; however, the learning rate, the prioritization exponent alpha and the initial importance sampling exponent beta0 have been obtained via Bayesian optimization with Scikit-Optimize.
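The Double DQN modification only changes how the TD target is formed: the online network selects the best next action, while the target network evaluates it. A minimal NumPy sketch of that target computation (the function and argument names are illustrative, not taken from the gist):

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions.

    q_next_online and q_next_target have shape [batch, n_actions] and hold the
    online and target networks' Q-value estimates for the next states."""
    best_actions = np.argmax(q_next_online, axis=1)          # online net picks the action
    batch_index = np.arange(len(best_actions))
    next_values = q_next_target[batch_index, best_actions]   # target net provides its value
    return rewards + gamma * (1.0 - dones) * next_values
```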

The hyperparameters are:

@avalcarce
avalcarce / README.md
Created March 7, 2017 13:13
Solving CartPole-v0 with Double DQN and Prioritized Experience Replay (with proportional prioritization)

Synopsis

This is a Deep Reinforcement Learning solution to the CartPole-v0 environment in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

The algorithm is a Double Deep Q Network (DQN) with Prioritized Experience Replay (PER), where the proportional prioritization variant has been implemented. Most hyperparameters have been chosen by hand based on several experiments; however, the learning rate, the prioritization exponent alpha and the initial importance sampling exponent beta0 have been obtained via Bayesian optimization with Scikit-Optimize.

The hyperparameters are:

@avalcarce
avalcarce / README.md
Created March 7, 2017 13:15
Solving CartPole-v1 with Double DQN and Prioritized Experience Replay (with proportional prioritization)

Synopsis

This is a Deep Reinforcement Learning solution to the CartPole-v1 environment in OpenAI's Gym. This code uses Tensorflow to model a value function for a Reinforcement Learning agent. I've run it with Tensorflow 1.0 on Python 3.5 under Windows 7.

The algorithm is a Double Deep Q Network (DQN) with Prioritized Experience Replay (PER), where the proportional prioritization variant has been implemented. Most hyperparameters have been chosen by hand based on several experiments; however, the learning rate, the prioritization exponent alpha and the initial importance sampling exponent beta0 have been obtained via Bayesian optimization with Scikit-Optimize.

The hyperparameters are: