@filmo
Last active July 31, 2017 06:41
Used a 2-layer fully connected network with H1=100, H2=60, and ReLU activations.
He initialization of the weights.
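A minimal PyTorch sketch of that architecture (the class name, state_dim, and num_actions are placeholders, not the gist's actual code):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Two hidden fully connected layers (100 and 60 units) with ReLU."""
    def __init__(self, state_dim, num_actions):
        super(QNetwork, self).__init__()
        self.fc1 = nn.Linear(state_dim, 100)
        self.fc2 = nn.Linear(100, 60)
        self.out = nn.Linear(60, num_actions)
        # He (Kaiming) initialization, matched to the ReLU nonlinearity
        for layer in (self.fc1, self.fc2, self.out):
            nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.out(x)  # one Q-value per action
```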
Adam optimizer. Initial learning rate = 0.001; the learning rate is halved (decay factor of 0.50) every 350 episodes.
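One way to express that schedule in PyTorch, assuming torch.optim.lr_scheduler.StepLR stepped once per episode (the original may decay the rate by hand); the sizes passed to QNetwork are placeholders:

```python
import torch.optim as optim

q_net = QNetwork(state_dim=8, num_actions=4)           # placeholder sizes
optimizer = optim.Adam(q_net.parameters(), lr=0.001)   # initial learning rate
# Multiply the learning rate by 0.50 every 350 scheduler steps; if
# scheduler.step() is called once per episode, the rate halves every 350 episodes.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=350, gamma=0.50)
```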
Discount factor gamma = 0.99.
Epsilon starts at 1.00.
Epsilon decay = 0.98.
Epsilon is decayed once at the start of each new episode (not at each step).
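A sketch of that per-episode epsilon schedule (eps_min and num_episodes are assumptions, not values from the notes):

```python
import random

num_episodes = 1000   # placeholder
eps, eps_decay = 1.00, 0.98
eps_min = 0.01        # assumed floor; the notes do not state one

def select_action(q_values, eps):
    """Epsilon-greedy: random action with probability eps, else argmax of Q."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

for episode in range(num_episodes):
    # ... run one full episode, calling select_action(...) at every step ...
    eps = max(eps_min, eps * eps_decay)   # decay once per episode, not per step
```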
Uniform sampling from a replay buffer holding 150,000 memories. No learning for the first 1,500 steps (to fill the replay buffer).
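A simple uniform replay buffer along those lines (class and variable names are illustrative):

```python
import random
from collections import deque

class ReplayBuffer(object):
    """Fixed-capacity buffer of transitions with uniform random sampling."""
    def __init__(self, capacity=150000):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

buffer = ReplayBuffer(capacity=150000)
WARMUP_STEPS = 1500   # act in the environment but skip learning until this many steps
```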
I tried to implement prioritized experience replay but couldn't get it to work (yet).
States normalized using the empirical mean and std of historical observations.
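The notes do not say whether the mean and std come from a fixed set of past observations or are maintained online; the sketch below uses Welford's running estimate as one plausible reading:

```python
import numpy as np

class StateNormalizer(object):
    """Running empirical mean/std of observations (Welford's algorithm)."""
    def __init__(self, state_dim):
        self.count = 0
        self.mean = np.zeros(state_dim)
        self.m2 = np.zeros(state_dim)    # sum of squared deviations from the mean

    def update(self, obs):
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)

    def normalize(self, obs):
        std = np.sqrt(self.m2 / max(self.count - 1, 1)) + 1e-8
        return (obs - self.mean) / std
```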
No reward or gradient clipping.
Target network updated every 600 steps.
Smooth L1 Loss (Huber Loss) rather than MSE.
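A sketch of a learning step that ties the pieces above together: Huber (smooth L1) loss, no gradient clipping, discount gamma = 0.99, and a hard target-network copy every 600 steps. It reuses q_net and optimizer from the earlier sketches, and the batch layout is an assumption:

```python
import copy
import torch
import torch.nn.functional as F

GAMMA = 0.99                # discount factor
TARGET_UPDATE_EVERY = 600   # steps between target-network syncs

target_net = copy.deepcopy(q_net)
target_net.eval()

def learn_step(batch, step_count):
    # batch: float tensors, except actions (long) and dones (float 0/1)
    states, actions, rewards, next_states, dones = batch
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1)[0]
        q_target = rewards + GAMMA * q_next * (1 - dones)
    loss = F.smooth_l1_loss(q_pred, q_target)   # Huber loss instead of MSE
    optimizer.zero_grad()
    loss.backward()                             # no gradient clipping
    optimizer.step()
    # Hard copy of the online weights into the target network every 600 steps
    if step_count % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())
```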
Implemented in PyTorch and Python 3.5. Trained on a GTX 1070.