Benjamin Ellenberger benelot

benelot / README.md
Created October 19, 2016 20:43 — forked from tambetm/README.md

This run used normalized advantage functions (NAF), as introduced in this paper:

Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
http://arxiv.org/abs/1603.00748
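
For readers unfamiliar with NAF, the paper splits the Q-function into a state value and a quadratic advantage term: Q(x, u) = V(x) + A(x, u), with A(x, u) = -1/2 (u - mu(x))^T P(x) (u - mu(x)) and P(x) = L(x) L(x)^T, where L is lower-triangular with an exponentiated diagonal. The following is a minimal NumPy sketch of that formula only, not the code in `naf.py`; the function name `naf_q_value` and its arguments are hypothetical, and the network outputs (`v`, `mu`, `l_entries`) are taken as given.

```python
import numpy as np

def naf_q_value(v, mu, l_entries, u):
    """Evaluate Q(x, u) = V(x) + A(x, u) for one state (illustrative only).

    v         : scalar state value V(x)
    mu        : shape (n,), the greedy action mu(x)
    l_entries : shape (n * (n + 1) / 2,), raw lower-triangular entries of L(x)
    u         : shape (n,), the action to evaluate
    """
    mu = np.asarray(mu, dtype=float)
    u = np.asarray(u, dtype=float)
    n = mu.shape[0]

    # Fill the lower triangle (row-major order) with the raw entries.
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = l_entries

    # Exponentiate the diagonal so that P = L L^T is positive definite.
    diag = np.diag_indices(n)
    L[diag] = np.exp(L[diag])

    P = L @ L.T
    d = u - mu
    advantage = -0.5 * d @ P @ d  # always <= 0, and 0 exactly at u == mu
    return v + advantage
```

Because the advantage is never positive and vanishes at `u == mu`, the action that maximizes Q is simply `mu(x)`, which is what makes Q-learning tractable with continuous actions.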

The command line used was:

python naf.py InvertedPendulum-v1 --batch_norm --optimizer_lr 0.0001 --noise fixed --noise_scale 0.01 --tau 1 --l2_reg 0.001 --batch_size 1000
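
The flags are not documented here, so the following is a sketch under stated assumptions rather than a description of `naf.py`. It assumes `--tau` is the target-network coefficient from the paper's update rule theta' <- tau * theta + (1 - tau) * theta', and that `--noise fixed` with `--noise_scale` adds Gaussian noise of constant standard deviation to the greedy action. The helper names `soft_update` and `noisy_action` are hypothetical.

```python
import numpy as np

def soft_update(target_weights, online_weights, tau):
    """Blend online weights into target weights: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]

def noisy_action(mu, noise_scale, rng):
    """Assumed 'fixed' exploration: constant-scale Gaussian noise around mu."""
    mu = np.asarray(mu, dtype=float)
    return mu + noise_scale * rng.standard_normal(mu.shape)

# Example with the values from the command line above (assumed semantics).
rng = np.random.default_rng(0)
target = soft_update([np.zeros(3)], [np.ones(3)], tau=1.0)
action = noisy_action(np.array([0.2]), noise_scale=0.01, rng=rng)
```

Under this reading, `--tau 1` means the target network is overwritten by the online network at every update, so it provides no smoothing.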