Used normalized advantage functions (NAF) from this paper:
Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
http://arxiv.org/abs/1603.00748
The command line used was:
python naf.py InvertedPendulum-v1 --batch_norm --optimizer_lr 0.0001 --noise fixed --noise_scale 0.01 --tau 1 --l2_reg 0.001 --batch_size 1000