A slightly modified deep Q learning approach is used from this paper. Requires chainer.
To reproduce run code below with python 2.7; It will run training and monitor of the environment. Training data and some videos will be saved in "pendulum" folder near the script file.
Continuous space is discretized with 11 different actions.
It appears that there are some convergence problems; Maybe better selection of parameters would lead to a better objective value.