I implemented the DQN model from this paper: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf.
Instead of the CNN described in the paper, I used a simple fully connected network with two hidden layers and an output layer, because the Cart-Pole environment is much simpler than the Atari games.
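
To make the network shape concrete, here is a minimal pure-Python sketch of a Q-network with two hidden layers and a linear output layer. It is only an illustration, not the code used for this submission: the hidden width (`HIDDEN = 16`), the ReLU activations, the weight initialisation, and all function names are assumptions. The input size of 4 and the 2 outputs match Cart-Pole's observation and action spaces.

```python
import math
import random

STATE_DIM = 4   # Cart-Pole observation: position, velocity, angle, angular velocity
N_ACTIONS = 2   # push left / push right
HIDDEN = 16     # assumed hidden width; the original write-up does not state it


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small uniform random weights (assumed init)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, relu):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def init_q_network(seed=0):
    """Two hidden layers plus an output layer with one unit per action."""
    rng = random.Random(seed)
    return [
        init_layer(STATE_DIM, HIDDEN, rng),
        init_layer(HIDDEN, HIDDEN, rng),
        init_layer(HIDDEN, N_ACTIONS, rng),
    ]


def q_values(network, state):
    """Forward pass: ReLU on the hidden layers, linear output."""
    h = state
    for i, layer in enumerate(network):
        h = dense(h, layer, relu=(i < len(network) - 1))
    return h


def greedy_action(network, state):
    """Index of the action with the largest estimated Q-value."""
    q = q_values(network, state)
    return max(range(len(q)), key=q.__getitem__)
```

Calling `q_values(init_q_network(), [0.0, 0.1, -0.05, 0.2])` returns one Q-value estimate per action. In practice a deep learning framework would handle the forward pass and the gradient updates.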
Note that I have not yet implemented the target network described in the more recent paper: https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
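
For reference, the target-network idea from that paper could be sketched roughly as below. This is not part of the implementation described here. The TD targets are computed from a periodically refreshed copy of the network's parameters rather than from the network being trained. The constants `GAMMA` and `SYNC_EVERY`, the function names, and the representation of the parameters as a plain Python object are all assumptions made for illustration.

```python
import copy

GAMMA = 0.99       # assumed discount factor
SYNC_EVERY = 100   # assumed number of updates between target-network syncs


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """TD target: r if the episode ended, else r + gamma * max_a' Q_target(s', a').

    next_q_values are expected to come from the target network, not the
    network currently being trained.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def maybe_sync_target(online_params, target_params, step, sync_every=SYNC_EVERY):
    """Return a fresh copy of the online parameters every sync_every steps.

    On all other steps the existing target parameters are returned unchanged,
    so the targets stay fixed between syncs.
    """
    if step % sync_every == 0:
        return copy.deepcopy(online_params)
    return target_params
```

Holding the target parameters fixed between syncs is intended to make the regression targets move less from update to update, which may reduce some of the run-to-run instability.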
The results vary from run to run: solving the problem sometimes takes 1000 episodes and at other times only 200.