- VIME: Variational Information Maximizing Exploration: Paper, Code
- Deep Deterministic Policy Gradient: Paper ("Continuous Control with Deep Reinforcement Learning"), Code