- Levine & Koltun (2013). Guided policy search: deep RL with importance sampled policy gradient (unrelated to later discussion of guided policy search)
- Schulman, L., Moritz, Jordan, Abbeel (2015). Trust region policy optimization: deep RL with natural policy gradient and adaptive step size
- Schulman, Wolski, Dhariwal, Radford, Klimov (2017). Proximal policy optimization algorithms: deep RL with importance sampled policy gradient
- Mnih, Badia, Mirza, Graves, Lillicrap, Harley, Silver, Kavukcuoglu (2016). Asynchronous methods for deep reinforcement learning: A3C -- parallel online actor-critic
- Schulman, Moritz, L., Jordan, Abbeel (2016). High-dimensional continuous control using generalized advantage estimation: batch-mode actor-critic with blended Monte Carlo and function approximator returns
- Gu, Lillicrap, Ghahramani, Turner, L. (2017). Q-Prop: sample-efficient policy gradient with an off-policy critic: policy gradient with Q-function control variate
- Lange, Riedmiller. (2010). Deep auto-encoder neural networks in reinforcement learning: early image-based Q-learning method using autoencoders to construct embeddings
- Mnih et al. (2015). Human-level control through deep reinforcement learning: Q-learning with convolutional networks for playing Atari.
- Van Hasselt, Guez, Silver. (2015). Deep reinforcement learning with double Q-learning: a very effective trick to improve performance of deep Q-learning
- Lillicrap et al. (2016). Continuous control with deep reinforcement learning: continuous Q-learning with actor network for approximate maximization
- Gu, Lillicrap, Sutskever, L. (2016). Continuous deep Q-learning with model-based acceleration: continuous Q-learning with action-quadratic value functions
- Wang, Schaul, Hessel, van Hasselt, Lanctot, de Freitas (2016). Dueling network architectures for deep reinforcement learning: separates value and advantage estimation in Q-function
- Finn et al. ICML ’16. Guided Cost Learning. Sampling-based method for MaxEnt IRL that handles unknown dynamics and deep reward functions
- Wulfmeier et al. arXiv ’16. Deep Maximum Entropy Inverse Reinforcement Learning. MaxEnt inverse RL using deep reward functions
- Ho & Ermon NIPS ’16. Generative Adversarial Imitation Learning. Inverse RL method using generative adversarial networks
- Stadie, Levine, Abbeel (2015). Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models
- Osband, Blundell, Pritzel, Van Roy. (2016). Deep Exploration via Bootstrapped DQN
- Fu, Co-Reyes, Levine. (2017). EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
- Haarnoja*, Tang*, Abbeel, Levine. (2017). Reinforcement Learning with Deep Energy-Based Policies
- Tobin, Fong, Ray, Schneider, Zaremba, Abbeel. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
- Fu, Levine, Abbeel. (2016). One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors
- Finn, Abbeel, Levine. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks