ewanlee/rl-papers.md

## rl-papers.md

Policy Gradient


Levine & Koltun (2013). Guided policy search: deep RL with importance sampled policy gradient (unrelated to later discussion of guided policy search)
Schulman, Levine, Moritz, Jordan, Abbeel (2015). Trust region policy optimization: deep RL with natural policy gradient and adaptive step size
Schulman, Wolski, Dhariwal, Radford, Klimov (2017). Proximal policy optimization algorithms: deep RL with importance sampled policy gradient
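
To make the "importance sampled policy gradient" annotation on the PPO entry concrete, here is a minimal sketch of the clipped surrogate objective in plain Python. The function name, argument names, and the default `clip_eps=0.2` are illustrative assumptions, not taken from any reference implementation; a real agent would compute this on tensors and differentiate through it.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate over a batch (hypothetical helper).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the data-collecting policy.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance weight pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the weight to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (lower) bound of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the two policies agree (ratio 1), the value reduces to the mean advantage; the clipping only takes effect once the new policy moves away from the old one.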

Actor-Critic


Mnih, Badia, Mirza, Graves, Lillicrap, Harley, Silver, Kavukcuoglu (2016). Asynchronous methods for deep reinforcement learning: A3C -- parallel online actor-critic
Schulman, Moritz, Levine, Jordan, Abbeel (2016). High-dimensional continuous control using generalized advantage estimation: batch-mode actor-critic with blended Monte Carlo and function approximator returns
Gu, Lillicrap, Ghahramani, Turner, Levine (2017). Q-Prop: sample-efficient policy gradient with an off-policy critic: policy gradient with Q-function control variate
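
The "blended Monte Carlo and function approximator returns" in the generalized advantage estimation entry can be sketched as a backward recursion over one trajectory. This is a minimal illustration under stated assumptions: the names `gae_advantages`, `last_value`, and the default `gamma`/`lam` values are hypothetical, and the trajectory is assumed to contain no episode boundary.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Advantage estimates for one trajectory (hypothetical helper).

    rewards[t], values[t]: reward and critic estimate V(s_t) at step t.
    last_value: critic estimate for the state after the final step
    (use 0.0 if the trajectory ended in a terminal state).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

Setting `lam=0` recovers the one-step TD error (pure function approximator), while `lam=1` recovers the Monte Carlo return minus the baseline; values in between blend the two.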

Q-Learning


Lange, Riedmiller. (2010). Deep auto-encoder neural networks in reinforcement learning: early image-based Q-learning method using autoencoders to construct embeddings
Mnih et al. (2015). Human-level control through deep reinforcement learning: Q-learning with convolutional networks for playing Atari
Van Hasselt, Guez, Silver. (2015). Deep reinforcement learning with double Q-learning: a very effective trick to improve performance of deep Q-learning
Lillicrap et al. (2016). Continuous control with deep reinforcement learning: continuous Q-learning with actor network for approximate maximization
Gu, Lillicrap, Sutskever, Levine (2016). Continuous deep Q-learning with model-based acceleration: continuous Q-learning with action-quadratic value functions
Wang, Schaul, Hessel, van Hasselt, Lanctot, de Freitas (2016). Dueling network architectures for deep reinforcement learning: separates value and advantage estimation in Q-function
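
Two of the tricks listed above are small enough to sketch in plain Python: the double Q-learning target (action chosen by the online network, evaluated by the target network) and the dueling aggregation of value and advantage streams. Function and argument names are hypothetical, and Q-values are passed in as plain lists rather than produced by networks.

```python
def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Bootstrapped target for one transition (hypothetical helper).

    q_online_next / q_target_next: per-action Q-values at the next state
    from the online and target networks respectively.
    """
    if done:
        return reward
    # Select the action with the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but evaluate it with the target network.
    return reward + gamma * q_target_next[best_action]


def dueling_q_values(state_value, advantages):
    """Combine a scalar V(s) and per-action advantages into Q(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    # Subtracting the mean advantage keeps V and A identifiable.
    return [state_value + adv - mean_adv for adv in advantages]
```

Decoupling action selection from evaluation is what reduces the overestimation of a plain max over noisy Q-values.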

Inverse Reinforcement Learning


Finn et al. ICML ’16. Guided Cost Learning. Sampling based method for MaxEnt IRL that handles unknown dynamics and deep reward functions
Wulfmeier et al. arXiv ’16. Deep Maximum Entropy Inverse Reinforcement Learning. MaxEnt inverse RL using deep reward functions
Ho & Ermon NIPS ’16. Generative Adversarial Imitation Learning. Inverse RL method using generative adversarial networks

Exploration


Stadie, Levine, Abbeel (2015). Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models
Osband, Blundell, Pritzel, Van Roy. (2016). Deep Exploration via Bootstrapped DQN
Fu, Co-Reyes, Levine. (2017). EX2: Exploration with Exemplar Models for Deep Reinforcement Learning

Transfer Learning & Meta-Learning


Haarnoja*, Tang*, Abbeel, Levine. (2017). Reinforcement Learning with Deep Energy-Based Policies
Tobin, Fong, Ray, Schneider, Zaremba, Abbeel. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
Fu, Levine, Abbeel. (2016). One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors
Finn, Abbeel, Levine. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks