Some reinforcement learning papers that need to be read

Policy Gradient

  • Levine & Koltun (2013). Guided policy search: deep RL with importance-sampled policy gradient (despite the name, unrelated to the later guided policy search methods)
  • Schulman, Levine, Moritz, Jordan, Abbeel (2015). Trust region policy optimization: deep RL with natural policy gradient and adaptive step size
  • Schulman, Wolski, Dhariwal, Radford, Klimov (2017). Proximal policy optimization algorithms: deep RL with importance-sampled policy gradient (the clipped surrogate objective is sketched after this list)
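
The PPO entry above refers to a clipped, importance-sampled surrogate objective. Below is a minimal NumPy sketch of that loss under the usual formulation; the array names (logp_new, logp_old, advantages) are illustrative placeholders, not identifiers from the paper.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective in the style of Schulman et al. (2017).

    logp_new / logp_old: log pi(a|s) under the current and the
    data-collecting policy; advantages: estimated advantages.
    All arrays have shape (batch,).
    """
    ratio = np.exp(logp_new - logp_old)                    # importance weight
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Take the pessimistic (lower) bound, then negate to get a loss to minimize.
    return -np.mean(np.minimum(unclipped, clipped))
```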

Actor-Critic

  • Mnih, Badia, Mirza, Graves, Lillicrap, Harley, Silver, Kavukcuoglu (2016). Asynchronous methods for deep reinforcement learning: A3C -- parallel online actor-critic
  • Schulman, Moritz, Levine, Jordan, Abbeel (2016). High-dimensional continuous control using generalized advantage estimation: batch-mode actor-critic with blended Monte Carlo and function approximator returns (the GAE recursion is sketched after this list)
  • Gu, Lillicrap, Ghahramani, Turner, Levine (2017). Q-Prop: sample-efficient policy gradient with an off-policy critic: policy gradient with Q-function control variate
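
As referenced in the GAE entry above, here is a minimal NumPy sketch of the advantage recursion; the trajectory arrays are illustrative, and terminal-state masking is omitted for brevity.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation (Schulman et al., 2016).

    rewards: shape (T,) for a single trajectory; values: shape (T+1,),
    where values[-1] bootstraps the tail. Returns advantages, shape (T,).
    """
    deltas = rewards + gamma * values[1:] - values[:-1]    # TD residuals
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running        # exponentially weighted sum
        advantages[t] = running
    return advantages
```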

Q-Learning

  • Lange, Riedmiller. (2010). Deep auto-encoder neural networks in reinforcement learning: early image-based Q-learning method using autoencoders to construct embeddings
  • Mnih et al. (2015). Human-level control through deep reinforcement learning: Q-learning with convolutional networks for playing Atari.
  • Van Hasselt, Guez, Silver (2015). Deep reinforcement learning with double Q-learning: a very effective trick to improve performance of deep Q-learning (the double DQN target is sketched after this list)
  • Lillicrap et al. (2016). Continuous control with deep reinforcement learning: continuous Q-learning with actor network for approximate maximization
  • Gu, Lillicrap, Sutskever, Levine (2016). Continuous deep Q-learning with model-based acceleration: continuous Q-learning with action-quadratic value functions
  • Wang, Schaul, Hessel, van Hasselt, Lanctot, de Freitas (2016). Dueling network architectures for deep reinforcement learning: separates value and advantage estimation in Q-function
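
As referenced in the double Q-learning entry above, a minimal NumPy sketch of the double DQN target: the online network selects the greedy action, and the target network evaluates it, which curbs the overestimation bias of plain Q-learning. The Q-value arrays are illustrative placeholders.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target (van Hasselt et al., 2015).

    q_online_next / q_target_next: Q-values at the next states under the
    online and target networks, shape (batch, n_actions).
    rewards, dones: shape (batch,).
    """
    best_actions = np.argmax(q_online_next, axis=1)        # select with online net
    batch = np.arange(len(rewards))
    next_q = q_target_next[batch, best_actions]            # evaluate with target net
    return rewards + gamma * (1.0 - dones) * next_q        # zero the tail at terminals
```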

Inverse Reinforcement Learning

  • Finn et al. ICML ’16. Guided Cost Learning. Sampling-based method for MaxEnt IRL that handles unknown dynamics and deep reward functions
  • Wulfmeier et al. arXiv ’16. Maximum Entropy Deep Inverse Reinforcement Learning. MaxEnt inverse RL using deep reward functions (the MaxEnt gradient is sketched after this list)
  • Ho & Ermon NIPS ’16. Generative Adversarial Imitation Learning. Inverse RL method using generative adversarial networks
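
As referenced above, a minimal NumPy sketch of the MaxEnt IRL gradient for the simplest case of a linear reward r(s) = w · φ(s): the gradient of the demonstration log-likelihood is the gap between expert and policy feature expectations. The deep variants (Wulfmeier et al., Finn et al.) replace the linear reward with a network and estimate the policy term by sampling; the arrays here are illustrative.

```python
import numpy as np

def maxent_irl_step(w, expert_feats, policy_feats, lr=0.1):
    """One gradient ascent step of MaxEnt IRL with a linear reward w . phi(s).

    expert_feats: features of demonstrated states, shape (N, d).
    policy_feats: features of states visited by the current soft-optimal
    policy, shape (M, d) (sampled in the deep, unknown-dynamics variants).
    """
    grad = expert_feats.mean(axis=0) - policy_feats.mean(axis=0)
    return w + lr * grad   # ascend the demonstration log-likelihood
```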

Exploration

  • Stadie, Levine, Abbeel (2015). Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models (the prediction-error bonus is sketched after this list)
  • Osband, Blundell, Pritzel, Van Roy. (2016). Deep Exploration via Bootstrapped DQN
  • Fu, Co-Reyes, Levine. (2017). EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
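
As referenced in the Stadie et al. entry above, that paper rewards states a learned forward dynamics model predicts poorly, so novel states earn an exploration bonus. A minimal NumPy sketch of the idea; the model's prediction is passed in as an argument, and beta and the normalizer are illustrative hyperparameters, not values from the paper.

```python
import numpy as np

def augmented_reward(reward, s_next, s_next_pred, beta=0.1, norm=1.0):
    """Exploration bonus from forward-model prediction error (Stadie et al. style).

    s_next: observed next state; s_next_pred: next state predicted by a
    learned dynamics model (trained elsewhere). Poorly predicted, i.e.
    novel, states receive a larger bonus.
    """
    error = np.sum((s_next - s_next_pred) ** 2)
    return reward + beta * error / norm   # norm keeps the bonus scale bounded
```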

Transfer Learning & Meta-Learning

  • Haarnoja*, Tang*, Abbeel, Levine. (2017). Reinforcement Learning with Deep Energy-Based Policies
  • Tobin, Fong, Ray, Schneider, Zaremba, Abbeel. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
  • Fu, Levine, Abbeel. (2016). One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors
  • Finn, Abbeel, Levine. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (a first-order MAML sketch follows this list)
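
As referenced in the MAML entry above, a minimal first-order MAML sketch in NumPy on toy scalar regression tasks: adapt with one inner gradient step per task, then move the shared initialization using the post-adaptation gradient. The first-order variant drops the second-derivative term of full MAML, and the task format and hyperparameters here are illustrative.

```python
import numpy as np

def task_loss_grad(theta, x, y):
    """Gradient of mean squared error for the scalar linear model y_hat = theta * x."""
    return 2.0 * np.mean(x * (theta * x - y))

def fomaml(tasks, theta=0.0, alpha=0.01, beta=0.001, meta_steps=1000):
    """First-order MAML (Finn et al., 2017), meta-batch size 1 for brevity.

    tasks: list of (x, y) arrays, one regression problem per task.
    """
    rng = np.random.default_rng(0)
    for _ in range(meta_steps):
        x, y = tasks[rng.integers(len(tasks))]
        theta_adapted = theta - alpha * task_loss_grad(theta, x, y)  # inner step
        # First-order approximation: treat d(theta_adapted)/d(theta) as identity.
        theta -= beta * task_loss_grad(theta_adapted, x, y)          # outer step
    return theta
```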