@deontologician
Last active July 29, 2017 07:07
DL/RL notes

Notes on Deep Reinforcement Learning

Meta

Deep learning is largely a collection of older techniques that suddenly work well now that GPUs are powerful enough to train them at scale.

Supervised learning

  • Function approximation
  • Loss functions
  • Backpropagation
    • Chain rule
    • Gradients
  • Stochastic gradient descent
    • Minibatches
    • Optimizers
      • Momentum
      • Adam (use this)
  • Layers
    • Fully connected layers (aka Affine layer / FC layer)
    • Convolutional layers (CNN)
      • kernel size
      • stride
      • filters
      • padding (zero/same)
    • Recurrent Layers (RNN)
      • LSTM
      • GRU (use this)
  • Activation
    • Sigmoid
    • Tanh
    • Rectified Linear Units (ReLU) (use this)
  • Regularization
    • L2 normalization
    • Dropout (use this)
    • Parameter noise (use this)
  • Normalization
    • Batch Normalization
    • Layer Normalization
  • Initialization
    • Xavier initialization
    • He initialization (use this)
    • RNN initialization (open question)
  • Problems
    • Exploding/vanishing gradients
      • LSTM/GRU help for RNNs
      • ReLU helps at the activation layer
      • Initialization helps
      • Batch norm helps
    • Dead ReLUs
      • Modified ReLUs (e.g. Leaky ReLU) can mitigate this
    • Overfitting
      • Regularization combats this
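The core pieces of the supervised-learning list above (function approximation, a loss function, chain-rule gradients, SGD) fit in a few lines. A minimal sketch in pure Python, no framework — the toy data, learning rate, and epoch count are invented for illustration:

```python
import random

random.seed(0)
# Toy dataset generated from y = 2x + 1; the model must recover w=2, b=1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]

w, b = 0.0, 0.0   # function approximator: y_pred = w*x + b
lr = 0.05         # learning rate

for epoch in range(500):
    random.shuffle(data)              # the "stochastic" in SGD
    for x, y_true in data:
        y_pred = w * x + b            # forward pass
        # Loss function: squared error, L = (y_pred - y_true)^2
        # Backpropagation via the chain rule:
        # dL/dw = dL/dy_pred * dy_pred/dw = 2*(y_pred - y_true) * x
        dw = 2.0 * (y_pred - y_true) * x
        db = 2.0 * (y_pred - y_true)
        w -= lr * dw                  # gradient descent step
        b -= lr * db

print(round(w, 2), round(b, 2))  # should land close to 2.0 and 1.0
```

Momentum and Adam modify only the last two lines: instead of applying the raw gradient, they apply a running average (momentum) or a per-parameter rescaled average (Adam).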

Reinforcement Learning

  • MDPs
  • Bellman equation
  • Value functions (Q function)
  • Policies
  • Actor/Critic
  • Deep Q Learning
    • experience replay
    • off-policy learning
  • UNREAL paper
    • unsupervised auxiliary tasks
  • DARLA paper
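The Bellman update and off-policy learning from the list above can be shown with tabular Q-learning on a toy MDP — a 5-state chain with a reward at the right end. The environment, epsilon, and learning rate here are invented for illustration; Deep Q-Learning replaces the table with a neural network and adds experience replay:

```python
import random

random.seed(1)
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left or right along the chain

# Q function as a plain table: Q[(state, action)] -> estimated return
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

def step(s, a):
    """Toy MDP dynamics: reward 1.0 only on reaching the goal state."""
    s2 = max(0, min(GOAL, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy (exploration)
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        # Bellman update -- off-policy: it bootstraps from the max over
        # next actions, not from what the behaviour policy actually does.
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# Greedy policy extracted from the learned Q values:
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)  # should move right (+1) in every state
```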

Unsupervised methods

  • Sparse Autoencoders
  • Denoising Autoencoders
  • Variational Autoencoders (use this)
    • problem: blurriness
    • problem: math very involved
  • Generative Adversarial Networks
    • problem: mode collapse
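One piece of the "very involved" VAE math is small enough to sketch: the reparameterization trick, which makes the encoder trainable by backprop. Sampling z ~ N(mu, sigma^2) directly is not differentiable, so instead sample eps ~ N(0, 1) and compute z = mu + sigma * eps, which is differentiable in mu and sigma. The mu/log_var values below stand in for encoder outputs:

```python
import math
import random

random.seed(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps, with eps ~ N(0, 1) independent of parameters."""
    sigma = math.exp(0.5 * log_var)
    eps = random.gauss(0.0, 1.0)
    return mu + sigma * eps

# The KL term of the VAE objective (ELBO), for a diagonal Gaussian
# posterior N(mu, sigma^2) against a standard normal prior, has a
# closed form per latent dimension:
def kl_divergence(mu, log_var):
    return 0.5 * (math.exp(log_var) + mu ** 2 - 1.0 - log_var)

# KL is zero exactly when the posterior equals the prior N(0, 1):
print(kl_divergence(0.0, 0.0))  # 0.0
print(reparameterize(0.0, 0.0))  # a standard-normal sample
```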