Deep learning is mostly a collection of older techniques that suddenly work because GPUs have become powerful enough to train them at scale.
- Function approximation
- Loss functions
- Backpropagation
  - Chain rule
  - Gradients
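As a tiny worked example of backpropagation, here is the chain rule applied by hand to a one-parameter model with a squared-error loss (made-up numbers, just to show the mechanics):

```python
# Forward pass: pred = w * x + b, loss = (pred - y)^2  (toy single example)
x, y = 2.0, 1.0
w, b = 0.5, 0.1
pred = w * x + b                 # 1.1
loss = (pred - y) ** 2           # 0.01

# Backward pass: chain rule, from the loss back to each parameter.
dloss_dpred = 2 * (pred - y)     # d(loss)/d(pred) = 0.2
dpred_dw = x                     # d(pred)/d(w)
dpred_db = 1.0                   # d(pred)/d(b)
grad_w = dloss_dpred * dpred_dw  # d(loss)/d(w) = 0.4
grad_b = dloss_dpred * dpred_db  # d(loss)/d(b) = 0.2

# One gradient-descent step down the loss surface.
lr = 0.1
w -= lr * grad_w
b -= lr * grad_b
```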
- Stochastic gradient descent
  - Minibatches
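A minimal sketch of minibatch SGD on a toy least-squares problem (the data, step size, and batch size here are made up for illustration):

```python
import numpy as np

# Toy data: y = 3x + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

w = np.zeros(1)               # parameter to learn
lr, batch_size = 0.1, 32

for epoch in range(10):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        pred = xb @ w
        grad = 2 * xb.T @ (pred - yb) / len(idx)   # gradient of mean squared error on this minibatch
        w -= lr * grad                             # SGD step
print(w)   # should end up close to [3.0]
```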
- Optimizers
  - Momentum
  - Adam (use this)
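A sketch of the Adam update for a single parameter vector, using the standard defaults (beta1=0.9, beta2=0.999, eps=1e-8); `grad_fn` in the comment is a stand-in for whatever computes your gradient:

```python
import numpy as np

def adam_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. `state` holds the moment estimates and step count."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad        # first moment (momentum)
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2   # second moment (per-parameter scale)
    m_hat = state["m"] / (1 - b1 ** state["t"])           # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.zeros(3)
state = {"m": np.zeros_like(w), "v": np.zeros_like(w), "t": 0}
# w = adam_step(w, grad_fn(w), state)   # grad_fn is a placeholder for your gradient computation
```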
- Layers
  - Fully connected layers (aka affine or FC layers)
  - Convolutional layers (CNN) (shape sketch after this list)
    - kernel size
    - stride
    - filters
    - padding (zero/same)
  - Recurrent layers (RNN)
    - LSTM
    - GRU (use this)
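A quick sketch of how the conv hyperparameters above (kernel size, stride, number of filters, padding) determine the output shape, using PyTorch just as an example framework:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                 # (batch, channels, height, width)

# 16 filters, 3x3 kernel, stride 1, zero padding of 1 ("same"-style).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
print(conv(x).shape)                          # torch.Size([1, 16, 32, 32])

# Stride 2 and no padding shrink the spatial dims:
# out = floor((in + 2*padding - kernel) / stride) + 1 = floor((32 - 3) / 2) + 1 = 15
conv2 = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=0)
print(conv2(x).shape)                         # torch.Size([1, 16, 15, 15])
```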
- Activation
  - Sigmoid
  - Tanh
  - Rectified Linear Units (ReLU) (use this)
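For reference, the three activations above as plain numpy functions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1); saturates for large |x|

def tanh(x):
    return np.tanh(x)                 # squashes to (-1, 1); zero-centered, still saturates

def relu(x):
    return np.maximum(0.0, x)         # identity for x > 0, zero otherwise; no saturation for x > 0
```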
- Regularization
  - L2 regularization (weight decay)
  - Dropout (use this)
  - Parameter noise (use this)
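A sketch of inverted dropout, one common formulation: units are zeroed at random during training, and the rescaling by `keep_prob` means nothing has to change at test time:

```python
import numpy as np

def dropout(activations, keep_prob=0.8, training=True, rng=np.random.default_rng()):
    """Randomly zero units during training; return activations unchanged at test time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob   # rescale so the expected activation is unchanged
```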
- Normalization
  - Batch normalization
  - Layer normalization
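The core difference between the two, sketched in numpy (ignoring the learned scale/shift parameters and the running statistics a real layer keeps): batch norm normalizes each feature across the batch, layer norm normalizes each example across its features.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Normalize each example over its features (no dependence on the batch).
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(32, 64)   # (batch, features)
```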
- Initialization
  - Xavier initialization
  - He initialization (use this)
  - RNN initialization (open question)
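Xavier and He initialization differ mainly in the scale of the random weights; roughly, for a fully connected layer:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128

# Xavier/Glorot: variance 2 / (fan_in + fan_out), aimed at tanh/sigmoid nets.
w_xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

# He: variance 2 / fan_in, aimed at ReLU nets.
w_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```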
- Problems
  - Exploding/vanishing gradients
    - LSTM/GRU cells help for RNNs
    - ReLU activations help
    - Careful initialization helps
    - Batch norm helps
  - Dead ReLUs
    - Modified ReLUs (e.g., leaky ReLU) can help (sketch below)
  - Overfitting
    - Regularization combats this
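One of the "modified ReLUs" mentioned above is leaky ReLU, which keeps a small slope for negative inputs so a unit can recover instead of dying:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps a nonzero gradient when x < 0.
    return np.where(x > 0, x, alpha * x)
```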
- MDPs
  - Bellman equation
  - Value functions (Q function)
  - Policies
  - Actor/Critic
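A sketch of the Bellman backup behind Q-learning, as a tabular update; the states, actions, and environment step are placeholders here:

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99   # learning rate and discount factor

def q_update(s, a, r, s_next, done):
    """One Bellman-style update: Q(s,a) moves toward r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def greedy_policy(s):
    # A policy derived from the value function: pick the action with the highest Q-value.
    return int(Q[s].argmax())
```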
- Deep Q Learning
  - experience replay
  - off-policy learning
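Experience replay stores transitions and trains on random minibatches of old experience, which is part of why DQN is off-policy; a minimal buffer sketch:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(self.buffer, batch_size)
```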
- UNREAL paper
  - unsupervised auxiliary tasks
- DARLA paper
- Sparse Autoencoders
- Denoising Autoencoders
- Variational Autoencoders (use this)
  - problem: blurry reconstructions
  - problem: the math is fairly involved
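The part of a VAE that usually trips people up is the reparameterization trick plus the KL term in the loss; a sketch of just that piece (encoder/decoder networks omitted), assuming Gaussian latents:

```python
import numpy as np

def reparameterize(mu, logvar, rng=np.random.default_rng()):
    # Sample z ~ N(mu, sigma^2) in a way that keeps gradients flowing through mu and logvar.
    std = np.exp(0.5 * logvar)
    return mu + std * rng.normal(size=mu.shape)

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.
    return -0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar))

# Total VAE loss = reconstruction loss (squared error or cross-entropy) + the KL term above.
```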
- Generative Adversarial Networks
  - problem: mode collapse
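The adversarial setup in a loss-only sketch (PyTorch as an example framework): the discriminator is trained to tell real from generated samples, and the generator is trained to fool it. `G` and `D` are placeholder networks here, and `D` is assumed to output one logit per sample.

```python
import torch
import torch.nn.functional as F

def gan_losses(G, D, real, z):
    """Standard non-saturating GAN losses. G maps noise z to fakes; D outputs a (batch, 1) logit."""
    fake = G(z)

    # Discriminator: push real samples toward 1 and fakes toward 0
    # (detach so generator gradients don't flow through this term).
    d_loss = (
        F.binary_cross_entropy_with_logits(D(real), torch.ones(real.size(0), 1))
        + F.binary_cross_entropy_with_logits(D(fake.detach()), torch.zeros(real.size(0), 1))
    )

    # Generator: make the discriminator output 1 on fakes.
    g_loss = F.binary_cross_entropy_with_logits(D(fake), torch.ones(real.size(0), 1))
    return d_loss, g_loss
```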