A non-exhaustive Deep Learning techniques mind-map (December 2019, work in progress)

Released under the open-source MIT License.
By Paul-Emmanuel SOTIR <paul-emanuel@outlook.com>

Neural network architectures

  • Fully connected networks = Dense neural networks (FC DNN) (warning: not to be confused with FCN, Fully Convolutional Networks, nor with some residual architectures such as "Densely Connected Networks" (DenseNet), which are only dense in terms of residual links between layer blocks)
  • CNN: Convolutional Neural Networks (mainly pioneered by Yann LeCun)
    • Atrous ("à trous") convolutions = dilated convolutions (popularized by DeepLab); not to be confused with transposed / upsampling convolutions ("upconv"); see the sketch after this list
    • U-Net = encoder-decoder; could be interpreted as a specialization of residual links (skip connections) in a fully convolutional network, followed by atrous convolutions
    • DCN: Deformable Convolution Networks: inferred offsets applied to the next convolution layer's kernel sampling positions (the offsets are inferred by a dedicated layer from the previous convolution's feature maps)
  • Recurrent architectures (RNN)
    • Gated RNNs
      • LSTM
    • RCNN
  • AutoEncoders (often used as generative models)
    • Variational AutoEncoders (VAE)
  • Adversarial architectures
    • Generative adversarial networks (GAN) (often used as generative models)
  • Ensembling, stacking and siamese networks
  • Attention mechanisms
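
Below is a minimal sketch of an atrous (dilated) convolution block referenced in the list above. PyTorch is used purely as an illustrative framework choice; the `AtrousBlock` name, channel counts and input shape are arbitrary placeholders.

```python
# Minimal sketch of an atrous/dilated convolution (as popularized by DeepLab):
# dilation enlarges the receptive field without extra parameters or downsampling.
import torch
import torch.nn as nn

class AtrousBlock(nn.Module):
    def __init__(self, in_channels=3, out_channels=16, dilation=2):
        super().__init__()
        # padding = dilation keeps the spatial size unchanged for a 3x3 kernel
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              dilation=dilation, padding=dilation)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(x))

x = torch.randn(1, 3, 64, 64)      # dummy image batch
print(AtrousBlock()(x).shape)      # torch.Size([1, 16, 64, 64])
```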

In-architecture regularization and other architecture-related techniques

  • Batch-normalization
  • Dropout = can be seen as ensembling over sampled sub-network architectures
    • binary or real-valued dropout
    • regular dropout or layer-wise / filter-wise / residual block dropout
  • Residual links: add "shortcut" links between layers (often applied to Convolutional Neural Networks); see the sketch after this list
    • concatenated or additive
    • non-gated or gated (= weighted)
    • densely connected (DenseNet)
  • Padding
  • Pooling
  • Auxiliary losses = could be interpreted as 'special' residual links from network layers to output
  • Model size reduction
    • distillation = teacher-student methods
    • compression
    • quantization (binary / 16-bit / 8-bit / etc.)
      • at inference time
      • at training time (more difficult, but can allow faster training)
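
The sketch below (again assuming PyTorch; names and sizes are placeholders) combines several items from this list in one block: an additive, non-gated residual link, batch-normalization and filter-wise dropout.

```python
# Sketch of an additive, non-gated residual block with batch-norm and
# filter-wise (channel-wise) dropout.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=32, p_drop=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p_drop),   # filter-wise dropout
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))   # additive "shortcut" link

x = torch.randn(2, 32, 16, 16)
print(ResidualBlock()(x).shape)              # torch.Size([2, 32, 16, 16])
```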

Training techniques

  • learning rate
    • cosine / exponential decay
    • cycles
      • one cycle policy (used in various SOTA papers of 2019 in combination with AdamW; see the sketch after this list)
      • warm restart
    • multiple learning rates at once (for different parts of the network: per residual block, layer-wise, etc.)
    • scheduling: multiple (piecewise) constant learning rates
    • see also adaptive optimization algorithms (e.g. AdaGrad), but they do not replace the learning-rate techniques above
  • loss-related techniques
    • L1 / L2 regularization = Lasso / Ridge regularization (the L2 penalty is also known as weight decay / weight penalty)
    • ...
  • Optimizers: Stochastic Gradient Descent (SGD) algorithms
    • Momentum
      • constant
      • momentum scheduling or decaying
      • adaptive: see optimizers below (e.g. Adam)
    • RMSprop
    • Adam = RMSprop + Momentum
      • AdamW (Used in various SOTA papers of 2019 in combination with one cycle learning rate policy)
    • AdaGrad: adapts the learning rate for each parameter dimension (each weight)
    • Natural gradient descent
    • Second-order and other optimizers (TODO: refactor this part for a better understanding of these classes of algorithms)
      • Newton's method (too expensive for regular neural nets)
      • Hessian approximation techniques (TODO: refactor this part for better understanding of these classes of algorithms)
        • Conjugate gradient
        • Hessian-free optimization
      • Conjugate gradient = conjugate of the Jacobian allows approximation of the Hessian if ??? (TODO: fix this / recall the thoughts behind this)
  • Pretraining and weight initialization
    • Fine-tuning of pre-trained models
    • Greedy layer-wise pretraining
  • Data augmentation techniques
  • Active learning and boosting methods
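
A hedged sketch of the AdamW + one-cycle learning-rate combination mentioned above, assuming PyTorch; the model, data and hyper-parameters are dummy placeholders.

```python
# One-cycle LR policy combined with AdamW (decoupled weight decay).
import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
steps_per_epoch, epochs = 100, 5
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, steps_per_epoch=steps_per_epoch, epochs=epochs)

for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))  # dummy batch
        loss = nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()   # the one-cycle policy updates the LR every batch
```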

Visualization, debugging, model interpretation and inference explanation techniques

  • Convolution filter visualization
    • Deep Dream and its consequences...
      • Deep-art _^o^_/
  • Uncertainty estimation
    • E.g. variance over multiple output inferences sampled from neural models interpreted as stacked Bayesian networks (see the sketch after this list)
  • ...
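
One common way to obtain such a variance estimate is Monte-Carlo dropout: keep dropout active at inference time and treat the spread of repeated stochastic forward passes as an uncertainty proxy. The sketch below assumes PyTorch and a toy model.

```python
# Monte-Carlo dropout uncertainty estimation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))
model.train()   # keep dropout stochastic at inference time

x = torch.randn(8, 10)   # dummy inputs
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(50)])  # 50 stochastic forward passes
mean, var = samples.mean(dim=0), samples.var(dim=0)       # prediction and uncertainty proxy
print(mean.shape, var.shape)   # torch.Size([8, 1]) torch.Size([8, 1])
```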

Some Data-specific techniques

  • Images
    • Generation
    • Processing / segmentation / denoising / ...
    • Classification / detection / ...
  • Tabular data
    • NLP
      • ... you may also be interested in section 4, "Generalization vs Memorization", of OpenAI's paper Language Models are Unsupervised Multitask Learners, which investigates overlaps between WebText and common NLP datasets' training sets using 8-gram Bloom filters (see the toy sketch after this list). This paper also gives some insight into how language models learn: "We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on (...) WebText".
  • Time series / sound / low-dimensional frequency-domain data
  • Other structured data: graphs / trees / etc.
  • Inference on heterogeneous data
  • Data embedding techniques
  • Missing data and errors
  • High-dimensional sparse data (e.g. recommendation systems, consumer churn rates, telemetry, sparse boolean matrices)
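
As an illustration of the train/test overlap check mentioned in the NLP item above, here is a toy Python sketch; a plain set stands in for the paper's 8-gram Bloom filters (which trade exactness for memory), and the sentences are made-up examples.

```python
# Toy n-gram overlap check between a training corpus and an evaluation text.
def ngrams(text, n=8):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

train_ngrams = ngrams("the quick brown fox jumps over the lazy dog near the river bank")
test_text = "a fox jumps over the lazy dog near the river bank today"
overlap = ngrams(test_text) & train_ngrams
print(f"{len(overlap)} overlapping 8-grams")  # > 0 suggests potential train/test contamination
```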

Some other task-specific techniques

  • Unsupervised and weakly supervised learning (see also fine-tuning of pre-trained models in the "Training techniques" section)
    • self supervised learning
    • zero-shot / one-shot / few-shot learning
    • metalearning
    • fine-tuning of pretrained models (see the sketch after this list)
  • Deep reinforcement learning
    • MCTS with CNNs as value and policy functions
  • Online training
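
A hedged sketch of fine-tuning a pre-trained model (freeze the backbone, train a new task head), assuming torchvision >= 0.13 is available; the 10-class head and the learning rate are placeholders.

```python
# Fine-tuning: reuse an ImageNet-pretrained backbone, train only a new head.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")  # pretrained backbone
for param in model.parameters():
    param.requires_grad = False                               # freeze pre-trained weights
model.fc = nn.Linear(model.fc.in_features, 10)                # new head for a 10-class task

head_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(head_params, lr=1e-3)           # only the new head is optimized
```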