"A Recipe for Training Neural Networks" - Andrej Karpathy

A Recipe for Training Neural Networks - Andrej Karpathy

  1. Neural net training is a leaky abstraction
  2. Neural net training fails silently

The Recipe

  1. Become one with the data
  2. Set up the end-to-end training/evaluation skeleton + get dumb baseline
    1. fix random seed (sketch below)
    2. simplify
      1. Data augmentation
        1. Data Augmentation | How to use Deep Learning when you have Limited Data — Part 2
        2. Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech
        3. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
          1. http://people.csail.mit.edu/junyanz/
        4. Deep Photo Style Transfer
        5. NanoNets
    3. add significant digits to your eval
    4. verify loss @ init (sketch below)
    5. init well
    6. human baseline
    7. input-independent baseline (sketch below)
    8. overfit one batch (sketch below)
    9. verify decreasing training loss
    10. visualize just before the net
    11. visualize prediction dynamics
    12. use backprop to chart dependencies
    13. generalize a special case
  3. Overfit
    1. picking the model
    2. adam is safe
    3. complexify only one at a time
    4. do not trust learning rate decay defaults
  4. Regularize
    1. get more data
    2. data augment (sketch below)
    3. creative augmentation
      1. Learning Dexterity
      2. Playing for Data: Ground Truth from Computer Games
      3. Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection
    4. pretrain
    5. stick with supervised learning
    6. smaller input dimensionality
    7. smaller model size
    8. decrease the batch size
    9. dropout
      1. Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift
    10. weight decay (sketch below)
    11. early stopping (sketch below)
    12. try a larger model
  5. Tune
    1. random over grid search (sketch below)
    2. hyper-parameter optimization
      1. Random Search for Hyper-Parameter Optimization
  6. Squeeze out the juice
    1. ensembles (sketch below)
      1. Distilling the Knowledge in a Neural Network
    2. leave it training
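
Below are a few minimal code sketches for the steps marked "(sketch below)" in the outline. They assume PyTorch/NumPy and use placeholder names and values, not anything taken from the original article.

For "fix random seed": seed every RNG the run touches so repeated runs are as repeatable as the hardware allows. The `seed_everything` name and the value 42 are my own placeholders.

```python
# Seed every source of randomness the training loop uses, and make cuDNN
# deterministic (at some cost in speed).
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    random.seed(seed)                    # Python's built-in RNG
    np.random.seed(seed)                 # NumPy RNG
    torch.manual_seed(seed)              # CPU and default CUDA seeding
    torch.cuda.manual_seed_all(seed)     # all GPUs, explicitly
    os.environ["PYTHONHASHSEED"] = str(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
```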
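For "verify loss @ init": with a softmax over C classes, the first cross-entropy loss should come out near -log(1/C). The tiny linear model here is only a stand-in for the real untrained network.

```python
# Check the very first loss against the value expected from chance predictions.
import math

import torch
import torch.nn as nn

num_classes = 10
model = nn.Linear(32, num_classes)   # stand-in for the untrained network
nn.init.zeros_(model.bias)           # no class favoured at initialization

x = torch.randn(256, 32)
y = torch.randint(0, num_classes, (256,))

loss = nn.functional.cross_entropy(model(x), y)
print(f"initial loss {loss.item():.3f}, expected ~{math.log(num_classes):.3f}")
```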
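For "input-independent baseline": compare the loss on real inputs against the same labels with the inputs zeroed out; real data should do clearly better, otherwise the net is not actually using its input. This is a quick eval-time variant of the check, and `input_independent_check` is a made-up helper.

```python
# Compare loss with real inputs vs. the same labels but zeroed inputs.
import torch


@torch.no_grad()
def input_independent_check(model, loss_fn, x, y):
    model.eval()
    real_loss = loss_fn(model(x), y).item()
    zero_loss = loss_fn(model(torch.zeros_like(x)), y).item()
    print(f"real inputs: {real_loss:.3f}   zeroed inputs: {zero_loss:.3f}")
    return real_loss, zero_loss
```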
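For "overfit one batch": take a handful of examples and verify the training loss can be driven to roughly zero; if it cannot, something in the pipeline is broken. The helper name, step count, and learning rate are placeholders.

```python
# Overfit a single tiny batch; the loss should approach zero.
import torch


def overfit_one_batch(model, loss_fn, x, y, steps=500, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for step in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        if step % 100 == 0:
            print(f"step {step:4d}   loss {loss.item():.5f}")
    return loss.item()   # should end up near zero
```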
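For "data augment" under Regularize: a small torchvision pipeline applied only to the training split. CIFAR-10 and the (commonly used) normalization statistics are just a convenient example, not from the article.

```python
# Cheap, realistic augmentations for the training split only.
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

normalize = T.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))

train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    normalize,
])
eval_tf = T.Compose([T.ToTensor(), normalize])   # no augmentation at eval time

train_set = CIFAR10("./data", train=True, download=True, transform=train_tf)
test_set = CIFAR10("./data", train=False, download=True, transform=eval_tf)
```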
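For "weight decay": the penalty is usually set through the optimizer; AdamW decouples the decay from the adaptive update, which tends to be the safer choice with Adam-family optimizers. The model and the numbers are placeholders to tune on validation data.

```python
# Weight decay through the optimizer; tune the coefficient on a validation set.
import torch
import torch.nn as nn

model = nn.Linear(32, 10)   # stand-in for the real network

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,
    weight_decay=1e-2,      # start small and increase if still overfitting
)
```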
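For "early stopping": track the best validation loss seen so far and stop once it has not improved for a fixed number of evaluations. The class name and the `patience` value are my own.

```python
# Stop training when the validation loss stops improving.
class EarlyStopping:
    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss: float) -> bool:
        """Return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


stopper = EarlyStopping(patience=10)
# inside the training loop:
#     if stopper.step(validate(model)):
#         break
```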
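For "random over grid search": sample each hyper-parameter independently (log-uniform for scale-like ones) instead of walking a fixed grid, in the spirit of Bergstra & Bengio's "Random Search for Hyper-Parameter Optimization". The ranges below are arbitrary examples.

```python
# Random hyper-parameter sampling: every trial gets fresh values on each axis.
import random


def sample_config():
    return {
        "lr": 10 ** random.uniform(-5, -2),             # log-uniform
        "weight_decay": 10 ** random.uniform(-6, -2),   # log-uniform
        "dropout": random.uniform(0.0, 0.5),
        "batch_size": random.choice([32, 64, 128, 256]),
    }


trials = [sample_config() for _ in range(50)]
# run the same training/eval skeleton on each config and keep the best
```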
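For "ensembles": average the predicted class probabilities of a few independently trained models to squeeze out a little extra accuracy; distillation (per the cited paper) can later compress the ensemble back into a single model. `ensemble_predict` is a made-up helper.

```python
# Average softmax probabilities across independently trained models.
import torch


@torch.no_grad()
def ensemble_predict(models, x):
    probs = [torch.softmax(m(x), dim=-1) for m in models]   # per-model probabilities
    return torch.stack(probs).mean(dim=0)                   # ensemble average


# preds = ensemble_predict([model_a, model_b, model_c], batch).argmax(dim=-1)
```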