Skip to content

Instantly share code, notes, and snippets.


Bhavik N Gala bhavikngala

  • Seattle, WA.
View GitHub Profile
  1. "any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation/test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake." []

  2. Read []

  3. Search for good hyperparameters with random search (not grid search). Stage your search from coarse (wide hyperparameter ranges, training only for 1-5 epochs), to fine (narrower rangers, training for many more epochs). []

  4. During training, monitor the loss, the training/validation accuracy, and if you’re feeling fancier, the magnitude of updates in relation to parameter values (it should be ~1e-3), and when dealing with ConvNets, the first-layer weights. [

bhavikngala /
Last active Mar 27, 2021
This gist contains a list of important points from "practical deep learning for coders" and "cutting edge deep learning for coders" MOOC

This gist contains a list of points I found very useful while watching the "Practical deep learning for coders" and "Cutting edge deep learning for coders" MOOC by Jeremy Howard and team. This list may not be complete as I watched the video at 1.5x speed on marathon but I did write down as many things I found to be very useful to get a model working. A fair warning the points are in no particular order, you may find the topics are all jumbled up.

Before beginning, I want to thank Jeremy Howard, Rachel Thomas, and the entire team in making this awesome practically oriented MOOC.

  1. Progressive image resolution training: Train the network on lower res first and then increase the resolution to get better performance. This can be thought of as transfer learning from the same dataset but at a different resolution. There is one paper by NVIDIA as well that used such an approach to train GANs.

  2. Cyclical learning rates: Gradually increasing the learning rate initially helps to avoid getting stuc