  1. "any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation/test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake." [http://cs231n.github.io/neural-networks-2/#datapre]
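A minimal numpy sketch of this rule, with made-up data: split first, compute the mean/std on the training split only, then apply those same statistics to every split.

```python
import numpy as np

# Hypothetical data: 1000 "images" flattened to 64 features each.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(1000, 64))

# Split FIRST, then compute statistics on the training split only.
train, val, test = data[:700], data[700:850], data[850:]

mean = train.mean(axis=0)   # computed on train only
std = train.std(axis=0)     # computed on train only

# Apply the SAME train statistics to every split.
train_n = (train - mean) / std
val_n = (val - mean) / std
test_n = (test - mean) / std
```

Computing `mean`/`std` on `data` before splitting would leak information from val/test into preprocessing.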

  2. Read [https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607]

  3. Search for good hyperparameters with random search (not grid search). Stage your search from coarse (wide hyperparameter ranges, training only for 1-5 epochs) to fine (narrower ranges, training for many more epochs). [https://cs231n.github.io/neural-networks-3/#summary]
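A sketch of the coarse-to-fine random search, with a made-up `fake_val_loss` standing in for a short training run. Hyperparameters like learning rate are sampled log-uniformly, as the cs231n notes recommend.

```python
import math
import random

random.seed(0)

def sample_config(lr_range, wd_range):
    """Sample learning rate and weight decay log-uniformly within the ranges."""
    lr = 10 ** random.uniform(math.log10(lr_range[0]), math.log10(lr_range[1]))
    wd = 10 ** random.uniform(math.log10(wd_range[0]), math.log10(wd_range[1]))
    return lr, wd

def fake_val_loss(lr, wd):
    # Stand-in for a short (1-5 epoch) training run; best near lr=1e-3, wd=1e-4.
    return (math.log10(lr) + 3) ** 2 + (math.log10(wd) + 4) ** 2

# Coarse stage: wide ranges, cheap runs.
coarse = [sample_config((1e-6, 1e-1), (1e-7, 1e-2)) for _ in range(30)]
best_lr, best_wd = min(coarse, key=lambda c: fake_val_loss(*c))

# Fine stage: narrower ranges centred on the coarse winner, train longer.
fine = [sample_config((best_lr / 3, best_lr * 3), (best_wd / 3, best_wd * 3))
        for _ in range(30)]
best_lr, best_wd = min(fine, key=lambda c: fake_val_loss(*c))
```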

  4. During training, monitor the loss, the training/validation accuracy, and if you’re feeling fancier, the magnitude of updates in relation to parameter values (it should be ~1e-3), and when dealing with ConvNets, the first-layer weights. [https://cs231n.github.io/neural-networks-3/#summary]
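The update-to-parameter ratio in point 4 can be sketched like this with fake weights and gradients; the ~1e-3 figure is the cs231n rule of thumb, not a hard requirement.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(256, 128))    # a hypothetical weight matrix
grad = rng.normal(scale=0.05, size=w.shape)    # its (fake) gradient
lr = 1e-3

update = -lr * grad
# Magnitude of the update relative to the parameters; ~1e-3 is healthy.
ratio = np.linalg.norm(update) / np.linalg.norm(w)
# Much larger -> learning rate probably too high; much smaller -> too low.
```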

  5. Read answer: [https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn?answertab=votes#tab-top]

  6. Read CS231n CNNs: [https://cs231n.github.io/]

  7. Very important: [https://forums.fast.ai/t/things-jeremy-says-to-do/36682]

  8. Mixed precision training - FP16 compute + FP32 master weights: details in this blog post [https://medium.com/@sureshr/loc2vec-a-fast-pytorch-implementation-2b298072e1a7]
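A minimal numpy sketch of the core mixed-precision idea (in practice you would use PyTorch's AMP utilities): keep FP32 master weights, run compute in FP16, and scale the loss so small FP16 gradients don't underflow. The gradient here is fake, just to show the dtype flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# FP32 "master" weights; the forward/backward pass runs in FP16.
master_w = rng.normal(scale=0.01, size=(64,)).astype(np.float32)
x = rng.normal(size=(64,)).astype(np.float16)

loss_scale = np.float16(1024.0)  # scale up so tiny FP16 grads don't become zero

w16 = master_w.astype(np.float16)             # cast weights down for compute
grad16 = (x * loss_scale).astype(np.float16)  # pretend this is dL/dw, scaled

# Unscale in FP32 and update the FP32 master weights.
grad32 = grad16.astype(np.float32) / np.float32(loss_scale)
master_w -= 1e-3 * grad32
```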

  9. Transfer learning even when datasets are not related. Read: [https://arxiv.org/abs/1902.07208]

  10. Backpropagation explanation: [http://neuralnetworksanddeeplearning.com/chap2.html]
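The chain-rule mechanics that chapter walks through can be sketched as a tiny one-hidden-layer numpy net on a toy regression problem (no biases, made-up data, just the layer-by-layer gradient flow):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression: y is the sum of the three inputs.
x = rng.normal(size=(32, 3))
y = x.sum(axis=1, keepdims=True)

w1 = rng.normal(scale=0.5, size=(3, 8))
w2 = rng.normal(scale=0.5, size=(8, 1))

losses = []
for _ in range(200):
    # Forward pass.
    h = np.maximum(x @ w1, 0.0)            # ReLU activations
    pred = h @ w2
    losses.append(((pred - y) ** 2).mean())

    # Backward pass: apply the chain rule layer by layer.
    dpred = 2.0 * (pred - y) / len(x)      # dL/dpred
    dw2 = h.T @ dpred                      # dL/dw2
    dh = dpred @ w2.T                      # dL/dh
    dh[h <= 0] = 0.0                       # ReLU gradient: zero where pre-activation <= 0
    dw1 = x.T @ dh                         # dL/dw1

    # Gradient descent step.
    w1 -= 0.02 * dw1
    w2 -= 0.02 * dw2
```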

  11. "We show that transfer learning gives only minimal final performance gains, but significantly improves convergence speed. In this, weight scaling plays a key role: we identify Mean Var initialization as a simple way to use only scaling information from the pretrained weights, but to achieve substantial convergence gains, equal to i.i.d. sampling from the full empirical distribution. We also compare the representations obtained through the ImageNet pretraining, Mean Var Init, and Random Init. At lower layers, the different approaches yield very different representations, with Random Init and Mean Var Init not learning Gabor filters." [https://arxiv.org/abs/1902.07208]. So I think even if the network structure is different from the network on which ImageNet was trained, we can still take the mean-variance stats from that network's weights and use them to initialize the new network's weights.
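A sketch of that Mean Var idea with fake "pretrained" weights: keep only the scalar mean and standard deviation of a pretrained layer and sample i.i.d. values for a new layer, which can even have a different shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are pretrained weights from some ImageNet model's layer.
pretrained = rng.normal(loc=0.01, scale=0.07, size=(512, 512))

# Mean Var init: keep only the scalar mean/std of the pretrained weights
# and sample i.i.d. values for the NEW layer's (different) shape.
mu, sigma = pretrained.mean(), pretrained.std()
new_w = rng.normal(loc=mu, scale=sigma, size=(256, 128))
```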

  12. Find a good learning rate: [https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html]
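The LR range test described there can be sketched as follows; `mock_loss` is a made-up stand-in for the per-mini-batch training loss you would actually record. Sweep the LR up exponentially, find where the loss falls fastest, then back off by about an order of magnitude.

```python
import math

def mock_loss(lr):
    # Stand-in for one mini-batch's loss: falls fastest near lr=1e-4,
    # then blows up once the LR gets too large.
    return 2.0 - math.tanh(math.log10(lr) + 4) + max(0.0, math.log10(lr) + 1) ** 2

lrs, losses = [], []
lr = 1e-7
while lr < 1.0:
    lrs.append(lr)
    losses.append(mock_loss(lr))
    lr *= 1.3   # exponential sweep upward

# Steepest drop of the loss curve, then back off by ~10x.
drops = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
best = lrs[drops.index(max(drops))] / 10
```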

  13. 1-cycle policy: [https://sgugger.github.io/the-1cycle-policy.html#the-1cycle-policy]
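A sketch of the 1-cycle LR shape (the post also cycles momentum inversely, which is omitted here; the parameter values below are illustrative defaults, not the post's exact numbers): ramp up linearly to `lr_max`, back down symmetrically, then annihilate to a much smaller value.

```python
def one_cycle_lr(step, total_steps, lr_max=1e-2, div=10, final_div=100, pct_up=0.45):
    """Learning rate at `step` under a linear 1-cycle schedule."""
    up = int(total_steps * pct_up)
    lo, end = lr_max / div, lr_max / final_div
    if step < up:                          # warm-up: lo -> lr_max
        return lo + (lr_max - lo) * step / up
    if step < 2 * up:                      # cool-down: lr_max -> lo
        return lr_max - (lr_max - lo) * (step - up) / up
    tail = total_steps - 2 * up            # annihilation: lo -> end
    return lo + (end - lo) * (step - 2 * up) / tail

schedule = [one_cycle_lr(s, 100) for s in range(100)]
```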

  14. NN in numpy: [https://sgugger.github.io/a-simple-neural-net-in-numpy.html#a-simple-neural-net-in-numpy]

  15. Convolution in numpy: [https://sgugger.github.io/convolution-in-depth.html#convolution-in-depth]

  16. Fast.ai — An infinitely customizable training loop with Sylvain Gugger [https://www.youtube.com/watch?v=roc-dOSeehM&feature=youtu.be&t=1]

  17. Resnet fast.ai - [https://twitter.com/jeremyphoward/status/1115036889818341376?s=08] [https://twitter.com/jeremyphoward/status/1115044602606538757?s=08]

  18. Linux commands [https://www.youtube.com/playlist?list=PLdfA2CrAqQ5kB8iSbm5FB1ADVdBeOzVqZ]

  19. Full stack deep learning: [https://fullstackdeeplearning.com/march2019]

  20. Andrej Karpathy: A Recipe for Training Neural Networks - [https://karpathy.github.io/2019/04/25/recipe/]

TODO: Stochastic weight averaging (SWA)
