```python
import torch

# simulated batch of images
x = torch.rand(64, 3, 224, 224)
# or some number of layers up the convolutional stack
x = torch.rand(64, 256, 32, 32)

m = x.mean((0, 2, 3), keepdim=True)  # keepdim=True facilitates broadcasting
v = x.var((0, 2, 3), keepdim=True)
```
Calculate the statistics across all examples, separately for each channel.
- a milestone technique that enabled many networks to train at all, or to train better and faster
- as the batch size tends towards 1, training becomes unstable and eventually impossible (what is the variance of a single example?)
- not obvious how to use in RNNs
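The statistics above can be turned into a full normalization step. A minimal sketch (inference-style: no learned affine parameters, no running statistics), cross-checked against `torch.nn.BatchNorm2d` in training mode so the built-in layer also normalizes with the current batch's statistics:

```python
import torch

x = torch.rand(64, 256, 32, 32)
eps = 1e-5

m = x.mean((0, 2, 3), keepdim=True)
# nn.BatchNorm2d normalizes with the biased variance
v = x.var((0, 2, 3), keepdim=True, unbiased=False)
x_hat = (x - m) / torch.sqrt(v + eps)

# cross-check against the built-in layer (affine disabled,
# training mode => normalize with batch statistics)
bn = torch.nn.BatchNorm2d(256, eps=eps, affine=False)
bn.train()
print(torch.allclose(x_hat, bn(x), atol=1e-5))  # True
```

Note the `unbiased=False`: the layer divides by `N` rather than `N - 1` when normalizing (the unbiased estimate is only used for the running statistics).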
```python
m = x.mean((1, 2, 3), keepdim=True)
v = x.var((1, 2, 3), keepdim=True)
```
Calculate the statistics for each example separately, across all channels.
"batch normalization cannot be applied to online learning tasks or to extremely large distributed models where the minibatches have to be small"
- can be applied in RNNs
- can throw out potentially useful information
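The per-example statistics can be verified against `torch.nn.LayerNorm` when it is told to normalize over the last three dimensions. A minimal sketch:

```python
import torch

x = torch.rand(64, 256, 32, 32)
eps = 1e-5

m = x.mean((1, 2, 3), keepdim=True)
v = x.var((1, 2, 3), keepdim=True, unbiased=False)  # biased variance
x_hat = (x - m) / torch.sqrt(v + eps)

# built-in layer normalizing over (C, H, W) for each example
ln = torch.nn.LayerNorm([256, 32, 32], eps=eps, elementwise_affine=False)
print(torch.allclose(x_hat, ln(x), atol=1e-5))  # True
```

Because nothing is aggregated across the batch dimension, the result is identical whether the batch size is 64 or 1.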
```python
m = x.mean((2, 3), keepdim=True)
v = x.var((2, 3), keepdim=True)
```
No concept of running stats anymore: calculate the statistics for each example and each channel separately.
- throws out even more information than LayerNorm
- originally proposed for fast stylization (style transfer)
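The per-example, per-channel statistics match what `torch.nn.InstanceNorm2d` computes (with its defaults, it tracks no running statistics). A minimal sketch:

```python
import torch

x = torch.rand(64, 256, 32, 32)
eps = 1e-5

m = x.mean((2, 3), keepdim=True)
v = x.var((2, 3), keepdim=True, unbiased=False)  # biased variance
x_hat = (x - m) / torch.sqrt(v + eps)

# built-in layer: per-example, per-channel spatial statistics
inorm = torch.nn.InstanceNorm2d(256, eps=eps, affine=False)
print(torch.allclose(x_hat, inorm(x), atol=1e-5))  # True
```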
> In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN's computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants.
- supports small batches
- can be used in transfer learning where the target task requires small batches (e.g. image segmentation)
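GroupNorm's computation follows the same pattern as the others: reshape the channels into G groups and normalize over each group's channels and spatial positions, per example. A minimal sketch (G = 32 here is an illustrative choice, a common default in the paper), cross-checked against `torch.nn.GroupNorm`:

```python
import torch

x = torch.rand(64, 256, 32, 32)
eps = 1e-5
G = 32  # number of groups (illustrative choice)

N, C, H, W = x.shape
xg = x.view(N, G, C // G, H, W)
m = xg.mean((2, 3, 4), keepdim=True)
v = xg.var((2, 3, 4), keepdim=True, unbiased=False)
x_hat = ((xg - m) / torch.sqrt(v + eps)).view(N, C, H, W)

gn = torch.nn.GroupNorm(G, C, eps=eps, affine=False)
print(torch.allclose(x_hat, gn(x), atol=1e-5))  # True
```

The two extremes recover the previous norms: `G = 1` gives LayerNorm-style statistics over all channels, and `G = C` gives InstanceNorm (one channel per group).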
Summary of all norms (from the GroupNorm paper):