Batch Normalization and Layer Normalization are two commonly used normalization methods in deep learning. Both can be written with the same formula:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$
Their main difference lies in how $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ are computed:

- Batch Normalization computes the mean and variance of each feature over the entire mini-batch. For example, if the input has shape $(N, C)$, then $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ have shape $(C,)$; if the input has shape $(N, C, H, W)$, they still have shape $(C,)$.
- Layer Normalization computes the mean and variance of each sample in the mini-batch separately. For example, if the input has shape $(N, C)$, then $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ have shape $(N,)$ (see the shape-check sketch after this list).
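To make the shape difference concrete, here is a minimal sketch (the tensor sizes are illustrative, not from the original text) that computes both sets of statistics directly:

```python
import torch

x = torch.rand(8, 3, 32, 32)  # input of shape (N, C, H, W)

# Batch Normalization statistics: one mean/variance per channel C
bn_mean = x.mean(dim=(0, 2, 3))
bn_var = x.var(dim=(0, 2, 3), unbiased=False)
print(bn_mean.shape, bn_var.shape)  # torch.Size([3]) torch.Size([3])

# Layer Normalization statistics over normalized_shape = (3, 32, 32):
# one mean/variance per sample N
ln_mean = x.mean(dim=(1, 2, 3))
ln_var = x.var(dim=(1, 2, 3), unbiased=False)
print(ln_mean.shape, ln_var.shape)  # torch.Size([8]) torch.Size([8])
```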
We can use Batch Normalization to compute Layer Normalization: treat the input as if the batch size were 1, and move the original batch dimension onto a new feature (channel) dimension. Concretely, an input of shape $[*, \text{normalized\_shape}]$ is reshaped to $(1, C, L)$, where $L$ is the number of elements covered by normalized_shape and $C$ is the number of remaining leading elements. At this point, Batch Normalization computes one mean and variance per channel, i.e. per original sample slice of length $L$, which is exactly the statistic Layer Normalization needs.
The code is as follows:
```python
import torch.nn.functional as F
from functools import reduce
from operator import mul

def layer_norm(input, normalized_shape, weight=None, bias=None, eps=1e-05):
    """Layer Normalization implemented with Batch Normalization."""
    # Reshape input of shape [*, normalized_shape[0], ..., normalized_shape[-1]]
    # to shape [1, C, L], where L = prod(normalized_shape) and C = input.numel() // L.
    L = reduce(mul, normalized_shape, 1)
    C = input.numel() // L
    input_reshaped = input.view(1, C, L)
    # Batch normalization over (1, C, L) computes one mean/variance per "channel",
    # i.e. per original sample slice of length L, which is exactly Layer Normalization.
    output = F.batch_norm(input_reshaped, None, None, training=True, eps=eps)
    output = output.view(input.shape)
    # Apply the elementwise affine transform afterwards: LayerNorm's weight/bias
    # have shape normalized_shape, which does not match batch_norm's (C,) channels.
    if weight is not None:
        output = output * weight
    if bias is not None:
        output = output + bias
    return output
```
Compare it against PyTorch's built-in Layer Normalization:
```python
>>> import torch
>>> import torch.nn.functional as F
>>> x = torch.rand(2, 3, 4)
>>> F.layer_norm(x, [3, 4])
tensor([[[-0.3145,  1.1580, -0.5216, -1.0527],
         [-1.6420,  1.9395,  1.4807, -0.0406],
         [-0.5166, -0.5084,  0.0056,  0.0125]],

        [[-0.3949, -1.0793, -0.2180,  0.8326],
         [ 0.5099,  1.7204, -0.2157,  0.6653],
         [-1.3991,  1.3354, -0.1912, -1.5653]]])
>>> layer_norm(x, [3, 4])
tensor([[[-0.3145,  1.1580, -0.5216, -1.0527],
         [-1.6420,  1.9395,  1.4807, -0.0406],
         [-0.5166, -0.5084,  0.0056,  0.0125]],

        [[-0.3949, -1.0793, -0.2180,  0.8326],
         [ 0.5099,  1.7204, -0.2157,  0.6653],
         [-1.3991,  1.3354, -0.1912, -1.5653]]])
```
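A quick programmatic check (a sketch, not part of the original session) is to compare the two implementations with torch.allclose:

```python
# Both implementations should agree elementwise up to floating-point tolerance,
# as the identical printouts above suggest.
assert torch.allclose(F.layer_norm(x, [3, 4]), layer_norm(x, [3, 4]))
```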