Normalizations
Batch Norm:
(+) Stable if the batch size is large
(+) Robust (during training) to the scale and shift of the input data
(+) Robust to the scale of the weight vector
(+) The scale of the updates decreases as training progresses
(-) Not good for online learning (batch size of 1)
(-) Not good for RNNs / LSTMs
(-) Computed differently at train and test time (see the sketch below)
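A minimal sketch of that last point, assuming PyTorch (the note itself names no framework): in training mode Batch Norm normalizes with the current batch's statistics and updates its running estimates; in eval mode it normalizes with the stored running estimates instead.

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=4)
x = torch.randn(32, 4)            # batch of 32 samples, 4 features

bn.train()
y_train = bn(x)                   # uses this batch's mean/var and updates
                                  # bn.running_mean / bn.running_var

bn.eval()
y_eval = bn(x)                    # uses the stored running estimates instead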
Weight Norm:
(+) Lower computational cost than Batch Norm on CNNs
(+) Comes with a well-considered weight-initialization scheme
(+) Easy to implement (see the sketch below)
(+) Robust to the scale of the weight vector
(-) Can be less stable during training than the other methods
(-) Strongly dependent on the input data
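A minimal sketch (again assuming PyTorch) of the reparameterization Weight Norm performs, w = g * v / ||v||: the built-in utility splits each weight into a per-unit scale g and a direction v, which is why it is cheap to compute and easy to drop into an existing model.

import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

linear = weight_norm(nn.Linear(4, 8))   # weight is split into weight_g and weight_v
print(linear.weight_g.shape)             # per-output-unit scale g: (8, 1)
print(linear.weight_v.shape)             # direction v: (8, 4)

x = torch.randn(32, 4)
y = linear(x)                            # forward pass uses w = g * v / ||v||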
Layer Norm:
(+) Effective for RNNs and for small mini-batches (see the sketch below)
(+) Robust to the scale of the input
(+) Robust to the scale and shift of the weight matrix
(+) The scale of the updates decreases as training progresses
(-) May not work as well for CNNs (Batch Norm is better in some cases)
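A minimal sketch (assuming PyTorch) of why Layer Norm suits RNNs and small batches: it normalizes over the feature dimension of each sample independently, so its behavior does not depend on the batch size at all.

import torch
import torch.nn as nn

ln = nn.LayerNorm(normalized_shape=4)

x1 = torch.randn(1, 4)            # batch of one (online / RNN step)
xN = torch.randn(32, 4)           # larger batch

y1 = ln(x1)
yN = ln(xN)
# Each output row has (approximately) zero mean and unit variance,
# regardless of how many samples are in the batch.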