
@ragulpr
Created February 27, 2019 00:22
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Is Keras BatchNorm respect Mask?\n",
"\n",
"I've been under the assumption that it *does*, and thought I tested it thoroughly before. I also said so in a [reddit comment](https://www.reddit.com/r/MachineLearning/comments/7yco19/d_does_zero_padding_affect_normalization_output/duff8ls/)\n",
"\n",
"Looking deeper into it, I still *wish it was so*, but really; I can find no proof that it is so. This is a (poor) attempt to answer it.\n",
"\n",
"### Background (as of 2019-02-27)\n",
"* Basically very few tests are testing effect of `mask`. In particular...\n",
"* https://github.com/keras-team/keras/blob/master/tests/keras/layers/normalization_test.py does not test `mask`\n",
"* https://github.com/keras-team/keras/blob/master/keras/layers/normalization.py does not mention `mask`\n",
"* https://github.com/keras-team/keras/blob/d48e97079914d897e82ddcb1a45261ce4415b8ea/keras/backend/tensorflow_backend.py#L1913 does not...\n",
"\n",
"## Expected effect of BN not respecting *mask*\n",
"A very crude intuitive idea[1] of batchnorm is that one centers/scales via `y = (x-x.mean())/x.std()`, so if `x` has nonsense-values where `mask` would tell is it does, this nonsense should probagate into the centering/scaling of coefficients.\n",
"\n",
"\n",
"[1] Note, it's definitely not implemented like this."
]
},
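{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal numpy sketch of the intuition above (not how Keras implements BN): compare the batch statistics computed with and without the nonsense values. The `mask_value` mirrors the one used in the experiments further down.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"mask_value = 3000.1337\n",
"x = np.zeros(200)\n",
"x[-20:] = mask_value  # nonsense values that a mask should hide\n",
"\n",
"valid = x != mask_value\n",
"print('unmasked mean/std:', x.mean(), x.std())  # polluted by the nonsense\n",
"print('masked mean/std:  ', x[valid].mean(), x[valid].std())  # what a mask-respecting BN should use\n"
]
},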
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using TensorFlow backend.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"keras.__version__ 2.2.0\n",
"theano.__version__ 1.0.1+unknown\n",
"tf.__version__ 1.0.1+unknown\n"
]
}
],
"source": [
"import keras.backend as K\n",
"import keras.layers as L\n",
"from keras.layers.normalization import BatchNormalization\n",
"\n",
"from keras.models import Sequential\n",
"from keras.optimizers import adam\n",
"\n",
"import numpy as np\n",
"\n",
"import keras\n",
"print('keras.__version__',keras.__version__)\n",
"try:\n",
" import theano\n",
" print('theano.__version__',theano.__version__)\n",
"except:\n",
" pass\n",
"import tensorflow as tf\n",
"print('tf.__version__',theano.__version__)\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"do_mask = True\n",
"\n",
"def bn_runner(add_nonsense,use_mask, lr = 0.1):\n",
" np.random.seed(1)\n",
"\n",
" model = Sequential()\n",
" model.add(L.InputLayer(input_shape=(1,)))\n",
" mask_value = 3000.1337 \n",
"\n",
" if use_mask:\n",
" model.add(L.Masking(mask_value=mask_value))\n",
"\n",
" model.add(BatchNormalization(axis=-1, momentum=0.95, epsilon=.1))\n",
"\n",
" model.compile(loss='mse', optimizer=adam(lr=lr))\n",
"\n",
" bn_coefs_before = model.layers[-1].get_weights()\n",
" # [gamma, beta, moving_mean, moving_variance]\n",
" \n",
" n = 200\n",
"\n",
" x = np.zeros((n,1))\n",
" y = np.zeros((n,1))#np.random.normal(0,1,(n,1)) \n",
" \n",
" if add_nonsense:\n",
" # If `use_mask` we assume this wont be seen/affect stuff\n",
" x[-20:] = mask_value\n",
" \n",
" model.fit(x,y,epochs=200,batch_size = 25, verbose=0)\n",
"\n",
" bn_coefs_after = model.layers[-1].get_weights()\n",
" \n",
" predicted_unique_vals = np.unique(model.predict(np.zeros_like(x)))\n",
" return '[gamma, beta, mean, std] init',bn_coefs_before,'after',bn_coefs_after,'pred output',predicted_unique_vals\n"
]
},
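{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note on `get_weights()`: for `BatchNormalization` it returns `[gamma, beta, moving_mean, moving_variance]`. The moving statistics are updated during `fit` by a moving average, not by the optimizer, so they change even with `lr=0` (see the last run below).\n"
]
},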
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:From /usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n",
"Instructions for updating:\n",
"Colocations handled automatically by placer.\n",
"WARNING:tensorflow:From /usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
"Instructions for updating:\n",
"Use tf.cast instead.\n"
]
},
{
"data": {
"text/plain": [
"('[gamma, beta, mean, std] init',\n",
" [array([1.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([1.], dtype=float32)],\n",
" 'after',\n",
" [array([-1.01612344e-10], dtype=float32),\n",
" array([0.00447437], dtype=float32),\n",
" array([297.11792], dtype=float32),\n",
" array([817455.56], dtype=float32)],\n",
" 'pred output',\n",
" array([0.00447437], dtype=float32))"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bn_runner(add_nonsense=True,use_mask=False,lr = 0.1) # assume bn will learn very large vals"
]
},
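{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sanity check on the numbers above: 10% of the inputs equal `3000.1337`, so the full-data mean is `0.1 * 3000.1337 ≈ 300.0` and the variance is `0.1 * 0.9 * 3000.1337**2 ≈ 810072`. The learned moving statistics (`297.1`, `817456`) land close to these values, i.e. without a mask the nonsense clearly enters the statistics.\n"
]
},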
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('[gamma, beta, mean, std] init',\n",
" [array([1.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([1.], dtype=float32)],\n",
" 'after',\n",
" [array([1.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([0.], dtype=float32)],\n",
" 'pred output',\n",
" array([0.], dtype=float32))"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bn_runner(add_nonsense=True,use_mask=True,lr = 0.1) # assume bn centers at 0"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('[gamma, beta, mean, std] init',\n",
" [array([1.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([1.], dtype=float32)],\n",
" 'after',\n",
" [array([1.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([0.], dtype=float32)],\n",
" 'pred output',\n",
" array([0.], dtype=float32))"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bn_runner(add_nonsense=False,use_mask=False,lr = 0.1) # Best if learned same as above"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('[gamma, beta, mean, std] init',\n",
" [array([1.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([1.], dtype=float32)],\n",
" 'after',\n",
" [array([1.], dtype=float32),\n",
" array([0.], dtype=float32),\n",
" array([297.11792], dtype=float32),\n",
" array([817455.56], dtype=float32)],\n",
" 'pred output',\n",
" array([-0.32862207], dtype=float32))"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bn_runner(add_nonsense=True,use_mask=False,lr = 0.) # Best if learned sameish as first"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Results: Inconclusive\n",
"Seems like mask is respected but impossible to find in keras-repo why this would be the case.\n",
"Confounding factor could be how `mask` is respected by loss etc etc."
]
},
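{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to probe this (a sketch, assuming the Keras 2.x `input_mask` layer property behaves as I expect): check whether a mask tensor reaches the `BatchNormalization` layer at all. A non-`None` mask only shows the mask is *passed through*, not that the normalization statistics *use* it.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"probe = Sequential()\n",
"probe.add(L.InputLayer(input_shape=(1,)))\n",
"probe.add(L.Masking(mask_value=3000.1337))\n",
"probe.add(BatchNormalization(axis=-1))\n",
"\n",
"# If this prints a tensor rather than None, Masking's mask is being\n",
"# forwarded to BatchNormalization -- but that alone says nothing about\n",
"# whether the batch statistics are computed over unmasked entries only.\n",
"print(probe.layers[-1].input_mask)\n"
]
}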
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@AnirudhDagar

Hi, how can I do something similar in PyTorch? The input to my BatchNorm1d layer has shape (8, 1630, 50, 171), where the batch size is 8 and 1630 is the dimension along which I have padding. For example, along that dim the data looks like [0, 1, 2, ..., 1112, 0, 0, 0, ..., 0], i.e. everything after position 1112 is zero-padded to length 1630, and similarly for every example in the batch.
I also have a corresponding mask.

@ragulpr
Author

ragulpr commented Aug 28, 2019

@AnirudhDagar, unless things have happened in the Torch (and in particular the CUDA) community since I last checked a year ago, BatchNorm does not support masking. Unfortunately, there are many numerical gotchas, and BatchNorm has highly optimized low-level implementations, so it doesn't seem very feasible to just write it using the basic PyTorch Python API (I've tried; it was slow). A rough sketch of the idea is below.
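
For what it's worth, here is a minimal sketch of the normalization step in plain PyTorch, under the assumption that `x` is `(B, T, ...)` with padding along `T` and `mask` is `(B, T)`. `masked_batch_norm` is a hypothetical helper, not part of `torch.nn`; it omits the learnable gamma/beta and the running statistics needed for eval mode, and it will be much slower than the fused kernels:

```python
import torch

def masked_batch_norm(x, mask, eps=1e-5):
    # x:    (B, T, ...) activations, padded along T
    # mask: (B, T) with 1.0 at valid timesteps, 0.0 at padding
    mask = mask.float()
    while mask.dim() < x.dim():      # broadcast the mask over feature dims
        mask = mask.unsqueeze(-1)
    n = mask.sum(dim=(0, 1))         # number of valid positions
    mean = (x * mask).sum(dim=(0, 1)) / n
    var = (((x - mean) ** 2) * mask).sum(dim=(0, 1)) / n
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return x_hat * mask              # keep padded positions at zero
```

Learnable affine parameters and moving statistics for inference would still have to be layered on top, which is where it gets fiddly.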

@AnirudhDagar

Thanks for the explanation :)

@drussellmrichie

This is old, but I found this post via reddit, and this is just to say that Keras BatchNormalization does support masking now:

https://keras.io/api/layers/normalization_layers/batch_normalization/
