Skip to content

Instantly share code, notes, and snippets.

@petermchale
Last active October 24, 2022 01:47
Show Gist options
  • Save petermchale/ddb3e3ffa427238f683a40983765b964 to your computer and use it in GitHub Desktop.
Save petermchale/ddb3e3ffa427238f683a40983765b964 to your computer and use it in GitHub Desktop.
How conv1d works in Keras
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates how `conv1d` works in Keras and is based upon Jussi Huotari's [excellent article](http://www.jussihuotari.com/2017/12/20/spell-out-convolution-1d-in-cnns/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1D Convolution in Numpy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by manually computing the result of convolving a 1D filter with an input sequence. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Conv1d input: [0 3 4 5]\n",
"[0 3] * [1 2] = [0 6]\n",
"[3 4] * [1 2] = [3 8]\n",
"[4 5] * [1 2] = [ 4 10]\n",
"Conv1d output: [6, 11, 14]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"conv1d_filter = np.array([1,2])\n",
"toyX = np.array([0, 3, 4, 5])\n",
"print('Conv1d input:', toyX)\n",
"\n",
"result = []\n",
"for i in range(3):\n",
" print(toyX[i:i+2], \"*\", conv1d_filter, \"=\", toyX[i:i+2] * conv1d_filter)\n",
" result.append(np.sum(toyX[i:i+2] * conv1d_filter))\n",
"\n",
"print(\"Conv1d output:\", result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, suppose we are given the input and output and asked to determine the convolution kernel. One approach is to use the \"Convolution Theorem\", e.g. as done [here](https://stackoverflow.com/questions/8478276/finding-the-convolution-kernel-in-matlab). \n",
"\n",
"Another approach is to guess the kernel, estimate how far the corresponding predicted output is from the true output and shift the elements of the kernel in the direction that makes that \"loss\" smaller. In other words, use gradient descent..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing a 1D convolution filter using Keras"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape of conv filter (2, 1, 1)\n",
"\n",
"conv weights\n",
"------------\n",
" 0 [-0.72896165 0.5631139 ] \t {'loss': [10.098454475402832]}\n",
" 20 [0.27028498 1.5623608 ] \t {'loss': [3.7698917388916016]}\n",
" 40 [0.9788118 2.2267246] \t {'loss': [0.9194195866584778]}\n",
" 60 [0.98986524 2.041007 ] \t {'loss': [0.11650880426168442]}\n",
" 80 [0.9991782 1.9600047] \t {'loss': [0.1810016632080078]}\n",
"100 [0.99584895 1.9986793 ] \t {'loss': [0.02970091439783573]}\n",
"120 [1.0025234 1.9819707] \t {'loss': [0.01458613108843565]}\n",
"140 [0.9962751 2.0011797] \t {'loss': [0.008762359619140625]}\n",
"160 [0.9995124 1.996057 ] \t {'loss': [0.031674545258283615]}\n",
"180 [1.0029459 1.9817123] \t {'loss': [0.014955838210880756]}\n"
]
}
],
"source": [
"from keras import backend as K\n",
"from keras.models import Sequential\n",
"from keras.optimizers import Adam\n",
"from keras.layers import Conv1D\n",
"\n",
"K.clear_session()\n",
"\n",
"# input to Keras' conv1d has shape number_examples X length_of_sequence X number_input_channels\n",
"toyX = np.array([0, 3, 4, 5]).reshape(1,4,1)\n",
"\n",
"# just re-using the convolutional output in the preceding example\n",
"toyY = np.array([6, 11, 14]).reshape(1,3,1)\n",
"\n",
"# https://keras.io/layers/convolutional/#conv1d\n",
"# before Keras 2.0 the layers were called ConvolutionalND for N = 1, 2, 3, and since Keras 2.0 they are just called ConvND.\n",
"toyModel = Sequential([ \n",
" Conv1D(filters=1, kernel_size=2, strides=1, padding='valid', use_bias=False, input_shape=(4,1), name='c1d')\n",
"])\n",
"\n",
"# https://github.com/keras-team/keras/blob/dcb5e3cf152977209c7122f6b07d6c86fb4cf1e6/keras/losses.py#L99\n",
"toyModel.compile(optimizer=Adam(lr=5e-2), loss='mae')\n",
"\n",
"# kernel has dimension kernel_size X number_input_channels X number_output_channels\n",
"print('shape of conv filter', toyModel.layers[0].get_weights()[0].shape)\n",
"print('')\n",
"\n",
"print('conv weights')\n",
"print('------------')\n",
"for epoch in range(200):\n",
" history = toyModel.fit(toyX, toyY, verbose=0)\n",
" if epoch%20 == 0:\n",
" print(\"{:3d} {} \\t {}\".format(epoch, \n",
" toyModel.layers[0].get_weights()[0][:,0,0], \n",
" history.history))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The convolution weights gravitate towards the expected values. \n",
"\n",
"Key to the success of this approach is having enough data. Concretely, each position of the filter in the input sequence corresponds to an equation involving all of the components in the filter. To uniquely solve for those components, it is necessary to have at least as many equations as there are components. The number of equations is determined by the length of the input sequence and by the number of those sequences. \n",
"\n",
"For example, in our case, the filter has 2 components, so it is necessary to have at least 2 equations. There are 3 unique positions in a single example, so there are 3 equations for 2 unknowns, and so the system of equations is overdetermined, and it is likely that there is only one solution, which gradient descent finds easily. \n",
"\n",
"If we had less data, e.g. the sequence length was shorter, or if the model was more complex, e.g. the filter was longer, then the system of equations would have been under-determined, with many possible solutions, corresponding to local optima in the space of parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Finding a 1D convolution filter when there is more than one input channel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An input sequence may have more than one \"channel\". Can gradient descent find the corresponding filter, given an input and output sequence? "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape of conv filter (3, 2, 1)\n",
"conv weights\n",
"------------\n",
" 0 [[ 0.00561469 -0.2564376 ]\n",
" [ 0.6226566 0.52996373]\n",
" [-0.7588456 0.62725854]] \t {'loss': [10.279986381530762]}\n",
"100 [[0.2864395 0.4370375 ]\n",
" [0.3900503 0.67185867]\n",
" [0.96525764 0.6868093 ]] \t {'loss': [0.6920822262763977]}\n",
"200 [[0.3153597 0.41625863]\n",
" [0.26372367 0.7739245 ]\n",
" [1.8895061 0.11112271]] \t {'loss': [0.1427873820066452]}\n",
"300 [[0.53008467 0.28652686]\n",
" [0.1805524 0.8582995 ]\n",
" [1.9257807 0.06928745]] \t {'loss': [0.11246975511312485]}\n",
"400 [[0.69915605 0.17858563]\n",
" [0.12399938 0.9194845 ]\n",
" [1.9783874 0.06550355]] \t {'loss': [0.40661606192588806]}\n",
"500 [[0.77650887 0.13201742]\n",
" [0.08307225 0.91956806]\n",
" [1.984738 0.03154069]] \t {'loss': [0.05280423164367676]}\n",
"600 [[0.86430496 0.05792169]\n",
" [0.04980132 0.9653948 ]\n",
" [1.9934295 0.01223389]] \t {'loss': [0.18477745354175568]}\n",
"700 [[0.90510756 0.03846198]\n",
" [0.03247704 0.9632356 ]\n",
" [1.9716064 0.00365077]] \t {'loss': [0.14356191456317902]}\n",
"800 [[0.94708073 0.04317535]\n",
" [0.02377948 0.9836444 ]\n",
" [2.0122368 0.01379483]] \t {'loss': [0.14440886676311493]}\n",
"900 [[ 9.4742602e-01 1.2048477e-02]\n",
" [ 1.3440009e-02 9.7038239e-01]\n",
" [ 1.9877554e+00 -2.2073463e-04]] \t {'loss': [0.034459590911865234]}\n",
"prediction on training data: [[[13.965635 ]\n",
" [19.93774 ]\n",
" [13.795604 ]\n",
" [14.953772 ]\n",
" [ 9.876331 ]\n",
" [ 3.9535434]]]\n"
]
}
],
"source": [
"K.clear_session()\n",
"\n",
"toyX = np.array([[0, 0], [3, 6], [4, 7], [5, 8], [1, 2], [4, 5], [2, 0], [0, 1]]).reshape(1,8,2) # single input sequence with 8 steps and 2 channels\n",
"toyY = np.array([14, 20, 14, 15, 10, 4]).reshape(1,6,1) # obtained by convolving the input sequence with the filter [[1,0], [0,1], [2,0]]\n",
"toyModel = Sequential([\n",
" Conv1D(filters=1, kernel_size=3, strides=1, padding='valid', use_bias=False, input_shape=(8,2), name='c1d')\n",
"])\n",
"toyModel.compile(optimizer=Adam(lr=5e-2), loss='mae')\n",
"\n",
"print('shape of conv filter', toyModel.layers[0].get_weights()[0].shape)\n",
"print('conv weights')\n",
"print('------------')\n",
"for epoch in range(1000):\n",
" history = toyModel.fit(toyX, toyY, verbose=0)\n",
" if epoch%100 == 0:\n",
" # kernel has dimension: kernel_size X number_input_channels X number_output_channels\n",
" print(\"{:3d} {} \\t {}\".format(epoch, \n",
" toyModel.layers[0].get_weights()[0][:,:,0], \n",
" history.history))\n",
"print('prediction on training data:', toyModel.predict(toyX))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Gradient descent found the correct filter. \n",
"\n",
"Again, that was possible because the system of equations for the filter components is over-determined, i.e., there are enough data to constrain the problem. Notice that I increased the length of the input sequence relative to the single-channel example in order to compensate for the fact that the filter now has a greater number of components. An alternative approach would have been to increase the number of examples fed to the training algorithm."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Finding multiple 1D convolution filters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The final element of complexity is to consider multiple filters, each mapping the input sequence to a seperate output sequence. Gradient descent in Keras gets the job done here too, even though the system of equations is under-determined..."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape of conv filter (2, 1, 2)\n",
"conv weights\n",
"------------\n",
" 0 [[-0.6103726 -0.8287409 ]\n",
" [-0.15447119 -0.89825535]] \t {'loss': [20.767751693725586]}\n",
"100 [[1.0252689 3.6927207]\n",
" [2.0115454 3.7466788]] \t {'loss': [0.6135830879211426]}\n",
"200 [[0.9909381 2.9846911]\n",
" [1.9965166 3.9881284]] \t {'loss': [0.08037328720092773]}\n",
"300 [[0.9794749 2.9899173]\n",
" [1.9887626 3.9881496]] \t {'loss': [0.05458291247487068]}\n",
"400 [[0.9978222 3.0115416]\n",
" [1.999234 4.0161667]] \t {'loss': [0.03423619270324707]}\n",
"500 [[0.9978582 2.9963021]\n",
" [1.9992551 3.9971995]] \t {'loss': [0.011512279510498047]}\n",
"600 [[0.99785465 3.0019803 ]\n",
" [1.9992567 4.0074735 ]] \t {'loss': [0.016494354233145714]}\n",
"700 [[0.9978543 3.0014691]\n",
" [1.9992565 3.9961095]] \t {'loss': [0.01576344110071659]}\n",
"800 [[1.0117548 3.0033073]\n",
" [2.0094244 4.0087605]] \t {'loss': [0.032050926238298416]}\n",
"900 [[1.0040337 3.0018487]\n",
" [1.9964517 3.995188 ]] \t {'loss': [0.02338552474975586]}\n",
"prediction on training data: [[[ 6.011879 12.008711]\n",
" [11.044497 24.983978]\n",
" [14.05801 31.97767 ]]]\n"
]
}
],
"source": [
"K.clear_session()\n",
"\n",
"toyX = np.array([0, 3, 4, 5]).reshape(1,4,1)\n",
"\n",
"# convolve two filters, [1, 2] and [3, 4], with the above sequence to yield:\n",
"toyY = np.array([[6, 12], [11, 25], [14, 32]]).reshape(1,3,2) \n",
"\n",
"toyModel = Sequential([\n",
" Conv1D(filters=2, kernel_size=2, strides=1, padding='valid', use_bias=False, input_shape=(4,1), name='c1d')\n",
"])\n",
"toyModel.compile(optimizer=Adam(lr=5e-2), loss='mae')\n",
"\n",
"print('shape of conv filter', toyModel.layers[0].get_weights()[0].shape)\n",
"print('conv weights')\n",
"print('------------')\n",
"for epoch in range(1000):\n",
" history = toyModel.fit(toyX, toyY, verbose=0)\n",
" if epoch%100 == 0:\n",
" # kernel has dimension: kernel_size X number_input_channels X number_output_channels\n",
" print(\"{:3d} {} \\t {}\".format(epoch, \n",
" toyModel.layers[0].get_weights()[0][:,0,:], \n",
" history.history))\n",
"print('prediction on training data:', toyModel.predict(toyX))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.3"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment