Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save CreatCodeBuild/d24d5ec0888ad27d5f7ff9ecda87729f to your computer and use it in GitHub Desktop.
Save CreatCodeBuild/d24d5ec0888ad27d5f7ff9ecda87729f to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Motivation\n",
"I create this notebook to point out common mistakes among students who have done the __\"Build Your First Neural Network\"__ project\n",
"\n",
"__Erros in this notebook are intentional. Do not fix them.__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Linear ALgebra refresher\n",
"Before we dive into deep neural networks, let's review some linear algebra and some numpy gotcha\n",
"\n",
"__This toturial focus on numpy programming, not math.__ \n",
"__For a mathamatical refresher, see [this great tutorial](https://www.youtube.com/watch?v=sYlOjyPyX3g&t=419s)__"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from numpy import array, dot"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[14]\n",
"[[1 4 9]]\n"
]
}
],
"source": [
"# First\n",
"# dot(A, B) != A * B\n",
"A = array([[1,2,3]])\n",
"B = array([1,2,3])\n",
"dot_C = dot(A, B)\n",
"element_wise_c = A * B\n",
"print(dot_C)\n",
"print(element_wise_c)\n",
"\n",
"# dot is the dot product\n",
"# * is the element-wise product"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "ValueError",
"evalue": "shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-8-b76d9186a652>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 4\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[1;31m# this line will fail, because it make no sense to dot product two matrices with shape (1, 3) and (1, 3)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 6\u001b[1;33m \u001b[0mdot_C\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mValueError\u001b[0m: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)"
]
}
],
"source": [
"# But, these is a gotcha\n",
"A = array([[1,2,3]])\n",
"B = array([[1,2,3]])\n",
"\n",
"# this line will fail, because it make no sense to dot product two matrices with shape (1, 3) and (1, 3)\n",
"dot_C = dot(A, B)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[14]]\n"
]
}
],
"source": [
"# However\n",
"A = array([[1,2,3]])\n",
"B = array([[1,2,3]])\n",
"\n",
"# This is the same as if B = [1,2,3]. But actually, this is the correct way. \n",
"dot_C = dot(A, B.T)\n",
"print(dot_C)\n",
"\n",
"# dot([[1,2,3]], [1,2,3]) works probably just because it's a numpy short hand.\n",
"# Because mathamatically speaking, it make less sense."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 4]\n",
" [2 5]\n",
" [3 6]]\n",
"[[ 7 0]\n",
" [ 8 1]\n",
" [ 9 2]\n",
" [10 3]]\n"
]
}
],
"source": [
"# Now, let's consider bigger matrix\n",
"A = array([\n",
" [1,2,3],\n",
" [4,5,6]\n",
" ])\n",
"B = array([\n",
" [7,8,9,10],\n",
" [0,1,2,3]\n",
" ])\n",
"\n",
"\n",
"# First let's see that is a transpose matrix .T\n",
"print(A.T)\n",
"print(B.T)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X:\n",
"[[ 7 12 17 22]\n",
" [14 21 28 35]\n",
" [21 30 39 48]]\n",
"\n",
"Y:\n",
"[[ 7 14 21]\n",
" [12 21 30]\n",
" [17 28 39]\n",
" [22 35 48]]\n"
]
}
],
"source": [
"X = dot(A.T, B)\n",
"Y = dot(B.T, A)\n",
"\n",
"print('X:')\n",
"print(X)\n",
"print()\n",
"print('Y:')\n",
"print(Y)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ True True True]\n",
" [ True True True]\n",
" [ True True True]\n",
" [ True True True]]\n",
"[[ True True True True]\n",
" [ True True True True]\n",
" [ True True True True]]\n",
"[[ True True True]\n",
" [ True True True]\n",
" [ True True True]\n",
" [ True True True]]\n",
"[[ True True True True]\n",
" [ True True True True]\n",
" [ True True True True]]\n"
]
}
],
"source": [
"# You see, X and Y are just transpose of each other\n",
"print(X.T == Y)\n",
"print(X == Y.T)\n",
"print(Y == X.T)\n",
"print(Y.T == X)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 7 12 17 22]\n",
" [14 21 28 35]\n",
" [21 30 39 48]]\n"
]
}
],
"source": [
"# Also consider\n",
"A = array([\n",
" [1, 4],\n",
" [2, 5],\n",
" [3, 6],\n",
" ])\n",
"B = array([\n",
" [7,8,9,10],\n",
" [0,1,2,3]\n",
" ])\n",
"\n",
"X = dot(A, B)\n",
"print(X)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "ValueError",
"evalue": "shapes (2,4) and (3,2) not aligned: 4 (dim 1) != 3 (dim 0)",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-34-aba7d292d99b>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# But this will fail\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mY\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 3\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mY\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mValueError\u001b[0m: shapes (2,4) and (3,2) not aligned: 4 (dim 1) != 3 (dim 0)"
]
}
],
"source": [
"# But this will fail, because the shape doesn't match\n",
"Y = dot(B, A)\n",
"print(Y)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[11 16 21]\n",
" [19 26 33]\n",
" [27 36 45]]\n",
"[[ 50 122]\n",
" [ 14 32]]\n"
]
}
],
"source": [
"# Therefore, dot(A, B) != dot(B, A)\n",
"# also consider\n",
"A = array([\n",
" [1, 4],\n",
" [2, 5],\n",
" [3, 6],\n",
" ])\n",
"B = array([\n",
" [7, 8, 9],\n",
" [1, 2, 3]\n",
" ])\n",
"X = dot(A, B)\n",
"Y = dot(B, A)\n",
"print(X)\n",
"print(Y)\n",
"\n",
"# As you can see, \n",
"# just because 2 operations with the same arguments but different order get computed without error,\n",
"# it doesn't mean the semantics are the same.\n",
"\n",
"# But, for *, A * B == B * A because it's just element-wise!"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[70 80 90]\n",
" [10 20 30]]\n",
"[[70 80 90]\n",
" [10 20 30]]\n",
"[[70 80 90]\n",
" [10 20 30]]\n",
"[[70 80 90]\n",
" [10 20 30]]\n"
]
}
],
"source": [
"# Now, let's consider scalar or 1 unit vector/matrix\n",
"\n",
"# salar\n",
"A = 10\n",
"B = array([\n",
" [7, 8, 9],\n",
" [1, 2, 3]\n",
" ])\n",
"print(A * B)\n",
"print(B * A)\n",
"print(dot(A, B))\n",
"print(dot(B, A))"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[70 80 90]\n",
" [10 20 30]]\n",
"[[70 80 90]\n",
" [10 20 30]]\n"
]
},
{
"ename": "ValueError",
"evalue": "shapes (1,) and (2,3) not aligned: 1 (dim 0) != 2 (dim 0)",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-46-00b31614be64>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 7\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m \u001b[1;33m*\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 8\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m \u001b[1;33m*\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 9\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 10\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mValueError\u001b[0m: shapes (1,) and (2,3) not aligned: 1 (dim 0) != 2 (dim 0)"
]
}
],
"source": [
"# 1 unit vector\n",
"A = array([10])\n",
"B = array([\n",
" [7, 8, 9],\n",
" [1, 2, 3]\n",
" ])\n",
"print(A * B)\n",
"print(B * A)\n",
"print(dot(A, B))\n",
"print(dot(B, A))"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[70 80 90]\n",
" [10 20 30]]\n",
"[[70 80 90]\n",
" [10 20 30]]\n"
]
},
{
"ename": "ValueError",
"evalue": "shapes (1,1) and (2,3) not aligned: 1 (dim 1) != 2 (dim 0)",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-47-61d8df2d1573>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 7\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m \u001b[1;33m*\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 8\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m \u001b[1;33m*\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 9\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 10\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mValueError\u001b[0m: shapes (1,1) and (2,3) not aligned: 1 (dim 1) != 2 (dim 0)"
]
}
],
"source": [
"# 1 unit matrix\n",
"A = array([[10]])\n",
"B = array([\n",
" [7, 8, 9],\n",
" [1, 2, 3]\n",
" ])\n",
"print(A * B)\n",
"print(B * A)\n",
"print(dot(A, B))\n",
"print(dot(B, A))"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[70 80 90]]\n",
"[[70 80 90]]\n",
"[[70 80 90]]\n",
"[[70 80 90]] \n",
"\n",
"1 unit vector\n",
"[[70 80 90]]\n",
"[[70 80 90]]\n",
"[70 80 90]\n"
]
},
{
"ename": "ValueError",
"evalue": "shapes (1,3) and (1,) not aligned: 3 (dim 1) != 1 (dim 0)",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-56-9e972ee5436b>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 15\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m \u001b[1;33m*\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 16\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 17\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mValueError\u001b[0m: shapes (1,3) and (1,) not aligned: 3 (dim 1) != 1 (dim 0)"
]
}
],
"source": [
"# But, what if B has only one layer?\n",
"# saclar\n",
"A = array(10)\n",
"B = array([[7, 8, 9]])\n",
"print(A * B)\n",
"print(B * A)\n",
"print(dot(A, B))\n",
"print(dot(B, A), '\\n')\n",
"\n",
"# 1-unit vector\n",
"print('1 unit vector')\n",
"A = array([10])\n",
"B = array([[7, 8, 9]])\n",
"print(A * B)\n",
"print(B * A)\n",
"print(dot(A, B))\n",
"print(dot(B, A))"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 unit matrix\n",
"[[70 80 90]]\n",
"[[70 80 90]]\n",
"[[70 80 90]]\n"
]
},
{
"ename": "ValueError",
"evalue": "shapes (1,3) and (1,1) not aligned: 3 (dim 1) != 1 (dim 0)",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-55-6183a2d96180>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m \u001b[1;33m*\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 7\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 8\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mValueError\u001b[0m: shapes (1,3) and (1,1) not aligned: 3 (dim 1) != 1 (dim 0)"
]
}
],
"source": [
"# 1-unit matrix\n",
"print('1 unit matrix')\n",
"A = array([[10]])\n",
"B = array([[7, 8, 9]])\n",
"print(A * B)\n",
"print(B * A)\n",
"print(dot(A, B))\n",
"print(dot(B, A))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Have you noticed that [10] dot [[7,8,9]] != [[10]] dot [[7,8,9]]"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[70 80 90]\n",
"[70 80 90]\n",
"[70 80 90]\n",
"[70 80 90] \n",
"\n",
"[70 80 90]\n",
"[70 80 90]\n"
]
},
{
"ename": "ValueError",
"evalue": "shapes (1,) and (3,) not aligned: 1 (dim 0) != 3 (dim 0)",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-59-714314867e51>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 11\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m \u001b[1;33m*\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 12\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m \u001b[1;33m*\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 13\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 14\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mB\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mValueError\u001b[0m: shapes (1,) and (3,) not aligned: 1 (dim 0) != 3 (dim 0)"
]
}
],
"source": [
"# Now, what if B is also a vector?\n",
"A = array(10)\n",
"B = array([7, 8, 9])\n",
"print(A * B)\n",
"print(B * A)\n",
"print(dot(A, B))\n",
"print(dot(B, A), '\\n')\n",
"\n",
"A = array([10])\n",
"B = array([7, 8, 9])\n",
"print(A * B)\n",
"print(B * A)\n",
"print(dot(A, B))\n",
"print(dot(B, A))"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"And I will stop here. That's enough numpy matrix multiplication\n"
]
}
],
"source": [
"# The above is another gotcha!!!\n",
"# matrix dot product only make sense when it's a matrix,\n",
"# that is, has shape (x, y), even if x or y could be 1!\n",
"\n",
"print(\"And I will stop here. That's enough numpy matrix multiplication\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now, let's dive into deep neural network\n",
"__Just simple fully connected neural nets. No convolution, no recurrence__ \n",
"For example, if layer A and layer B are adjacent and B is received input from A.\n",
"\n",
"A has 10 nodes and B has 15 nodes.\n",
"\n",
"Then there are 150 connections between A and B. Each connection is an edge if you think of it as a computational graph.\n",
"\n",
"Therefore, B received 150 values(scalars) from A. Each node in B received 10 values from A. Each node in A ouput 15 values to B.\n",
"\n",
"Therefore, no matter it's forward pass or backward pass, 150 values get computed between A and B each pass.\n",
"\n",
"The forward pass computes the activation. The backward pass computes the gradient and updating values.\n",
"\n",
"These 150 edges contains 150 weights, 1 weight each."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]\n",
" [ 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30]\n",
" [ 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45]\n",
" [ 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60]\n",
" [ 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75]\n",
" [ 6 12 18 24 30 36 42 48 54 60 66 72 78 84 90]\n",
" [ 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105]\n",
" [ 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120]\n",
" [ 9 18 27 36 45 54 63 72 81 90 99 108 117 126 135]\n",
" [ 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150]]\n",
"[[ 1 2 3 4 5 6 7 8 9 10]\n",
" [ 2 4 6 8 10 12 14 16 18 20]\n",
" [ 3 6 9 12 15 18 21 24 27 30]\n",
" [ 4 8 12 16 20 24 28 32 36 40]\n",
" [ 5 10 15 20 25 30 35 40 45 50]\n",
" [ 6 12 18 24 30 36 42 48 54 60]\n",
" [ 7 14 21 28 35 42 49 56 63 70]\n",
" [ 8 16 24 32 40 48 56 64 72 80]\n",
" [ 9 18 27 36 45 54 63 72 81 90]\n",
" [ 10 20 30 40 50 60 70 80 90 100]\n",
" [ 11 22 33 44 55 66 77 88 99 110]\n",
" [ 12 24 36 48 60 72 84 96 108 120]\n",
" [ 13 26 39 52 65 78 91 104 117 130]\n",
" [ 14 28 42 56 70 84 98 112 126 140]\n",
" [ 15 30 45 60 75 90 105 120 135 150]]\n"
]
}
],
"source": [
"# Therefore, assuming\n",
"A = array([[1,2,3,4,5,6,7,8,9,10]])\n",
"B = array([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]])\n",
"# it only make semantic sense to write\n",
"whatever_been_computed = dot(A.T, B)\n",
"print(whatever_been_computed)\n",
"# or\n",
"whatever_been_computed = dot(B.T, A)\n",
"print(whatever_been_computed)\n",
"\n",
"# no matter it's computing the weights, gradient, error, or what ever intermedia values between 2 layers."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1 2 3 4 5 6 7 8 9 10]\n"
]
},
{
"ename": "ValueError",
"evalue": "shapes (10,) and (15,) not aligned: 10 (dim 0) != 15 (dim 0)",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-67-c890709b7c01>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 7\u001b[0m \u001b[1;31m# and this will fall and it hurts\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 8\u001b[1;33m \u001b[0mdot\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mB\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;31mValueError\u001b[0m: shapes (10,) and (15,) not aligned: 10 (dim 0) != 15 (dim 0)"
]
}
],
"source": [
"# You can't store A and B in plain vectors\n",
"A = array([1,2,3,4,5,6,7,8,9,10])\n",
"B = array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])\n",
"\n",
"# Because A.T make zero impact\n",
"print(A.T)\n",
"# and this will fall and it hurts\n",
"dot(A.T, B)"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]]\n",
"[[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]]\n",
"[[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]]\n",
"[[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]]\n",
"[[ 1]\n",
" [ 2]\n",
" [ 3]\n",
" [ 4]\n",
" [ 5]\n",
" [ 6]\n",
" [ 7]\n",
" [ 8]\n",
" [ 9]\n",
" [10]\n",
" [11]\n",
" [12]\n",
" [13]\n",
" [14]\n",
" [15]]\n",
"[[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]]\n"
]
}
],
"source": [
"# But, if one of you layer has only 1 node\n",
"A = array([[1]])\n",
"B = array([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]])\n",
"\n",
"print(A * B)\n",
"print(B * A)\n",
"print(dot(A, B))\n",
"print(dot(A.T, B))\n",
"print(dot(B.T, A)) # oh, and this one's transpose is just\n",
"print(dot(B.T, A).T) # which == all of the first 4 equations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__And the above exmple is the reason some students start to do crazy combinations of `*`, `dot` and `.T`__\n",
"\n",
"Eventually, They will probably gets the shapes matched. But, does that mean they understand how it actually work?\n",
"\n",
"Also, even if the shapes match, the value could be wrong."
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 7 16 13]\n",
" [14 26 23]\n",
" [21 36 33]]\n",
"[[ 49 56 63]\n",
" [112 148 154]\n",
" [189 228 249]]\n"
]
}
],
"source": [
"# For example\n",
"A = array([\n",
" [1,2,3],\n",
" [4,5,6]\n",
" ])\n",
"B = array([\n",
" [7,8,9],\n",
" [0,2,1]\n",
" ])\n",
"\n",
"print(dot(A.T, B))\n",
"print(dot((A * B).T, B))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### End\n",
"I hope this will help you to clear your mind and understand what exactly happend when you are implementing neural netowrks with numpy as your matrix manipulation tool"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
@aluxian
Copy link

aluxian commented Feb 21, 2017

Erros in this notebook are intentional. Do not fix them.

Why?

@CreatCodeBuild
Copy link
Author

@aluxian Sorry that I didn't respond you in time.
This notebook meant to show some incorrect ways to use NumPy and some NumPy gotchas. Therefore, I intentionally put errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment