@jcausey-astate
Created October 20, 2018 18:20
An example one- and two-layer neural network from scratch in Python, with helpful references and example links.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Simple Neural Network in Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook is based on the following tutorials and examples, and it borrows heavily from their figures, code, and discussion.\n",
"* [Neural Network Implementation](https://peterroelants.github.io/posts/neural-network-implementation-part01/)\n",
"* [Build a Neural Network](https://enlight.nyc/projects/neural-network/)\n",
"* [A Neural Network in 11 lines of Python (Part 1)](https://iamtrask.github.io/2015/07/12/basic-python-network/)\n",
"* [How to build your own Neural Network from scratch in Python](https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6)\n",
"* [Artificial neural networks with Math.](https://medium.com/deep-math-machine-learning-ai/chapter-7-artificial-neural-networks-with-math-bb711169481b)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np, matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is a \"Neuron\" in computing terms anyway?\n",
"\n",
"![Neuron](images/artificial_neuron.jpeg)\n",
"\n",
"(Source: [Artificial neural networks with Math](https://medium.com/deep-math-machine-learning-ai/chapter-7-artificial-neural-networks-with-math-bb711169481b))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each neuron computes the sum of its inputs, each multiplied by its own weight, plus (optionally) a bias term. In math speak:\n",
"\n",
"$$\n",
"Y = \\sum_{i=0}^{N-1}{(x_i \\cdot w_i)} + b\n",
"$$\n",
"\n",
"Where $N$ is the number of inputs, each $x_i$ is one input value, each $w_i$ is one weight, and $b$ is the bias; $Y$ is the output value for the neuron.\n",
"\n",
"The value of $Y$ could be anything in the range $(-\\infty,\\infty)$.\n",
"\n",
"But, that range is too wide. We generally want to use an *activation function* to \"compress\" the output range.\n",
"\n",
"Common activation functions include:\n",
"\n",
"| name | formula |\n",
"|----------|--------------------------|\n",
"| Sigmoid | $y = \\frac{1}{1+e^{-x}}$ |\n",
"| Tanh | $y = \\textrm{tanh}(x)$ |\n",
"| ReLU | $y = \\textrm{max}(0, x)$ |\n"
]
},
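{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the formula and the table above concrete, here is a minimal sketch of one neuron followed by each activation. The names `neuron`, `sigmoid_example`, `tanh_example`, `relu_example`, and the sample values `x_example`, `w_example`, `b_example` are assumptions made up for illustration; they are not taken from the referenced tutorials."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def neuron(x, w, b=0.0):\n",
"    # Weighted sum of the inputs plus a bias: sum_i(x_i * w_i) + b\n",
"    return np.dot(x, w) + b\n",
"\n",
"# Illustrative activations matching the table above.\n",
"def sigmoid_example(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def tanh_example(z):\n",
"    return np.tanh(z)\n",
"\n",
"def relu_example(z):\n",
"    return np.maximum(0.0, z)\n",
"\n",
"# Hypothetical inputs, weights, and bias chosen only for illustration.\n",
"x_example = np.array([1.0, 0.0, 1.0])\n",
"w_example = np.array([0.5, -0.25, 0.25])\n",
"b_example = 0.0\n",
"\n",
"z = neuron(x_example, w_example, b_example)\n",
"print('raw output:', z)\n",
"print('sigmoid:', sigmoid_example(z))\n",
"print('tanh:', tanh_example(z))\n",
"print('relu:', relu_example(z))"
]
},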
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Example Problem\n",
"Let's try to train a network to predict the outcome of:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| x0 | x1 | x2 | **Output** |\n",
"|--------|---|---|------------|\n",
"| 0 | 0 | 1 | **0** |\n",
"| 1 | 1 | 1 | **1** |\n",
"| 1 | 0 | 1 | **1** |\n",
"| 0 | 1 | 1 | **0** |\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Sigmoid activation function\n",
"\n",
"![Sigmoid](images/Sigmoid-function.png)\n",
"\n",
"The sigmoid function scales its input into the range $(0,1)$.\n",
"\n",
"The formula is:\n",
"\n",
"$$\n",
"y = \\frac{1}{1+e^{-x}}\n",
"$$\n",
"\n",
"And in Python:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def sigmoid(x):\n",
" return 1/(1+np.exp(-x))"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x: -10 y: 0.0000\n",
"x: -8 y: 0.0003\n",
"x: -6 y: 0.0025\n",
"x: -4 y: 0.0180\n",
"x: -2 y: 0.1192\n",
"x: 0 y: 0.5000\n",
"x: 2 y: 0.8808\n",
"x: 4 y: 0.9820\n",
"x: 6 y: 0.9975\n",
"x: 8 y: 0.9997\n"
]
}
],
"source": [
"# Let's try it...\n",
"for i in range(-10, 10, 2):\n",
" print(\"x: %3i y: %.04f\" % (i, sigmoid(i)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we set up our input matrix ($X$) and output matrix ($y$).\n",
"\n",
"$X$ represents the independent variables (the inputs), and $y$ represents the dependent variable(s)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# input matrix (ordered so that the output \"ones\" are at the bottom)\n",
"X = np.array([ [0,0,1],\n",
" [0,1,1],\n",
" [1,0,1],\n",
" [1,1,1] ])\n",
" \n",
"# output matrix (order must match X):\n",
"y = np.array([ [0],\n",
" [0],\n",
" [1],\n",
" [1] ])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A Simple One-Layer Artificial Neural Network\n",
"\n",
"For this example, we will use just a single layer, which is also the *output layer*.\n",
"\n",
"That means we just have one set of weights $W_0$ that will be multiplied by the inputs and summed. Then, we run the result through the Sigmoid function to produce the output.\n",
"\n",
"The network diagram looks like this:\n",
"\n",
"![One layer network](images/one_layer_nn.png)\n",
"\n",
"This network only has one \"layer\", and that is the layer producing the output. So, we directly observe the results coming out of this layer -- the output layer. Thus, we would not say that this network has any *hidden layers* at all. But, it still behaves like a neural network, and it is simple, so it's a good place to start."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initial Weights are random!\n",
"We start by creating the weights matrix $W_0$ with random values that fall in the range $(-1,1)$ with mean $\\mu = 0$."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# initialize weights randomly with mean 0 (range (-1,1))\n",
"W0 = 2*np.random.random((3,1)) - 1 # The (3,1) is the shape: 3 rows, 1 column as shown in the diagram above."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[-0.07718115]\n",
" [ 0.71523045]\n",
" [ 0.6842432 ]]\n"
]
}
],
"source": [
"# Let's see it:\n",
"print(W0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forward propagation\n",
"Let `l` mean \"layer\" in our short variable names here.\n",
"\n",
"We start by loading the inputs into layer 0."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"l0 = X"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we calculate the values at layer 1 (before activation) by taking the dot product of the input layer (`l0`) with the weights for that layer (`W0`)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0.6842432 ]\n",
" [1.39947366]\n",
" [0.60706206]\n",
" [1.32229251]]\n"
]
}
],
"source": [
"l1 = np.dot(l0, W0) # Note, we're not using any bias here. Bias = 0\n",
"print(l1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And then we apply the activation layer (nonlinearity):"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"output = sigmoid(l1)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0.66468508]\n",
" [0.80210035]\n",
" [0.64727033]\n",
" [0.78956287]]\n"
]
}
],
"source": [
"# Let's see it:\n",
"print(output)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's pretty bad. In the run shown here, the last value isn't *too* bad, but the second one is *really really bad*."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[-0.66468508]\n",
" [-0.80210035]\n",
" [ 0.35272967]\n",
" [ 0.21043713]]\n"
]
}
],
"source": [
"# Now calculate the error:\n",
"output_error = y - output\n",
"print(output_error)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The error reflects that observation: some are worse than others, and one is not too bad."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Backpropagation\n",
"To train the network, we need to *backpropagate* the error through the layers, updating the weights as we go.\n",
"\n",
"But, to figure out how much each weight contributed, since it has passed through a nonlinearity (the activation function), we need to \"undo\" that transformation by applying the derivative of the activation function.\n",
"\n",
"The derivative of sigmoid turns out to be:\n",
"\n",
"$$\n",
"\\textrm{sigmoid}^\\prime(x) = \\textrm{sigmoid}(x) \\cdot (1-\\textrm{sigmoid}(x))\n",
"$$\n",
"\n",
"To see why, [read this article](https://beckernick.github.io/sigmoid-derivative-neural-network/) and [this one](https://www.analyticsindiamag.com/beginners-guide-neural-network-math-python/)."
]
},
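{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the derivation is short. Writing $\\sigma(x)$ as shorthand for $\\textrm{sigmoid}(x)$ (a notation introduced here only for brevity), the chain rule gives:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\sigma(x) &= (1+e^{-x})^{-1} \\\\\n",
"\\sigma^\\prime(x) &= -(1+e^{-x})^{-2} \\cdot (-e^{-x}) = \\frac{e^{-x}}{(1+e^{-x})^{2}} \\\\\n",
"&= \\frac{1}{1+e^{-x}} \\cdot \\frac{e^{-x}}{1+e^{-x}} = \\sigma(x) \\cdot (1-\\sigma(x))\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"The last step uses $1-\\sigma(x) = \\frac{(1+e^{-x}) - 1}{1+e^{-x}} = \\frac{e^{-x}}{1+e^{-x}}$."
]
},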
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def sigmoid_deriv(x):\n",
" # According to the equation above, this function would look like:\n",
" # return sigmoid(x) * (1 - sigmoid(x))\n",
" # BUT: Remember, we passed the result of l1 = l0 * W0 through the sigmoid\n",
" # function already to produce our `output`. So the value that \n",
" # will be passed to this function (x, which is `output`): \n",
" # `output` == sigmoid(l1); we don't need to re-do the sigmoid!\n",
" return x * (1-x)"
]
},
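{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick numerical check can confirm that `x * (1-x)`, applied to the sigmoid *output*, matches the slope of the sigmoid at the corresponding input. The sketch below is self-contained; `sigmoid_local`, `sigmoid_deriv_from_output`, and the step size `eps` are illustrative names and values assumed here, and the slope is estimated with a central finite difference."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def sigmoid_local(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def sigmoid_deriv_from_output(s):\n",
"    # Expects s = sigmoid(z), not z itself.\n",
"    return s * (1.0 - s)\n",
"\n",
"z = np.linspace(-4.0, 4.0, 9)\n",
"eps = 1e-6\n",
"# Central finite-difference estimate of the slope at each z.\n",
"numeric = (sigmoid_local(z + eps) - sigmoid_local(z - eps)) / (2 * eps)\n",
"# Analytic slope computed from the already-activated values.\n",
"analytic = sigmoid_deriv_from_output(sigmoid_local(z))\n",
"print('max abs difference:', np.max(np.abs(numeric - analytic)))"
]
},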
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So to figure out how much error was due to each weight, we can do:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[-0.14814423]\n",
" [-0.1273217 ]\n",
" [ 0.08053222]\n",
" [ 0.03496483]]\n"
]
}
],
"source": [
"# Put the output error in terms of the hidden layer output (i.e. undo the sigmoid function):\n",
"output_delta = output_error * sigmoid_deriv(output)\n",
"print(output_delta)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now figure out how much of that error was due to each of the weights in $W_0$:\n",
"\n",
"This works by taking the dot product of the inputs (transposed) with the output delta, which gives the amount of error associated with each weight."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0.11549706]\n",
" [-0.09235687]\n",
" [-0.15996887]]\n"
]
}
],
"source": [
"err_per_weight = np.dot(l0.T, output_delta)\n",
"print(err_per_weight)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we simply adjust the weights by this amount:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Weights now: \n",
" [[0.03831591]\n",
" [0.62287358]\n",
" [0.52427433]]\n"
]
}
],
"source": [
"W0 += err_per_weight\n",
"print(\"Weights now: \\n\", W0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Note*: Normally, you would want to move more slowly, so you would include a *learning rate* in the calculation of the error. The learning rate is typically a number in the range $(0,1)$ that scales down the amount of error that gets backpropagated on each update. Larger learning rates (nearer to 1) take larger \"steps\" and can converge faster, but they can also overshoot; smaller learning rates converge more slowly. There are plenty of resources online to help you get a sense for the tradeoffs involved. Here, there is no learning rate, so it is essentially clamped to 1.\n",
"\n",
"Now, we do the whole process over and over again to move down the error gradient closer to the correct answer. \n",
"\n",
"This is the concept of *gradient descent*.\n",
"\n",
"### Let's bundle up the forward propagation step:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def forward(X, W0):\n",
" l0 = X\n",
" l1 = np.dot(l0, W0) # Note, we're not using any bias here. Bias = 0\n",
" output = sigmoid(l1)\n",
" return output"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0.6281467 ]\n",
" [0.75898959]\n",
" [0.63705166]\n",
" [0.76592879]]\n"
]
}
],
"source": [
"output = forward(X, W0)\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.6281467 ],\n",
" [-0.75898959],\n",
" [ 0.36294834],\n",
" [ 0.23407121]])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y - output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, that is a little better overall: the two largest errors shrank, even though the two smaller ones grew slightly.\n",
"\n",
"Let's bundle up the backpropagation step as well. This function will update the weight matrix in-place."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"def backprop(l0, W0, output, y):\n",
" output_error = y - output\n",
" output_delta = output_error * sigmoid_deriv(output)\n",
" err_per_weight = np.dot(l0.T, output_delta)\n",
" W0 += err_per_weight"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define a function to compute the Mean Square Error (MSE), so we can track progress."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"def MSE(truth, estimate):\n",
" return np.mean(np.square(truth-estimate))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MSE after first training step: 0.289\n"
]
}
],
"source": [
"print(\"MSE after first training step: %.3f\" % MSE(y, forward(X,W0)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now create a \"Train\" function that will do a forward and backward pass."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"def train(X, y, W0):\n",
" output = forward(X, W0)\n",
" backprop(X, W0, output, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's iterate 100 times to watch it train:\n",
"\n",
"To be fair to our later comparison, we will re-initialize the weights so that we are starting over from scratch..."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"i: 0 MSE before: 0.22029 after: 0.19996\n",
"i: 10 MSE before: 0.09415 after: 0.08781\n",
"i: 20 MSE before: 0.05207 after: 0.04961\n",
"i: 30 MSE before: 0.03418 after: 0.03298\n",
"i: 40 MSE before: 0.02486 after: 0.02417\n",
"i: 50 MSE before: 0.01930 after: 0.01887\n",
"i: 60 MSE before: 0.01566 after: 0.01537\n",
"i: 70 MSE before: 0.01312 after: 0.01291\n",
"i: 80 MSE before: 0.01125 after: 0.01109\n",
"i: 90 MSE before: 0.00983 after: 0.00970\n",
"i: 99 MSE before: 0.00881 after: 0.00871\n"
]
}
],
"source": [
"W0 = 2*np.random.random((3,1)) - 1\n",
"\n",
"for i in range(100):\n",
" mse_before = MSE(y, forward(X,W0))\n",
" train(X, y, W0)\n",
" mse_after = MSE(y, forward(X,W0))\n",
" if i % 10 == 0 or i == 99:\n",
" print(\"i: %3i MSE before: %0.5f after: %0.5f\" % (i, mse_before, mse_after))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Two-Layer Network\n",
"\n",
"Now, let's design a 2-layer neural network and compare its results on the same problem.\n",
"\n",
"Here is the diagram. Note that the matrix dimensions required for the weights are shown below the weight boxes. Also, there is an activation function (Sigmoid) at each neuron, although it is not shown in the diagram for simplicity.\n",
"\n",
"![Two layer ANN](images/two_layer_nn.png)\n",
"\n",
"This network *does* have a single **_hidden layer_**. The outputs from the hidden layer become inputs to the output layer. This is the simplest neural network architecture that you will find in practice."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are the initial weights for this network. Again, we randomly generate them just as before. Note the sizes of the matrices."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"W0 = 2 * np.random.random((3,2)) - 1\n",
"W1 = 2 * np.random.random((2,1)) - 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here they are:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[-0.36481251 -0.37203586]\n",
" [-0.81881763 0.65717183]\n",
" [-0.15073624 0.62975796]]\n",
"\n",
"[[-0.81306733]\n",
" [ 0.94892801]]\n"
]
}
],
"source": [
"print(W0)\n",
"print()\n",
"print(W1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use a list `[W0, W1]` to represent the weights for the whole network in function calls for simplicity:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"NN2 = [W0, W1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define the forward and backward propagation functions:\n",
"\n",
"In this version, we will put the \"network\" first in the arg list."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"def _forward(NN, X):\n",
" '''\n",
" This version returns the intermediate values\n",
" as well as the outputs. We need it for backprop\n",
" later.\n",
" '''\n",
" l0 = X\n",
" l1 = sigmoid( np.dot(l0, NN[0]) )\n",
" l2 = sigmoid( np.dot(l1, NN[1]) )\n",
" output = l2\n",
" return (output, l1, l0)\n",
"\n",
"def forward(NN, X):\n",
" '''\n",
" The forward function makes one forward pass \n",
" and returns only the outputs.\n",
" '''\n",
" return _forward(NN, X)[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is simple enough. It starts just like before --- note that we are chaining the dot product and the sigmoid for brevity, but the steps are the same.\n",
"\n",
"The backpropagation function is a bit more complex, but it follows from the process above: Undo the sigmoid, determine the error due to each layer's weights, update the weights."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"def backprop(NN, X, y):\n",
" # forward pass and get intermediate values\n",
" output, l1, l0 = _forward(NN, X) \n",
" # the backward pass looks like this:\n",
" # output_err --(sig')--> W1 --(sig')--> W0\n",
" # (err........>delta) (err..>delta)\n",
" \n",
" # Calculate the errors and deltas\n",
" output_err = y - output\n",
" w1_delta = output_err * sigmoid_deriv(output)\n",
" \n",
" w1_err = np.dot(w1_delta, NN[1].T)\n",
" w0_delta = w1_err * sigmoid_deriv(l1)\n",
" \n",
" # Update the weights\n",
" NN[0] += np.dot(X.T, w0_delta)\n",
" NN[1] += np.dot(l1.T, w1_delta)"
]
},
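{
"cell_type": "markdown",
"metadata": {},
"source": [
"Written as equations, the updates in `backprop` look roughly like this. The symbols are shorthand assumed here for readability: $\\hat{y}$ is `output`, $l_1$ is the hidden-layer activation, $\\delta_1$ and $\\delta_0$ are `w1_delta` and `w0_delta`, and $\\odot$ is element-wise multiplication.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\delta_1 &= (y - \\hat{y}) \\odot \\hat{y} \\odot (1 - \\hat{y}) \\\\\n",
"\\delta_0 &= (\\delta_1 W_1^T) \\odot l_1 \\odot (1 - l_1) \\\\\n",
"W_0 &\\leftarrow W_0 + X^T \\delta_0 \\\\\n",
"W_1 &\\leftarrow W_1 + l_1^T \\delta_1\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"With the four-sample $X$ used here, $\\delta_1$ has shape $(4,1)$ and $\\delta_0$ has shape $(4,2)$, so $X^T \\delta_0$ is $(3,2)$ and $l_1^T \\delta_1$ is $(2,1)$, matching $W_0$ and $W_1$."
]
},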
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now create a train function that will do a forward and a backward pass:\n",
"\n",
"*Note*: Since our new backprop function actually computes the forward pass first, there is no need to call `forward()` separately here -- this version of `backprop` is essentially `train`!"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"def train(NN, X, y):\n",
" backprop(NN, X, y)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"i: 0 MSE before: 0.26109 after: 0.25533\n",
"i: 10 MSE before: 0.22680 after: 0.22415\n",
"i: 20 MSE before: 0.19386 after: 0.18958\n",
"i: 30 MSE before: 0.14395 after: 0.13847\n",
"i: 40 MSE before: 0.09312 after: 0.08885\n",
"i: 50 MSE before: 0.05867 after: 0.05614\n",
"i: 60 MSE before: 0.03882 after: 0.03738\n",
"i: 70 MSE before: 0.02735 after: 0.02650\n",
"i: 80 MSE before: 0.02036 after: 0.01982\n",
"i: 90 MSE before: 0.01583 after: 0.01547\n",
"i: 99 MSE before: 0.01301 after: 0.01275\n"
]
}
],
"source": [
"for i in range(100):\n",
" mse_before = MSE(y, forward(NN2,X))\n",
" train(NN2, X, y)\n",
" mse_after = MSE(y, forward(NN2,X))\n",
" if i % 10 == 0 or i == 99:\n",
" print(\"i: %3i MSE before: %0.5f after: %0.5f\" % (i, mse_before, mse_after))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait a minute -- shouldn't the more complex network be more accurate? Well, each weight is a *parameter* in the model. With more parameters to train, more training is required.\n",
"\n",
"Let's do 900 more epochs to make it an even 1000:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"i: 100 MSE before: 0.01275 after: 0.01249\n",
"i: 200 MSE before: 0.00365 after: 0.00362\n",
"i: 300 MSE before: 0.00196 after: 0.00195\n",
"i: 400 MSE before: 0.00130 after: 0.00130\n",
"i: 500 MSE before: 0.00096 after: 0.00096\n",
"i: 600 MSE before: 0.00076 after: 0.00076\n",
"i: 700 MSE before: 0.00062 after: 0.00062\n",
"i: 800 MSE before: 0.00053 after: 0.00053\n",
"i: 900 MSE before: 0.00045 after: 0.00045\n",
"i: 999 MSE before: 0.00040 after: 0.00040\n"
]
}
],
"source": [
"for i in range(100,1000):\n",
" mse_before = MSE(y, forward(NN2,X))\n",
" train(NN2, X, y)\n",
" mse_after = MSE(y, forward(NN2,X))\n",
" if i % 100 == 0 or i == 999:\n",
" print(\"i: %3i MSE before: %0.5f after: %0.5f\" % (i, mse_before, mse_after))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Deeper models (more parameters) also have the potential to learn a more complex function, so let's make one up. \n",
"\n",
"## (Slightly) more complex function\n",
"\n",
"For this problem, we will simulate a dataset that represents values produced according to the \n",
"linear function $y = 2x_0 + 1x_1 + 0x_2 + d$. In fact, we will let $d$ be zero, since our network\n",
"doesn't have a bias parameter, so we need the y-intercept to be at 0. \n",
"This gives us two variables that \"matter\", and one that does not.\n",
"\n",
"We will generate samples by first generating values for the $x$ variables, then we will compute the \n",
"corresponding $y$ outputs, and add a bit of random noise so that the values look more like a natural\n",
"process."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"# Define the matrix of input samples with columns x0, x1, x2: \n",
"# 100 samples (rows), each value sampled from a uniform distribution \n",
"# between 0 and 1.\n",
"X2 = np.random.uniform(low=0, high=1, size=(100,3))\n",
"\n",
"# Define a function f that represents the line that generates the output\n",
"# (response) variable \"perfectly\", without noise. We will add noise after \n",
"# calculating the output.\n",
"def f(x): \n",
" result = np.array(x).astype(float) # work on a float copy so x is not modified\n",
" result[:,0] *= 2.0 # 2x_0\n",
" result[:,1] *= 1.0 # 1x_1\n",
" result[:,2] *= 0.0 # 0x_2 (this variable doesn't matter at all)\n",
" return np.sum(result, axis=1) # sum the scaled columns (not the raw x) to get 2x_0 + 1x_1 + 0x_2 + 0\n",
"\n",
"# Now add some Gaussian noise to the \"perfect\" response values:\n",
"noise_stdev = 0.2 # sigma of the noise; we'll keep it small\n",
"# Gaussian noise error for each sample in x\n",
"noise = np.random.randn(X2.shape[0]) * noise_stdev\n",
"# Create the response variable `y`:\n",
"y2 = f(X2) + noise\n",
"# Make the y response a column vector:\n",
"y2 = np.reshape(y2,(100,1))"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0.89004455 0.69871283 0.82819791]\n",
" [0.05595717 0.01774589 0.63341429]\n",
" [0.26457044 0.6574532 0.90494819]\n",
" [0.20133986 0.8807249 0.16668785]\n",
" [0.42231825 0.97681259 0.54426161]\n",
" [0.5442228 0.53389683 0.01207421]\n",
" [0.18495185 0.80140111 0.94399568]\n",
" [0.73185868 0.30016923 0.35056698]\n",
" [0.4664781 0.4363943 0.37252552]]\n",
"\n",
"[[2.50751857]\n",
" [0.36895071]\n",
" [1.8154392 ]\n",
" [1.31503355]\n",
" [1.73178501]\n",
" [0.93544818]\n",
" [1.83700519]\n",
" [1.40086866]\n",
" [1.52521015]]\n"
]
}
],
"source": [
"print(X2[1:10,:])\n",
"print()\n",
"print(y2[1:10,:])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Encapsulate the Networks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare the two networks, it will be better to have them encapsulated a bit more (we overwrote the weights and training functions when we created the second one earlier).\n",
"\n",
"To do this, we will create Python classes for each."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"class SingleLayerNN:\n",
" \"\"\"\n",
" Single layer neural network\n",
" \"\"\"\n",
" def __init__(self, X, y, activations=(\"sigmoid\",), learning_rate=1.):\n",
" if not isinstance(activations, tuple):\n",
" activations = (activations,) # Make sure that we have a tuple, even if a string is passed\n",
" self.learning_rate = learning_rate\n",
" self.X = X\n",
" self.y = y\n",
" self.W0 = 2 * np.random.random((3,1)) - 1\n",
" self.l0 = X\n",
" self.output = None\n",
" available_activations = {\n",
" \"sigmoid\": self.sigmoid,\n",
" \"linear\": self.linear,\n",
" \"tanh\": self.tanh\n",
" }\n",
" print(\"Using {} activation.\".format(activations[0]))\n",
" self.activation0 = available_activations[activations[0]]\n",
"\n",
" def sigmoid(self, x, deriv=False):\n",
" if deriv:\n",
" return x * (1-x)\n",
" return 1/(1+np.exp(-x))\n",
"\n",
" def linear(self, x, deriv=False):\n",
" if deriv:\n",
" return 1\n",
" return x\n",
"\n",
" def tanh(self, x, deriv=False):\n",
" if deriv:\n",
" return (1 - np.square(x))\n",
" else:\n",
" return np.tanh(x) \n",
" \n",
" def forward(self, X=None):\n",
" if X is None:\n",
" X = self.X\n",
" self.l0 = X\n",
" self.output = self.activation0(np.dot(self.l0, self.W0))\n",
" return self.output\n",
"\n",
" def backprop(self):\n",
" o_err = self.learning_rate * (self.y - self.output)\n",
" w0_delta = o_err * self.activation0(self.output, True)\n",
" self.W0 += np.dot(self.l0.T, w0_delta)\n",
" \n",
" def train(self, epochs = 1):\n",
" for i in range(epochs):\n",
" self.forward()\n",
" self.backprop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now the 2-layer network:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"class TwoLayerNN:\n",
" \"\"\"\n",
" Two layer neural network\n",
" \"\"\"\n",
" def __init__(self, X, y, activations=(\"sigmoid\", \"sigmoid\"), learning_rate=1.):\n",
" self.learning_rate = learning_rate\n",
" self.X = X\n",
" self.y = y\n",
" self.W0 = 2 * np.random.random((3,2)) - 1\n",
" self.W1 = 2 * np.random.random((2,1)) - 1\n",
" self.l0 = X\n",
" self.l1 = None\n",
" self.output = None\n",
" available_activations = {\n",
" \"sigmoid\": self.sigmoid,\n",
" \"linear\": self.linear,\n",
" \"tanh\": self.tanh\n",
" }\n",
" print(\"Using activations: {}.\".format(\", \".join(activations)))\n",
" self.activation0 = available_activations[activations[0]]\n",
" self.activation1 = available_activations[activations[1]]\n",
"\n",
" def sigmoid(self, x, deriv=False):\n",
" if deriv:\n",
" return x * (1-x)\n",
" return 1/(1+np.exp(-x))\n",
"\n",
" def linear(self, x, deriv=False):\n",
" if deriv:\n",
" return np.ones_like(x)\n",
" return x\n",
" \n",
" def tanh(self, x, deriv=False):\n",
" if deriv:\n",
" return (1 - np.square(x))\n",
" else:\n",
" return np.tanh(x)\n",
"\n",
" def forward(self, X=None):\n",
" if X is None:\n",
" X = self.X\n",
" self.l0 = X\n",
" self.l1 = self.activation0(np.dot(self.l0, self.W0))\n",
" self.output = self.activation1(np.dot(self.l1, self.W1))\n",
" return self.output\n",
"\n",
" def backprop(self):\n",
" # Calculate error and deltas\n",
" o_err = self.learning_rate * (self.y - self.output)\n",
" w1_delta = o_err * self.activation1(self.output, True)\n",
"\n",
" w1_err = np.dot(w1_delta, self.W1.T)\n",
" w0_delta = w1_err * self.activation0(self.l1, True)\n",
" # Apply updates to weights\n",
" self.W0 += np.dot(self.X.T, w0_delta)\n",
" self.W1 += np.dot(self.l1.T, w1_delta)\n",
"\n",
" def train(self, epochs = 1):\n",
" for i in range(epochs):\n",
" self.forward()\n",
" self.backprop()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using sigmoid activation.\n",
"Using activations: sigmoid, sigmoid.\n",
"0.008829302126346193\n",
"0.15354573348953232\n"
]
}
],
"source": [
"# Just check to see that the classes work on the \"easy\" training set:\n",
"onelayer = SingleLayerNN(X,y)\n",
"twolayer = TwoLayerNN(X,y)\n",
"\n",
"onelayer.train(100)\n",
"twolayer.train(100)\n",
"\n",
"print(MSE(y, onelayer.forward(X)))\n",
"print(MSE(y, twolayer.forward(X)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, create networks for the `X2`->`y2` (linear polynomial) dataset: one single-layer and one two-layer.\n",
"\n",
"Then, train them both for 100 epochs and print the MSE at the end."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"lines_to_next_cell": 0
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using sigmoid activation.\n",
"Using activations: sigmoid, sigmoid.\n",
"0.627656628448089\n",
"0.6277107258122931\n"
]
}
],
"source": [
"poly_onelayer = SingleLayerNN(X2, y2)\n",
"poly_twolayer = TwoLayerNN(X2, y2)\n",
"\n",
"poly_onelayer.train(100)\n",
"poly_twolayer.train(100)\n",
"\n",
"print(MSE(y2, poly_onelayer.forward(X2)))\n",
"print(MSE(y2, poly_twolayer.forward(X2)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What do the predictions look like, versus the truth?"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"y, predicted, error\n",
"1.542 1.000 0.5424\n",
"2.508 1.000 1.5075\n",
"0.369 1.000 -0.6309\n",
"1.815 1.000 0.8154\n",
"1.315 1.000 0.3150\n",
"1.732 1.000 0.7318\n",
"0.935 1.000 -0.0646\n",
"1.837 1.000 0.8370\n",
"1.401 1.000 0.4009\n",
"1.525 1.000 0.5252\n",
"1.604 1.000 0.6035\n",
"1.683 1.000 0.6833\n",
"1.481 1.000 0.4813\n",
"1.982 1.000 0.9823\n",
"1.630 1.000 0.6296\n",
"1.527 1.000 0.5273\n",
"1.090 1.000 0.0899\n",
"1.687 1.000 0.6866\n",
"2.555 1.000 1.5550\n",
"1.779 1.000 0.7795\n"
]
}
],
"source": [
"preds = poly_onelayer.forward(X2)\n",
"print(\"y, predicted, error\")\n",
"for i in range(20):\n",
" print(\"%0.3f %0.3f %0.4f\" % (y2[i,0], preds[i,0], y2[i,0]-preds[i,0]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice something interesting? All the predictions are in the range $[0,1]$. Well, of course they are! We pass the output layer through a Sigmoid activation, so we squash them to that range on purpose. But the real $y$ values are not in that range. What we need here is a linear regression, not a value in $[0,1]$.\n",
"\n",
"To accomplish this, you need a *linear* activation on the output layer. The linear activation is:\n",
"$$\n",
"y = x\n",
"$$\n",
"(pass the input directly through). Its derivative is 1. (Or, in matrix notation, it is a matrix of 1's the same shape as the input.)\n",
"\n",
"It is implemented in the classes above, with the ability to choose it by passing the string `\"linear\"` to the `activations` parameter of the constructor at whichever layer we want to use the linear activation. For our one-layer network, we will try the linear activation at the only place we can --- after layer0. On the two-layer network, we will only use the linear activation after the second (output) layer.\n",
"\n",
"So let's try again:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: nan\n",
"Two layer: nan\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:48: RuntimeWarning: invalid value encountered in add\n",
"/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:53: RuntimeWarning: invalid value encountered in multiply\n"
]
}
],
"source": [
"poly_onelayer = SingleLayerNN(X2, y2, \"linear\")\n",
"poly_twolayer = TwoLayerNN(X2, y2, (\"tanh\", \"linear\"))\n",
"\n",
"poly_onelayer.train(1000)\n",
"poly_twolayer.train(1000)\n",
"\n",
"print(\"One layer: \", MSE(y2, poly_onelayer.forward(X2)))\n",
"print(\"Two layer: \", MSE(y2, poly_twolayer.forward(X2)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, what's going on? Both networks report an MSE of `nan`. This happens because the weights are \"exploding\". Look at the single-layer network's weights after training:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[nan]\n",
" [nan]\n",
" [nan]]\n"
]
}
],
"source": [
"print(poly_onelayer.W0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The issue is that for the linear activation layer, we need to \"take our time\" when we adjust the weights. Let's introduce a *learning rate* that will serve to control the amount of error that is actually backpropagated on each epoch. The classes above already have this capability, as a parameter on their constructor, so let's set the learning rate to 0.01 and try again."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using linear activation.\n",
"Using activations: sigmoid, linear.\n",
"One layer: 0.029093942503251993\n",
"Two layer: 0.06462544272282988\n"
]
}
],
"source": [
"poly_onelayer = SingleLayerNN(X2, y2, activations=(\"linear\"), learning_rate=0.01)\n",
"poly_twolayer = TwoLayerNN(X2, y2, activations=(\"sigmoid\", \"linear\"), learning_rate=0.01)\n",
"\n",
"poly_onelayer.train(100)\n",
"poly_twolayer.train(100)\n",
"\n",
"print(\"One layer: \", MSE(y2, poly_onelayer.forward(X2)))\n",
"print(\"Two layer: \", MSE(y2, poly_twolayer.forward(X2)))"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"y, predicted(1L), error(1L), predicted(2L), error(2L)\n",
"1.542 1.401 0.1419 1.510 0.0325\n",
"2.508 2.430 0.0775 2.052 0.4555\n",
"0.369 0.665 -0.2961 0.801 -0.4317\n",
"1.815 1.800 0.0152 1.755 0.0604\n",
"1.315 1.280 0.0348 1.572 -0.2566\n",
"1.732 1.963 -0.2311 1.925 -0.1936\n",
"0.935 1.138 -0.2023 1.395 -0.4598\n",
"1.837 1.902 -0.0650 1.831 0.0064\n",
"1.401 1.407 -0.0060 1.491 -0.0906\n",
"1.525 1.289 0.2359 1.449 0.0765\n",
"1.604 1.325 0.2781 1.426 0.1778\n",
"1.683 1.601 0.0822 1.508 0.1751\n",
"1.481 1.290 0.1917 1.449 0.0327\n",
"1.982 2.184 -0.2013 2.007 -0.0251\n",
"1.630 1.960 -0.3303 1.839 -0.2093\n",
"1.527 1.589 -0.0614 1.700 -0.1731\n",
"1.090 1.168 -0.0781 1.374 -0.2838\n",
"1.687 1.376 0.3103 1.540 0.1462\n",
"2.555 2.495 0.0600 2.095 0.4598\n",
"1.779 1.645 0.1340 1.788 -0.0085\n"
]
}
],
"source": [
"preds1 = poly_onelayer.forward(X2)\n",
"preds2 = poly_twolayer.forward(X2)\n",
"print(\"y, predicted(1L), error(1L), predicted(2L), error(2L)\")\n",
"for i in range(20):\n",
" print(\"%0.3f %0.3f %0.4f %0.3f %0.4f\" % (y2[i,0], preds1[i,0], y2[i,0]-preds1[i,0], preds2[i,0], y2[i,0]-preds2[i,0]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's much better. \n",
"\n",
"Let's adjust one more thing: what if we allow the hidden layer in the 2-layer network to produce negative outputs? We can change the activation on the hidden layer from Sigmoid to Tanh, which gives outputs in the range $(-1,1)$."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.02909394017088954\n",
"Two layer: 0.0358061125295244\n"
]
}
],
"source": [
"poly_onelayer = SingleLayerNN(X2, y2, activations=(\"linear\"), learning_rate=0.01)\n",
"poly_twolayer = TwoLayerNN(X2, y2, activations=(\"tanh\", \"linear\"), learning_rate=0.01)\n",
"\n",
"poly_onelayer.train(100)\n",
"poly_twolayer.train(100)\n",
"\n",
"print(\"One layer: \", MSE(y2, poly_onelayer.forward(X2)))\n",
"print(\"Two layer: \", MSE(y2, poly_twolayer.forward(X2)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interesting! The two-layer network seems to be converging faster now; it is perhaps comparable to the one-layer network. \n",
"\n",
"Let's repeat the trial and see if that is the case:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.029093936579457257\n",
"Two layer: 0.03405943270583079\n",
"\n",
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.029093942015501598\n",
"Two layer: 0.08096332049593732\n",
"\n",
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.029093939299101458\n",
"Two layer: 0.03277308905519685\n",
"\n",
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.02909393763373747\n",
"Two layer: 0.039892649300335814\n",
"\n",
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.029093936992770134\n",
"Two layer: 0.037743835989540536\n",
"\n"
]
}
],
"source": [
"for i in range(5):\n",
" poly_onelayer = SingleLayerNN(X2, y2, activations=(\"linear\"), learning_rate=0.01)\n",
" poly_twolayer = TwoLayerNN(X2, y2, activations=(\"tanh\", \"linear\"), learning_rate=0.01)\n",
"\n",
" poly_onelayer.train(100)\n",
" poly_twolayer.train(100)\n",
"\n",
" print(\"One layer: \", MSE(y2, poly_onelayer.forward(X2)))\n",
" print(\"Two layer: \", MSE(y2, poly_twolayer.forward(X2)))\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looks pretty consistent. Now, let's move the number of epochs up to 1000:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.02909393655222622\n",
"Two layer: 0.06052073592087871\n",
"\n",
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.02909393655222622\n",
"Two layer: 0.04601340269436041\n",
"\n",
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.02909393655222622\n",
"Two layer: 0.0650053194663348\n",
"\n",
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.02909393655222622\n",
"Two layer: 0.06470985216466407\n",
"\n",
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.02909393655222622\n",
"Two layer: 0.05775805105618975\n",
"\n"
]
}
],
"source": [
"for i in range(5):\n",
" poly_onelayer = SingleLayerNN(X2, y2, activations=(\"linear\"), learning_rate=0.01)\n",
" poly_twolayer = TwoLayerNN(X2, y2, activations=(\"tanh\", \"linear\"), learning_rate=0.01)\n",
"\n",
" poly_onelayer.train(1000)\n",
" poly_twolayer.train(1000)\n",
"\n",
" print(\"One layer: \", MSE(y2, poly_onelayer.forward(X2)))\n",
" print(\"Two layer: \", MSE(y2, poly_twolayer.forward(X2)))\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 2-layer network didn't perform so well now. \n",
"\n",
"Let's keep track of the training progress as we go. \n",
"This time, we will train one epoch at a time and record the MSE after each one, for a total of 3000 epochs.\n",
"Then, we will plot the MSE over time."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.02909393655222622\n",
"Two layer: 0.060749503736278854\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 360x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"poly_onelayer = SingleLayerNN(X2, y2, activations=(\"linear\"), learning_rate=0.01)\n",
"poly_twolayer = TwoLayerNN(X2, y2, activations=(\"tanh\", \"linear\"), learning_rate=0.01)\n",
"\n",
"training_stats = []\n",
"\n",
"for checkpoint in range(3000):\n",
" # Train one epoch at a time and record the MSE after each; total of 3000 epochs\n",
" poly_onelayer.train(1)\n",
" poly_twolayer.train(1)\n",
" training_stats.append([MSE(y2, poly_onelayer.forward(X2)), MSE(y2, poly_twolayer.forward(X2))])\n",
"\n",
"print(\"One layer: \", MSE(y2, poly_onelayer.forward(X2)))\n",
"print(\"Two layer: \", MSE(y2, poly_twolayer.forward(X2)))\n",
"print()\n",
"\n",
"# Plot the progress\n",
"training_stats = np.array(training_stats)\n",
"plt.figure(figsize=(5, 3))\n",
"plt.plot(training_stats[:,1], 'b', label='Two layer')\n",
"plt.plot(training_stats[:,0], 'r', label='One layer')\n",
"plt.xlabel('Checkpoint', fontsize=12)\n",
"plt.ylabel('MSE', fontsize=12)\n",
"plt.title('MSE')\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two observations:\n",
"1. Both networks initially converged **fast**. Well, this was a simple function, after all.\n",
"2. The 2-layer network is behaving strangely. There is a periodic jump in MSE that decreases over time. Let's plot the same training if we use Sigmoid activation instead of tanh:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using linear activation.\n",
"Using activations: sigmoid, linear.\n",
"One layer: 0.02909393655222622\n",
"Two layer: 0.16086398349472605\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 360x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"poly_onelayer = SingleLayerNN(X2, y2, activations=(\"linear\"), learning_rate=0.01)\n",
"poly_twolayer = TwoLayerNN(X2, y2, activations=(\"sigmoid\", \"linear\"), learning_rate=0.01)\n",
"\n",
"training_stats = []\n",
"\n",
"for checkpoint in range(3000):\n",
" # Train one epoch at a time and record the MSE after each; total of 3000 epochs\n",
" poly_onelayer.train(1)\n",
" poly_twolayer.train(1)\n",
" training_stats.append([MSE(y2, poly_onelayer.forward(X2)), MSE(y2, poly_twolayer.forward(X2))])\n",
"\n",
"print(\"One layer: \", MSE(y2, poly_onelayer.forward(X2)))\n",
"print(\"Two layer: \", MSE(y2, poly_twolayer.forward(X2)))\n",
"print()\n",
"\n",
"# Plot the progress\n",
"training_stats = np.array(training_stats)\n",
"plt.figure(figsize=(5, 3))\n",
"plt.plot(training_stats[:,1], 'b', label='Two layer')\n",
"plt.plot(training_stats[:,0], 'r', label='One layer')\n",
"plt.xlabel('Checkpoint', fontsize=12)\n",
"plt.ylabel('MSE', fontsize=12)\n",
"plt.title('MSE')\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is much smoother. The periodic jumps in the tanh run are most likely an artifact of the hidden layer's outputs \"alternating\" across the negative / positive boundary, since tanh outputs can change sign. Sigmoid is always positive, so there is no single point where a sign can flip suddenly.\n",
"\n",
"Another possible way to combat the observed behavior of the 2-layer network is to lower the learning rate even more. Let's try that (setting the hidden layer activation back to \"tanh\"):"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using linear activation.\n",
"Using activations: tanh, linear.\n",
"One layer: 0.02909393655222622\n",
"Two layer: 0.028925180466813725\n",
"\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 360x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"poly_onelayer = SingleLayerNN(X2, y2, activations=(\"linear\"), learning_rate=0.01)\n",
"poly_twolayer = TwoLayerNN(X2, y2, activations=(\"tanh\", \"linear\"), learning_rate=0.001)\n",
"\n",
"training_stats = []\n",
"\n",
"for checkpoint in range(3000):\n",
" # Train one epoch at a time and record the MSE after each; total of 3000 epochs\n",
" poly_onelayer.train(1)\n",
" poly_twolayer.train(1)\n",
" training_stats.append([MSE(y2, poly_onelayer.forward(X2)), MSE(y2, poly_twolayer.forward(X2))])\n",
"\n",
"print(\"One layer: \", MSE(y2, poly_onelayer.forward(X2)))\n",
"print(\"Two layer: \", MSE(y2, poly_twolayer.forward(X2)))\n",
"print()\n",
"\n",
"# Plot the progress\n",
"training_stats = np.array(training_stats)\n",
"plt.figure(figsize=(5, 3))\n",
"plt.plot(training_stats[:,1], 'b', label='Two layer')\n",
"plt.plot(training_stats[:,0], 'r', label='One layer')\n",
"plt.xlabel('Checkpoint', fontsize=12)\n",
"plt.ylabel('MSE', fontsize=12)\n",
"plt.title('MSE')\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This also cleans up the training error curve by taking more gradual steps. \n",
"Let's zoom in on the first 100 epochs:"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 360x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(5, 3))\n",
"plt.plot(training_stats[:,1], 'b', label='Two layer')\n",
"plt.plot(training_stats[:,0], 'r', label='One layer')\n",
"plt.xlabel('Checkpoint', fontsize=12)\n",
"plt.ylabel('MSE', fontsize=12)\n",
"plt.title('MSE')\n",
"plt.xlim(0, 100)\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Huge Caveat!\n",
"\n",
"Remember that what we've been examining here is the **training error** -- that is, we're measuring the loss against the training set itself. \n",
"We have no idea whether or not we may be *overfitting* the training data. To find out, you need to use separate training and validation sets (and cross-validation), as in our previous discussions!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n"
]
}
],
"metadata": {
"jupytext": {
"text_representation": {
"extension": ".py",
"format_name": "light",
"format_version": "1.3",
"jupytext_version": "0.8.1"
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}