Implement a NN - Part 3: Multiclass using Softmax.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true,
"authorship_tag": "ABX9TyPaRzqt+DvEuRH18iyZujos",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/MaverickMeerkat/0419ce93dbc457fd611ec69cfd81cd64/implement-a-nn-part-3-multiclass-using-softmax.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"In this notebook we revisit the manual implementation of a NN, this time for the multiclass classification problem.\n",
"\n",
"As always, we start by loading the necessary libraries. \n",
"\n",
"Note that we use `torchvision` only to download and handle the MNIST dataset. We do not use any of PyTorch's capabilities for the actual training. "
],
"metadata": {
"id": "ukx_6oldn1E_"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np # for doing all the math and matrix work\n",
"import matplotlib.pyplot as plt # for a bit of graphing\n",
"\n",
"from torchvision import datasets, transforms # for the MNIST dataset"
],
"metadata": {
"id": "M4CDQn8en9s0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Set plotting DPI for bigger plots, and a random seed for reproducibility."
],
"metadata": {
"id": "uzGYsyc8yrMs"
}
},
{
"cell_type": "code",
"source": [
"plt.rcParams['figure.dpi'] = 120 # set plotting dpi\n",
"\n",
"# set seed for reproducibility\n",
"random_seed = 247\n",
"np.random.seed(random_seed)"
],
"metadata": {
"id": "HQpxM713yphQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Multiclass Data: MNIST"
],
"metadata": {
"id": "cS-VrgU6Sq4b"
}
},
{
"cell_type": "markdown",
"source": [
"We will use the famous MNIST data, a set of 28x28-pixel grayscale images of handwritten digits from 0 to 9. The training set has 60K images, each of which we want to classify as one of the digits 0-9. \n",
"\n",
"We download the data and save it in the \"data\" folder. We transform each image into a tensor (a d-dimensional array) instead of a PIL image object. "
],
"metadata": {
"id": "KnvcrC7aStYo"
}
},
{
"cell_type": "code",
"source": [
"training_data = datasets.MNIST(\n",
" root=\"data\",\n",
" train=True,\n",
" download=True,\n",
" transform=transforms.ToTensor()\n",
")"
],
"metadata": {
"id": "98dSnsU3Ss9_"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Here's an example of an image:"
],
"metadata": {
"id": "_ZfeITunUJh5"
}
},
{
"cell_type": "code",
"source": [
"x0 = training_data[0][0] # 1st observation, take the \"x\", i.e. the features\n",
"plt.imshow(x0[0], cmap=\"gray\")"
],
"metadata": {
"id": "EdlGKForUJFO",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 450
},
"outputId": "99cfe2f8-a999-455f-bc84-dc45d4212124"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.image.AxesImage at 0x7f65cfb53bb0>"
]
},
"metadata": {},
"execution_count": 4
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 720x480 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"source": [
"This looks like a 5, but could it be a 3? Let's verify:"
],
"metadata": {
"id": "hJAuymh5UWVw"
}
},
{
"cell_type": "code",
"source": [
"training_data[0][1] # 1st observation, take the \"y\", i.e. the label"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Fk1d84fSUZC8",
"outputId": "fbf68ef6-092a-4463-da70-0e0228a74294"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"5"
]
},
"metadata": {},
"execution_count": 5
}
]
},
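{
"cell_type": "markdown",
"source": [
"We'll convert the dataset to numpy arrays. Note that `.data` holds the raw pixel values (integers from 0 to 255), so the `ToTensor` transform from above is not applied here:"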
],
"metadata": {
"id": "iKxl05lG5zAq"
}
},
{
"cell_type": "code",
"source": [
"x = training_data.data.numpy()\n",
"y = training_data.targets.numpy()"
],
"metadata": {
"id": "he4biHXk5-jC"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's check the dimensionality of the data:"
],
"metadata": {
"id": "6gPBya8ZDEFp"
}
},
{
"cell_type": "code",
"source": [
"x.shape"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "jo5kTh5XDGi0",
"outputId": "c592e581-5f3c-437d-8256-9859d70984b6"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(60000, 28, 28)"
]
},
"metadata": {},
"execution_count": 7
}
]
},
{
"cell_type": "markdown",
"source": [
"Each image has 28 x 28 pixels (times 1 color channel), i.e. 784 features. We want to pass the images to a linear layer, so we need to flatten each one into a vector. We'll reshape the dataset:"
],
"metadata": {
"id": "-WAvPlShC3PZ"
}
},
{
"cell_type": "code",
"source": [
"x = x.reshape(-1, 784)\n",
"x.shape"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EiutyBMYCx9R",
"outputId": "9d70f63d-77f3-4cf2-a79b-fbc0ba681410"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(60000, 784)"
]
},
"metadata": {},
"execution_count": 8
}
]
},
{
"cell_type": "markdown",
"source": [
"# The NN from before\n",
"\n",
"Let's load the code we need for the NN from the previous notebook:"
],
"metadata": {
"id": "L0zUbU8hCIhM"
}
},
{
"cell_type": "code",
"source": [
"class LinearLayer():\n",
"    def __init__(self, input_size, output_size, activation_fn):\n",
"        self.W = np.random.randn(input_size, output_size)*0.1\n",
"        self.b = np.zeros(output_size)\n",
"        self.activation_fn = activation_fn\n",
"        self.input = None\n",
"        self.output = None\n",
"        self.grad_W = None\n",
"        self.grad_b = None\n",
"\n",
"    def forward(self, x):\n",
"        self.input = x # a_{l-1}\n",
"        self.output = x @ self.W + self.b # z_l\n",
"        return self.activation_fn(self.output) # a_l\n",
"\n",
"    def backward(self, grad):\n",
"        grad = grad * self.activation_fn(self.output, derivative=True)\n",
"        self.grad_W = self.input.T @ grad\n",
"        self.grad_b = np.sum(grad, axis=0)\n",
"        return grad @ self.W.T"
],
"metadata": {
"id": "O2BeeQkUCPFX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def relu(x, derivative=False):\n",
"    if derivative:\n",
"        return (x > 0).astype(float)\n",
"    else:\n",
"        return np.maximum(0, x)\n",
"\n",
"def identity(x, derivative=False):\n",
"    if derivative:\n",
"        return np.ones_like(x)\n",
"    else:\n",
"        return x"
],
"metadata": {
"id": "K982wh8OCR4P"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"class NeuralNetwork():\n",
"    def __init__(self, layers):\n",
"        self.layers = layers\n",
"\n",
"    def forward(self, x):\n",
"        for layer in self.layers:\n",
"            x = layer.forward(x)\n",
"        return x\n",
"\n",
"    def backward(self, grad):\n",
"        for layer in reversed(self.layers):\n",
"            grad = layer.backward(grad)\n",
"\n",
"    def update(self, learning_rate):\n",
"        for layer in self.layers:\n",
"            if isinstance(layer, LinearLayer):\n",
"                layer.W -= learning_rate * layer.grad_W\n",
"                layer.b -= learning_rate * layer.grad_b\n",
"\n",
"    def train(self, x, y, learning_rate, loss_fn, num_epochs, batch_size):\n",
"        losses = []\n",
"        for epoch in range(num_epochs):\n",
"            indices = np.random.permutation(len(x))\n",
"            loss = 0\n",
"            for i in range(0, len(x), batch_size):\n",
"                x_batch = x[indices[i:i+batch_size]]\n",
"                y_batch = y[indices[i:i+batch_size]]\n",
"                y_pred = self.forward(x_batch)\n",
"                loss_i, grad = loss_fn(y_batch, y_pred)\n",
"                loss += loss_i\n",
"                self.backward(grad)\n",
"                self.update(learning_rate)\n",
"            print(f'Epoch {epoch}, loss: {loss}')\n",
"            losses.append(loss)\n",
"        return losses"
],
"metadata": {
"id": "aRkRgNoBRjLT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Modifying the NN"
],
"metadata": {
"id": "46NfpZTo6Qgk"
}
},
{
"cell_type": "markdown",
"source": [
"The modifications we need are as follows:\n",
"- We need a softmax activation function\n",
"- We need a (non-binary) cross entropy loss function\n",
"\n",
"Remember that the Cross Entropy loss is equal to:\n",
"\n",
"$$ \\mathcal L = -\\sum_{i=1}^n \\sum_{c=1}^C y_{ic}\\log \\hat y_{ic}\n",
"$$\n",
"\n",
"We saw that if we combine the softmax with the loss and take the gradient w.r.t. the inputs of the last layer, we get something quite simple (per observation): \n",
"\n",
"$$ \\frac{\\partial \\mathcal L}{\\partial z_L} = \\hat y - y = a - y$$\n",
"\n",
"So, we are going to **use a trick and combine the loss function with the softmax**. This way the derivative calculations will be simpler. \n",
"\n",
"<font size=\"2\">[In the last section of this notebook I will also show a way to not do this trick, but this will require work with tensors, as the $\\frac{\\partial a_L}{\\partial z_L}$ derivative is a $C\\times C$ matrix for each observation, so a $(n_{batch}, C, C)$ tensor]. </font>\n",
"\n",
"So, the outputs of the final layer will be unnormalized - i.e., a linear layer with identity activation. They are often called the \"logits\": up to an additive constant per observation they are the log probabilities $\\log p_c$ (in the binary case the term refers to the log-odds $\\log \\frac{p}{1-p}$).\n",
"\n"
],
"metadata": {
"id": "A5UHBYQeyoep"
}
},
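{
"cell_type": "markdown",
"source": [
"As a sanity check of the $a - y$ formula, here is a minimal sketch that compares it with a finite-difference estimate of the gradient on a tiny random example. The helper name `softmax_ce_loss`, the array sizes and the step size `h` are assumptions made for this illustration only; they are not part of the network code. The two gradients should agree up to numerical error."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"def softmax_ce_loss(z, y):\n",
"    # summed cross entropy of softmax(z) against integer labels y (hypothetical helper)\n",
"    z = z - z.max(axis=1, keepdims=True)\n",
"    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))\n",
"    return -log_p[np.arange(len(y)), y].sum()\n",
"\n",
"rng = np.random.default_rng(0)\n",
"z = rng.normal(size=(4, 3))  # 4 observations, 3 classes\n",
"y = np.array([0, 2, 1, 2])\n",
"\n",
"# analytic gradient: a - y\n",
"a = np.exp(z - z.max(axis=1, keepdims=True))\n",
"a /= a.sum(axis=1, keepdims=True)\n",
"analytic = a.copy()\n",
"analytic[np.arange(len(y)), y] -= 1\n",
"\n",
"# numerical gradient by central differences, one logit at a time\n",
"h = 1e-6\n",
"numeric = np.zeros_like(z)\n",
"for i in range(z.shape[0]):\n",
"    for c in range(z.shape[1]):\n",
"        z_plus, z_minus = z.copy(), z.copy()\n",
"        z_plus[i, c] += h\n",
"        z_minus[i, c] -= h\n",
"        numeric[i, c] = (softmax_ce_loss(z_plus, y) - softmax_ce_loss(z_minus, y)) / (2 * h)\n",
"\n",
"print(np.allclose(analytic, numeric, atol=1e-6))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},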
{
"cell_type": "markdown",
"source": [
"In the following code, remember that the logits are an $(n_{batch},C)$ matrix. We first subtract the maximum of each row from that row in order to avoid numerical problems (overflow in the exponent); this does not change the softmax. Then we calculate the normalizing constant per observation, and from it the log probabilities. For each row in the resulting matrix we take only the $y$'th column ($y$ here is an index, not a 1-hot vector), because only the term of the true class is non-zero in the loss. Finally we sum these up and divide by the batch size, so the loss is the batch average. \n",
"\n",
"For the gradient, we convert $y$ to a 1-hot vector. Then we exponentiate the log probabilities to get the softmax activations $a$, and subtract the 1-hot $y$'s from them (again dividing by the batch size). \n",
"\n",
"<font size=2>[An alternative is to simply subtract 1 from the entry of $a$ at the true class index of each row.]</font>"
],
"metadata": {
"id": "tBgNhdaGAaXt"
}
},
{
"cell_type": "code",
"source": [
"def CrossEntropy(y, logits):\n",
"    num_samples = y.shape[0]\n",
"    shifted_logits = logits - np.max(logits, axis=1, keepdims=True)\n",
"    Z = np.sum(np.exp(shifted_logits), axis=1, keepdims=True)\n",
"    log_probs = shifted_logits - np.log(Z)\n",
"    loss = -np.sum(log_probs[np.arange(num_samples), y]) / num_samples\n",
"\n",
"    # 1-hot encode y; the number of classes is taken from the logits instead of hard-coding 10\n",
"    y_one_hot = np.zeros_like(log_probs)\n",
"    y_one_hot[np.arange(num_samples), y] = 1\n",
"    a = np.exp(log_probs)\n",
"    delta = (a - y_one_hot) / num_samples\n",
"\n",
"    # Alternative: Compute the derivative of the loss w.r.t. the inputs to the softmax layer\n",
"    # delta = np.exp(log_probs) # a\n",
"    # delta[np.arange(num_samples), y] -= 1\n",
"    # delta /= num_samples\n",
"\n",
"    return loss, delta"
],
"metadata": {
"id": "5D9JWvTV0lJ4"
},
"execution_count": null,
"outputs": []
},
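{
"cell_type": "markdown",
"source": [
"To see why the maximum is subtracted before exponentiating, here is a small sketch with deliberately huge logits. The values are made up for illustration. The naive softmax overflows, while the shifted version returns valid probabilities."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"logits = np.array([[1000.0, 1001.0, 1002.0]])  # illustrative values, far too large for exp\n",
"\n",
"# naive softmax: exp(1000) overflows to inf, and inf / inf gives nan\n",
"with np.errstate(over='ignore', invalid='ignore'):\n",
"    naive = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)\n",
"\n",
"# shifted softmax: the largest exponent is exp(0) = 1, so nothing overflows\n",
"shifted = logits - np.max(logits, axis=1, keepdims=True)\n",
"stable = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)\n",
"\n",
"print(naive)   # every entry is nan\n",
"print(stable)  # valid probabilities that sum to 1"
],
"metadata": {},
"execution_count": null,
"outputs": []
},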
{
"cell_type": "markdown",
"source": [
"The loss itself doesn't interest us so much. We care more about accuracy here. So let's modify the training function to calculate accuracy (by overriding the `train` method in a subclass): "
],
"metadata": {
"id": "3ROohsEdMN1s"
}
},
{
"cell_type": "code",
"source": [
"class NeuralNetwork2(NeuralNetwork):\n",
"    def train(self, x, y, learning_rate, loss_fn, num_epochs, batch_size):\n",
"        accs = []\n",
"        for epoch in range(num_epochs):\n",
"            indices = np.random.permutation(len(x))\n",
"            correct = 0\n",
"            for i in range(0, len(x), batch_size):\n",
"                x_batch = x[indices[i:i+batch_size]]\n",
"                y_batch = y[indices[i:i+batch_size]]\n",
"                y_pred = self.forward(x_batch)\n",
"                loss_i, grad = loss_fn(y_batch, y_pred)\n",
"                self.backward(grad)\n",
"                self.update(learning_rate)\n",
"                pred = np.argmax(y_pred, axis=1)\n",
"                correct += (y_batch == pred).sum()\n",
"            accuracy = 100. * correct / len(x)\n",
"            accs.append(accuracy)\n",
"            print(f'Epoch: {epoch}, Acc.: {accuracy}')\n",
"        return accs"
],
"metadata": {
"id": "vZ1mswIAMmk_"
},
"execution_count": null,
"outputs": []
},
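{
"cell_type": "markdown",
"source": [
"The `train` method takes the `argmax` directly over the network outputs, which may be logits rather than probabilities. This gives the same predictions, because the softmax preserves the ordering within each row. A minimal sketch on random logits (the sizes are arbitrary assumptions):"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(1)\n",
"logits = rng.normal(size=(5, 10))  # 5 observations, 10 classes\n",
"\n",
"# numerically stable softmax of each row\n",
"shifted = logits - logits.max(axis=1, keepdims=True)\n",
"probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)\n",
"\n",
"# the predicted class is the same whether we use logits or probabilities\n",
"print(np.array_equal(logits.argmax(axis=1), probs.argmax(axis=1)))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},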
{
"cell_type": "markdown",
"source": [
"We will use a 2-layer network (one hidden layer). The first layer projects the data to a 50-dim space and applies the ReLU activation function. The 2nd projects those activations to 10 outputs, one per digit. It uses an identity activation, because the softmax is part of the Cross Entropy calculation."
],
"metadata": {
"id": "YS5MvLHhDWgA"
}
},
{
"cell_type": "code",
"source": [
"layer1 = LinearLayer(784, 50, relu)\n",
"layer2 = LinearLayer(50, 10, identity)\n",
"nn = NeuralNetwork2([layer1, layer2])"
],
"metadata": {
"id": "gt10UvkmqVYm"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"accs = nn.train(x, y, learning_rate=0.001, loss_fn=CrossEntropy, num_epochs=20, batch_size=128)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "U5qWl5M5rPzn",
"outputId": "0ab2c238-0f33-4bdd-a718-1c32e202fd2d"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch: 0, Acc.: 63.623333333333335\n",
"Epoch: 1, Acc.: 69.355\n",
"Epoch: 2, Acc.: 74.24666666666667\n",
"Epoch: 3, Acc.: 77.56333333333333\n",
"Epoch: 4, Acc.: 79.795\n",
"Epoch: 5, Acc.: 81.02\n",
"Epoch: 6, Acc.: 82.2\n",
"Epoch: 7, Acc.: 83.37166666666667\n",
"Epoch: 8, Acc.: 84.26333333333334\n",
"Epoch: 9, Acc.: 85.04833333333333\n",
"Epoch: 10, Acc.: 85.58833333333334\n",
"Epoch: 11, Acc.: 86.04\n",
"Epoch: 12, Acc.: 86.615\n",
"Epoch: 13, Acc.: 87.185\n",
"Epoch: 14, Acc.: 87.57\n",
"Epoch: 15, Acc.: 87.975\n",
"Epoch: 16, Acc.: 88.29166666666667\n",
"Epoch: 17, Acc.: 88.54333333333334\n",
"Epoch: 18, Acc.: 88.71\n",
"Epoch: 19, Acc.: 89.15833333333333\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"y_hat = nn.forward(x)\n",
"pred = np.argmax(y_hat, axis=1)\n",
"acc = (y == pred).mean()\n",
"print(f\"accuracy = {acc}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "sqLpkfnO3p8u",
"outputId": "90a20774-bfce-4ed2-a144-c38a42f3a081"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"accuracy = 0.8933\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Not bad, we got about 89% accuracy on the training set! \n",
"\n",
"Let's train again and see how it goes:"
],
"metadata": {
"id": "cbRr-C3w4cZF"
}
},
{
"cell_type": "code",
"source": [
"accs = nn.train(x, y, learning_rate=0.001, loss_fn=CrossEntropy, num_epochs=20, batch_size=128)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pmlNI-cD4ZhT",
"outputId": "ef2c7b8a-3576-45ae-d571-46d7c8e9b17c"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch: 0, Acc.: 89.5\n",
"Epoch: 1, Acc.: 89.58166666666666\n",
"Epoch: 2, Acc.: 89.8\n",
"Epoch: 3, Acc.: 89.94166666666666\n",
"Epoch: 4, Acc.: 90.27666666666667\n",
"Epoch: 5, Acc.: 90.29833333333333\n",
"Epoch: 6, Acc.: 90.49833333333333\n",
"Epoch: 7, Acc.: 90.75166666666667\n",
"Epoch: 8, Acc.: 90.83\n",
"Epoch: 9, Acc.: 90.905\n",
"Epoch: 10, Acc.: 91.10666666666667\n",
"Epoch: 11, Acc.: 91.205\n",
"Epoch: 12, Acc.: 91.39666666666666\n",
"Epoch: 13, Acc.: 91.485\n",
"Epoch: 14, Acc.: 91.65666666666667\n",
"Epoch: 15, Acc.: 91.75333333333333\n",
"Epoch: 16, Acc.: 91.78666666666666\n",
"Epoch: 17, Acc.: 91.835\n",
"Epoch: 18, Acc.: 92.07\n",
"Epoch: 19, Acc.: 92.07833333333333\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"We are now at about 92% training accuracy, and it is still improving slowly."
],
"metadata": {
"id": "Aut5IjKS4n32"
}
},
{
"cell_type": "markdown",
"source": [
"# Separate the loss and the activations"
],
"metadata": {
"id": "lkZNYBubOSfb"
}
},
{
"cell_type": "markdown",
"source": [
"If we want to implement a softmax activation function which also outputs its derivative, we have a bit of work ahead of us.\n",
"The derivative for each row/observation is a matrix (the Jacobian), which means we need to work with tensors. The derivative will be a tensor of shape $(n_{batch}, C, C)$ where $n_{batch}$ is the # of observations in the current batch, and $C$ is the number of classes / inputs to the softmax. \n",
"\n",
"We will use the somewhat complicated `np.einsum` function. Since we are multiplying tensors, we have to tell it which dimensions (axes) of the tensors we want to multiply. \n",
"\n",
"For example, `np.einsum('ij,ik->ijk', a, b)` tells it, for each row index `i`, to take the outer product of the `i`'th row of the 1st matrix (`a`, of length `j`) with the `i`'th row of the 2nd matrix (`b`, of length `k`), such that we get an $(i,j,k)$ tensor. This assumes that both matrices have the same 1st dimension `i`.\n",
"\n",
"Another example, `np.einsum('ij,jk->ijk', a, b)` tells it, for each row `i` of the 1st matrix `a`, to multiply each element $a_{ij}$ with the `j`'th row of another matrix `b` of shape $(j,k)$, to get an $(i,j,k)$ tensor. \n",
"\n",
"Finally, `np.einsum('ijk,ik->ij', a, b)` tells it, for each index `i` of the tensor `a`, to multiply the corresponding matrix (of shape $(j,k)$) with the `i`'th row of `b` (a vector of length `k`). We get a matrix of shape $(i,j)$."
],
"metadata": {
"id": "dQQvRV1EOYCv"
}
},
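{
"cell_type": "markdown",
"source": [
"The three `einsum` patterns above can be checked against explicit loops. This is a small sketch on random arrays; the names `t1`, `t2`, `t3`, `m` and the sizes are assumptions for illustration only. Each line prints the resulting shape and whether the `einsum` result matches the loop version."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(2)\n",
"a = rng.normal(size=(2, 3))  # (i, j)\n",
"b = rng.normal(size=(2, 4))  # (i, k)\n",
"\n",
"# 'ij,ik->ijk': one outer product per row index i\n",
"t1 = np.einsum('ij,ik->ijk', a, b)\n",
"t1_loop = np.stack([np.outer(a[i], b[i]) for i in range(2)])\n",
"print(t1.shape, np.allclose(t1, t1_loop))\n",
"\n",
"# 'ij,jk->ijk' with the identity matrix: puts each row of a on a diagonal\n",
"t2 = np.einsum('ij,jk->ijk', a, np.eye(3))\n",
"t2_loop = np.stack([np.diag(a[i]) for i in range(2)])\n",
"print(t2.shape, np.allclose(t2, t2_loop))\n",
"\n",
"# 'ijk,ik->ij': one matrix-vector product per index i\n",
"m = rng.normal(size=(2, 3, 4))  # (i, j, k)\n",
"t3 = np.einsum('ijk,ik->ij', m, b)\n",
"t3_loop = np.stack([m[i] @ b[i] for i in range(2)])\n",
"print(t3.shape, np.allclose(t3, t3_loop))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},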
{
"cell_type": "code",
"source": [
"def softmax(logits, derivative=False):\n",
"    shifted_logits = logits - np.max(logits, axis=1, keepdims=True)\n",
"    Z = np.sum(np.exp(shifted_logits), axis=1, keepdims=True)\n",
"    log_probs = shifted_logits - np.log(Z)\n",
"    p = np.exp(log_probs) # the softmax activations\n",
"    if derivative:\n",
"        # logits, p shapes - (m, n)\n",
"        m, n = logits.shape\n",
"        # First, for each example we create the outer product of its probability vector with itself\n",
"        # ( p1^2   p1*p2  p1*p3 .... )\n",
"        # ( p2*p1  p2^2   p2*p3 .... )\n",
"        # ( ...                      )\n",
"        tensor1 = np.einsum('ij,ik->ijk', p, p) # (m, n, n)\n",
"        # Second, for each example we create an (n, n) diagonal matrix of its probability vector\n",
"        # ( p1  0   0  ... )\n",
"        # ( 0   p2  0  ... )\n",
"        # ( ...            )\n",
"        tensor2 = np.einsum('ij,jk->ijk', p, np.eye(n, n)) # (m, n, n)\n",
"        # Then we subtract the first tensor from the second\n",
"        # ( p1 - p1^2  -p1*p2     -p1*p3 ... )\n",
"        # ( -p1*p2     p2 - p2^2  -p2*p3 ... )\n",
"        # ( ...                              )\n",
"        dSoftmax = tensor2 - tensor1\n",
"        return dSoftmax\n",
"    else:\n",
"        return p"
],
"metadata": {
"id": "EtUehWtSO1VW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The cross entropy loss now doesn't receive the logits, but the softmax activations, so we only calculate the derivative of the loss w.r.t. the softmax activations, which as we saw is equal to $\\frac{\\partial \\mathcal L}{\\partial a_{Lc}} = -\\frac{y_c}{a_{Lc}}$.\n",
"\n",
"To avoid numerical issues we will add a small epsilon to the log and division operations."
],
"metadata": {
"id": "XUznIVnxP-dW"
}
},
{
"cell_type": "code",
"source": [
"def CrossEntropy2(y, a):\n",
"    eps = 1e-10\n",
"    num_samples = y.shape[0]\n",
"    log_probs = np.log(a + eps)\n",
"    loss = -np.sum(log_probs[np.arange(num_samples), y]) / num_samples\n",
"\n",
"    # 1-hot encode y; the number of classes is taken from a instead of hard-coding 10\n",
"    y_one_hot = np.zeros_like(a)\n",
"    y_one_hot[np.arange(num_samples), y] = 1\n",
"    delta = -1*(y_one_hot/(a + eps)) / num_samples\n",
"\n",
"    return loss, delta"
],
"metadata": {
"id": "arw2NRw6QYY0"
},
"execution_count": null,
"outputs": []
},
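{
"cell_type": "markdown",
"source": [
"We also need to modify the linear layer so that its backward pass multiplies the Jacobian tensor of the activation by the incoming gradient, instead of taking an element-wise product:"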
],
"metadata": {
"id": "y9BnoB51SKox"
}
},
{
"cell_type": "code",
"source": [
"class LinearLayer2(LinearLayer):\n",
"    def backward(self, grad):\n",
"        da = self.activation_fn(self.output, derivative=True)\n",
"        grad = np.einsum('ijk,ik->ij', da, grad)\n",
"        self.grad_W = self.input.T @ grad\n",
"        self.grad_b = np.sum(grad, axis=0)\n",
"        return grad @ self.W.T"
],
"metadata": {
"id": "U-iDQBOaR7kK"
},
"execution_count": null,
"outputs": []
},
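{
"cell_type": "markdown",
"source": [
"To connect the two approaches, here is a minimal sketch checking that the softmax Jacobian multiplied by the loss gradient $-y/a$ gives back the combined gradient $a - y$ from the first part. It is self-contained and uses its own small random example; the variable names (`jac`, `dL_da`, `dL_dz`) and sizes are assumptions for illustration, and the batch averaging and the epsilon are left out for clarity."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(3)\n",
"z = rng.normal(size=(4, 3))  # logits: 4 observations, 3 classes\n",
"y = np.array([0, 2, 1, 2])\n",
"y_one_hot = np.eye(3)[y]\n",
"\n",
"# softmax activations\n",
"shifted = z - z.max(axis=1, keepdims=True)\n",
"a = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)\n",
"\n",
"# per-observation Jacobian diag(a) - a a^T, shape (4, 3, 3)\n",
"jac = np.einsum('ij,jk->ijk', a, np.eye(3)) - np.einsum('ij,ik->ijk', a, a)\n",
"\n",
"# gradient of the loss w.r.t. the activations: -y / a\n",
"dL_da = -y_one_hot / a\n",
"\n",
"# chain rule: multiply each Jacobian by the matching gradient row\n",
"dL_dz = np.einsum('ijk,ik->ij', jac, dL_da)\n",
"\n",
"print(np.allclose(dL_dz, a - y_one_hot))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},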
{
"cell_type": "markdown",
"source": [
"We will now use a softmax activation after the 2nd layer:"
],
"metadata": {
"id": "ydgKcl8oQxDk"
}
},
{
"cell_type": "code",
"source": [
"layer1 = LinearLayer(784, 50, relu)\n",
"layer2 = LinearLayer2(50, 10, softmax)\n",
"nn2 = NeuralNetwork2([layer1, layer2])"
],
"metadata": {
"id": "xtYuCB6ZQxDl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"accs2 = nn2.train(x, y, learning_rate=0.001, loss_fn=CrossEntropy2, num_epochs=20, batch_size=128)  # train returns per-epoch accuracies"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "40419ae7-dcae-4a2c-b7c8-d57030d44b48",
"id": "QlAx8-VIQxDm"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch: 0, Acc.: 53.29\n",
"Epoch: 1, Acc.: 83.62666666666667\n",
"Epoch: 2, Acc.: 87.74833333333333\n",
"Epoch: 3, Acc.: 89.625\n",
"Epoch: 4, Acc.: 90.69833333333334\n",
"Epoch: 5, Acc.: 91.29333333333334\n",
"Epoch: 6, Acc.: 91.89166666666667\n",
"Epoch: 7, Acc.: 92.43166666666667\n",
"Epoch: 8, Acc.: 92.925\n",
"Epoch: 9, Acc.: 93.14666666666666\n",
"Epoch: 10, Acc.: 93.53833333333333\n",
"Epoch: 11, Acc.: 93.67666666666666\n",
"Epoch: 12, Acc.: 93.90833333333333\n",
"Epoch: 13, Acc.: 94.135\n",
"Epoch: 14, Acc.: 94.34166666666667\n",
"Epoch: 15, Acc.: 94.44\n",
"Epoch: 16, Acc.: 94.76833333333333\n",
"Epoch: 17, Acc.: 94.79666666666667\n",
"Epoch: 18, Acc.: 94.95\n",
"Epoch: 19, Acc.: 95.08\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"y_hat = nn2.forward(x)\n",
"pred = np.argmax(y_hat, axis=1)\n",
"acc = (y == pred).mean()\n",
"print(f\"accuracy = {acc}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XvuuZ-NvUHPw",
"outputId": "852adb07-2d63-464f-8a57-872f54d2903b"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"accuracy = 0.9490166666666666\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Not bad accuracy at all!"
],
"metadata": {
"id": "644ZHEbQULDv"
}
},
{
"cell_type": "markdown",
"source": [
"© David Refaeli 2023."
],
"metadata": {
"id": "aKiefmDX4qcc"
}
}
]
}