@cychitivav
Created November 21, 2022 16:09
Implementation of a simple neural network with the backpropagation algorithm.
import numpy as np


# Activation function and its derivative (sigmoid)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))


# Error function (Mean Squared Error) and its derivative
def MSE(yBar, y):
    return 0.5 * np.sum((yBar - y)**2)


def MSE_derivative(yBar, y):
    return y - yBar


# Class for a neural network
class NeuralNetwork:
    def __init__(self, L=[2, 1]):
        # L is a list of the number of neurons in each layer
        self.L = L

        # W is a list of the weight matrices
        self.W = np.array([np.random.rand(L[i+1], L[i]+1) for i in range(len(L)-1)],
                          dtype=object)

    def forward(self, x):
        """ Forward propagation """
        # List of perceptron outputs after applying the activation function
        a = np.array([np.zeros((self.L[i], x.shape[1])) for i in range(len(self.L))],
                     dtype=object)

        # Input layer
        a[0] = x

        for l in range(1, len(a)):
            # Weighted sum of inputs (including bias row)
            z = self.W[l-1] @ np.vstack((np.ones((1, x.shape[1])), a[l-1]))

            # Apply activation function
            a[l] = sigmoid(z)

        return a

    def backward(self, yBar, a):
        """ Backward propagation """
        # Derivative for the last layer
        delta = MSE_derivative(yBar, a[-1]) * sigmoid_derivative(a[-1])

        grad = np.zeros_like(self.W)
        for l in range(len(self.W)-1, -1, -1):
            # Gradient for the current layer
            grad[l] = delta @ np.vstack((np.ones((1, a[l].shape[1])), a[l])).T

            # Derivative for the previous layer
            delta = self.W[l][:, 1:].T @ delta * sigmoid_derivative(a[l])

        return grad

    def train(self, x, yBar, lr=0.5, epochs=80, tol=1e-3):
        """ Train the neural network """
        for ep in range(epochs):
            # With the current weights, calculate the output
            a = self.forward(x)

            # Calculate the gradient with backward propagation
            grad = self.backward(yBar, a)

            # Update weights
            self.W -= lr * grad

            # Calculate the error and check if it is below the tolerance
            e = MSE(yBar, a[-1])
            print(f"Epoch {ep} error: {e}")
            if e < tol:
                break

    def predict(self, x):
        return self.forward(x)[-1]


if __name__ == "__main__":
    # -------------- Test 1 (XOR) ---------------- #
    nn = NeuralNetwork([2, 2, 1])
    inputs = np.array([[0, 0, 1, 1],
                       [0, 1, 0, 1]])
    outputs = np.array([[0, 1, 1, 0]])

    # -------------- Test 2 ---------------- #
    # nn = NeuralNetwork([3, 5, 7, 2])
    # inputs = np.array([[2],
    #                    [1],
    #                    [4]])
    # outputs = np.array([[0.5],
    #                     [0.7]])

    # -------------- Train ---------------- #
    nn.train(inputs, outputs, lr=1.5, epochs=10000, tol=1e-3)

    print("Predict:\n", nn.predict(x=inputs))
@cychitivav

Neural network

The objective of this gist is to implement a fully-connected neural network from scratch and train it with backpropagation.

Forward propagation

To compute the output of the neural network, we need to compute the output of each layer. The output of a layer is obtained by applying the activation function to the weighted sum of its inputs. The output of one layer is the input of the next, and the output of the last layer is the output of the neural network.

To achieve that, it's necessary to define a convention for the input and output of each layer and for the weights. The output of each layer is a vector of size $n_l$, where $n_l$ is the number of neurons in that layer.

$$ a_l=AF(z_l) $$

Note: the activation function $AF$ may be different for each layer; the most common choices are the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU).

Now let's define the weighted sum $z_l$. By this convention, the input of a layer is the output of the previous layer, so the weight matrix $W_l$ has $n_l$ rows and $n_{l-1}$ columns, and the output of the layer is a vector of size $n_l$. To include the bias in the same matrix, a constant 1 is prepended to $a_{l-1}$ and the weight matrix gets an extra column, giving it dimension $n_l \times (n_{l-1}+1)$ (this is the convention used in the code above, where each `W[i]` has shape `(L[i+1], L[i]+1)`).

$$ z_l = W_l\cdot a_{l-1} $$
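For example, here is a minimal NumPy sketch of one layer's weighted sum with the bias folded into the weight matrix; the sizes below are arbitrary and chosen only for illustration:

```python
import numpy as np

n_prev, n_l, m = 2, 3, 4                # previous layer size, layer size, batch size (arbitrary)
W_l = np.random.rand(n_l, n_prev + 1)   # one column per input plus one column for the bias
a_prev = np.random.rand(n_prev, m)      # outputs of the previous layer, one column per sample

# Prepend a row of ones so the first column of W_l acts as the bias
z_l = W_l @ np.vstack((np.ones((1, m)), a_prev))
print(z_l.shape)  # (3, 4): one row per neuron, one column per sample
```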

With the above equations, the forward propagation algorithm is defined as follows:

  1. Initialize the input of the first layer as the input of the neural network.
  2. Compute the output of each layer with

$$ a_l=AF(W_l\cdot a_{l-1}) $$

  3. The output of the neural network is the output of the last layer.
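Putting the three steps together, a compact sketch of the forward pass under the same convention (weights given as a plain list of bias-augmented matrices; this is an illustration, not the gist's NeuralNetwork class):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_pass(weights, x):
    """weights: list of (n_l, n_{l-1}+1) matrices; x: (n_0, m) batch of inputs."""
    a = x
    for W in weights:
        # Weighted sum with the bias row prepended, then the activation function
        a = sigmoid(W @ np.vstack((np.ones((1, a.shape[1])), a)))
    return a  # output of the last layer
```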

Backpropagation

The backpropagation algorithm is used to compute the gradient of the loss function with respect to the weights of the neural network. The gradient is used to update the weights of the neural network in order to minimize the loss function.

$$ W_l = W_l - l_r\cdot\frac{\partial E}{\partial W_l} $$

First, we need to define the loss function, which measures the error of the neural network. In this gist, we use the mean squared error:

$$ MSE=\frac{1}{2}\sum_{i=1}^{n}(\bar{y}_i-y_i)^2 $$
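Its derivative with respect to the network output $y_i$, which is where backpropagation starts (computed by MSE_derivative in the code), is:

$$ \frac{\partial E}{\partial y_i} = y_i - \bar{y}_i $$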

The gradient of the loss function with respect to the weights of the neural network is computed by the chain rule:

$$ \frac{\partial E}{\partial W_l}=\frac{\partial E}{\partial a_l}\frac{\partial a_l}{\partial z_l}\frac{\partial z_l}{\partial W_l} $$

  • $\frac{\partial a_l}{\partial z_l}=AF'(z_l)$
  • $\frac{\partial z_l}{\partial W_l}=a_{l-1}$

With those derivatives, it only remains to compute the gradient of the loss function with respect to the output of the layer; to do that, the chain rule is applied again:

$$ \frac{\partial E}{\partial a_l}=\frac{\partial E}{\partial a_{l+1}}\frac{\partial a_{l+1}}{\partial z_{l+1}}\frac{\partial z_{l+1}}{\partial a_l} $$

  • $\frac{\partial z_{l+1}}{\partial a_l}=W_{l+1}$

As can be seen, the term $\frac{\partial E}{\partial a_{l+1}}$ is the gradient of the loss function with respect to the output of the next layer, so the computation is recursive, and it is convenient to define:

$$ \delta_l=\frac{\partial E}{\partial a_l}\frac{\partial a_l}{\partial z_l} $$
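Substituting the derivatives listed above, the recursion can be written explicitly (here $\odot$ denotes the element-wise product, and the bias column of $W_{l+1}$ is excluded before the transpose, as in `self.W[l][:, 1:].T` in the code):

$$ \delta_l=\left(W_{l+1}^T\,\delta_{l+1}\right)\odot AF'(z_l) $$

The recursion starts at the output layer, where $\delta$ is computed directly from the derivative of the loss with respect to the network output.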

Finally, the algorithm is defined as follows:

  1. Initialize the weights of the neural network.
  2. In the output layer, compute the gradient of the loss function with respect to the output of the layer.

$$ \delta_l=\frac{\partial E}{\partial a_l}\frac{\partial a_l}{\partial z_l} $$

  3. With $\delta_l$, compute the gradient of that layer's weights and propagate $\delta$ to the previous layer.

$$ \frac{\partial E}{\partial W_l}=\delta_l\cdot a_{l-1}^T $$

  4. Update the weights of the neural network.


cychitivav commented Nov 21, 2022

Training

The neural network is trained by repeating forward propagation, backpropagation and the weight update over the training set for a number of epochs, or until the error falls below a tolerance. Using the XOR problem as an example, the algorithm yields the following results:

...
Epoch 9987 error: 0.0031167375281244422
Epoch 9988 error: 0.003948532289816238
Epoch 9989 error: 0.003114471626569108
Epoch 9990 error: 0.003945706096179238
Epoch 9991 error: 0.003112207999681534
Epoch 9992 error: 0.003942882387249255
Epoch 9993 error: 0.003109946644774476
Epoch 9994 error: 0.00394006116001136
Epoch 9995 error: 0.0031076875591646326
Epoch 9996 error: 0.003937242411456609
Epoch 9997 error: 0.0031054307401727974
Epoch 9998 error: 0.003934426138579485
Epoch 9999 error: 0.0031031761851236543

Predict:
 [[0.06200623 0.96272827 0.96272827 0.03521488]]

These predictions correspond to the input $[0,0,1,1;\,0,1,0,1]$ (the four XOR cases) and are close to the expected outputs $[0,1,1,0]$.
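Each epoch in that log corresponds to one pass of forward(), backward() and a weight update, as implemented in train(). A sketch of a single epoch, using the names from the gist above:

```python
a = nn.forward(inputs)            # forward pass over the whole batch
grad = nn.backward(outputs, a)    # gradients of the MSE w.r.t. every weight matrix
nn.W -= 1.5 * grad                # gradient-descent step (learning rate 1.5, as in the example)
print(MSE(outputs, a[-1]))        # error reported for the epoch
```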
