Implementation of a simple neural network with the backpropagation algorithm.
import numpy as np

# Activation function and its derivative (sigmoid)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Derivative of the sigmoid written in terms of its output a = sigmoid(z)
    return a * (1 - a)
# Error function (Mean Squared Error) and its derivative
def MSE(yBar, y):
    return 0.5 * np.sum((yBar - y)**2)

def MSE_derivative(yBar, y):
    return y - yBar

# Class for a neural network
class NeuralNetwork:
    def __init__(self, L=[2, 1]):
        # L is a list of the number of neurons in each layer
        self.L = L
        # W is a list of the weight matrices
        self.W = np.array([np.random.rand(L[i+1], L[i]+1) for i in range(len(L)-1)],
                          dtype=object)

    def forward(self, x):
        """ Forward propagation """
        # List of perceptron outputs after applying the activation function
        a = np.array([np.zeros((self.L[i], x.shape[1])) for i in range(len(self.L))],
                     dtype=object)
        # Input layer
        a[0] = x

        for l in range(1, len(a)):
            # Weighted sum of inputs (including bias row)
            z = self.W[l-1] @ np.vstack((np.ones((1, x.shape[1])), a[l-1]))
            # Apply activation function
            a[l] = sigmoid(z)

        return a

    def backward(self, yBar, a):
        """ Backward propagation """
        # Derivative for the last layer
        delta = MSE_derivative(yBar, a[-1]) * sigmoid_derivative(a[-1])
        grad = np.zeros_like(self.W)

        for l in range(len(self.W)-1, -1, -1):
            # Gradient for the current layer
            grad[l] = delta @ np.vstack((np.ones((1, a[l].shape[1])), a[l])).T
            # Derivative for the previous layer
            delta = self.W[l][:, 1:].T @ delta * sigmoid_derivative(a[l])

        return grad

    def train(self, x, yBar, lr=0.5, epochs=80, tol=1e-3):
        """ Train the neural network """
        for ep in range(epochs):
            # With the current weights, calculate the output
            a = self.forward(x)
            # Calculate the gradient with backward propagation
            grad = self.backward(yBar, a)
            # Update weights
            self.W -= lr * grad

            # Calculate the error and check if it is below the tolerance
            e = MSE(yBar, a[-1])
            print(f"Epoch {ep} error: {e}")
            if e < tol:
                break

    def predict(self, x):
        return self.forward(x)[-1]


if __name__ == "__main__":
    # -------------- Test 1 (XOR) ---------------- #
    nn = NeuralNetwork([2, 2, 1])
    inputs = np.array([[0, 0, 1, 1],
                       [0, 1, 0, 1]])
    outputs = np.array([[0, 1, 1, 0]])

    # -------------- Test 2 ---------------- #
    # nn = NeuralNetwork([3, 5, 7, 2])
    # inputs = np.array([[2],
    #                    [1],
    #                    [4]])
    # outputs = np.array([[0.5],
    #                     [0.7]])

    # -------------- Train ---------------- #
    nn.train(inputs, outputs, lr=1.5, epochs=10000, tol=1e-3)
    print("Predict:\n", nn.predict(x=inputs))
Training
The neural network is trained by repeatedly running forward and backward propagation over the training set and updating the weights by gradient descent, until the error falls below the tolerance or the maximum number of epochs is reached. Using the XOR problem as an example, the algorithm yields the following results:
...
Epoch 9987 error: 0.0031167375281244422
Epoch 9988 error: 0.003948532289816238
Epoch 9989 error: 0.003114471626569108
Epoch 9990 error: 0.003945706096179238
Epoch 9991 error: 0.003112207999681534
Epoch 9992 error: 0.003942882387249255
Epoch 9993 error: 0.003109946644774476
Epoch 9994 error: 0.00394006116001136
Epoch 9995 error: 0.0031076875591646326
Epoch 9996 error: 0.003937242411456609
Epoch 9997 error: 0.0031054307401727974
Epoch 9998 error: 0.003934426138579485
Epoch 9999 error: 0.0031031761851236543
Predict:
[[0.06200623 0.96272827 0.96272827 0.03521488]]
For an input of $\begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}$, the predictions are close to the expected XOR outputs $\begin{bmatrix} 0 & 1 & 1 & 0 \end{bmatrix}$.
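If hard 0/1 labels are wanted instead of these values, one option (not part of the gist itself) is to threshold the predictions at 0.5, reusing the `nn` and `inputs` objects defined in the `__main__` block above:

```python
# Hypothetical post-processing: threshold the sigmoid outputs at 0.5
labels = (nn.predict(inputs) > 0.5).astype(int)
print(labels)  # for a run like the one above this gives [[0 1 1 0]]
```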
Neural network
The objective of this gist is to implement a fully-connected neural network from scratch and train it with backpropagation.
Forward propagation
To compute the output of the neural network, we need to compute the output of each layer. The output of a layer is obtained by applying the activation function to the weighted sum of its inputs. The output of one layer is the input of the next layer, and the output of the last layer is the output of the network.
To achieve that, it is necessary to define a convention for the inputs, outputs, and weights of each layer. The output of each layer is a vector of size $n_l$, where $n_l$ is the number of neurons in layer $l$.
Now, let's define the weighted sum $z_l$. Due to the convention, the input of layer $l$ is the output of layer $l-1$, so the weight matrix $W_l$ has size $n_l \times n_{l-1}$. To include the bias in the same matrix, a row of ones is prepended to the input, so the dimension of the weight matrix becomes $n_l \times (n_{l-1}+1)$:
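$$z_l = W_l \begin{bmatrix} 1 \\ a_{l-1} \end{bmatrix}, \qquad a_l = \sigma(z_l) = \frac{1}{1 + e^{-z_l}}$$

This is what the `forward` method computes: `np.vstack` prepends the row of ones and `self.W[l-1] @ ...` performs the weighted sum (the index is `l-1` only because Python numbers the weight matrices from 0).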
With the above equations, we can now define the forward propagation algorithm.
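The following is a minimal, standalone sketch that mirrors the `forward` method of the gist; the 2-2-1 layer sizes, the random weights, and the sample inputs are only illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Illustrative 2-2-1 network: one weight matrix per layer, with an extra bias column
L = [2, 2, 1]
W = [np.random.rand(L[i+1], L[i] + 1) for i in range(len(L) - 1)]

x = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]])  # one column per sample

a = x
for Wl in W:
    z = Wl @ np.vstack((np.ones((1, a.shape[1])), a))  # weighted sum with bias row
    a = sigmoid(z)                                      # activation of the layer

print(a.shape)  # (1, 4): one output value per sample
```

Each column of `x` is one sample, so the whole batch is propagated with a single matrix product per layer.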
Backpropagation
The backpropagation algorithm is used to compute the gradient of the loss function with respect to the weights of the neural network. The gradient is used to update the weights of the neural network in order to minimize the loss function.
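In the `train` method this gradient is applied with plain gradient descent: with learning rate $\eta$ (`lr` in the code), each weight matrix is updated as

$$W_l \leftarrow W_l - \eta \, \frac{\partial E}{\partial W_l}$$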
First, we need to define the loss function. The loss function is used to measure the error of the neural network. In this gist, we will use the mean squared error function:
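$$E(\bar{y}, y) = \frac{1}{2} \sum_{k} \left(\bar{y}_k - y_k\right)^2$$

where $\bar{y}$ is the target output and $y = a_L$ is the output of the last layer, so that $\frac{\partial E}{\partial y} = y - \bar{y}$ (the `MSE` and `MSE_derivative` functions in the code).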
The gradient of the loss function with respect to the weights of the neural network is computed by the chain rule:
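$$\frac{\partial E}{\partial W_l} = \frac{\partial E}{\partial a_l} \, \frac{\partial a_l}{\partial z_l} \, \frac{\partial z_l}{\partial W_l} = \delta_l \begin{bmatrix} 1 \\ a_{l-1} \end{bmatrix}^{T}, \qquad \delta_l = \frac{\partial E}{\partial a_l} \odot \sigma'(z_l)$$

where $\odot$ denotes the element-wise product and, for the sigmoid, $\sigma'(z_l) = a_l \odot (1 - a_l)$.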
With those derivatives, it only remains to compute the gradient of the loss function with respect to the output of each layer, $\frac{\partial E}{\partial a_l}$. To do that, the chain rule can be applied again:
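$$\frac{\partial E}{\partial a_l} = \left(\frac{\partial z_{l+1}}{\partial a_l}\right)^{T} \frac{\partial E}{\partial z_{l+1}} = \hat{W}_{l+1}^{T} \left(\frac{\partial E}{\partial a_{l+1}} \odot \sigma'(z_{l+1})\right)$$

where $\hat{W}_{l+1}$ is $W_{l+1}$ without its bias column (`self.W[l][:, 1:]` in the code).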
As can be seen, the term $\frac{\partial E}{\partial a_{l+1}}$ is the gradient of the loss function with respect to the output of the next layer, so the computation is recursive and, in terms of the $\delta_l$ defined above, may be written as:
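$$\delta_L = (y - \bar{y}) \odot \sigma'(z_L), \qquad \delta_l = \left(\hat{W}_{l+1}^{T} \, \delta_{l+1}\right) \odot \sigma'(z_l), \quad l = L-1, \dots, 1$$

where $L$ is the index of the last layer. The gradient of each layer is then $\frac{\partial E}{\partial W_l} = \delta_l \begin{bmatrix} 1 \\ a_{l-1} \end{bmatrix}^{T}$, which is exactly the recursion implemented in the `backward` method.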
Finally, the complete algorithm combines these pieces: compute the activations with a forward pass, compute the deltas and gradients with a backward pass, and update the weights.
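Here is a minimal, self-contained sketch of that loop. It mirrors the `backward` and `train` methods above; the 2-2-1 layout, the XOR data, and the learning rate match the gist, while the stopping criterion and variable names are only illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Illustrative 2-2-1 network for XOR; the weights start random
L = [2, 2, 1]
W = [np.random.rand(L[i+1], L[i] + 1) for i in range(len(L) - 1)]

x = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]])
yBar = np.array([[0, 1, 1, 0]])
lr = 1.5

for ep in range(10000):
    # Forward pass: store every layer's activation
    a = [x]
    for Wl in W:
        a.append(sigmoid(Wl @ np.vstack((np.ones((1, x.shape[1])), a[-1]))))

    # Backward pass: delta of the output layer, then propagate it backwards
    delta = (a[-1] - yBar) * a[-1] * (1 - a[-1])
    for l in range(len(W) - 1, -1, -1):
        grad = delta @ np.vstack((np.ones((1, x.shape[1])), a[l])).T
        delta = W[l][:, 1:].T @ delta * a[l] * (1 - a[l])  # delta for layer l
        W[l] -= lr * grad                                  # gradient descent update

    error = 0.5 * np.sum((yBar - a[-1])**2)
    if error < 1e-3:
        break

print(f"epochs: {ep + 1}, final error: {error}")
```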