
Backpropagation: Derivation (Matrix form)

Notation

  1. Input matrix is $x_0$
  2. Layer 1 weight matrix is $W_1$
  3. Layer 1 output is $x_1 = f_1(W_1x_0)$, where $f_1$ is the activation function for layer 1
  4. There are 4 layers (including the input layer), so there are 3 weight matrices
  5. Hence, the network output is $x_3 = f_3(W_3x_2)$
  6. Assuming an MSE loss function, with $t$ as the target variable, $E = \frac{1}{2}\|x_3 - t\|_2^2$
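
To make the notation concrete, here is a minimal NumPy sketch of the forward pass. The layer sizes, the choice of $\tanh$ for every $f_i$, and the names `forward`, `f`, and `f_prime` are illustrative assumptions (not part of the derivation), and $x_0$ is treated as a single column vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: input dim 4, hidden dims 5 and 6, output dim 3.
W1 = rng.standard_normal((5, 4))
W2 = rng.standard_normal((6, 5))
W3 = rng.standard_normal((3, 6))

f = np.tanh                            # assume f_1 = f_2 = f_3 = tanh
f_prime = lambda z: 1 - np.tanh(z)**2  # elementwise derivative of tanh

def forward(x0):
    """Return the layer outputs x1, x2, x3 for a column-vector input x0."""
    x1 = f(W1 @ x0)
    x2 = f(W2 @ x1)
    x3 = f(W3 @ x2)
    return x1, x2, x3

x0 = rng.standard_normal((4, 1))  # input (notation 1)
t = rng.standard_normal((3, 1))   # target (notation 6)
x1, x2, x3 = forward(x0)
E = 0.5 * np.sum((x3 - t)**2)     # MSE loss E = (1/2) ||x3 - t||_2^2
```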

Derivation:

$$ \begin{align*} \frac{\partial E}{\partial W_3} &= (x_3 - t)\frac{\partial x_3}{\partial W_3} \\ &= [(x_3 - t) \circ f_3'(W_3 x_2)] \frac{\partial (W_3 x_2)}{\partial W_3} \\ &= [(x_3 - t) \circ f_3'(W_3 x_2)]\, x_2^T \\ \textrm{Letting } \delta_3 &= (x_3 - t) \circ f_3'(W_3 x_2), \\ \frac{\partial E}{\partial W_3} &= \delta_3 x_2^T \end{align*} $$
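
Continuing the sketch above, this step is two lines of NumPy (the Hadamard product $\circ$ is elementwise `*`):

```python
delta3 = (x3 - t) * f_prime(W3 @ x2)  # delta_3 = (x3 - t) ∘ f3'(W3 x2)
dE_dW3 = delta3 @ x2.T                # dE/dW3 = delta_3 x2^T (outer product)
```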

Next,

$$ \begin{align*} \frac{\partial E}{\partial W_2} &= (x_3 - t)\frac{\partial x_3}{\partial W_2} \\ &= [(x_3 - t) \circ f_3'(W_3 x_2)] \frac{\partial (W_3 x_2)}{\partial W_2} \\ &= \delta_3 \frac{\partial (W_3 x_2)}{\partial W_2} \\ &= W_3^T \delta_3 \frac{\partial x_2}{\partial W_2} \\ &= [W_3^T \delta_3 \circ f_2'(W_2 x_1)] \frac{\partial (W_2 x_1)}{\partial W_2} \\ \textrm{Letting } \delta_2 &= W_3^T \delta_3 \circ f_2'(W_2 x_1), \\ \frac{\partial E}{\partial W_2} &= \delta_2 x_1^T \end{align*} $$
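
In the sketch, $\delta_2$ is $\delta_3$ backpropagated through $W_3^T$ and gated by the layer-2 activation derivative:

```python
delta2 = (W3.T @ delta3) * f_prime(W2 @ x1)  # delta_2 = (W3^T delta_3) ∘ f2'(W2 x1)
dE_dW2 = delta2 @ x1.T                       # dE/dW2 = delta_2 x1^T
```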

Next,

$$ \begin{align*} \frac{\partial E}{\partial W_1} &= (x_3 - t)\frac{\partial x_3}{\partial W_1} \\ &= \delta_2 \frac{\partial (W_2 x_1)}{\partial W_1} \\ &= W_2^T \delta_2 \frac{\partial f_1(W_1 x_0)}{\partial W_1} \\ &= [W_2^T \delta_2 \circ f_1'(W_1 x_0)] \frac{\partial (W_1 x_0)}{\partial W_1} \\ \textrm{Letting } \delta_1 &= W_2^T \delta_2 \circ f_1'(W_1 x_0), \\ \frac{\partial E}{\partial W_1} &= \delta_1 x_0^T \end{align*} $$
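
The last step follows the same pattern. As a sanity check on the whole derivation, the three analytic gradients can be compared against central finite differences; the helper `numerical_grad` below is an illustrative addition, not part of the derivation:

```python
delta1 = (W2.T @ delta2) * f_prime(W1 @ x0)  # delta_1 = (W2^T delta_2) ∘ f1'(W1 x0)
dE_dW1 = delta1 @ x0.T                       # dE/dW1 = delta_1 x0^T

def numerical_grad(W, eps=1e-6):
    """Central-difference estimate of dE/dW, perturbing one weight at a time."""
    g = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            old = W[i, j]
            W[i, j] = old + eps
            E_plus = 0.5 * np.sum((forward(x0)[2] - t)**2)
            W[i, j] = old - eps
            E_minus = 0.5 * np.sum((forward(x0)[2] - t)**2)
            W[i, j] = old
            g[i, j] = (E_plus - E_minus) / (2 * eps)
    return g

for W, g in [(W1, dE_dW1), (W2, dE_dW2), (W3, dE_dW3)]:
    assert np.allclose(g, numerical_grad(W), atol=1e-5)  # matches the derivation
```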
