SoftMax: On derivations of its derivative, ∂σ/∂x
Script used in the video:
https://youtu.be/yx2xc9oHvkY
This video was a reaction to derivations such as:
re: https://community.deeplearning.ai/t/calculating-gradient-of-softmax-function/1897/3
----
For general $s\colon{\mathbb R}\to{\mathbb R}$ and ${\vec x}\in{\mathbb R}^n$, define the scaled vector ${\vec x}^s$ componentwise by ${\vec x}^s_i := \dfrac{s(x_i)}{\sum_{k=1}^n s(x_k)}$.
This is normalized in the sense that $\sum_{k=1}^n {\vec x}^s_k = 1$.
For positive $s$, also ${\vec x}^s_i\in(0, 1]$, akin to a probability.
We also have the exchange property ${\vec x}^s_j = {\vec x}^s_i\cdot \dfrac{s(x_j)}{s(x_i)}$.
Relevant special case: for $s=\exp$ we have $\dfrac{s(x_j)}{s(x_i)} = {\mathrm e}^{x_j-x_i}$. For $i=j$ this is of course $1$.
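As a quick numerical sanity check of the normalization and the exchange property (a minimal sketch; the function name `rescale` and the test values are mine, not from the video):

```python
import numpy as np

def rescale(x, s):
    # The scaled vector: i -> s(x_i) / sum_k s(x_k).
    sx = s(x)
    return sx / sx.sum()

x = np.array([0.5, -1.0, 2.0])  # arbitrary test point
v = rescale(x, np.exp)          # s = exp gives the usual softmax

print(np.isclose(v.sum(), 1.0))                      # normalization
i, j = 0, 2
print(np.isclose(v[j], v[i] * np.exp(x[j] - x[i])))  # exchange property
```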
----
$f\,\,=\,\,\dfrac{g}{h} \implies f'=\dfrac{g'\cdot h - h'\cdot g}{h^2}$
Define the log-derivative
$Lf:=\log(f)'=\dfrac{f'}{f}$
For $f = \dfrac{g}{h}$ this gives $Lf = Lg - Lh$. Writing $h' = g' + (h-g)'$ and using $\dfrac{1}{h} = \dfrac{f}{g}$ yields
$Lf = (1 - f)\cdot Lg - f\cdot \dfrac{(h-g)'}{g}$
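The identity can be confirmed symbolically (a sketch using sympy; $h-g$ is treated as an independent function $r$, which matches the setup below where $g$ and $h-g$ share no variables):

```python
import sympy as sp

t = sp.symbols('t')
g = sp.Function('g')(t)
r = sp.Function('r')(t)   # r plays the role of h - g
h = g + r
f = g / h

Lf = sp.diff(f, t) / f
claimed = (1 - f) * sp.diff(g, t) / g - f * sp.diff(r, t) / g
print(sp.simplify(Lf - claimed))  # prints 0
```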
Now take $f$ to be a rescaling as above, i.e. $f = {\vec x}^s_a$ with $g = s(x_a)$ and $h = \sum_{k=1}^n s(x_k)$; in particular $g$ and the sum $h-g$ don't share variables.
Consider further the partial derivatives $D_k := \dfrac{\partial}{\partial x_k}$, one for each dimension $x_k$.
$\bullet$ Case $D_ax_a=1$. Here $Lg = \dfrac{s'(x_a)}{s(x_a)}$ and $D_a(h-g)=0$.
$\bullet$ Case $D_bx_a=0$. Here $Lg = 0$ and $\dfrac{D_b(h-g)}{g} = \dfrac{s'(x_b)}{s(x_a)}$.
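Plugging the two cases into the formula for $Lf$ with $s=\exp$ (so $s'=s$) gives $D_a\,{\vec x}^s_a = {\vec x}^s_a\,(1 - {\vec x}^s_a)$ and, via the exchange property, $D_b\,{\vec x}^s_a = -{\vec x}^s_a\,{\vec x}^s_b$, i.e. the Jacobian $\sigma_a(\delta_{ab} - \sigma_b)$. A small numerical check of this closed form against central finite differences (a sketch; function names are mine):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shifting by max(x) leaves the value unchanged, improves stability
    return e / e.sum()

def softmax_jacobian(x):
    # From the two cases: D_a sigma_a = sigma_a (1 - sigma_a), D_b sigma_a = -sigma_a sigma_b.
    sig = softmax(x)
    return np.diag(sig) - np.outer(sig, sig)

x = np.array([0.5, -1.0, 2.0])
J = softmax_jacobian(x)

eps = 1e-6
I = np.eye(len(x))
J_num = np.array([(softmax(x + eps * I[k]) - softmax(x - eps * I[k])) / (2 * eps)
                  for k in range(len(x))]).T  # column k holds D_k softmax(x)
print(np.allclose(J, J_num, atol=1e-6))
```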