@yig
Last active December 1, 2023 19:39
matrix derivatives via Frobenius norm
# Automatic matrix derivatives: http://www.matrixcalculus.org/
# A good primer on basic matrix calculus: https://atmos.washington.edu/~dennis/MatrixCalculus.pdf
# The Matrix Reference Manual: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html#Intro
# Trying to understand the derivative of the inverse: https://math.stackexchange.com/questions/1471825/derivative-of-the-inverse-of-a-matrix
# Derivative of the pseudoinverse:
https://math.stackexchange.com/questions/2179160/derivative-of-pseudoinverse-with-respect-to-original-matrix
https://mathoverflow.net/questions/25778/analytical-formula-for-numerical-derivative-of-the-matrix-pseudo-inverse
https://mathoverflow.net/questions/264130/derivative-of-pseudoinverse-with-respect-to-original-matrix/264426
https://math.stackexchange.com/questions/1689434/derivative-of-the-frobenius-norm-of-a-pseudoinverse-matrix
# Math Stack Exchange user john316 does derivatives with Frobenius norms ( https://math.stackexchange.com/users/262158/john316 ):
https://math.stackexchange.com/questions/1689434/derivative-of-the-frobenius-norm-of-a-pseudoinverse-matrix
https://math.stackexchange.com/questions/1405922/what-is-the-gradient-of-f-s-abat-2/1406290#1406290
https://math.stackexchange.com/questions/946911/minimize-the-frobenius-norm-of-the-difference-of-two-matrices-with-respect-to-ma/1474048#1474048
# Math Stack Exchange user greg also does derivatives with Frobenius norms ( https://math.stackexchange.com/users/357854/greg ):
https://math.stackexchange.com/questions/2444284/matrix-derivative-of-frobenius-norm-with-hadamard-product-inside
https://math.stackexchange.com/questions/1890313/derivative-wrt-to-kronecker-product/1890653#1890653
https://math.stackexchange.com/questions/2125499/second-derivative-of-det-sqrtftf-with-respect-to-f/2125849#2125849
# Some matrix calculus:
Practical Guide to Matrix Calculus for Deep Learning (Andrew Delong)
http://www.psi.toronto.edu/~andrew/papers/matrix_calculus_for_learning.pdf
# Properties (: is Frobenius inner product, ⊙ is element-wise Hadamard product, ⋅ is matrix multiplication, ᵀ is transpose):
A:B=B:A
A:(B+C)=A:B + A:C
A:B=Aᵀ:Bᵀ
A⊙B=B⊙A
A:(B⊙C)=(A⊙B):C
tr(Aᵀ⋅B) = tr(A⋅Bᵀ) = tr(Bᵀ⋅A) = tr(B⋅Aᵀ) = A:B
A:(B⋅C) = (Bᵀ⋅A):C = (A⋅Cᵀ):B
d(X:Y) = (dX):Y + X:(dY)
d(X:X) = dX:X + X:dX = 2X:dX
d(X⊙Y) = (dX)⊙Y + X⊙(dY)
d(X⋅Y) = (dX)⋅Y + X⋅(dY)
d(Xᵀ) = (dX)ᵀ
dZ/dX = dZ/dY ⋅ dY/dX
d(inv(X)) = -inv(X)⋅dX⋅inv(X)
vec_column( A⋅B⋅C ) = ( Cᵀ kronecker A ) ⋅ vec_column( B )
vec_row( A⋅B⋅C ) = ( A kronecker Cᵀ ) ⋅ vec_row( B )
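A few of the properties above can be spot-checked numerically. The following is only a sketch using the Python standard library: every helper name (matmul, frob, kron, vec_column, vec_row, matvec, inv2) and all the test matrices are assumptions made for illustration and are not part of the original notes. It checks the identity A:(B⋅C) = (Bᵀ⋅A):C = (A⋅Cᵀ):B, compares d(inv(X)) against an actual small perturbation of a 2x2 matrix, and checks both vec/Kronecker identities.

```python
# Standard-library-only spot checks of a few properties listed above.
# All helper names and test matrices are illustrative, not from the gist.

def matmul(A, B):
    """Matrix product A⋅B for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob(A, B):
    """Frobenius inner product A:B (sum of element-wise products)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def vec_column(M):
    """Stack the columns of M into one flat list."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def vec_row(M):
    """Concatenate the rows of M into one flat list."""
    return [v for row in M for v in row]

def matvec(K, v):
    return [sum(k * x for k, x in zip(row, v)) for row in K]

def inv2(X):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# 1) A:(B⋅C) = (Bᵀ⋅A):C = (A⋅Cᵀ):B
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]      # 2x3
B = [[1.0, -1.0], [0.5, 2.0]]               # 2x2
C = [[2.0, 0.0, 1.0], [1.0, 3.0, -2.0]]     # 2x3
ip_forms = [
    frob(A, matmul(B, C)),
    frob(matmul(transpose(B), A), C),
    frob(matmul(A, transpose(C)), B),
]

# 2) d(inv(X)) = -inv(X)⋅dX⋅inv(X), compared with an actual small perturbation
X = [[4.0, 1.0], [2.0, 3.0]]
dX = [[1e-6, -2e-6], [3e-6, 5e-7]]
X_new = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]
Xi, Xi_new = inv2(X), inv2(X_new)
predicted = [[-v for v in row] for row in matmul(matmul(Xi, dX), Xi)]
inv_err = max(
    abs((Xi_new[i][j] - Xi[i][j]) - predicted[i][j])
    for i in range(2) for j in range(2)
)

# 3) vec identities for the product P⋅Q⋅R
P = [[1.0, 2.0], [3.0, 4.0]]                # 2x2
Q = [[1.0, 0.0, 2.0], [-1.0, 3.0, 1.0]]     # 2x3
R = [[2.0, 1.0], [0.0, 1.0], [1.0, -1.0]]   # 3x2
PQR = matmul(matmul(P, Q), R)
col_ok = vec_column(PQR) == matvec(kron(transpose(R), P), vec_column(Q))
row_ok = vec_row(PQR) == matvec(kron(P, transpose(R)), vec_row(Q))

print(ip_forms)        # the three forms of the inner product should agree
print(inv_err)         # should be second order in dX, i.e. tiny
print(col_ok, row_ok)  # both vec identities should hold
```

The inverse check only agrees to first order in dX, so inv_err is expected to be small but not exactly zero. The test matrices in parts 1 and 3 were chosen so that the arithmetic is exact in floating point, which is why part 3 can compare with ==.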
# Example
E = norm2( A⋅x - b )² = M : M, where M = A⋅x - b
dE = 2M : dM
dM = dA⋅x + A⋅dx - db
[To compute dE/dx, set the other differentials (dA and db) to 0 and isolate dx]
2M : A⋅dx = 2 Aᵀ⋅M : dx <=> dE/dx = 2 Aᵀ⋅( A⋅x - b )
[You can compute dE/dA, which we don't usually do, just as easily. Set the other differentials (dx and db) to 0 and isolate dA]
2M : dA⋅x = 2 M⋅xᵀ : dA <=> dE/dA = 2 ( A⋅x - b )⋅xᵀ
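The two gradients from the example can be compared against central finite differences. This is a minimal sketch that assumes E is the squared 2-norm of A⋅x - b; the function names (residual, energy, fd_x, fd_A), the step size h, and the test data are all hypothetical choices for illustration.

```python
# Finite-difference check of dE/dx = 2 Aᵀ⋅(A⋅x - b) and dE/dA = 2 (A⋅x - b)⋅xᵀ.
# Names, step size, and data are illustrative assumptions, not from the gist.

def residual(A, x, b):
    """M = A⋅x - b as a flat list."""
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]

def energy(A, x, b):
    """E = M:M, the squared 2-norm of the residual."""
    return sum(m * m for m in residual(A, x, b))

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3x2
x = [0.5, -1.0]
b = [1.0, 0.0, -1.0]
rows, cols = len(A), len(x)

# Analytic gradients from the derivation above.
M = residual(A, x, b)
grad_x = [2.0 * sum(A[i][j] * M[i] for i in range(rows)) for j in range(cols)]
grad_A = [[2.0 * M[i] * x[j] for j in range(cols)] for i in range(rows)]

h = 1e-6  # finite-difference step

def fd_x(j):
    """Central difference of E with respect to x[j]."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return (energy(A, xp, b) - energy(A, xm, b)) / (2.0 * h)

def fd_A(i, j):
    """Central difference of E with respect to A[i][j]."""
    Ap = [list(row) for row in A]
    Am = [list(row) for row in A]
    Ap[i][j] += h
    Am[i][j] -= h
    return (energy(Ap, x, b) - energy(Am, x, b)) / (2.0 * h)

err_x = max(abs(grad_x[j] - fd_x(j)) for j in range(cols))
err_A = max(abs(grad_A[i][j] - fd_A(i, j)) for i in range(rows) for j in range(cols))

print(grad_x, err_x)
print(grad_A, err_A)
```

Because E is quadratic in x (and in A), the central difference has no truncation error here, so any remaining mismatch comes from floating-point rounding.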
yig commented Jun 28, 2018

Updated example and added matrixcalculus.org link
