## matrix derivatives.txt
matrix derivatives via Frobenius norm

# Automatic matrix derivatives: http://www.matrixcalculus.org/

# A good primer on basic matrix calculus: https://atmos.washington.edu/~dennis/MatrixCalculus.pdf
# The Matrix Reference Manual: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html#Intro
# Trying to understand the derivative of the inverse: https://math.stackexchange.com/questions/1471825/derivative-of-the-inverse-of-a-matrix
# Derivative of the pseudoinverse:
    https://math.stackexchange.com/questions/2179160/derivative-of-pseudoinverse-with-respect-to-original-matrix
    https://mathoverflow.net/questions/25778/analytical-formula-for-numerical-derivative-of-the-matrix-pseudo-inverse
    https://mathoverflow.net/questions/264130/derivative-of-pseudoinverse-with-respect-to-original-matrix/264426
    https://math.stackexchange.com/questions/1689434/derivative-of-the-frobenius-norm-of-a-pseudoinverse-matrix
# Math Stack Exchange user john316 does derivatives with Frobenius norms ( https://math.stackexchange.com/users/262158/john316 ):
    https://math.stackexchange.com/questions/1689434/derivative-of-the-frobenius-norm-of-a-pseudoinverse-matrix
    https://math.stackexchange.com/questions/1405922/what-is-the-gradient-of-f-s-abat-2/1406290#1406290
    https://math.stackexchange.com/questions/946911/minimize-the-frobenius-norm-of-the-difference-of-two-matrices-with-respect-to-ma/1474048#1474048
# Math Stack Exchange user greg also does derivatives with Frobenius norms ( https://math.stackexchange.com/users/357854/greg ):
    https://math.stackexchange.com/questions/2444284/matrix-derivative-of-frobenius-norm-with-hadamard-product-inside
    https://math.stackexchange.com/questions/1890313/derivative-wrt-to-kronecker-product/1890653#1890653
    https://math.stackexchange.com/questions/2125499/second-derivative-of-det-sqrtftf-with-respect-to-f/2125849#2125849
# Some matrix calculus:
    Practical Guide to Matrix Calculus for Deep Learning (Andrew Delong)
    http://www.psi.toronto.edu/~andrew/papers/matrix_calculus_for_learning.pdf

# Properties (: is Frobenius inner product, ⊙ is element-wise Hadamard product, ⋅ is matrix multiplication, ᵀ is transpose):
    A:B=B:A
    A:(B+C)=A:B + A:C
    A:B=Aᵀ:Bᵀ
    A⊙B=B⊙A
    A:(B⊙C)=(A⊙B):C
    tr(Aᵀ⋅B) = tr(A⋅Bᵀ) = tr(Bᵀ⋅A) = tr(B⋅Aᵀ) = A:B
    A:(B⋅C) = (Bᵀ⋅A):C = (A⋅Cᵀ):B
    d(X:Y) = (dX):Y + X:(dY)
    d(X:X) = dX:X + X:dX = 2X:dX
    d(X⊙Y) = (dX)⊙Y + X⊙(dY)
    d(X⋅Y) = (dX)⋅Y + X⋅(dY)
    d(Xᵀ) = (dX)ᵀ
    dZ/dX = dZ/dY ⋅ dY/dX
    d(inv(X)) = -inv(X)⋅dX⋅inv(X)
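
# Numerical sanity check of the properties above
    A minimal sketch, assuming NumPy and random real matrices; it is not taken from the linked references. The helper frob and all variable names are made up for illustration. It checks the trace identity, the rule for moving a factor across ":", the Hadamard rule (read as A:(B⊙C) = (A⊙B):C), and the differential of the inverse against a small finite perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)

def frob(A, B):
    """Frobenius inner product A:B = sum_ij A_ij * B_ij (hypothetical helper)."""
    return np.sum(A * B)

A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((5, 4))
P = rng.standard_normal((3, 4))
Q = rng.standard_normal((3, 4))

# A:P = tr(Aᵀ⋅P)
trace_ok = np.isclose(frob(A, P), np.trace(A.T @ P))

# A:(B⋅C) = (Bᵀ⋅A):C = (A⋅Cᵀ):B
move_ok = (np.isclose(frob(A, B @ C), frob(B.T @ A, C))
           and np.isclose(frob(A, B @ C), frob(A @ C.T, B)))

# A:(P⊙Q) = (A⊙P):Q
hadamard_ok = np.isclose(frob(A, P * Q), frob(A * P, Q))

# d(inv(X)) = -inv(X)⋅dX⋅inv(X), compared with an actual small change in inv(X).
# X is a small perturbation of the identity so that it is safely invertible.
X = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
dX = 1e-6 * rng.standard_normal((4, 4))
predicted = -np.linalg.inv(X) @ dX @ np.linalg.inv(X)
actual = np.linalg.inv(X + dX) - np.linalg.inv(X)
inverse_ok = np.allclose(actual, predicted, atol=1e-9)

print(trace_ok, move_ok, hadamard_ok, inverse_ok)
```

    The inverse check only agrees to first order in dX, which is why it uses a tolerance instead of exact equality.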

    vec_column( A⋅B⋅C ) = ( Cᵀ kronecker A ) ⋅ vec_column( B )
    vec_row( A⋅B⋅C ) = ( A kronecker Cᵀ ) ⋅ vec_row( B )
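
# Numerical sanity check of the two vec identities
    A small sketch, assuming NumPy, that vec_column means column-major stacking (order="F") and vec_row means row-major stacking (order="C"). The function names mirror the notation above and are otherwise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

def vec_column(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.flatten(order="F")

def vec_row(M):
    """Stack the rows of M into one vector (row-major order)."""
    return M.flatten(order="C")

# vec_column( A⋅B⋅C ) = ( Cᵀ kronecker A ) ⋅ vec_column( B )
column_ok = np.allclose(vec_column(A @ B @ C), np.kron(C.T, A) @ vec_column(B))

# vec_row( A⋅B⋅C ) = ( A kronecker Cᵀ ) ⋅ vec_row( B )
row_ok = np.allclose(vec_row(A @ B @ C), np.kron(A, C.T) @ vec_row(B))

print(column_ok, row_ok)
```

    NumPy arrays flatten in row-major order by default, so the vec_row form is the one that matches a plain M.flatten() or M.ravel().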

# Example
    E = norm2( A⋅x - b )² = M : M, where M = A⋅x - b
    dE = 2M : dM
    dM = dA⋅x + A⋅dx - db
    [To compute dE/dx, set the other differentials (dA and db) to 0 and isolate dx]
        dE = 2M : A⋅dx = 2 Aᵀ⋅M : dx <=> dE/dx = 2 Aᵀ⋅( A⋅x - b )
    [dE/dA, which we don't usually need, is just as easy. Set dx and db to 0 and isolate dA]
        dE = 2M : dA⋅x = 2 M⋅xᵀ : dA <=> dE/dA = 2 ( A⋅x - b )⋅xᵀ
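
# Checking the example with finite differences
    A minimal sketch, assuming NumPy, with E taken to be the squared 2-norm M:M where M = A⋅x - b. The helpers energy and central_difference are hypothetical names introduced here. It compares the two closed-form gradients from the example with entry-by-entry central differences.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)
b = rng.standard_normal(5)

def energy(A, x, b):
    """E = M:M with M = A⋅x - b (squared 2-norm of the residual)."""
    M = A @ x - b
    return M @ M

M = A @ x - b
grad_x = 2.0 * A.T @ M          # dE/dx = 2 Aᵀ ( A x - b )
grad_A = 2.0 * np.outer(M, x)   # dE/dA = 2 ( A x - b ) xᵀ

def central_difference(f, P, h=1e-6):
    """Estimate df/dP one entry at a time with central differences."""
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        step = np.zeros_like(P)
        step[idx] = h
        G[idx] = (f(P + step) - f(P - step)) / (2.0 * h)
    return G

fd_x = central_difference(lambda v: energy(A, v, b), x)
fd_A = central_difference(lambda W: energy(W, x, b), A)

x_ok = np.allclose(grad_x, fd_x, atol=1e-6)
A_ok = np.allclose(grad_A, fd_A, atol=1e-6)
print(x_ok, A_ok)
```

    The same pattern (derive with differentials, then confirm with finite differences) is a cheap way to catch a misplaced transpose before trusting a hand-derived gradient.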