This turned into a random walk through some linear algebra topics that are near and dear
to my heart. The initial stimulus was to give my preferred mathematician's explanation of
why the inverse transpose shows up when transforming normal vectors. I've never seen an
explanation in the computer graphics literature that I found completely satisfactory.
Notation:
V -> W is the type of a function from V to W.
A^t is the transpose of A.
A^(-1) is the inverse of A.
adj(A) is the adjugate or classical adjoint, but I'll just call it the adjoint.
Linear Forms, Plane Equations, and the Inverse Transpose
========================================================
Let V and W be vector spaces. If T : V -> W is a linear map from V to W and f : W -> R
is a form on W (a scalar-valued linear function), then f pulls back to V via T as follows:
(T f)(v) = f(T(v))
The important point is that it is pulled back (in the reverse direction of T : V -> W), not
pushed forward. This is the opposite of how vectors behave. A vector in V would be pushed
forward by T to a vector in W.
How does this relate to transforming normal vectors of planes?
Geometrically, we are trying to transform the plane. A plane through the origin may be represented
algebraically in a variety of ways. The implicit representation most people know is with plane equations.
A plane equation (a hyperplane if V is not 3-dimensional) in V is a scalar-valued equation f(v) = 0,
where f is a form on V.
With an arbitrary linear transform T : V -> W there is simply no way to push a form on V forward to a
form on W. It's easy to see what goes wrong. If T is non-invertible, let's say T is actually 0,
then everything maps to 0 and hence the image of the plane is actually not a plane (a 2-dimensional
subspace) but a single point (a 0-dimensional subspace). Even if that's what we wanted to represent,
it would require 3 independent linear equations to do so (x = 0, y = 0, z = 0), not just 1.
Let's try going in the other direction. If we attempt to pull back a plane from W to V under T = 0,
what happens? Well, f(T(v)) = f(0) = 0 for all v in V, and hence T f = 0. While this is a degenerate
case (the solution set is the entire 3-dimensional space), it's representable by a single form.
If we pick a basis for V and W and re-examine what we did, a form f on W has a representation via
a vector n, f(w) = n^t w, and T has a representation via a matrix A, T(v) = A v. The pull back T f is a
form on V defined by (T f)(v) = f(T(v)) = n^t A v = (A^t n)^t v. So the vector in V that represents
T f is A^t n. That hopefully explains where the transpose comes in.
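As a quick numerical sanity check, here's a small Python/NumPy sketch (the matrix A and the vectors n and v
are just made-up random data):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))   # matrix of T : V -> W
    n = rng.standard_normal(3)        # represents the form f on W: f(w) = n^t w
    v = rng.standard_normal(3)        # an arbitrary vector in V

    # (T f)(v) = f(T(v)) = n^t (A v), and the vector representing T f is A^t n.
    assert np.isclose(n @ (A @ v), (A.T @ n) @ v)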
What about the inverse transpose? Like I said before, forms naturally want to be pulled back,
not pushed forward. However, if we have an invertible transform T : V -> W, then we can push forward
by pulling back by T's inverse. So if T is invertible and f is now a form on V (not on W), then
we can push forward f to W by (T^(-1) f)(w) = f(T^(-1)(w)) = n^t A^(-1) w = ((A^(-1))^t n)^t w. So n is
pushed forward by the inverse transpose of T's matrix.
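The same kind of sketch checks the push-forward (again with made-up random data, assuming A is invertible):
if v lies in the plane n^t v = 0, then A v lies in the plane represented by (A^(-1))^t n.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3))      # stand-in for T's matrix; invertible with probability 1
    n = rng.standard_normal(3)           # the plane n^t v = 0 in V

    # Pick a vector v in the plane by projecting out its n component.
    v = rng.standard_normal(3)
    v -= (n @ v) / (n @ n) * n           # now n^t v == 0 (up to roundoff)

    # Push the plane forward by the inverse transpose and check that the
    # transformed vector A v still lies in the transformed plane.
    n_pushed = np.linalg.inv(A).T @ n
    assert np.isclose(n @ v, 0)
    assert np.isclose(n_pushed @ (A @ v), 0)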
The preceding text dealt with planes through the origin, which are linear subspaces. What about affine
subspaces where the planes can be offset from the origin? This is usually handled via the so-called
projective representation of affine subspaces, as linear subspaces of one dimension higher, which most
people just call "using homogeneous coordinates". So instead of having a form in 3-space we will have a
form in 4-space. The plane is represented by a 4-dimensional vector n, and it transforms by A according
to the same formula, where A is now a 4x4 matrix. If A represents a linear transform on V then its last
row and column are just [0, 0, 0, 1]. However, everything else we did previously still works in this setting,
and it applies to not just affine transformations like translations but to all projective transformations.
This might sound too good to be true, so let's do an example. Let's say A is a translation:
[1 0 0 a]
[0 1 0 b]
[0 0 1 c]
[0 0 0 1]
The inverse of A is easily computed and it's what you'd expect:
[1 0 0 -a]
[0 1 0 -b]
[0 0 1 -c]
[0 0 0 1]
And the transpose of this is simply
[1 0 0 0]
[0 1 0 0]
[0 0 1 0]
[-a -b -c 1]
Let's compute (A^(-1))^t n, where n = [x, y, z, w]:
[x, y, z, w - (a x + b y + c z)]
In geometric terms, the normal vector stays the same, but the homogeneous offset factor is
modified according to the projection of the translation vector [a, b, c] onto the normal vector
[x, y, z]. If the translation is parallel to the plane (so the projection onto the normal vector
is zero), then the offset factor doesn't change. If the translation is orthogonal to the plane,
then a multiple of the translation length is subtracted from the offset factor.
As you can see, this is the correct answer.
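Here is the same computation as a Python/NumPy sketch; the specific values of a, b, c and n are just made-up
test data:

    import numpy as np

    a, b, c = 2.0, -1.0, 3.0
    A = np.array([[1, 0, 0, a],
                  [0, 1, 0, b],
                  [0, 0, 1, c],
                  [0, 0, 0, 1]], dtype=float)

    x, y, z, w = 1.0, 2.0, -1.0, 5.0
    n = np.array([x, y, z, w])

    # Push the plane forward by the inverse transpose of the translation.
    n_pushed = np.linalg.inv(A).T @ n
    expected = np.array([x, y, z, w - (a*x + b*y + c*z)])
    assert np.allclose(n_pushed, expected)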
You may verify this works equally well for, say, central projections. There's only one recipe.
As Fabian Giesen showed in an old blog post of his, this is handy if you want to extract frustum
planes from a projection matrix. There you don't need the inverse transpose, only the transpose,
since you are pulling back the planes, not pushing them forward. In homogeneous clip space,
the frustum planes have equations x +/- w = 0, y +/- w = 0 and z +/- w = 0, corresponding to the
vectors [1, 0, 0, +/- 1], [0, 1, 0, +/- 1] and [0, 0, 1, +/- 1]. If A is a projection matrix then
the pull-back of the plane x + w = 0 is represented by A^t [1, 0, 0, 1]. This vector is just the
sum of the first and fourth row of A.
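To make the recipe concrete, here's a Python/NumPy sketch. The 4x4 matrix A below is arbitrary made-up data
standing in for a real projection matrix; the identities being checked don't depend on its specific entries:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4))            # stand-in for a projection matrix

    n_clip = np.array([1.0, 0.0, 0.0, 1.0])    # the clip-space plane x + w = 0
    n_view = A.T @ n_clip                      # pulled back to the input space

    # The pulled-back plane vector is the sum of the first and fourth rows of A.
    assert np.allclose(n_view, A[0] + A[3])

    # Testing a point against the pulled-back plane is the same as transforming
    # the point and testing it against the clip-space plane.
    v = rng.standard_normal(4)
    assert np.isclose(n_view @ v, n_clip @ (A @ v))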
A quick side note on this frustum plane example. The appearance of the rows of A in this last
calculation exemplifies the general meaning of rows in a matrix versus columns. Geometrically, the
columns are the images in W of V's basis vectors. That much is well known to graphics people. What
is not well known is the dual interpretation for rows. The rows represent the _inverse_ images,
the pull-backs, of the basis _forms_ of W. Under our geometric interpretation of forms as equivalence
classes of planes, these basis forms of W correspond to basis planes. For a familiar 3-space with
x, y, z coordinates, the basis planes are x = 0, y = 0, and z = 0. So with our frustum example, a
better way of viewing the summing of the first and fourth row is that we are computing the pull back
of the x + w = 0 plane as a linear superposition of the pull backs of the x = 0 and w = 0 basis planes.
If you take only one thing from this: IN LINEAR ALGEBRA, PLANE EQUATIONS PULL BACK!
Addendum 1: Exterior Algebra
============================
Familiar scalar-valued plane equations indeed pull back, but there is another way.
I'll keep this part brief and without a lot of background. There is an alternative representation of planes
as 2-vectors in exterior algebra (also known as Grassmann algebra). Here, we represent a plane as the wedge
product p /\ q of a pair of vectors p and q spanning the plane, and we have a corresponding equation v /\ (p /\ q) = 0,
which is not scalar-valued but 3-vector-valued. And this representation pushes forward: T(p /\ q) = T(p) /\ T(q).
If V has basis vectors x, y and z, then the space of 2-vectors in V has a corresponding basis formed by y /\ z,
z /\ x and x /\ y. If you represent a 2-vector in this basis as a (y /\ z) + b (z /\ x) + c (x /\ y) and treat
it as a vector [a, b, c], then it transforms according to the adjoint matrix adj(A). Cramer's formula states
that adj(A)^t A = det(A) I. So, if A is invertible, then adj(A) = det(A) (A^(-1))^t. That's how the inverse transpose
shows up in the exterior algebra picture. However, T as a push-forward on 2-vectors in V still exists and is well
defined when T is non-invertible. The defining push-forward formula T(p /\ q) = T(p) /\ T(q) always applies, and
its matrix representation is always adj(A) as computed from the determinant cofactors of A.
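In 3-space the coordinates of p /\ q in the basis above are just the components of the cross product p x q,
so the push-forward formula is easy to check numerically. A Python/NumPy sketch with made-up random data,
assuming A is invertible:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((3, 3))                 # assume A is invertible
    p, q = rng.standard_normal(3), rng.standard_normal(3)

    adj_A = np.linalg.det(A) * np.linalg.inv(A).T   # adj(A) in the sense used above
    assert np.allclose(np.cross(A @ p, A @ q), adj_A @ np.cross(p, q))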
After teasing the relationship between adj(A) and 2-vectors, I can't resist pointing towards its generalization.
First of all, it extends to an n-dimensional space if you replace 2-vectors with (n-1)-vectors. But the more
interesting generalization is to look at k-vectors of arbitrary grade k. For example, in 4-dimensional homogeneous
space with x, y, z, w as basis vectors, the basis 2-vectors are z /\ w, x /\ w, and so on. The (n-1)-vectors are
now the 3-vectors and those do transform by adj(A), but the 2-vectors are a new breed. They turn out to transform
by the matrix of 2-minors, not a 4x4 matrix but a 6x6 matrix. More generally, the k-vectors transform by the
matrix of k-minors. The cofactors in adj(A) are just (n-1)-minors (up to sign), so this extends the existing pattern.
A k-minor of an n by n matrix is what you get when you delete n-k rows and n-k columns and then compute the
determinant of the remaining k by k matrix. This is usually presented to students as a formal algebraic
construction, but it has a straightforward interpretation that is rich in geometric meaning. You already
know the interpretation of the determinant as measuring the signed volume distortion of a transformation.
Well, the minor is exactly that, but restricted and projected to a basis subspace. As a trivial example,
you can easily see that the matrix of 1-minors is the matrix itself. Indeed, when you delete n-1 rows
and n-1 columns, you are left with a 1x1 submatrix, and the determinant of a 1x1 matrix is just the scalar.
Since vectors are 1-vectors, this is a good litmus test. I promised the geometric interpretation was
simple, so let's see it here: the 1-minor corresponding to entry (i, j) of the matrix is what you get
when you first project the input to the subspace spanned by the jth basis vector, then apply the transform,
then project the output to the subspace spanned by the ith basis vector, and then measure the signed volume
distortion of that pre-project => transform => post-project pipeline. That checks out--it's exactly how we
normally define the matrix representation for a linear transformation in a given basis. This should suggest
why the matrix of k-minors is the right idea for transforming k-vectors in a given basis.
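Here's a sketch of the k = 2, n = 4 case in Python/NumPy: build the 6x6 matrix of 2-minors and check that it
transforms the wedge coordinates of p /\ q. The lexicographic ordering of the basis 2-vectors is just a
convention picked for this sketch (other orderings differ by permutations and signs), and the data is made up:

    import numpy as np
    from itertools import combinations

    def wedge2(p, q):
        # Coordinates of p /\ q in the basis e_i /\ e_j, i < j, in lexicographic order.
        return np.array([p[i]*q[j] - p[j]*q[i] for i, j in combinations(range(len(p)), 2)])

    def minors2(A):
        # The matrix of 2-minors: entry (I, J) is the determinant of A restricted
        # to the row pair I and the column pair J (6x6 for a 4x4 input).
        idx = list(combinations(range(A.shape[0]), 2))
        return np.array([[np.linalg.det(A[np.ix_(I, J)]) for J in idx] for I in idx])

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))
    p, q = rng.standard_normal(4), rng.standard_normal(4)

    # The matrix of 2-minors transforms the wedge coordinates of 2-vectors.
    assert np.allclose(wedge2(A @ p, A @ q), minors2(A) @ wedge2(p, q))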
Addendum 2: Differential Forms
==============================
This section is purely optional.
After introducing k-vectors as an extension of vectors, it's important to note that there is a corresponding
exterior algebra of k-forms. In mathematics, a k-form is often defined as a scalar-valued function of k vector
arguments, linear in each argument, with the property that if you swap the order of two inputs, the output sign
is flipped. However, if your
starting point is k-vectors, you can equivalently define a k-form as a 1-form on k-vectors! For example, a
2-form f(p, q) with the first definition is equivalent to a 2-form f(p /\ q) with the second definition.
Because p /\ q = -(q /\ p) and f is linear, f(p /\ q) = f(-(q /\ p)) = -f(q /\ p), which corresponds to
the swap property of the first definition.
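A tiny sketch of this equivalence in 3-space (Python/NumPy, made-up random coefficients), using the fact that
2-vector coordinates in the basis y /\ z, z /\ x, x /\ y are cross-product components:

    import numpy as np

    rng = np.random.default_rng(5)
    c = rng.standard_normal(3)       # a 1-form on 2-vectors: f(B) = c^t B in wedge coordinates

    def f(p, q):
        # The same data viewed as a 2-form of two vector arguments: f(p, q) = c . (p x q).
        return c @ np.cross(p, q)

    p, q = rng.standard_normal(3), rng.standard_normal(3)
    assert np.isclose(f(p, q), -f(q, p))   # swapping the arguments flips the sign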
In mathematics and physics, k-forms are almost always what you run into. Just as you can have vector
fields on manifolds, you can have k-form fields (just called k-forms or differential forms). And unlike
vector fields, let alone k-vector fields, these k-forms have an intrinsic derivative operator, called the
exterior derivative, on differential manifolds without additional metric structure. The exterior derivative
of a 0-form f (which is just a scalar field) is the 1-form df that given a tangent vector v as input will
spit out the directional derivative of f along v. This is the invariant version of the gradient operator
you know from multivariable calculus as producing a vector field. The gradient as a vector field cannot
be defined invariantly without a metric; its conventional calculus definition is coordinate-based, and its
variational definition as pointing in the direction of maximum increase assumes a measure of what a unit
step in each direction looks like, which is not available on a manifold without a metric. However, you
don't have this problem with the exterior derivative of f as a 1-form since it's defined as computing
directional derivatives, which are invariantly defined: each vector input acts as a measuring stick.
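As a coordinate-based sketch (Python/NumPy, with a made-up scalar field f), the 1-form df just computes
directional derivatives, and in coordinates it agrees with dotting the gradient against the input vector:

    import numpy as np

    def f(x):
        # An arbitrary made-up scalar field (0-form) on R^3.
        return x[0]**2 * x[1] + np.sin(x[2])

    def df(x, v, h=1e-6):
        # (df)_x(v): the directional derivative of f at x along the tangent vector v,
        # approximated here by a central finite difference.
        return (f(x + h*v) - f(x - h*v)) / (2*h)

    x = np.array([1.0, 2.0, 0.5])
    v = np.array([0.3, -1.0, 2.0])

    grad_f = np.array([2*x[0]*x[1], x[0]**2, np.cos(x[2])])   # the coordinate gradient of f
    assert np.isclose(df(x, v), grad_f @ v)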
Much of 20th century mathematics and physics was built on the successes of exterior calculus on k-forms.
A random Greatest Hits collection from topology and geometry, all built entirely on differential forms:
Generalized Stokes theorem, de Rham cohomology, Cartan's method of moving frames, Hodge theory, gauge theory.