A quadratic space is a real vector space V with a quadratic form Q(x), e.g. V = R^n with Q as the squared length. The Clifford algebra Cl(V) of a quadratic space is the associative algebra generated by V subject to x^2 = Q(x) for all x in V, and no other relations. We're imposing by fiat that the square of a vector should be the quadratic form's value and seeing where it takes us. Treat x^2 = Q(x) as a symbolic rewriting rule that lets you replace x^2 or x x with Q(x) and vice versa whenever x is a vector. Beyond that, Cl(V) satisfies the standard axioms of an algebra: it lets you multiply by scalars, and it's associative and distributive, but not necessarily commutative.
Remarkably, this is all you need to derive everything about Clifford algebras.
Let me show you how easy it is to bootstrap the theory from nothing.
We know Cl(V) contains a copy of V. Since x^2 = Q(x) for all x, it must also contain a copy of some nonnegative reals.
Even though Q(x) is nonnegative, the algebra is closed with respect to addition and multiplication, so it must in fact contain all real numbers. That's our general strategy. Use our existing inventory of things to try to generate new things until everything cycles back to earlier things we've already seen. At that point we know the algebra is closed. It's like depth-first or breadth-first search in a graph. We keep track of where we've already been so we don't cycle infinitely, and we keep track of the unexplored frontier so we don't miss anything. When there's no more unexplored frontier, we're done.
We will require that the quadratic form is nonnegative and nondegenerate, so that Q(x) = 0 if and only if x = 0. With a few caveats, the theory does work when nonzero vectors may satisfy Q(x) = 0. But you can assume our Q is something nice and familiar like the squared Euclidean length of a vector.
If Q(x) = 0 only happens when x = 0 then it follows from x^2 = Q(x) that 1/x = x/Q(x) for all x != 0. We'll use MATLAB-style left and right division: x/y = x (1/y) and x\y = (1/x) y. You can divide by any nonzero vector! It almost feels like cheating or make-believe, doesn't it? You should always be a little skeptical when investigating a structure defined by fiat like this. Maybe it's overconstrained and no nontrivial examples actually exist? Here they do, of course.
If y = rx where x and y are vectors and r is a scalar then xy = xrx = yx = rx^2 since scalars like r commute with everything. Since r and x^2 are scalars, this implies that xy is a scalar whenever x is collinear with y. Note that xy = yx for this case, so parallel vectors commute.
By the distributive rule, (x + y)^2 = x^2 + y^2 + xy + yx. Since (x + y)^2, x^2 and y^2 are scalars, xy + yx must also be scalar. Unlike with y = rx, this conclusion for xy + yx required no assumption on x and y other than their existence as vectors. Note that xy + yx is symmetric: switching x and y replaces xy + yx by yx + xy. If we average instead of sum over the two orderings we get 1/2(xy + yx) which is xy if and only if xy = yx. This trick is called symmetrization. It works any time you have a potentially non-symmetric operation and want to project to its symmetric part. We'll use x.y = 1/2(xy + yx) to denote this symmetric product. There's also a remainder: xy = x.y + x^y. It follows that x^y = 1/2(xy - yx), the antisymmetric part of xy: xy = x^y if and only if xy = -yx. By their construction, x.y = y.x and x^y = -y^x for all x, y.
While we saw x.y is always scalar, we don't yet know anything about x^y. For now, it's just the remainder in xy after the scalar. We do know that x^y cannot contain a scalar part since scalars commute with everything and would be contained in the symmetric part.
We will call x.y the inner product and x^y the outer product.
Let's justify the name inner product by linking it directly to the quadratic form. The formula (x + y)^2 = x^2 + y^2 + xy + yx implies x.y = 1/2((x + y)^2 - x^2 - y^2) = 1/2(Q(x + y) - Q(x) - Q(y)). Compare this to the equation for the squared norm Q(x) = <x, x> derived from a vector space's inner product: <x + y, x + y> = <x, x> + <y, y> + 2<x, y> and so <x, y> = 1/2(<x + y, x + y> - <x, x> - <y, y>). So our definition of x.y recovers the old-fashioned inner product/dot product/scalar product underlying the norm.
The inner product of vectors is a scalar, so we've cycled back around and can't make more immediate progress in that direction. Let's start looking at outer products.
We noted earlier that if x = ry then xy = yx, and hence x^y = 0. This suggests the outer product has to do with rejecting linear dependencies between vectors. If we're in dimension 1, the outer product has to be zero and everything is boring. Not that vector spaces in 1 dimension are ever exciting.
Let's say our space has at least 2 dimensions and let x and y be orthogonal vectors. Orthogonal for us means x.y = 0, but the connection to the norm's underlying inner product shows this fits the conventional meaning. For example, independent vectors can be replaced by an orthogonal set that spans the same subspace and has the same size (Gram-Schmidt). Note that x.y = 0 means xy = x^y = -y^x = -yx, and vice versa, so x and y are orthogonal if and only if they anticommute. Most of the time we'll avoid working with the inner and outer products separately and only work with the original product, so the characterization of parallelism as commutativity and of orthogonality as anticommutativity will be our main tool. Anticommutativity should remind you of cross products and determinants, where swapping adjacent operands flips the sign, but unlike the cross product, the outer product is associative.
Let's consider the product xy. We saw earlier that xy reduces to a scalar if x and y are parallel, but now they're orthogonal. Maybe xy is a vector, but how would we know? Well, our decree by fiat was that any vector can be squared to yield a nonnegative real number. Let's test xy for that: (xy)^2 = xyxy = -xxyy = -x^2 y^2. Aha! If x and y are nonzero then x^2 y^2 > 0 and hence (xy)^2 < 0. It squared to a scalar, but it has the wrong sign, so xy cannot be a vector if x and y are orthogonal. This is a new species of object that is neither a vector nor a scalar. Because it is generated by multiplying 2 vectors, we call it a bivector or 2-vector. We call an element of the Clifford algebra a multivector; so far, we've seen a multivector may have simultaneous scalar, vector and bivector parts, which should justify its name.
Before proceeding further with abstract reasoning, let's do some concrete calculations with coordinates and see what we have so far.
Example: Let V be a 2-dimensional space with an orthonormal basis [e1, e2]. That is, e1^2 = e2^2 = 1 and e1 e2 = -e2 e1. The vector basis is used to derive the bivector basis, for now only a single bivector: e12 = e1 e2 = -e2 e1. Then vectors like x and y in V may be decomposed as x = x1 e1 + x2 e2, y = y1 e1 + y2 e2. Then x y = (x1 y1 + x2 y2) + (x1 y2 - x2 y1) e12. Thus the scalar part of x y is the dot product, and the bivector part has the same coefficient as a 2-dimensional scalar cross product or 2x2 determinant; there's no vector part left over in x y.
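To make the coordinate formula concrete, here is a minimal Python sketch. It assumes vectors are plain (x1, x2) pairs in the orthonormal basis [e1, e2]; the function names `vector_product`, `inner` and `outer` are hypothetical helpers, not a library API. Symmetrizing and antisymmetrizing over the two orderings recovers the dot product and the determinant.

```python
def vector_product(x, y):
    """Product x y of two vectors in Cl(R^2).

    x and y are coordinate pairs (x1, x2), (y1, y2) in an orthonormal basis.
    Returns the (scalar, e12) coefficients; there is no vector part.
    """
    x1, x2 = x
    y1, y2 = y
    scalar = x1 * y1 + x2 * y2   # dot product
    e12 = x1 * y2 - x2 * y1      # 2x2 determinant
    return scalar, e12


def inner(x, y):
    """x.y = 1/2 (xy + yx): symmetrize over the two orderings."""
    return (vector_product(x, y)[0] + vector_product(y, x)[0]) / 2


def outer(x, y):
    """e12 coefficient of x^y = 1/2 (xy - yx): antisymmetrize."""
    return (vector_product(x, y)[1] - vector_product(y, x)[1]) / 2


x, y = (3.0, 1.0), (2.0, 5.0)
print(vector_product(x, y))      # (11.0, 13.0)
print(inner(x, y), outer(x, y))  # 11.0 13.0
```

For parallel inputs such as (2, 4) and (1, 2) the e12 coefficient comes out as 0, matching the observation that parallel vectors commute.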
We didn't really justify why there is only one basis bivector. Let's see: e12 e12 = e1 e2 e1 e2 = -e1 e1 e2 e2 = -1, so products of e12 cycle back to scalars with a negative sign. More generally, in any product of basis vectors, the factors can be swapped next to each other and then canceled in pairs. That means after reduction there's 0 or 1 remaining copy of each basis vector, with a final sign of (-1)^num_swaps. For our 2-dimensional space, that means we have 1 bivector dimension and no dimensions beyond that for other kinds of objects. (For an n-dimensional space, the reduction argument shows that the Clifford algebra is spanned by at most 2^n basis elements across all grades, with (n choose k) in grade k, corresponding to the binomial formula 2^n = (1+1)^n = sum (n choose k). There are exactly that many independent ones because Clifford algebras impose no additional relations for orthogonal vectors beyond anticommutativity that would allow further reductions to a lower number. If that's too much of a mouthful, don't worry about it.)
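The reduction argument is mechanical enough to code up. This sketch assumes every basis vector squares to 1 (the `square` parameter is a hypothetical knob for other signs) and represents a product of basis vectors as a list of indices; `reduce_word` and `basis` are illustrative names. It sorts with adjacent swaps, flipping the sign each time, cancels equal neighbours, and then counts the reduced blades per grade.

```python
from itertools import combinations
from math import comb


def reduce_word(word, square=1):
    """Reduce a product of basis vectors e_i, given as a sequence of indices.

    Distinct neighbours are swapped until the indices are sorted; each swap
    flips the sign because orthogonal vectors anticommute. Equal neighbours
    are then cancelled using e_i e_i = square. Returns (sign, blade), where
    blade is a sorted tuple of distinct indices.
    """
    word = list(word)
    sign = 1
    for end in range(len(word) - 1, 0, -1):      # bubble sort, counting swaps
        for k in range(end):
            if word[k] > word[k + 1]:
                word[k], word[k + 1] = word[k + 1], word[k]
                sign = -sign
    blade = []
    for i in word:                               # cancel adjacent equal pairs
        if blade and blade[-1] == i:
            blade.pop()
            sign *= square
        else:
            blade.append(i)
    return sign, tuple(blade)


def basis(n):
    """All reduced words: one sorted tuple of distinct indices per subset of 1..n."""
    return [c for k in range(n + 1) for c in combinations(range(1, n + 1), k)]


n = 4
counts = [sum(1 for b in basis(n) if len(b) == k) for k in range(n + 1)]
print(len(basis(n)), counts)                         # 16 [1, 4, 6, 4, 1]
print(counts == [comb(n, k) for k in range(n + 1)])  # True
print(reduce_word([1, 2, 1, 2]))                     # (-1, ()): e12 e12 = -1
```

This only demonstrates the upper bound: every product reduces to plus or minus one of these 2^n blades. It doesn't prove that the blades are independent.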
If x is a general multivector then x = x0 + x1 e1 + x2 e2 + x12 e12. This is the notation we will use throughout: index 0 for the scalar part, index 1 through the dimension n for vector parts, and concatenated indices like 12 for the higher parts generated by products of basis vectors. Our convention is to always use e for basis elements, with corresponding subscripts: e1 is a vector and x1 is its scalar coefficient in x, e12 = e1 e2 is a bivector and x12 is its scalar coefficient in x, and so on.
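Let's work out the components of a general multivector product. We just apply the distributive rule and all our previous observations, including the mixed products e1 e12 = e1 e1 e2 = e2, e12 e1 = -e2, e2 e12 = -e1 and e12 e2 = e1, and collect terms. There's nothing to it: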
x y = (x0 y0 + x1 y1 + x2 y2 - x12 y12) + (x0 y1 + x1 y0 - x2 y12 + x12 y2) e1 + (x0 y2 + x2 y0 + x1 y12 - x12 y1) e2 + (x0 y12 + x12 y0 + x1 y2 - x2 y1) e12
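A sketch of the full Cl(R^2) product in Python, assuming multivectors are stored as 4-tuples (x0, x1, x2, x12). Rather than transcribing the closed-form expression, it expands the product term by term using a hand-built table of basis products (`TABLE` and `mul` are hypothetical names), so every coefficient comes straight from e1^2 = e2^2 = 1 and e1 e2 = -e2 e1.

```python
# Basis order: 1, e1, e2, e12.
# TABLE[i][j] = (sign, k) means basis[i] * basis[j] = sign * basis[k].
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],     # 1 * (1, e1, e2, e12)
    [(1, 1), (1, 0), (1, 3), (1, 2)],     # e1 e1 = 1, e1 e2 = e12, e1 e12 = e2
    [(1, 2), (-1, 3), (1, 0), (-1, 1)],   # e2 e1 = -e12, e2 e2 = 1, e2 e12 = -e1
    [(1, 3), (-1, 2), (1, 1), (-1, 0)],   # e12 e1 = -e2, e12 e2 = e1, e12 e12 = -1
]


def mul(x, y):
    """Product of multivectors x = (x0, x1, x2, x12) and y in Cl(R^2)."""
    out = [0.0] * 4
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            sign, k = TABLE[i][j]
            out[k] += sign * xi * yj
    return tuple(out)


print(mul((1, 2, 3, 4), (5, 6, 7, 8)))   # (6.0, 20.0, 14.0, 24.0)
print(mul((0, 0, 0, 1), (0, 0, 0, 1)))   # (-1.0, 0.0, 0.0, 0.0): e12^2 = -1
```

Because everything is driven by the table, you can spot-check associativity, e.g. compare mul(mul(a, b), c) with mul(a, mul(b, c)) for small integer entries.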
We've seen so many examples of invertible elements that it's time to see some counterexamples so you don't get the wrong idea. (1+x)(1-x) = 1-x^2 will vanish when x^2 = 1. Let's define p1 = (1+e1)/2. Then p1^2 = (1 + 2 e1 + e1^2)/4 = (1+e1)/2 = p1, so p1 is idempotent. Its complement q1 = 1-p1 = (1-e1)/2 is also idempotent: q1^2 = (1 - 2 e1 + e1^2)/4 = (1-e1)/2 = q1. As we've seen, their product vanishes: p1 q1 = (1+e1)(1-e1)/4 = (1 - e1^2)/4 = 0. This is not a bad thing; we want the ability to sometimes do lossy operations. Idempotents play an important role in Clifford algebras.
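Since e1^2 = 1, the span of 1 and e1 is closed under multiplication, so a two-component sketch is enough to check the idempotents. The pair (x0, x1) stands for x0 + x1 e1, and `mul_e1` is a hypothetical helper name.

```python
def mul_e1(x, y):
    """Product in the span of 1 and e1, using e1^2 = 1; (x0, x1) means x0 + x1 e1."""
    x0, x1 = x
    y0, y1 = y
    return (x0 * y0 + x1 * y1, x0 * y1 + x1 * y0)


p1 = (0.5, 0.5)    # (1 + e1)/2
q1 = (0.5, -0.5)   # (1 - e1)/2 = 1 - p1
print(mul_e1(p1, p1))   # (0.5, 0.5): p1 is idempotent
print(mul_e1(q1, q1))   # (0.5, -0.5): so is q1
print(mul_e1(p1, q1))   # (0.0, 0.0): their product vanishes
```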
If we look at sums of scalars and bivectors, we get an internal closure property: scalar multiplication doesn't change grades, and multiplication of bivectors gives back scalars, at least in 2 dimensions:
x y = (x0 y0 - x12 y12) + (x0 y12 + x12 y0) e12
This is the formula for multiplication of complex numbers, where e12 acts like the imaginary i satisfying i^2 = -1 and so Re(x) = x0, Im(x) = x12 and Re(y) = y0, Im(y) = y12. We have discovered that the even subalgebra of Cl(R^2), spanned by 1 and e12, is isomorphic to the complex numbers C. We can see by inspection that it is commutative; we also know from the complex numbers that it should be a field, but it might not be apparent how to construct the inverse of an element within the Clifford algebra given that it requires complex conjugation and so far we haven't seen anything like that for Clifford algebras. For now, we can just mirror the definition of conjugation and see if it works:
x' = x0 - x12 e12
Hence x' x = x x' = (x0^2 + x12^2) + (x0 x12 - x12 x0) e12 = x0^2 + x12^2, which is a scalar. Therefore any nonzero x has an inverse 1/x = x'/(x' x) just like over C. But one thing that should worry you is that this definition seems awfully basis dependent with the negation of the coefficient x12. Conjugation of complex numbers is usually visualized as reflection in the real line, which would seem to depend on the direction of that line. The even subalgebra picture clarifies this issue. The only thing we care about for the inversion formula is that x' fixes the grade 0 part and negates the grade 2 part. Let's denote by x[k] the grade k part of x so that x = x[0] + x[2] in our present case of the even subalgebra in Cl(R^2) and more generally x = x[0] + x[1] + ... + x[n] for an arbitrary multivector in an n-dimensional space. Then we may write x' in a basis independent way:
x' = x[0] - x[2]
Then x' x = (x[0] - x[2])(x[0] + x[2]) = x[0]^2 + x[0] x[2] - x[2] x[0] - x[2]^2 = x[0]^2 - x[2]^2 because the scalar part x[0] commutes with everything. Hence x' x is a scalar if and only if x[2]^2 is a scalar, which we saw is true in 2 dimensions but not more generally; we'll revisit the general definition of conjugation later. But in our present case we have 1/x = x'/(x' x) if x[0]^2 != x[2]^2. In 2 dimensions we saw that x[2]^2 was a non-positive scalar and hence x[0]^2 = x[2]^2 is only possible when both x[0] and x[2] are zero. Therefore x' x != 0 and 1/x exists for any nonzero x. We have a field.
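A minimal sketch of the inversion formula for the even subalgebra of Cl(R^2), assuming even elements are stored as pairs (x0, x12); `even_mul`, `conj` and `inverse` are illustrative names. Python's built-in complex division serves as a cross-check.

```python
import math


def even_mul(x, y):
    """Product of even elements x0 + x12 e12 of Cl(R^2), stored as (x0, x12)."""
    x0, x12 = x
    y0, y12 = y
    return (x0 * y0 - x12 * y12, x0 * y12 + x12 * y0)


def conj(x):
    """x' = x[0] - x[2]: keep the scalar part, negate the bivector part."""
    return (x[0], -x[1])


def inverse(x):
    """1/x = x' / (x' x), where x' x = x0^2 + x12^2 is a scalar."""
    norm2, _ = even_mul(conj(x), x)   # the e12 part cancels exactly
    if norm2 == 0:
        raise ZeroDivisionError("0 has no inverse")
    c0, c12 = conj(x)
    return (c0 / norm2, c12 / norm2)


x = (3.0, 4.0)                        # 3 + 4 e12
print(inverse(x))                     # (0.12, -0.16)
z = 1 / complex(3.0, 4.0)             # the same computation over C
print(math.isclose(z.real, 0.12), math.isclose(z.imag, -0.16))  # True True
print(even_mul(x, inverse(x)))        # close to (1.0, 0.0)
```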
If x = x[0] + x[2] corresponds to rectangular decomposition in the complex numbers, what about polar decomposition as embodied in Euler's formula z = r exp(ib)? The counterpart would be x = r exp(B), where B is a bivector and can be written in coordinates as b e12. If x is nonzero then r > 0 can be written as r = exp(a) for a = log(r). If this exp for multivectors works anything like our familiar exp, we would expect x = r exp(B) = exp(a) exp(B) = exp(a+B) since a and B commute, so polar decomposition in Cl(R^2) writes any nonzero even multivector as exp of an even multivector. This corresponds to z = r exp(ib) = exp(a+ib) over the complex numbers for a = log(r).
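To see the polar decomposition numerically, this sketch assumes exp is defined by its power series (one of the perspectives discussed in the side quest on exp) and truncates it after 40 terms. The names `even_exp` and `polar` are hypothetical; `polar` reads off a = log(r) and the angle b with atan2.

```python
import math


def even_mul(x, y):
    """Product of even elements x0 + x12 e12 of Cl(R^2), stored as (x0, x12)."""
    x0, x12 = x
    y0, y12 = y
    return (x0 * y0 - x12 * y12, x0 * y12 + x12 * y0)


def even_exp(x, terms=40):
    """exp(x) via the truncated power series sum_n x^n / n!."""
    total = (1.0, 0.0)
    term = (1.0, 0.0)                      # x^0 / 0!
    for n in range(1, terms):
        t0, t12 = even_mul(term, x)
        term = (t0 / n, t12 / n)           # x^n / n!
        total = (total[0] + term[0], total[1] + term[1])
    return total


def polar(x):
    """Return (a, b) with x = exp(a + b e12) for nonzero x = x0 + x12 e12."""
    r = math.hypot(x[0], x[1])             # r = sqrt(x' x)
    return math.log(r), math.atan2(x[1], x[0])


a, b = polar((3.0, 4.0))
print(math.isclose(a, math.log(5.0)))      # True: r = 5
print(even_exp((a, b)))                    # approximately (3.0, 4.0)
```

The angle b is only determined up to adding multiples of 2 pi, just as for complex logarithms.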
What does exp really mean outside of the real numbers? There are three important pictures of exp I like to use, all ultimately related. This will be a bit of an extended side quest that explains the why and wherefore of exp in a general setting, but it will justify why exponentials in Clifford algebras behave the way they do. If you find it too much, feel free to skip it.
The most fundamental picture is that exp generates solutions to linear dynamical systems. A linear dynamical system on a vector space is described by a differential equation like x' = Ax where A is a linear operator/square matrix. It's linear because the derivative at x is a linear function of x. Over a 1-dimensional real space, a linear operator is just scalar multiplication, x' = ax where a is real. The solutions exhibit exponential growth (if a > 0) or decay (if a < 0) or are constant (if a = 0). We define exp(tA) as the propagator for this dynamical system, an operator which applied to a state x(0) advances it forward to x(t). If x(t) = exp(tA) x(0) solves x' = Ax for every initial state then d/dt exp(tA) = A exp(tA), and in particular the derivative at t = 0 is A.
The fact that exp(tA) is the propagator for x' = Ax accounts for all its properties. For example, exp((s+t)A) = exp(sA) exp(tA) says that propagating by s+t time is the same as propagating by t time and then s time, where the product exp(sA) exp(tA) is interpreted as composition/matrix multiplication. An important special case is s = -t. Propagating forward by t and then backward by t should land us where we started, so exp(-tA) is the inverse of exp(tA). By repeated application, exp(ntA) = exp(tA) ... exp(tA) = exp(tA)^n for positive integers n, and negative n follow from exp(-tA) = exp(tA)^(-1). Applying this to t/n shows that exp((t/n)A) is an nth root of exp(tA), so exp((m/n)tA) is the natural meaning of exp(tA)^(m/n). By a continuity argument, this extends from rationals m/n to all real s, which is why we can define exp(A)^t as exp(tA) for any real t and it will be consistent.
To put exp into context, suppose x' = f(x, t) is a time-dependent nonlinear dynamical system. This has a nonlinear propagator U(s, t) which is characterized by x(t) = U(s, t) x(s). If the system is time-independent then the propagator only depends on the time interval so U(s, t) = U(s', t') if t - s = t' - s'. We write U(s, t) = U(t - s) for this time-independent operator of one argument that only depends on the time interval t - s; you can define it concretely as U(t) = U(0, t). Now, if f(x, t) is not only time-independent but linear then d/dx f(x, t) = A is a constant independent of x and t, which is where we started. In that case U(t) is a linear operator and by definition it is exp(tA).
I don't want to get too far into the weeds with dynamical systems, but understanding the basics is important and clarifying.
Anyway, that was the first perspective on exp: it generates solutions to linear dynamical systems. In one dimension, it's simple exponential growth and decay, but in multiple dimensions there can be coupling between the different directions and things are more interesting. Let J be the 2x2 matrix [[0, -1], [1, 0]], so it's a counterclockwise 90-degree rotation corresponding to multiplication by i: J^2 = -I, which says that rotating twice by 90 degrees is the same as rotating once by 180 degrees/negating the radius vector. Then x' = Jx says that the velocity vector at every point has the same length as the radius vector but at a counterclockwise right angle. That is, it's telling you to move at uniform speed around the origin in a circle. Hence trigonometry gives us the propagator U(t) = [[cos(t), -sin(t)], [sin(t), cos(t)]] = cos(t)I + sin(t)J. But this must be the same as exp(tJ) by what we said earlier about linear dynamical systems.
This is Euler's formula in matrix form: exp(tJ) = cos(t)I + sin(t)J. If x' = (aI)x then x undergoes exponential growth/decay of its length and hence the propagator is exp(taI) = exp(ta)I. We can add two linear systems and get a linear system. Adding our uniform rotation and exponential growth/decay gives x' = (aI + bJ)x. Then exp(aI + bJ) = exp(aI) exp(bJ) since I and J are commuting matrices. That A and B commute is sufficient for exp(A+B) = exp(A) exp(B) to hold; it is not necessary in general, but for non-commuting A and B the identity typically fails.
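A pure-Python check of both claims, assuming 2x2 matrices stored as nested tuples and an `expm` helper (hypothetical, not a library routine) that sums the truncated power series from the second perspective on exp. The commuting pair aI, bJ satisfies the identity; a pair of nilpotent matrices that don't commute violates it.

```python
import math


def matmul(A, B):
    """Product of 2x2 matrices stored as nested tuples ((a, b), (c, d))."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def matadd(A, B):
    return tuple(tuple(A[i][j] + B[i][j] for j in range(2)) for i in range(2))


def expm(A, terms=40):
    """Matrix exponential via the truncated power series sum_n A^n / n!."""
    result = ((1.0, 0.0), (0.0, 1.0))
    term = result                                   # A^0 / 0!
    for n in range(1, terms):
        term = tuple(tuple(v / n for v in row) for row in matmul(term, A))  # A^n / n!
        result = matadd(result, term)
    return result


def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


I = ((1.0, 0.0), (0.0, 1.0))
J = ((0.0, -1.0), (1.0, 0.0))
a, b = 0.3, 1.2
aI = tuple(tuple(a * v for v in row) for row in I)
bJ = tuple(tuple(b * v for v in row) for row in J)
# I and J commute, so exp(aI + bJ) = exp(aI) exp(bJ) = e^a (cos(b) I + sin(b) J).
print(close(expm(matadd(aI, bJ)), matmul(expm(aI), expm(bJ))))   # True
euler = tuple(tuple(math.exp(a) * (math.cos(b) * I[i][j] + math.sin(b) * J[i][j])
                    for j in range(2)) for i in range(2))
print(close(expm(matadd(aI, bJ)), euler))                        # True

# A and B below do not commute (AB != BA), and the identity fails.
A = ((0.0, 1.0), (0.0, 0.0))
B = ((0.0, 0.0), (1.0, 0.0))
print(matmul(A, B) == matmul(B, A))                              # False
print(close(expm(matadd(A, B)), matmul(expm(A), expm(B))))       # False
```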
I hope that explains Euler's formula if you didn't fully understand it: exp generates solutions to linear dynamical systems, and uniform circular motion is a linear dynamical system where the velocity has the same length as the radius vector but points at a right angle to it. If you want to rotate at a faster/slower speed or clockwise instead of counterclockwise, you can use x' = aJx where |a| is the angular speed and it rotates clockwise if a < 0.
The second perspective on exp is as a power series. This follows closely from the linear dynamical system. Let U(t) be the propagator for x' = Ax. We suppose U(t) is a power series in tA with unknown coefficients: U(t) = a[0] I + a[1] tA + a[2] (tA)^2 + ... so that U(t) = sum a[n] t^n A^n. Now the idea is that the initial value problem for an ordinary differential equation like x' = Ax defines an infinite system of linear equations for the unknown coefficients, and we can start at the bottom and solve for the coefficients inductively. We have U'(t) = A U(t) and U(0) = I as the propagator's defining equations. We can iterate this derivative formula to get U''(t) = A^2 U(t) and more generally the nth derivative is A^n U(t). At t = 0, the nth derivative is hence A^n. If we compare to the power series, we have U(0) = a[0] I, U'(0) = a[1] A, U''(0) = 2 a[2] A^2 and more generally the nth derivative of the power series at t = 0 is n! a[n] A^n. Setting these equal to what we got from the propagator, we must set a[n] = 1/n!. Hence U(t) = sum 1/n! t^n A^n, which is the Taylor series for exp(tA). This was just a symbolic derivation, but it's straightforward to prove that the series converges for all A and t.
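The series derivation can be checked numerically. In this sketch, `propagator` (an illustrative name) sums the truncated series with a[n] = 1/n! for 2x2 matrices stored as nested tuples. It confirms Euler's formula for J, the composition rule exp((s+t)A) = exp(sA) exp(tA) for an arbitrary non-symmetric A, and U'(t) = A U(t) via a central finite difference.

```python
import math


def matmul(A, B):
    """Product of 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def propagator(A, t, terms=40):
    """U(t) = sum_n a[n] t^n A^n with a[n] = 1/n!, truncated after `terms` terms."""
    result = ((1.0, 0.0), (0.0, 1.0))
    term = result                                   # (tA)^0 / 0!
    for n in range(1, terms):
        term = tuple(tuple(t * v / n for v in row) for row in matmul(term, A))  # (tA)^n / n!
        result = tuple(tuple(result[i][j] + term[i][j] for j in range(2)) for i in range(2))
    return result


def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


# Euler's formula: the series for exp(tJ) is the rotation matrix cos(t) I + sin(t) J.
J = ((0.0, -1.0), (1.0, 0.0))
t = 2.0
rotation = ((math.cos(t), -math.sin(t)), (math.sin(t), math.cos(t)))
print(close(propagator(J, t), rotation))                        # True

# Composition rule for a generic non-symmetric A.
A = ((0.5, 1.0), (-2.0, 0.1))
s = 0.7
print(close(propagator(A, s + t), matmul(propagator(A, s), propagator(A, t))))  # True

# U'(t) = A U(t), checked with a central finite difference.
h = 1e-5
Up, Um = propagator(A, t + h), propagator(A, t - h)
dU = tuple(tuple((Up[i][j] - Um[i][j]) / (2 * h) for j in range(2)) for i in range(2))
print(close(dU, matmul(A, propagator(A, t)), tol=1e-5))          # True
```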
The third perspective on exp is the compound interest formula. I'm sure you know this from high school: exp(x) = lim(n->inf) (1 + x/n)^n. The way it's usually described is that x is an interest rate. Then (1 + x) is the growth if you earn interest in 1 period. If you earn compound interest over 2 periods with 1/2 the rate per period, the net growth is (1 + x/2)(1 + x/2) = 1 + x + 1/4 x^2 > 1 + x. And so on. In this picture exp(x) is "continuous compounding with rate x". The per-period rate is inversely proportional to the number of periods to prevent blow-up in the limit of infinite periods; it's easy to see that for every n (1 + x/n)^n = 1 + x + O(x^2) so the first derivative is always the same. The fact that 1 + x is the correct first-order approximation to exp(x) is the connection between continuous compounding and dynamical systems: 1 + tA is the first-order approximation to the propagator exp(tA) = 1 + tA + O(t^2). If you imagine 1 + tA as one step of the forward Euler method for integrating the differential equation, then (1 + tA/n)^n is Euler integration with n steps of size t/n instead of a single step of size t. Each step is still only first-order accurate, with a local error of O((t/n)^2), but the n steps together accumulate a total error that shrinks like 1/n. For example, the coefficient of t^2 A^2 in (1 + tA/n)^n is (n-1)/(2n), which approaches the exact value 1/2 only as n grows. The compound interest formula is now exp(tA) = lim(n->inf) (1 + tA/n)^n and says that forward Euler integration converges to the correct continuous solution as the step size goes to 0, for arbitrary A and t (if t is large, you need more steps for a given level of accuracy).
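Here is a small experiment with the compound interest formula, treating x' = a x for a real or complex scalar a (with a = 1j standing in for the rotation generator J). `euler_steps` is a hypothetical helper that takes n forward Euler steps of size t/n.

```python
import cmath


def euler_steps(a, t, n):
    """Compute (1 + t a / n)^n as n forward Euler steps of size t/n for x' = a x.

    a may be real (growth/decay) or complex; a = 1j plays the role of J,
    i.e. uniform rotation of the plane.
    """
    x = 1 + 0j
    for _ in range(n):
        x = x * (1 + t * a / n)
    return x


t = 1.0
for a in (1.0, 1j):
    exact = cmath.exp(a * t)
    for n in (10, 100, 1000, 10000):
        err = abs(exact - euler_steps(a, t, n))
        # n * err levels off (near e/2 for a = 1, near 1/2 for a = 1j): error ~ 1/n
        print(a, n, err, n * err)
```

The printed n * err column levels off rather than shrinking, which is what an error proportional to 1/n looks like.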
I'm not going to do further coordinate calculations with general multivectors if I can help it, but I wanted to show it at least once, in the lowest nontrivial number of dimensions, so you can see what it looks like. By giving concrete algebraic expressions for objects and their operations, we also establish their concrete existence; we're not just pursuing a vacuous game of playing with axioms and proving things about non-existent fantasy creatures, like integers that are both positive and negative.
That completes the basic structure of 2-dimensional Clifford algebras. Let's put it to work on some geometric problems before proceeding further. One of the wonders of Clifford algebras is that many basic results generalize to higher dimensions with identical formulas, so any knowledge gained in 2 dimensions won't be throw-away effort even if you're one of those 3-dimensional snobs.
Onward to the cool applications!
While we haven't done any geometry yet, it should come as no surprise that Clifford algebras are intimately connected to the quadratic space's geometry given that the quadratic form was burned into their DNA from the beginning.
We've established with the reduction argument that Cl(V) is a finite-dimensional vector space with dimension 2^n. Its elements are multivectors; I'm going to reserve the solitary term 'vector' for vectors originating from V so I don't have to say '1-vector' everywhere. The k-vectors in Cl(V) occupy a subspace of dimension (n choose k) which is spanned by products of k mutually orthogonal vectors in V. This doesn't mean a k-vector is necessarily a pure product (called a simple k-vector) of vectors. For example, e1 e2 + e3 e4 is not a simple bivector in R^4 but has to be written as the sum of two simple bivectors.
I mentioned earlier that elements without inverses such as the idempotent (1+e1)/2, which satisfies (1+e1)(1-e1) = 0, are a necessary evil. Let's try to see why. This is optional content, but it's pretty fun and you don't usually see this proven or even sketched in intro material.
Suppose that D is a finite-dimensional real associative division algebra. It's not necessarily a Clifford algebra, but it is associative like one, and instead of the Clifford relations we assume we can divide by any nonzero element; that's what division algebra means. Spoiler alert: The only possibilities are the real numbers R, complex numbers C, and quaternions H. Hence we have to allow for zero divisors in our algebras if we want them to extend to all dimensions.
Let x be any element in D. If D is n-dimensional and I keep taking powers of x, I will eventually develop a linear dependency: 1, x, x^2, ..., x^n contains n+1 elements, so there must be a linear dependency f(x) = a0 + a1 x + a2 x^2 + ... + an x^n = 0, where not all the coefficients are zero. (Note that n is the dimension of the algebra. If D were a Clifford algebra over V, n would be 2^dim(V), not dim(V).)
Since D is a real algebra and real scalars commute with x, we may regard f as a real polynomial: it splits into linear and irreducible quadratic factors according to the fundamental theorem of algebra, and the factorization still holds when evaluated at x. Each factor looks like x - a or x^2 - 2bx + c. Taking powers to eventually develop a linear dependency and using that to reduce higher powers to lower powers is done all the time in linear algebra and is related to things like minimal polynomials and the Cayley-Hamilton theorem. But now comes the coup de grace: the product of the factors evaluated at x is zero, and since D has no zero divisors, at least one of the factors must be zero. If the vanishing factor is linear then x - a = 0 and x is just a boring scalar.
The only non-boring case is x^2 - 2bx + c = 0. Note what this says: if you ever see x^2, you can replace it with 2bx - c. That means all higher powers of x collapse as well. This is what makes division algebras so limited compared to Clifford algebras and matrix algebras. For example, a matrix algebra of dimension n^2 allows for minimal polynomials with degree up to n but no larger. In effect, we just showed that elements in a division algebra have minimal polynomials with max degree 2.
Note that x^2 - 2bx + c = 0 almost says that x^2 is a scalar. By making a linear substitution, we can eliminate the linear term of any quadratic. If y = x - b then indeed y^2 = x^2 + b^2 - 2bx = 2bx - c + b^2 - 2bx = b^2 - c, so y^2 is scalar. Since the quadratic factor was irreducible, y^2 = b^2 - c must not have a real square root and hence b^2 - c < 0.
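The complex numbers give the smallest worked example of this argument. The element x = 3 + 2i below is an arbitrary choice; the sketch uses Python's built-in complex type to check the quadratic relation and the completed square.

```python
# x = 3 + 2i: a non-real element of the 2-dimensional division algebra C.
x = complex(3.0, 2.0)

# 1, x, x^2 are linearly dependent over R: x^2 = 2b x - c with b = Re(x), c = |x|^2.
b = x.real                      # 3.0
c = x.real**2 + x.imag**2       # 13.0
print(x**2, 2 * b * x - c)      # both equal 5+12j
print(b**2 - c)                 # -4.0: negative, so x^2 - 2bx + c is irreducible

# Completing the square: y = x - b satisfies y^2 = b^2 - c, a negative real.
y = x - b
print(y, y**2)                  # 2j and -4+0j
```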
By analogy with complex numbers, any element y such that y^2 is a negative real scalar will be called an imaginary. In those terms, we just showed that every element x in D is a sum in a unique way of a real scalar Re(x) and an imaginary part Im(x). Let Re(D) denote the reals in D and Im(D) the imaginaries together with 0. Then D as a vector space is a direct sum of Re(D) and Im(D). If x, y in Im(D) then x+y in Im(D) since the reals are a subspace and hence if you sum two things without real parts you cannot produce a real part. Similarly, (rx)^2 = r^2 x^2 < 0 if r is a nonzero real. While Re(D) is a subalgebra, Im(D) is clearly not: If x is a nonzero element of Im(D) then x^2 is a nonzero real number hence not contained in Im(D), so Im(D) is not closed under multiplication.
If D contains no imaginaries then D = R and we are done.
But suppose D contains a non-real element x. Then Im(x) != 0 and we can construct an element i such that i^2 = -1 by normalizing Im(x) with a division by the real number sqrt(abs(Im(x)^2)). This shows that D contains isomorphic copies of the complex numbers generated by each imaginary in D. By letting the copy of C spanned by 1 and i act on D by left multiplication, we give D the structure of a complex vector space. Its real dimension is then twice its complex dimension, so D is even-dimensional. (This is a "combinatorial" feature of nested fields. Another cool example is that a finite field with characteristic p must have size p^n for some n because it's a vector space over its prime subfield of size p.)
Hence there are no odd-dimensional real division algebras other than R. This shows that the early hope of people like Hamilton who (before he discovered his quaternions) wanted a 3-dimensional division algebra was an impossible dream. But wait, there's more!
Let x, y be elements in Im(D). As with our construction of the inner product for Clifford algebras, we expand (x + y)^2 = x^2 + y^2 + xy + yx, and since x + y is again in Im(D), all the squares are real and hence xy + yx is real. Define the inner product <x, y> = -1/2((x+y)^2 - x^2 - y^2). We need the minus sign to make <x, x> = -x^2 nonnegative. Now take any minimal generating subspace of Im(D). This is an inner product space and hence has an orthonormal basis e1, ..., en, where e1^2 = -1, ..., en^2 = -1 and distinct basis elements anticommute, e.g. e1 e2 = -e2 e1, for the exact same algebraic reason that elements with vanishing inner products anticommute in Clifford algebras.
These equations among the elements should be evocative. If the minimal generating subspace is 1-dimensional we let i = e1 and D is isomorphic to C. If the minimal generating subspace is 2-dimensional we let i = e1, j = e2, k = e1 e2 and get H, Hamilton's quaternions. Indeed, i and j already satisfy i^2 = j^2 = -1 and ij = -ji from their orthonormality, so we just need to check k: k^2 = e1 e2 e1 e2 = -e1 e1 e2 e2 = -(-1)(-1) = -1, ik = e1 e1 e2 = -e1 e2 e1 = -ki and jk = e2 e1 e2 = -e1 e2 e2 = -kj. It's just the kind of swap dance we know from Clifford algebras.
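These relations can be checked mechanically. This sketch assumes two orthonormal generators that square to -1 and stores multivectors as dicts from sorted index tuples to coefficients; `blade_mul` and `mul` are illustrative names. It reduces products by the same swap-and-cancel procedure used for Clifford algebras.

```python
from itertools import product


def blade_mul(a, b, square=-1):
    """Multiply basis blades a and b (sorted tuples of generator indices).

    Each generator squares to `square`. Adjacent distinct generators are
    swapped until sorted, flipping the sign each time (orthonormal generators
    anticommute), and then equal neighbours cancel to `square`.
    Returns (sign, blade).
    """
    word = list(a + b)
    sign = 1
    for end in range(len(word) - 1, 0, -1):
        for pos in range(end):
            if word[pos] > word[pos + 1]:
                word[pos], word[pos + 1] = word[pos + 1], word[pos]
                sign = -sign
    blade = []
    for g in word:
        if blade and blade[-1] == g:
            blade.pop()
            sign *= square
        else:
            blade.append(g)
    return sign, tuple(blade)


def mul(x, y, square=-1):
    """Product of multivectors stored as {blade: coefficient} dicts."""
    out = {}
    for (a, xa), (b, yb) in product(x.items(), y.items()):
        sign, c = blade_mul(a, b, square)
        out[c] = out.get(c, 0) + sign * xa * yb
    return {blade: v for blade, v in out.items() if v != 0}


i, j = {(1,): 1}, {(2,): 1}      # orthonormal generators e1, e2
k = mul(i, j)                    # k = e1 e2
minus_one = {(): -1}
print(k)                                                                        # {(1, 2): 1}
print(mul(i, i) == minus_one, mul(j, j) == minus_one, mul(k, k) == minus_one)  # True True True
print(mul(mul(i, j), k) == minus_one)                                           # True: ijk = -1
print(mul(j, k) == i, mul(k, i) == j)                                           # True True
```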
In summary, we've shown that a minimal generating subspace of dimension 0, 1 or 2 gives R in dimension 1, C in dimension 2, or H in dimension 4. Or rather, we proved these were the only possibilities in those cases; we already know they are division algebras.
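Can we keep going higher? No! Suppose the minimal generating subspace has three orthonormal elements e1, e2, e3 and set x = e1 e2 e3. Then x^2 = e1 e2 e3 e1 e2 e3 = 1: reordering into e1 e1 e2 e2 e3 e3 takes three swaps, for a sign of -1, and e1^2 e2^2 e3^2 = (-1)^3 contributes another -1, so the signs cancel. Remember how we constructed our zero-divisor idempotents? We can do the same thing here: (1+x)(1-x) = 1 - x^2 = 1 - 1 = 0. Neither factor is zero: x = 1 or x = -1 would force e3 = -e1 e2 or e3 = e1 e2 respectively, putting e3 in the algebra generated by e1 and e2 and contradicting minimality. Hence 1+x and 1-x are both zero divisors.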
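The same swap-and-cancel bookkeeping confirms the sign count and the zero divisors. This sketch assumes three orthonormal generators squaring to -1 (a `square` argument of 1 gives the positive case for comparison); multivectors are dicts from sorted index tuples to coefficients, and the helper names are hypothetical.

```python
from itertools import product


def blade_mul(a, b, square=-1):
    """Multiply sorted index tuples a, b; each generator squares to `square`.

    Sorting by adjacent swaps flips the sign per swap (orthonormal generators
    anticommute); equal neighbours then cancel to `square`. Returns (sign, blade).
    """
    word = list(a + b)
    sign = 1
    for end in range(len(word) - 1, 0, -1):
        for pos in range(end):
            if word[pos] > word[pos + 1]:
                word[pos], word[pos + 1] = word[pos + 1], word[pos]
                sign = -sign
    blade = []
    for g in word:
        if blade and blade[-1] == g:
            blade.pop()
            sign *= square
        else:
            blade.append(g)
    return sign, tuple(blade)


def mul(x, y, square=-1):
    """Product of multivectors stored as {blade: coefficient} dicts."""
    out = {}
    for (a, xa), (b, yb) in product(x.items(), y.items()):
        sign, c = blade_mul(a, b, square)
        out[c] = out.get(c, 0) + sign * xa * yb
    return {blade: v for blade, v in out.items() if v != 0}


e1, e2, e3 = {(1,): 1}, {(2,): 1}, {(3,): 1}
x = mul(mul(e1, e2), e3)                  # x = e1 e2 e3
print(x)                                  # {(1, 2, 3): 1}
print(mul(x, x))                          # {(): 1}: x^2 = +1
one_plus_x = {(): 1, (1, 2, 3): 1}
one_minus_x = {(): 1, (1, 2, 3): -1}
print(mul(one_plus_x, one_minus_x))       # {}: (1 + x)(1 - x) = 0
print(mul(x, x, square=1))                # {(): -1}: with e_i^2 = +1, x^2 = -1
```

With square=1, as in Cl(R^3), the same x squares to -1 instead, so this particular construction only yields zero divisors when the generators square to -1.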
Thus R, C and H are all you could possibly hope for. Those zero divisors sneak in as soon as you have even one non-scalar with a positive real square, and you can't avoid getting those in dimension 5 and higher for simple combinatorial reasons because orthogonal elements anticommute. Nontrivial idempotents in any algebra must be zero divisors: p^2 = p implies p(p-1) = 0, so p = 0, p = 1 or p is a zero divisor. This investigation into division algebras has also revealed that having positive squares of vectors means we get idempotents "sooner". For example, we saw that Cl(R^2) had zero-divisor idempotents when using positive Q. If we used a negative Q so that e1^2 = -1, e2^2 = -1, we get H as we saw above. For that reason, you see algebraists primarily use negative quadratic forms for Clifford algebras. Degenerate quadratic forms are also very useful. As an extreme case you can take Q = 0 in which case the Clifford algebra collapses to just its antisymmetric parts, which gives an algebra isomorphic to the Grassmann algebra. The property x^2 = 0 is also reminiscent of algebraic infinitesimals and they let you express translations as "infinitesimal rotations around infinity" in a Clifford algebra.