Vector Algebra Done Right
How is vector algebra taught?
I doubt the system has changed much since my day. They started by drawing arrows to represent vectors, so it was obvious a vector is essentially a magnitude and direction, and obvious that a vector can be represented with Cartesian coordinates.
Vector addition was clear: just put the start of one vector at the end of the other, then follow the arrows. At the time, I was unacquainted with abstract algebra, but I still understood vector addition was associative and commutative. Inverses and vector subtraction were also clear.
But then they threw the dot product and the cross product in our laps. Seemingly by magic, the former multiplies the sizes of the input vectors and the cosine of the angle between them, while the latter does the same with sine instead, and for some reason it’s associated with a direction perpendicular to both input vectors.
Both are called products, even though the dot product produces a scalar instead of another vector, and even though the cross product is not associative. The oddly specific cross product only works in exactly three dimensions. How did this fragile and finicky algebra arise?
At university, the mystery only deepened. We were taught the inner product, a more general dot product. Why is it an "inner" product? And why does the abstract generalization seem to ignore the cross product?
Complex-er Numbers
I later heard the following story. Mathematicians noticed complex numbers could represent two-dimensional points, and moreover, addition and conjugation and multiplication can be interpreted as translation, rotation, reflection, and scaling. A little algebra can go a long way in geometry.
Since reality has three dimensions of space, mathematicians sought a way to add another axis to complex numbers. Why not introduce a second imaginary number \(j\), where the \(j\)-axis is orthogonal to the real axis and the \(i\)-axis? However, it seemed impossible to find sane rules to govern interaction between \(i, j\) and the reals.
Hamilton eventually cracked this problem: the trick is to also introduce yet another imaginary number \(k\), yielding the quaternions, along with non-commutative multiplication: now and then we must flip signs. In theory, the story ends here: we solve physics problems with quaternions and live happily ever after.
However, in practice, it turns out we only need a few key operations on quaternions, and over time, mathematicians and physicists figured out it was much easier to teach students Cartesian coordinates along with the miraculous formulas for the dot and cross products than to teach quaternions with all their baggage. Typical students see only a highly abridged version of the subject, and all that remains of quaternions are dream-like fragments, such as the non-commutative cross product whose output is orthogonal to both inputs.
The Secret History of Vector Algebra
Later still, I learned the above narrative is incomplete. The development of linear algebra is not a linear story, and there is a lesser-known thread that almost sounds like an alternate history, except it really happened.
Our tale starts on familiar ground. In the beginning, there was Euclid, whose notation suggested algebraic laws. If \(A, B, C\) are successive points on a line, then we might write:
\[ AB + BC = AC \]
to mean that the length of the line segment \(AC\) is the sum of the lengths of the segments \(AB\) and \(BC.\)
Similarly, we can describe an angle being cut into smaller angles. For suitably positioned points \(A, B, C, D,\) we can write:
\[ \angle ABC + \angle CBD = \angle ABD \]
But there’s little we can do beyond narrow cases like these, and Euclid’s proofs mostly consist of prose and diagrams. Dijkstra: "Greek mathematics got stuck because it remained a verbal, pictorial activity".
Centuries later, Cartesian coordinates gave mathematicians enough power to handle all cases with algebra. Dijkstra:
…the modern civilized world could only emerge —for better or for worse— when Western Europe could free itself from the fetters of medieval scholasticism —a vain attempt at verbal precision!— thanks to the carefully, or at least consciously designed formal symbolisms that we owe to people like Vieta, Descartes, Leibniz, and (later) Boole.
In a competitive mathematics training camp, I was told that if you’re struggling with a geometry problem, then as a last resort, give coordinates to each point and maybe you can bash out a solution through sheer algebra. An algorithm due to Wu automates exactly this.
However, in the words of Leibniz:
…but one should know that algebra, the analysis of Viéta and Descartes, is primarily an analysis of numbers, and not of lines, although geometry is indirectly brought back (to arithmetic), given the fact that all magnitudes can be expressed by numbers. But this often forces us to take great digressions; and it is often the case that geometers can prove in a few words what in a calculation is a long procedure. And if one has found an equation in some difficult problem, it is still a long way to finding the problem’s structure, which one was looking for. Moving from algebra to geometry is a sure path, but not the best…
For example, let’s take the angle example from above. On a diagram, cutting an angle into two smaller angles is elementary. But how do you express this with coordinates?
We choose \(B\) to be the origin, and write \(A = (x_1, y_1), C = (x_2, y_2), D = (x_3, y_3) \). Then one can check:
\[ \angle ABC = \cos^{-1} \frac{x_1 x_2 + y_1 y_2}{ \sqrt{(x_1^2 + y_1^2)(x_2^2 + y_2^2)} } \]
with similarly unwieldy expressions for \(\angle CBD\) and \(\angle ABD\). Many trigonometric identities later (angle-grinding, one might say), a die-hard algebraist will find that the two smaller angles indeed sum to the big one. One is reminded of Whitehead and Russell’s infamous proof that 1 + 1 = 2.
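We can at least confirm the identity numerically. A minimal Python sketch, with \(B\) at the origin; the helper `angle` and the sample points are our own, chosen so that \(C\) lies inside \(\angle ABD\):

```python
import math

def angle(u, v):
    """Angle between u and v via the arccos formula above (B at the origin)."""
    dot = u[0] * v[0] + u[1] * v[1]
    return math.acos(dot / math.sqrt((u[0]**2 + u[1]**2) * (v[0]**2 + v[1]**2)))

# Sample points, with C positioned inside angle ABD.
A, C, D = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
assert math.isclose(angle(A, C) + angle(C, D), angle(A, D))
```

Of course, the `assert` only checks one configuration of points; the trigonometric grind is what a proof for all configurations costs.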
This leaves the higher-dimensional cases, where we must repeat the exercise with more coordinates. Rather than grind through a horrific calculation involving general coplanar points, we could pick our coordinate system so that all but the first two coordinates are zero, reducing the problem to the two-dimensional case. Strictly speaking, this means we briefly step outside algebra, a criticism that may also apply to choosing \(B\) to be the origin; translation-invariance is a geometric property. But I get the impression that some geometric hand-waving is traditionally acceptable in algebraic proofs.
What about schoolbook vector algebra? Here, an expression for an angle looks like:
\[ \newcommand{\i}{\mathbf{i}} \newcommand{\e}{\mathbf{e}} \newcommand{\u}{\mathbf{u}} \newcommand{\v}{\mathbf{v}} \newcommand{\w}{\mathbf{w}} \newcommand{\vv}[1]{\mathbf{#1}} \newcommand{\R}{\mathbb{R}} \angle ABC = \cos^{-1} \frac{\v \cdot \w}{ |\v| |\w| } \]
This is certainly an improvement. Individual coordinates no longer clutter the expression. Not only is the formula cleaner, but it also applies to any number of dimensions. Also, thanks to vectors, we avoid irritating irrelevant details such as choosing the origin to make computations easier, and proving that this is permissible in the first place.
But we still must suffer through a lot of algebra and trigonometry. Fundamentally, the problem is that vector algebra only talks about an angle via its cosine, which is like encrypting it. We cannot sum two angles directly; roughly speaking, we instead decrypt two encrypted angles, sum them, and encrypt the result.
Hermann Grassmann
Hermann Grassmann was an original thinker whose mathematical work was overlooked until late in his life, partly because it was far ahead of its time. See Hans-Joachim Petsche, Hermann Grassmann.
Grassmann thought like a modern functional programmer. John Hughes, Why Functional Programming Matters, lists the symptoms:
- Whole values: for example, if we’re thinking about points, we should denote a point somehow and write expressions that manipulate points, rather than expressions that manipulate coordinates.
- Combining forms: it should be easy to compose smaller things into bigger things.
- Algebra as a litmus test: notation should obey simple algebraic laws, as it implies expressiveness.
- Functions as representations: this is a superpower of modern mathematicians. Frege revolutionized logic by viewing propositions as functions (see Section 9 of Begriffsschrift) and performed feats no earlier philosopher could approach. It is hard for us to imagine the impact because we take predicate logic for granted today.
I bet Grassmann would agree with many remarks by computer scientists and mathematicians:
- Peter Henderson: "It seems there is a positive correlation between the simplicity of the rules and the quality of the algebra as a description tool".
- Edsger Dijkstra: "The virtue of formal texts is that their manipulations, in order to be legitimate, need to satisfy only a few simple rules; they are, when you come to think of it, an amazingly effective tool for ruling out all sorts of nonsense that, when we use our native tongues, are almost impossible to avoid.
  Instead of regarding the obligation to use formal symbols as a burden, we should regard the convenience of using them as a privilege: thanks to them, school children can learn to do what in earlier days only genius could achieve."
- John Backus: "…effectively use powerful combining forms for building new programs from existing ones"
- Peter Landin: "Expressive power should be by design, rather than by accident!"
- Alfred Whitehead: "By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental power of the race."
Algebra omnia vincit
Grassmann had two conditions for his definitions:
- They must satisfy algebraic laws: associativity; inverses; distributivity. In fact, "the consideration of negatives in geometry" was Grassmann’s "initial incentive" for his research.
- They must have geometric meaning. Any operation must be a "real concept that expresses the method of generation of the product by the factors"; "each step from one formula to another appears at once as just the symbolic expression of a parallel act of abstract reasoning."
Grassmann started by giving opposite signs to opposite directions, "regarding the displacements AB and BA as opposite magnitudes. From this it follows that if A, B, C are points of a straight line, then AB + BC = AC is always true."
Having removed the condition that \(B\) must lie between the other two points, Grassmann generalized further: "the law AB + BC = AC is imposed even when A, B, C do not lie on a single straight line. Thus the first step was taken toward an analysis that subsequently led to the new branch of mathematics presented here." That is, Grassmann began thinking in terms of "displacements" (vectors) and discovered the vector addition that we all know and love.
Witness the power of whole values. We treat a vector as a single entity, rather than talking about length and direction separately. This puts vectors under the auspices of an ancient and powerful algebra (Abelian groups).
Product design
For the product of two vectors, Grassmann was inspired by his father, who in an 1824 publication defined the area of a rectangle to be the formal product of its two sides, a high-brow version of the procedure children learn to compute the numerical area of a rectangle. The difference is that rather than an unadorned number, the fancy algebraic area, if nonzero, always contains two symbols representing the sides, which function a bit like units of measurement.
Grassmann generalized by defining the exterior product or outer product to be the area of a parallelogram, so that the input vectors need not be perpendicular. We write:
\[ \u \wedge \v \]
for the area of a parallelogram whose edges are \(\u\) and \(\v\). This expression has magnitude:
\[ |\u| |\v| \sin \theta \]
where \(\theta\) is the (signed) angle from \(\u\) to \(\v\).
As with the cross product, orientation matters because Grassmann insisted upon distributivity. For example:
\[ 0 = \u \wedge (\v - \v) = (\u \wedge \v) + (\u \wedge (-\v)) \]
which means:
\[ \u \wedge \v = -(\u \wedge (-\v)) \]
Outer products "may be regarded as products of an adjacent pair of their sides, provided one again interprets the product, not as the product of their lengths, but as that of the two displacements with their directions taken into account."
We find:
\[ (\u + \v) \wedge (\u + \v) = \u \wedge \u + \u \wedge \v + \v \wedge \u + \v \wedge \v \]
The left-hand side is zero because it is the area of a degenerate parallelogram. Same for the first and last terms on the right-hand side. This forces anticommutativity:
\[ \u \wedge \v = -\v \wedge \u \]
Grassmann was "initially perplexed by the remarkable result that, although the laws of ordinary multiplication, including the relation of multiplication to addition, remained valid for this new type of product, one could only interchange factors if one simultaneously changed the sign…" But Grassmann accepted it, because the algebra demanded it, and eventually welcomed it, as he realized the exterior product was a natural approach to the theory of determinants.
For example, if \(\e_1, \e_2\) are orthonormal, then by distributivity, anticommutativity, and eliminating degenerate parallelograms:
\[ (a \e_1 + b \e_2) \wedge (c \e_1 + d \e_2) = a \e_1 \wedge d \e_2 + b \e_2 \wedge c \e_1 = (a d - b c) (\e_1 \wedge \e_2) \]
We have recovered the formula for the \(2 \times 2\) determinant.
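As a sanity check, here is a throwaway Python helper (our own, not a standard library function) computing the coefficient of \(\e_1 \wedge \e_2\) for two plane vectors:

```python
def wedge2(u, v):
    """Coefficient of e1 ^ e2 in u ^ v, for u = (a, b), v = (c, d)."""
    a, b = u
    c, d = v
    return a * d - b * c

assert wedge2((1, 2), (3, 4)) == 1 * 4 - 2 * 3            # the 2x2 determinant
assert wedge2((1, 2), (3, 4)) == -wedge2((3, 4), (1, 2))  # anticommutativity
assert wedge2((1, 2), (1, 2)) == 0                        # degenerate parallelogram
```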
We can view an outer product as an oriented area, or an equivalence class of parallelograms with the same area, all of which lie in the same plane through the origin, and whose edges have the same orientation. Today, we say bivector.
(Actually, 2-blade is more accurate because in higher dimensions, bivectors also include formal sums of 2-blades. See my other posts on this subject.)
We extend the outer product to higher-dimensional parallelotopes; intuitively, the product is associative. Then in an \(n\)-dimensional space:
\[ \v_1 \wedge \cdots \wedge \v_n = \det A \, (\e_1 \wedge \cdots \wedge \e_n) \]
where \(\v_1, \ldots, \v_n\) are the column vectors (or row vectors) of \(A\) and \(\e_1, \ldots, \e_n\) is the standard basis.
Instead of writing "are linearly independent" all over the place, we could write expressions like \(\v_1 \wedge \v_2 \wedge \v_3 \ne 0.\) In fact, Grassmann chose the word "exterior" because a nonzero exterior product requires the space described by one input to be completely outside the space described by the other.
A War of Nerds
At first sight, quaternions might appear easier to figure out. We already know the complex numbers; all that remains is to add another imaginary number and a few rules. But Hamilton struggled for fifteen years to accomplish this. When the idea of non-commutative multiplication finally struck him, he carved an equation into a bridge near where he was taking a walk:
\[ i^2 = j^2 = k^2 = ijk = -1 \]
Perhaps there’s a tortoise-versus-hare lesson here. Building a theory step-by-step using algebraic laws as a guide sounds like a slow process, yet it beat a mad dash to the finish line.
When Hamilton learned of Grassmann’s work, he spoke of it admiringly. Sadly, this cordiality disappeared within a generation. Petsche writes:
The interplay of common features and differences in Graßmann’s and Hamilton’s thinking led to a fierce feud between two national "schools", which had begun to form around 1890, consisting of Graßmannians and Hamilton’s Quaternionists. It lasted until the First World War, both parties claiming to be the only and true representatives of the respective school of thought. The consequence of this was total conceptual chaos, which was only made worse by the Graßmannians’ manic ambition of creating an ever new, specifically German terminology. The independent lines of reception and application of Hamilton’s and Graßmann’s work by Gibbs (1881) in the USA and Peano (1888) in Italy only added to this confusion.
A product of our times
We’ve rightfully praised bundling a length and direction in a whole value, but at some point we want to unbundle to get at, say, just the length. What use is a Haskell tuple without fst and snd? Or in category theory terms, what use is a product without projection morphisms?
Algebra omnia vincit, so Grassmann sought a function \(f\) that extracted length information from a vector, that is \(f(\v) = f(\w)\) exactly when the lengths of \(\v\) and \(\w\) are equal.
Grassmann reasoned that in one-dimensional space, vectors are real numbers, and we must have \(f(p) = f(-p)\). He thus considered \(f(p) = p^2\), "the simplest to satisfy this condition."
To square something is to give a binary operator the same input twice, and Grassmann generalized by defining the inner product or scalar product of one-dimensional vectors \(a\) and \(b\) to be \(ab\), viewing the inner square of a vector as the square of its magnitude. In other words, we slap a new title, "inner product", on plain old multiplication. Grassmann used the ordinary multiplication symbol, but we’ll use a centered dot (\(\cdot\)).
In higher dimensions, Grassmann began by defining the inner product for every one-dimensional subspace. Then for any vectors \(\u, \v\), algebraic laws demand that:
\[ |\u +\v|^2 = (\u + \v) \cdot (\u + \v) = \u \cdot \u + \v \cdot \v + 2 \u \cdot \v = |\u|^2 + |\v|^2 + 2 \u \cdot \v \]
If at least one of \(\u, \v\) is zero, then we must have \( \u \cdot \v = 0 \). Otherwise, by the cosine rule, we must have:
\[ \u \cdot \v = |\u| |\v| \cos \theta \]
where \(\theta\) is the angle between the two vectors, that is, the inner product is the magnitude of one vector projected on the other.
If \(\e_1, \e_2\) are orthonormal, then \(\e_1 \cdot \e_2 = 0\), hence:
\[ (a \e_1 + b \e_2) \cdot (c \e_1 + d \e_2) = ac \e_1 \cdot \e_1 + bd \e_2 \cdot \e_2 = ac + bd \]
recovering the formula for the two-dimensional dot product.
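Both steps of the derivation are easy to spot-check numerically. A small Python sketch, with arbitrary sample vectors of our own choosing:

```python
import math

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

u, v = (3.0, 4.0), (1.0, 2.0)

# |u + v|^2 = |u|^2 + |v|^2 + 2 u.v
s = (u[0] + v[0], u[1] + v[1])
assert math.isclose(dot(s, s), dot(u, u) + dot(v, v) + 2 * dot(u, v))

# u.v = |u| |v| cos(theta), where theta is the angle between u and v
theta = math.atan2(v[1], v[0]) - math.atan2(u[1], u[0])
assert math.isclose(dot(u, v), math.hypot(*u) * math.hypot(*v) * math.cos(theta))
```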
The dot product is often said to have originated in quaternion algebra, but it seems Grassmann found it first during his investigations into inner products.
I suppose Grassmann chose the word "inner" to contrast with "exterior"; a nonzero inner product means the two input vectors have something in common, and there is something vaguely inward about projecting one onto the other before multiplying lengths together.
Today, Grassmann’s term "inner product" has been elevated to the realm of abstract algebra, where it means a function taking two real or complex vectors and returning an element of the underlying field satisfying certain conditions (conjugate symmetry, linearity, and positive definiteness). An inner product induces a norm, the canonical norm, given by:
\[ |\v|^2 = \v \cdot \v \]
William Clifford
William Clifford was another original thinker, who suffered a worse fate than being unappreciated until late in his life, because he died at the age of 33.
Clifford understood how Hamilton’s quaternions fit into Grassmann’s algebra. Had his ideas spread while he was alive, perhaps the war of the geometers would have ended early, or devolved into a shouting match over nomenclature.
Clifford defined the geometric product as a formal sum of the inner and outer products:
\[ \v \w = \v \cdot \w + \v \wedge \w \]
This looks suspiciously easy, but it is more work than it appears, because we need a domain that is large enough to include scalars, vectors, bivectors, and so on, along with rules governing their interaction. We explore the details elsewhere.
Happily, the rules turn out to be simple and elegant. Nowadays we generalize them beyond the standard dot product: for a vector space \(V\) and quadratic form \(Q\), the Clifford algebra \(\DeclareMathOperator{\Cl}{Cl} \Cl(V, Q)\) is the freest algebra (in a sense that can be made precise) generated by a vector space \(V\) with the condition:
\[ \v^2 = Q(\v) 1 \]
If we take \(V = \R^n\) and \(Q\) to be the dot product, we obtain the algebra developed by Grassmann and Clifford. In the case \(n = 3\), let \(\e_1, \e_2, \e_3\) be the standard basis, and define the bivectors:
\[i = \e_3 \e_2 , j = \e_1 \e_3 , k = \e_2 \e_1 \]
We have just constructed the quaternions. Also, the cross product of conventional vector algebra is little more than an outer product:
\[ \u \times \v = (\u \wedge \v)\e_3\e_2\e_1 \]
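These claims can be machine-checked with a tiny multivector calculator. The sketch below is our own toy code, not a library: it implements the geometric product of three-dimensional geometric algebra on basis blades, using only \(\e_i \e_i = 1\) and \(\e_i \e_j = -\e_j \e_i\). The quaternion units are taken with the sign convention \(i = \e_3\e_2\), \(j = \e_1\e_3\), \(k = \e_2\e_1\), under which \(ijk = -1\) holds literally:

```python
def simplify(seq):
    """Reduce a word of basis vectors to (sign, sorted blade), using
    e_i e_i = 1 and e_i e_j = -e_j e_i for i != j."""
    seq, sign = list(seq), 1
    i = 0
    while i + 1 < len(seq):
        if seq[i] == seq[i + 1]:
            del seq[i:i + 2]                          # e_i e_i = 1
            i = max(i - 1, 0)
        elif seq[i] > seq[i + 1]:
            seq[i], seq[i + 1] = seq[i + 1], seq[i]   # swap flips the sign
            sign = -sign
            i = max(i - 1, 0)
        else:
            i += 1
    return sign, tuple(seq)

def gp(x, y):
    """Geometric product of multivectors stored as {blade: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            s, blade = simplify(a + b)
            out[blade] = out.get(blade, 0) + s * ca * cb
    return {blade: c for blade, c in out.items() if c != 0}

def combine(x, y, s=1):
    """x + s*y for multivectors."""
    out = dict(x)
    for blade, c in y.items():
        out[blade] = out.get(blade, 0) + s * c
    return {blade: c for blade, c in out.items() if c != 0}

e1, e2, e3 = {(1,): 1}, {(2,): 1}, {(3,): 1}

# The quaternion units as bivectors, signs chosen so that ijk = -1.
i, j, k = gp(e3, e2), gp(e1, e3), gp(e2, e1)
minus_one = {(): -1}
assert gp(i, i) == gp(j, j) == gp(k, k) == minus_one
assert gp(gp(i, j), k) == minus_one

# The cross product as an outer product: u x v = (u ^ v) e3 e2 e1.
def vec(a, b, c):
    return {(1,): a, (2,): b, (3,): c}

def wedge(u, v):  # for vectors, u ^ v = (uv - vu) / 2
    return {blade: c / 2 for blade, c in combine(gp(u, v), gp(v, u), -1).items()}

u, v = vec(1, 2, 3), vec(4, 5, 6)
classical = vec(2 * 6 - 3 * 5, 3 * 4 - 1 * 6, 1 * 5 - 2 * 4)  # textbook u x v
assert gp(wedge(u, v), gp(e3, gp(e2, e1))) == classical
```

The dictionary-of-blades representation is crude but it makes the two rewrite rules, and nothing else, do all the work.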
As for the complex numbers, in any dimension, for any two nonzero vectors \(\v, \w\):
\[ \v \w = |\v| |\w| e^{\i \theta} \]
where \(\theta\) is the angle from \(\v\) to \(\w\) and \(\i\) is the unit bivector lying in the plane of \(\v\) and \(\w\), with the same orientation as \(\v \wedge \w\). We choose this notation because \(\i^2 = -1\).
Geometric algebra complex numbers work just like regular complex numbers, so we can use familiar calculations in any plane, no matter how many dimensions our space has. Applications like this show that the bivector is yet another triumph of the whole-value principle.
Clifford’s geometric algebra was a one-stop shop for all the geometric and algebraic needs of his contemporaries: Cartesian coordinates, complex numbers, quaternions, vector algebra, Grassmann algebra.
Powers are powerful
Angles appear as exponents in geometric algebra complex numbers, which means we can add them simply by multiplying expressions together. There’s no need to wrestle with trigonometry.
Take any three coplanar nonzero vectors \(\u, \v, \w\), and let:
- \(\alpha\) be the angle from \(\u\) to \(\v\)
- \(\beta\) be the angle from \(\v\) to \(\w\)
- \(\gamma\) be the angle from \(\u\) to \(\w\)
As usual, orientation matters and these are signed angles. Then:
\[ |\v|^2 \u \w = \u \v \v \w = |\u| |\v| e^{\i\alpha} |\v| |\w| e^{\i\beta} = |\u| |\v|^2 |\w| e^{\i(\alpha + \beta)} \]
where \(\i\) is the suitably oriented unit bivector in the plane. We also have:
\[ \u \w = |\u| |\w| e^{\i\gamma} \]
As expected, this means:
\[ \alpha + \beta = \gamma \]
Try showing this simple fact with conventional vector algebra!
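In two dimensions we can even borrow Python’s built-in complex numbers to test this: for plane vectors encoded as complex numbers, the geometric product of two vectors works out to the conjugate of the first times the second, so angles really do live in the exponent. The sample vectors below are arbitrary choices of ours, picked so no angle wraps past \(\pi\):

```python
import cmath
import math

def gprod(u, v):
    """2D geometric product of vectors encoded as complex numbers:
    u v = u.v + (u ^ v) i, which works out to conj(u) * v."""
    return u.conjugate() * v

u, v, w = complex(1, 0), complex(1, 1), complex(-1, 2)
alpha = cmath.phase(gprod(u, v))  # angle from u to v
beta = cmath.phase(gprod(v, w))   # angle from v to w
gamma = cmath.phase(gprod(u, w))  # angle from u to w
assert math.isclose(alpha + beta, gamma)
```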
Now let:
- \(\alpha\) be the angle from \(-\u\) to \(\v\)
- \(\beta\) be the angle from \(-\v\) to \(\w\)
- \(\gamma\) be the angle from \(-\w\) to \(\u\)
Then:
\[ - |\u|^2 |\v|^2 |\w|^2 = (-\u) \v (-\v) \w (-\w) \u = |\u| |\v| e^{\i\alpha} |\v| |\w| e^{\i\beta} |\u| |\w| e^{\i\gamma} \]
Hence \(e^{\i(\alpha + \beta + \gamma)} = -1.\) That is, the angles sum to an odd multiple of \(\pi\).
If \(\u + \v + \w = 0\) then the three vectors describe a triangle, which implies the sum of the absolute values of the angles is \(\pi\), because each angle has the same sign, and their absolute values lie in \((0, \pi).\)
If \(\gamma = 0\), then this implies that the absolute values of consecutive interior angles sum to \(\pi\).
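The same complex-number encoding of plane vectors lets us test the triangle claim. The vectors below are arbitrary choices of ours, with the third forced to close the triangle:

```python
import cmath
import math

def gprod(u, v):
    """2D geometric product of vectors as complex numbers: conj(u) * v."""
    return u.conjugate() * v

u = complex(1, 0)
v = complex(-0.5, 1)
w = -(u + v)                       # force u + v + w = 0: a triangle

alpha = cmath.phase(gprod(-u, v))  # angle from -u to v
beta = cmath.phase(gprod(-v, w))   # angle from -v to w
gamma = cmath.phase(gprod(-w, u))  # angle from -w to u
assert math.isclose(abs(alpha) + abs(beta) + abs(gamma), math.pi)
```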
Again, it’s reassuring that simple geometric facts have simple proofs, echoing Grassmann’s experience. His research on the theory of tides took him "to Lagrange’s Mécanique Analytique, and thence back to the ideas of this analysis. All the developments in that work were transformed by the principles of this new analysis into such simple procedures that the calculations often came out one-tenth as long as there."
Give peace a chance
How should vector algebra be taught?
I can see arguments for the current curriculum. Everyone understands how to plug coordinates into formulas for the dot product and the cross product. And we can mostly sweep non-commutative multiplication under the rug, a concept that took even Grassmann by surprise.
The wackiness of the cross product sticks out like a sore thumb, but one must stick out a thumb anyway for the right-hand rule, and as long as students handle it with care, they’ll soon be solving problems in three-dimensional space.
However, this speedy lesson plan has a heavy price. The most visible drawbacks are occasional skirmishes reminiscent of the great war between Grassmannians and Quaternionists. Someone will post a diatribe explaining why, say, quaternions are obviously superior or obviously inferior. I can’t help thinking: "It’s all the same, you ignoramus! Go learn how Clifford brokered the peace!"
Less visible, but more worrying, is that the importance of algebra is hidden from students. Why should they learn strange notation and strange rules that can’t easily describe the sum of two angles right next to each other? How can they appreciate algebraic laws when the cross product is not even associative? How can they see the heights mathematicians have reached when they’re forced to resort to prose just as Euclid once did?
Learning geometric algebra may take longer, but is worth the effort. We gain a better understanding of determinants (via outer products) and dual spaces, as well as complex numbers and quaternions. And we free geometry from its Euclidean prison of prose and diagrams.