{-# LANGUAGE LambdaCase #-}
import Data.List (intercalate)  -- used below by `leibniz`
-- Bird and Gibbons, Algorithm Design with Haskell
inserts x = \case
  [] -> [[x]]
  (y:ys) -> (x:y:ys):map (y:) (inserts x ys)
perms = foldr (\x xss -> inserts x =<< xss) [[]]
sgnPerm = \case
  [] -> id
  x:xs -> foldr (.) (sgnPerm xs) [negate | y <- xs, x > y]
Determinants
We’ve managed to reach the Jordan canonical form without encountering determinants. But at some point, anyone studying linear algebra must learn about these things.
-
1693: Leibniz defines the determinant, albeit with an awkward and impractical method of finding the sign of a permutation.
-
1748: Fontaine finds identities involving some kinds of determinants.
-
1750: Cramer introduces Cramer’s rule, a general formula that solves for any variable in a linear system; the determinant of the coefficient matrix is the denominator of the answer, and the determinant of a related matrix is the numerator.
Leibniz’s contribution was overlooked, while Cramer’s work was noticed and spread; gradually, then suddenly.
(Muir's history of determinants assumes the reader is fluent in French and other languages. See Antoni Kosinski, Cramer's Rule Is Due to Cramer, for an English version of the relevant paragraphs.)
Seki Takakazu also discovered the determinant around the same time, and it is possible that other Japanese mathematicians found it independently.
Permutations
We need to enumerate permutations and compute their signs; the inserts, perms, and sgnPerm functions defined above (following Bird and Gibbons) handle this.
We show the permutations of \([1..4]\) and their signs:
[(pi, sgnPerm pi 1) | pi <- perms [1..4]]
Cramer’s Rule
Suppose we wish to solve the linear system:
\[ \newcommand{\v}{\mathbf{v}} \newcommand{\w}{\mathbf{w}} \DeclareMathOperator{\sgn}{sgn} \begin{align} a_{11} v_1 + & … + a_{1n} v_n = b_1 \\ & \vdots \\ a_{n1} v_1 + & … + a_{nn} v_n = b_n \\ \end{align} \]
for \(v_1, …, v_n\). In modern notation:
\[ A \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} \]
where \(A\) is an \(n\times n\) matrix with coefficients \(a_{ij}\). Define the determinant of \(A\) by:
\[ \sum_{\pi\in S_n} \sgn(\pi) a_{\pi(1)1} … a_{\pi(n)n} \]
where \(S_n\) is the set of permutations of \( [1..n] \). This is equivalent to:
\[ \sum_{\pi\in S_n} \sgn(\pi) a_{1\pi(1)} … a_{n\pi(n)} \]
because a permutation and its inverse have the same sign.
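As a quick check of this claim, we can compare the sign of each permutation of \([1..5]\) with the sign of its inverse; invPerm below is a small throwaway helper that computes the inverse permutation.
invPerm p = [1 + length (takeWhile (/= j) p) | j <- [1 .. length p]]
and [sgnPerm p 1 == sgnPerm (invPerm p) 1 | p <- perms [1..5]]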
Instead of \(a_{ij}\), Leibniz simply wrote \(ij\). The following code reproduces something similar to what he wrote for the determinant of a 3x3 matrix:
leibniz n = concat [plusOrMinus (sgnPerm pi) ++ intercalate "."
    (zipWith go [1..] pi) | pi <- perms [1..n]]
  where
  go i j = show i ++ show j
  plusOrMinus f
    | f 1 > 0 = " + "
    | otherwise = " - "
putStr $ leibniz 3
We denote the determinant of \(A\) by \(\det(A)\) or \(|A|\). The vertical bars are handy when writing out matrices in full:
\[ \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = 1\times 4 - 2\times 3 = -2 \]
det a = sum [sgnPerm pi $ product $ zipWith get pi [1..n]
    | pi <- perms [1..n]]
  where
  n = length a
  get i j = a!!(i - 1)!!(j - 1)
det [[1, 2], [3, 4]]
We also compute the determinant of the matrix from The Nine Chapters:
riceEqns =
  [ [3, 2, 1, 39]
  , [2, 3, 1, 34]
  , [1, 2, 3, 26]
  ]
a = init <$> riceEqns
bs = last <$> riceEqns
det a
We turn this expression into a function where the first column of the matrix has been replaced with input variables. Define the function \(f : K^n → K\) by:
\[ f(x_1,…,x_n) = \sum_{\pi\in S_n} \sgn(\pi) x_{\pi(1)} a_{\pi(2)2} … a_{\pi(n)n} \]
detCol k a xs = sum [sgnPerm pi $ product $ zipWith get pi [1..n]
    | pi <- perms [1..n]]
  where
  n = length a
  get i j
    | j == k = xs!!(i - 1)
    | otherwise = a!!(i - 1)!!(j - 1)
f = detCol 1 a
We could have started from this function and defined:
\[ \det(A) = f(a_{11}, …, a_{n1}) \]
f $ head <$> a
Then if \(\det(A) \ne 0\), we can solve for \(v_1\):
\[ v_1 = \frac{f(b_1,…,b_n)}{\det(A)} \]
f bs / det a
The numerator can be seen as the determinant of the matrix \(A\) except the first column has been replaced by \(b_1,…,b_n\).
Solving for \(v_2, …, v_n\) is similar.
detCol 2 a bs / det a
detCol 3 a bs / det a
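As a sanity check, the values produced by Cramer's rule ought to satisfy the original rice equations; vs below is a throwaway name for the solution vector.
vs = [detCol k a bs / det a | k <- [1..3]]
[sum (zipWith (*) row vs) | row <- a] == bs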
Cramer stated his results without proof, which is fair: he was writing about fitting curves through points, and was only describing, in passing, a shortcut he used to solve linear equations. My guess is that Cramer tired of Gaussian elimination, so he slogged through an epic Gaussian elimination on a matrix whose entries were all variables, and then he noticed a pattern which led to his rule.
Proof: Consider applying \(f\) to the second column of \(A\). This is \(\det(A')\), where \(A'\) is the matrix obtained from \(A\) by replacing its first column with its second column, so the first two columns of \(A'\) are identical. That is, if \(a'_{ij}\) denotes the entries of \(A'\), we have:
\[ a'_{k1} = a'_{k2} \]
for all rows \(k\). In other words:
\[ f(a_{12}, ..., a_{n2}) = \det(A') = \sum_{\pi\in S_n} \sgn(\pi) a'_{1\pi(1)} a'_{2\pi(2)} ... a'_{n\pi(n)} \]
Let \(\tau = (1\:2) \), the permutation that transposes 1 and 2. Then we also have:
\[ \begin{align} \det(A') &= \sum_{\pi\in S_n} \sgn(\tau \pi) a'_{1\tau\pi(1)} a'_{2\tau\pi(2)} ... a'_{n\tau\pi(n)} \\ & = \sum_{\pi\in S_n} \sgn(\tau \pi) a'_{1\pi(1)} a'_{2\pi(2)} ... a'_{n\pi(n)} \end{align} \]
The first equality holds because \(\tau\pi\) ranges over all of \(S_n\) as \(\pi\) does; the second holds because the first two columns of \(A'\) are identical, so replacing \(\tau\pi(i)\) by \(\pi(i)\) in the subscripts changes no factor. Since \(\sgn(\tau) = -1\), we have \(\sgn(\tau\pi) = -\sgn(\pi)\), and so \(\det(A') = -\det(A')\), hence \(f(a_{12},…,a_{n2}) = 0\). In general:
\[ f(a_{1k},…,a_{nk}) = 0 \]
for all columns \(k > 1\).
f ((!!1) <$> a)
f ((!!2) <$> a)
Next, from its definition, we see that \(f\) is a linear combination of its inputs, that is:
\[ f(x_1,…,x_n) = c_1 x_1 + … + c_n x_n \]
for some \(c_1, …, c_n \in K\).
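We can spot-check this for our rice matrix: the coefficients \(c_i\) are the values of \(f\) on the standard basis vectors, and applying \(f\) to any input should match the resulting linear combination; cs below is a throwaway name.
cs = [f [if i == j then 1 else 0 | j <- [1..3]] | i <- [1..3]]
f bs == sum (zipWith (*) cs bs)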
Multiplying the \(i\)th equation in our system by \(c_i\) and summing them all yields:
\[ \begin{align} f(a_{11},…,a_{n1}) v_1 + f(a_{12},…,a_{n2}) v_2 + … + f(a_{1n},…,a_{nn}) v_n \\ = f(b_1,…,b_n) \end{align} \]
From above, most of these terms disappear, and we’re left with:
\[ \det(A) v_1 = f(b_1,…,b_n) \]
That is, if a solution exists, then \(v_1\) must satisfy this equation.
QED.
This proof has some elements in common with the Chinese Remainder Theorem and the Lagrange interpolating polynomial. In all cases, we cleverly choose a linear combination where all but one of the terms disappear at the crucial moment.
Today, Cramer’s rule remains useful for humans, but it is less useful for computers, as Gaussian elimination is at least as fast and gives us control over the order of operations, which improves accuracy when using floating point.
However, the determinant is still vital for studying properties of the system of equations as a whole, which is how Leibniz used it.
We show that if there is no solution, or if there are infinitely many solutions, then \(\det(A) = 0\). The infinite-solutions case is easy: if \(\det(A) \ne 0\), our reasoning above implies \(v_1 = f(b_1,…,b_n)/\det(A)\), and in general every \(v_i\) must have a unique value, contradicting the existence of infinitely many solutions.
We reduce the no-solution case to the infinite-solutions case. Replace every \(b_i\) with zero, which makes the system homogeneous. Then at least one solution exists: \(v_1 = … = v_n = 0\). If this is the only solution, then some sequence of row operations must change the coefficient matrix into the identity matrix. But this sequence of row operations implies the original system has a unique solution, a contradiction since it is supposed to have no solutions. Thus there must be infinitely many solutions, which we have just seen implies \(\det(A) = 0\).
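For instance, a coefficient matrix with proportional rows admits either no solution or infinitely many, and its determinant indeed vanishes:
det [[1, 2], [2, 4]]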
We also show that the existence of a unique solution implies \(\det(A) \ne 0\). (Thus Cramer's rule is always useful: it would be frustrating if we calculated a determinant only to find \(0 \cdot v_1 = 0\).)
We can prove this by first proving:
\[ \det(A)\det(B) = \det(AB) \]
If \(A\) is noninvertible then \(AB\) must also be noninvertible, otherwise we can construct an inverse of \(A\). Thus \(\det(A) = 0\) implies \(\det(AB) = 0\). Similarly for noninvertible \(B\).
Otherwise, consider an invertible matrix \(A\). Then we can reach the identity matrix via elementary row operations. The corresponding elementary matrices are simple enough that we can easily compute their determinants:
-
the matrix scaling a row by \(a\) has determinant \(a\)
-
the matrix swapping any two rows has determinant \(-1\)
-
the matrix adding a multiple of one row to another has determinant \(1\)
From the formula for the determinant, one can check that performing the corresponding operations on a matrix \(A\) scales \(\det(A)\) by the same factor. The toughest case is adding a multiple of one row to another; in brief, it follows because two identical rows lead to a zero determinant (just like two identical columns, which we used above), and because multiplication distributes over addition.
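We can watch each kind of row operation act on the rice coefficient matrix \(a\), whose determinant we computed above:
det (map (2*) (head a) : tail a)  -- scaling the first row by 2 doubles the determinant
det [a!!1, a!!0, a!!2]  -- swapping the first two rows negates it
det (zipWith (+) (head a) (map (3*) (a!!1)) : tail a)  -- adding 3 times row 2 to row 1 leaves it unchanged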
Thus we show \( \det(A)\det(B) = \det(AB) \) for invertible matrices by breaking them down into elementary matrices.
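Here is a small numerical spot check of the product rule; matMul, m1, and m2 are throwaway helpers defined only for this check, and the two components below should be equal.
matMul x y = [[sum $ zipWith (*) row col | col <- cols] | row <- x]
  where cols = foldr (zipWith (:)) (repeat []) y  -- the columns of y
m1 = [[2, 1], [5, 3]]
m2 = [[1, 4], [2, 9]]
(det m1 * det m2, det (matMul m1 m2))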
Signed area
Consider a unit square with one corner at the origin and the opposite corner at \( (1, 1) \). If we scale it horizontally or vertically by \(a\), its area scales by \(a\), while any horizontal or vertical shear leaves its area unchanged, as it becomes a parallelogram with unit base and unit height.
The same is true for any unit parallelogram in any orientation, which we can see by dissecting it with up to two horizontal or vertical lines.
Let us refine the concept of area into signed area. We now require a parallelogram to be specified as an ordered pair of points, which we interpret as vectors \( (\v, \w) \) from the origin that are two adjacent edges of the parallelogram.
Rotate so that \(\v\) points in the same direction as the standard basis vector \( (1, 0) \). Then we take the sign of the area to be the sign of the \(y\)-coordinate of \( \w \).
Write this parallelogram in matrix form with these two vectors as rows:
\[ P = \begin{bmatrix} v_1 & v_2 \\ w_1 & w_2 \end{bmatrix} \]
If \(f\) returns the signed area of a parallelogram given in this form, then we have shown for 2x2 matrices:
\[ |E| f(P) = f(E P) \]
for any elementary matrix \(E\). Defining \(f(1) = 1\) and \(f(A) = 0\) when \(A\) is a degenerate parallelogram gives \(\det = f\), that is \(\det\) can be interpreted as returning the signed area of a given parallelogram, or, returning by how much a given transformation scales the signed area of its inputs.
These statements generalize to all dimensions; to all hypervolumes and parallelotopes. The sign of the signed area is like a parity check bit: roughly speaking, as we compare successive edges with successive vectors of the standard basis, we flip the sign every time we wind up in the negative half.
Area of a triangle
What is the area of the triangle with vertices at \( (x_0, y_0), (x_1, y_1), (x_2, y_2) \)?
This question frustrated me when I was young. It seems so simple, as what could be more natural than specifying a triangle with three pairs of coordinates? But the only trick I knew of was Heron's formula:
\[ A = \sqrt{s(s-a)(s-b)(s-c)} \]
where \(a, b, c\) are the side lengths, and \(s\) is half of their sum (the semiperimeter). This is no fun to compute with pen and paper, even with the aid of a basic calculator.
Now that we know the determinant is the signed area of a parallelogram, we can simply take half the absolute value of:
\[ \begin{vmatrix} x_1 - x_0 & y_1 - y_0 \\ x_2 - x_0 & y_2 - y_0 \end{vmatrix} = (x_1 - x_0)(y_2 - y_0) - (x_2 - x_0)(y_1 - y_0) \]
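In code, this is a one-liner; triArea below is a throwaway helper, tried out on a right triangle with legs 4 and 3.
triArea (x0, y0) (x1, y1) (x2, y2) =
  abs (det [[x1 - x0, y1 - y0], [x2 - x0, y2 - y0]]) / 2
triArea (0, 0) (4, 0) (0, 3)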
Alternatively, consider the triangular pyramid whose tip is the origin and whose base vertices are the standard basis coordinates \( (1,0,0), (0,1,0), (0,0,1) \). The volume of this pyramid is \(V = 1 / 6.\) (The base triangle is equilateral with area \(\sqrt{3} / 2\) and the height is \(1/\sqrt{3}\).)
The matrix:
\[ \begin{bmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ 1 & 1 & 1 \\ \end{bmatrix} \]
transforms the vertices of our triangular pyramid so that its base becomes the given triangle embedded in the \(z = 1\) plane. The volume of the transformed pyramid is \(|d| / 6\), where \(d\) is the determinant of the matrix. The \(z = 1\) condition implies the height is now 1, so the volume is also \(A / 3\), where \(A\) is the area of the embedded triangle.
Hence \(A = |d| / 2\); that is, the area of the given triangle is half the magnitude of:
\[ \begin{vmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ 1 & 1 & 1 \\ \end{vmatrix} \]
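Equivalently, in code; triArea3 below is a throwaway helper that agrees with triArea above on the same right triangle.
triArea3 (x0, y0) (x1, y1) (x2, y2) =
  abs (det [[x0, x1, x2], [y0, y1, y2], [1, 1, 1]]) / 2
triArea3 (0, 0) (4, 0) (0, 3)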
The sign of a signed area tells us on which side of a line a given point lies, which is useful in computer graphics.
The Characteristic Equation
Now that we have the determinant at our disposal, we can clear a different path to eigenvalues. The characteristic polynomial of an \(n\times n\) matrix \(A\) is:
\[ \det (A - x) \]
A scalar \(\lambda\) is a root if and only if \(\det (A - \lambda) = 0\). This happens exactly when \(A - \lambda\) is non-invertible, in which case any nonzero vector \(\v\) in its kernel satisfies:
\[ A \v = \lambda \v \]
In other words, we see \(\v\) is an eigenvector with eigenvalue \(\lambda\).
The characteristic polynomial of the transition matrix \(T\) from our rabbits example is:
\[ \begin{vmatrix} 1 - x & 1 \\ 1 & -x \end{vmatrix} = (1 - x)(-x) - 1 = x^2 - x - 1 \]
The roots of \(x^2 - x - 1\) are:
\[ \begin{align} \phi &= \frac{1 + \sqrt{5}}{2} \\ \psi &= \frac{1 - \sqrt{5}}{2} \end{align} \]
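We can confirm numerically that these roots make the characteristic determinant vanish, up to floating-point rounding; charPoly below is a throwaway helper.
charPoly x = det [[1 - x, 1], [1, -x]]
(charPoly ((1 + sqrt 5) / 2), charPoly ((1 - sqrt 5) / 2))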
In this case, there are 2 distinct roots, so the Jordan normal form of \(T\) must be:
\[ \begin{bmatrix} \phi & 0 \\ 0 & \psi \end{bmatrix} \]
Next, with Gaussian elimination, we solve:
\[ \begin{bmatrix} 1 - \phi & 1 \\ 1 & -\phi \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
to find the eigenspace with eigenvalue \(\phi\). We soon find that the solutions are generated by the vector:
\[ \begin{bmatrix} \phi \\ 1 \end{bmatrix} \]
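As a numerical check, multiplying the transition matrix \(\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\) by this vector should give approximately \(\phi\) times it, namely \((\phi^2, \phi) \approx (2.618, 1.618)\):
[sum (zipWith (*) row [(1 + sqrt 5) / 2, 1]) | row <- [[1, 1], [1, 0]]]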
Similarly, we find the eigenvectors corresponding to the eigenvalue \(\psi\) are generated by:
\[ \begin{bmatrix} \psi \\ 1 \end{bmatrix} \]
This leads to the matrix that changes the basis from the standard basis to the above two eigenvectors, along with its inverse, and we have thus revealed the secret behind the magic numbers that helped us compute the long-term behaviour of the rabbit population.