
Introduction to Linear Algebra by Gilbert Strang

The word “unit” always indicates that some measurement equals “one”.

TODO: keep all vector symbols in boldface.

Introduction to Vectors

  1. Vectors and Linear Combinations
  2. Lengths and Dot Products
    1. The “dot product” of \(v=\begin{bmatrix} 1 \\ 2 \end{bmatrix}\) and \(w=\begin{bmatrix} 4 \\ 5 \end{bmatrix}\) is \(v\cdot w=1\cdot 4+2\cdot 5=4+10=14\)

    2. The dot product is \(v\cdot w=0\) when \(v\) is perpendicular to \(w\)
      The zero vector is perpendicular to every vector.
      The angle is less than 90° when \(v\cdot w\) is positive. The angle is greater than 90° when \(v\cdot w\) is negative.
      Proof: When \(v\) and \(w\) are perpendicular, they form two sides of a right triangle.
      The third side is \(v-w\) (the hypotenuse going across the triangle). By the Pythagorean theorem \(a^2+b^2=c^2\) :
      \[\|v\|^2+\|w\|^2=\|v-w\|^2\]
      Writing out the formulas for those lengths in two dimensions, this equation is:
      \[(v_1^2+v_2^2)+(w_1^2+w_2^2)=(v_1-w_1)^2+(v_2-w_2)^2 \\ \rArr 0=-2v_1w_1-2v_2w_2 \\ \rArr v_1w_1+v_2w_2=0\]

    3. The length squared of \(v=\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}\) is \(v\cdot v=1+9+4=14\) . The length is \(\|v\|=\sqrt{14}\)

    4. Divide \(v\) by its length \(\|v\|\) to get its unit vector: \(u=\frac{v}{\|v\|}=\frac{v}{\sqrt{14}}=\frac{1}{\sqrt{14}}\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}\)
      \(\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}\) is a unit vector for every angle \(\theta\). These vectors reach out to the unit circle. Thus \(\cos\theta\) and \(\sin\theta\) are simply the coordinates of the point at angle \(\theta\) on the unit circle.

    5. The angle \(\theta\) between \(v\) and \(w\) has \(\cos\theta=\frac{v\cdot w}{\|v\|\cdot\|w\|}\)
      All angles have \(|\cos\theta|\leqslant 1\) . So all vectors have \(|v\cdot w|\leqslant\|v\|\cdot\|w\|\)
      Unit vectors \(u\) and \(U\) at angle \(\theta\) have \(u\cdot U=\cos\theta\)
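      A quick numerical check of these length and angle formulas (a minimal NumPy sketch, not from the book):

      ```python
      import numpy as np

      v = np.array([1.0, 2.0])
      w = np.array([4.0, 5.0])

      dot = v @ w                          # 1*4 + 2*5 = 14
      len_v = np.sqrt(v @ v)               # ||v||
      len_w = np.sqrt(w @ w)               # ||w||
      cos_theta = dot / (len_v * len_w)    # always between -1 and 1

      print(dot, len_v, len_w, cos_theta)
      print(np.degrees(np.arccos(cos_theta)))   # the angle between v and w
      ```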

    • Example 2: Put a weight of 4 at the point \(x=-1\) (left of zero) and a weight of 2 at the point \(x=2\) (right of zero).
      The axis will balance on the center point (like a see-saw).
      The weights balance because the dot product is \((4)(-1)+(2)(2)=0\)

      This example is typical of engineering and science. The vector of weights is \((w_1,w_2)=(4,2)\) .
      The vector of distances from the center is \((v_1,v_2)=(-1,2)\) . The weights times the distances (\(w_1v_1\) and \(w_2v_2\)), give the “moments”. The equation for the see-saw to balance is \(w_1v_1+w_2v_2=0\)

    • Example 3: Dot products enter in economics and business. We have three goods to buy and sell.
      Their prices are \((p_1,p_2,p_3)\) for each unit–this is the “price vector” \(p\) .
      The quantities we buy or sell are \((q_1,q_2,q_3)\) : positive when we sell, negative when we buy.
      Selling \(q_1\) units at the price \(p_1\) brings in \(q_1p_1\) . The total income (quantities \(q\) times prices \(p\)) is the dot product \(q\cdot p\) in three dimensions:
      \[\textbf{Income}=(q_1,q_2,q_3)\cdot(p_1,p_2,p_3)=q_1p_1+q_2p_2+q_3p_3\]

      A zero dot product means that “the books balance”. Total sales equal total purchases if \(q\cdot p=0\) . Then \(p\) is perpendicular to \(q\) (in three-dimensional space). A supermarket with thousands of goods goes quickly into high dimensions.
      Small note: Spreadsheets have become essential in management. They compute linear combinations and dot products. What you see on the screen is a matrix.

  3. Matrices
    1. Matrix times vector: \(Ax=\) combination of the columns of \(A\)

    2. The solution to \(Ax=b\) is \(x=A^{-1}b\) when \(A\) is an invertible matrix.

    3. A matrix whose three columns lie in the same plane has no inverse (e.g. the cyclic matrix \(C\)).

    4. \(Ax\) is a combination of the columns of \(A\):

      $$ Ax=\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =x_1\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} +x_2\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} +x_3\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} $$

      \(Ax\) is also dot products with rows:

      $$ Ax=\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =\begin{bmatrix} (1,0,0)\cdot(x_1,x_2,x_3) \\ (-1,1,0)\cdot(x_1,x_2,x_3) \\ (0,-1,1)\cdot(x_1,x_2,x_3) \end{bmatrix} $$
    5. Equation in matrix form \(Ax=b:\begin{bmatrix} 2 & 5 \\ 3 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}=\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}\) replaces \(\begin{aligned} 2x_1+5x_2=b_1 \\ 3x_1+7x_2=b_2 \end{aligned}\)
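      A small NumPy check (a sketch, not from the book) that the column picture and the row picture of \(Ax\) agree, using the 2 by 2 matrix above:

      ```python
      import numpy as np

      A = np.array([[2.0, 5.0],
                    [3.0, 7.0]])
      x = np.array([1.0, 2.0])

      by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]      # combination of the columns
      by_rows = np.array([A[0] @ x, A[1] @ x])          # dot products with the rows
      assert np.allclose(by_columns, by_rows) and np.allclose(by_columns, A @ x)

      b = np.array([12.0, 17.0])
      print(np.linalg.solve(A, b))                      # x = A^{-1} b  ->  [1. 2.]
      ```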

Solving Linear Equations

TODO: integrate the ideas in this chapter

  1. Vectors and Linear Equations

    • The Matrix Form of the Equations
      Matrix-vector multiplication \(Ax\) can be computed by dot products, a row at a time. But \(Ax\) must be understood as a combination of the columns of \(A\)
      Multiplication by rows: \(Ax=\begin{bmatrix} (\textbf{row 1})\cdot x \\ (\textbf{row 2})\cdot x \\ (\textbf{row 3})\cdot x \end{bmatrix}\)
      Multiplication by columns: \(Ax=x_1(\textbf{column 1})+x_2(\textbf{column 2})+x_3(\textbf{column 3})\)
      The identity matrix \(I=\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\) always yields the multiplication \(Ix=x\)
  2. The Idea of Elimination
    Pivot: first nonzero in the row that does the elimination
    Multiplier \(\ell\): entry to eliminate divided by pivot

    1. For matrices with \(m=n=3\) , there are three equations \(Ax=b\) and three unknowns \(x_1,x_2,x_3\)

    2. The equations look like \(\begin{cases} a_{11}x_1+a_{12}x_2+a_{13}x_3=b_1 \\ a_{21}x_1+a_{22}x_2+a_{23}x_3=b_2 \\ a_{31}x_1+a_{32}x_2+a_{33}x_3=b_3 \end{cases}\)

    3. Column 1. Use the first equation to create zeros below the first pivot.
      Column 2. Use the new equation 2 to create zeros below the second pivot.
      Columns 3 to \(n\). Keep going to find all \(n\) pivots and the upper triangular \(U\).

      Eliminate \(x_1\) from every remaining equation \(i\) by subtracting \(\frac{a_{i1}}{a_{11}}\) times the first equation.
      We subtract \(\ell_{ij}\) times equation \(j\) from equation \(i\) , to make the \((i,j)\) entry zero.

    4. Elimination breaks down if zero appears in the pivot. Exchanging two equations may save it.
      When breakdown is permanent, \(Ax=b\) has no solution or infinitely many.

    5. Elimination produces an upper triangular system.
      The upper triangular \(Ux=c\) is solved by back substitution (starting at the bottom).
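      A minimal NumPy sketch of forward elimination to \(Ux=c\) followed by back substitution (assuming no row exchanges are needed; the test matrix reappears in the augmented-matrix example below):

      ```python
      import numpy as np

      def solve_by_elimination(A, b):
          """Forward elimination to Ux = c, then back substitution (no row exchanges)."""
          U = A.astype(float).copy()
          c = b.astype(float).copy()
          n = len(b)
          for j in range(n):                      # pivot column j
              for i in range(j + 1, n):
                  ell = U[i, j] / U[j, j]         # multiplier l_ij
                  U[i, j:] -= ell * U[j, j:]
                  c[i] -= ell * c[j]
          x = np.zeros(n)
          for i in range(n - 1, -1, -1):          # back substitution, bottom row up
              x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
          return x

      A = np.array([[2, 4, -2], [4, 9, -3], [-2, -3, 7]])
      b = np.array([2, 8, 10])
      print(solve_by_elimination(A, b))           # same answer as np.linalg.solve(A, b)
      ```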

  3. Elimination Using Matrices
    The word “entry” for a matrix corresponds to “component” for a vector. General rule: \(a_{ij}=A(i,j)\) is in row \(i\) , column \(j\).

    1. The Matrix Form of One Elimination Step
      The identity matrix has 1’s on the diagonal and otherwise 0’s. Then \(Ib=b\) for all b.
      The elementary matrix or elimination matrix \(E_{ij}\) has the extra nonzero entry \(-\ell_{ij}\) in the \(i,j\) position. Then \(E_{ij}\) subtracts a multiple \(\ell_{ij}\) of row \(j\) from row \(i\).
      e.g. The purpose of \(E_{31}\) is to produce a zero in the \((3,1)\) position of the matrix.
    2. Matrix Multiplication
      Associative law is true: \(A(BC)=(AB)C\)
      Commutative law is false: \(\text{Often}\enspace AB\neq BA\)
      Matrix multiplication \(AB=A\begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix}=\begin{bmatrix} Ab_1 & Ab_2 & Ab_3 \end{bmatrix}\)
    3. The Matrix \(P_{ij}\) for a Row Exchange
      A row exchange is needed when zero is in the pivot position.
      Row Exchange Matrix: \(P_{ij}\) is the identity matrix with rows \(i\) and \(j\) reversed.
      When this “permutation matrix” \(P_{ij}\) multiplies a matrix, it exchanges rows \(i\) and \(j\)
      for example: to exchange equations 1 and 3 multiply by \(P_{13}=\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}\)
    4. The Augmented Matrix
      Elimination does the same row operations to \(A\) and to \(b\) . We can include \(b\) as an extra column and follow it through elimination. The matrix \(A\) is enlarged or “augmented” by the extra column \(b\) :
      \(\begin{bmatrix} A & b \end{bmatrix}=\begin{bmatrix} 2 & 4 & -2 & \mathbf{2} \\ 4 & 9 & -3 & \mathbf{8} \\ -2 & -3 & 7 & \mathbf{10} \end{bmatrix}\)
      For the augmented matrix \(\begin{bmatrix} A & b \end{bmatrix}\) , that first elimination step gives \(\begin{bmatrix} E_{21}A & E_{21}b \end{bmatrix}\)

    Elimination multiplies \(Ax=b\) by \(E_{21},E_{31},\dots,E_{n1}\) , then \(E_{32},E_{42},\dots,E_{n2}\) , and onward.
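    One way to build these elimination matrices in NumPy (an illustrative sketch; the helper E is made up for this example and is not the book's code):

    ```python
    import numpy as np

    def E(n, i, j, ell):
        """Elimination matrix: the identity with -ell in the (i, j) position (1-based)."""
        M = np.eye(n)
        M[i - 1, j - 1] = -ell
        return M

    A = np.array([[2.0, 4.0, -2.0],
                  [4.0, 9.0, -3.0],
                  [-2.0, -3.0, 7.0]])

    E21 = E(3, 2, 1, 4 / 2)       # subtract 2 times row 1 from row 2
    E31 = E(3, 3, 1, -2 / 2)      # subtract -1 times row 1 from row 3
    step1 = E31 @ E21 @ A         # zeros below the first pivot
    E32 = E(3, 3, 2, step1[2, 1] / step1[1, 1])
    U = E32 @ step1               # upper triangular U
    print(U)
    ```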

  4. Rules for Matrix Operations

    1. To multiply \(AB\) , if \(A\) has \(n\) columns, \(B\) must have \(n\) rows.
    2. Matrices \(A\) with \(n\) columns multiply matrices \(B\) with \(n\) rows: \(\boxed{A_{m\times n}B_{n\times p}=C_{m\times p}}\)
    3. Each entry in \(AB=C\) is a dot product: \(C_{ij}=(\text{row }i\text{ of }A)\cdot(\text{column }j\text{ of }B)\)
    4. The second, third and fourth way to compute \(AB\):
      • Matrix \(A\) times every column of \(B\) : \(A\begin{bmatrix} b_1 & \cdots & b_p \end{bmatrix}=\begin{bmatrix} Ab_1 & \cdots & Ab_p \end{bmatrix}\)
      • Every row of \(A\) times matrix \(B\) : \(\begin{bmatrix} \text{row }i\text{ of }A \end{bmatrix}B=\begin{bmatrix} \text{row }i\text{ of }AB \end{bmatrix}\)
      • Multiply columns 1 to \(n\) of \(A\) times rows 1 to \(n\) of \(B\). Add those matrices:
        $$ \begin{bmatrix} \text{col 1} & \text{col 2} & \text{col 3} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix} \begin{bmatrix} \text{row 1} & \cdot & \cdot & \cdot \\ \text{row 2} & \cdot & \cdot & \cdot \\ \text{row 3} & \cdot & \cdot & \cdot \end{bmatrix} =(\text{col 1})(\text{row 1})+(\text{col 2})(\text{row 2})+(\text{col 3})(\text{row 3}) $$
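      All four ways, checked numerically (a NumPy sketch with random matrices, not from the book):

      ```python
      import numpy as np

      A = np.random.rand(3, 4)
      B = np.random.rand(4, 2)

      # 1) each entry is a dot product: C_ij = (row i of A) . (column j of B)
      C1 = np.array([[A[i] @ B[:, j] for j in range(B.shape[1])] for i in range(A.shape[0])])
      # 2) matrix A times every column of B
      C2 = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
      # 3) every row of A times matrix B
      C3 = np.vstack([A[i] @ B for i in range(A.shape[0])])
      # 4) sum of (column k of A) times (row k of B) -- one outer product per k
      C4 = sum(np.outer(A[:, k], B[k]) for k in range(A.shape[1]))

      assert all(np.allclose(C, A @ B) for C in (C1, C2, C3, C4))
      ```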
    5. Block multiplication: If blocks of \(A\) can multiply blocks of \(B\) , then block multiplication of \(AB\) is allowed. Cuts between columns of \(A\) match cuts between rows of \(B\):
      \(\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} B_{11} \\ B_{21} \end{bmatrix}=\begin{bmatrix} A_{11}B_{11}+A_{12}B_{21} \\ A_{21}B_{11}+A_{22}B_{21} \end{bmatrix}\)
    6. Block elimination: \(\left[\begin{array}{c|c} I & 0 \\ \hline -CA^{-1} & I \end{array}\right]\left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right]=\left[\begin{array}{c|c} A & B \\ \hline 0 & D-CA^{-1}B \end{array}\right]\)
      The final block \(D-CA^{-1}B\) is called the Schur complement.
  5. Inverse Matrices

    1. The inverse matrix gives \(A^{-1}A=I\) and \(AA^{-1}=I\)
    2. The algorithm to test invertibility is elimination: \(A\) must have \(n\) (nonzero) pivots.
      The algebra test for invertibility is the determinant of \(A\) : det \(A\) must not be zero.
    3. Important. If \(Ax=0\) for a nonzero vector \(x\) , then \(A\) has no inverse.
      Suppose there is a nonzero vector \(x\) such that \(Ax=0\) . Then \(A\) cannot have an inverse. No matrix can bring 0 back to \(x\)
    4. The Inverse of a Product \(AB\)
      If \(A\) and \(B\) are invertible then so is \(AB\) . The inverse of a product \(AB\) is \((AB)^{-1}=B^{-1}A^{-1}\)
    5. Calculating \(A^{-1}\) by Gauss-Jordan Elimination
      \(AA^{-1}=I\) is \(n\) equations for \(n\) columns of \(A^{-1}\) . Gauss-Jordan eliminates \(\begin{bmatrix} A & I \end{bmatrix}\) to \(\begin{bmatrix} I & A^{-1}\end{bmatrix}\) :
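      A minimal Gauss-Jordan sketch in NumPy (assuming \(A\) is invertible and no row exchanges are needed):

      ```python
      import numpy as np

      def gauss_jordan_inverse(A):
          """Eliminate [A I] to [I A^-1]."""
          n = A.shape[0]
          M = np.hstack([A.astype(float), np.eye(n)])   # the augmented matrix [A I]
          for j in range(n):
              M[j] /= M[j, j]                           # make the pivot equal to 1
              for i in range(n):
                  if i != j:
                      M[i] -= M[i, j] * M[j]            # zeros above and below the pivot
          return M[:, n:]

      A = np.array([[2.0, 5.0], [3.0, 7.0]])
      print(gauss_jordan_inverse(A))                    # compare with np.linalg.inv(A)
      ```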
    6. Recognizing an Invertible Matrix
      Diagonally dominant matrices are invertible. Each \(|a_{ii}|\) on the diagonal is larger than the total sum of the other entries \(|a_{ij}|\) along the rest of row \(i\) . On every row,
      \[|a_{ii}|>\sum_{j\neq i}|a_{ij}|\]
  6. Elimination = Factorization: \(A=LU\)

    1. Gaussian elimination (with no row exchanges) factors \(A\) into \(L\) times \(U\)

    2. Each elimination step \(E_{ij}\) is inverted by \(E_{ij}^{-1}=L_{ij}\) . Off the main diagonal change \(-\ell_{ij}\) to \(+\ell_{ij}\)

    3. The whole forward elimination process (with no row exchanges) is inverted by \(L\) :
      \[\bm{L}=(L_{21}L_{31}\dots L_{n1})(L_{32}\dots L_{n2})(L_{43}\dots L_{n3})\dots(L_{n,n-1})\]

    4. That product matrix \(L\) is still lower triangular. Every multiplier \(\bm{\ell_{ij}}\) is in row \(\bm{i}\) , column \(\bm{j}\)

    5. The original \(A\) is recovered from \(U\) by \(\bm{A=LU}=(\text{lower triangular})(\text{upper triangular})\)

      • Better balance from LDU
        Divide \(\bm{U}\) by a diagonal matrix \(\bm{D}\) that contains the pivots:
        Split \(U\) into \(\begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{bmatrix}\begin{bmatrix} 1 & \frac{u_{12}}{d_1} & \frac{u_{13}}{d_1} & \cdot \\ & 1 & \frac{u_{23}}{d_2} & \cdot \\ & & \ddots & \vdots \\ & & & 1 \end{bmatrix}\)
    6. Elimination on \(Ax=b\) reaches \(Ux=c\) . Then back-substitution solves \(Ux=c\)
      (\(Ax=b\rArr LUx=b\rArr Ux=L^{-1}b=c\))

    7. Solving a triangular system takes \(\frac{n^2}{2}\) multiply-subtracts. Elimination to find \(U\) takes \(\frac{n^3}{3}\)
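      A minimal NumPy sketch of the factorization \(A=LU\) (assuming no row exchanges; each multiplier \(\ell_{ij}\) goes straight into \(L\)):

      ```python
      import numpy as np

      def lu_no_exchanges(A):
          """Factor A = LU by elimination, assuming no row exchanges are needed."""
          n = A.shape[0]
          U = A.astype(float).copy()
          L = np.eye(n)
          for j in range(n):
              for i in range(j + 1, n):
                  L[i, j] = U[i, j] / U[j, j]       # multiplier l_ij goes into L
                  U[i, :] -= L[i, j] * U[j, :]      # subtract l_ij times row j from row i
          return L, U

      A = np.array([[2.0, 4.0, -2.0], [4.0, 9.0, -3.0], [-2.0, -3.0, 7.0]])
      L, U = lu_no_exchanges(A)
      assert np.allclose(L @ U, A)                  # A = (lower triangular)(upper triangular)
      print(L, U, sep="\n")
      ```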

  7. Transposes and Permutations

    1. The transpose exchanges rows and columns: \((A^T)_{ij}=A_{ji}\)
    2. The transposes of:
      Sum The transpose of \(A+B\) is \(A^T+B^T\)
      Product The transpose of \(AB\) is \((AB)^T=B^TA^T\)
      Inverse The transpose of \(A^{-1}\) is \((A^{-1})^T=(A^T)^{-1}\)
      The reverse order rule extends to three or more factors: \((ABC)^T\) equals \(C^TB^TA^T\)
      • Product proof:
        To understand \((AB)^T=B^TA^T\) , start with \((Ax)^T=x^TA^T\) when \(B\) is just a vector:
        \(\bm{Ax}\) combines the columns of \(\bm{A}\) while \(\bm{x^TA^T}\) combines the rows of \(\bm{A^T}\): \((x^TA^T=x_1\cdot(\textbf{row 1})+x_2\cdot(\textbf{row 2})+x_3\cdot(\textbf{row 3}))\)
        Now we can prove the formula \((AB)^T=B^TA^T\) , when \(B\) has several columns.
        Transposing \(AB=\begin{bmatrix} Ab_1 & Ab_2 & \cdots \end{bmatrix}\) gives \(\begin{bmatrix} b_1^TA^T \\ b_2^TA^T \\ \vdots \end{bmatrix}\) which is \(B^TA^T\)
      • Inverse proof:
        Transpose of inverse \(A^{-1}A=I\) is transposed to \(A^T(A^{-1})^T=I\rArr(A^{-1})^T=(A^T)^{-1}\)
    3. The dot product (inner product) is \(x\cdot y=x^Ty\) . This is \((1\times n)(n\times 1)=(1\times 1)\) matrix.
      The outer product is \(xy^T=\) column times row \(=(n\times 1)(1\times n)=n\times n\) matrix.
    4. The idea behind \(A^T\) is that \(Ax\cdot y=x\cdot A^Ty\) because \((Ax)^Ty=x^TA^Ty=x^T(A^Ty)\)
    5. A symmetric matrix has \(S^T=S\) (and the product \(A^TA\) is always symmetric, because the transpose of \(A^TA\) is \(A^T(A^T)^T=A^TA\))
      If \(S=S^T\) is factored into \(LDU\) with no row exchanges, then \(U\) is exactly \(L^T\) , which is \(S=LDL^T\)
    6. Permutation Matrices
      There are \(n!\) permutation matrices of size \(n\). Half even, half odd.
      Permutation matrices are examples of orthogonal matrices \(Q\) (the columns of \(Q\) are orthogonal unit vectors).
    7. The transpose of a permutation matrix equals its inverse: \(\bm{P^{-1}=P^T}\)
    8. The \(PA=LU\) Factorization with Row Exchanges
      If \(A\) is invertible then a permutation \(P\) will reorder its rows for \(PA=LU\)

      Sometimes row exchanges are needed to produce pivots. Then \(A=(E^{-1}\cdots P^{-1}\cdots E^{-1}\cdots P^{-1})U\).
      (Every row exchange is carried out by a \(P_{ij}\) and inverted by that \(P_{ij}\)).
      The main question is where to collect the \(P_{ij}\)'s. There are two good possibilities:

      1. The row exchanges can be done in advance. Their product \(P\) puts the rows of \(A\) in the right order, so that no exchanges are needed for \(PA\). Then \(\bm{PA=LU}\).
      2. If we hold row exchanges until after elimination, the pivot rows are in a strange order. \(P_1\) puts them in the correct triangular order in \(U_1\). Then \(\bm{A=L_1P_1U_1}\)

      \(PA=LU\) is constantly used in all computing. We will concentrate on this form.

Vector Spaces and Subspaces

  1. Spaces of Vectors
    1. The standard \(n\)-dimensional space \(\textbf{R}^n\) contains all real column vectors with \(n\) components.
      (The components of \(v\) are real numbers, which is the reason for the letter \(\textbf{R}\). A vector whose \(n\) components are complex numbers lies in the space \(\textbf{C}^n\))

    2. If \(v\) and \(w\) are in a vector space \(\bm{S}\) , every combination \(c\bm{v}+d\bm{w}\) must be in \(\bm{S}\) .

    3. \(\textbf{M}\) : The vector space of all real 2 by 2 matrices.
      \(\textbf{F}\) : The vector space of all real functions \(f(x)\) .
      \(\textbf{Z}\) : The vector space that consists only of a zero vector.

    4. A subspace of \(\textbf{R}^n\) is a vector space inside \(\textbf{R}^n\).
      Here is a list of all the possible subspaces of \(\textbf{R}^3\) :

      $$ \boxed{\begin{array}{ll} (\textbf{L}) \text{ Any line through } (0,0,0) & (\textbf{R}^3) \text{ The whole space} \\ (\textbf{P}) \text{ Any plane through } (0,0,0) & (\textbf{Z}) \text{ The single vector } (0,0,0) \end{array}} $$

      A subspace containing \(\bm{v}\) and \(\bm{w}\) must contain all linear combinations \(c\bm{v}+d\bm{w}\)

    5. The column space of \(A\) (called \(\bm{C}(A)\)) contains all combinations of the columns of \(A\) (A subspace of \(\textbf{R}^m\)) .
      In other words, the column space is “spanned” by the columns of \(A\) .

    6. The column space contains all the vectors \(Ax\) . So \(A\bm{x}=b\) is solvable when \(b\) is in \(\bm{C}(A)\)

  2. The Nullspace of A: Solving \(\bm{Ax=0}\) and \(\bm{Rx=0}\)
    1. The nullspace \(\bm{N}(A)\) in \(\bm{R}^n\) contains all solutions \(x\) to \(A\bm{x}=0\) . This includes \(\bm{x}=0\)

    2. Elimination (from \(A\) to \(R\)) does not change the nullspace: \(\bm{N}(A)=\bm{N}(U)=\bm{N}(R)\)

    3. When \(A\) is rectangular, elimination will not stop at the upper triangular \(U\) .
      We can continue to make this matrix simpler, in two ways. These steps bring us to the best matrix, the reduced row echelon matrix \(R\) :

      1. Produce zeros above the pivots. Use pivot rows to eliminate upward in \(\bm{R}\) .
      2. Produce ones in the pivots. Divide the whole pivot row by its pivot

      The reduced row echelon form \(\bm{R=\textbf{rref}(A)}\) has all pivots \(=1\) , with zeros above and below.

    4. Every free column leads to a special solution.
      If column \(j\) of \(R\) is free (no pivot), there is a “special solution” to \(Ax=0\) with \(x_j=1\) , and the other free variables are 0.

    5. Number of pivots \(=\) number of nonzero rows in \(R=\textbf{rank r}\) . There are \(n-r\) free columns.
      The complete solution to \(Ax=0\) is a combination of the \(n-r\) special solutions.

    6. Every matrix with \(m<n\) has nonzero solutions to \(Ax=0\) in its nullspace.
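      A quick check of \(R=\textbf{rref}(A)\), the rank, and the special solutions with SymPy (a sketch; the 3 by 4 matrix below is made up for illustration):

      ```python
      import sympy as sp

      A = sp.Matrix([[1, 1, 2, 3],
                     [2, 2, 8, 10],
                     [3, 3, 10, 13]])

      R, pivot_cols = A.rref()     # reduced row echelon form and the pivot columns
      print(R)                     # pivots = 1, zeros above and below
      print(pivot_cols)            # rank r = number of pivots (here r = 2)
      print(A.nullspace())         # n - r special solutions, one per free column
      ```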

  3. The Complete Solution to \(Ax=b\)
    1. Complete solution to \(A\bm{x}=\bm{b}\) : \(\bm{x}=(\text{one particular solution }\bm{x_p})+(\text{any }\bm{x_n}\text{ in the nullspace})\) .

    2. \(A\bm{x}=\bm{b}\) and \(R\bm{x}=\bm{d}\) are solvable only when all zero rows of \(R\) have zeros in \(\bm{d}\) .

    3. When \(R\bm{x}=\bm{d}\) is solvable, one very particular solution \(\bm{x_p}\) has all free variables equal to zero.

    4. The four possibilities for linear equations depend on the rank \(\bm{r}\)

      $$ \begin{array}{llll} r=m\text{ and }r=n & \textit{Square and invertible} & R=\begin{bmatrix} I \end{bmatrix} & Ax=b\text{ has 1 solution} \\ r=m\text{ and }r\lt n & \textit{Short and wide} & R=\begin{bmatrix} I & F \end{bmatrix} & Ax=b\text{ has }\infty\text{ solutions} \\ r\lt m\text{ and }r=n & \textit{Tall and thin} & R=\begin{bmatrix} I \\ 0 \end{bmatrix} & Ax=b\text{ has 0 or 1 solution} \\ r\lt m\text{ and }r\lt n & \textit{Not full rank} & R=\begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix} & Ax=b\text{ has 0 or }\infty\text{ solutions} \end{array} $$

      \(F\) stands for the free-column part of \(R\) (the non-pivot columns).
      Full column rank \(r=n\) means no free variables, its nullspace \(\bm{N}(A)=\) zero vector
      Full row rank \(r=m\) means one solution if \(m=n\) or infinitely many if \(m<n\) , its column space \(\bm{C}(A)\) is \(\textbf{R}^m\) , \(A\bm{x}=\bm{b}\) is always solvable.

  4. Independence, Basis and Dimension
    1. Independent columns of \(A\) : The only solution to \(Ax=0\) is \(x=0\) . The nullspace is \(Z\) .
    2. Independent vectors: The only zero combination \(c_1v_1+\dots+c_kv_k=0\) has all \(c\)'s \(=0\)
    3. A matrix with \(m<n\) has dependent columns: At least \(n-m\) free variables/special solutions.
    4. The vectors \(v_1,\dots,v_k\) span the space S if \(S=\) all combinations of the \(v\)'s.
    5. The vectors \(v_1,\dots,v_k\) are a basis for S if they are independent and they span \(S\) .
    6. The dimension of a space S is the number of vectors in every basis for S.
    7. If \(A\) is 4 by 4 and invertible, its columns are a basis for \(\textbf{R}^4\) . The dimension of \(\textbf{R}^4\) is 4.
  5. Dimensions of the Four Subspaces
    1. Four Fundamental Subspaces
      1. The row space is \(\bm{C}(A^T)\) , a subspace of \(\textbf{R}^n\) .
      2. The column space is \(\bm{C}(A)\) , a subspace of \(\textbf{R}^m\) .
      3. The nullspace is \(\bm{N}(A)\) , a subspace of \(\textbf{R}^n\) .
      4. The left nullspace is \(\bm{N}(A^T)\) , a subspace of \(\textbf{R}^m\) . This is our new space.
    2. The column space \(C(A)\) and the row space \(C(A^T)\) both have dimension \(r\) (the rank of \(A\))
    3. The nullspace \(\bm{N}(A)\) has dimension \(n-r\) . The left nullspace \(\bm{N}(A^T)\) has dimension \(m-r\) .
    4. Elimination produces bases for the row space and nullspace of \(A\) :
      They are the same as for \(R\) .
      Notice \(\bm{C}(A)\neq\bm{C}(R)\) , elimination often changes the column space and left nullspace (but dimensions don’t change).
    5. Rank one matrices :
      \(\bm{C}(A)\) has basis \(\bm{u}\) , \(\bm{C}(A^T)\) has basis \(v\) , then \(A=\bm{uv}^T=\) column times row.

Orthogonality

  1. Orthogonality of the Four Subspaces
    1. Orthogonal vectors have \(\bm{v}^\mathrm{T}\bm{w}=0\) .
    2. Subspaces \(\bm{V}\) and \(\bm{W}\) are orthogonal when \(\bm{v}^\mathrm{T}\bm{w}=0\) for every \(\bm{v}\) in \(\bm{V}\) and every \(\bm{w}\) in \(\bm{W}\) .
    3. \(\bm{V}\) and \(\bm{W}\) are “orthogonal complements” if \(\bm{W}\) contains all vectors perpendicular to \(\bm{V}\) (and vice versa). Inside \(\bm{R}^n\) , the dimensions of complements \(\bm{V}\) and \(\bm{W}\) add to \(n\) .
    4. The nullspace \(\bm{N}(A)\) and the row space \(\bm{C}(A^\mathrm{T})\) are orthogonal complements, with dimensions \((n-r)+r=n\) . Similarly \(\bm{N}(A^\mathrm{T})\) and \(\bm{C}(A)\) are orthogonal complements with \((m-r)+r=m\) .
    5. Any \(n\) independent vectors in \(\bm{R}^n\) span \(\bm{R}^n\) . Any \(n\) spanning vectors are independent.
  2. Projections
    1. The projection of a vector \(\bm{b}\) onto the line through \(\bm{a}\) is the closest point \(\bm{p=a\frac{a^\mathrm{T}b}{a^\mathrm{T}a}}\)

      Projecting \(\bm{b}\) onto \(\bm{a}\) . The line from \(\bm{b}\) to \(\bm{p}\) is perpendicular to the vector \(\bm{a}\) . This is the dotted line marked \(\bm{e=b-p=b-\hat{x}a}\) .
      \(\bm{a\cdot e=a\cdot(b-\hat{x}a)=0 \rArr a\cdot b-\hat{x}a\cdot a=0 \rArr\boxed{\hat{x}=\frac{a\cdot b}{a\cdot a}=\frac{a^\mathrm{T}b}{a^\mathrm{T}a}}}\)
      Projection matrix \(P\) : \(\boxed{\bm{p=a\hat{x}=a\frac{a^\mathrm{T}b}{a^\mathrm{T}a}=}P\bm{b}}\rArr\boxed{P=\bm{\frac{aa^\mathrm{T}}{a^\mathrm{T}a}}}\)
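      A numerical check of projection onto a line (a NumPy sketch; the vectors \(a\) and \(b\) are made up):

      ```python
      import numpy as np

      a = np.array([1.0, 2.0, 2.0])
      b = np.array([1.0, 1.0, 1.0])

      x_hat = (a @ b) / (a @ a)         # a.b / a.a
      p = x_hat * a                     # projection of b onto the line through a
      e = b - p                         # error vector
      print(p, a @ e)                   # a.e = 0: the error is perpendicular to a

      P = np.outer(a, a) / (a @ a)      # projection matrix P = a a^T / a^T a
      assert np.allclose(P @ b, p)
      ```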

    2. The projection of \(\bm{b}\) onto a subspace \(\bm{S}\) is the closest vector \(\bm{p}\) in \(\bm{S}\), \(\bm{b}-\bm{p}\) is orthogonal to \(S\)
      The equation \(A^\mathrm{T}A\hat{\bm{x}}=A^\mathrm{T}\bm{b}\) leads to \(\hat{\bm{x}}\) and \(\bm{p}=A\hat{\bm{x}}\)

      \(n\) vectors \(a_1,\dots,a_n\) in \(\textbf{R}^m\) are a basis of the subspace.
      The Problem is to find the combination \(p=\hat{x}_1\bm{a}_1+\dots+\hat{x}_n\bm{a}_n\) closest to a given vector \(\bm{b}\) .
      We put these basis vectors into the columns of \(A\) , \(A\) has full rank \(n\) , the subspace is equal to \(\bm{C}(A)\).
      The error vector \(\bm{b-A\hat{x}}\) is perpendicular to the subspace (see Figure 4.6 in the book).
      Then \(\begin{matrix} \bm{a}_1^\mathrm{T}(\bm{b}-A\hat{\bm{x}})=0 \\ \vdots \\ \bm{a}_n^\mathrm{T}(\bm{b}-A\hat{\bm{x}})=0 \end{matrix}\rArr\begin{bmatrix} \bm{a}_1^\mathrm{T} \\ \vdots \\ \bm{a}_n^\mathrm{T} \end{bmatrix}(\bm{b}-A\hat{\bm{x}})=\begin{bmatrix} \\ 0 \\ \\ \end{bmatrix}\rArr A^\mathrm{T}(\bm{b}-A\hat{\bm{x}})=0\)

      • Find \(\hat{\bm{x}}\enspace\footnotesize(n\times 1)\): \(\footnotesize A^\mathrm{T}(\bm{b}-A\hat{\bm{x}})=0\rArr A^\mathrm{T}A\hat{\bm{x}}=A^\mathrm{T}\bm{b}\rArr\hat{\bm{x}}=(A^\mathrm{T}A)^{-1}A^\mathrm{T}\bm{b}\)
        (The symmetric matrix \(A^\mathrm{T}A\) is \(n\) by \(n\) . It is invertible if the \(\bm{a}\)'s are independent)
      • Find \(\bm{p}\enspace\footnotesize(m\times 1)\): \(\footnotesize\bm{p}=A\hat{\bm{x}}=A(A^\mathrm{T}A)^{-1}A^\mathrm{T}\bm{b}=P\bm{b}\)
      • Find \(\bm{P}\enspace\footnotesize(m\times m)\): \(\footnotesize\bm{p}=P\bm{b}\rArr P=A(A^\mathrm{T}A)^{-1}A^\mathrm{T}\)
    3. The projection matrix \(P=A(A^\mathrm{T}A)^{-1}A^\mathrm{T}\) has \(P^2=P=P^T\)
      (\(P=P^2\) because a second projection doesn’t change the first projection)
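      A NumPy sketch of the normal equations and of \(P^2=P=P^\mathrm{T}\) (the matrix \(A\) and the vector \(b\) are chosen for illustration):

      ```python
      import numpy as np

      A = np.array([[1.0, 0.0],
                    [1.0, 1.0],
                    [1.0, 2.0]])      # independent columns: full column rank n = 2
      b = np.array([6.0, 0.0, 0.0])

      x_hat = np.linalg.solve(A.T @ A, A.T @ b)    # normal equations A^T A x_hat = A^T b
      p = A @ x_hat                                # projection of b onto C(A)
      P = A @ np.linalg.inv(A.T @ A) @ A.T         # projection matrix

      assert np.allclose(A.T @ (b - p), 0)         # the error is perpendicular to the columns
      assert np.allclose(P @ P, P) and np.allclose(P, P.T)
      ```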
  3. Least Squares Approximations
    We cannot always get the error \(\bm{e}=\bm{b}-A\bm{x}\) down to zero.
    When \(\bm{e}\) is zero, \(\bm{x}\) is an exact solution to \(A\bm{x}=\bm{b}\) .
    When the length of \(e\) is as small as possible, \(\hat{\bm{x}}\) is a least squares solution.
    1. The least squares solution \(\hat{\bm{x}}\) comes from the normal equations \(A^\mathrm{T}A\hat{\bm{x}}=A^\mathrm{T}\bm{b}\) .
      It minimizes the sum of squares of the errors \(E=||A\bm{x}-\bm{b}||^2\) in the \(m\) equations (\(m>n\)).
    2. To fit \(m\) points by a line \(b=C+Dt\) , the normal equations give \(C\) and \(D\) .
      In that case \(A^\mathrm{T}A\) is the 2 by 2 matrix \(\begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix}\) and \(A^\mathrm{T}\bm{b}\) is the vector \(\begin{bmatrix} \sum b_i \\ \sum t_ib_i \end{bmatrix}\)

      At times \(t_1,\dots,t_m\) those \(m\) points are at heights \(b_1,\dots,b_m\) .
      \(A\bm{x}=\bm{b}\) is \(\begin{matrix} C+Dt_1=b_1 \\ C+Dt_2=b_2 \\ \vdots \\ C+Dt_m=b_m \end{matrix}\) with \(A=\begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}\)
      Solve \(A^\mathrm{T}A\hat{\bm{x}}=A^\mathrm{T}\bm{b}\) for \(\hat{\bm{x}}=(C,D)\) :

      • \(A^\mathrm{T}A=\begin{bmatrix} 1 & \dots & 1 \\ t_1 & \dots & t_m \end{bmatrix}\begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}=\begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix}\)
      • \(A^\mathrm{T}b=\begin{bmatrix} 1 & \dots & 1 \\ t_1 & \dots & t_m \end{bmatrix}\begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}=\begin{bmatrix} \sum b_i \\ \sum t_ib_i \end{bmatrix}\)

      then \(A^\mathrm{T}A\hat{\bm{x}}=A^\mathrm{T}\bm{b}\hArr\begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix}\begin{bmatrix} C \\ D \end{bmatrix}=\begin{bmatrix} \sum b_i \\ \sum t_ib_i \end{bmatrix}\)

    3. Setting partial derivatives of \(E=||A\bm{x}-\bm{b}||^2\) to zero (\(\frac{\partial E}{\partial x_i}=0\)) also produces \(A^\mathrm{T}A\hat{\bm{x}}=A^\mathrm{T}\bm{b}\)
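      A short least squares fit of the line \(b=C+Dt\) in NumPy (a sketch; the three data points are made up):

      ```python
      import numpy as np

      t = np.array([0.0, 1.0, 2.0])
      b = np.array([6.0, 0.0, 0.0])               # heights b_i measured at times t_i

      A = np.column_stack([np.ones_like(t), t])   # columns [1, t] for the line C + D t
      C, D = np.linalg.solve(A.T @ A, A.T @ b)    # normal equations give C and D
      print(C, D)                                 # the best line is b = C + D t

      # the built-in least squares solver gives the same answer
      print(np.linalg.lstsq(A, b, rcond=None)[0])
      ```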
  4. Orthonormal Bases and Gram-Schmidt
    1. If the orthonormal vectors \(\bm{q_1},\dots,\bm{q_n}\) are the columns of \(Q\) , then \(\bm{q_i^\mathrm{T}q_j}=0\) (\(i\neq j\)) and \(\bm{q_i^\mathrm{T}q_i}=1\) translate into the matrix multiplication \(Q^\mathrm{T}Q=I\) . When \(Q\) is square it is called an orthogonal matrix.

    2. The least squares solution to \(Q\bm{x}=\bm{b}\) is \(\hat{\bm{x}}=Q^\mathrm{T}\bm{b}\) . Projection of \(\bm{b}:\bm{p}=QQ^\mathrm{T}\bm{b}=\bm{q_1}(\bm{q_1}^\mathrm{T}\bm{b})+\dots+\bm{q_n}(\bm{q_n}^\mathrm{T}\bm{b})=P\bm{b}\)

      \(\bm{p}=\begin{bmatrix} | & & | \\ \bm{q_1} & \cdots & \bm{q_n} \\ | & & | \end{bmatrix}\begin{bmatrix} \bm{q_1^\mathrm{T}b} \\ \vdots \\ \bm{q_n^\mathrm{T}b} \end{bmatrix}=\bm{q_1}(\bm{q_1}^\mathrm{T}\bm{b})+\dots+\bm{q_n}(\bm{q_n}^\mathrm{T}\bm{b})\)

      So the projection onto the column space of \(Q\) spanned by the \(\bm{q}\)'s is \(P=QQ^\mathrm{T}\)
      If \(Q\) is square then \(Q^\mathrm{T}=Q^{-1}\) (transpose = inverse), \(P=QQ^\mathrm{T}=I\) and \(\bm{b=p}\)

    3. The Gram-Schmidt process takes independent \(\bm{a_i}\) to orthonormal \(\bm{q_i}\) . Start with \(\bm{q_1}=\frac{a_1}{||a_1||}\)
      \(\bm{q_i}\) is \(\frac{\bm{a_i}-\text{projection }\bm{p_i}}{||\bm{a_i}-\bm{p_i}||}\) ; \(\text{projection }\bm{p_i}=(\bm{a_i}^\mathrm{T}\bm{q_1})\bm{q_1}+\dots+(\bm{a_i}^\mathrm{T}\bm{q_{i-1}})\bm{q_{i-1}}\)

    4. Each \(a_i\) will be a combination of \(\bm{q_1}\) to \(\bm{q_i}\) . Then \(\bm{A=QR}\) : orthogonal \(Q\) and triangular \(R\)

      \(\bm{A=QR}\hArr\begin{bmatrix} | & | & | \\ \bm{a} & \bm{b} & \bm{c} \\ | & | & |\end{bmatrix}=\begin{bmatrix} | & | & | \\ \bm{q_1} & \bm{q_2} & \bm{q_3} \\ | & | & | \end{bmatrix}\begin{bmatrix} \bm{q_1^\mathrm{T}a} & \bm{q_1^\mathrm{T}b} & \bm{q_1^\mathrm{T}c} \\ & \bm{q_2^\mathrm{T}b} & \bm{q_2^\mathrm{T}c} \\ & & \bm{q_3^\mathrm{T}c} \end{bmatrix}\)
      Multiply by \(Q^T\) to recognize \(\bm{R=Q^\mathrm{T}A}\) above.
      We must not forget why this is useful for least squares: \(\bm{A^\mathrm{T}A=(QR)^\mathrm{T}QR=R^\mathrm{T}Q^\mathrm{T}QR=R^\mathrm{T}R}\) . The least squares equation \(A^\mathrm{T}A\hat{\bm{x}}=A^\mathrm{T}\bm{b}\) simplifies to \(R^\mathrm{T}R\hat{\bm{x}}=R^\mathrm{T}Q^\mathrm{T}\bm{b}\rArr R\hat{\bm{x}}=Q^\mathrm{T}\bm{b}\)
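      A minimal Gram-Schmidt sketch in NumPy (the function below is illustrative, not the book's code; np.linalg.qr returns the same factorization up to signs):

      ```python
      import numpy as np

      def gram_schmidt(A):
          """Orthonormalize the independent columns of A; return Q and R = Q^T A."""
          m, n = A.shape
          Q = np.zeros((m, n))
          for i in range(n):
              v = A[:, i].copy()
              for k in range(i):                        # subtract projections on earlier q's
                  v -= (Q[:, k] @ A[:, i]) * Q[:, k]
              Q[:, i] = v / np.linalg.norm(v)
          return Q, Q.T @ A

      A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
      Q, R = gram_schmidt(A)
      assert np.allclose(Q.T @ Q, np.eye(2)) and np.allclose(Q @ R, A)
      print(np.linalg.qr(A))
      ```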

Determinants

The determinant of \(A=\begin{bmatrix} a & b \\ c & d \end{bmatrix}\) is \(\operatorname{det}A=\begin{vmatrix} a & b \\ c & d \end{vmatrix}=ad-bc\)

  1. The Properties of Determinants
    1. The determinant of the \(\bm{n}\) by \(\bm{n}\) identity matrix is 1
      \(\begin{vmatrix} 1 & 0 \\ 0 & 1\end{vmatrix}=1\) and \(\begin{vmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{vmatrix}=1\)

    2. The determinant changes sign when two rows are exchanged (sign reversal):
      Check: \(\begin{vmatrix} a & b \\ c & d \end{vmatrix}=-\begin{vmatrix} c & d \\ a & b \end{vmatrix}\)
      Because of this rule, we can find \(\operatorname{det} P\) for any permutation matrix. Just exchange rows of \(I\) until you reach \(P\) . Then \(\operatorname{det} P=+1\) for an even number of row exchanges and \(\operatorname{det} P=-1\) for an odd number.

    3. The determinant is a linear function of each row separately (all other rows stay fixed)
      multiply row 1 by any number \(\bm{t}\) , \(\operatorname{det}\) is multiplied by \(\bm{t}\): \(\begin{vmatrix} ta & tb \\ c & d \end{vmatrix}=t\begin{vmatrix} a & b \\ c & d \end{vmatrix}\)
      add row 1 of \(A\) to row 1 of \(A^\prime\) , then determinants add: \(\begin{vmatrix} a+a^\prime & b+b^\prime \\ c & d \end{vmatrix}=\begin{vmatrix} a & b \\ c & d \end{vmatrix}+\begin{vmatrix} a^\prime & b^\prime \\ c & d \end{vmatrix}\)

    4. If two rows of \(A\) are equal, then \(\operatorname{\textbf{det}}\bm{A=0}\)
      Check 2 by 2: \(\begin{vmatrix} a & b \\ a & b \end{vmatrix}=0\)

      Rule 4 follows from rule 2. Exchange the two equal rows. The determinant \(D\) is supposed to change sign. But also \(D\) has to stay the same, because the matrix is not changed. The only number with \(-D=D\) is \(D=0\)

    5. Subtracting a multiple of one row from another row leaves \(\operatorname{\textbf{det}}A\) unchanged.
      \(\ell\) times row 1 from row 2: \(\begin{vmatrix} a & b \\ c-\ell a & d-\ell b \end{vmatrix}=\begin{vmatrix} a & b \\ c & d \end{vmatrix}+\begin{vmatrix} a & b \\ -\ell a & -\ell b \end{vmatrix}=\begin{vmatrix} a & b \\ c & d \end{vmatrix}\) (Rule 5 follows from rule 3 and rule 4)
      The determinant is not changed by the usual elimination steps from \(A\) to \(U\) . But every row exchange reverses the sign, so always \(\operatorname{det} A=\pm\operatorname{det}U\) . Rule 5 has narrowed the problem to triangular matrices.

    6. A Matrix with a row of zeros has \(\operatorname{\textbf{det}}A=0\)
      Rule 6 follows from rule 5 and rule 4: \(\begin{vmatrix} a & b \\ 0 & 0\end{vmatrix}=\begin{vmatrix} a & b \\ a & b \end{vmatrix}=0\)

    7. If \(\bm{A}\) is triangular then \(\operatorname{\textbf{det}}\bm{A=a_{11}a_{22}\cdots a_{nn}=}\) product of diagonal entries
      Rule 7 follows rule 5, rule 3 and rule 1: \(\begin{vmatrix} a & b \\ 0 & d \end{vmatrix}=\begin{vmatrix} a & 0 \\ 0 & d \end{vmatrix}=ad\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix}=ad\)

    8. If \(\bm{A}\) is singular then \(\operatorname{\textbf{det}}\bm{A=0}\) . If \(A\) is invertible then \(\operatorname{\textbf{det}}\bm{A\neq 0}\)
      Proof Elimination goes from \(A\) to \(U\) . If \(A\) is singular then \(U\) has a zero row. The rules give \(\operatorname{det}A=\operatorname{det}U=0\) . If \(A\) is invertible then \(U\) has the pivots along its diagonal. The product of nonzero pivots (using rule 7) gives a nonzero determinant

    9. The determinant of \(\bm{AB}\) is \(\operatorname{\textbf{det}}\bm{A}\) times \(\operatorname{\textbf{det}}\bm{B}\) : \(|AB|=|A||B|\)
      For the \(n\) by \(n\) case, here is a snappy proof. When \(|B|\) is not zero, consider the ratio \(D(A)=\frac{|AB|}{|B|}\) . Check that this ratio \(D(A)\) has properties 1,2,3. Then \(D(A)\) has to be the \(\operatorname{det}A\) and we have \(\frac{|AB|}{|B|}=|A|\)

      • Property 1 (Determinant of \(I\)) If \(A=I\) then the ratio \(D(A)\) becomes \(\frac{|B|}{|B|}=1\)
      • Property 2 (Sign reversal) When two rows of \(A\) are exchanged, so are the same two rows of \(AB\) . Therefore \(|AB|\) changes sign and so does the ratio \(\frac{|AB|}{|B|}\)
      • Property 3 (Linearity)
        When row 1 of \(A\) is multiplied by \(t\) , so is row 1 of \(AB\) . This multiplies the determinant \(|AB|\) by \(t\) . So the ratio \(\frac{|AB|}{|B|}\) is multiplied by \(t\)
        Add row 1 of \(A\) to row 1 of \(A^\prime\) . Then row 1 of \(AB\) adds to row 1 of \(A^\prime B\) . By rule 3, determinants add. After dividing by \(|B|\) , the ratios add–as desired.

      Conclusion This ratio \(\frac{|AB|}{|B|}\) has the same three properties that define \(|A|\) . Therefore it equals \(|A|\) . This proves the product rule \(|AB|=|A||B|\) . The case \(|B|=0\) is separate and easy, because \(AB\) is singular when \(B\) is singular. Then \(|AB|=|A||B|\) is \(0=0\)

    10. The transpose \(A^\mathrm{T}\) has the same determinant as \(A\)
      The proof of \(|A|=|A^\mathrm{T}|\) comes by using rule 9 for products:
      Compare \(\operatorname{det}P\operatorname{det}A=\operatorname{det}L\operatorname{det}U\) with \(\operatorname{det}A^\mathrm{T}\operatorname{det}P^\mathrm{T}=\operatorname{det}U^\mathrm{T}\operatorname{det}L^\mathrm{T}\)
      First, \(\operatorname{det}L=\operatorname{det}L^\mathrm{T}=1\) (both have 1’s on the diagonal). Second, \(\operatorname{det}U=\operatorname{det}U^\mathrm{T}\) (those triangular matrices have the same diagonal). Third, \(\operatorname{det}P=\operatorname{det}P^\mathrm{T}\) (permutations have \(P^\mathrm{T}P=I\) , so \(|P^\mathrm{T}||P|=1\) by rule 9; thus \(|P|\) and \(|P^\mathrm{T}|\) both equal 1 or both equal -1). So \(L\), \(U\), \(P\) have the same determinants as \(L^\mathrm{T}\), \(U^\mathrm{T}\), \(P^\mathrm{T}\) and this leaves \(\operatorname{det}A=\operatorname{det}A^\mathrm{T}\)

  2. Permutations and Cofactors
    1. The Pivot Formula: \((\operatorname{det}P)(\operatorname{det}A)=(\operatorname{det}L)(\operatorname{det}U)\) gives \(\operatorname{det}A=\pm(d_1d_2\cdots d_n)\)
    2. The Big Formula:
      \(\operatorname{det}A=\) sum over all \(n!\) column permutations \(P=(\alpha,\beta,\dots,\omega)=\sum(\operatorname{det}P)a_{1\alpha}a_{2\beta}\cdots a_{n\omega}\)

      An order-2 determinant splits into 4 simpler determinants (by rule 3):
      \(\begin{vmatrix} a & b \\ c & d \end{vmatrix}=\begin{vmatrix} a & 0 \\ c & d \end{vmatrix}+\begin{vmatrix} 0 & b \\ c & d \end{vmatrix}=\begin{vmatrix} a & 0 \\ c & 0 \end{vmatrix}+\begin{vmatrix} a & 0 \\ 0 & d \end{vmatrix}+\begin{vmatrix} 0 & b \\ c & 0 \end{vmatrix}+\begin{vmatrix} 0 & b \\ 0 & d \end{vmatrix}\)
      Try \(n=3\) . Each row splits into 3 simpler rows like \(\begin{bmatrix} a_{11} & 0 & 0 \end{bmatrix}\) . Using linearity in each row, \(\operatorname{det}A\) splits into \(3^3=27\) simple determinants. If a column choice is repeated (for example column 1 is chosen in both row 1 and row 2, as in \(\begin{vmatrix} a_{11} & 0 & 0 \\ a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \end{vmatrix}=\begin{vmatrix} a_{11} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & a_{32} & 0 \end{vmatrix}\)), then the simple determinant is zero.
      We pay attention only when the entries \(\bm{a_{ij}}\) come from different columns, like col 3, 1, 2:

      $$ \begin{array}{rl} \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}= & \begin{vmatrix} a_{11} & & \\ & a_{22} & \\ & & a_{33} \end{vmatrix}+\begin{vmatrix} & a_{21} & \\ & & a_{23} \\ a_{31} & & \end{vmatrix}+\begin{vmatrix} & & a_{13} \\ a_{21} & & \\ & a_{32} & \end{vmatrix} \\ & +\begin{vmatrix} a_{11} & & \\\ & & a_{23} \\\ & a_{32} & \end{vmatrix}+\begin{vmatrix} & a_{12} & \\ a_{21} & & \\ & & a_{33} \end{vmatrix}+\begin{vmatrix} & & a_{13} \\ & a_{22} & \\ a_{31} & & \end{vmatrix} \\ = & a_{11}a_{22}a_{33}\begin{vmatrix} 1 & & \\ & 1 & \\ & & 1 \end{vmatrix}+a_{12}a_{23}a_{31}\begin{vmatrix} & 1 & \\ & & 1 \\ 1 & & \end{vmatrix}+a_{13}a_{21}a_{32}\begin{vmatrix} & & 1 \\ 1 & & \\ & 1 & \end{vmatrix} \\ & +a_{11}a_{23}a_{32}\begin{vmatrix} 1 & & \\ & & 1 \\ & 1 & \end{vmatrix}+a_{12}a_{21}a_{33}\begin{vmatrix} & 1 & \\ 1 & & \\ & & 1 \end{vmatrix}+a_{13}a_{22}a_{31}\begin{vmatrix} & & 1 \\ & 1 & \\ 1 & & \end{vmatrix} \end{array} $$

      The formula has \(n!\) terms. For \(n=3\) there are \(3!=6\) terms.
      The last three are odd permutations (one exchange). The first three are even permutations (0 or 2 exchanges).
      Now you can see the \(n\) by \(n\) formula. There are \(n!\) orderings of the columns. The columns (\(1, 2, \dots, n\)) go in each possible order (\(\alpha, \beta, \dots, \omega\)) . Taking \(a_{1\alpha}\) from row 1 and \(a_{2\beta}\) from row 2 and eventually \(a_{n\omega}\) from row \(n\) , the determinant contains the product \(a_{1\alpha}a_{2\beta}\cdots a_{n\omega}\) times +1 or -1. Half the column orderings have sign -1.
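      The big formula checked by brute force in Python (a sketch using itertools.permutations; it is only practical for small \(n\), since there are \(n!\) terms):

      ```python
      import numpy as np
      from itertools import permutations

      def det_big_formula(A):
          """Sum over all n! column orderings of (det P) * a_{1,alpha} * ... * a_{n,omega}."""
          n = A.shape[0]
          total = 0.0
          for perm in permutations(range(n)):
              P = np.eye(n)[list(perm)]                 # the permutation matrix for this ordering
              sign = round(np.linalg.det(P))            # +1 for even, -1 for odd
              total += sign * np.prod([A[i, perm[i]] for i in range(n)])
          return total

      A = np.random.rand(4, 4)
      print(det_big_formula(A), np.linalg.det(A))       # the two values agree
      ```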

    3. Determinant by Cofactors:
      The determinant is the dot product of any row \(i\) of \(A\) with its cofactors: \(\operatorname{det}A=a_{i1}C_{i1}+a_{i2}C_{i2}+\cdots+a_{in}C_{in}\)
      Each cofactor \(C_{ij}\) (order \(n-1\) , without row \(i\) and column \(j\)) includes its correct sign: \(C_{ij}=(-1)^{i+j}\operatorname{det}M_{ij}\)

      Split an order 3 determinant into 3 simpler determinants (by rule 3 and rule 5):

      $$ \begin{array}{rl} \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}= & \begin{vmatrix} a_{11} & & \\ & a_{22} & a_{23} \\ & a_{32} & a_{33} \end{vmatrix}+\begin{vmatrix} & a_{12} & \\ a_{21} & & a_{23} \\ a_{31} & & a_{33} \end{vmatrix}+\begin{vmatrix} & & a_{13} \\ a_{21} & a_{22} & \\ a_{31} & a_{32} & \end{vmatrix} \\ = & a_{11}\begin{vmatrix} 1 & & \\ & a_{22} & a_{23} \\ & & a_{33}-\frac{a_{32}}{a_{22}}a_{23} \end{vmatrix}-a_{12}\begin{vmatrix} a_{21} & & a_{23} \\ & 1 & \\ & & a_{33}-\frac{a_{31}}{a_{21}}a_{23} \end{vmatrix}+a_{13}\begin{vmatrix} a_{21} & a_{22} & \\ & a_{32}-\frac{a_{31}}{a_{21}}a_{22} \\ & & 1 \end{vmatrix} \\ = & a_{11}(a_{22}a_{33}-a_{23}a_{32})-a_{12}(a_{21}a_{33}-a_{23}a_{31})+a_{13}(a_{21}a_{32}-a_{22}a_{31}) \\ = & a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}-a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}+a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \\ = & a_{11}M_{11}-a_{12}M_{12}+a_{13}M_{13} \end{array} $$

      Whatever is possible for row 1 is possible for row \(i\) . The sign \((-1)^{i+j}\) multiplies the determinant of \(M_{ij}\) to give \(C_{ij}\)

  3. Cramer’s Rule, Inverses, and Volumes
    1. Cramer’s Rule computes \(\bm{x}=A^{-1}\bm{b}\) from \(x_j=\frac{\operatorname{det}(A\text{ with column }j\text{ changed to }\bm{b})}{\operatorname{det}A}\)

      Replacing the first column of \(I\) by \(\bm{x}\) gives a matrix with determinant \(x_1\) . When you multiply it by \(A\) , the first column becomes \(A\bm{x}\) which is \(\bm{b}\) . The other columns of \(B_1\) are copied from \(A\) :
      Key idea \(A\begin{bmatrix} \bm{x_1} & 0 & 0 \\ \bm{x_2} & 1 & 0 \\ \bm{x_3} & 0 & 1 \end{bmatrix}=\begin{bmatrix} \bm{b_1} & a_{12} & a_{13} \\ \bm{b_2} & a_{22} & a_{23} \\ \bm{b_3} & a_{32} & a_{33} \end{bmatrix}=B_1\) .
      According to determinant rule 9 we get \((\operatorname{det}A)(x_1)=\operatorname{det}B_1\) and then \(x_1=\frac{\operatorname{det}B_1}{\operatorname{det}A}\)
      To find \(x_2\) and \(B_2\) , put the vectors \(\bm{x}\) and \(\bm{b}\) into the second columns of \(I\) and \(A\) (and similarly for \(x_3\))
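      A direct check of Cramer's Rule in NumPy (a sketch; the 2 by 2 system reuses the matrix from Chapter 1 of these notes):

      ```python
      import numpy as np

      def cramer_solve(A, b):
          """x_j = det(A with column j changed to b) / det(A)."""
          n = len(b)
          det_A = np.linalg.det(A)
          x = np.zeros(n)
          for j in range(n):
              B_j = A.astype(float).copy()
              B_j[:, j] = b                     # replace column j by b
              x[j] = np.linalg.det(B_j) / det_A
          return x

      A = np.array([[2.0, 5.0], [3.0, 7.0]])
      b = np.array([12.0, 17.0])
      print(cramer_solve(A, b), np.linalg.solve(A, b))   # same answer both ways
      ```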

    2. The \(i\), \(j\) entry of \(A^{-1}\) is the cofactor \(C_{ji}\) (not \(C_{ij}\)) divided by \(\operatorname{det}A\) :
      \((A^{-1})_{ij}=\frac{C_{ji}}{\operatorname{det}A}\) and \(A^{-1}=\frac{C^\mathrm{T}}{\operatorname{det}A}\)

      Direct proof of the formula \(A^{-1}=\frac{C^\mathrm{T}}{\operatorname{det}A}\) , this means \(AC^\mathrm{T}=(\operatorname{det}A)I\) :
      \(\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}\begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix}=\begin{bmatrix} \operatorname{det}A & 0 & 0 \\ 0 & \operatorname{det}A & 0 \\ 0 & 0 & \operatorname{det}A \end{bmatrix}\)
      But how to explain the zeros off the main diagonal? The rows of \(A\) are multiplying cofactors from different rows. Why is the answer zero?
      e.g. Row 2 of \(A\) with Row 1 of \(C\) : \(a_{21}C_{11}+a_{22}C_{12}+a_{23}C_{13}\)
      Answer: \(a_{21}C_{11}+a_{22}C_{12}+a_{23}C_{13}=\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}\)
      The new matrix has two equal rows, so its determinant is zero.

    3. Area of triangle: if the three corners are \((x_1, y_1)\) , \((x_2, y_2)\) , \((x_3, y_3)\) , then its area=\(\frac{1}{2}|\begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}|\)
      Area of a parallelogram: if the four corners are \((0, 0)\) , \((a, b)\) , \((c, d)\) , and \((a+c, b+d)\) , then its area \(=|\begin{vmatrix} a & b \\ c & d \end{vmatrix}|=|ad-bc|\)
    4. Volume of a box \(=|\operatorname{det}A|\) if the rows of \(A\) (or the columns of \(A\)) give the sides of the box.
    5. The cross product \(\bm{w}=\bm{u}\times\bm{v}\) is \(\operatorname{det}\begin{bmatrix} \bm{i} & \bm{j} & \bm{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix}\)
      \(\bm{u}\times\bm{v}=-(\bm{v}\times\bm{u})\)
      the new vector \(\bm{u}\times\bm{v}\) is perpendicular to \(u\) and \(v\) :

      for \(n=3\):
      \(\begin{array}{rl} u\cdot(u\times v) & =u_1(u_2v_3-u_3v_2)+u_2(u_3v_1-u_1v_3)+u_3(u_1v_2-u_2v_1) \\ & =u_1w_1+u_2w_2+u_3w_3 \\ & =\bm{u}^\mathrm{T}\bm{w}=0 \end{array}\)
      \(w_1\) , \(w_2\) , \(w_3\) are cofactors of row 1
      similarly \(\bm{v}^\mathrm{T}\bm{w}=0\)

Eigenvalues and Eigenvectors

  1. Introduction to Eigenvalues
    1. \(A\bm{x}=\lambda\bm{x}\) says that eigenvectors \(\bm{x}\) keep the same direction when multiplied by \(A\).
      It also says that \((A-\lambda I)\bm{x}=0\) and \(\operatorname{det}(A-\lambda I)=0\) . This determines \(n\) eigenvalues (each eigenvalue \(\lambda\) finds an eigenvector \(x\)).
      To solve the eigenvalue problem for an \(n\) by \(n\) matrix, follow these steps:
      1. Compute the determinant \(\operatorname{det}(A-\lambda I)\).
        With \(\lambda\) subtracted along the diagonal, this determinant starts with \(\lambda^n\) or \(-\lambda^n\) . It is a polynomial in \(\lambda\) of degree \(n\) .
      2. Find the roots of this polynomial
        by solving \(\operatorname{det}(A-\lambda I)=0\) . The \(n\) roots are the \(n\) eigenvalues of \(A\) . They make \(A-\lambda I\) singular.
      3. For each eigenvalue \(\lambda\) , solve \((A-\lambda I)\bm{x}=0\) to find an eigenvector \(\bm{x}\).
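      A numerical check of these steps (a NumPy sketch; np.linalg.eig finds the eigenvalues and eigenvectors directly instead of factoring the polynomial by hand):

      ```python
      import numpy as np

      A = np.array([[0.5, 0.5],
                    [0.5, 0.5]])                # the projection matrix used in item 4 below

      eigvals, eigvecs = np.linalg.eig(A)       # steps 1-2: roots of det(A - lambda I) = 0
      print(eigvals)                            # 1 and 0 for this projection matrix

      for lam, x in zip(eigvals, eigvecs.T):    # step 3: each x solves (A - lambda I) x = 0
          assert np.allclose((A - lam * np.eye(2)) @ x, 0)
          assert np.allclose(A @ x, lam * x)    # A x = lambda x
      ```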
    2. If \(A\bm{x}=\lambda\bm{x}\) then \(A^2\bm{x}=\lambda^2\bm{x}\) and \(A^{-1}\bm{x}=\lambda^{-1}\bm{x}\) and \((A+cI)\bm{x}=(\lambda+c)\bm{x}\)
    3. The sum of the \(\lambda\)'s equals the sum down the main diagonal of \(A\) (the trace);
      The product of the \(\lambda\)'s equals the determinant of \(A\):
      Check \(\lambda\)'s by \(\footnotesize\operatorname{det}A=\lambda_1\lambda_2\cdots\lambda_n\) and diagonal sum \(\footnotesize a_{11}+a_{22}+\dots+a_{nn}=\text{sum of }\lambda\text{'s}\)
    4. Projections \(P\), reflections \(R\), 90° rotations \(Q\) have special eigenvalues
      The projection matrix \(\footnotesize P=\begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}\) has eigenvalues 1 and 0
      The reflection matrix \(\footnotesize R=\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\) has eigenvalues 1 and -1
      The 90° rotation \(\footnotesize Q=\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\) has eigenvalues \(i\) and \(-i\)
    5. \(A\) and \(B\) share the same \(n\) independent eigenvectors if and only if \(AB=BA\) , then we do have \(AB\bm{x}=\lambda\beta\bm{x}\) and \(BA\bm{x}=\lambda\beta\bm{x}\) (\(\beta\) is an eigenvalue of \(B\))
  2. Diagonalizing a Matrix
    1. If \(A\) has \(n\) independent eigenvectors \(x_1,\dots,x_n\) , they go into the columns of \(X\)
      \(\bm{A}\) is diagonalized into eigenvalue matrix \(\Lambda\) by \(\bm{X}\) : \(X^{-1}AX=\Lambda\) and \(A=X\Lambda X^{-1}\)

      \(X^{-1}AX=\Lambda\) and \(A=X\Lambda X^{-1}\) means \(AX=X\Lambda\)
      The first column of \(AX\) is \(A\bm{x_1}=\lambda_1x_1\) . Each column of \(X\) is multiplied by its eigenvalue:
      \(AX=A\begin{bmatrix} & & \\ \bm{x_1} & \cdots & \bm{x_n} \\ & & \end{bmatrix}=\begin{bmatrix} & & \\ \lambda_1\bm{x_1} & \cdots & \lambda_n\bm{x_n} \\ & & \end{bmatrix}\)
      The trick is to split this matrix \(AX\) into \(X\) times \(\Lambda\) :
      \(\begin{bmatrix} & & \\ \lambda_1\bm{x_1} & \cdots & \lambda_n\bm{x_n} \\ & & \end{bmatrix}=\begin{bmatrix} & & \\ \bm{x_1} & \cdots & \bm{x_n} \\ & & \end{bmatrix}\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}=X\Lambda\)

    2. The eigenvector matrix \(X\) also diagonalizes all powers \(A^k\) : \(A^k=X\Lambda^kX^{-1}\)
    3. The solution to \(\bm{u}_{k+1}=A\bm{u}_k\) starting from \(\bm{u}_0\) is \(\bm{u}_k=A^k\bm{u}_0=X\Lambda^kX^{-1}\bm{u}_0\) :
      \(\bm{u}_k=c_1(\lambda_1)^k\bm{x}_1+\cdots+c_n(\lambda_n)^k\bm{x}_n\) provided that \(\bm{u}_0=c_1\bm{x}_1+\cdots+c_n\bm{x}_n=X\bm{c}\)
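      A NumPy sketch of diagonalization and of the power formula \(A^k=X\Lambda^kX^{-1}\) (the matrix and the starting vector are made up):

      ```python
      import numpy as np

      A = np.array([[5.0, 1.0],
                    [0.0, 3.0]])                  # two distinct eigenvalues, so diagonalizable

      eigvals, X = np.linalg.eig(A)               # the columns of X are the eigenvectors
      Lam = np.diag(eigvals)
      assert np.allclose(X @ Lam @ np.linalg.inv(X), A)        # A = X Lambda X^-1

      k = 5
      Ak = X @ np.diag(eigvals**k) @ np.linalg.inv(X)          # A^k = X Lambda^k X^-1
      assert np.allclose(Ak, np.linalg.matrix_power(A, k))

      u0 = np.array([2.0, 1.0])
      c = np.linalg.solve(X, u0)                               # u0 = X c
      uk = sum(c[i] * eigvals[i]**k * X[:, i] for i in range(2))
      assert np.allclose(uk, np.linalg.matrix_power(A, k) @ u0)
      ```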
    4. \(A\) is diagonalizable if every eigenvalue has enough eigenvectors (GM = AM).
      Always GM \(\leqslant\) AM for each \(\bm{\lambda}\)
      • Geometric Multiplicity = GM: Count the independent eigenvectors for \(\lambda\) . Then GM is the dimension of the nullspace of \(A-\lambda I\)
      • Algebraic Multiplicity = AM: Count the repetitions of \(\bm{\lambda}\) among the eigenvalues. Look at the \(n\) roots of \(\operatorname{det}(A-\lambda I)=0\)

      for example if \(A\) has \(\lambda=4, 4, 4\) , then that eigenvalue has \(\text{AM}=3\) and \(\text{GM}=1, 2, \text{or }3\)
      The shortage of eigenvectors when GM is below AM means that A is not diagonalizable.

  3. Systems of Differential Equations
  4. Symmetric Matrices
  5. Positive Definite Matrices