Linear Algebra

Notice that by Definition 1, a subspace is simply an affine set containing the origin. Also notice that the dimension of an affine set $\mathcal{A}$ is the same as the dimension of its underlying subspace $\mathcal{V}$.

Norms

Recall that the $\ell_p$ norm of a vector is $\|\mathbf{x}\|_p = \left(\sum_k |x_k|^p\right)^{1/p}$ for $p \geq 1$. In the limit as $p\to\infty$,

$$\|\mathbf{x}\|_{\infty} = \max_k|x_k|.$$
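As a quick numerical check of this limit (a minimal numpy sketch; the vector is arbitrary), the $\ell_p$ norms of a fixed vector approach its largest absolute entry as $p$ grows:

```python
import numpy as np

x = np.array([1.0, -3.0, 2.0])

# ||x||_p approaches max_k |x_k| = 3 as p grows
for p in [1, 2, 4, 10, 100]:
    print(p, np.linalg.norm(x, ord=p))

print("inf:", np.linalg.norm(x, ord=np.inf))  # 3.0
```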

Similar to vectors, matrices can also have norms.

One useful way to characterize matrices is by measuring their “gain” relative to some $\ell_p$ norm, i.e., the induced norm $\|A\|_p = \max_{\mathbf{x}\neq 0} \frac{\|A\mathbf{x}\|_p}{\|\mathbf{x}\|_p}$.

When $p=2$, the norm is called the spectral norm because it relates to the largest eigenvalue of $A^TA$:

$$\|A\|_2 = \sqrt{\lambda_{\max}(A^TA)}.$$
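To see this relation concretely, here is a minimal sketch (random test matrix, numpy) comparing $\sqrt{\lambda_{\max}(A^TA)}$ against numpy's built-in spectral norm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# lambda_max(A^T A), using eigvalsh since A^T A is symmetric
lam_max = np.linalg.eigvalsh(A.T @ A).max()

print(np.sqrt(lam_max))          # sqrt of largest eigenvalue of A^T A
print(np.linalg.norm(A, ord=2))  # numpy's spectral norm; should match
```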

Inner Products

Inner products induce a norm $\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$. In $\mathbb{R}^n$, the standard inner product is $\mathbf{x}^T\mathbf{y}$. The angle between two vectors is given by

$$\cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|_2\|\mathbf{y}\|_2}.$$

In general, we can bound the absolute value of the standard inner product between two vectors: by Hölder's inequality (Theorem 1), $|\mathbf{x}^T\mathbf{y}| \leq \|\mathbf{x}\|_p\|\mathbf{y}\|_q$ whenever $\frac{1}{p} + \frac{1}{q} = 1$.

Notice that for $p=q=2$, Theorem 1 turns into the Cauchy-Schwarz inequality ($|\mathbf{x}^T\mathbf{y}| \leq \|\mathbf{x}\|_2\|\mathbf{y}\|_2$).
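A short numerical sketch of these facts (arbitrary vectors, numpy), checking the angle formula, Cauchy-Schwarz, and one other Hölder pair:

```python
import numpy as np

x = np.array([1.0, 2.0, -1.0])
y = np.array([3.0, 0.5, 2.0])

# Angle from cos(theta) = x^T y / (||x||_2 ||y||_2)
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))

# Cauchy-Schwarz (p = q = 2): |x^T y| <= ||x||_2 ||y||_2
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))

# Hoelder with p = 1, q = inf: |x^T y| <= ||x||_1 ||y||_inf
print(abs(x @ y) <= np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf))
```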

Functions

We consider functions to be of the form $f:\mathbb{R}^n\rightarrow\mathbb{R}$. By contrast, a map is of the form $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$. The components of the map $f$ are the scalar-valued functions $f_i$ that produce each component of the map.

A polyhedron is the solution set of finitely many linear inequalities, $\mathcal{P} = \{\mathbf{x} : A\mathbf{x} \leq \mathbf{b}\}$. When a polyhedron is bounded, it is called a polytope.

Types of Functions

An affine function has the form $f(\mathbf{x}) = \mathbf{a}^T\mathbf{x} + b$ and is linear when $b=0$. A hyperplane is simply a level set of a linear function.

Another special class of functions is the polyhedral functions, whose epigraphs are polyhedra.

Vector Calculus

We can also do calculus with vector functions.

The gradient $\nabla f(\mathbf{x})$, the vector of partial derivatives $\frac{\partial f}{\partial x_i}$, is perpendicular to the level sets of $f$ and points from a point $\mathbf{x}_0$ toward higher values of the function. In other words, it is the direction of steepest increase. It is akin to the derivative of a 1D function.

The Hessian $\nabla^2 f(\mathbf{x})$, with entries $\frac{\partial^2 f}{\partial x_i \partial x_j}$, is akin to the second derivative of a 1D function. Note that the Hessian is a symmetric matrix whenever the second partial derivatives are continuous.
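As a small illustration (a sketch using a hand-picked quadratic $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^TQ\mathbf{x} + \mathbf{b}^T\mathbf{x}$, whose gradient is $Q\mathbf{x}+\mathbf{b}$ and whose Hessian is the constant matrix $Q$), we can check the analytic gradient against finite differences:

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T Q x + b^T x: gradient Qx + b, Hessian Q
Q = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ Q @ x + b @ x

x0 = np.array([0.5, -0.3])
eps = 1e-6
I = np.eye(2)

# Central finite differences approximate each partial derivative
grad_fd = np.array([(f(x0 + eps * I[k]) - f(x0 - eps * I[k])) / (2 * eps)
                    for k in range(2)])

print(grad_fd)     # matches the analytic gradient below
print(Q @ x0 + b)
```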

Matrices

Matrices define a linear map between an input space and an output space. Any linear map $f: \mathbb{R}^n \to \mathbb{R}^m$ can be represented by a matrix.

Symmetric Matrices

Recall that a symmetric matrix is one where $A = A^T$.

Let $\lambda_{\min}(A)$ be the smallest eigenvalue of a symmetric matrix $A$ and $\lambda_{\max}(A)$ be the largest eigenvalue.

Two special types of symmetric matrices are positive semidefinite matrices ($A \succeq 0$), whose eigenvalues are all non-negative, and positive definite matrices ($A \succ 0$), whose eigenvalues are all positive.

These matrices are important because they often have very clear geometric structures. For example, an ellipsoid in multi-dimensional space can be defined as the set of points

$$\mathcal{E} = \{ \mathbf{x}\in\mathbb{R}^m : \ \mathbf{x}^T P^{-1} \mathbf{x} \leq 1 \}$$

where $P$ is a positive definite matrix. The eigenvectors of $P$ give the principal axes of this ellipsoid, and the $\sqrt{\lambda_i}$ (for eigenvalues $\lambda_i$ of $P$) are the semi-axis lengths.
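A minimal sketch of this geometry (the positive definite $P$ below is chosen arbitrarily for illustration), recovering the axes and lengths from an eigendecomposition:

```python
import numpy as np

# An arbitrary positive definite P for illustration
P = np.array([[4.0, 1.0], [1.0, 3.0]])

# Symmetric eigendecomposition: columns of V are the eigenvectors
lam, V = np.linalg.eigh(P)

print(V)             # principal axes of the ellipsoid x^T P^{-1} x <= 1
print(np.sqrt(lam))  # corresponding semi-axis lengths
```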

QR Factorization

Similar to how the spectral theorem allows us to decompose symmetric matrices, QR factorization is another matrix decomposition technique, and it works for any general matrix: $A = QR$, where $Q$ has orthonormal columns and $R$ is upper triangular.

An easy way to find the QR factorization of a matrix is to apply Gram-Schmidt to the columns of the matrix and express the result in matrix form. Suppose that our matrix $A$ is full rank (i.e., its columns $\mathbf{a}_i$ are linearly independent) and we have applied Gram-Schmidt to columns $\mathbf{a}_{1},\cdots,\mathbf{a}_{i-1}$ to get orthonormal vectors $\mathbf{q}_{1},\cdots,\mathbf{q}_{i-1}$. Continuing the procedure, the $i$th orthonormal vector $\mathbf{q}_i$ is

$$\mathbf{\tilde{q}}_i = \mathbf{a}_i - \sum_{k=1}^{i-1} (\mathbf{q}_k^T \mathbf{a}_i)\mathbf{q}_k \qquad \mathbf{q}_i = \frac{\mathbf{\tilde{q}}_i}{\|\mathbf{\tilde{q}}_i\|_2}.$$

If we re-arrange this to solve for $\mathbf{a}_i$, we see that

$$\mathbf{a}_i = \|\mathbf{\tilde{q}}_i\|_2 \mathbf{q}_i + \sum_{k=1}^{i-1} (\mathbf{q}_k^T \mathbf{a}_i)\mathbf{q}_k.$$

Putting this in matrix form, we can see that

$$\begin{bmatrix} | & | & & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_{n}\\ | & | & & | \\ \end{bmatrix} = \begin{bmatrix} | & | & & | \\ \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_{n}\\ | & | & & | \\ \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n}\\ 0 & r_{22} & \cdots & r_{2n}\\ \vdots & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & r_{nn} \end{bmatrix} \qquad r_{ij} = \mathbf{q}_i^T\mathbf{a}_j, \quad r_{ii} = \|\mathbf{\tilde{q}}_i\|_2.$$
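The procedure above translates almost line-for-line into code. Here is a minimal numpy sketch (the helper name gram_schmidt_qr is ours; this is classical Gram-Schmidt, which can lose orthogonality numerically on ill-conditioned matrices):

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR factorization of a full-column-rank A via classical Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        # Subtract projections of a_i onto the previous q_1, ..., q_{i-1}
        coeffs = Q[:, :i].T @ A[:, i]      # r_{ki} = q_k^T a_i for k < i
        q = A[:, i] - Q[:, :i] @ coeffs
        R[:i, i] = coeffs
        R[i, i] = np.linalg.norm(q)        # r_{ii} = ||q_tilde_i||_2
        Q[:, i] = q / R[i, i]
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))             # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: orthonormal columns
```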

Singular Value Decomposition

A dyad is a rank-one matrix, i.e., a matrix of the form $\mathbf{u}\mathbf{v}^T$. It turns out that all matrices can be decomposed into a sum of dyads, $A = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^T$; this is the singular value decomposition (SVD).

The singular values are ordered such that $\sigma_1 \geq \sigma_2 \geq \cdots$. The left singular vectors $\mathbf{u}_i$ are the eigenvectors of $AA^T$, and the right singular vectors $\mathbf{v}_i$ are the eigenvectors of $A^TA$. The singular values are $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ are the eigenvalues of $A^TA$. Since $AA^T$ and $A^TA$ are symmetric, the $\mathbf{u}_i$ and the $\mathbf{v}_i$ are orthonormal. The number of non-zero singular values is equal to the rank of the matrix. We can write the SVD in matrix form as

$$A = \left[U_r\quad U_{m-r}\right]\text{diag}(\sigma_1,\cdots,\sigma_r,0,\cdots,0)\begin{bmatrix}V^T_r\\V^T_{n-r}\end{bmatrix}$$

Writing the SVD in this form tells us that (see the sketch after this list):

  1. $V_{n-r}$ forms a basis for $\mathcal{N}(A)$

  2. $U_{r}$ forms a basis for $\mathcal{R}(A)$
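A short numpy sketch of these two facts (the rank-2 test matrix is arbitrary, built from two dyads):

```python
import numpy as np

# Build a 4x3 matrix of rank 2 as a sum of two dyads (for illustration)
rng = np.random.default_rng(0)
A = (rng.standard_normal((4, 1)) @ rng.standard_normal((1, 3))
     + rng.standard_normal((4, 1)) @ rng.standard_normal((1, 3)))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))   # numerical rank = number of nonzero sigma_i
print(r)                     # 2

# Rows r..n-1 of V^T span the nullspace of A, so A maps them to zero
null_basis = Vt[r:].T
print(np.allclose(A @ null_basis, 0))  # True

# First r columns of U span the range (column space) of A
range_basis = U[:, :r]
```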

The Frobenius norm and spectral norm are tightly related to the SVD.

$$\|A\|_F = \sqrt{\sum_{i}\sigma_i^2}$$

$$\|A\|_2^2 = \sigma_1^2$$
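Both identities are easy to confirm numerically (a sketch with a random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)  # singular values only

print(np.sqrt(np.sum(s**2)), np.linalg.norm(A, 'fro'))  # Frobenius norm
print(s[0], np.linalg.norm(A, 2))                        # spectral norm
```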
