An affine set is one of the form $A = \{x \in \mathcal{X} : x = v + x_0,\ v \in V\}$, where $V$ is a subspace of a vector space $\mathcal{X}$ and $x_0$ is a given point. Notice that by Definition 1, a subspace is simply an affine set containing the origin. Also notice that the dimension of an affine set $A$ is the same as the dimension of $V$.

A norm on the vector space $\mathcal{X}$ is a function $\|\cdot\| : \mathcal{X} \to \mathbb{R}$ which satisfies:

- $\|x\| \geq 0$, with equality if and only if $x = 0$
- $\|x + y\| \leq \|x\| + \|y\|$
- $\|\alpha x\| = |\alpha| \|x\|$ for any scalar $\alpha$.

The $\ell_p$ norms are defined by
$$\|x\|_p = \left( \sum_{k=1}^n |x_k|^p \right)^{1/p}, \quad 1 \leq p < \infty.$$
In the limit as $p \to \infty$,
$$\|x\|_\infty = \max_k |x_k|.$$

A function $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ is a matrix norm if
$$f(A) \geq 0, \quad f(A) = 0 \Leftrightarrow A = 0, \quad f(\alpha A) = |\alpha| f(A), \quad f(A + B) \leq f(A) + f(B).$$

The Frobenius norm is the $\ell_2$ norm applied to all the elements of the matrix:
$$\|A\|_F = \sqrt{\operatorname{trace}(A A^T)} = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2}.$$

One useful way to characterize matrices is by measuring their "gain" relative to some $\ell_p$ norm:
$$\|A\|_p = \max_{u \neq 0} \frac{\|A u\|_p}{\|u\|_p}.$$
When $p = 2$, this norm is called the spectral norm because it relates to the largest eigenvalue of $A^T A$:
$$\|A\|_2 = \sqrt{\lambda_{\max}(A^T A)}.$$

An inner product on a real vector space $\mathcal{X}$ is a function $\langle \cdot, \cdot \rangle : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ that is distributive over vector addition, commutative, and non-negative in the sense that $\langle x, x \rangle \geq 0$ with $\langle x, x \rangle = 0 \Leftrightarrow x = 0$. Inner products induce a norm $\|x\| = \sqrt{\langle x, x \rangle}$. In $\mathbb{R}^n$, the standard inner product is $x^T y$, and the angle between two vectors is given by
$$\cos \theta = \frac{x^T y}{\|x\|_2 \|y\|_2}.$$

In general, we can bound the absolute value of the standard inner product between two vectors.

Theorem 1 (Hölder Inequality)
For $p, q \geq 1$ with $\frac{1}{p} + \frac{1}{q} = 1$,
$$|x^T y| \leq \|x\|_p \|y\|_q.$$

Notice that for $p = q = 2$, Theorem 1 turns into the Cauchy-Schwarz Inequality ($|x^T y| \leq \|x\|_2 \|y\|_2$).
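These bounds are easy to check numerically. A minimal numpy sketch verifying the Hölder bound for a few conjugate pairs $(p, q)$ (the random vectors, seed, and tolerance are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

# Check |x^T y| <= ||x||_p ||y||_q for conjugate pairs with 1/p + 1/q = 1.
for p, q in [(1.0, np.inf), (2.0, 2.0), (3.0, 1.5)]:
    lhs = abs(x @ y)
    rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)
    assert lhs <= rhs + 1e-12
    print(f"p={p}, q={q}: {lhs:.4f} <= {rhs:.4f}")
```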
Functions

We consider functions to be of the form $f : \mathbb{R}^n \to \mathbb{R}$. By contrast, a map is of the form $f : \mathbb{R}^n \to \mathbb{R}^m$. The components of the map $f$ are the scalar-valued functions $f_i$ that produce each component of the map.

Definition 8 (Graph)
The graph of a function $f$ is the set of input-output pairs that $f$ can attain:
$$\{(x, f(x)) \in \mathbb{R}^{n+1} : x \in \mathbb{R}^n\}.$$

Definition 9 (Epigraph)
The epigraph of a function is the set of input-output pairs that $f$ can achieve and anything above:
$$\{(x, t) \in \mathbb{R}^{n+1} : x \in \mathbb{R}^n,\ t \geq f(x)\}.$$

Definition 10 (Level set)
The $t$-level set is the set of points that achieve exactly some value of $f$:
$$\{x \in \mathbb{R}^n : f(x) = t\}.$$

Definition 11 (Sublevel set)
The $t$-sublevel set of $f$ is the set of points achieving at most a value $t$:
$$\{x \in \mathbb{R}^n : f(x) \leq t\}.$$

Definition 12 (Half-spaces)
$$H_- = \{x : a^T x \leq b\}, \qquad H_+ = \{x : a^T x > b\}.$$
The half-spaces are the regions of space that a hyperplane separates.

Definition 13 (Polyhedron)
A polyhedron is the intersection of $m$ half-spaces given by $a_i^T x \leq b_i$ for $i \in [1, m]$.

When a polyhedron is bounded, it is called a polytope.
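Stacking the half-space constraints row-wise gives the compact description $\{x : Ax \leq b\}$, which makes membership a one-line test. A sketch in numpy (the unit-square constraints are an illustrative choice, not from the notes):

```python
import numpy as np

# A polytope {x : Ax <= b}: the unit square in R^2, i.e. the
# intersection of four half-spaces a_i^T x <= b_i.
A = np.array([[ 1.0,  0.0],   #  x1 <= 1
              [-1.0,  0.0],   # -x1 <= 0
              [ 0.0,  1.0],   #  x2 <= 1
              [ 0.0, -1.0]])  # -x2 <= 0
b = np.array([1.0, 0.0, 1.0, 0.0])

def in_polyhedron(x):
    return bool(np.all(A @ x <= b + 1e-12))

print(in_polyhedron(np.array([0.5, 0.5])))  # True: inside the square
print(in_polyhedron(np.array([1.5, 0.5])))  # False: violates x1 <= 1
```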
Types of Functions

Theorem 2
A function is affine if and only if it can be expressed as $f(x) = a^T x + b$ for some unique pair $(a, b)$. An affine function is linear when $b = 0$. A hyperplane is simply a level set of a linear function.

Theorem 3
Any quadratic function can be written as the sum of a quadratic term involving a symmetric matrix and an affine term:
$$q(x) = \frac{1}{2} x^T H x + c^T x + d.$$

Another special class of functions are polyhedral functions.

Definition 14 (Polyhedral function)
A function $f : \mathbb{R}^n \to \mathbb{R}$ is polyhedral if its epigraph is a polyhedron:
$$\operatorname{epi} f = \left\{ (x, t) \in \mathbb{R}^{n+1} : C \begin{bmatrix} x \\ t \end{bmatrix} \leq d \right\}.$$
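To see why restricting $H$ to be symmetric in Theorem 3 loses nothing: any matrix $M$ defines the same quadratic form as its symmetric part $(M + M^T)/2$. A small numpy check (sizes, seed, and the constant $d$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))   # arbitrary square matrix, not symmetric
H = (M + M.T) / 2                 # its symmetric part
c, d = rng.standard_normal(3), 0.7
x = rng.standard_normal(3)

# The quadratic form is unchanged by symmetrization.
assert np.isclose(x @ M @ x, x @ H @ x)
print(0.5 * x @ H @ x + c @ x + d)  # q(x) written with a symmetric H
```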
Vector Calculus

We can also do calculus with vector functions.

Definition 15 (Gradient)
The gradient of a function at a point $x$ where $f$ is differentiable is a column vector of first derivatives of $f$ with respect to the components of $x$:
$$\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}.$$
The gradient is perpendicular to the level sets of $f$ and points from a point $x_0$ to higher values of the function. In other words, it is the direction of steepest increase. It is akin to the derivative of a 1D function.

Definition 16 (Hessian)
The Hessian of a function $f$ at point $x$ is a matrix of second derivatives:
$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}.$$
The Hessian is akin to the second derivative in a 1D function. Note that the Hessian is a symmetric matrix.
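For the quadratic $q(x) = \frac{1}{2} x^T H x + c^T x + d$ from Theorem 3, the gradient is $\nabla q(x) = Hx + c$ and the Hessian is $H$ itself. A finite-difference sanity check in numpy (step size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                 # symmetric, as in Theorem 3
c = rng.standard_normal(n)
q = lambda x: 0.5 * x @ H @ x + c @ x

x0, eps = rng.standard_normal(n), 1e-6
grad = H @ x0 + c                 # analytic gradient of q at x0
grad_fd = np.array([(q(x0 + eps * e) - q(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.max(np.abs(grad_fd - grad)))  # agreement up to ~1e-9
```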
Matrices

Matrices define a linear map between an input space and an output space. Any linear map $f : \mathbb{R}^n \to \mathbb{R}^m$ can be represented by a matrix.

Theorem 4 (Fundamental Theorem of Linear Algebra)
For any matrix $A \in \mathbb{R}^{m \times n}$,
$$\mathcal{N}(A) \oplus \mathcal{R}(A^T) = \mathbb{R}^n, \qquad \mathcal{R}(A) \oplus \mathcal{N}(A^T) = \mathbb{R}^m.$$
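One way to see the first decomposition numerically is via the SVD (introduced later in these notes): the right singular vectors past the rank span $\mathcal{N}(A)$, and the remaining ones span $\mathcal{R}(A^T)$. A numpy sketch (matrix size and tolerance are arbitrary):

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((3, 5))  # rank 3 a.s.
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))          # numerical rank

row_space = Vt[:r].T                # orthonormal basis of R(A^T)
null_space = Vt[r:].T               # orthonormal basis of N(A)

assert row_space.shape[1] + null_space.shape[1] == A.shape[1]  # dims sum to n
print(np.max(np.abs(A @ null_space)))            # ~0: N(A) is the kernel
print(np.max(np.abs(row_space.T @ null_space)))  # ~0: subspaces orthogonal
```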
Symmetric Matrices

Recall that a symmetric matrix is one where $A = A^T$.

Theorem 5 (Spectral Theorem)
Any symmetric matrix is orthogonally similar to a real diagonal matrix:
$$A = A^T \implies A = U \Lambda U^T = \sum_i \lambda_i u_i u_i^T, \quad \|u_i\| = 1, \quad u_i^T u_j = 0 \ (i \neq j).$$

Definition 17 (Rayleigh quotient)
Let $\lambda_{\min}(A)$ be the smallest eigenvalue of symmetric matrix $A$ and $\lambda_{\max}(A)$ be the largest eigenvalue. The Rayleigh quotient for $x \neq 0$ is
$$\frac{x^T A x}{\|x\|_2^2}.$$

Theorem 6
For any $x \neq 0$,
$$\lambda_{\min}(A) \leq \frac{x^T A x}{\|x\|_2^2} \leq \lambda_{\max}(A).$$
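Theorem 6 is easy to sanity-check by sampling: for a random symmetric matrix, every Rayleigh quotient should land between the extreme eigenvalues (matrix size, seed, and sample count below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                  # random symmetric matrix

lam = np.linalg.eigvalsh(A)        # eigenvalues, sorted ascending
for _ in range(1000):
    x = rng.standard_normal(5)
    rq = (x @ A @ x) / (x @ x)     # Rayleigh quotient
    assert lam[0] - 1e-12 <= rq <= lam[-1] + 1e-12
print(f"all quotients within [{lam[0]:.4f}, {lam[-1]:.4f}]")
```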
Two special types of symmetric matrices are those with non-negative eigenvalues and those with strictly positive eigenvalues.

Definition 18 (Positive semi-definite matrix)
A symmetric matrix is positive semi-definite if $x^T A x \geq 0$ for all $x$; equivalently, $\lambda_{\min}(A) \geq 0$.

Definition 19 (Positive definite matrix)
A symmetric matrix is positive definite if $x^T A x > 0$ for all $x \neq 0$; equivalently, $\lambda_{\min}(A) > 0$.

These matrices are important because they often have very clear geometric structures. For example, an ellipsoid in multi-dimensional space can be defined as the set of points
$$\mathcal{E} = \{x \in \mathbb{R}^m : x^T P^{-1} x \leq 1\}$$
where $P$ is a positive definite matrix. The eigenvectors of $P$ give the principal axes of this ellipsoid, and the $\sqrt{\lambda_i}$ are the semi-axis lengths.
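The following numpy sketch builds a positive definite $P$, reads the semi-axis lengths off its eigendecomposition, and confirms that the points $\sqrt{\lambda_i}\, u_i$ sit exactly on the boundary of $\mathcal{E}$ (the particular $P$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((3, 3))
P = B @ B.T + 3 * np.eye(3)        # positive definite by construction

lam, U = np.linalg.eigh(P)         # P = U diag(lam) U^T
print(np.sqrt(lam))                # semi-axis lengths of the ellipsoid

# Each point sqrt(lam_i) * u_i lies on the boundary x^T P^{-1} x = 1.
for i in range(3):
    x = np.sqrt(lam[i]) * U[:, i]
    print(x @ np.linalg.solve(P, x))   # each ~1.0
```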
QR Factorization

Similar to how the spectral theorem allows us to decompose symmetric matrices, QR factorization is another matrix decomposition technique that works for any general matrix.

Definition 20 (QR factorization)
The QR factorization of a matrix $A$ consists of an orthogonal matrix $Q$ and an upper triangular matrix $R$ such that $A = QR$.

An easy way to find the QR factorization of a matrix is to apply Gram-Schmidt to the columns of the matrix and express the result in matrix form. Suppose that our matrix $A$ is full rank (i.e., its columns $a_i$ are linearly independent) and we have applied Gram-Schmidt to columns $a_1, \ldots, a_{i-1}$ to get orthonormal vectors $q_1, \ldots, q_{i-1}$. Continuing the procedure, the $i$th orthonormal vector $q_i$ is
$$q_i = \frac{a_i - \sum_{j=1}^{i-1} (q_j^T a_i)\, q_j}{\left\| a_i - \sum_{j=1}^{i-1} (q_j^T a_i)\, q_j \right\|_2}.$$
Putting this in matrix form, we can see that
$$A = QR, \quad \text{where } r_{ji} = q_j^T a_i \text{ for } j \leq i \text{ and } r_{ji} = 0 \text{ for } j > i.$$
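A direct translation of this procedure into numpy (classical Gram-Schmidt; names and sizes are illustrative). In practice `np.linalg.qr` computes the same factorization more stably:

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR of a matrix with linearly independent columns via classical Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ A[:, i]   # r_ji = q_j^T a_i
            v -= R[j, i] * Q[:, j]        # subtract the projection onto q_j
        R[i, i] = np.linalg.norm(v)       # length of the residual
        Q[:, i] = v / R[i, i]
    return Q, R

A = np.random.default_rng(6).standard_normal((5, 3))
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```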
Singular Value Decomposition

Definition 21 (Dyad)
A matrix $A \in \mathbb{R}^{m \times n}$ is a dyad if it can be written as $pq^T$.

A dyad is a rank-one matrix. It turns out that all matrices can be decomposed into a sum of dyads.

Definition 22 (Singular Value Decomposition)
The singular value decomposition of a matrix $A$ is
$$A = \sum_{i=1}^r \sigma_i u_i v_i^T$$
where the $\sigma_i$ are the singular values of $A$ and the $u_i$ and $v_i$ are the left and right singular vectors.

The singular values are ordered such that $\sigma_1 \geq \sigma_2 \geq \cdots$. The left singular vectors are the eigenvectors of $A A^T$, and the right singular vectors are the eigenvectors of $A^T A$. The singular values are $\sqrt{\lambda_i}$, where the $\lambda_i$ are the eigenvalues of $A^T A$. Since $A A^T$ and $A^T A$ are symmetric, the $u_i$ and $v_i$ are orthogonal. The number of non-zero singular values is equal to the rank of the matrix. We can write the SVD in matrix form as
$$A = U \Sigma V^T,$$
where the columns of $U$ are the left singular vectors, the columns of $V$ are the right singular vectors, and $\Sigma$ carries the singular values on its diagonal.

The Frobenius norm and spectral norm are tightly related to the SVD. Writing out the SVD tells us that
$$\|A\|_F = \sqrt{\sum_{i=1}^r \sigma_i^2}, \qquad \|A\|_2 = \sigma_1.$$
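A numpy sketch tying these facts together: build a rank-2 matrix as a sum of two dyads, then confirm the rank and both norm identities from the singular values (the particular dyads are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
# A rank-2 matrix built as a sum of two dyads p q^T.
A = np.outer(rng.standard_normal(4), rng.standard_normal(5)) \
    + np.outer(rng.standard_normal(4), rng.standard_normal(5))

s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
print(int(np.sum(s > 1e-10)))            # 2: rank = number of nonzero sigma_i
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))  # True
print(np.isclose(np.linalg.norm(A, 2), s[0]))                       # True
```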