, Theorem 1 turns into the Cauchy-Schwarz Inequality ($|x^T y| \le \|x\|_2 \|y\|_2$).
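A quick numerical sanity check of the inequality (a sketch in numpy; the random vectors are illustrative, not from the notes):

```python
import numpy as np

# Sketch: verify |x^T y| <= ||x||_2 ||y||_2 on random vectors.
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

lhs = abs(x @ y)                             # |x^T y|
rhs = np.linalg.norm(x) * np.linalg.norm(y)  # ||x||_2 ||y||_2
assert lhs <= rhs
```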
Functions
We consider functions to be of the form $f: \mathbb{R}^n \to \mathbb{R}$. By contrast, a map is of the form $f: \mathbb{R}^n \to \mathbb{R}^m$. The components of a map $f$ are the scalar-valued functions $f_i$ that produce each component of its output.
Definition 8
The graph of a function $f$ is the set of input-output pairs that $f$ can attain.
$$\{(x, f(x)) \in \mathbb{R}^{n+1} : x \in \mathbb{R}^n\}$$
Definition 9
The epigraph of a function $f$ is the set of input-output pairs that $f$ can achieve and anything above.
$$\{(x, t) \in \mathbb{R}^{n+1} : x \in \mathbb{R}^n,\ t \ge f(x)\}$$
Definition 10
The $t$-level set is the set of points that achieve exactly some value $t$ of $f$.
$$\{x \in \mathbb{R}^n : f(x) = t\}$$
Definition 11
The $t$-sublevel set of $f$ is the set of points achieving at most a value $t$.
$$\{x \in \mathbb{R}^n : f(x) \le t\}$$
Definition 12
The half-spaces are the two regions into which a hyperplane separates space.
$$H_- = \{x : a^T x \le b\}, \qquad H_+ = \{x : a^T x > b\}$$
Definition 13
A polyhedron is the intersection of $m$ half-spaces given by $a_i^T x \le b_i$ for $i \in [1, m]$.
When a polyhedron is bounded, it is called a polytope.
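Membership in a polyhedron is just a stack of linear inequalities. A small sketch (the unit box in $\mathbb{R}^2$ is an assumed example; since it is bounded, it is a polytope):

```python
import numpy as np

# Assumed example: the unit box |x_1| <= 1, |x_2| <= 1 as the
# intersection of four half-spaces a_i^T x <= b_i.
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0]])
b = np.ones(4)

def in_polyhedron(x, A, b):
    """True iff x satisfies every half-space constraint A x <= b."""
    return bool(np.all(A @ x <= b))

assert in_polyhedron(np.array([0.5, -0.5]), A, b)      # inside the box
assert not in_polyhedron(np.array([2.0, 0.0]), A, b)   # violates x_1 <= 1
```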
Types of Functions
Theorem 2
A function is affine if and only if it can be expressed as $f(x) = a^T x + b$ for some unique pair $(a, b)$. An affine function is linear when $b = 0$. A hyperplane is simply a level set of a linear function.
Theorem 3
Any quadratic function can be written as the sum of a quadratic term involving a symmetric matrix and an affine term:
$$q(x) = \tfrac{1}{2} x^T H x + c^T x + d.$$
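The symmetry assumption on $H$ is without loss of generality: replacing any matrix $M$ in the quadratic term by its symmetric part $(M + M^T)/2$ leaves the quadratic unchanged. A numerical sketch (the random data is illustrative):

```python
import numpy as np

# Sketch: x^T M x equals x^T H x for the symmetric part H = (M + M^T)/2,
# so the quadratic term can always use a symmetric matrix.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))   # not symmetric in general
H = 0.5 * (M + M.T)               # symmetric part
x = rng.standard_normal(3)

q_M = x @ M @ x
q_H = x @ H @ x
assert np.isclose(q_M, q_H)
```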
Another special class of functions is the polyhedral functions.
Definition 14
A function $f: \mathbb{R}^n \to \mathbb{R}$ is polyhedral if its epigraph is a polyhedron.
$$\operatorname{epi} f = \left\{ (x, t) \in \mathbb{R}^{n+1} : C \begin{bmatrix} x \\ t \end{bmatrix} \le d \right\}$$
Vector Calculus
We can also do calculus with vector functions.
Definition 15
The gradient of a function at a point $x$ where $f$ is differentiable is a column vector of first derivatives of $f$ with respect to the components of $x$:
$$\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$
The gradient is perpendicular to the level sets of $f$ and points from a point $x_0$ toward higher values of the function. In other words, it is the direction of steepest increase. It is akin to the derivative of a 1D function.
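A gradient can be sanity-checked with central finite differences. A sketch; the test function $f(x) = x_1^2 + 3x_1x_2$ and the step size are assumptions for illustration:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the column vector of partials."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Assumed test function with a known gradient [2*x1 + 3*x2, 3*x1].
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
x0 = np.array([1.0, 2.0])
g = numerical_gradient(f, x0)
# Analytic gradient at (1, 2): [2*1 + 3*2, 3*1] = [8, 3]
assert np.allclose(g, [8.0, 3.0], atol=1e-4)
```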
Definition 16
The Hessian of a function $f$ at point $x$ is a matrix of second derivatives.
$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$$
The Hessian is akin to the second derivative in a 1D function. Note that the Hessian is a symmetric matrix.
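The symmetry of the Hessian (for smooth functions) can be observed numerically. A sketch using central differences; the sample function is an assumption for illustration:

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central-difference approximation of H_ij = d^2 f / (dx_i dx_j)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# Assumed smooth test function; mixed partials commute, so H is symmetric.
f = lambda x: x[0]**2 * x[1] + np.sin(x[1])
H = numerical_hessian(f, np.array([1.0, 2.0]))
assert np.allclose(H, H.T, atol=1e-3)
# Analytic values at (1, 2): H[0,0] = 2*x2 = 4, H[0,1] = 2*x1 = 2
assert np.isclose(H[0, 0], 4.0, atol=1e-3)
assert np.isclose(H[0, 1], 2.0, atol=1e-3)
```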
Matrices
Matrices define a linear map between an input space and an output space. Any linear map $f: \mathbb{R}^n \to \mathbb{R}^m$ can be represented by a matrix.
Theorem 4 (Fundamental Theorem of Linear Algebra)
For any matrix $A \in \mathbb{R}^{m \times n}$,
$$\mathcal{N}(A) \oplus \mathcal{R}(A^T) = \mathbb{R}^n, \qquad \mathcal{R}(A) \oplus \mathcal{N}(A^T) = \mathbb{R}^m.$$
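The first decomposition can be checked numerically with the SVD: the rows of $V^T$ paired with nonzero singular values span $\mathcal{R}(A^T)$, and the remaining rows span $\mathcal{N}(A)$. A sketch with an assumed random matrix:

```python
import numpy as np

# Sketch: verify dim R(A^T) + dim N(A) = n and that the two subspaces
# are orthogonal, using bases extracted from the SVD.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))       # m = 3, n = 5
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))            # rank

row_space = Vt[:r].T                  # basis for R(A^T)
null_space = Vt[r:].T                 # basis for N(A)

assert row_space.shape[1] + null_space.shape[1] == A.shape[1]
assert np.allclose(row_space.T @ null_space, 0)   # subspaces orthogonal
assert np.allclose(A @ null_space, 0)             # A maps N(A) to zero
```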
Symmetric Matrices
Recall that a symmetric matrix is one where $A = A^T$.
Theorem 5 (Spectral Theorem)
Any symmetric matrix is orthogonally similar to a real diagonal matrix.
$$A = A^T \implies A = U \Lambda U^T = \sum_i \lambda_i u_i u_i^T, \qquad \|u_i\| = 1, \quad u_i^T u_j = 0 \ (i \ne j)$$
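numpy's `eigh` computes exactly this decomposition for symmetric matrices. A sketch (the random symmetric matrix is an assumed example):

```python
import numpy as np

# Sketch: eigh returns real eigenvalues and orthonormal eigenvectors,
# reconstructing A = U Lambda U^T.
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = 0.5 * (B + B.T)                   # make it symmetric

lam, U = np.linalg.eigh(A)
assert np.allclose(U @ np.diag(lam) @ U.T, A)   # A = U Lambda U^T
assert np.allclose(U.T @ U, np.eye(4))          # columns orthonormal
```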
Let $\lambda_{\min}(A)$ be the smallest eigenvalue of a symmetric matrix $A$ and $\lambda_{\max}(A)$ be the largest eigenvalue.
Definition 17
The Rayleigh Quotient for $x \ne 0$ is
$$\frac{x^T A x}{\|x\|_2^2}.$$
Theorem 6
For any $x \ne 0$,
$$\lambda_{\min}(A) \le \frac{x^T A x}{\|x\|_2^2} \le \lambda_{\max}(A).$$
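The bound is easy to probe empirically: the Rayleigh quotient of any nonzero vector lands in $[\lambda_{\min}, \lambda_{\max}]$. A sketch with assumed random data:

```python
import numpy as np

# Sketch: sample random nonzero vectors and check the Rayleigh quotient
# stays between the extreme eigenvalues.
rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = 0.5 * (B + B.T)                   # symmetric
lam = np.linalg.eigvalsh(A)           # sorted ascending

for _ in range(100):
    x = rng.standard_normal(5)
    r = (x @ A @ x) / (x @ x)         # Rayleigh quotient
    assert lam[0] - 1e-10 <= r <= lam[-1] + 1e-10
```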
Two special types of symmetric matrices are those with non-negative and those with strictly positive eigenvalues.
Definition 18
A symmetric matrix is positive semi-definite if $x^T A x \ge 0$ for all $x$; equivalently, $\lambda_{\min}(A) \ge 0$.
Definition 19
A symmetric matrix is positive definite if $x^T A x > 0$ for all $x \ne 0$; equivalently, $\lambda_{\min}(A) > 0$.
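The eigenvalue characterization gives a practical test for semi-definiteness. A sketch; the Gram matrix $B^T B$ is an assumed example (such matrices are always positive semi-definite):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Symmetric A is PSD iff its smallest eigenvalue is >= 0 (up to tol)."""
    return bool(np.linalg.eigvalsh(A)[0] >= -tol)

# Gram matrices B^T B satisfy x^T (B^T B) x = ||Bx||^2 >= 0, so PSD.
rng = np.random.default_rng(5)
B = rng.standard_normal((4, 3))
G = B.T @ B

assert is_psd(G)
assert not is_psd(-np.eye(3))   # all eigenvalues are -1
```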
These matrices are important because they often have very clear geometric structure. For example, an ellipsoid in multi-dimensional space can be defined as the set of points
$$\mathcal{E} = \{x \in \mathbb{R}^n : x^T P^{-1} x \le 1\}$$
where $P$ is a positive definite matrix. The eigenvectors of $P$ give the principal axes of this ellipsoid, and the $\sqrt{\lambda_i}$ are the semi-axis lengths.
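A sketch checking the axis claim: for each eigenpair $(\lambda_i, u_i)$ of $P$, the point $\sqrt{\lambda_i}\,u_i$ should lie exactly on the boundary of the ellipsoid (the particular $P$ below is an assumed example):

```python
import numpy as np

# Assumed example: axis-aligned ellipse with semi-axes sqrt(4)=2 and sqrt(1)=1.
P = np.array([[4.0, 0.0],
              [0.0, 1.0]])
lam, U = np.linalg.eigh(P)
Pinv = np.linalg.inv(P)

# Each point sqrt(lambda_i) * u_i should satisfy x^T P^{-1} x = 1 exactly.
for i in range(2):
    boundary_pt = np.sqrt(lam[i]) * U[:, i]
    assert np.isclose(boundary_pt @ Pinv @ boundary_pt, 1.0)
```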
QR Factorization
Similar to how the spectral theorem allows us to decompose symmetric matrices, QR factorization is another matrix decomposition technique that works for any general matrix.
Definition 20
The QR factorization of a matrix $A$ consists of an orthogonal matrix $Q$ and an upper triangular matrix $R$ such that
$$A = QR$$
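In practice `numpy.linalg.qr` computes the factorization directly. A sketch with an assumed random matrix (full column rank with probability 1):

```python
import numpy as np

# Sketch: factor A = QR and verify the defining properties.
rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))

Q, R = np.linalg.qr(A)                # reduced mode: Q is 4x3, R is 3x3
assert np.allclose(Q @ R, A)          # reconstruction
assert np.allclose(Q.T @ Q, np.eye(3))  # orthonormal columns
assert np.allclose(R, np.triu(R))     # R is upper triangular
```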
An easy way to find the QR factorization of a matrix is to apply Gram-Schmidt to the columns of the matrix and express the result in matrix form. Suppose that our matrix $A$ is full rank (i.e., its columns $a_i$ are linearly independent) and we have applied Gram-Schmidt to columns $a_{i+1}, \dots, a_n$ to get orthogonal vectors $q_{i+1}, \dots, q_n$. Continuing the procedure, the $i$th orthogonal vector