Given a vector $x$ in an inner product space $X$ and a subspace $S \subseteq X$, the projection of $x$ onto $S$ is given by
$$\Pi_S(x) = \arg\min_{y \in S} \|y - x\|$$
There exists a unique vector $x^* \in S$ which solves
$$\min_{y \in S} \|y - x\|.$$
It is necessary and sufficient for $x^*$ to be optimal that $(x - x^*) \perp S$. The same condition applies when projecting onto an affine set.
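Below is a minimal numpy sketch (not from the notes; the basis matrix `B` and all names are illustrative) that projects $x$ onto the subspace spanned by the columns of `B` and verifies the orthogonality condition.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 2))   # columns of B span the subspace S
x = rng.standard_normal(5)

# Least-squares projection: x_star = B w, with w = argmin_w ||B w - x||
w, *_ = np.linalg.lstsq(B, x, rcond=None)
x_star = B @ w

# Optimality condition: the residual (x - x_star) is orthogonal to S,
# i.e. orthogonal to every column of B.
print(B.T @ (x - x_star))   # ~ [0, 0] up to floating-point error
```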
A pseudoinverse is a matrix $A^\dagger$ that satisfies:
$$A A^\dagger A = A$$
$$A^\dagger A A^\dagger = A^\dagger$$
$$(A A^\dagger)^T = A A^\dagger$$
$$(A^\dagger A)^T = A^\dagger A$$
$$A^\dagger = V_r \, \mathrm{diag}\left(\frac{1}{\sigma_1}, \cdots, \frac{1}{\sigma_r}\right) U_r^T$$ is the Moore-Penrose pseudoinverse.
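As a sanity check, here is a small numpy sketch (illustrative names, not from the notes) that builds $A^\dagger$ from the compact SVD and verifies the four defining conditions above.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))

# Compact SVD and the pseudoinverse A† = V_r diag(1/σ_1, ..., 1/σ_r) U_r^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))                   # numerical rank
A_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

print(np.allclose(A @ A_pinv @ A, A))              # A A† A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))    # A† A A† = A†
print(np.allclose((A @ A_pinv).T, A @ A_pinv))     # (A A†)^T = A A†
print(np.allclose((A_pinv @ A).T, A_pinv @ A))     # (A† A)^T = A† A
```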
When $A$ is square and non-singular, $A^\dagger = A^{-1}$.
When $A$ has full column rank, $A^\dagger = (A^T A)^{-1} A^T$.
When $A$ has full row rank, $A^\dagger = A^T (A A^T)^{-1}$.
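These three special cases can be checked numerically; the sketch below compares them against `np.linalg.pinv` (random matrices have full rank with probability one, so the assumptions hold here).

```python
import numpy as np

rng = np.random.default_rng(2)
A_sq   = rng.standard_normal((4, 4))   # square and (almost surely) non-singular
A_tall = rng.standard_normal((6, 3))   # full column rank
A_wide = rng.standard_normal((3, 6))   # full row rank

print(np.allclose(np.linalg.pinv(A_sq), np.linalg.inv(A_sq)))                              # A† = A^{-1}
print(np.allclose(np.linalg.pinv(A_tall), np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T))    # (A^T A)^{-1} A^T
print(np.allclose(np.linalg.pinv(A_wide), A_wide.T @ np.linalg.inv(A_wide @ A_wide.T)))    # A^T (A A^T)^{-1}
```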
The pseudoinverse is useful because it gives simple expressions for the projection of a vector onto each of the fundamental subspaces associated with $A$:
$$\arg\min_{z \in \mathcal{R}(A)} \|z - y\|_2 = A A^\dagger y$$
$$\arg\min_{z \in \mathcal{R}(A)^\perp} \|z - y\|_2 = (I - A A^\dagger) y$$
$$\arg\min_{z \in \mathcal{N}(A)} \|z - y\|_2 = (I - A^\dagger A) y$$
$$\arg\min_{z \in \mathcal{N}(A)^\perp} \|z - y\|_2 = A^\dagger A y$$
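A short numpy sketch of these four formulas, using a deliberately rank-deficient matrix so that all four subspaces are non-trivial (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5x4, rank 2
A_pinv = np.linalg.pinv(A)

y = rng.standard_normal(5)                      # lives in the codomain R^5
proj_range      = A @ A_pinv @ y                # projection onto R(A)
proj_range_perp = (np.eye(5) - A @ A_pinv) @ y  # projection onto R(A)^perp
print(np.allclose(proj_range + proj_range_perp, y))   # the two pieces sum to y
print(np.isclose(proj_range @ proj_range_perp, 0.0))  # and are orthogonal

w = rng.standard_normal(4)                      # lives in the domain R^4
proj_null      = (np.eye(4) - A_pinv @ A) @ w   # projection onto N(A)
proj_null_perp = A_pinv @ A @ w                 # projection onto N(A)^perp
print(np.allclose(A @ proj_null, 0))            # really lies in the nullspace
```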
The Low Rank Approximation problem is to approximate a matrix $A$ with a rank-$k$ matrix $A_k$:
$$\min_{A_k} \|A - A_k\|_F^2 \text{ such that } \mathrm{rank}(A_k) = k.$$
The solution to the low rank approximation problem is simply the first $k$ terms of the SVD:
$$A_k^\star = \sum_{i=1}^k \sigma_i u_i v_i^T.$$
The fraction of the matrix's energy (squared Frobenius norm) retained by the rank-$k$ approximation is
$$\eta = \frac{\|A_k\|_F^2}{\|A\|_F^2} = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2}$$
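A minimal numpy sketch of the truncated-SVD solution and the retained-energy ratio $\eta$ (assuming the interpretation above; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # sum of the first k rank-1 terms

eta = np.sum(s[:k] ** 2) / np.sum(s ** 2)        # ||A_k||_F^2 / ||A||_F^2
print(eta)
print(np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2)))  # leftover energy
```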
Suppose we had a matrix $X$ containing $m$ data points in $\mathbb{R}^n$ (each data point is a column), and without loss of generality, assume this data is centered around 0 (i.e. $\sum_i x_i = 0$). The variance of this data along a particular unit direction $z$ is given by $z^T X X^T z$. Principal Component Analysis finds the directions $z$ along which the variance is maximized:
$$\max_{z \in \mathbb{R}^n} z^T X X^T z \text{ such that } \|z\|_2 = 1$$
The left singular vector of $X$ corresponding to the largest singular value $\sigma_1$ (equivalently, the leading eigenvector of $X X^T$) is the optimizer of this problem, and the variance along this direction is $\sigma_1^2$. If we want to find subsequent directions of maximal variance (orthogonal to those already found), they are just the left singular vectors corresponding to the next largest singular values.
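A small PCA sketch along these lines, using the SVD of the centered data matrix (the data here is synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((3, 200))        # n = 3 features, m = 200 points (columns)
X = X - X.mean(axis=1, keepdims=True)    # center so that sum_i x_i = 0

U, s, Vt = np.linalg.svd(X, full_matrices=False)
z1 = U[:, 0]                             # first principal direction
print(z1 @ X @ X.T @ z1, s[0] ** 2)      # variance along z1 equals sigma_1^2
```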
Following from the Fundamental Theorem of Linear Algebra, if $Ax = y$ has a solution, then the set of solutions can be expressed as
$$S = \{\bar{x} + Nz\}$$
where $A\bar{x} = y$ and the columns of $N$ form a basis for $\mathcal{N}(A)$. This means that if we have a constrained optimization problem
$$\min_x f_0(x) \text{ such that } Ax = b,$$
we can instead solve the equivalent unconstrained problem
$$\min_z f_0(x_0 + Nz)$$
where $A x_0 = b$.
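The following sketch illustrates this constraint elimination using scipy's `null_space` and a generic solver; the quadratic objective `f0` is purely an example, not something from the notes.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

rng = np.random.default_rng(6)
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)

x0 = np.linalg.pinv(A) @ b        # one particular solution of A x = b
N = null_space(A)                 # columns form a basis for N(A)

f0 = lambda x: np.sum((x - 1.0) ** 2)             # illustrative objective
res = minimize(lambda z: f0(x0 + N @ z), np.zeros(N.shape[1]))

x_star = x0 + N @ res.x
print(np.allclose(A @ x_star, b))                 # constraint is still satisfied
```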