# Linear Algebraic Optimization

Many optimization problems can be solved using the machinery of Linear Algebra. These problems have no inequality constraints and no non-Euclidean norms in the objective function.

## Projection

The idea behind projection is to find the point in a set that is closest (with respect to a particular norm) to a given point.

{% hint style="info" %}

### Definition 28

Given a vector $$\mathbf{x}$$ in an inner product space $$\mathcal{X}$$ and a subspace $$S\subseteq\mathcal{X}$$, the projection of $$\mathbf{x}$$ onto $$S$$ is given by

$$\Pi\_S(\mathbf{x}) = \text{argmin}\_{\mathbf{y}\in S}|\mathbf{y}-\mathbf{x}|$$

where the norm is the one induced by the inner product.
{% endhint %}

{% hint style="info" %}

### Theorem 9

There exists a unique vector $$\mathbf{x}^\*\in S$$ which solves

$$\min\_{\mathbf{y}\in S} |\mathbf{y}-\mathbf{x}|.$$
{% endhint %}

It is necessary and sufficient for $$\mathbf{x}^\*$$ to be optimal that $$(\mathbf{x}-\mathbf{x}^\*)\perp S$$. The same condition applies when projecting onto an affine set.
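
As a quick sanity check, here is a minimal NumPy sketch (the vectors are illustrative) that projects a point onto a one-dimensional subspace and verifies the orthogonality condition:

```python
import numpy as np

# Project x onto the subspace S = span{s} under the Euclidean inner product.
x = np.array([3.0, 4.0])
s = np.array([1.0, 0.0])

# For a one-dimensional subspace, Pi_S(x) = (<x, s> / <s, s>) s.
proj = (x @ s) / (s @ s) * s

# Optimality condition: the residual x - Pi_S(x) is orthogonal to S.
residual = x - proj
assert abs(residual @ s) < 1e-12
print(proj)  # [3. 0.]
```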

### Matrix Pseudo-inverses

{% hint style="info" %}

### Definition 29

A pseudoinverse is a matrix $$A^{\dagger}$$ that satisfies:

$$A A^\dagger A = A \quad A^\dagger A A^\dagger = A^\dagger \quad (AA^\dagger)^T = A A^\dagger \quad (A^\dagger A)^T = A^\dagger A$$
{% endhint %}

There are several special cases of pseudoinverses.

1. $$A^\dagger = V\_r \text{diag}\left(\frac{1}{\sigma\_1},\cdots,\frac{1}{\sigma\_r}\right)U\_r^T$$ is the Moore-Penrose Pseudo-inverse, built from the compact SVD $$A = U\_r\text{diag}(\sigma\_1,\cdots,\sigma\_r)V\_r^T$$.
2. When $$A$$ is square and non-singular, $$A^\dagger = A^{-1}$$.
3. When $$A$$ has full column rank, $$A^\dagger = (A^TA)^{-1}A^T$$.
4. When $$A$$ has full row rank, $$A^{\dagger} = A^T(AA^T)^{-1}$$.
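
A minimal NumPy sketch of these facts, using an illustrative random full-column-rank matrix, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # tall matrix; full column rank with probability 1

# Moore-Penrose pseudo-inverse from the compact SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))                        # numerical rank
A_dag = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

# Agrees with NumPy's built-in pinv and, since A has full column rank,
# with the closed form (A^T A)^{-1} A^T.
assert np.allclose(A_dag, np.linalg.pinv(A))
assert np.allclose(A_dag, np.linalg.inv(A.T @ A) @ A.T)
```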

Pseudo-inverses are useful because they give easy closed-form projections onto the fundamental subspaces of $$A$$.

1. $$\text{argmin}\_{\mathbf{z}\in\mathcal{R}(A)}|\mathbf{z}-\mathbf{y}|\_2 = AA^\dagger \mathbf{y}$$
2. $$\text{argmin}\_{\mathbf{z}\in\mathcal{R}(A)^\perp}|\mathbf{z}-\mathbf{y}|\_2 = (I - AA^\dagger)\mathbf{y}$$
3. $$\text{argmin}\_{\mathbf{z}\in\mathcal{N}(A)}|\mathbf{z}-\mathbf{y}|\_2 = (I - A^\dagger A)\mathbf{y}$$
4. $$\text{argmin}\_{\mathbf{z}\in\mathcal{N}(A)^\perp}|\mathbf{z}-\mathbf{y}|\_2 = A^\dagger A\mathbf{y}$$
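
These formulas are easy to verify numerically; the sketch below (again with illustrative random data) checks the nullspace projections and the resulting orthogonal decomposition:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))   # wide matrix, so N(A) is nontrivial
y = rng.standard_normal(6)
A_dag = np.linalg.pinv(A)

# Projections of y onto N(A) and N(A)^perp using formulas 3 and 4.
y_null = (np.eye(6) - A_dag @ A) @ y
y_row  = (A_dag @ A) @ y

assert np.allclose(A @ y_null, 0)          # y_null lies in the nullspace
assert np.allclose(y_null + y_row, y)      # the two pieces reassemble y
assert abs(y_null @ y_row) < 1e-10         # and they are orthogonal
```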

## Explained Variance

The Low Rank Approximation problem is to approximate a matrix $$A$$ with a rank $$k$$ matrix $$A\_k$$:

$$\min\_{A\_k} |A - A\_k|\_F^2 \text{ such that rank}(A\_k) = k.$$

The solution to the low rank approximation problem is simply the first $$k$$ terms of the SVD:

$$A\_k^\star = \sum\_{i=1}^k \sigma\_i\mathbf{u}\_i\mathbf{v}^T\_i.$$

This is because the singular values give us a notion of how much of the squared Frobenius Norm (the total variance) each dyad explains.

$$\eta = \frac{|A\_k|\_F^2}{|A|\_F^2} = \frac{\sum\_{i=1}^k \sigma\_i^2}{\sum\_{i=1}^r \sigma\_i^2}$$
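
A short NumPy sketch with an illustrative random matrix confirms both the truncated-SVD construction and the explained-variance ratio:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))
k = 2

# Best rank-k approximation: keep the first k terms of the SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Explained variance ratio eta = sum_{i<=k} sigma_i^2 / sum_i sigma_i^2.
eta = np.sum(s[:k] ** 2) / np.sum(s ** 2)

# The squared Frobenius error is exactly the discarded singular values' energy.
assert np.isclose(np.linalg.norm(A - A_k, "fro") ** 2, np.sum(s[k:] ** 2))
print(f"rank {k} explains {eta:.1%} of the total variance")
```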

### PCA

Suppose we have a matrix $$X$$ containing $$m$$ data points in $$\mathbb{R}^n$$ (each data point is a column), and without loss of generality, assume this data is centered around 0 (i.e., $$\sum\_i \mathbf{x}\_i = 0$$). The variance of this data along a particular direction $$\mathbf{z}$$ is given by $$\mathbf{z}^TXX^T\mathbf{z}$$. Principal Component Analysis finds the directions $$\mathbf{z}$$ along which the variance is maximized.

$$\max\_{\mathbf{z}\in\mathbb{R}^n} \mathbf{z}^TXX^T\mathbf{z} \text{ such that } |\mathbf{z}|\_2 = 1$$

The optimizer is the left singular vector of $$X$$ corresponding to its largest singular value $$\sigma\_1$$ (equivalently, the eigenvector of $$XX^T$$ with the largest eigenvalue), and the variance along this direction is $$\sigma\_1^2$$. Subsequent directions of maximal variance, taken orthogonal to the previous ones, are the left singular vectors corresponding to the next largest singular values.
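
A minimal NumPy sketch of this on illustrative random data, using the SVD of the centered data matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((2, 100))      # n = 2 features, m = 100 points (columns)
X = X - X.mean(axis=1, keepdims=True)  # center so that sum_i x_i = 0

# Principal directions are the left singular vectors of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
z1 = U[:, 0]                           # direction of maximal variance

# Variance along z1 equals sigma_1^2, and no other unit direction beats it.
assert np.isclose(z1 @ X @ X.T @ z1, s[0] ** 2)
for _ in range(100):
    z = rng.standard_normal(2)
    z /= np.linalg.norm(z)
    assert z @ X @ X.T @ z <= s[0] ** 2 + 1e-9
```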

## Removing Constraints

Following from the Fundamental Theorem of Linear Algebra, if $$A\mathbf{x}=\mathbf{y}$$ has a solution, then the set of solutions can be expressed as

$$S = \{\bar{\mathbf{x}} + N\mathbf{z}\}$$

where $$A\bar{\mathbf{x}}=\mathbf{y}$$ and $$N$$ is a basis for $$\mathcal{N}(A)$$. This means if we have a constrained optimization problem

$$\min\_\mathbf{x} f\_0(\mathbf{x}) \ : \ A\mathbf{x} = \mathbf{b},$$

we can write an equivalent unconstrained problem

$$\min\_\mathbf{z} f\_0(\mathbf{x}\_0 + N\mathbf{z})$$

where $$A\mathbf{x}\_0 = \mathbf{b}$$. An optimal $$\mathbf{z}^\star$$ of this problem recovers an optimal $$\mathbf{x}^\star = \mathbf{x}\_0 + N\mathbf{z}^\star$$ of the original one.
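
As a concrete illustration, here is a minimal NumPy sketch of this trick; the specific objective $$f\_0(\mathbf{x}) = |\mathbf{x}-\mathbf{c}|\_2^2$$ and the random problem data are assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)
c = rng.standard_normal(5)

# A particular solution x0 of Ax = b and a basis N for the nullspace of A.
x0 = np.linalg.pinv(A) @ b
U, s, Vt = np.linalg.svd(A)
N = Vt[int(np.sum(s > 1e-12)):].T       # columns span N(A), so A @ N is ~0

# Solve min_x ||x - c||^2 s.t. Ax = b as the unconstrained least-squares
# problem min_z ||x0 + N z - c||^2 over the free variable z.
z_star, *_ = np.linalg.lstsq(N, c - x0, rcond=None)
x_star = x0 + N @ z_star

assert np.allclose(A @ x_star, b)       # feasibility holds by construction
```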
