# Linear Algebraic Optimization

Many optimization problems can be solved using the machinery of Linear Algebra. These problems do not have inequality constraints or non-euclidean norms in the objective function.

## Projection

The idea behind projection is to find the closest point in a set closest (with respect to particular norm) to a given point.

{% hint style="info" %}

### Definition 28

Given a vector $$\mathbf{x}$$ in inner product space $$\mathcal{X}$$ and a subspace $$S\subseteq\mathcal{X}$$, the projection of $$\mathbf{x}$$ onto $$S$$ is given by

$$\Pi\_S(\mathbf{x}) = \text{argmin}\_{\mathbf{y}\in S}|\mathbf{y}-\mathbf{x}|$$

where the norm is the one induced by the inner product.
{% endhint %}

{% hint style="info" %}

### Theorem 9

There exists a unique vector $$\mathbf{x}^\*\in S$$ which solves

$$\min\_{\mathbf{y}\in S} |\mathbf{y}-\mathbf{x}|.$$
{% endhint %}

It is necessary and sufficient for $$\mathbf{x}^*$$ to be optimal that $$(\mathbf{x}-\mathbf{x}^*)\perp S$$. The same condition applies when projecting onto an affine set.

### Matrix Pseudo-inverses

{% hint style="info" %}

### Definition 29

A pseudoinverse is a matrix $$A^{\dagger}$$ that satisfies:

$$A A^\dagger A = A \quad A^\dagger A A^\dagger = A^\dagger \quad (AA^\dagger)^T = A A^\dagger \quad (A^\dagger A)^T = A^\dagger A$$
{% endhint %}

There are several special cases of pseudoinverses.

1. $$A^\dagger = V\_r \text{diag}\left(\frac{1}{\sigma\_1},\cdots,\frac{1}{\sigma\_r}\right)U\_r^T$$ is the Moore-Penrose Pseudo-inverse.
2. When $$A$$ and non-singular, $$A^\dagger = A^{-1}$$.
3. When $$A$$ is full column rank, $$A^\dagger = (A^TA)^{-1}A^T$$.
4. When $$A$$ is full row rank, $$A^{\dagger} = A^T(AA^T)^{-1}$$

The pseudo-inverses are useful because they can easily compute the projection of a vector onto a related subspace of $$A$$.

1. $$\text{argmin}\_{z\in\mathcal{R}(A)}|\mathbf{z}-\mathbf{y}|\_2 = AA^\dagger \mathbf{y}$$
2. $$\text{argmin}\_{z\in\mathcal{R}(A)^\perp}|\mathbf{z}-\mathbf{y}|\_2 = (I - AA^\dagger)\mathbf{y}$$
3. $$\text{argmin}\_{z\in\mathcal{N}(A)}|\mathbf{z}-\mathbf{y}|\_2 = (I - A^\dagger A)\mathbf{y}$$
4. $$\text{argmin}\_{z\in\mathcal{N}(A)^\perp}|\mathbf{z}-\mathbf{y}|\_2 = A^\dagger A\mathbf{y}$$

## Explained Variance

The Low Rank Approximation problem is to approximate a matrix $$A$$ with a rank $$k$$ matrix

$$\min\_{A\_k} |A - A\_k|\_F^2 \text{ such that rank}(A\_k) = k.$$

The solution to the low rank approximation problem is simply the first $$k$$ terms of the SVD:

$$A\_K^\star = \sum\_{i=1}^k \sigma\_i\mathbf{u}\_i\mathbf{v}^T\_i.$$

This is because the singular values give us a notion of how much of the Frobenius Norm (Total Variance) each dyad explains.

$$\eta = \frac{|A\_k|\_F^2}{|A|\_F^2} = \frac{\sum\_i^k \sigma\_i^2}{\sum\_i^r \sigma\_i^2}$$

### PCA

Suppose we had a matrix containing $$m$$ data points in $$\mathbb{R}^n$$ (each data point is a column), and without loss of generality, assume this data is centered around 0 (i.e $$\sum\_i \mathbf{x}\_i = 0$$). The variance of this data along a particular direction $$\mathbf{z}$$ is given by $$\mathbf{z}^TXX^T\mathbf{z}$$. Principle Component Analysis is finding the directions $$\mathbf{z}$$ such that the variance is maximized.

$$\max\_{z\in\mathbb{R}^n} \mathbf{z}^TXX^T\mathbf{z} \text{ such that } |\mathbf{z}|\_2 = 1$$

The left singular vector corresponding to the largest singular value of the $$XX^T$$ matrix is the optimizer of this problem, and the variance along this direction is $$\sigma\_1^2$$. If we wanted to find subsequent directions of maximal variance, they are just the left singular vectors corresponding to the largest singular values.

## Removing Constraints

Following from the Fundmental Theorem of Linear Algebra, if $$A\mathbf{x}=\mathbf{y}$$ has a solution, then the set of solutions can be expressed as

$$S = {\bar{\mathbf{x}} + N\mathbf{z}}$$

where $$A\bar{\mathbf{x}}=\mathbf{y}$$ and $$N$$ is a basis for $$\mathcal{N}(A)$$. This means if we have a constrained optimization problem

$$\min\_\mathbf{x} f\_0(\mathbf{x}) \ : \ A\mathbf{x} = \mathbf{b},$$

we can write an equivalent unconstrained problem

$$\min\_\mathbf{z} f\_0(\mathbf{x}\_0 + N\mathbf{z})$$

where $$A\mathbf{x}\_0 = \mathbf{b}$$


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://aparande.gitbook.io/berkeley-notes/eecs127-0/eecs127-3.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
