Linear Algebraic Optimization

Many optimization problems can be solved using the machinery of linear algebra. These problems have no inequality constraints and no non-Euclidean norms in the objective function.

Projection

The idea behind projection is to find the point in a set that is closest (with respect to a particular norm) to a given point.

Definition 28

Given a vector $\mathbf{x}$ in an inner product space $\mathcal{X}$ and a subspace $S\subseteq\mathcal{X}$, the projection of $\mathbf{x}$ onto $S$ is given by

$$\Pi_S(\mathbf{x}) = \text{argmin}_{\mathbf{y}\in S}\|\mathbf{y}-\mathbf{x}\|,$$

where the norm is the one induced by the inner product.

Theorem 9

There exists a unique vector $\mathbf{x}^*\in S$ which solves

$$\min_{\mathbf{y}\in S} \|\mathbf{y}-\mathbf{x}\|.$$

It is necessary and sufficient for $\mathbf{x}^*$ to be optimal that $(\mathbf{x}-\mathbf{x}^*)\perp S$. The same condition applies when projecting onto an affine set.
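As a concrete illustration, here is a minimal NumPy sketch (the matrix $B$ and vector $\mathbf{x}$ are arbitrary examples) that computes the projection of $\mathbf{x}$ onto $S = \mathcal{R}(B)$ by enforcing the orthogonality condition:

```python
import numpy as np

# Subspace S = range(B) for an arbitrary example matrix B.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
x = np.array([3.0, -1.0, 2.0])

# Orthogonality condition B^T (x - B a) = 0  =>  (B^T B) a = B^T x.
a = np.linalg.solve(B.T @ B, B.T @ x)
x_star = B @ a  # projection of x onto S

# The residual is orthogonal to every column of B (up to floating point error).
print(np.allclose(B.T @ (x - x_star), 0))  # True
```

The same projection can equivalently be computed as $BB^\dagger\mathbf{x}$, which is one of the pseudo-inverse identities discussed in the next section.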

Matrix Pseudo-inverses

Definition 29

A pseudo-inverse is a matrix $A^\dagger$ that satisfies:

$$A A^\dagger A = A \quad A^\dagger A A^\dagger = A^\dagger \quad (AA^\dagger)^T = A A^\dagger \quad (A^\dagger A)^T = A^\dagger A$$

There are several special cases of pseudo-inverses (a numerical check follows the list below).

  1. $A^\dagger = V_r \text{diag}\left(\frac{1}{\sigma_1},\cdots,\frac{1}{\sigma_r}\right)U_r^T$ is the Moore-Penrose pseudo-inverse.

  2. When $A$ is square and non-singular, $A^\dagger = A^{-1}$.

  3. When $A$ is full column rank, $A^\dagger = (A^TA)^{-1}A^T$.

  4. When $A$ is full row rank, $A^\dagger = A^T(AA^T)^{-1}$.
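A quick numerical check of these special cases, assuming NumPy is available and using small example matrices chosen purely for illustration:

```python
import numpy as np

# Full column rank: A_dagger = (A^T A)^{-1} A^T
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
print(np.allclose(np.linalg.pinv(A), np.linalg.inv(A.T @ A) @ A.T))  # True

# Full row rank: A_dagger = A^T (A A^T)^{-1}
B = A.T
print(np.allclose(np.linalg.pinv(B), B.T @ np.linalg.inv(B @ B.T)))  # True

# Square and non-singular: A_dagger = A^{-1}
C = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.allclose(np.linalg.pinv(C), np.linalg.inv(C)))  # True
```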

Pseudo-inverses are useful because they make it easy to compute the projection of a vector onto the fundamental subspaces associated with $A$; the identities below are verified numerically in the sketch that follows them.

  1. $\text{argmin}_{\mathbf{z}\in\mathcal{R}(A)}\|\mathbf{z}-\mathbf{y}\|_2 = AA^\dagger \mathbf{y}$

  2. $\text{argmin}_{\mathbf{z}\in\mathcal{R}(A)^\perp}\|\mathbf{z}-\mathbf{y}\|_2 = (I - AA^\dagger)\mathbf{y}$

  3. $\text{argmin}_{\mathbf{z}\in\mathcal{N}(A)}\|\mathbf{z}-\mathbf{y}\|_2 = (I - A^\dagger A)\mathbf{y}$

  4. $\text{argmin}_{\mathbf{z}\in\mathcal{N}(A)^\perp}\|\mathbf{z}-\mathbf{y}\|_2 = A^\dagger A\mathbf{y}$
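These identities are easy to sanity-check numerically. The sketch below uses NumPy's `np.linalg.pinv` on an arbitrary rank-deficient example matrix and verifies the orthogonality properties of the resulting projections.

```python
import numpy as np

# Example 3x3 matrix of rank 2, so both R(A) and N(A) are proper subspaces.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
A_dag = np.linalg.pinv(A)
y = np.array([3.0, -1.0, 2.0])

y_range = A @ A_dag @ y                 # projection onto R(A)
y_range_perp = y - y_range              # equals (I - A A_dag) y, projection onto R(A)^perp
y_null = (np.eye(3) - A_dag @ A) @ y    # projection onto N(A)
y_row = A_dag @ A @ y                   # projection onto N(A)^perp

print(np.allclose(A.T @ y_range_perp, 0))  # True: residual is orthogonal to R(A)
print(np.allclose(A @ y_null, 0))          # True: y_null lies in the nullspace of A
print(np.allclose(y_null + y_row, y))      # True: orthogonal decomposition of y
```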

Explained Variance

The Low Rank Approximation problem is to approximate a matrix $A$ with a rank $k$ matrix:

$$\min_{A_k} \|A - A_k\|_F^2 \text{ such that } \text{rank}(A_k) = k.$$

The solution to the low rank approximation problem is simply the first $k$ terms of the SVD:

$$A_k^\star = \sum_{i=1}^k \sigma_i\mathbf{u}_i\mathbf{v}^T_i.$$

This is because the singular values give us a notion of how much of the Frobenius norm (total variance) each dyad explains:

$$\eta = \frac{\|A_k\|_F^2}{\|A\|_F^2} = \frac{\sum_{i=1}^k \sigma_i^2}{\sum_{i=1}^r \sigma_i^2}$$
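A short sketch of this, assuming NumPy and a random example matrix: truncate the SVD to $k$ terms and compute the explained-variance ratio $\eta$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # example data matrix
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation in Frobenius norm

eta = np.sum(s[:k]**2) / np.sum(s**2)         # fraction of total variance explained
print(eta)
print(np.isclose(np.linalg.norm(A_k, 'fro')**2 / np.linalg.norm(A, 'fro')**2, eta))  # True
```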

PCA

Suppose we have a data matrix $X$ containing $m$ data points in $\mathbb{R}^n$ (each data point is a column), and without loss of generality, assume the data is centered around 0 (i.e., $\sum_i \mathbf{x}_i = 0$). The variance of this data along a particular direction $\mathbf{z}$ is given by $\mathbf{z}^TXX^T\mathbf{z}$. Principal Component Analysis finds the directions $\mathbf{z}$ along which this variance is maximized.

$$\max_{\mathbf{z}\in\mathbb{R}^n} \mathbf{z}^TXX^T\mathbf{z} \text{ such that } \|\mathbf{z}\|_2 = 1$$

The left singular vector of $X$ corresponding to its largest singular value $\sigma_1$ (equivalently, the top eigenvector of the $XX^T$ matrix) is the optimizer of this problem, and the variance along this direction is $\sigma_1^2$. Subsequent directions of maximal variance are the left singular vectors corresponding to the next largest singular values.
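A minimal sketch of this computation, assuming NumPy and randomly generated example data: center the columns of $X$, take its SVD, and check that the variance along the top left singular vector equals $\sigma_1^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 100))         # n = 3 features, m = 100 data points as columns
X = X - X.mean(axis=1, keepdims=True)     # center the data so that sum_i x_i = 0

U, s, Vt = np.linalg.svd(X, full_matrices=False)
z1 = U[:, 0]                              # first principal direction

variance_along_z1 = z1 @ X @ X.T @ z1
print(np.isclose(variance_along_z1, s[0]**2))  # True: variance along z1 is sigma_1^2
```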

Removing Constraints

Following from the Fundamental Theorem of Linear Algebra, if $A\mathbf{x}=\mathbf{y}$ has a solution, then the set of solutions can be expressed as

$$S = \{\bar{\mathbf{x}} + N\mathbf{z}\}$$

where $A\bar{\mathbf{x}}=\mathbf{y}$ and the columns of $N$ form a basis for $\mathcal{N}(A)$. This means that if we have a constrained optimization problem

$$\min_\mathbf{x} f_0(\mathbf{x}) \ : \ A\mathbf{x} = \mathbf{b},$$

we can write an equivalent unconstrained problem

$$\min_\mathbf{z} f_0(\mathbf{x}_0 + N\mathbf{z})$$

where $\mathbf{x}_0$ is any particular solution satisfying $A\mathbf{x}_0 = \mathbf{b}$.
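As an illustrative sketch (assuming NumPy and SciPy, with a small made-up problem), consider minimizing $f_0(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\|_2^2$ subject to $A\mathbf{x} = \mathbf{b}$; the constraint is eliminated by parameterizing the feasible set as $\mathbf{x}_0 + N\mathbf{z}$ and solving an unconstrained least-squares problem in $\mathbf{z}$.

```python
import numpy as np
from scipy.linalg import null_space

# Example equality-constrained problem: minimize ||x - c||^2 subject to Ax = b.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
c = np.array([3.0, 0.0, -1.0])

x0 = np.linalg.pinv(A) @ b        # one particular solution of Ax = b
N = null_space(A)                 # columns form a basis for N(A)

# Unconstrained problem in z: minimize ||x0 + N z - c||^2  (a least-squares problem).
z, *_ = np.linalg.lstsq(N, c - x0, rcond=None)
x_star = x0 + N @ z

print(np.allclose(A @ x_star, b))  # True: the constraint is satisfied
```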
