Linear Algebra

Definition 1

An affine set is one of the form $\mathcal{A}=\{ \mathbf{x}\in\mathcal{X}:\ \mathbf{x}=\mathbf{v}+\mathbf{x}_0,\ \mathbf{v}\in\mathcal{V}\}$, where $\mathcal{V}$ is a subspace of a vector space $\mathcal{X}$ and $\mathbf{x}_0$ is a given point.

Notice that by Definition 1, a subspace is simply an affine set containing the origin. Also notice that the dimension of an affine set $\mathcal{A}$ is the same as the dimension of $\mathcal{V}$.

Norms

Definition 2

A norm on the vector space $\mathcal{X}$ is a function $\|\cdot\|:\mathcal{X}\rightarrow\mathbb{R}$ which satisfies:

  1. $\|\mathbf{x}\|\geq 0$ with equality if and only if $\mathbf{x}=\boldsymbol{0}$

  2. $\|\mathbf{x}+\mathbf{y}\|\leq\|\mathbf{x}\|+\|\mathbf{y}\|$

  3. $\|\alpha \mathbf{x}\| = |\alpha|\|\mathbf{x}\|$ for any scalar $\alpha$.

Definition 3

The $\ell_p$ norms are defined by

$$\|\mathbf{x}\|_p=\left( \sum_{k=1}^n|x_k|^p \right)^{\frac{1}{p}},\quad 1\leq p < \infty.$$

In the limit as $p\to\infty$,

$$\|\mathbf{x}\|_{\infty} = \max_k|x_k|.$$
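As a quick numerical check, here is a minimal numpy sketch (the vector `x` is an arbitrary example) computing a few $\ell_p$ norms from the definition and comparing against `numpy.linalg.norm`:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# l_p norm computed directly from the definition for a few finite p
for p in [1, 2, 3]:
    manual = np.sum(np.abs(x) ** p) ** (1 / p)
    print(p, manual, np.linalg.norm(x, ord=p))

# The l_infinity norm is the largest absolute entry
print(np.max(np.abs(x)), np.linalg.norm(x, ord=np.inf))
```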

Similar to vectors, matrices can also have norms.

Definition 4

A function $f: \mathbb{R}^{m\times n} \to \mathbb{R}$ is a matrix norm if

$$f(A) \geq 0, \quad f(A) = 0 \Leftrightarrow A = 0, \quad f(\alpha A) = |\alpha| f(A), \quad f(A+B) \leq f(A) + f(B).$$

Definition 5

The Frobenius norm is the $\ell_2$ norm applied to all elements of the matrix:

$$\|A\|_F = \sqrt{\operatorname{trace}(AA^T)} = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2}.$$

One useful way to characterize matrices is by measuring their “gain” relative to some $\ell_p$ norm.

Definition 6

The operator norm is defined as

$$\|A\|_p = \max_{\mathbf{u}\ne\boldsymbol{0}} \frac{\|A\mathbf{u}\|_p}{\|\mathbf{u}\|_p}.$$

When $p=2$, the norm is called the spectral norm because it relates to the largest eigenvalue of $A^TA$:

$$\|A\|_2 = \sqrt{\lambda_{max}(A^TA)}.$$
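The following sketch (with an arbitrary example matrix) checks both facts numerically: the Frobenius norm sums squared entries, and the spectral norm equals $\sqrt{\lambda_{max}(A^TA)}$:

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0], [4.0, -1.0]])

# Frobenius norm: l_2 norm of all entries
fro = np.sqrt(np.sum(A ** 2))
print(fro, np.linalg.norm(A, ord='fro'))

# Spectral norm: square root of the largest eigenvalue of A^T A
spectral = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())
print(spectral, np.linalg.norm(A, ord=2))
```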

Inner Products

Definition 7

An inner product on a real vector space $\mathcal{X}$ is a function $\langle \cdot, \cdot \rangle$ mapping pairs $\mathbf{x},\mathbf{y} \in \mathcal{X}$ to a scalar that is distributive, is commutative, and satisfies $\langle \mathbf{x}, \mathbf{x} \rangle \geq 0$ with $\langle \mathbf{x}, \mathbf{x} \rangle = 0 \Leftrightarrow \mathbf{x}=\boldsymbol{0}$.

Inner products induce a norm $\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$. In $\mathbb{R}^n$, the standard inner product is $\mathbf{x}^T\mathbf{y}$. The angle between two vectors is given by

$$\cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{x}\|_2\|\mathbf{y}\|_2}.$$

In general, we can bound the absolute value of the standard inner product between two vectors.

Theorem 1 (Hölder Inequality)

$$|\mathbf{x}^T\mathbf{y}| \leq \sum_{k=1}^n |x_ky_k| \leq \|\mathbf{x}\|_p\|\mathbf{y}\|_q,\quad p, q\geq 1 \text{ s.t. } p^{-1}+q^{-1}=1.$$

Notice that for $p=q=2$, Theorem 1 turns into the Cauchy-Schwarz Inequality ($|\mathbf{x}^T\mathbf{y}| \leq \|\mathbf{x}\|_2\|\mathbf{y}\|_2$).
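A small numerical illustration of Theorem 1 (a sketch; the vectors are arbitrary, and $p=3$, $q=3/2$ is just one conjugate pair satisfying $p^{-1}+q^{-1}=1$):

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5])
y = np.array([0.3, 1.0, -4.0])

# Holder's inequality with conjugate exponents p = 3, q = 3/2
p, q = 3.0, 1.5
lhs = abs(x @ y)
mid = np.sum(np.abs(x * y))
rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)
print(lhs <= mid <= rhs)          # True

# p = q = 2 recovers Cauchy-Schwarz
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))  # True
```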

Functions

We consider functions to be of the form $f:\mathbb{R}^n\rightarrow\mathbb{R}$. By contrast, a map is of the form $f:\mathbb{R}^n\rightarrow\mathbb{R}^m$. The components of the map $f$ are the scalar-valued functions $f_i$ that produce each component of the map.

Definition 8

The graph of a function $f$ is the set of input-output pairs that $f$ can attain:

$$\left\{ (\mathbf{x}, f(\mathbf{x}))\in \mathbb{R}^{n+1}:\ \mathbf{x}\in\mathbb{R}^n \right\}.$$

Definition 9

The epigraph of a function is the set of input-output pairs that $f$ can achieve and anything above:

$$\left\{ (\mathbf{x},t) \in \mathbb{R}^{n+1}:\ \mathbf{x}\in\mathbb{R}^{n},\ t\geq f(\mathbf{x}) \right\}.$$

Definition 10

The $t$-level set is the set of points that achieve exactly the value $t$ of $f$:

$$\{ \mathbf{x}\in\mathbb{R}^n:\ f(\mathbf{x})=t \}.$$

Definition 11

The $t$-sublevel set of $f$ is the set of points achieving at most the value $t$:

$$\{ \mathbf{x}\in\mathbb{R}^n:\ f(\mathbf{x})\leq t \}.$$

Definition 12

The half-spaces are the regions of space which a hyperplane separates:

$$H_{-} = \{ \mathbf{x}: \mathbf{a}^T\mathbf{x}\leq b \} \qquad H_{+} = \{ \mathbf{x}: \mathbf{a}^T\mathbf{x} > b \}.$$

Definition 13

A polyhedron is the intersection of $m$ half-spaces given by $\mathbf{a}_i^T\mathbf{x}\leq b_i$ for $i\in[1,m]$.

When a polyhedron is bounded, it is called a polytope.

Types of Functions

Theorem 2

A function is affine if and only if it can be expressed as $f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}+b$ for some unique pair $(\mathbf{a}, b)$.

An affine function is linear if and only if $b=0$. A hyperplane is simply a level set of a linear function.

Theorem 3

Any quadratic function can be written as the sum of a quadratic term involving a symmetric matrix and an affine term:

$$q(\mathbf{x}) = \frac{1}{2}\mathbf{x}^TH\mathbf{x}+\mathbf{c}^T\mathbf{x} + d.$$

Another special class of functions is the polyhedral functions.

Definition 14

A function $f:\mathbb{R}^n\to\mathbb{R}$ is polyhedral if its epigraph is a polyhedron:

$$\text{epi } f = \left\{(\mathbf{x},t) \in \mathbb{R}^{n+1} :\ C \begin{bmatrix}\mathbf{x} \\ t \end{bmatrix} \leq \mathbf{d} \right\}.$$

Vector Calculus

We can also do calculus with vector functions.

Definition 15

The gradient of a function at a point $\mathbf{x}$ where $f$ is differentiable is a column vector of first derivatives of $f$ with respect to the components of $\mathbf{x}$:

$$\nabla f(\mathbf{x}) = \begin{bmatrix} \frac{\partial f}{\partial x_1}\\ \vdots\\ \frac{\partial f}{\partial x_n} \end{bmatrix}.$$

The gradient is perpendicular to the level sets of $f$ and points from a point $\mathbf{x}_0$ toward higher values of the function. In other words, it is the direction of steepest increase. It is akin to the derivative of a 1D function.

Definition 16

The Hessian of a function $f$ at point $\mathbf{x}$ is the matrix of second derivatives:

$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}.$$

The Hessian is akin to the second derivative of a 1D function. Note that the Hessian is a symmetric matrix whenever the second partial derivatives are continuous.
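To make these definitions concrete, here is a sketch for a hypothetical quadratic $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^TH\mathbf{x}+\mathbf{c}^T\mathbf{x}$, whose gradient is $H\mathbf{x}+\mathbf{c}$ and whose Hessian is $H$; the analytic gradient is checked against central finite differences:

```python
import numpy as np

H = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric, so grad(0.5 x^T H x) = H x
c = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ H @ x + c @ x

x0 = np.array([0.3, -0.7])
grad_analytic = H @ x0 + c               # the Hessian of f is simply H

# Central finite-difference approximation of the gradient
eps = 1e-6
grad_fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-5))   # True
```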

Matrices

Matrices define a linear map between an input space and an output space. Any linear map $f: \mathbb{R}^n \to \mathbb{R}^m$ can be represented by a matrix.

Theorem 4 (Fundamental Theorem of Linear Algebra)

For any matrix $A\in\mathbb{R}^{m\times n}$,

$$\mathcal{N}(A) \oplus \mathcal{R}(A^T) = \mathbb{R}^n \qquad \mathcal{R}(A) \oplus \mathcal{N}(A^T) = \mathbb{R}^m.$$
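A numerical sketch of the first decomposition (the matrix and vector are arbitrary examples): any $\mathbf{x}\in\mathbb{R}^n$ splits into a piece in $\mathcal{R}(A^T)$ and an orthogonal piece in $\mathcal{N}(A)$.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])          # rank 1, so dim N(A) = 2
x = np.array([1.0, -1.0, 2.0])

# pinv(A) @ A is the orthogonal projector onto the row space R(A^T)
x_row = np.linalg.pinv(A) @ (A @ x)
x_null = x - x_row

print(np.allclose(A @ x_null, 0))        # x_null lies in N(A)
print(np.isclose(x_row @ x_null, 0))     # the two pieces are orthogonal
```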

Symmetric Matrices

Recall that a symmetric matrix is one where $A = A^T$.

Theorem 5 (Spectral Theorem)

Any symmetric matrix is orthogonally similar to a real diagonal matrix.

$$A = A^T \implies A = U \Lambda U^T = \sum_i \lambda_i \mathbf{u}_i\mathbf{u}_i^T,\quad \|\mathbf{u}_i\| = 1, \quad \mathbf{u}_i^T\mathbf{u}_j = 0 \ (i \ne j).$$

Let $\lambda_{min}(A)$ be the smallest eigenvalue of the symmetric matrix $A$ and $\lambda_{max}(A)$ be the largest eigenvalue.

Definition 17

The Rayleigh Quotient for $\mathbf{x} \ne \boldsymbol{0}$ is $\frac{\mathbf{x}^TA\mathbf{x}}{\|\mathbf{x}\|^2}$.

Theorem 6

For any $\mathbf{x} \ne \boldsymbol{0}$,

$$\lambda_{min}(A) \leq \frac{\mathbf{x}^TA\mathbf{x}}{\|\mathbf{x}\|^2} \leq \lambda_{max}(A).$$
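Theorem 6 is easy to verify numerically (a sketch with a random symmetric matrix and a random nonzero vector):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                        # symmetrize

lam = np.linalg.eigvalsh(A)              # eigenvalues in ascending order
x = rng.standard_normal(4)
rayleigh = (x @ A @ x) / (x @ x)
print(lam[0] <= rayleigh <= lam[-1])     # True
```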

Two special types of symmetric matrices are those with non-negative eigenvalues and those with strictly positive eigenvalues.

Definition 18

A symmetric matrix $A$ is positive semi-definite if $\mathbf{x}^TA\mathbf{x} \geq 0$ for all $\mathbf{x}$, or equivalently $\lambda_{min}(A) \geq 0$.

Definition 19

A symmetric matrix $A$ is positive definite if $\mathbf{x}^TA\mathbf{x} > 0$ for all $\mathbf{x} \ne \boldsymbol{0}$, or equivalently $\lambda_{min}(A) > 0$.

These matrices are important because they often have very clear geometric structures. For example, an ellipsoid in multi-dimensional space can be defined as the set of points

$$\mathcal{E} = \{ \mathbf{x}\in\mathbb{R}^m : \ \mathbf{x}^T P^{-1} \mathbf{x} \leq 1 \}$$

where $P$ is a positive definite matrix. The eigenvectors of $P$ give the principal axes of this ellipsoid, and $\sqrt{\lambda_i}$ are the semi-axis lengths.
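As an illustration (a sketch with a hypothetical positive definite $P$), the principal axes and semi-axis lengths come straight from the eigendecomposition of $P$:

```python
import numpy as np

P = np.array([[4.0, 1.0], [1.0, 2.0]])   # positive definite
lam, U = np.linalg.eigh(P)               # P = U diag(lam) U^T

semi_axes = np.sqrt(lam)                 # semi-axis lengths sqrt(lambda_i)
print(semi_axes)                         # principal axes are the columns of U

# The points sqrt(lambda_i) * u_i lie on the boundary x^T P^{-1} x = 1
for i in range(2):
    x = semi_axes[i] * U[:, i]
    print(np.isclose(x @ np.linalg.solve(P, x), 1.0))   # True
```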

QR Factorization

Similar to how the spectral theorem allows us to decompose symmetric matrices, QR factorization is another matrix decomposition technique that works for any general matrix.

Definition 20

The QR factorization of a matrix $A$ is $A = QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix.

An easy way to find the QR factorization of a matrix is to apply Gram-Schmidt to the columns of the matrix and express the result in matrix form. Suppose that our matrix $A$ is full rank (i.e. its columns $\mathbf{a}_i$ are linearly independent) and we have applied Gram-Schmidt to columns $\mathbf{a}_1,\cdots,\mathbf{a}_{i-1}$ to get orthonormal vectors $\mathbf{q}_1,\cdots,\mathbf{q}_{i-1}$. Continuing the procedure, the $i$th orthonormal vector $\mathbf{q}_i$ is

$$\mathbf{\tilde{q}}_i = \mathbf{a}_i - \sum_{k=1}^{i-1} (\mathbf{q}_k^T \mathbf{a}_i)\mathbf{q}_k \qquad \mathbf{q}_i = \frac{\mathbf{\tilde{q}}_i}{\|\mathbf{\tilde{q}}_i\|_2}.$$

If we re-arrange this to solve for $\mathbf{a}_i$, we see that

$$\mathbf{a}_i = \|\mathbf{\tilde{q}}_i\|_2 \mathbf{q}_i + \sum_{k=1}^{i-1} (\mathbf{q}_k^T \mathbf{a}_i)\mathbf{q}_k.$$

Putting this in matrix form, we can see that

$$\begin{bmatrix} | & | & & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_{n}\\ | & | & & | \end{bmatrix} = \begin{bmatrix} | & | & & | \\ \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_{n}\\ | & | & & | \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n}\\ 0 & r_{22} & \cdots & r_{2n}\\ \vdots & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & r_{nn} \end{bmatrix} \qquad r_{ij} = \mathbf{q}_i^T\mathbf{a}_j \ (i < j), \quad r_{ii} = \|\mathbf{\tilde{q}}_i\|_2.$$
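A minimal implementation of this construction (a sketch assuming $A$ has full column rank; classical Gram-Schmidt, which can be compared against `numpy.linalg.qr` up to sign conventions):

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR factorization of a full-column-rank matrix via classical Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        q = A[:, i].copy()
        for k in range(i):                  # subtract projections onto earlier q's
            R[k, i] = Q[:, k] @ A[:, i]     # r_ki = q_k^T a_i
            q -= R[k, i] * Q[:, k]
        R[i, i] = np.linalg.norm(q)         # r_ii = ||q_tilde_i||_2
        Q[:, i] = q / R[i, i]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(2)))   # True True
```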

Singular Value Decomposition

Definition 21

A matrix $A\in\mathbb{R}^{m\times n}$ is a dyad if it can be written as $A = \mathbf{p}\mathbf{q}^T$.

A nonzero dyad is a rank-one matrix. It turns out that all matrices can be decomposed into a sum of dyads.

Definition 22

The Singular Value Decomposition of a matrix $A$ is

$$A = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i\mathbf{v}_i^T$$

where $\sigma_i$ are the singular values of $A$, and $\mathbf{u}_i$ and $\mathbf{v}_i$ are the left and right singular vectors.

The singular values are ordered such that $\sigma_1 \geq \sigma_2 \geq \cdots$. The left singular vectors are the eigenvectors of $AA^T$, and the right singular vectors are the eigenvectors of $A^TA$. The singular values are $\sqrt{\lambda_i}$, where $\lambda_i$ are the eigenvalues of $A^TA$. Since $AA^T$ and $A^TA$ are symmetric, the $\mathbf{u}_i$ and the $\mathbf{v}_i$ can each be chosen orthonormal. The number of non-zero singular values is equal to the rank of the matrix. We can write the SVD in matrix form as

$$A = \begin{bmatrix}U_r & U_{m-r}\end{bmatrix}\operatorname{diag}(\sigma_1,\cdots,\sigma_r,0,\cdots,0)\begin{bmatrix}V^T_r\\V^T_{n-r}\end{bmatrix}$$

Writing the SVD tells us that

  1. The columns of $V_{n-r}$ form a basis for $\mathcal{N}(A)$

  2. The columns of $U_{r}$ form a basis for $\mathcal{R}(A)$

The Frobenius norm and spectral norm are tightly related to the SVD.

$$\|A\|_F = \sqrt{\sum_{i}\sigma_i^2} \qquad \|A\|_2^2 = \sigma_1^2$$
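A short numpy sketch (with an arbitrary example matrix) tying these facts together: the rank is the number of non-zero singular values, and both norms can be read off the $\sigma_i$.

```python
import numpy as np

A = np.array([[3.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])                      # rank 2

U, s, Vt = np.linalg.svd(A)
print(s)                                             # singular values, descending
print(np.sum(s > 1e-12))                             # rank of A

print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s ** 2))))  # ||A||_F
print(np.isclose(np.linalg.norm(A, 2), s[0]))                         # ||A||_2 = sigma_1
```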
