# Linear Algebra

{% hint style="info" %}

#### Definition 1

An affine set is one of the form $$\mathcal{A}={ \mathbf{x}\in\mathcal{X}:\ \mathbf{x}=\mathbf{v}+\mathbf{x\_0},\ \mathbf{v}\in\mathcal{V}}$$ where $$\mathcal{V}$$ is a subspace of a vector space $$\mathcal{X}$$ and $$x\_0$$is a given point.
{% endhint %}

Notice that by Definition 1, a subspace is simply an affine set containing the origin. Also notice that the dimension of an affine set $$\mathcal{A}$$ is the same as the dimension of $$\mathcal{V}$$.

## Norms

{% hint style="info" %}

#### Definition 2

A norm on the vector space $$\mathcal{X}$$ is a function $$|\cdot|:\mathcal{X}\rightarrow\mathbb{R}$$ which satisfies:

1. $$|\mathbf{x}|\geq 0$$ with equality if and only if $$\mathbf{x}=\boldsymbol{0}$$
2. $$|\mathbf{x}+\mathbf{y}|\leq|\mathbf{x}|+|\mathbf{y}|$$
3. $$|\alpha \mathbf{x}| = |\alpha||\mathbf{x}|$$ for any scalar $$\alpha$$.
   {% endhint %}

{% hint style="info" %}

#### Definition 3

The $$l\_p$$ norms are defined by

$$|\mathbf{x}|*p=\left( \sum*{k=1}^n|x\_k|^p \right)^{\frac{1}{p}},\ 1\leq p\leq \infty$$
{% endhint %}

In the limit as $$p\to\infty$$,

$$|\mathbf{x}|\_{\infty} = \max\_k|x\_k|.$$

Similar to vectors, matrices can also have norms.

{% hint style="info" %}

#### Definition 4

A function $$f: \mathbb{R}^{m\times n} \to \mathbb{R}$$ is a matrix norm if

$$f(A) \geq 0 \quad f(A) = 0 \Leftrightarrow A = 0 \quad f(\alpha A) = |\alpha| f(A) \quad f(A+B) \leq f(A) + f(B)$$
{% endhint %}

{% hint style="info" %}

#### Definition 5

The Froebenius norm is the $$l\_2$$ norm applied to all elements of the matrix.

$$|A|*F = \sqrt{\text{trace} AA^T} = \sqrt{\sum*{i=1}^m \sum\_{j=1}^n |a\_{ij}|^2}$$
{% endhint %}

One useful way to characterize matrices is by measuring their “gain” relative to some $$l\_p$$ norm.

{% hint style="info" %}

#### Definition 6

The operator norms is defined as

$$|A|*p = \max*{\mathbf{u}\ne0} \frac{|A\mathbf{u}|\_p}{|u|\_p}$$
{% endhint %}

When $$p=2$$, the norm is called the spectral norm because it relates to the largest eigenvalue of $$A^TA$$.

$$|A|*2 = \sqrt{\lambda*{max}(A^TA)}$$

## Inner Products

{% hint style="info" %}

#### Definition 7

An inner product on real vector space is a function that maps $$\mathbf{x},\mathbf{y} \in \mathcal{X}$$ to a non-negative scalar, is distributive, is commutative, and $$\langle \mathbf{x}, \mathbf{x}, \rangle = 0 \Leftrightarrow \mathbf{x}=0$$.
{% endhint %}

Inner products induce a norm $$|\mathbf{x}| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$$. In $$\mathbb{R}^n$$, the standard inner product is $$\mathbf{x}^T\mathbf{y}$$. The angle bewteen two vectors is given by

$$\cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{|\mathbf{x}|\_2|\mathbf{y}|\_2}.$$

In general, we can bound the absolute value of the standard inner product between two vectors.

{% hint style="info" %}

#### Theorem 1 (Holder Inequality) <a href="#theorem-1" id="theorem-1"></a>

$$|\mathbf{x}^T\mathbf{y}| \leq \sum\_{k=1}^n |x\_ky\_k| \leq |\mathbf{x}|\_p|\mathbf{y}|\_q,\ p, q\geq 1 \text{ s.t } p^{-1}+q^{-1}=1.$$
{% endhint %}

Notice that for $$p=q = 2$$, Theorem 1 turns into the Cauchy-Schwartz Inequality ($$|\mathbf{x}^T\mathbf{y}| \leq |\mathbf{x}|\_2|\mathbf{y}|\_2$$).

## Functions

We consider functions to be of the form $$f:\mathbb{R}^n\rightarrow\mathbb{R}$$. By contrast, a map is of the form $$f:\mathbb{R}^n\rightarrow\mathbb{R}^m$$. The components of the map $$f$$ are the scalar valued functions $$f\_i$$ that produce each component of a map.

{% hint style="info" %}

#### Definition 8

The graph of a function $$f$$ is the set of input-output pairs that $$f$$ can attain.

$$\left{ (x, f(x))\in \mathbb{R}^{n+1}:\ x\in\mathbb{R}^n \right}$$
{% endhint %}

{% hint style="info" %}

#### Definition 9

The epigraph of a function is the set of input-output pairs that $$f$$ can achieve and anything above.

$$\left{ (x,t) \in \mathbb{R}^{n+1}:\ \mathbf{x}\in\mathbb{R}^{n+1},\ t\geq f(x) \right}$$
{% endhint %}

{% hint style="info" %}

#### Definition 10

The t-level set is the set of points that achieve exactly some value of $$f$$.

$${ \mathbf{x}\in\mathbb{R}^n:\ f(x)=t }$$
{% endhint %}

{% hint style="info" %}

#### Definition 11

The t-sublevel set of $$f$$ is the set of points achieving at most a value $$t$$.

$${ x\in\mathbb{R}^n:\ f(x)\leq t }$$
{% endhint %}

{% hint style="info" %}

#### Definition 12

The half-spaces are the regions of space which a hyper-plane separates.

$$H\_{\_} = { x: \mathbf{a}^T\mathbf{x}\leq b } \qquad H\_{+} = { x: \mathbf{a}^T\mathbf{x} > b }$$
{% endhint %}

{% hint style="info" %}

#### Definition 13

A polyhedron is the intersection of $$m$$ half-spaces given by $$\mathbf{a}\_i^T\mathbf{x}\leq b\_i$$ for $$i\in\[1,m]$$.
{% endhint %}

When a polyhedron is bounded, it is called a polytope.

### Types of Functions

{% hint style="info" %}

#### Theorem 2

A function is linear if and only if it can be expressed as $$f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}+b$$ for some unique pair $$(\mathbf{a}, b)$$.
{% endhint %}

An affine function is linear when $$b=0$$. A hyperplane is simply a level set of a linear function.

{% hint style="info" %}

#### Theorem 3

Any quadratic function can be written as the sum of a quadratic term involving a symmetric matrix and an affine term:

$$q(x) = \frac{1}{2}\mathbf{x}^TH\mathbf{x}+\mathbf{c}^T\mathbf{x} + d.$$
{% endhint %}

Another special class of functions are polyhedral functions.

{% hint style="info" %}

#### Definition 14

A function $$f:\mathbb{R}^n\to\mathbb{R}$$ is polyhedral if its epigraph is a polyhedron.

$$\text{epi } f = \left{(x,t) \in \mathbb{R}^{n+1} :\ C \begin{bmatrix}\mathbf{x} \ t \end{bmatrix} \leq d \right}$$
{% endhint %}

### Vector Calculus

We can also do calculus with vector functions.

{% hint style="info" %}

#### Definition 15

The gradient of a function at a point $$x$$ where $$f$$ is differentiable is a column vector of first derivatives of $$f$$ with respsect to the components of $$\mathbf{x}$$

$$abla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x\_1}\ \vdots\ \frac{\partial f}{\partial x\_n} \end{bmatrix}$$
{% endhint %}

The gradient is perpendicular to the level sets of $$f$$ and points from a point $$\mathbf{x}\_0$$ to higher values of the function. In other words, it is the direction of steepest increase. It is akin to the derivative of a 1D function.

{% hint style="info" %}

#### Definition 16

The Hessian of a function $$f$$ at point $$x$$ is a matrix of second derivatives.

$$H\_{ij} = \frac{\partial^2 f}{\partial x\_i \partial x\_j}$$
{% endhint %}

The Hessian is akin to the second derivative in a 1D function. Note that the Hessian is a symmetric matrix.

## Matrices

Matrices define a linear map between an input space and an output space. Any linear map $$f: \mathbb{R}^n \to \mathbb{R}^m$$ can be represented by a matrix.

{% hint style="info" %}

#### Theorem 4 (Fundamental Theorem of Linear Algebra) <a href="#theorem-4" id="theorem-4"></a>

For any matrix $$A\in\mathbb{R}^{m\times n}$$,

$$\mathcal{N}(A) \oplus \mathcal{R}(A^T) = \mathbb{R}^n \qquad \mathcal{R}(A) \oplus \mathcal{N}(A^T) = \mathbb{R}^m.$$
{% endhint %}

### Symmetric Matrices

Recall that a symmetric matrix is one where $$A = A^T$$.

{% hint style="info" %}

#### Theorem 5 (Spectral Theorem) <a href="#theorem-5" id="theorem-5"></a>

Any symmetric matrix is orthogonally similar to a real diagonal matrix.

$$A = A^T \implies A = U \Lambda U^T = \sum\_i \lambda\_i \mathbf{u}\_i\mathbf{u}\_i^T,\quad |\mathbf{u}| = 1, \quad \mathbf{u}\_i^T\mathbf{u}\_j = 0 \ (i \ne j)$$
{% endhint %}

Let $$\lambda\_{min}(A)$$ be the smallest eigenvalue of symmetric matrix $$A$$ and $$\lambda\_{max}(A)$$ be the largest eigenvalue.

{% hint style="info" %}

#### Definition 17

The Rayleigh Quotient for $$\mathbf{x} \ne \boldsymbol{0}$$ is $$\frac{\mathbf{x}^TA\mathbf{x}}{|\mathbf{x}|^2}.$$
{% endhint %}

{% hint style="info" %}

#### Theorem 6

For any $$\mathbf{x} \ne \boldsymbol{0}$$,

$$\lambda\_{min}(A) \leq \frac{\mathbf{x}^TA\mathbf{x}}{|\mathbf{x}|^2} \leq \lambda\_{max}(A).$$
{% endhint %}

Two special types of symmetric matrices are those with non-negative eigenvalues.

{% hint style="info" %}

#### Definition 18

A symmetric matrix is positive semi-definite if $$\mathbf{x}^TA\mathbf{x} \geq 0 \implies \lambda\_{min}(A) \geq 0$$.
{% endhint %}

{% hint style="info" %}

#### Definition 19

A symmetric matrix is poitive definite if $$\mathbf{x}^TA\mathbf{x} > 0 \implies \lambda\_{min}(A) > 0$$.
{% endhint %}

These matrices are important because they often have very clear geometric structures. For example, an ellipsoid in multi-dimensional space can be defined as the set of points

$$\mathcal{E} = { x\in\mathbb{R}^m : \ \mathbf{x}^T P^{-1} \mathbf{x} \leq 1 }$$

where $$P$$ is a positive definite matrix. The eigenvectors of $$P$$ give the principle axes of this ellipse, and $$\sqrt{\lambda}$$ are the semi-axis lengths.

### QR Factorization

Similar to how spectral theorem allows us to decompose symmetric matrices, QR factorization is another matrix decomposition technique that works for any general matrix.

{% hint style="info" %}

#### Definition 20

The QR factorization matrix are the orthogonal matrix Q and the upper triangular matrix R such that $$A = QR$$
{% endhint %}

An easy way to find the QR factorization of a matrix is to apply Graham Schmidt to the columns of the matrix and express the result in matrix form. Suppose that our matrix $$A$$ is full rank (i.e its columns $$\mathbf{a}*i$$ are linearly independent) and we have applied Graham-Schmidt to columns $$\mathbf{a}*{i+1}\cdots\mathbf{a}*n$$ to get orthogonal vectors $$\mathbf{q}*{i+1}\cdots\mathbf{q}\_{n}$$. Continuing the procedure, the ith orthogonal vector $$\mathbf{q}\_i$$ is

$$\mathbf{\tilde{q}}\_i = \mathbf{a}*i - \sum*{k=i+1}^{n} (\mathbf{q}\_k^T \mathbf{a}\_k)\mathbf{q}\_k \qquad \mathbf{q}\_i = \frac{\mathbf{\tilde{q}}\_i}{|\mathbf{\tilde{q}}\_i|\_2}.$$

If we re-arrange this, to solve for $$\mathbf{a}\_i$$, we see that

$$\mathbf{a}\_i = |\mathbf{\tilde{q}}\_i|\_2 \mathbf{q}*i + \sum*{k=i+1}^{n} (\mathbf{q}\_k^T \mathbf{a}\_k)\mathbf{q}\_k.$$

Putting this in matrix form, we can see that

$$\begin{bmatrix} | & | & & | \ \mathbf{a}*1 & \mathbf{a}*2 & \cdots & \mathbf{a}*{n}\ | & | & & | \ \end{bmatrix} = \begin{bmatrix} | & | & & | \ \mathbf{q}*1 & \mathbf{q}*2 & \cdots & \mathbf{q}*{n}\ | & | & & | \ \end{bmatrix} \begin{bmatrix} r*{11} & r*{12} & \cdots & r\_{1n}\ 0 & r\_{22} & \cdots & r\_{2n}\ \vdots & \ddots & \ddots & \vdots\ 0 & \cdots & 0 & r\_{nn} \end{bmatrix} \qquad r\_{ij} = \mathbf{a}*i^T\mathbf{q\_j}, r*{ii} = |\mathbf{\tilde{q}}\_i|\_2.$$

### Singular Value Decomposition

{% hint style="info" %}

#### Definition 21

A matrix $$A\in\mathbb{R}^{m\times n}$$ is a dyad if it can be written as $$\mathbf{p}\mathbf{q}^T$$.
{% endhint %}

A dyad is a rank-one matrix. It turns out that all matrices can be decomposed into a sum of dyads.

{% hint style="info" %}

#### Definition 22

The Singular Value Decomposition of a matrix $$A$$ is

$$A = \sum\_{i=1}^{r} \sigma\_i \mathbf{u}\_i\mathbf{v}\_i^T$$

where $$\sigma\_i$$ are the singular values of $$A$$ and $$\mathbf{u}\_i$$ and $$\mathbf{v}\_i$$are the left and right singular vectors.
{% endhint %}

Th singular values are ordered such that $$\sigma\_1 >= \sigma\_2 >= \cdots$$. The left singular values are the eigenvectors of $$AA^T$$ and the right singular values are the eigenvectors of $$A^TA$$. The singular values are $$\sqrt{\lambda}\_i$$ where $$\lambda\_i$$ are the eigenvalues of $$A^TA$$. Since $$AA^T$$ and $$A^TA$$ are symmetric, $$\mathbf{u}\_i$$ and $$\mathbf{v}\_i$$ are orthogonal. The number of non-zero singular values is equal to the rank of the matrix. We can write the SVD in matrix form as

$$A = \left\[U\_r\quad U\_{n-r}\right]\text{diag}(\sigma\_1,\cdots,\sigma\_r,0,\cdots,0)\begin{bmatrix}V^T\_r\V^T\_{n-r}\end{bmatrix}$$

Writing the SVD tells us that

1. $$V\_{n-r}$$ forms a basis for $$\mathcal{N}(A)$$
2. $$U\_{r}$$ form a basis for $$\mathcal{R}(A)$$

The Frobenius norm and spectral norm are tightly related to the SVD.

$$|A|*F = \sum*{i}\sigma\_i^2$$

$$|A|\_2^2 = \sigma\_1^2$$


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://aparande.gitbook.io/berkeley-notes/eecs127-0/eecs127-1.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
