# Hilbert Space Theory

Complex random variables form a Hilbert space with inner product $$\langle X, Y \rangle = \mathbb{E}\left\[XY^\*\right]$$. If we have random complex vectors, then we can apply Hilbert space theory more efficiently by working with the matrix of pairwise inner products between their entries. For simplicity, we will call this matrix the “inner product” of two complex vectors.

{% hint style="info" %}

#### Definition 1

Let the inner product between two random, complex vectors $$\boldsymbol{Z\_1}, \boldsymbol{Z\_2}$$ be defined as

$$\langle \boldsymbol{Z\_1}, \boldsymbol{Z\_2} \rangle = \mathbb{E}\left\[\boldsymbol{Z\_1}\boldsymbol{Z\_2}^\*\right]$$
{% endhint %}

The ij-th entry of the matrix $$\langle \boldsymbol{X}, \boldsymbol{Y} \rangle$$ is simply the scalar inner product $$\mathbb{E}\left\[X\_iY\_j^\*\right]$$ where $$X\_i$$ and $$Y\_j$$ are the i-th and j-th entries of $$\boldsymbol{X}$$ and $$\boldsymbol{Y}$$ respectively. This means the matrix is equivalent to the cross-correlation $$R\_{XY}$$ between the two vectors. We can also specify the auto-correlation $$R\_X = \langle \boldsymbol{X}, \boldsymbol{X} \rangle$$ and the auto-covariance $$\Sigma\_X = \langle \boldsymbol{X} - \mathbb{E}\left\[\boldsymbol{X}\right] , \boldsymbol{X} - \mathbb{E}\left\[\boldsymbol{X}\right] \rangle$$. We can think of this matrix as an inner product because it satisfies the defining properties of an inner product (illustrated numerically after the list). In particular, it is

1. Linear: $$\langle \alpha\_1\boldsymbol{V\_1}+\alpha\_2\boldsymbol{V\_2}, \boldsymbol{U} \rangle = \alpha\_1\langle \boldsymbol{V\_1}, \boldsymbol{U} \rangle + \alpha\_2\langle \boldsymbol{V\_2}, \boldsymbol{U} \rangle$$.
2. Conjugate symmetric: $$\langle \boldsymbol{U}, \boldsymbol{V} \rangle = \langle \boldsymbol{V}, \boldsymbol{U} \rangle ^\*$$.
3. Non-degeneracy: $$\langle \boldsymbol{V}, \boldsymbol{V} \rangle = \boldsymbol{0} \Leftrightarrow \boldsymbol{V} = \boldsymbol{0}$$.
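
As a quick sanity check of these properties, here is a minimal numerical sketch (not part of the original notes; the variable names and the sample-average estimates of expectations are illustrative assumptions). It builds the matrix inner product from samples and verifies conjugate symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, d = 100_000, 3

# Two correlated complex random vectors X, Y (illustrative construction).
W = (rng.standard_normal((n_samples, 2 * d))
     + 1j * rng.standard_normal((n_samples, 2 * d))) / np.sqrt(2)
X = W[:, :d]
Y = X + 0.5 * W[:, d:]

def inner(A, B):
    """Sample estimate of <A, B> = E[A B^*]; rows of A and B are samples."""
    return A.T @ B.conj() / len(A)

R_XY = inner(X, Y)                              # cross-correlation matrix
R_X = inner(X, X)                               # auto-correlation matrix
Sigma_X = inner(X - X.mean(0), X - X.mean(0))   # auto-covariance matrix

# Conjugate symmetry: <X, Y> = <Y, X>^* (here ^* is the conjugate transpose).
print(np.allclose(R_XY, inner(Y, X).conj().T))
```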

Since we are thinking of the matrix as an inner product, we can also think of the norm as a matrix.

{% hint style="info" %}

#### Definition 2

The norm of a complex random vector is given by $$|\boldsymbol{Z}|^2 = \langle \boldsymbol{Z}, \boldsymbol{Z} \rangle$$.
{% endhint %}

When thinking of inner products as matrices instead of scalars, we must restate the Hilbert Projection Theorem in terms of matrices.

{% hint style="info" %}

#### Theorem 1 (Hilbert Projection Theorem) <a href="#theorem-1" id="theorem-1"></a>

The minimization problem $$\min\_{\hat{\boldsymbol{X}}(\boldsymbol{Y})}|\hat{\boldsymbol{X}}(\boldsymbol{Y}) - \boldsymbol{X}|^2$$ has a unique solution which is a linear function of $$\boldsymbol{Y}$$. The error is orthogonal to the linear subspace spanned by $$\boldsymbol{Y}$$ (i.e., $$\langle \boldsymbol{X} - \hat{\boldsymbol{X}}, \boldsymbol{Y} \rangle = \boldsymbol{0}$$).
{% endhint %}

When we minimize over matrices, the minimization is in the positive semi-definite (PSD) sense: for any other linear function $$\boldsymbol{X}'(\boldsymbol{Y})$$,

$$|\boldsymbol{X}-\hat{\boldsymbol{X}}|^2 \preceq |\boldsymbol{X} - \boldsymbol{X}'|^2.$$
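
To make the projection concrete, the sketch below is a minimal numerical illustration (not from the notes). It assumes zero-mean vectors and uses the standard linear least-squares estimate $$\hat{\boldsymbol{X}} = R\_{XY}R\_Y^{-1}\boldsymbol{Y}$$, a formula not derived on this page, then checks the orthogonality condition from Theorem 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, dx, dy = 200_000, 2, 3

# Jointly distributed zero-mean complex vectors (illustrative model: Y is a
# noisy linear observation of X).
W = (rng.standard_normal((n_samples, dx + dy))
     + 1j * rng.standard_normal((n_samples, dx + dy))) / np.sqrt(2)
H = rng.standard_normal((dy, dx))
X = W[:, :dx]
Y = X @ H.T + 0.5 * W[:, dx:]

def inner(A, B):
    """Sample estimate of <A, B> = E[A B^*]; rows of A and B are samples."""
    return A.T @ B.conj() / len(A)

# Linear estimate X_hat = R_XY R_Y^{-1} Y, applied sample by sample (rows of Y).
R_XY, R_Y = inner(X, Y), inner(Y, Y)
X_hat = Y @ np.linalg.solve(R_Y.T, R_XY.T)

# Orthogonality of the error to the subspace spanned by Y: <X - X_hat, Y> = 0.
print(np.abs(inner(X - X_hat, Y)).max())  # ~0 up to floating point error
```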

## Innovations

Suppose we have jointly distributed random variables $$Y\_0, Y\_1,\cdots,Y\_n$$. Ideally, we would be able to “de-correlate” them so each new variable $$E\_i$$ captures only the new information, which is orthogonal to the previous random variables in the sequence. Since vectors of a Hilbert space behave like vectors in $$\mathbb{R}^n$$, we can simply do Gram-Schmidt on $$\{Y\_i\}\_{i=0}^n$$.

{% hint style="info" %}

#### Definition 3

Given jointly distributed random vectors $$\{Y\_i\}\_{i=0}^n$$ with $$\mathcal{L}\_i = \text{span}\{Y\_j\}\_{j=0}^i$$, the i-th innovation $$E\_i$$ is given by

$$E\_i = Y\_i - \text{proj}(Y\_i|\mathcal{L}\_{i-1}) = Y\_i - \sum\_{j=0}^{i-1}\frac{\langle Y\_i, E\_j \rangle }{|E\_j|^2}E\_j$$
{% endhint %}

Innovations have two key properties.

1. $$\forall i\neq j,\ \langle E\_i, E\_j \rangle =0$$
2. $$\forall i,\ \text{span}\{Y\_j\}\_{j=0}^i = \text{span}\{E\_j\}\_{j=0}^i$$
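
Here is a minimal sketch (illustrative, with sample averages standing in for expectations) of the Gram-Schmidt recursion from Definition 3 applied to scalar random variables, which also confirms the first property above.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n = 100_000, 4

# Correlated scalar random variables Y_0, ..., Y_n as columns (illustrative model).
Y = rng.standard_normal((n_samples, n + 1)).cumsum(axis=1)

def inner(a, b):
    """Sample estimate of <a, b> = E[a b^*] for scalar random variables."""
    return np.vdot(b, a) / len(a)

# Gram-Schmidt recursion: E_i = Y_i - sum_{j<i} <Y_i, E_j> / |E_j|^2 * E_j
E = []
for i in range(n + 1):
    e = Y[:, i].copy()
    for E_j in E:
        e = e - inner(Y[:, i], E_j) / inner(E_j, E_j) * E_j
    E.append(e)

# Property 1: the innovations are mutually orthogonal.
G = np.array([[inner(E_i, E_j) for E_j in E] for E_i in E])
print(np.allclose(G, np.diag(np.diag(G))))
```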

We can also write the innovations in matrix form as $$\boldsymbol{\varepsilon} = A\boldsymbol{Y}$$ where $$\boldsymbol{\varepsilon} = \begin{bmatrix}E\_0 & E\_1 & \cdots & E\_n\end{bmatrix}^T$$ and $$\boldsymbol{Y} = \begin{bmatrix}Y\_0 & Y\_1 & \cdots & Y\_n\end{bmatrix}^T$$. Since each $$E\_i$$ depends only on $$Y\_0,\cdots,Y\_i$$ (with a coefficient of 1 on $$Y\_i$$), $$A$$ must be lower triangular with a unit diagonal, and because the $$E\_i$$ must be mutually orthogonal, $$R\_{\varepsilon}$$ should be diagonal. Since $$R\_{\varepsilon} = AR\_YA^*$$, if $$R\_Y \succ 0$$, then we can take its unique LDL decomposition $$R\_Y = LDL^*$$ and let $$A = L^{-1}$$.
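
As a check of this construction, the sketch below (illustrative; the example $$R\_Y$$ is made up) forms the unique LDL decomposition of a positive definite $$R\_Y$$ from its Cholesky factor, sets $$A = L^{-1}$$, and verifies that $$R\_{\varepsilon} = AR\_YA^*$$ is diagonal.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# An example positive definite (auto-)correlation matrix R_Y.
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R_Y = B @ B.conj().T + n * np.eye(n)

# Unique LDL^* factorization, obtained here from the Cholesky factor C:
# R_Y = C C^* with C lower triangular, so L = C diag(C)^{-1} is unit lower
# triangular and D = diag(C)^2.
C = np.linalg.cholesky(R_Y)
d = np.real(np.diag(C))
L = C / d                          # divide column j by C[j, j]
D = np.diag(d**2)

A = np.linalg.inv(L)               # innovations transform: eps = A Y
R_eps = A @ R_Y @ A.conj().T

print(np.allclose(R_eps, D))       # R_eps is diagonal and equals D
print(np.allclose(A, np.tril(A)))  # A is lower triangular (unit diagonal)
```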

