Hilbert Space Theory

Complex random variables form a Hilbert space with inner product $\langle X, Y \rangle = \mathbb{E}\left[XY^*\right]$. If we have random complex vectors, then we can use Hilbert space theory in a more efficient manner by looking at the matrix of inner products. For simplicity, we will call this matrix the “inner product” of two complex vectors.

Definition 1

Let the inner product between two random, complex vectors $\boldsymbol{Z_1}, \boldsymbol{Z_2}$ be defined as

$$\langle \boldsymbol{Z_1}, \boldsymbol{Z_2} \rangle = \mathbb{E}\left[\boldsymbol{Z_1}\boldsymbol{Z_2}^*\right]$$

The $ij$-th entry of this matrix is simply the scalar inner product $\mathbb{E}\left[X_iY_j^*\right]$ where $X_i$ and $Y_j$ are the $i$th and $j$th entries of $\boldsymbol{X}$ and $\boldsymbol{Y}$ respectively. This means the matrix is equivalent to the cross-correlation $R_{XY}$ between the two vectors. We can also specify the auto-correlation $R_X = \langle \boldsymbol{X}, \boldsymbol{X} \rangle$ and auto-covariance $\Sigma_X = \langle \boldsymbol{X} - \mathbb{E}\left[\boldsymbol{X}\right], \boldsymbol{X} - \mathbb{E}\left[\boldsymbol{X}\right] \rangle$. One reason why we can think of this matrix as an inner product is that it satisfies the defining properties of inner products. In particular, it is

  1. Linear: $\langle \alpha_1\boldsymbol{V_1}+\alpha_2\boldsymbol{V_2}, \boldsymbol{u} \rangle = \alpha_1\langle \boldsymbol{V_1}, \boldsymbol{u} \rangle + \alpha_2\langle \boldsymbol{V_2}, \boldsymbol{u} \rangle$.

  2. Conjugate symmetric: $\langle \boldsymbol{U}, \boldsymbol{V} \rangle = \langle \boldsymbol{V}, \boldsymbol{U} \rangle^*$.

  3. Non-degenerate: $\langle \boldsymbol{V}, \boldsymbol{V} \rangle = \boldsymbol{0} \Leftrightarrow \boldsymbol{V} = \boldsymbol{0}$.
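As an illustration (not from the original notes), the following Python sketch estimates these matrix inner products from samples of two jointly distributed complex vectors. The dimensions, the joint distribution, and the `inner` helper are all made up for demonstration; it also checks the conjugate-symmetry property numerically.

```python
# Minimal sketch (assumed setup, not from the notes): sample estimates of the
# matrix inner product <X, Y> = E[X Y^*] and the related correlation matrices.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n, m, k = 100_000, 3, 2, 5   # hypothetical dimensions

# Jointly distributed complex vectors X (length n) and Y (length m), built as
# linear mixtures of a common complex Gaussian source so they are correlated.
W = rng.standard_normal((n_samples, k)) + 1j * rng.standard_normal((n_samples, k))
X = W @ (rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))).T
Y = W @ (rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))).T

def inner(U, V):
    """Sample estimate of <U, V> = E[U V^*]; rows of U, V are i.i.d. samples."""
    return (U.T @ V.conj()) / U.shape[0]

R_XY    = inner(X, Y)                              # cross-correlation
R_X     = inner(X, X)                              # auto-correlation
Sigma_X = inner(X - X.mean(0), X - X.mean(0))      # auto-covariance

# Conjugate symmetry: <X, Y> = <Y, X>^*
assert np.allclose(R_XY, inner(Y, X).conj().T)
```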

Since we are thinking of the matrix as an inner product, we can also think of the norm as a matrix.

Definition 2

The norm of a complex random vector is given by $\|\boldsymbol{Z}\|^2 = \langle \boldsymbol{Z}, \boldsymbol{Z} \rangle$.

When thinking of inner products as matrices instead of scalars, we must restate the Hilbert Projection Theorem in terms of matrices.

Theorem 1 (Hilbert Projection Theorem)

The minimization problem $\min_{\hat{\boldsymbol{X}}(\boldsymbol{Y})}\|\hat{\boldsymbol{X}}(\boldsymbol{Y}) - \boldsymbol{X}\|^2$ has a unique solution which is a linear function of $\boldsymbol{Y}$. The error is orthogonal to the linear subspace of $\boldsymbol{Y}$ (i.e. $\langle \boldsymbol{X} - \hat{\boldsymbol{X}}, \boldsymbol{Y} \rangle = \boldsymbol{0}$).

When we do a minimization over a matrix, we are minimizing it in a positive semi-definite (PSD) sense, so for any other linear function $\boldsymbol{X}'$ of $\boldsymbol{Y}$,

$$\|\boldsymbol{X}-\hat{\boldsymbol{X}}\|^2 \preceq \|\boldsymbol{X} - \boldsymbol{X}'\|^2.$$
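As a quick numerical check (again not part of the notes), the sketch below uses the standard zero-mean linear estimator $\hat{\boldsymbol{X}} = R_{XY}R_Y^{-1}\boldsymbol{Y}$, a formula assumed here rather than derived in this section, and verifies that the resulting error is orthogonal to $\boldsymbol{Y}$.

```python
# Hedged sketch of the orthogonality condition: assumes zero-mean vectors and
# the standard linear estimate X_hat = R_XY R_Y^{-1} Y (not derived in the notes).
import numpy as np

rng = np.random.default_rng(1)
n_samples, n, m, k = 100_000, 3, 2, 5   # hypothetical dimensions

W = rng.standard_normal((n_samples, k)) + 1j * rng.standard_normal((n_samples, k))
X = W @ (rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))).T
Y = W @ (rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))).T

def inner(U, V):
    """Sample estimate of <U, V> = E[U V^*]."""
    return (U.T @ V.conj()) / U.shape[0]

K = inner(X, Y) @ np.linalg.inv(inner(Y, Y))   # estimator matrix R_XY R_Y^{-1}
X_hat = Y @ K.T                                # apply x_hat = K y to every sample

# Error orthogonal to Y: <X - X_hat, Y> is (numerically) the zero matrix
print(np.abs(inner(X - X_hat, Y)).max())
```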

Innovations

Suppose we have jointly distributed random variables $Y_0, Y_1,\cdots,Y_n$. Ideally, we would be able to “de-correlate” them so each new vector $E_i$ captures the new information which is orthogonal to the previous random vectors in the sequence. Since vectors of a Hilbert space operate like vectors in $\mathbb{R}^n$, we can simply apply Gram-Schmidt to the $\{Y_i\}_{i=0}^n$.

Definition 3

Given jointly distributed random vectors $\{Y_i\}_{i=0}^n$ with $\mathcal{L}_i = \text{span}\{Y_j\}_{j=0}^i$, the $i$th innovation $E_i$ is given by

$$E_i = Y_i - \text{proj}(Y_i|\mathcal{L}_{i-1}) = Y_i - \sum_{j=0}^{i-1}\frac{\langle Y_i, E_j \rangle}{\|E_j\|^2}E_j$$

Innovations have two key properties.

  1. $\forall i\neq j,\ \langle E_i, E_j \rangle = 0$

  2. $\forall i,\ \text{span}\{Y_j\}_{j=0}^i = \text{span}\{E_j\}_{j=0}^i$
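To make the recursion concrete, here is a small Python sketch (assumed setup, not from the notes) that computes innovations of correlated scalar random variables from samples and checks the orthogonality property numerically.

```python
# Minimal Gram-Schmidt innovations sketch for scalar random variables
# (made-up joint distribution; sample averages stand in for expectations).
import numpy as np

rng = np.random.default_rng(2)
n_samples, n = 200_000, 4

# Columns of Y are correlated zero-mean complex random variables Y_0, ..., Y_{n-1}
M = rng.standard_normal((n, n))
Y = (rng.standard_normal((n_samples, n)) + 1j * rng.standard_normal((n_samples, n))) @ M.T

def inner(u, v):
    """Sample estimate of <u, v> = E[u v^*] for scalar random variables."""
    return np.mean(u * np.conj(v))

# E_i = Y_i - sum_{j < i} <Y_i, E_j> / ||E_j||^2 * E_j
E = np.zeros_like(Y)
for i in range(n):
    E[:, i] = Y[:, i]
    for j in range(i):
        E[:, i] -= inner(Y[:, i], E[:, j]) / inner(E[:, j], E[:, j]) * E[:, j]

# Property 1: the innovations are mutually orthogonal
R_E = np.array([[inner(E[:, i], E[:, j]) for j in range(n)] for i in range(n)])
print(np.round(R_E, 4))   # off-diagonal entries are ~0
```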

We can also write the innovations in matrix form as $\boldsymbol{\varepsilon} = A\boldsymbol{Y}$ where $\boldsymbol{\varepsilon} = \begin{bmatrix}E_0 & E_1 & \cdots & E_n\end{bmatrix}^T$ and $\boldsymbol{Y} = \begin{bmatrix}Y_0 & Y_1 & \cdots & Y_n\end{bmatrix}^T$. Since each $E_i$ only depends on $Y_0,\ldots,Y_i$, $A$ must be lower triangular, and because we need the $E_i$ to be mutually orthogonal, $R_{\varepsilon}$ should be diagonal. Since $R_{\varepsilon} = AR_YA^*$, if $R_Y \succ 0$, then we can use its unique LDL decomposition $R_Y = LDL^*$ and let $A = L^{-1}$, which gives $R_{\varepsilon} = D$.
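The following sketch illustrates this construction under stated assumptions: $R_Y$ is a hand-built positive definite matrix, and the unit-lower-triangular factor $L$ is obtained by rescaling a Cholesky factor rather than by any particular library LDL routine.

```python
# Hedged sketch of the matrix form: A = L^{-1} from R_Y = L D L^*, so that
# R_eps = A R_Y A^* = D is diagonal. The LDL factor is built by rescaling a
# Cholesky factor; R_Y itself is a made-up positive definite matrix.
import numpy as np

rng = np.random.default_rng(3)
n = 4

M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R_Y = M @ M.conj().T + n * np.eye(n)     # Hermitian, positive definite

# Write the Cholesky factor C as L D^{1/2} with L unit lower triangular
C = np.linalg.cholesky(R_Y)
d = np.real(np.diag(C))
L = C / d                                # divide column j by C[j, j]
D = np.diag(d ** 2)

A = np.linalg.inv(L)                     # epsilon = A Y; lower triangular since L is
R_eps = A @ R_Y @ A.conj().T

assert np.allclose(R_eps, D)             # diagonal, as required of R_eps
print(np.round(np.diag(R_eps), 4))
```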
