Hilbert Space Theory
Complex random variables form a Hilbert space with inner product $\langle X, Y \rangle = \mathbb{E}\left[XY^*\right]$. If we have a random complex vector, then we can use Hilbert space theory more efficiently by looking at the matrix of inner products. For simplicity, we will call this matrix the “inner product” of two complex vectors.

Definition 1

Let the inner product between two random, complex vectors $\boldsymbol{Z}_1, \boldsymbol{Z}_2$ be defined as

$$\langle \boldsymbol{Z}_1, \boldsymbol{Z}_2 \rangle = \mathbb{E}\left[\boldsymbol{Z}_1\boldsymbol{Z}_2^*\right].$$

The $ij$-th entry of the matrix $\langle \boldsymbol{X}, \boldsymbol{Y} \rangle$ is simply the scalar inner product $\mathbb{E}\left[X_iY_j^*\right]$, where $X_i$ and $Y_j$ are the $i$-th and $j$-th entries of $\boldsymbol{X}$ and $\boldsymbol{Y}$ respectively. This means the matrix is equivalent to the cross-correlation $R_{XY}$ between the two vectors. We can also specify the auto-correlation $R_X = \langle \boldsymbol{X}, \boldsymbol{X} \rangle$ and auto-covariance $\Sigma_X = \langle \boldsymbol{X} - \mathbb{E}\left[\boldsymbol{X}\right], \boldsymbol{X} - \mathbb{E}\left[\boldsymbol{X}\right] \rangle$. One reason we can think of this matrix as an inner product is that it satisfies the defining properties of inner products. In particular, it is
  1. Linear: $\langle \alpha_1\boldsymbol{V}_1+\alpha_2\boldsymbol{V}_2, \boldsymbol{u} \rangle = \alpha_1\langle \boldsymbol{V}_1, \boldsymbol{u} \rangle + \alpha_2\langle \boldsymbol{V}_2, \boldsymbol{u} \rangle$.
  2. Reflexive: $\langle \boldsymbol{U}, \boldsymbol{V} \rangle = \langle \boldsymbol{V}, \boldsymbol{U} \rangle^*$.
  3. Non-degenerate: $\langle \boldsymbol{V}, \boldsymbol{V} \rangle = \boldsymbol{0} \Leftrightarrow \boldsymbol{V} = \boldsymbol{0}$.
Since we are thinking of the matrix as an inner product, we can also think of the norm as a matrix.

Definition 2

The norm of a complex random vector $\boldsymbol{Z}$ is given by $\|\boldsymbol{Z}\|^2 = \langle \boldsymbol{Z}, \boldsymbol{Z} \rangle$.
When inner products are matrices instead of scalars, we must restate the Hilbert Projection Theorem in matrix form.

Theorem 1 (Hilbert Projection Theorem)

The minimization problem

$$\min_{\hat{\boldsymbol{X}}(\boldsymbol{Y})}\|\hat{\boldsymbol{X}}(\boldsymbol{Y}) - \boldsymbol{X}\|^2$$

has a unique solution which is a linear function of $\boldsymbol{Y}$, and the error is orthogonal to the linear subspace spanned by $\boldsymbol{Y}$ (i.e., $\langle \boldsymbol{X} - \hat{\boldsymbol{X}}, \boldsymbol{Y} \rangle = \boldsymbol{0}$).

When we minimize over a matrix, we are minimizing it in the positive semi-definite sense, so for any other linear function $\boldsymbol{X}'$,

$$\|\boldsymbol{X}-\hat{\boldsymbol{X}}\|^2 \preceq \|\boldsymbol{X} - \boldsymbol{X}'\|^2.$$
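A minimal numerical sketch of the theorem (NumPy; assuming zero-mean, real-valued vectors, using the standard linear least-squares weight matrix $W = R_{XY}R_Y^{-1}$, with all names illustrative) forms the projection and checks the orthogonality condition:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic zero-mean data: X depends linearly on Y plus noise.
n = 200_000
Y = rng.standard_normal((n, 3))
H = np.array([[1.0, 0.5, 0.0],
              [0.2, -1.0, 0.3]])
X = Y @ H.T + 0.1 * rng.standard_normal((n, 2))

# Sample correlation matrices (zero-mean case): R_XY = E[X Y^*], R_Y = E[Y Y^*].
R_XY = X.T @ Y / n
R_Y = Y.T @ Y / n

# Linear estimate of X from Y: X_hat = W Y with W = R_XY R_Y^{-1}.
W = R_XY @ np.linalg.inv(R_Y)
X_hat = Y @ W.T

# Orthogonality of the error: <X - X_hat, Y> = E[(X - X_hat) Y^*] ≈ 0.
err_corr = (X - X_hat).T @ Y / n
assert np.allclose(err_corr, 0, atol=1e-6)
```

With sample correlation matrices, the orthogonality holds up to floating-point error by construction, mirroring the exact statement of the theorem.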

Innovations

Suppose we have jointly distributed random variables $Y_0, Y_1, \cdots, Y_n$. Ideally, we would be able to “de-correlate” them so each new vector $E_i$ captures the new information which is orthogonal to the previous random vectors in the sequence. Since vectors in a Hilbert space operate like vectors in $\mathbb{R}^n$, we can simply apply Gram-Schmidt to the $\{Y_i\}_{i=0}^n$.

Definition 3

Given jointly distributed random vectors $\{Y_i\}_{i=0}^n$ with $\mathcal{L}_i = \text{span}\{Y_j\}_{j=0}^i$, the $i$-th innovation $E_i$ is given by

$$E_i = Y_i - \text{proj}(Y_i|\mathcal{L}_{i-1}) = Y_i - \sum_{j=0}^{i-1}\frac{\langle Y_i, E_j \rangle}{\|E_j\|^2}E_j.$$

Innovations have two key properties.
  1. $\forall i\neq j,\ \langle E_i, E_j \rangle = 0$.
  2. $\forall i,\ \text{span}\{Y_j\}_{j=0}^i = \text{span}\{E_j\}_{j=0}^i$.
We can also write the innovations in terms of a matrix, $\boldsymbol{\varepsilon} = A\boldsymbol{Y}$, where $\boldsymbol{\varepsilon} = \begin{bmatrix}E_0 & E_1 & \cdots & E_n\end{bmatrix}^T$ and $\boldsymbol{Y} = \begin{bmatrix}Y_0 & Y_1 & \cdots & Y_n\end{bmatrix}^T$. Since each $E_i$ only depends on $Y_0, \ldots, Y_i$, $A$ must be lower triangular, and because the $E_i$ must be mutually orthogonal, $R_{\varepsilon}$ should be diagonal. Since $R_{\varepsilon} = AR_YA^*$, if $R_Y \succ 0$, then we can use its unique LDL decomposition $R_Y = LDL^*$ and let $A = L^{-1}$.