Complex random variables form a Hilbert space with the inner product $\langle X, Y \rangle = E[XY^*]$. If we have random complex vectors, we can apply Hilbert space theory more efficiently by looking at the matrix of inner products between their entries. For simplicity, we will call this matrix the “inner product” of two complex random vectors.
Definition 1
Let the inner product between two random complex vectors $Z_1, Z_2$ be defined as
$$\langle Z_1, Z_2 \rangle = E[Z_1 Z_2^*],$$
where ${}^*$ denotes the conjugate transpose.
The $ij$-th entry of this matrix is simply the scalar inner product $E[X_i Y_j^*]$, where $X_i$ and $Y_j$ are the $i$th and $j$th entries of $X$ and $Y$ respectively. This means the matrix is equivalent to the cross-correlation $R_{XY}$ between the two vectors. We can also specify the auto-correlation $R_X = \langle X, X \rangle$ and auto-covariance $\Sigma_X = \langle X - E[X], X - E[X] \rangle$. One reason we can think of this matrix as an inner product is that it satisfies the defining properties of an inner product (checked numerically in the sketch after this list). In particular, it is
Linear: $\langle \alpha_1 V_1 + \alpha_2 V_2, U \rangle = \alpha_1 \langle V_1, U \rangle + \alpha_2 \langle V_2, U \rangle$.
Reflexive (conjugate symmetric): $\langle U, V \rangle = \langle V, U \rangle^*$.
Non-degenerate: $\langle V, V \rangle = 0 \Leftrightarrow V = 0$.
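To make this concrete, here is a minimal numerical sketch. It estimates the matrix inner product from samples and checks two of the properties above; the synthetic data, dimensions, and the helper `inner` are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 100_000, 3, 2   # hypothetical sample count and vector dimensions

# Draw correlated complex random vectors X (n-dim) and Y (m-dim); columns are samples.
Mx = rng.standard_normal((n, 4)) + 1j * rng.standard_normal((n, 4))
My = rng.standard_normal((m, 4)) + 1j * rng.standard_normal((m, 4))
W = rng.standard_normal((4, N)) + 1j * rng.standard_normal((4, N))
X, Y = Mx @ W, My @ W

def inner(U, V):
    """Sample estimate of the matrix inner product <U, V> = E[U V*]."""
    return (U @ V.conj().T) / U.shape[1]

R_XY = inner(X, Y)   # cross-correlation, an n x m matrix
R_X = inner(X, X)    # auto-correlation <X, X>

# Conjugate symmetry: <X, Y> = <Y, X>* (exact up to floating-point error).
print(np.allclose(R_XY, inner(Y, X).conj().T))
# Non-degeneracy / positivity: <X, X> is Hermitian positive semidefinite.
print(np.all(np.linalg.eigvalsh(R_X) >= -1e-9))
```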
Since we are thinking of the matrix as an inner product, we can also think of the norm as a matrix.
Definition 2
The (squared) norm of a complex random vector $Z$ is given by $\|Z\|^2 = \langle Z, Z \rangle$.
When inner products are matrices instead of scalars, we must restate the Hilbert Projection Theorem in matrix form.
Theorem 1 (Hilbert Projection Theorem)
The minimization problem $\min_{\hat{X}(Y)} \|\hat{X}(Y) - X\|^2$ has a unique solution, which is a linear function of $Y$. The error is orthogonal to the linear subspace spanned by $Y$ (i.e., $\langle X - \hat{X}, Y \rangle = 0$).
When we minimize a matrix, we minimize it in the positive semidefinite sense, so for any other linear function $X'(Y)$,
$$\|X - \hat{X}\|^2 \preceq \|X - X'\|^2.$$
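As a sanity check of the theorem, the sketch below assumes zero-mean vectors and uses the standard closed form $\hat{X} = R_{XY} R_Y^{-1} Y$ for the linear estimate (that formula is not derived above, so treat it as an assumption of the sketch), then verifies the orthogonality condition numerically. The data and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m = 200_000, 2, 3   # hypothetical sizes

# Zero-mean, jointly distributed complex vectors X (n-dim) and Y (m-dim).
C = rng.standard_normal((n + m, 5)) + 1j * rng.standard_normal((n + m, 5))
W = rng.standard_normal((5, N)) + 1j * rng.standard_normal((5, N))
Z = C @ W
X, Y = Z[:n], Z[n:]

def inner(U, V):
    return (U @ V.conj().T) / U.shape[1]   # sample estimate of E[U V*]

R_XY, R_Y = inner(X, Y), inner(Y, Y)

# Linear estimate X_hat = R_XY R_Y^{-1} Y (standard LLSE formula, assumed here).
X_hat = R_XY @ np.linalg.inv(R_Y) @ Y

# Orthogonality of the error: <X - X_hat, Y> should vanish.
print(np.linalg.norm(inner(X - X_hat, Y)))   # ~0 up to numerical error
```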
Innovations
Suppose we have jointly distributed random vectors $Y_0, Y_1, \ldots, Y_n$. Ideally, we would be able to “de-correlate” them so that each new vector $E_i$ captures only the new information, orthogonal to the previous random vectors in the sequence. Since vectors in a Hilbert space behave like vectors in $\mathbb{R}^n$, we can simply run Gram-Schmidt on $\{Y_i\}_{i=0}^n$.
Definition 3
Given jointly distributed random vectors $\{Y_i\}_{i=0}^n$ with $L_i = \operatorname{span}\{Y_j\}_{j=0}^i$, the $i$th innovation $E_i$ is given by
$$E_i = Y_i - \operatorname{proj}_{L_{i-1}}(Y_i),$$
the part of $Y_i$ orthogonal to $L_{i-1}$ (with $E_0 = Y_0$).
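Here is a minimal sketch of this Gram-Schmidt procedure on synthetic data (all names and dimensions are hypothetical): each $Y_i$ has its projection onto the earlier innovations removed, using the fact that the earlier innovations span the same subspace $L_{i-1}$ as $Y_0, \ldots, Y_{i-1}$, and the results are checked for mutual orthogonality.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, n = 200_000, 2, 3   # hypothetical: four jointly distributed 2-dim vectors Y_0..Y_3

# Draw correlated samples of the stacked vector [Y_0; ...; Y_n]; columns are samples.
M = rng.standard_normal((d * (n + 1), 10)) + 1j * rng.standard_normal((d * (n + 1), 10))
W = rng.standard_normal((10, N)) + 1j * rng.standard_normal((10, N))
Ys = np.split(M @ W, n + 1)               # list of d x N sample blocks for Y_0, ..., Y_n

def inner(U, V):
    return (U @ V.conj().T) / U.shape[1]  # sample estimate of E[U V*]

# Gram-Schmidt: subtract from Y_i its projection onto the earlier innovations,
# which span the same subspace L_{i-1} as Y_0, ..., Y_{i-1}.
Es = []
for Yi in Ys:
    Ei = Yi.copy()
    for Ej in Es:
        Ei = Ei - inner(Yi, Ej) @ np.linalg.inv(inner(Ej, Ej)) @ Ej
    Es.append(Ei)

# The innovations should be mutually orthogonal: <E_i, E_j> = 0 for i != j.
print(max(np.linalg.norm(inner(Es[i], Es[j]))
          for i in range(n + 1) for j in range(i)))   # ~0
```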
We can also write the innovations in matrix form as $\varepsilon = AY$, where $\varepsilon = \begin{bmatrix} E_0 & E_1 & \cdots & E_n \end{bmatrix}^T$ and $Y = \begin{bmatrix} Y_0 & Y_1 & \cdots & Y_n \end{bmatrix}^T$. Since each $E_i$ depends only on $Y_0, \ldots, Y_i$, $A$ must be lower triangular, and because the $E_i$ must be mutually orthogonal, $R_\varepsilon$ should be diagonal. Since $R_\varepsilon = A R_Y A^*$, if $R_Y \succ 0$, then we can use its unique LDL decomposition $R_Y = LDL^*$ and let $A = L^{-1}$, which gives $R_\varepsilon = L^{-1}(LDL^*)L^{-*} = D$.
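The matrix form can be sketched numerically as well. In the hypothetical example below, the LDL* factors are obtained from a Cholesky factorization (a standard construction, assumed here rather than stated above), $A = L^{-1}$ is applied to the stacked samples, and $R_\varepsilon$ is checked to be diagonal; scalar $Y_i$ are used so that $R_\varepsilon$ is literally diagonal rather than block diagonal.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 200_000, 4   # hypothetical: five jointly distributed scalar Y_i (i = 0..4)

# Stack Y_0, ..., Y_n into one vector Y and draw correlated complex samples.
M = rng.standard_normal((n + 1, n + 1)) + 1j * rng.standard_normal((n + 1, n + 1))
Y = M @ (rng.standard_normal((n + 1, N)) + 1j * rng.standard_normal((n + 1, N)))

R_Y = (Y @ Y.conj().T) / N               # sample estimate of R_Y (positive definite)

# Unique LDL* factorization of R_Y via Cholesky: R_Y = G G* = L D L*.
G = np.linalg.cholesky(R_Y)              # lower triangular, positive real diagonal
L = G / np.diag(G)                       # unit lower triangular factor
D = np.diag(np.diag(G)) ** 2             # diagonal factor

# A = L^{-1} is lower triangular, so eps = A Y is a causal transformation of Y.
A = np.linalg.inv(L)
eps = A @ Y

R_eps = (eps @ eps.conj().T) / N
off_diag = R_eps - np.diag(np.diag(R_eps))
print(np.linalg.norm(off_diag))                   # ~0: innovations are uncorrelated
print(np.allclose(np.diag(R_eps), np.diag(D)))    # R_eps matches D
```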