Hilbert Space Theory


Complex random variables form a Hilbert space with inner product $\langle X, Y \rangle = \mathbb{E}\left[XY^*\right]$. If we have random complex vectors, then we can use Hilbert space theory more efficiently by looking at the matrix of inner products. For simplicity, we will call this matrix the “inner product” of two complex vectors.

Definition 1

Let the inner product between two random complex vectors $\boldsymbol{Z_1}, \boldsymbol{Z_2}$ be defined as

$$\langle \boldsymbol{Z_1}, \boldsymbol{Z_2} \rangle = \mathbb{E}\left[\boldsymbol{Z_1}\boldsymbol{Z_2}^*\right]$$

The ij-th entry of the matrix $\langle \boldsymbol{X}, \boldsymbol{Y} \rangle$ is simply the scalar inner product $\mathbb{E}\left[X_iY_j^*\right]$, where $X_i$ and $Y_j$ are the ith and jth entries of $\boldsymbol{X}$ and $\boldsymbol{Y}$ respectively. This means the matrix is equivalent to the cross-correlation $R_{XY}$ between the two vectors. We can also specify the auto-correlation $R_X = \langle \boldsymbol{X}, \boldsymbol{X} \rangle$ and the auto-covariance $\Sigma_X = \langle \boldsymbol{X} - \mathbb{E}\left[\boldsymbol{X}\right], \boldsymbol{X} - \mathbb{E}\left[\boldsymbol{X}\right] \rangle$. One reason we can think of this matrix as an inner product is that it satisfies the defining properties of an inner product. In particular, it is

  1. Linear: $\langle \alpha_1\boldsymbol{V_1}+\alpha_2\boldsymbol{V_2}, \boldsymbol{u} \rangle = \alpha_1\langle \boldsymbol{V_1}, \boldsymbol{u} \rangle + \alpha_2\langle \boldsymbol{V_2}, \boldsymbol{u} \rangle$.

  2. Reflexive (conjugate symmetric): $\langle \boldsymbol{U}, \boldsymbol{V} \rangle = \langle \boldsymbol{V}, \boldsymbol{U} \rangle^*$.

  3. Non-degenerate: $\langle \boldsymbol{V}, \boldsymbol{V} \rangle = \boldsymbol{0} \Leftrightarrow \boldsymbol{V} = \boldsymbol{0}$.
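
As a concrete sanity check, here is a minimal numerical sketch of the matrix inner product. It assumes NumPy, and the mixing matrix, noise level, and sample count are arbitrary illustrative choices; it estimates $\langle \boldsymbol{Z_1}, \boldsymbol{Z_2} \rangle = \mathbb{E}\left[\boldsymbol{Z_1}\boldsymbol{Z_2}^*\right]$ from samples and checks the conjugate-symmetry property.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 200_000  # vector dimension and number of samples (illustrative values)

# Two correlated zero-mean complex random vectors; columns are i.i.d. samples.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))
Z1 = A @ W
Z2 = W + 0.1 * (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N)))

def inner(X, Y):
    """Sample estimate of the matrix inner product <X, Y> = E[X Y^*]."""
    return (X @ Y.conj().T) / X.shape[1]

R12 = inner(Z1, Z2)  # cross-correlation R_{Z1 Z2}
R21 = inner(Z2, Z1)

# Conjugate symmetry: <Z1, Z2> = <Z2, Z1>^* up to sampling error.
print(np.abs(R12 - R21.conj().T).max())  # small
```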

Since we are thinking of the matrix as an inner product, we can also think of the norm as a matrix.

Definition 2

The norm of a complex random vector $\boldsymbol{Z}$ is given by $\|\boldsymbol{Z}\|^2 = \langle \boldsymbol{Z}, \boldsymbol{Z} \rangle$.

When thinking of inner products as matrices instead of scalars, we must restate the Hilbert Projection Theorem in terms of matrices.

Theorem 1 (Hilbert Projection Theorem)

The minimization problem $\min_{\hat{\boldsymbol{X}}(\boldsymbol{Y})}\|\hat{\boldsymbol{X}}(\boldsymbol{Y}) - \boldsymbol{X}\|^2$ has a unique solution which is a linear function of $\boldsymbol{Y}$. The error is orthogonal to the linear subspace spanned by $\boldsymbol{Y}$ (i.e. $\langle \boldsymbol{X} - \hat{\boldsymbol{X}}, \boldsymbol{Y} \rangle = \boldsymbol{0}$).

When we minimize a matrix-valued quantity, we mean minimization in the positive semi-definite sense, so for any other linear function $\boldsymbol{X}'$ of $\boldsymbol{Y}$,

$$\|\boldsymbol{X}-\hat{\boldsymbol{X}}\|^2 \preceq \|\boldsymbol{X} - \boldsymbol{X}'\|^2.$$
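
To see the projection theorem in action, here is a short sketch (NumPy again; the joint distribution of $\boldsymbol{X}$ and $\boldsymbol{Y}$ is a made-up zero-mean example) that forms the linear estimate $\hat{\boldsymbol{X}} = R_{XY}R_Y^{-1}\boldsymbol{Y}$ and verifies the orthogonality condition $\langle \boldsymbol{X} - \hat{\boldsymbol{X}}, \boldsymbol{Y} \rangle \approx \boldsymbol{0}$.

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny, N = 2, 3, 200_000  # dimensions and sample count (illustrative values)

# Zero-mean jointly distributed complex vectors: Y is noisy linear data about X.
X = rng.standard_normal((nx, N)) + 1j * rng.standard_normal((nx, N))
H = rng.standard_normal((ny, nx)) + 1j * rng.standard_normal((ny, nx))
V = 0.5 * (rng.standard_normal((ny, N)) + 1j * rng.standard_normal((ny, N)))
Y = H @ X + V

def inner(A, B):
    """Sample estimate of <A, B> = E[A B^*]."""
    return (A @ B.conj().T) / A.shape[1]

# Projection onto the span of Y: X_hat = K Y with K = R_XY R_Y^{-1}.
K = inner(X, Y) @ np.linalg.inv(inner(Y, Y))
X_hat = K @ Y

# Orthogonality principle: the error is orthogonal to the data.
print(np.abs(inner(X - X_hat, Y)).max())  # ~0 up to sampling error
```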

Innovations

Suppose we have jointly distributed random vectors $Y_0, Y_1, \cdots, Y_n$. Ideally, we would be able to “de-correlate” them so that each new vector $E_i$ captures only the new information which is orthogonal to the previous random vectors in the sequence. Since vectors in a Hilbert space behave like vectors in $\mathbb{R}^n$, we can simply run Gram-Schmidt on the $\{Y_i\}_{i=0}^n$.

Definition 3

Given jointly distributed random vectors $\{Y_i\}_{i=0}^n$ with $\mathcal{L}_i = \text{span}\{Y_j\}_{j=0}^i$, the ith innovation $E_i$ is given by

$$E_i = Y_i - \text{proj}(Y_i|\mathcal{L}_{i-1}) = Y_i - \sum_{j=0}^{i-1}\frac{\langle Y_i, E_j \rangle}{\|E_j\|^2}E_j$$

Innovations have two key properties:

  1. $\forall i\neq j,\ \langle E_i, E_j \rangle = 0$

  2. $\forall i,\ \text{span}\{Y_j\}_{j=0}^i = \text{span}\{E_j\}_{j=0}^i$
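
The recursion above is ordinary Gram-Schmidt, so it is easy to check numerically. The sketch below (NumPy, with an arbitrary correlated example and scalar $Y_i$ so that $\|E_j\|^2$ is a scalar) computes the innovations and confirms they are mutually orthogonal.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 4, 200_000  # number of observations and sample count (illustrative values)

# Correlated zero-mean scalar observations Y_0, ..., Y_{n-1}; rows are variables.
L_true = np.tril(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) + 2 * np.eye(n)
Y = L_true @ (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N)))

def inner(a, b):
    """Sample estimate of <a, b> = E[a b^*] for scalar random variables."""
    return np.vdot(b, a) / a.shape[0]

# Gram-Schmidt: E_i = Y_i - sum_j <Y_i, E_j> / ||E_j||^2 * E_j
E = []
for i in range(n):
    Ei = Y[i].copy()
    for Ej in E:
        Ei -= (inner(Y[i], Ej) / inner(Ej, Ej)) * Ej
    E.append(Ei)

# Innovations should be mutually orthogonal: <E_i, E_j> ~ 0 for i != j.
G = np.array([[inner(Ei, Ej) for Ej in E] for Ei in E])
print(np.abs(G - np.diag(np.diag(G))).max())  # ~0 up to sampling error
```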

We can also write the innovations in matrix form as $\boldsymbol{\varepsilon} = A\boldsymbol{Y}$ where $\boldsymbol{\varepsilon} = \begin{bmatrix}E_0 & E_1 & \cdots & E_n\end{bmatrix}^T$ and $\boldsymbol{Y} = \begin{bmatrix}Y_0 & Y_1 & \cdots & Y_n\end{bmatrix}^T$. Since each $E_i$ only depends on $Y_0, \ldots, Y_i$, $A$ must be lower triangular, and because we need the $E_i$ to be mutually orthogonal, $R_{\varepsilon}$ must be diagonal. Since $R_{\varepsilon} = AR_YA^*$, if $R_Y \succ 0$, then we can take its unique LDL decomposition $R_Y = LDL^*$ and let $A = L^{-1}$, which gives $R_{\varepsilon} = L^{-1}(LDL^*)L^{-*} = D$.
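
The sketch below illustrates this factorization numerically (NumPy; since NumPy has no dedicated LDL routine, the factors are derived from the Cholesky factor, and $R_Y$ is an arbitrary positive definite example): with $A = L^{-1}$, the transformed correlation $AR_YA^*$ comes out diagonal.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# An arbitrary Hermitian positive definite autocorrelation matrix R_Y.
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R_Y = B @ B.conj().T + n * np.eye(n)

# LDL decomposition R_Y = L D L^* obtained from the Cholesky factor C = L D^{1/2}.
C = np.linalg.cholesky(R_Y)   # lower triangular with R_Y = C C^*
d = np.real(np.diag(C))       # positive diagonal of C
L = C / d                     # unit lower triangular (divide each column by d)
D = np.diag(d**2)

A = np.linalg.inv(L)          # lower triangular whitening (innovations) transform
R_eps = A @ R_Y @ A.conj().T  # equals D, i.e. the innovations are uncorrelated

print(np.allclose(R_eps, D))  # True
```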