Estimation

Whereas hypothesis testing is about discriminating between two or more hypotheses, estimation is about guessing the numerical value (the ground truth) of a random variable.

In order to measure the quality of our estimate, we need a metric for the error. One commonly used metric is the mean squared error

$$\mathbb{E}\left[(X - \hat{X}(Y))^2\right].$$

Theorem 45

The minimum mean square error (MMSE) estimate of a random variable $X$ given $Y$ is the conditional expectation.

$$\hat{X}(Y) = \mathbb{E}\left[X|Y\right] = \text{argmin}_{\hat{X}} \mathbb{E}\left[(X - \hat{X}(Y))^2\right].$$

This essentially follows from the definition of conditional expectation: the error $X - \mathbb{E}\left[X|Y\right]$ is orthogonal to all functions of $Y$, so by the Hilbert Projection Theorem, $\mathbb{E}\left[X|Y\right]$ must be the projection of $X$ onto the space of all functions of $Y$. There are two problems with using the MMSE all the time.

  1. We often don’t know $p_{Y|X}$ explicitly and only have a good model for it.

  2. Even if we knew the model $p_{Y|X}$, conditional expectations are difficult to compute.
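
As a sanity check on Theorem 45, here is a minimal Monte Carlo sketch for a case where the conditional expectation has a closed form: assume $X \sim \mathcal{N}(0,1)$ is observed through $Y = X + Z$ with independent $Z \sim \mathcal{N}(0, \sigma^2)$, so that $\mathbb{E}\left[X|Y\right] = \frac{Y}{1+\sigma^2}$. The model and the competing estimators are illustrative choices, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 100_000, 0.5

# Model: X ~ N(0, 1), Y = X + Z with Z ~ N(0, sigma2) independent of X.
X = rng.standard_normal(n)
Y = X + np.sqrt(sigma2) * rng.standard_normal(n)

# For this jointly Gaussian pair, E[X | Y] = Y / (1 + sigma2).
mmse_estimate = Y / (1 + sigma2)

# Compare its mean squared error against some other functions of Y.
candidates = {
    "E[X|Y]": mmse_estimate,
    "Y itself": Y,
    "0.5 * Y": 0.5 * Y,
    "constant 0": np.zeros(n),
}
for name, est in candidates.items():
    print(f"{name:>10s}: MSE = {np.mean((X - est) ** 2):.4f}")
# E[X|Y] should achieve the smallest MSE, about sigma2 / (1 + sigma2) = 1/3.
```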

Linear Estimation

Since finding the MMSE is difficult, we can restrict ourselves to functions of a particular type.

Definition 86

The Linear Least Squares Estimator (LLSE) $\mathbb{L}\left[\boldsymbol{X}|\boldsymbol{Y}\right]$ is the projection of a vector of random variables $\boldsymbol{X}$ onto the subspace of linear functions of the observations, $\mathcal{U} = \left\{ \boldsymbol{a} + B\boldsymbol{Y} \right\}$, where $\boldsymbol{Y}$ is the vector of observations.

By the orthogonality principle,

  1. $\mathbb{E}\left[(\boldsymbol{X} - \mathbb{L}\left[\boldsymbol{X}|\boldsymbol{Y}\right]) \cdot 1\right] = 0 \implies \mathbb{E}\left[\mathbb{L}\left[\boldsymbol{X}|\boldsymbol{Y}\right]\right] = \mathbb{E}\left[\boldsymbol{X}\right]$

  2. $\mathbb{E}\left[(\boldsymbol{X} - \mathbb{L}\left[\boldsymbol{X}|\boldsymbol{Y}\right])Y_i\right] = 0$

From here, we can derive a closed-form expression for the LLSE. Let $\boldsymbol{\mu_Y} = \mathbb{E}\left[\boldsymbol{Y}\right]$, $\boldsymbol{\mu_X} = \mathbb{E}\left[\boldsymbol{X}\right]$, $\Sigma_{\boldsymbol{Y}} = \mathbb{E}\left[(\boldsymbol{Y}-\boldsymbol{\mu_Y})(\boldsymbol{Y}-\boldsymbol{\mu_Y})^T\right]$, and $\Sigma_{\boldsymbol{XY}} = \mathbb{E}\left[(\boldsymbol{X}-\boldsymbol{\mu_X})(\boldsymbol{Y}-\boldsymbol{\mu_Y})^T\right]$. Substituting $\mathbb{L}\left[\boldsymbol{X}|\boldsymbol{Y}\right] = \boldsymbol{a}+B\boldsymbol{Y}$ into the equations we found from the orthogonality principle,

$$\begin{aligned}
\boldsymbol{a}+B\boldsymbol{\mu_Y} &= \boldsymbol{\mu_X} \\
\boldsymbol{a}(\boldsymbol{\mu_Y})_i + B \mathbb{E}\left[\boldsymbol{Y}Y_i\right] = \mathbb{E}\left[\boldsymbol{X}Y_i\right] &\implies \boldsymbol{a}(\boldsymbol{\mu_{Y}})_i + B(\Sigma_{\boldsymbol{Y}})_i + B(\boldsymbol{\mu_{Y}})_i\boldsymbol{\mu_Y} = (\Sigma_{\boldsymbol{XY}})_i + (\boldsymbol{\mu_{Y}})_i\boldsymbol{\mu_X}\\
&\implies \boldsymbol{a}\boldsymbol{\mu_Y}^T + B\Sigma_{\boldsymbol{Y}}+B\boldsymbol{\mu_Y}\boldsymbol{\mu_Y}^T = \Sigma_{\boldsymbol{XY}}+\boldsymbol{\mu_X}\boldsymbol{\mu_Y}^T
\end{aligned}$$

Solving this system yields

$$B = \Sigma_{\boldsymbol{XY}}\Sigma_{\boldsymbol{Y}}^{-1} \qquad \boldsymbol{a} = \boldsymbol{\mu_X} - \Sigma_{\boldsymbol{XY}}\Sigma_{\boldsymbol{Y}}^{-1}\boldsymbol{\mu_Y}.$$

Theorem 46

The Linear Least Squares Estimator for a vector of random variables $\boldsymbol{X}$ given a vector of random variables $\boldsymbol{Y}$ is

$$\mathbb{L}\left[\boldsymbol{X}|\boldsymbol{Y}\right] = \boldsymbol{\mu_X} + \Sigma_{\boldsymbol{XY}}\Sigma_{\boldsymbol{Y}}^{-1}(\boldsymbol{Y}-\boldsymbol{\mu_Y})$$

If $X$ and $Y$ are both scalar random variables, this reduces to

$$\mathbb{L}\left[X|Y\right] = \mu_X + \frac{\text{Cov}\left(X, Y\right)}{\text{Var}\left(Y\right)}(Y - \mu_Y)$$

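The closed form in Theorem 46 translates directly into code. The sketch below is an illustration under the assumption that the moments are estimated from i.i.d. samples (the helper `llse` and the toy model are not from the notes); with known moments one would plug those in instead.

```python
import numpy as np

def llse(X, Y):
    """Return a function y -> L[X|Y](y) built from sample moments.

    X: (n, dx) array of samples of the target vector.
    Y: (n, dy) array of samples of the observation vector.
    """
    mu_X, mu_Y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_X, Y - mu_Y
    Sigma_Y = Yc.T @ Yc / len(Y)           # Cov(Y)
    Sigma_XY = Xc.T @ Yc / len(Y)          # Cov(X, Y)
    B = Sigma_XY @ np.linalg.inv(Sigma_Y)  # B = Sigma_XY Sigma_Y^{-1}
    a = mu_X - B @ mu_Y                    # a = mu_X - B mu_Y
    return lambda y: a + B @ y

# Toy example: X in R^2 observed through a noisy linear map (assumed model).
rng = np.random.default_rng(1)
n = 50_000
X = rng.standard_normal((n, 2))
A = np.array([[1.0, 0.5], [0.0, 2.0], [1.0, -1.0]])
Y = X @ A.T + 0.3 * rng.standard_normal((n, 3))

estimate = llse(X, Y)
print(estimate(Y[0]), "vs true", X[0])
```
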
Since the LLSE is essentially a projection onto a linear subspace, if we have an orthogonal basis for the subspace, then we can do the projection one component at a time. The Gram-Schmidt process turns the vectors $Y_1,\cdots,Y_n$ into an orthonormal set $\tilde{Y}_1, \cdots, \tilde{Y}_n$. If we define $Y^{(n)}=(Y_1, \cdots, Y_n)$,

  1. $\tilde{Y}_1 = \frac{Y_1}{\|Y_1\|}$

  2. $\tilde{Y}_{i+1} = Y_{i+1} - \sum_{k=1}^{i}\langle Y_{i+1}, \tilde{Y}_k \rangle \tilde{Y}_k = Y_{i+1} - \mathbb{L}\left[Y_{i+1}|Y^{(i)}\right]$

Definition 87

The linear innovation sequence of random variables $Y_1,\cdots,Y_n$ is the orthogonal set $\tilde{Y}_1, \cdots, \tilde{Y}_n$ produced by the Gram-Schmidt process.

Since $\tilde{Y}_{n}$ is orthogonal to $\mathbb{L}\left[Y_n|\tilde{Y}^{(n-1)}\right]$, they belong to different parts of the subspace formed by $Y_1,\cdots,Y_n$.

Theorem 47

$$\mathbb{L}\left[X|Y^{(n)}\right] = \mathbb{L}\left[X|\tilde{Y}_n\right] + \mathbb{L}\left[X|\tilde{Y}^{(n-1)}\right]$$

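As a numerical illustration of Theorem 47, here is a sketch assuming zero-mean variables so that the inner product is $\langle U, V\rangle = \mathbb{E}\left[UV\right]$, estimated by sample averages. It builds the innovations by Gram-Schmidt and checks that projecting onto them one component at a time reproduces the joint LLSE; the particular observation model is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Zero-mean target and observations; <U, V> = E[U V] estimated by averages.
X = rng.standard_normal(n)
Y = np.stack([X + rng.standard_normal(n),
              2 * X + rng.standard_normal(n),
              rng.standard_normal(n)])          # third observation is pure noise

inner = lambda u, v: np.mean(u * v)

# Gram-Schmidt: turn Y_1, ..., Y_n into orthonormal innovations.
innovations = []
for Yi in Y:
    resid = Yi - sum(inner(Yi, T) * T for T in innovations)
    innovations.append(resid / np.sqrt(inner(resid, resid)))

# Project X onto the innovations one component at a time (Theorem 47).
est_innov = sum(inner(X, T) * T for T in innovations)

# Direct joint LLSE for comparison: coefficients Sigma_XY Sigma_Y^{-1}.
Sigma_Y = Y @ Y.T / n
Sigma_XY = Y @ X / n
est_direct = Sigma_XY @ np.linalg.inv(Sigma_Y) @ Y

print(np.max(np.abs(est_innov - est_direct)))   # should be ~0
```
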
Note that in general, the LLSE is not the same as the MMSE. However, if $X$ and $Y$ are jointly Gaussian, then the LLSE does, in fact, equal the MMSE.

Kalman Filtering

Definition 88

A system evolves according to a state space model if the state $\boldsymbol{X}_n$ at time $n$ and the observations $\boldsymbol{Y}_n$ at time $n$ are related by

$$\forall n\geq 0,\ \boldsymbol{X}_{n+1} = A\boldsymbol{X}_n + \boldsymbol{V}_n \qquad \forall n\geq 1,\ \boldsymbol{Y}_n=C\boldsymbol{X}_n+\boldsymbol{W}_n$$

where VnV_nVn​ and WnW_nWn​are noise terms.

State space models are flexible and describe a variety of processes. Suppose we want to linearly estimate $\boldsymbol{X}_n$ from the observations $\boldsymbol{Y}_n$ we have seen so far.

Theorem 48

The linear estimate $\hat{\boldsymbol{X}}_{n|n} = \mathbb{L}\left[\boldsymbol{X}_n|\boldsymbol{Y}_1,\cdots,\boldsymbol{Y}_n\right]$ can be computed recursively via the Kalman filter.

  1. $\hat{\boldsymbol{X}}_{0|0} = 0,\ \Sigma_{0|0} = \text{Cov}\left(\boldsymbol{X}_0\right)$.

  2. For $n\geq 1$, update

$$\hat{\boldsymbol{X}}_{n|n} = A\hat{\boldsymbol{X}}_{n-1|n-1} + K_n\tilde{\boldsymbol{Y}}_n \qquad \tilde{\boldsymbol{Y}}_n = \boldsymbol{Y}_n - C\hat{\boldsymbol{X}}_{n|n-1} \qquad \Sigma_{n|n-1} = A\Sigma_{n-1|n-1}A^T+\Sigma_{\boldsymbol{V}}$$

$$K_n = \Sigma_{n|n-1}C^T(C\Sigma_{n|n-1}C^T+\Sigma_{\boldsymbol{W}})^{-1} \qquad \Sigma_{n|n}=(I - K_nC)\Sigma_{n|n-1}$$

Kalman filtering is a simple algorithm that lets us do optimal estimation online. Variants of it can also perform prediction and smoothing.
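
The recursion in Theorem 48 can be written out directly. The sketch below follows the update equations above, using $\hat{\boldsymbol{X}}_{n|n-1} = A\hat{\boldsymbol{X}}_{n-1|n-1}$ as the one-step prediction; the example model at the bottom and its matrices are illustrative assumptions, not part of the notes.

```python
import numpy as np

def kalman_filter(Y, A, C, Sigma_V, Sigma_W, Sigma0):
    """Kalman filter following Theorem 48: returns X_hat_{n|n} for each n."""
    dx = A.shape[0]
    x_hat = np.zeros(dx)              # X_hat_{0|0} = 0
    Sigma = Sigma0                    # Sigma_{0|0} = Cov(X_0)
    estimates = []
    for y in Y:
        x_pred = A @ x_hat                              # X_hat_{n|n-1}
        Sigma_pred = A @ Sigma @ A.T + Sigma_V          # Sigma_{n|n-1}
        innovation = y - C @ x_pred                     # Y_tilde_n
        K = Sigma_pred @ C.T @ np.linalg.inv(C @ Sigma_pred @ C.T + Sigma_W)
        x_hat = x_pred + K @ innovation                 # X_hat_{n|n}
        Sigma = (np.eye(dx) - K @ C) @ Sigma_pred       # Sigma_{n|n}
        estimates.append(x_hat)
    return np.array(estimates)

# Example on a simulated state space model (assumed illustrative parameters).
rng = np.random.default_rng(4)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Sigma_V, Sigma_W = 0.01 * np.eye(2), np.array([[0.25]])

x, states, observations = np.zeros(2), [], []
for _ in range(200):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Sigma_V)
    states.append(x)
    observations.append(C @ x + rng.multivariate_normal(np.zeros(1), Sigma_W))

filtered = kalman_filter(observations, A, C, Sigma_V, Sigma_W, Sigma0=np.eye(2))
print(np.mean((filtered - np.array(states)) ** 2))   # filtering error on the state
```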

Figure 3: Estimation Setup