If we allow ourselves to consider an affine function for estimation, then this is equivalent to instead finding an estimator of the form
$$\hat{X}(Y') = WY', \qquad \text{where} \qquad Y' = \begin{bmatrix} Y \\ 1 \end{bmatrix}$$
is the observation $Y$ augmented with a constant entry.
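To spell out the equivalence (the partition of $W$ below is notation introduced here only for illustration), write $W = \begin{bmatrix} \bar{W} & b \end{bmatrix}$; then a linear function of the augmented vector $Y'$ is exactly an affine function of $Y$:
$$WY' = \begin{bmatrix} \bar{W} & b \end{bmatrix} \begin{bmatrix} Y \\ 1 \end{bmatrix} = \bar{W}Y + b.$$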
Finding the best such estimator is equivalent to satisfying the following orthogonality conditions:
$$\mathbb{E}\big[X - \hat{X}(Y)\big] = 0, \qquad \mathbb{E}\big[(X - \hat{X}(Y))\,Y^\top\big] = 0,$$
i.e., the estimation error has zero mean and is orthogonal to the observation.
Solving gives us
$$\hat{X}(Y) = W(Y - \mu_Y) + \mu_X, \qquad \text{where } W\Sigma_Y = \Sigma_{XY},$$
and $\Sigma_Y$, $\Sigma_{XY}$ are the auto-covariance and cross-covariance respectively. Recall that if $X$ and $Y$ are jointly Gaussian, then the conditional distribution of $X$ given $Y$ is Gaussian with mean $\mu_X + \Sigma_{XY}\Sigma_Y^{-1}(Y - \mu_Y)$ and covariance $\Sigma_X - \Sigma_{XY}\Sigma_Y^{-1}\Sigma_{YX}$.
Thus in the jointly Gaussian case, the mean of the conditional distribution is the best affine estimator of $X$, and the conditional covariance is the covariance of the estimation error. This has two interpretations. First, under the Gaussian assumption, the best nonlinear estimator $\mathbb{E}[X \mid Y]$ coincides with the best affine estimator, so nothing is gained by looking beyond affine functions. Second, Gaussian random variables are the hardest to predict: nonlinearity would ordinarily be expected to improve our error, but it does not in the Gaussian case. This means that if affine estimation works well, we shouldn't try to find better non-linear estimators.
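As a concrete illustration, here is a minimal numpy sketch (the dimensions, the data-generating model, and the use of sample moments in place of the true $\mu$ and $\Sigma$ are all illustrative assumptions) that computes the best affine estimator from the formula above and compares its error to simply guessing $\mu_X$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw jointly Gaussian (X, Y): X is 2-dimensional, Y = A X + noise is 3-dimensional.
n, dx, dy = 100_000, 2, 3
A = rng.standard_normal((dy, dx))
X = rng.standard_normal((n, dx)) + np.array([1.0, -2.0])   # mean mu_X = (1, -2)
Y = X @ A.T + 0.5 * rng.standard_normal((n, dy))

# Sample moments standing in for mu_X, mu_Y, Sigma_Y, Sigma_XY.
mu_X, mu_Y = X.mean(axis=0), Y.mean(axis=0)
Sigma_Y = np.cov(Y, rowvar=False)
Sigma_XY = (X - mu_X).T @ (Y - mu_Y) / (n - 1)

# W solves W Sigma_Y = Sigma_XY, i.e. W = Sigma_XY Sigma_Y^{-1}.
W = np.linalg.solve(Sigma_Y.T, Sigma_XY.T).T

# Best affine estimate X_hat(Y) = W (Y - mu_Y) + mu_X, applied to every sample.
X_hat = (Y - mu_Y) @ W.T + mu_X

print("MSE of affine estimator:", np.mean(np.sum((X - X_hat) ** 2, axis=1)))
print("MSE of constant mu_X:   ", np.mean(np.sum((X - mu_X) ** 2, axis=1)))
```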
The theory of linear estimation is very closely connected with the theory behind least squares in linear algebra. In least squares, we have a deterministic $x$ and assume nothing else about it, meaning we are looking for an unbiased estimator. Theorem 2 tells us how to find the best linear unbiased estimator in this setting.
Theorem 2 (Gauss-Markov Theorem). Suppose $Y = Ax + Z$, where $Z$ is zero-mean with $\mathbb{E}[ZZ^\top] = \sigma^2 I$ and $A$ is full column rank. Then
$$\hat{x}(Y) = (A^\top A)^{-1} A^\top Y$$
is the best linear unbiased estimator of $x$.
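To see the theorem in action, here is a small numpy sketch (the dimensions, noise level, and Monte Carlo averaging are illustrative assumptions, not part of the theorem) that computes the least-squares estimator for many noise draws and checks unbiasedness empirically:

```python
import numpy as np

rng = np.random.default_rng(1)

# Deterministic unknown x and a full-column-rank A (both chosen arbitrarily here).
m, d = 50, 3
x_true = np.array([2.0, -1.0, 0.5])
A = rng.standard_normal((m, d))
sigma = 0.3

def blue(Y):
    """Least-squares estimate x_hat = (A^T A)^{-1} A^T Y."""
    return np.linalg.solve(A.T @ A, A.T @ Y)

# Averaging over many noise realizations illustrates unbiasedness:
# the sample mean of x_hat approaches x_true.
estimates = []
for _ in range(2000):
    Y = A @ x_true + sigma * rng.standard_normal(m)
    estimates.append(blue(Y))

print("mean of estimates:", np.mean(estimates, axis=0))
print("true x:           ", x_true)
```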
Recursive Least Squares
Suppose we extend the least squares setup to allow a stochastic, but fixed, $X$