Linear Estimation
In linear estimation, we are trying to estimate a random variable $X$ using an observation $Y$ with a linear function of $Y$. If $Y$ is finite dimensional, then we can say $\hat{X} = KY$ where $K$ is some matrix. Using Theorem 1 and the orthogonality principle, we know that

$$E\left[(X - KY)Y^*\right] = 0 \implies KR_Y = R_{XY}.$$

This is known as the Normal Equation. If $R_Y$ is invertible, then we can apply the inverse to find $K = R_{XY}R_Y^{-1}$. Otherwise, we can apply the pseudoinverse to find $K = R_{XY}R_Y^{\dagger}$, which may not be unique. If we want to measure the quality of the estimation, since $X - \hat{X} \perp \hat{X}$,

$$E\left[(X - \hat{X})(X - \hat{X})^*\right] = R_X - R_{XY}R_Y^{-1}R_{YX}.$$
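To make the normal equation concrete, here is a minimal sketch (assuming NumPy, zero-mean simulated data, and illustrative dimensions of my own choosing; none of this is from the original notes) that forms $K$ from empirical second moments and checks the orthogonality principle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate zero-mean X (2-dim) as a noisy linear function of Y (3-dim), purely for illustration.
A = rng.standard_normal((2, 3))
Y = rng.standard_normal((3, 10_000))
X = A @ Y + 0.1 * rng.standard_normal((2, 10_000))

# Empirical second moments R_Y = E[YY^*] and R_XY = E[XY^*].
N = Y.shape[1]
R_Y = Y @ Y.T / N
R_XY = X @ Y.T / N

# Solve the normal equation K R_Y = R_XY; the pseudoinverse also covers singular R_Y.
K = R_XY @ np.linalg.pinv(R_Y)

# Orthogonality principle: the error X - KY is uncorrelated with Y.
err = X - K @ Y
print(np.allclose(err @ Y.T / N, 0.0))
```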
If we allow ourselves to consider an affine function for estimation $\hat{X} = KY + b$, then this is equivalent to instead finding a linear estimator of the centered variable $X - E[X]$ using the centered observation $Y - E[Y]$. This is equivalent to the following orthogonality conditions:
- 1. $E\left[X - \hat{X}\right] = 0$ (the error has zero mean)
- 2. $E\left[(X - \hat{X})(Y - E[Y])^*\right] = 0$ (the error is orthogonal to the centered observation)
Solving gives us

$$\hat{X} = E[X] + \Sigma_{XY}\Sigma_Y^{-1}\left(Y - E[Y]\right)$$

where $\Sigma_Y$ and $\Sigma_{XY}$ are the auto-covariance and cross-covariance respectively. Recall that if $X$ and $Y$ are jointly Gaussian, i.e.

$$\begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \begin{bmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{bmatrix}\right),$$

then

$$X \mid Y \sim \mathcal{N}\left(\mu_X + \Sigma_{XY}\Sigma_Y^{-1}(Y - \mu_Y),\ \Sigma_X - \Sigma_{XY}\Sigma_Y^{-1}\Sigma_{YX}\right).$$
Thus in the jointly Gaussian case, the mean of the conditional distribution is the best affine estimator of $X$ using $Y$, and the conditional covariance is the covariance of the estimation error. This has two interpretations.
- 1. Under the Gaussian assumption, the best nonlinear estimator is affine.
- 2. Gaussian random variables are the hardest to predict: nonlinearity should normally improve our error, but it does not in the Gaussian case. This means that if affine estimation works well, we shouldn't try to find better non-linear estimators.
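As a sanity check on the affine estimator and its Gaussian interpretation, here is a small sketch; the mean vector, covariance matrix, and dimensions are made-up illustrative values, and the sample-based checks are only approximate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Jointly Gaussian (X, Y): X is the first coordinate, Y the remaining two.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
mu_X, mu_Y = mu[:1], mu[1:]
Sigma_X, Sigma_XY, Sigma_Y = Sigma[:1, :1], Sigma[:1, 1:], Sigma[1:, 1:]

# Best affine estimate X_hat = mu_X + Sigma_XY Sigma_Y^{-1} (Y - mu_Y), applied to samples.
Z = rng.multivariate_normal(mu, Sigma, size=100_000)
X, Y = Z[:, :1], Z[:, 1:]
K = Sigma_XY @ np.linalg.inv(Sigma_Y)
X_hat = mu_X + (Y - mu_Y) @ K.T

# Orthogonality conditions: zero-mean error, uncorrelated with the centered observation.
err = X - X_hat
print(err.mean(), (err.T @ (Y - mu_Y) / len(Y)).ravel())

# In the Gaussian case the error variance approximately matches the conditional covariance.
print(err.var(), (Sigma_X - K @ Sigma_XY.T).item())
```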
The theory of linear estimation is very closely connected with the theory behind least squares in linear algebra. In least squares, we have a deterministic $X$ and assume nothing else about it, meaning we are looking for an unbiased estimator. Theorem 2 tells us how to find the best linear unbiased estimator in a linear setting.
Suppose we extend the least squares setup to allow a stochastic, but fixed, $X$ where $E[X] = 0$ and $E[XX^*] = \Pi_0$. At each timestep, we receive observations of $X$ such that

$$Y_i = H_iX + V_i$$

where $V_i$ is zero-mean noise with $E\left[V_iV_j^*\right] = R_i\delta_{ij}$ and $E\left[XV_i^*\right] = 0$. Define $Y_{0:i}$, $H_{0:i}$, and $V_{0:i}$ to be the observations, observation matrices, and noises stacked up to time $i$, with $R_{0:i}$ the block-diagonal covariance of $V_{0:i}$. Then our setup becomes $Y_{0:i} = H_{0:i}X + V_{0:i}$. Applying Theorem 1 and solving the normal equation, we see

$$\hat{X}_i = \Pi_0H_{0:i}^*\left(H_{0:i}\Pi_0H_{0:i}^* + R_{0:i}\right)^{-1}Y_{0:i} = \left(\Pi_0^{-1} + H_{0:i}^*R_{0:i}^{-1}H_{0:i}\right)^{-1}H_{0:i}^*R_{0:i}^{-1}Y_{0:i}.$$
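Here is a minimal batch sketch of this estimator under assumed dimensions, noise levels, and variable names of my own; the stacked products $H_{0:i}^*R_{0:i}^{-1}H_{0:i}$ and $H_{0:i}^*R_{0:i}^{-1}Y_{0:i}$ are accumulated one block at a time rather than by literally forming the stacked matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, T = 4, 2, 50                       # state dim, observation dim, number of timesteps

Pi_0 = np.eye(n)                         # prior covariance of X
X = rng.multivariate_normal(np.zeros(n), Pi_0)
H = rng.standard_normal((T, m, n))       # observation matrices H_i
R = 0.1 * np.eye(m)                      # noise covariance R_i (same at every step here)
Y = np.stack([H[i] @ X + rng.multivariate_normal(np.zeros(m), R) for i in range(T)])

# X_hat = (Pi_0^{-1} + H^* R^{-1} H)^{-1} H^* R^{-1} Y, accumulated block by block.
R_inv = np.linalg.inv(R)
info = np.linalg.inv(Pi_0)
rhs = np.zeros(n)
for i in range(T):
    info += H[i].T @ R_inv @ H[i]
    rhs += H[i].T @ R_inv @ Y[i]
X_hat = np.linalg.solve(info, rhs)
print(np.linalg.norm(X - X_hat))         # estimation error should be small
```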
Suppose we want to do this in an online fashion where at each timestep $i$, we only use the current $Y_i$ and our previous estimate $\hat{X}_{i-1}$. Let $P_i = \left(\Pi_0^{-1} + H_{0:i}^*R_{0:i}^{-1}H_{0:i}\right)^{-1}$, with $P_{-1} = \Pi_0$ and $\hat{X}_{-1} = 0$. Then

$$P_i^{-1} = P_{i-1}^{-1} + H_i^*R_i^{-1}H_i, \qquad \hat{X}_i = P_i\left(P_{i-1}^{-1}\hat{X}_{i-1} + H_i^*R_i^{-1}Y_i\right).$$

By applying the Sherman-Morrison-Woodbury identity, we can see that

$$P_i = P_{i-1} - P_{i-1}H_i^*\left(R_i + H_iP_{i-1}H_i^*\right)^{-1}H_iP_{i-1}, \qquad \hat{X}_i = \hat{X}_{i-1} + P_iH_i^*R_i^{-1}\left(Y_i - H_i\hat{X}_{i-1}\right).$$

Notice that this formula scales an innovation $Y_i - H_i\hat{X}_{i-1}$ in order to improve the current estimate of $X$.
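The same estimate can be produced online with the recursive form above. The sketch below (again with made-up dimensions, noise levels, and names) starts from the prior, folds in one observation per step, and prints how the error shrinks as the innovations are absorbed.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, T = 4, 2, 200

Pi_0 = np.eye(n)
X = rng.multivariate_normal(np.zeros(n), Pi_0)

# Initialize with the prior: X_hat_{-1} = 0, P_{-1} = Pi_0.
X_hat, P = np.zeros(n), Pi_0.copy()
for i in range(T):
    H_i = rng.standard_normal((m, n))
    R_i = 0.1 * np.eye(m)
    Y_i = H_i @ X + rng.multivariate_normal(np.zeros(m), R_i)

    # Covariance update via Sherman-Morrison-Woodbury (no n x n inversion needed).
    S = R_i + H_i @ P @ H_i.T
    P = P - P @ H_i.T @ np.linalg.solve(S, H_i @ P)

    # Scale the innovation Y_i - H_i X_hat by the gain P_i H_i^* R_i^{-1}.
    X_hat = X_hat + P @ H_i.T @ np.linalg.solve(R_i, Y_i - H_i @ X_hat)

    if i % 50 == 49:
        print(i, np.linalg.norm(X - X_hat))
```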
Just as we could compute a recursive update, we can also compute a "downdate" where we forget a particular observation. More concretely, we want to use $\hat{X}_i$ to find $\hat{X}_{i\setminus j}$, the best linear estimator of $X$ using all of $Y_0, \dots, Y_i$ except $Y_j$. Defining $P_{i\setminus j}^{-1} = P_i^{-1} - H_j^*R_j^{-1}H_j$,

$$\hat{X}_{i\setminus j} = P_{i\setminus j}\left(P_i^{-1}\hat{X}_i - H_j^*R_j^{-1}Y_j\right).$$

Applying the Sherman-Morrison-Woodbury identity,

$$P_{i\setminus j} = P_i + P_iH_j^*\left(R_j - H_jP_iH_j^*\right)^{-1}H_jP_i.$$
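Finally, a sketch of the downdate under the same kind of assumed setup: it computes the estimate using all observations, removes the $j$-th one with the formulas above, and checks the result against simply re-solving the batch problem without $Y_j$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, T, j = 3, 2, 10, 4                 # drop the j-th observation

Pi_0 = np.eye(n)
X = rng.multivariate_normal(np.zeros(n), Pi_0)
H = rng.standard_normal((T, m, n))
R = 0.1 * np.eye(m)
R_inv = np.linalg.inv(R)
Y = np.stack([H[i] @ X + rng.multivariate_normal(np.zeros(m), R) for i in range(T)])

def batch(idx):
    """Information-form batch estimate using only the observations in idx."""
    info, rhs = np.linalg.inv(Pi_0), np.zeros(n)
    for i in idx:
        info += H[i].T @ R_inv @ H[i]
        rhs += H[i].T @ R_inv @ Y[i]
    return np.linalg.inv(info), np.linalg.solve(info, rhs)

P_i, X_i = batch(range(T))

# Downdate: remove Y_j's contribution using Sherman-Morrison-Woodbury.
S = R - H[j] @ P_i @ H[j].T
P_down = P_i + P_i @ H[j].T @ np.linalg.solve(S, H[j] @ P_i)
X_down = P_down @ (np.linalg.solve(P_i, X_i) - H[j].T @ R_inv @ Y[j])

# Check against re-solving the batch problem without Y_j.
_, X_ref = batch([i for i in range(T) if i != j])
print(np.linalg.norm(X_down - X_ref))    # should be numerically zero
```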