If we think of our signal as a discrete time random process, then like a normal deterministic signal, we can try filtering our random process.
Figure 1: Filtering a Discrete Time Random Process with an LTI system with transfer function $H(z)$
Filtering can either be accomplished with an LTI system or some other non-linear/non-time-invariant system just like with deterministic signals.
LTI Filtering on WSS Processes
If we use an LTI filter on a WSS process, then we can easily compute how the filter impacts the spectrum of the signal.
Theorem 13
When $Y_n$ is formed by passing a WSS process $X_n$ through a stable LTI system with impulse response $h[n]$ and transfer function $H(z)$, then
$$S_Y(z) = H(z) S_X(z) H^*(z^{-*}) \quad \text{and} \quad S_{YX}(z) = H(z) S_X(z).$$
If we have a third process $Z_n$ that is jointly WSS with $(Y_n, X_n)$, then
$$S_{ZY}(z) = S_{ZX}(z) H^*(z^{-*}).$$
This gives us an interesting interpretation of the spectral factorization (Definition 14), since it essentially amounts to passing a WSS process with auto-correlation $R_W(k) = r_e \delta[k]$ through a minimum-phase filter with transfer function $L(z)$.
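A minimal numerical sketch of the spectral relation in Theorem 13, assuming an FIR coloring filter $g$ for $X$ and an FIR system $h$ (both arbitrary illustrative choices, not from the notes):

```python
import numpy as np

# Illustrative FIR choices (assumptions)
g = np.array([1.0, 0.5, 0.25])   # coloring filter: X_n = (g * W)_n, W unit white noise
h = np.array([1.0, -0.4])        # LTI system with transfer function H(z)

# Autocorrelation of X: R_X(k) = (g * g~)(k), where g~[n] = conj(g[-n])
R_X = np.convolve(g, np.conj(g[::-1]))
# Theorem 13 in the time domain: R_Y = h * R_X * h~
R_Y = np.convolve(np.convolve(h, R_X), np.conj(h[::-1]))

# Check S_Y(e^{jw}) = |H(e^{jw})|^2 S_X(e^{jw}) on a frequency grid
w = np.linspace(0, 2 * np.pi, 512, endpoint=False)

def dtft(r, w):
    """DTFT of a finite autocorrelation sequence centered at lag 0."""
    lags = np.arange(len(r)) - (len(r) - 1) // 2
    return np.array([np.sum(r * np.exp(-1j * wk * lags)) for wk in w])

S_X = dtft(R_X, w)
S_Y = dtft(R_Y, w)
H = np.array([np.sum(h * np.exp(-1j * wk * np.arange(len(h)))) for wk in w])
print(np.allclose(S_Y, np.abs(H) ** 2 * S_X))   # True
```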
Wiener Filter
Suppose we have a stochastic WSS process $Y_n$ that is jointly WSS with $X_n$ and that we want to find the best linear estimator of $X_n$ using $Y_n$. The best linear estimator of $X_n$ given the observations $Y_n$ can be written as
$$\hat{X}_n = \sum_{m \in \mathbb{Z}} h(m) Y_{n-m} = h[n] * Y_n.$$
This is identical to passing $Y_n$ through an LTI filter. If we restrict ourselves to using $\{Y_i\}_{i=-\infty}^{n}$ to estimate $X_n$, then the best linear estimator can be written as
$$\hat{X}_n = \sum_{m=0}^{\infty} h(m) Y_{n-m} = h[n] * Y_n.$$
This is identical to passing $Y_n$ through a causal LTI filter. Since we are trying to find a best linear estimator, it would be nice if the random variables we use for estimation were uncorrelated with each other. In other words, instead of using $Y$ directly, we want to transform $Y$ into a new process $W$ where $R_W(k) = \delta[k]$. This transformation is known as whitening. From the spectral factorization $S_Y(z) = r_e L(z) L^*(z^{-*})$, passing $Y_n$ through the filter $\frac{1}{\sqrt{r_e}\,L(z)}$ produces such a whitened process.
Definition 21
The best linear estimator of $X_n$ using a process $Y_n$ with which it is jointly WSS is given by the non-causal Wiener filter
$$H(z) = \frac{S_{XY}(z)}{S_Y(z)}.$$
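To make Definition 21 concrete, here is a small sketch under an assumed signal-plus-noise model $Y_n = X_n + V_n$ with $V_n$ white and uncorrelated with $X_n$, so $S_{XY} = S_X$ and $S_Y = S_X + \sigma^2$; the AR(1)-shaped $S_X$ and the noise variance are illustrative choices, not from the notes.

```python
import numpy as np

sigma2 = 0.5                                  # noise variance (assumed)
w = np.linspace(0, 2 * np.pi, 512, endpoint=False)

# Example signal spectrum: S_X(e^{jw}) = 1 / |1 - 0.8 e^{-jw}|^2
S_X = 1.0 / np.abs(1 - 0.8 * np.exp(-1j * w)) ** 2

# Non-causal Wiener filter on the unit circle: H = S_XY / S_Y = S_X / (S_X + sigma^2)
S_Y = S_X + sigma2
H = S_X / S_Y

# The gain is near 1 where the signal dominates and near 0 where the noise dominates
print(np.round(H.max(), 3), np.round(H.min(), 3))
```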
If we interpret Definition 21 in the frequency domain, then for a specific $\omega$ we can understand $H(e^{j\omega})$ as an optimal linear estimator for $F_X(\omega)$, where $F_X(\omega)$ is the stochastic process given by the Cramer-Khinchin decomposition (Theorem 7). More specifically, we can use the Cramer-Khinchin decomposition of $Y_n$.

Intuitively, the causal Wiener filter should make sense because we use the same whitened process $W$ as in the non-causal case, but only the samples we are allowed to use; hence we take the unilateral Z-transform of the non-causal Wiener filter, which amounts to truncating the non-causal filter to make it causal.
Theorem 14
If $\hat{X}_{NC}(n)$ is the non-causal Wiener filter of $X$, then the causal Wiener filter of $X$ given $Y$ is the same as the causal Wiener filter of $\hat{X}_{NC}$ given $Y$, and if $Y$ is white noise, then
$$\hat{X}_C(n) = \sum_{i=0}^{\infty} h[i] Y_{n-i},$$
where $h$ is the impulse response of the non-causal Wiener filter.
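A numerical sketch of the white-observation case in Theorem 14, under an assumed target $X_n = Y_{n+1} + 0.5\,Y_n + 0.25\,Y_{n-1}$ built from white observations $Y$ (all coefficients illustrative): with white $Y$, the non-causal Wiener taps are $h[m] = R_{XY}(m)$, and keeping only $m \ge 0$ should match the best causal linear fit estimated directly from data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
Y = rng.standard_normal(n)                     # white observations, R_Y(k) = delta[k]

# Illustrative target (assumption): X_n = Y_{n+1} + 0.5 Y_n + 0.25 Y_{n-1}
X = np.zeros(n)
X[1:n - 1] = Y[2:] + 0.5 * Y[1:n - 1] + 0.25 * Y[:n - 2]

# Non-causal Wiener taps: h[-1] = 1, h[0] = 0.5, h[1] = 0.25.
# Truncating to m >= 0 gives the causal Wiener estimate.
Y_prev = np.concatenate(([0.0], Y[:-1]))
X_causal = 0.5 * Y + 0.25 * Y_prev

# Compare against the best causal linear fit estimated from the data
A = np.column_stack([Y, Y_prev, np.concatenate(([0.0, 0.0], Y[:-2]))])
coef, *_ = np.linalg.lstsq(A, X, rcond=None)
print(np.round(coef, 3))                       # approximately [0.5, 0.25, 0.0]
print(np.round(np.mean((X - X_causal) ** 2), 2))  # approximately 1.0 (the unseen Y_{n+1} term)
```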
Vector Case
Suppose that instead of a Wide-Sense Stationary process, we have an $N$-length signal $X$ which we want to estimate with another $N$-length signal $Y$. We can represent both $X$ and $Y$ as vectors in $\mathbb{C}^N$. If we are allowed to use all entries of $Y$ to estimate $X$, this is identical to linear estimation.
Definition 23
The non-causal Wiener filter of a finite-length $N$ signal $X$ given $Y$ is
$$K_s = R_{XY} R_Y^{-1}.$$
Note that this requires $R_Y \succ 0$.
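A brief sketch of Definition 23 with assumed covariances: $Y = X + V$ with an AR(1)-style $R_X$ and white observation noise (all values illustrative); the filter is just the matrix $K_s = R_{XY} R_Y^{-1}$ applied to the observation vector.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4

# Illustrative model (assumption): Y = X + V with known covariances
a = 0.5
R_X = np.array([[a ** abs(i - j) for j in range(N)] for i in range(N)])  # AR(1)-style
R_V = 0.3 * np.eye(N)                  # observation noise covariance
R_XY = R_X                             # V uncorrelated with X
R_Y = R_X + R_V                        # positive definite, as Definition 23 requires

# Non-causal (block) Wiener filter K_s = R_XY R_Y^{-1}
K_s = R_XY @ np.linalg.inv(R_Y)

# Apply it to a sample drawn from the assumed model
X = np.linalg.cholesky(R_X) @ rng.standard_normal(N)
Y = X + np.sqrt(0.3) * rng.standard_normal(N)
X_hat = K_s @ Y
print(np.round(X_hat, 3))
```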
Suppose that we wanted to design a causal filter for the vector case, so that each $\hat{X}_i$ depends only on $Y_0, \ldots, Y_i$.
Putting these equations together gives us the Viterbi algorithm.
Kalman Filtering
In the Kalman Filter setup, we assume that the signal we would like to filter can be represented by a state-space model. We want to predict the state vectors $X_i$ using some linear combination of the observations $Y_i$.
Kalman Prediction Filter
Suppose that we want to compute the one-step prediction. In other words, given the observations up to time $i$, we want to compute $\hat{X}_{i+1|i}$. The observations are our only source of information about the state, so it would be nice if we could de-correlate all of the $Y_i$. To do this, we can define the innovation process
$$e_i = Y_i - \hat{Y}_{i|i-1} = Y_i - H_i \hat{X}_{i|i-1}.$$
The last equality follows from the state-space model and the fact that past observation noises are uncorrelated with the current one. Now, to compute the one-step prediction, we just need to project $X_{i+1}$ onto the innovations.
The second-to-last equality follows from the Wide-Sense Markovity of state-space models, and the last equality is due to the state evolution noises being uncorrelated. If we let $K_{p,i} = \langle X_{i+1}, e_i \rangle R_{e,i}^{-1}$ (called the prediction gain), then we have a recursive estimate of the optimal one-step predictor:
$$\hat{X}_{i+1|i} = F_i \hat{X}_{i|i-1} + K_{p,i} e_i.$$
Now, we just need a recursive formulation for the error covariance so that we can compute $K_{p,i}$ and $R_{e,i}$. Putting this into a concrete algorithm, we get the Kalman Prediction Filter.
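A minimal sketch of the resulting prediction filter, assuming the standard state-space model $X_{i+1} = F X_i + G U_i$, $Y_i = H X_i + V_i$ with white, mutually uncorrelated noises of covariances $Q$ and $R$ (the matrices $G$, $Q$, $R$ and time-invariance are assumptions for brevity; the covariance recursion is the standard Riccati update):

```python
import numpy as np

def kalman_predictor(F, G, H, Q, R, Y, x0, P0):
    """One-step Kalman prediction: x_pred[i] estimates X_i from Y_0..Y_{i-1}.

    Assumed model (notation not necessarily the notes'):
    X_{i+1} = F X_i + G U_i,  Y_i = H X_i + V_i, with U_i ~ (0, Q), V_i ~ (0, R)
    white and uncorrelated with each other.
    """
    n = len(Y)
    x_pred = np.zeros((n + 1, F.shape[0]))
    x_pred[0] = x0
    P = P0.copy()                              # prediction error covariance P_{i|i-1}
    for i in range(n):
        e = Y[i] - H @ x_pred[i]               # innovation e_i = Y_i - H X_hat_{i|i-1}
        Re = H @ P @ H.T + R                   # innovation covariance R_{e,i}
        Kp = F @ P @ H.T @ np.linalg.inv(Re)   # prediction gain K_{p,i}
        x_pred[i + 1] = F @ x_pred[i] + Kp @ e # X_hat_{i+1|i} = F X_hat_{i|i-1} + K_p e_i
        P = F @ P @ F.T + G @ Q @ G.T - Kp @ Re @ Kp.T   # Riccati update for P_{i+1|i}
    return x_pred

# Tiny usage example: a scalar random walk observed in noise (all values assumed)
F = np.array([[1.0]]); G = np.array([[1.0]]); H = np.array([[1.0]])
Q = np.array([[0.1]]); R = np.array([[1.0]])
Y = np.random.default_rng(3).standard_normal((50, 1))
x_hat = kalman_predictor(F, G, H, Q, R, Y, x0=np.zeros(1), P0=np.eye(1))
```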
Schmidt’s Modification of the Kalman Filter
The predictive Kalman filter goes directly from $\hat{X}_{i|i-1}$ to $\hat{X}_{i+1|i}$ without ever determining $\hat{X}_{i|i}$. The Schmidt modification of the Kalman filter separates the predictive Kalman filter into two steps (sketched in code after the list below), allowing us to estimate the current state.
1. Measurement Update: Find $\hat{X}_{i|i}$ given the latest observation $Y_i$ and $\hat{X}_{i|i-1}$.
2. State Evolution (Time) Update: Find $\hat{X}_{i+1|i}$ using what we know about the state evolution.
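A minimal sketch of one iteration of these two updates, under the same assumed model and notation as the prediction-filter sketch above:

```python
import numpy as np

def schmidt_kalman_step(F, G, H, Q, R, y_i, x_pred, P_pred):
    """One Schmidt-style iteration: measurement update, then time update.

    x_pred, P_pred are X_hat_{i|i-1} and its error covariance; the model and
    notation are the same assumptions as in the predictor sketch above.
    Returns the filtered estimate X_hat_{i|i} and the next prediction X_hat_{i+1|i}.
    """
    # Measurement update: incorporate the latest observation Y_i
    e = y_i - H @ x_pred                       # innovation
    Re = H @ P_pred @ H.T + R                  # innovation covariance
    Kf = P_pred @ H.T @ np.linalg.inv(Re)      # filtered gain
    x_filt = x_pred + Kf @ e                   # X_hat_{i|i}
    P_filt = P_pred - Kf @ Re @ Kf.T

    # Time update: push the filtered estimate through the state evolution
    x_next = F @ x_filt                        # X_hat_{i+1|i}
    P_next = F @ P_filt @ F.T + G @ Q @ G.T
    return x_filt, P_filt, x_next, P_next
```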
This mimics the approach of the forward algorithm for Hidden Markov Models, which separates updates to the distribution into a time update and a measurement update. Using our innovation process, we can derive both of these updates.
The Kalman Prediction Filter and Schmidt’s modification of the Kalman filter are both causal filters. The Kalman Smoother provides a non-causal estimate in the same way that the Backward Recursion algorithm does for Hidden Markov Processes. In other words, the Kalman Smoother computes $\hat{X}_{i|n}$, the best linear estimator of $X_i$ from $\{Y_0, \cdots, Y_n\}$. As before, we can start with the innovation process