# Filtering

If we think of our signal as a discrete-time random process, then, just as with a deterministic signal, we can try filtering our random process.

![Figure 1: Filtering a Discrete Time Random Process with an LTI system with transfer function $H(z)$](https://3896527066-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MQYqMkpf2WPiP6MKWRe%2Fuploads%2Fgit-blob-f4a2fa0bfb220607637bf95230698b862c01d6b6%2F4517d99607bd6f574c5e4e874cab287c0797e1ab.png?alt=media)

Filtering can be accomplished either with an LTI system or with some other non-linear or non-time-invariant system, just as with deterministic signals.

## LTI Filtering on WSS Processes

If we use an LTI filter on a WSS process, then we can easily compute how the filter impacts the spectrum of the signal.

{% hint style="info" %}

#### Theorem 13

When $$Y\_n$$ is formed by passing a WSS process $$X\_n$$ through a stable LTI system with impulse response $$h\[n]$$ and transfer function $$H(z)$$, then $$S\_Y(z) = H(z)S\_X(z)H^*(z^{-*})$$ and $$S\_{YX}(z) = H(z)S\_X(z)$$. If we have a third process $$Z\_n$$ that is jointly WSS with $$(Y\_n, X\_n)$$, then $$S\_{ZY}(z) = S\_{ZX}(z)H^*(z^{-*})$$.
{% endhint %}

This gives us an interesting interpretation of the spectral factorization (Definition 14) since it is essentially passing a WSS process with auto-correlation $$R\_W(k) = r\_e\delta\[k]$$ through a minimum-phase filter with transfer function $$L(z)$$.
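
As a quick numerical check of Theorem 13 on the unit circle (where $$S\_Y(e^{j\omega}) = |H(e^{j\omega})|^2S\_X(e^{j\omega})$$ when $$X\_n$$ is white), the sketch below passes white noise through an arbitrary stable filter and compares the estimated output spectrum with the theoretical one. The specific filter coefficients and the use of `numpy`/`scipy` are illustrative assumptions, not part of the notes.

```python
# Minimal check: pass white noise X_n through an LTI filter H(z) and compare
# the estimated PSD of the output with |H(e^{jw})|^2 * S_X(e^{jw}).
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
sigma2 = 2.0                               # variance of the white input
x = rng.normal(scale=np.sqrt(sigma2), size=200_000)

b, a = [1.0, 0.5], [1.0, -0.8]             # H(z) = (1 + 0.5 z^-1)/(1 - 0.8 z^-1)
y = signal.lfilter(b, a, x)

# Estimated (two-sided) PSD of the output
f, S_y_est = signal.welch(y, fs=1.0, nperseg=4096, return_onesided=False)

# Theoretical PSD: S_Y = |H|^2 * sigma2 since S_X(e^{jw}) = sigma2
_, H = signal.freqz(b, a, worN=2 * np.pi * f)
S_y_theory = np.abs(H) ** 2 * sigma2

# Average relative error is small (limited only by the PSD estimation error)
print(np.mean(np.abs(S_y_est / S_y_theory - 1)))
```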

## Wiener Filter

Suppose we have a stochastic WSS process $$Y\_n$$ that is jointly WSS with $$X\_n$$ and that we want to find the best linear estimator of $$X\_n$$ using $$Y\_n$$. The best linear estimator of $$X\_n$$ given the observations $$Y\_n$$ can be written as

$$\hat{X}\_n = \sum\_{m\in\mathbb{Z}}h\[m]Y\_{n-m} = h\[n] \* Y\_n.$$

This is identical to passing $$Y\_n$$ through an LTI filter. If we restrict ourselves to using $$\{Y\_i\}\_{i=-\infty}^{n}$$ to estimate $$X\_n$$, then the best linear estimator can be written as

$$\hat{X}\_n = \sum\_{m=0}^\infty h\[m]Y\_{n-m} = h\[n] \* Y\_n.$$

It is identical to passing $$Y\_n$$ through a causal LTI filter. Since we are trying to find a best linear estimator, it would be nice if each of the random variables we are using for estimating were uncorrelated with each other. In other words, instead of using $$Y$$ directly, we want to transform $$Y$$ into a new process $$W$$ where $$R\_W(k) = \delta\[k]$$. This transformation is known as whitening. From the spectral factorization of $$Y$$, we know that if we use the filter $$G(z) =\frac{1}{S\_Y^+(z)}$$ then

$$S\_W(z) = \frac{S\_Y(z)}{S\_Y^+(z)S\_Y^{+*}(z^{-*})} = \frac{S\_Y(z)}{S\_Y^+(z)S\_Y^-(z)} = 1.$$
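
As a small sanity check of the whitening step, the sketch below assumes a toy process $$Y\_n$$ generated by a known minimum-phase factor $$S\_Y^+(z) = \frac{1}{1-0.8z^{-1}}$$, applies $$G(z) = 1/S\_Y^+(z)$$, and verifies that the output auto-correlation is approximately $$\delta\[k]$$. The particular process and the libraries used are assumptions for illustration.

```python
# Whiten a toy process: Y_n is unit white noise passed through S_Y^+(z),
# so the whitening filter is G(z) = 1/S_Y^+(z) = 1 - 0.8 z^-1.
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
n = 100_000
w_in = rng.normal(size=n)                          # unit-variance white noise
y = signal.lfilter([1.0], [1.0, -0.8], w_in)       # Y with S_Y(z) = S_Y^+(z) S_Y^-(z)

w = signal.lfilter([1.0, -0.8], [1.0], y)          # whitened process W_n

# Empirical autocorrelation of W at lags 0..5 should be close to delta[k]
lags = np.arange(6)
R_w = np.array([np.mean(w[k:] * w[:n - k]) for k in lags])
print(np.round(R_w, 3))                            # ~ [1, 0, 0, 0, 0, 0]
```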

Now we want to find the best linear estimator of $$X$$ using our new process $$W$$ by designing an LTI filter $$Q(z)$$.

![Figure 2: Finding the best linear estimator of $X$ using $W$ with a two-stage filter that first whitens the input.](https://3896527066-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MQYqMkpf2WPiP6MKWRe%2Fuploads%2Fgit-blob-771e0ab5524a08faf05998f4579ffbff9a2c37a8%2F1576d92fd9c4f44445aa6982cfd12933f45f528a.png?alt=media)

### Non-Causal Case

Starting with the non-causal case, we can apply the orthogonality principle,

$$\begin{aligned} \mathbb{E}\left\[(X\_n-\hat{X}\_n)W\_{n-k}^*\right] = 0 &\implies \mathbb{E}\left\[X\_nW\_{n-k}^*\right] = \sum\_{m\in\mathbb{Z}}q(m)\mathbb{E}\left\[W\_{n-m}W\_{n-k}^*\right] \\ \therefore R\_{XW}(k) = \sum\_{m\in\mathbb{Z}}q(m)R\_W(k-m) &\implies S\_{XW}(z) = Q(z)S\_W(z)\\ \therefore Q(z) = \frac{S\_{XW}(z)}{S\_W(z)} = S\_{XW}(z) &= S\_{XY}(z)(S\_Y^+(z^{-*}))^{-*} = \frac{S\_{XY}(z)}{S\_Y^-(z)}\end{aligned}$$

When we cascade these filters,

$$H(z) = Q(z)G(z) = \frac{S\_{XY}(z)}{S\_Y^-(z)} \frac{1}{S\_Y^+(z)}= \frac{S\_{XY}(z)}{S\_Y(z)}.$$

{% hint style="info" %}

#### Definition 21

The best linear estimator of $$X\_n$$ using $$Y\_n$$ where $$(X\_n, Y\_n)$$ is jointly WSS is given by the non-causal Wiener filter.

$$H(z) = \frac{S\_{XY}(z)}{S\_Y(z)}$$
{% endhint %}

If we interpret Definition 21 in the frequency domain, for a specific $$\omega$$, we can understand $$H(e^{j\omega})$$ as an optimal linear estimator for $$F\_X(\omega)$$ where $$F\_X(\omega)$$ is the stochastic process given by the Cramer-Khinchin decomposition (Theorem 7). More specifically, we can use the Cramer-Khinchin decomposition of $$Y\_n$$.

$$\begin{aligned} \hat{X}\_n &= \sum\_{i\in\mathbb{Z}}h\[i]\int\_{-\pi}^\pi e^{j\omega(n-i)}dF\_Y(\omega)\\ &= \int\_{-\pi}^{\pi}\left(\sum\_{i\in\mathbb{Z}}h\[i]e^{-j\omega i}\right)e^{j\omega n}dF\_Y(\omega) \\ &= \int\_{-\pi}^\pi H(e^{j\omega})e^{j\omega n}dF\_Y(\omega)\end{aligned}$$

Since $$F\_X$$ and $$F\_Y$$ have jointly orthogonal increments, this tells us that $$H(e^{j\omega})$$ is just the optimal linear estimator of $$dF\_X(\omega)$$ using $$dF\_Y(\omega)$$. $$dF\_X(\omega)$$ and $$dF\_Y(\omega)$$ exist on a Hilbert space, meaning we are essentially projecting each frequency component of $$X\_n$$ onto the corresponding frequency component of $$Y\_n$$.
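
To make Definition 21 concrete, here is a numerical sketch for one assumed setup: $$X\_n$$ is an AR(1) process and $$Y\_n = X\_n + V\_n$$ with independent white observation noise, so $$S\_{XY} = S\_X$$ and $$H(e^{j\omega}) = S\_X/(S\_X+\sigma\_v^2)$$. The filter is applied per frequency with an FFT, which is only a circular approximation of the true (infinite-length) convolution.

```python
# Non-causal Wiener filtering of an AR(1) signal observed in white noise.
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
n, sigma_v2 = 2**16, 1.0
x = signal.lfilter([1.0], [1.0, -0.9], rng.normal(size=n))   # hidden process X
y = x + rng.normal(scale=np.sqrt(sigma_v2), size=n)          # observations Y

w = 2 * np.pi * np.fft.fftfreq(n)                  # DFT frequencies (rad/sample)
S_x = 1.0 / np.abs(1 - 0.9 * np.exp(-1j * w))**2   # PSD of the AR(1) process
H = S_x / (S_x + sigma_v2)                         # non-causal Wiener filter

x_hat = np.real(np.fft.ifft(H * np.fft.fft(y)))    # apply filter (circular approx.)

print(np.mean((x - y)**2), np.mean((x - x_hat)**2))  # filtering reduces the MSE
```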

### Causal Case

First, note that in the causal case, whitening doesn’t break causality because $$\frac{1}{S\_Y^+(z)}$$ is causal. When we apply the orthogonality principle,

$$\begin{aligned} \mathbb{E}\left\[(X\_n-\hat{X}\_n)W\_{n-k}^*\right] = 0 &\implies \mathbb{E}\left\[X\_nW\_{n-k}^*\right] = \sum\_{m=0}^\infty q(m)\mathbb{E}\left\[W\_{n-m}W\_{n-k}^*\right] \\ \therefore R\_{XW}(k) &= \sum\_{m = 0}^\infty q\[m]R\_W(k-m) \qquad k \geq 0\end{aligned}$$

We can’t take the Z-transform of both sides because the equation is not necessarily true for $$k < 0$$. Instead, we can look at the function

$$f(k) = R\_{XW}(k) - \sum\_{m=0}^\infty R\_W(k-m)q\[m] = \begin{cases} 0 & k\geq 0,\\ ? & \text{ else.}\end{cases}$$

Taking the unilateral Z-transform of both sides,

$$\begin{aligned} \left\[F(z)\right]\_+ &= \left\[S\_{XW}(z) - S\_W(z)Q(z)\right]\_+ = \left\[S\_{XW}(z)\right]\_+ - Q(z) = 0\\ Q(z) &= \left\[S\_{XW}(z)\right]\_+ = \left\[\frac{S\_{XY}(z)}{S\_Y^-(z)}\right]\_+ \end{aligned}$$

Thus the filter $$H$$ which gives the causal best linear estimator of $$X$$ using $$Y$$ is

$$H(z) = Q(z) G(z)= \left\[\frac{S\_{XY}(z)}{S\_Y^-(z)}\right]\_+ \frac{1}{S\_Y^+(z)}.$$

{% hint style="info" %}

#### Definition 22

The best linear estimator of $$X\_n$$ using $$\{Y\_i\}\_{i=-\infty}^{n}$$ is given by the causal Wiener filter.

$$H(z) = Q(z)G(z) = \left\[\frac{S\_{XY}(z)}{S\_Y^-(z)}\right]\_+ \frac{1}{S\_Y^+(z)}.$$
{% endhint %}

Intuitively, this should make sense because we are using the same $$W$$ process as in the non-causal case, but only the samples we are allowed to use; hence we take the unilateral Z-transform of the non-causal Wiener filter, which amounts to truncating the non-causal filter to make it causal.

{% hint style="info" %}

#### Theorem 14

If $$\hat{X}\_{NC}(n)$$ is the non-causal Wiener filter of $$X$$, then the causal Wiener filter of $$X$$ given $$Y$$ is the same as the causal Wiener filter of $$\hat{X}\_{NC}$$ given $$Y$$, and if $$Y$$ is white noise, then

$$\hat{X}\_C(n) = \sum\_{i=0}^{\infty}h\[i]Y\_{n-i}$$
{% endhint %}
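
Here is a toy illustration of Theorem 14 under an assumed construction: the observations are unit white noise and $$X\_n$$ is a symmetric two-sided moving average of them, so the non-causal Wiener filter has impulse response $$\{0.5, 1, 0.5\}$$ at lags $$\{-1, 0, 1\}$$ and the causal Wiener filter is literally its truncation to non-negative lags.

```python
# Toy check: Y_n white, X_n = 0.5 Y_{n+1} + Y_n + 0.5 Y_{n-1}. The causal
# Wiener estimate drops the anticausal tap and pays its power in MSE.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
y = rng.normal(size=n)                                 # white observations
x = 0.5 * np.roll(y, -1) + y + 0.5 * np.roll(y, 1)     # signal to estimate

x_nc = 0.5 * np.roll(y, -1) + y + 0.5 * np.roll(y, 1)  # non-causal estimate (perfect here)
x_c = y + 0.5 * np.roll(y, 1)                          # causal estimate: truncated filter

print(np.mean((x - x_nc) ** 2))   # ~ 0
print(np.mean((x - x_c) ** 2))    # ~ 0.25, the power of the unpredictable term
```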

### Vector Case

Suppose that instead of a Wide-Sense Stationary process, we have a length $$N$$ signal $$\boldsymbol{X}$$ which we want to estimate with another length $$N$$ signal $$\boldsymbol{Y}$$. We can represent both $$\boldsymbol{X}$$ and $$\boldsymbol{Y}$$ as vectors in $$\mathbb{C}^N$$. If we are allowed to use all entries of $$\boldsymbol{Y}$$ to estimate $$\boldsymbol{X}$$, this is identical to linear estimation.

{% hint style="info" %}

#### Definition 23

The non-causal Wiener filter of a finite length $$N$$ signal $$\boldsymbol{Y}$$ is given by

$$K\_s = R\_{\boldsymbol{X}\boldsymbol{Y}}R\_{\boldsymbol{Y}}^{-1}.$$
{% endhint %}

Note that this requires $$R\_{\boldsymbol{Y}} \succ 0$$. Suppose that we wanted to design a causal filter for the vector case, so $$\hat{X}\_i$$ only depends on $$\{Y\_j\}\_{j=1}^i$$. By the orthogonality principle,

$$\forall 1 \leq l \leq i,\ \mathbb{E}\left\[\left(X\_i - \sum\_{j=1}^iK\_{f, ij}Y\_j\right)Y\_l^\*\right] = 0 \implies R\_{\boldsymbol{XY}}(i, l) = \sum\_{j=1}^i K\_{f, ij}R\_{\boldsymbol{Y}}(j, l)$$

In matrix form, this means

$$R\_{\boldsymbol{XY}} - K\_fR\_{\boldsymbol{Y}} = U^+$$

where $$U^+$$ is strictly upper triangular.

{% hint style="info" %}

#### Theorem 15

If matrix $$H \succ 0$$, then there exists a unique factorization $$H=LDL^\*$$ where $$L$$ is lower triangular and invertible with unit diagonal entries and $$D$$ is diagonal with positive entries.
{% endhint %}

Applying the LDL decomposition, we see that

$$\begin{aligned} R\_{\boldsymbol{XY}} - K\_fLDL^\* = U^+ &\implies R\_{\boldsymbol{XY}}L^{-*}D^{-1} - K\_f L = U^+L^{-*}D^{-1}\\ &\therefore \[R\_{\boldsymbol{XY}}L^{-\*}D^{-1}]\_L - K\_f L = 0\end{aligned}$$

where $$\[\cdot]\_L$$ represents the lower triangular part of a matrix.

{% hint style="info" %}

#### Definition 24

The causal Wiener filter of a finite length $$N$$ signal $$\boldsymbol{Y}$$ is given by

$$K\_f = \[R\_{\boldsymbol{XY}}L^{-\*}D^{-1}]\_LL^{-1}$$
{% endhint %}
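
A minimal numerical sketch of Definitions 23 and 24, assuming arbitrary example covariances: the LDL factorization of $$R\_{\boldsymbol{Y}}$$ is built from its Cholesky factor, and the causal gain keeps only the lower-triangular part, so that $$\hat{X}\_i$$ depends only on $$Y\_1,\dots,Y\_i$$.

```python
# Compute the non-causal gain K_s and the causal gain K_f for assumed R_Y, R_XY.
import numpy as np

rng = np.random.default_rng(4)
N = 5
A = rng.normal(size=(2 * N, N))
R_y = A.T @ A + np.eye(N)            # some R_Y > 0
R_xy = rng.normal(size=(N, N))       # some cross-covariance R_XY

# Non-causal Wiener filter (Definition 23)
K_s = R_xy @ np.linalg.inv(R_y)

# LDL* factorization from the Cholesky factor C = L D^{1/2}
C = np.linalg.cholesky(R_y)
d = np.diag(C) ** 2
L = C / np.diag(C)                   # unit-diagonal lower triangular
D_inv = np.diag(1.0 / d)

# Causal Wiener filter (Definition 24): keep only the lower-triangular part
M = R_xy @ np.linalg.inv(L).conj().T @ D_inv
K_f = np.tril(M) @ np.linalg.inv(L)

# K_f is lower triangular, so hat{X}_i uses only Y_1, ..., Y_i
print(np.allclose(K_f, np.tril(K_f)))
```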

## Hidden Markov Model State Estimation

Suppose we have a Hidden Markov Process $$\{Y\_n\}\_{n\geq1}$$. We can think of determining the state $$\{X\_n\}\_{n\geq1}$$ as filtering $$\{Y\_n\}\_{n\geq1}$$.

### Causal Distribution Estimation

Suppose we want to know the distribution of $$X\_t$$ after we have observed $$Y^t$$.

$$\begin{aligned} p(x\_t|y^t) &= \frac{p(x\_t,y^t)}{p(y^t)} = \frac{p(x\_t)p(y\_t, y^{t-1}|x\_t)}{\sum\_{x}p(y\_t,y^{t-1}|x\_t=x)p(x\_t=x)}\\ &= \frac{p(x\_t)p(y\_t|x\_t)p(y^{t-1}|x\_t)}{\sum\_xp(y\_t|x\_t=x)p(y^{t-1}|x\_t=x)p(x\_t=x)} = \frac{p(y\_t|x\_t)p(y^{t-1})p(x\_t|y^{t-1})}{\sum\_{x}p(y\_t|x\_t=x)p(y^{t-1})p(x\_t=x|y^{t-1})}\\ &=\frac{p(y\_t|x\_t)p(x\_t|y^{t-1})}{\sum\_{x}p(y\_t|x\_t=x)p(x\_t=x|y^{t-1})}\end{aligned}$$

Now if we know $$p(x\_t|y^{t-1})$$, then we are set.

$$\begin{aligned} p(x\_t|y^{t-1}) &= \sum\_xp(x\_t,x\_{t-1}=x|y^{t-1}) = \sum\_x p(x\_{t-1}=x|y^{t-1})p(x\_t|x\_{t-1}=x,y^{t-1}) \\ &= \sum\_x p(x\_{t-1}=x|y^{t-1})p(x\_t|x\_{t-1}=x)\end{aligned}$$

Now we have a recursive algorithm for computing the distribution of $$x\_t$$: writing $$\alpha\_t(x\_t) = p(x\_t|y^t)$$ for the filtered distribution and $$\beta\_t(x\_t) = p(x\_t|y^{t-1})$$ for the one-step prediction, we alternate the two updates above.

![](https://3896527066-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MQYqMkpf2WPiP6MKWRe%2Fuploads%2Fgit-blob-eb685e58f95ce6ef026f629515ddd919b5053763%2F7574f1909060630e512aecd393c7ac61dd2dc5c1.png?alt=media)
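
A minimal sketch of this forward recursion, using $$\alpha\_t$$ and $$\beta\_t$$ as above; the two-state transition matrix, emission matrix, initial distribution, and observation sequence are made-up toy values.

```python
# Forward algorithm: alpha_t(x) = p(x_t | y^t), beta_t(x) = p(x_t | y^{t-1}).
import numpy as np

P = np.array([[0.9, 0.1],       # P[i, j] = p(x_t = j | x_{t-1} = i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],       # B[i, k] = p(y_t = k | x_t = i)
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 1, 0, 1]

def forward(obs, P, B, pi):
    alphas = []
    beta = pi                                 # p(x_1) before seeing y_1
    for y in obs:
        unnorm = B[:, y] * beta               # p(y_t | x_t) p(x_t | y^{t-1})
        alpha = unnorm / unnorm.sum()         # normalize over x_t
        alphas.append(alpha)
        beta = alpha @ P                      # prediction: p(x_{t+1} | y^t)
    return np.array(alphas)

print(forward(obs, P, B, pi))                 # row t is the filtered distribution of x_t
```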

### Non-Causal Distribution Estimation

Suppose we are allowed to non-causally filter our signal and we care about the distribution of $$X\_t$$ after we have observed $$Y^n$$. In other words, for $$t \leq n$$, we want to find $$\gamma\_t(x\_t) = p(x\_t|y^n)$$. When $$t=n$$, $$\gamma\_n(x\_n) = \alpha\_n(x\_n)$$. If we continue expanding backwards, then

$$\begin{aligned} p(x\_t|y^n) &= \sum\_x p(x\_t,x\_{t+1}=x|y^n) = \sum\_x p(x\_{t+1}=x|y^n)p(x\_t|x\_{t+1}=x,y^t,y\_{t+1}^n)\\ &= \sum\_x p(x\_{t+1}=x|y^n)p(x\_t|x\_{t+1}=x,y^t) = \sum\_x p(x\_{t+1}=x|y^n)\frac{p(x\_t|y^t)p(x\_{t+1}=x|x\_t,y^t)}{p(x\_{t+1}=x|y^t)}\\ &= \sum\_x \gamma\_{t+1}(x)\frac{\alpha\_t(x\_t)p(x\_{t+1}=x|x\_t)}{\beta\_{t+1}(x)}\end{aligned}$$

This gives us a clear algorithm for non-causally computing the distribution of $$x\_t$$.

![](https://3896527066-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MQYqMkpf2WPiP6MKWRe%2Fuploads%2Fgit-blob-90606ff1431c4ab80435aac9c50c723baa2cfbe4%2F67f0bfc8995a36be138cd970660ac00ac9fbce40.png?alt=media)
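
A sketch of this backward recursion on the same assumed toy HMM, reusing the forward quantities $$\alpha\_t$$ and $$\beta\_t$$ to compute $$\gamma\_t$$.

```python
# Backward pass for gamma_t(x) = p(x_t | y^n) on a toy two-state HMM.
import numpy as np

P = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 1, 0, 1]

# Forward pass: alphas[t] = p(x_t | y^t), betas[t] = p(x_t | y^{t-1})
alphas, betas, pred = [], [], pi
for y in obs:
    betas.append(pred)
    a = B[:, y] * pred
    a /= a.sum()
    alphas.append(a)
    pred = a @ P

# Backward pass: gamma_n = alpha_n, then recurse toward t = 1
gammas = [None] * len(obs)
gammas[-1] = alphas[-1]
for t in range(len(obs) - 2, -1, -1):
    # alpha_t(x_t) * sum_x gamma_{t+1}(x) p(x_{t+1}=x | x_t) / beta_{t+1}(x)
    gammas[t] = alphas[t] * (P @ (gammas[t + 1] / betas[t + 1]))

print(np.array(gammas))    # row t is p(x_t | y^n)
```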

### State Sequence Estimation

Suppose we want to find the most likely sequence of states given our observations. This means we should compute

$$\hat{X}^n = \text{argmax}\_{x^n}p(x^n|y^n)$$

Since $$p(x^n|y^n) \propto p(x^n, y^n)$$, we can equivalently maximize the joint distribution, which factors recursively.

$$\begin{aligned} p(x^t, y^t) &= p(x^{t-1}, y^{t-1})p(x\_t, y\_t|x^{t-1},y^{t-1})\\ &= p(x^{t-1}, y^{t-1})p(x\_t|x^{t-1},y^{t-1})p(y\_t|x\_t,x^{t-1},y^{t-1}) \\ &= p(x^{t-1},y^{t-1})p(x\_t|x\_{t-1})p(y\_t|x\_t)\end{aligned}$$

We see that there is a recursion in the joint distribution, so if we let $$V\_t(x\_t) = \max\_{x^{t-1}}p(x^t,y^t)$$, then

$$\begin{aligned} V\_t(x\_t) &= \max\_{x^{t-1}} p(x^t, y^t) = p(y\_t|x\_t)\max\_{x^{t-1}}p(x^{t-1},y^{t-1})p(x\_t|x\_{t-1})\\ &= p(y\_t|x\_t)\max\_{x\_{t-1}}\left\[p(x\_t|x\_{t-1}) \max\_{x^{t-2}} p(x^{t-1},y^{t-1})\right]\\ &= p(y\_t|x\_t)\max\_{x\_{t-1}}p(x\_t|x\_{t-1}) V\_{t-1}(x\_{t-1})\end{aligned}$$

The base case is $$V\_1(x\_1) = p(x\_1)p(y\_1|x\_1)$$. $$V\_t$$ is useful because $$\hat{x}\_n = \text{argmax}\_{x\_n}V\_n(x\_n)$$. This is because $$V\_n(x\_n)$$ already maximizes over $$x^{n-1}$$, so the only variable left to maximize is $$x\_n$$. Once we have $$\hat{x}\_t$$, we can compute $$\hat{x}\_{t-1}$$ by

$$\hat{x}\_{t-1} = \text{argmax}\_{x\_{t-1}}p(\hat{x}\_t|x\_{t-1})V\_{t-1}(x\_{t-1}).$$

Putting these equations together gives us the Viterbi algorithm.

![](https://3896527066-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MQYqMkpf2WPiP6MKWRe%2Fuploads%2Fgit-blob-fd1698ee580aec17cbe11f4406fc842a6fedc704%2F94adeeecc377d7b851af46f837e92eaa2a664c37.png?alt=media)
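
A compact log-domain sketch of the Viterbi recursion on the same assumed toy HMM; the backpointers store the maximizing previous state so that the most likely sequence can be recovered by backtracking.

```python
# Viterbi algorithm in log-space for a toy two-state HMM.
import numpy as np

P = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
obs = [0, 1, 1, 0, 1]

logP, logB = np.log(P), np.log(B)
n, S = len(obs), len(pi)

V = np.zeros((n, S))                 # V[t, x] = log max_{x^{t-1}} p(x^t, y^t)
back = np.zeros((n, S), dtype=int)   # argmax pointers
V[0] = np.log(pi) + logB[:, obs[0]]  # base case V_1(x_1) = p(x_1) p(y_1 | x_1)
for t in range(1, n):
    scores = V[t - 1][:, None] + logP          # scores[i, j] = V_{t-1}(i) + log p(x_t=j | i)
    back[t] = np.argmax(scores, axis=0)
    V[t] = logB[:, obs[t]] + np.max(scores, axis=0)

# Backtrack: x_hat_n = argmax V_n, then follow the stored argmax pointers
x_hat = [int(np.argmax(V[-1]))]
for t in range(n - 1, 0, -1):
    x_hat.append(int(back[t][x_hat[-1]]))
x_hat.reverse()
print(x_hat)                          # most likely state sequence
```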

## Kalman Filtering

In the Kalman Filter setup, we assume that the signal we would like to filter can be represented by a state-space model $$\boldsymbol{X}\_{i+1} = F\_i\boldsymbol{X}\_i + G\_i\boldsymbol{U}\_i$$, $$\boldsymbol{Y}\_i = H\_i\boldsymbol{X}\_i + \boldsymbol{V}\_i$$. We want to estimate the state vectors $$\boldsymbol{X}\_i$$ using some linear combination of the observations $$\boldsymbol{Y}\_i$$.

### Kalman Prediction Filter

Suppose that we want to compute the one-step prediction. In other words, given $$\boldsymbol{Y}^i$$, we want to compute $$\boldsymbol{\hat{X}}\_{i+1|i}$$. Our observations $$\boldsymbol{Y}$$ are the only thing which give us information about the state, so it would be nice if we could de-correlate all of the $$\boldsymbol{Y}$$. To do this, we can define the innovation process

$$\boldsymbol{e}\_i = \boldsymbol{Y}\_i - \boldsymbol{\hat{Y}}\_{i|i-1} = \boldsymbol{Y}\_i - H\_i\boldsymbol{\hat{X}}\_{i|i-1}$$

The last equality follows from the state-space model and the fact that past observation noises are uncorrelated with the current one. Now, to compute the one-step prediction, we just need to project $$\boldsymbol{X}\_{i+1}$$ onto the innovations.

$$\begin{aligned} \boldsymbol{\hat{X}}\_{i+1|i} &= \sum\_{j=0}^i\langle \boldsymbol{X}\_{i+1}, \boldsymbol{e}\_j \rangle R\_{\boldsymbol{e},j}^{-1}\boldsymbol{e}\_j \\ &= \langle \boldsymbol{X}\_{i+1}, \boldsymbol{e}\_{i} \rangle R\_{\boldsymbol{e},i}^{-1}\boldsymbol{e}\_i + \sum\_{j=0}^{i-1}\langle \boldsymbol{X}\_{i+1}, \boldsymbol{e}\_j \rangle R\_{\boldsymbol{e},j}^{-1}\boldsymbol{e}\_j\\ &= \langle \boldsymbol{X}\_{i+1}, \boldsymbol{e}\_{i} \rangle R\_{\boldsymbol{e},i}^{-1}\boldsymbol{e}\_i + \boldsymbol{\hat{X}}\_{i+1|i-1}\\ &= \langle \boldsymbol{X}\_{i+1}, \boldsymbol{e}\_{i} \rangle R\_{\boldsymbol{e},i}^{-1}\boldsymbol{e}\_i + F\_i\boldsymbol{\hat{X}}\_{i|i-1}\end{aligned}$$

The last equality follows from the Wide-Sense Markovity of state space models and the fact that the state evolution noise is uncorrelated with the past observations. If we let $$K\_{p,i} = \langle \boldsymbol{X}\_{i+1}, \boldsymbol{e}\_i \rangle R\_{\boldsymbol{e},i}^{-1}$$ (called the prediction gain), then we have a recursive estimate of the optimal one-step predictor.

$$\boldsymbol{\hat{X}}\_{i+1|i} = F\_i\boldsymbol{\hat{X}}\_{i|i-1} + K\_{p,i}\boldsymbol{e}\_i.$$

Now, we just need to find a recursive formulation for $$K\_{p,i}$$ and $$R\_{\boldsymbol{e},i}$$. Starting with $$R\_{\boldsymbol{e},i}$$, notice that we can write $$\boldsymbol{e}\_i = \boldsymbol{Y}\_i-H\_i\boldsymbol{\hat{X}}\_{i|i-1} = H\_i(\boldsymbol{X}\_i-\boldsymbol{\hat{X}}\_{i|i-1})+\boldsymbol{V}\_i$$.

$$\begin{aligned} R\_{\boldsymbol{e},i} &= \langle H\_i(\boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1})+ \boldsymbol{V}\_i, H\_i(\boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1})+\boldsymbol{V}\_i \rangle \\ &= H\_i\langle \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1}, \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1} \rangle H\_i^\* + R\_i\end{aligned}$$

To find $$K\_{p,i}$$, we should first find $$\langle \boldsymbol{X}\_{i+1}, \boldsymbol{e}\_i \rangle$$.

$$\begin{aligned} \langle \boldsymbol{X}\_{i+1}, \boldsymbol{e}\_i \rangle &= F\_i\langle \boldsymbol{X}\_i, \boldsymbol{e}\_i \rangle + G\_i\langle \boldsymbol{U}\_i, \boldsymbol{e}\_i \rangle \\ &= F\_i\langle \boldsymbol{X}\_i, H\_i(\boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1})+\boldsymbol{V}\_i \rangle + G\_i\langle \boldsymbol{U}\_i, H\_i(\boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1})+\boldsymbol{V}\_i \rangle \\ &= F\_i\langle \boldsymbol{X}\_i, \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1} \rangle H\_i^\* + G\_iS\_i\\ &= F\_i\langle (\boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1}) + \boldsymbol{\hat{X}}\_{i|i-1}, \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1} \rangle H\_i^\* + G\_iS\_i\\ &= F\_i\langle \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1}, \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1} \rangle H\_i^\* + G\_iS\_i\end{aligned}$$

Notice that the matrix $$P\_i = \langle \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1}, \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1} \rangle$$ is the auto-correlation of the estimation error, and it shows up in both $$K\_{p,i}$$ and $$R\_{\boldsymbol{e},i}$$. It would be useful to have a recursive solution for this matrix as well.

$$\begin{aligned} P\_{i+1} &= \Pi\_{i+1}-\langle \boldsymbol{\hat{X}}\_{i+1|i}, \boldsymbol{\hat{X}}\_{i+1|i} \rangle \\ &= F\_i\Pi\_iF\_i^\* + G\_iQ\_iG\_i^\* - \langle F\_i\boldsymbol{\hat{X}}\_{i|i-1}+K\_{p,i}\boldsymbol{e}\_i, F\_i\boldsymbol{\hat{X}}\_{i|i-1}+K\_{p,i}\boldsymbol{e}\_i \rangle \\ &= F\_i\Pi\_iF\_i^\* + G\_iQ\_iG\_i^\* - F\_i\langle \boldsymbol{\hat{X}}\_{i|i-1}, \boldsymbol{\hat{X}}\_{i|i-1} \rangle F\_i^\* - K\_{p,i}R\_{\boldsymbol{e},i}K\_{p,i}^\*\\ &= F\_iP\_iF\_i^\* + G\_iQ\_iG\_i^\* - K\_{p,i}R\_{\boldsymbol{e},i}K\_{p,i}^\*\end{aligned}$$

Putting this into a concrete algorithm, we get the Kalman Prediction Filter.
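
A minimal sketch of that algorithm for an assumed time-invariant state-space model; the matrices $$F, G, H, Q, R, S$$ and $$\Pi\_0$$ below are illustrative values (with $$S = 0$$, i.e. uncorrelated process and observation noise), not part of the notes.

```python
# Kalman prediction filter: recursively compute X_hat_{i+1|i} and P_{i+1}.
import numpy as np

def kalman_predict(Y, F, G, H, Q, R, S, Pi0):
    x_pred = np.zeros(F.shape[0])       # X_hat_{0|-1}
    P = Pi0                             # P_0
    preds = []
    for y in Y:
        e = y - H @ x_pred                               # innovation e_i
        Re = H @ P @ H.T + R                             # R_{e,i}
        Kp = (F @ P @ H.T + G @ S) @ np.linalg.inv(Re)   # prediction gain K_{p,i}
        x_pred = F @ x_pred + Kp @ e                     # X_hat_{i+1|i}
        P = F @ P @ F.T + G @ Q @ G.T - Kp @ Re @ Kp.T   # P_{i+1}
        preds.append(x_pred)
    return np.array(preds)

# Toy example: a constant-velocity state observed through its position in noise
F = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.eye(2)
H = np.array([[1.0, 0.0]])
Q, R, S = 0.01 * np.eye(2), np.array([[1.0]]), np.zeros((2, 1))
rng = np.random.default_rng(5)
Y = np.cumsum(rng.normal(size=50))[:, None]   # stand-in observations
print(kalman_predict(Y, F, G, H, Q, R, S, np.eye(2))[:3])
```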

### Schmidt’s Modification of the Kalman Filter

The predictive Kalman filter goes directly from $$\boldsymbol{\hat{X}}\_{i|i-1}$$ to $$\boldsymbol{\hat{X}}\_{i+1|i}$$ without ever determining $$\boldsymbol{\hat{X}}\_{i|i}$$. The Schmidt modification of the Kalman filter separates the predictive Kalman filter into two steps, allowing us to estimate the current state.

1. Measurement Update: Find $$\boldsymbol{\hat{X}}\_{i|i}$$ given the latest observation $$\boldsymbol{Y}\_{i}$$ and $$\boldsymbol{\hat{X}}\_{i|i-1}$$.
2. State Evolution (Time) Update: Find $$\boldsymbol{\hat{X}}\_{i+1|i}$$ using what we know about the state evolution.

This mimics the approach of the forward algorithm for Hidden Markov Models, which separated updates to the distribution using a time update and a measurement update. Using our innovation process,

$$\begin{aligned} \boldsymbol{\hat{X}}\_{i|i} &= \sum\_{j=0}^i \langle \boldsymbol{X}\_i, \boldsymbol{e}\_j \rangle R\_{\boldsymbol{e},j}^{-1}\boldsymbol{e}\_j \\ &= \boldsymbol{\hat{X}}\_{i|i-1} + \langle \boldsymbol{X}\_i, \boldsymbol{e}\_i \rangle R\_{\boldsymbol{e},i}^{-1}\boldsymbol{e}\_i\\ &= \boldsymbol{\hat{X}}\_{i|i-1} + \langle (\boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1}) + \boldsymbol{\hat{X}}\_{i|i-1}, H\_i(\boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i-1}) + \boldsymbol{V}\_i \rangle R\_{\boldsymbol{e},i}^{-1}\boldsymbol{e}\_i\\ &= \boldsymbol{\hat{X}}\_{i|i-1} + P\_iH\_i^\*R\_{\boldsymbol{e},i}^{-1}\boldsymbol{e}\_i\end{aligned}$$

The gain on the innovation, $$K\_{f,i}=P\_iH\_i^\*R\_{\boldsymbol{e},i}^{-1}$$, is called the Kalman Filter Gain. The error of our estimator, $$P\_{i|i} = \langle \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i}, \boldsymbol{X}\_i - \boldsymbol{\hat{X}}\_{i|i} \rangle$$, is given by

$$P\_{i|i} = P\_i - P\_iH\_i^\*R\_{\boldsymbol{e},i}^{-1}H\_iP\_i.$$

For the time update,

$$\begin{aligned} \boldsymbol{\hat{X}}\_{i+1|i} &= F\_i\boldsymbol{\hat{X}}\_{i|i} + G\_i\boldsymbol{\hat{U}}\_{i|i} \\ &= F\_i\boldsymbol{\hat{X}}\_{i|i} + G\_i\langle \boldsymbol{U}\_i, \boldsymbol{e}\_i \rangle R\_{\boldsymbol{e},i}^{-1}\boldsymbol{e}\_i\\ &= F\_i\boldsymbol{\hat{X}}\_{i|i} + G\_i\langle \boldsymbol{U}\_i, H\_i\boldsymbol{X}\_i+\boldsymbol{V}\_i - H\_i\boldsymbol{\hat{X}}\_{i|i-1} \rangle R\_{\boldsymbol{e},i}^{-1}\boldsymbol{e}\_i\\ &= F\_i\boldsymbol{\hat{X}}\_{i|i} + G\_iS\_iR\_{\boldsymbol{e},i}^{-1}\boldsymbol{e}\_i\end{aligned}$$

We can re-write the error of this estimator, $$P\_{i+1}$$, as

$$P\_{i+1} = F\_iP\_{i|i}F\_i^\* + G\_i(Q\_i - S\_iR\_{\boldsymbol{e},i}^{-1}S\_i^*)G\_i^* - F\_iK\_{f,i}S\_i^*G\_i^* - G\_iS\_iK\_{f,i}^*F\_i^*$$

Writing this as an algorithm, we alternate the measurement update and the time update at each step.
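
A sketch of the two-step recursion under the same assumed time-invariant model as the prediction-filter sketch above; `kalman_schmidt` is a hypothetical helper name, and the update formulas mirror the equations above.

```python
# Schmidt's modification: measurement update then time update at each step.
import numpy as np

def kalman_schmidt(Y, F, G, H, Q, R, S, Pi0):
    x_pred, P = np.zeros(F.shape[0]), Pi0        # X_hat_{0|-1}, P_0
    filtered = []
    for y in Y:
        # Measurement update
        e = y - H @ x_pred                       # innovation
        Re = H @ P @ H.T + R
        Re_inv = np.linalg.inv(Re)
        Kf = P @ H.T @ Re_inv                    # Kalman filter gain K_{f,i}
        x_filt = x_pred + Kf @ e                 # X_hat_{i|i}
        P_filt = P - P @ H.T @ Re_inv @ H @ P    # P_{i|i}
        filtered.append(x_filt)
        # Time update
        x_pred = F @ x_filt + G @ S @ Re_inv @ e          # X_hat_{i+1|i}
        P = (F @ P_filt @ F.T + G @ (Q - S @ Re_inv @ S.T) @ G.T
             - F @ Kf @ S.T @ G.T - G @ S @ Kf.T @ F.T)   # P_{i+1}
    return np.array(filtered)

# Usage mirrors kalman_predict above, e.g. kalman_schmidt(Y, F, G, H, Q, R, S, np.eye(2))
```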

### Kalman Smoother

The Kalman Prediction Filter and Schmidt’s modification of the Kalman filter are both causal filters. The Kalman Smoother provides a non-causal estimate in the same way that the Backward Recursion algorithm does for Hidden Markov Processes. In other words, the Kalman Smoother computes $$\hat{\boldsymbol{X}}\_{i|N}$$, the best linear estimator of $$\boldsymbol{X}\_i$$ from $$\{\boldsymbol{Y}\_0, \cdots, \boldsymbol{Y}\_N\}$$. As before, we can start with the innovation process $$\boldsymbol{e}\_j = \boldsymbol{Y}\_j - H\_j\hat{\boldsymbol{X}}\_{j|j-1}$$.

$$\hat{\boldsymbol{X}}\_{i|N} = \sum\_{j=0}^N \langle \boldsymbol{X}\_i, \boldsymbol{e}\_j \rangle R\_{e,j}^{-1}\boldsymbol{e}\_j = \hat{\boldsymbol{X}}\_{i|i-1} + \sum\_{j=i}^N \langle \boldsymbol{X}\_i, \boldsymbol{e}\_j \rangle R\_{e,j}^{-1}\boldsymbol{e}\_j.$$

We can compute $$\hat{\boldsymbol{X}}\_{i|i-1}$$ using the Predictive Kalman filter, so we just need to compute the terms $$\langle \boldsymbol{X}\_i, \boldsymbol{e}\_j \rangle R\_{e,j}^{-1}\boldsymbol{e}\_j$$.

$$\begin{aligned} \langle \boldsymbol{X}\_i, \boldsymbol{e}\_j \rangle &= \langle \boldsymbol{X}\_i, H\_j(\boldsymbol{X}\_j - \hat{\boldsymbol{X}}\_{j|j-1}) + \boldsymbol{V}\_j \rangle = \langle \boldsymbol{X}\_i, H\_j(\boldsymbol{X}\_j - \hat{\boldsymbol{X}}\_{j|j-1}) \rangle \\ &= \langle \hat{\boldsymbol{X}}\_{i|i-1} + (\boldsymbol{X}\_i - \hat{\boldsymbol{X}}\_{i|i-1}), \boldsymbol{X}\_j - \hat{\boldsymbol{X}}\_{j|j-1} \rangle H\_j^\*\end{aligned}$$

$$\boldsymbol{X}\_j - \hat{\boldsymbol{X}}\_{j|j-1}$$ is orthogonal to any linear function of $$\{\boldsymbol{Y}\_0, \cdots, \boldsymbol{Y}\_{j-1}\}$$, so when $$j \geq i$$, it must be orthogonal to $$\hat{\boldsymbol{X}}\_{i|i-1}$$ since it is a function of $$\{\boldsymbol{Y}\_k\}^{i-1}\_0$$. Thus, for $$j\geq i$$,

$$\langle \boldsymbol{X}\_i, \boldsymbol{e}\_j \rangle = \langle \boldsymbol{X}\_i - \hat{\boldsymbol{X}}\_{i|i-1}, \boldsymbol{X}\_j - \hat{\boldsymbol{X}}\_{j|j-1} \rangle H\_j^\*$$

If we denote $$P\_{ij} = \langle \boldsymbol{X}\_i - \hat{\boldsymbol{X}}\_{i|i-1}, \boldsymbol{X}\_j - \hat{\boldsymbol{X}}\_{j|j-1} \rangle$$, then

$$P\_{ij} = P\_{i}\Phi\_{p}^\*(j, i) = P\_{i} \begin{cases} I & \text{ if } j = i\\ \prod\_{k=i}^{j-1} F\_{p,k} & \text{ if } j > i \end{cases}$$

where $$F\_{p,k} = (F\_k - K\_{p, k}H\_k)$$. This gives us the expression

$$\hat{\boldsymbol{X}}\_{i|N} = \hat{\boldsymbol{X}}\_{i|i-1} + P\_i \sum\_{j=i}^N \Phi\_p^\*(j, i)H\_j^\*R\_{e,j}^{-1}\boldsymbol{e}\_j.$$

If we let $$\lambda\_{i|N} = \sum\_{j=i}^N \Phi\_p^\*(j, i)H\_j^\*R\_{e,j}^{-1}\boldsymbol{e}\_j$$, then we get the recursion

$$\lambda\_{i|N} = F\_{p,i}^\*\lambda\_{i+1|N} + H\_i^\*R\_{e, i}^{-1}\boldsymbol{e}\_i.$$

If we want to look at the error of this estimator, we see that

$$P\_{i|N} = P\_i - P\_i \Lambda\_{i|N} P\_i \text{ where } \Lambda\_{i|N} = \langle \lambda\_{i|N}, \lambda\_{i|N} \rangle = F\_{p,i}^\*\Lambda\_{i+1|N}F\_{p,i}+H\_i^\*R\_{e, i}^{-1}H\_i.$$

Writing all of this as an algorithm, we run a forward pass of the Kalman prediction filter and then a backward pass that accumulates $$\lambda\_{i|N}$$.
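
A sketch of the combined procedure under the same assumed time-invariant model: the forward pass stores $$\hat{\boldsymbol{X}}\_{i|i-1}$$, $$P\_i$$, $$\boldsymbol{e}\_i$$, $$R\_{e,i}^{-1}$$, and $$F\_{p,i}$$, and the backward pass forms $$\hat{\boldsymbol{X}}\_{i|N} = \hat{\boldsymbol{X}}\_{i|i-1} + P\_i\lambda\_{i|N}$$.

```python
# Kalman smoother: forward prediction pass, then backward lambda recursion.
import numpy as np

def kalman_smooth(Y, F, G, H, Q, R, S, Pi0):
    n_state = F.shape[0]
    x_pred, P = np.zeros(n_state), Pi0
    preds, Ps, es, Re_invs, Fps = [], [], [], [], []
    # Forward pass (Kalman prediction filter), storing what the smoother needs
    for y in Y:
        e = y - H @ x_pred
        Re = H @ P @ H.T + R
        Re_inv = np.linalg.inv(Re)
        Kp = (F @ P @ H.T + G @ S) @ Re_inv
        preds.append(x_pred); Ps.append(P)
        es.append(e); Re_invs.append(Re_inv)
        Fps.append(F - Kp @ H)                       # F_{p,i} = F_i - K_{p,i} H_i
        x_pred = F @ x_pred + Kp @ e                 # X_hat_{i+1|i}
        P = F @ P @ F.T + G @ Q @ G.T - Kp @ Re @ Kp.T
    # Backward pass: lambda_{i|N} = F_{p,i}^* lambda_{i+1|N} + H^* R_{e,i}^{-1} e_i
    lam = np.zeros(n_state)
    smoothed = [None] * len(Y)
    for i in range(len(Y) - 1, -1, -1):
        lam = Fps[i].T @ lam + H.T @ Re_invs[i] @ es[i]
        smoothed[i] = preds[i] + Ps[i] @ lam         # X_hat_{i|N}
    return np.array(smoothed)

# Usage mirrors the earlier sketches, e.g. kalman_smooth(Y, F, G, H, Q, R, S, np.eye(2))
```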

