Discrete Time Random Processes
Definition 4

A Discrete-Time Random Process is a countably infinite collection of random variables on the same probability space, $\{X_n : n\in\mathbb{Z}\}$.

Discrete-time random processes have a mean function $\mu_n = \mathbb{E}\left[X_n\right]$ and an auto-correlation function $R_X(n_1, n_2) = \mathbb{E}\left[X_{n_1}X_{n_2}^*\right]$.

Wide-Sense Stationary Random Processes

Definition 5

A Wide-Sense Stationary Random Process is a discrete-time random process with constant mean, finite variance, and an auto-correlation function that can be re-written to depend only on $n_1 - n_2$.

We call this wide-sense stationary because the mean and covariance do not change as the process evolves. In a strict-sense stationary process, the distribution of each random variable in the process would not change.

Definition 6

A WSS process $Z \sim \mathcal{WN}(0, \sigma^2)$ is a white noise process with variance $\sigma^2$ if and only if $\mathbb{E}\left[Z_n\right] = 0$ and $\mathbb{E}\left[Z_nZ_m^*\right] = \sigma^2\delta[n, m]$.
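As a quick sanity check, here is a minimal numpy sketch (the Gaussian distribution and $\sigma = 2$ are our choices; white noise only constrains the first two moments) that estimates the mean and auto-correlation of a simulated white noise process:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, N = 2.0, 100_000
Z = rng.normal(0, sigma, size=N)  # one realization; Gaussian is just one choice

# E[Z_n] should be near 0, and E[Z_n Z_m^*] near sigma^2 * delta[n, m].
print(Z.mean())                        # ~0
for k in range(4):
    R_k = np.mean(Z[k:] * Z[:N - k])   # sample estimate of E[Z_n Z_{n-k}]
    print(k, R_k)                      # ~4.0 for k = 0, ~0 otherwise
```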

Spectral Density

Recall that the Discrete Time Fourier Transform is given by

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty}x[n]e^{-j\omega n}.$$

The Inverse Discrete Time Fourier Transform is given by

$$x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi}X(e^{j\omega})e^{j\omega n}\,d\omega.$$

Since the DTFT is an infinite summation, it may or may not converge.

Definition 7

A signal $x[n]$ belongs to the $\ell^1$ class of signals if the series converges absolutely. In other words,

$$\sum_{k=-\infty}^{\infty}|x[k]| < \infty.$$

This class covers most real-world signals.

Theorem 5

If $x[n]$ is an $\ell^1$ signal, then the DTFT $X(e^{j\omega})$ converges uniformly and is well-defined for every $\omega$. $X(e^{j\omega})$ is also a continuous function.

Definition 8

A signal $x[n]$ belongs to the $\ell^2$ class of signals if it is square summable. In other words,

$$\sum_{k=-\infty}^{\infty}|x[k]|^2 < \infty.$$

The $\ell^2$ class contains important functions such as $\text{sinc}$.

Theorem 6

If $x[n]$ is an $\ell^2$ signal, then the DTFT $X(e^{j\omega})$ is defined almost everywhere and only converges in the mean-squared sense:

$$\lim_{N\to\infty} \int_{-\pi}^{\pi}\left|\left(\sum_{k=-N}^N x[k]e^{-j\omega k}\right) - X(e^{j\omega})\right|^2 d\omega = 0.$$
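To see the distinction between the two classes numerically, consider the ideal low-pass impulse response $x[n] = \sin(\pi n/2)/(\pi n)$ with $x[0] = 1/2$ (a minimal numpy sketch; the cutoff $\omega_c = \pi/2$ is our choice). Its $\ell^1$ partial sums keep growing, while its $\ell^2$ sum converges to the energy $\omega_c/\pi = 1/2$:

```python
import numpy as np

n = np.arange(1, 1_000_000)
x = np.sin(np.pi * n / 2) / (np.pi * n)   # x[n] for n >= 1; x[0] = 1/2

# l^1 partial sums diverge (they grow roughly like log N) ...
print(0.5 + 2 * np.sum(np.abs(x)))        # keeps increasing with N
# ... while the signal is square summable with energy 1/2.
print(0.25 + 2 * np.sum(np.abs(x) ** 2))  # ~0.5
```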

Tempered distributions, like the Dirac delta function, are also important for computing DTFTs; they arise from the theory of generalized functions.

Suppose we want to characterize the signal using its DTFT.

Definition 9

The energy of a deterministic, discrete-time signal $x[n]$ is given by

$$\sum_{n\in\mathbb{Z}}|x[n]|^2.$$

The auto-correlation of $x[n]$, given by $a[n] = x[n] * x^*[-n]$, is closely related to the energy of the signal since $a[0] = \sum_{n\in\mathbb{Z}}|x[n]|^2$.

Definition 10

The Energy Spectral Density of $x[n]$ with auto-correlation $a[n]$ is given by

$$A(e^{j\omega}) = \sum_{n\in\mathbb{Z}}a[n]e^{-j\omega n}.$$

We call the DTFT of the auto-correlation the energy spectral density because, by the inverse DTFT,

$$a[0] = \frac{1}{2\pi}\int_{-\pi}^{\pi}A(e^{j\omega})\,d\omega.$$

Since integrating over the frequencies gives us the energy, we can think of $A(e^{j\omega})$ as storing the energy density of each spectral component of the signal. We can apply this same idea to wide-sense stationary stochastic processes.

Definition 11

The Power Spectral Density of a Wide-Sense Stationary random process is given by

$$S_X(e^{j\omega}) = \sum_{k\in\mathbb{Z}}R_X(k)e^{-j\omega k}.$$

Note that when considering stochastic signals, the metric changes from energy to power. This is because if $X_n$ is Wide-Sense Stationary, then

$$\mathbb{E}\left[\sum_{n\in\mathbb{Z}}|X_n|^2\right] = \infty,$$

so energy doesn't even make sense. To build our notion of power, let $A_T(e^{j\omega})$ be the energy spectral density of the truncated process $(X_n)_{n=-T}^{T}$. Then

$$\begin{aligned}
\lim_{T\to\infty} \frac{\mathbb{E}\left[A_T(e^{j\omega})\right]}{2T+1}
&= \lim_{T\to\infty}\frac{1}{2T+1}\mathbb{E}\left[\left(\sum_{n=-T}^TX_ne^{-j\omega n}\right)\left(\sum_{m=-T}^TX_m^*e^{j\omega m}\right)\right]\\
&= \lim_{T\to\infty} \frac{1}{2T+1} \sum_{n,m \in [-T,T]}\mathbb{E}\left[X_nX_m^*\right] e^{-j\omega(n-m)}\\
&= \lim_{T\to\infty} \frac{1}{2T+1} \sum_{n,m \in [-T,T]}R_X(n-m)e^{-j\omega(n-m)}\\
&= \lim_{T\to\infty} \sum_{k=-2T}^{2T}R_X(k)e^{-j\omega k}\left(1 - \frac{|k|}{2T+1}\right)\\
&= \sum_{k=-\infty}^{\infty}R_X(k)e^{-j\omega k}.
\end{aligned}$$

The DTFT of the auto-correlation function naturally arises from taking the energy spectral density and normalizing it by time (the truncated sequence has $2T+1$ points). In practice, this means that to measure the PSD, we need to either use the distribution of the signal to compute $R_X$, or estimate the PSD by averaging over multiple realizations of the signal.
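For instance, here is a minimal numpy sketch of the averaging approach (the MA(1) process $X_n = Z_n + 0.5Z_{n-1}$ and the trial counts are our choices): the averaged periodograms approach the true PSD $S_X(e^{j\omega}) = |1 + 0.5e^{-j\omega}|^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 256, 2000
S_hat = np.zeros(N)

for _ in range(trials):
    Z = rng.normal(size=N + 1)
    X = Z[1:] + 0.5 * Z[:-1]                 # WSS MA(1): X_n = Z_n + 0.5 Z_{n-1}
    S_hat += np.abs(np.fft.fft(X)) ** 2 / N  # periodogram of this realization
S_hat /= trials                              # average over realizations

w = 2 * np.pi * np.fft.fftfreq(N)
S_true = np.abs(1 + 0.5 * np.exp(-1j * w)) ** 2
print(np.max(np.abs(S_hat - S_true)))        # small, up to estimation error
```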

The inverse DTFT formula tells us that we can represent a deterministic, discrete-time signal $x[n]$ as a sum of complex exponentials weighted by $\frac{X(e^{j\omega})d\omega}{2\pi}$. This representation has an analog for stochastic signals as well.

Theorem 7 (Cramer-Khinchin)

For a complex-valued WSS stochastic process $X_n$ with power spectral density $S_X(e^{j\omega})$, there exists a unique right-continuous stochastic process $F(\omega),\ \omega\in(-\pi,\pi]$ with square-integrable, orthogonal increments such that

$$X_n = \int_{-\pi}^{\pi}e^{j\omega n}\,dF(\omega)$$

where for any intervals $[\omega_1,\omega_2], [\omega_3, \omega_4]\subset [-\pi,\pi]$,

$$\mathbb{E}\left[(F(\omega_2)-F(\omega_1))(F(\omega_4) - F(\omega_3))^*\right] = f\left((\omega_1,\omega_2] \cap (\omega_3, \omega_4]\right)$$

where $f$ is the structural measure of the stochastic process and has Radon-Nikodym derivative $\frac{S_X(e^{j\omega})}{2\pi}$.

Besides giving us a decomposition of a WSS random process, Theorem 7 tells us a few important facts.

  1. $\omega_1\neq\omega_2 \implies \langle dF(\omega_1), dF(\omega_2) \rangle = 0$ (i.e., different frequencies are uncorrelated).

  2. $\mathbb{E}\left[|dF(\omega)|^2\right] = \frac{S_X(e^{j\omega})\,d\omega}{2\pi}$

Z-Spectrum

Recall that the Z-transform converts a discrete-time signal into a complex representation. It is given by

$$X(z) = \sum_{n=-\infty}^{\infty}x[n]z^{-n}.$$

It is a special type of series called a Laurent Series.

Theorem 8

A Laurent Series converges absolutely on an open annulus

$$A = \{z \mid r < |z| < R \}$$

for some $r$ and $R$, which we can compute from the signal $x[n]$:

$$r = \limsup_{n\to\infty} |x[n]|^{\frac{1}{n}}, \qquad \frac{1}{R} = \limsup_{n\to\infty}|x[-n]|^{\frac{1}{n}}.$$
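For example, for the right-sided signal $x[n] = a^n u[n]$ we get $r = \limsup_{n\to\infty} |a^n|^{1/n} = |a|$, and since $x[-n] = 0$ for $n > 0$, $\frac{1}{R} = 0$, so the Laurent series

$$X(z) = \frac{1}{1 - az^{-1}}$$

converges on the annulus $|a| < |z| < \infty$.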

In some cases, it can be useful to only compute the Z-transform of the right side of the signal.

Definition 12

The unilateral Z-transform of a sequence $x[n]$ is given by

$$\left[X(z)\right]_+ = \sum_{n=0}^\infty x[n]z^{-n}.$$

If the Z-transform of the sequence is a rational function, then we can quickly compute the unilateral Z-transform by leveraging its partial fraction decomposition.

Theorem 9

Any rational function $H(z)$ whose region of convergence includes the unit circle corresponds to the unilateral Z-transform

$$\left[H(z)\right]_+ = r_0 + \sum_{i=1}^m\sum_{k=1}^{l_i}\frac{r_{ik}}{(z+\alpha_i)^k} + \sum_{i=m+1}^n\sum_{k=1}^{l_i}\frac{r_{ik}}{\beta_i^k}$$

where $|\alpha_i| < 1 < |\beta_i|$.
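As a worked example (the particular transfer function is our choice), take $H(z) = \frac{1}{(z - \frac{1}{2})(z - 2)}$ with ROC $\frac{1}{2} < |z| < 2$. Its partial fraction decomposition, and Theorem 9 with $\alpha_1 = -\frac{1}{2}$, $\beta_1 = -2$, give

$$H(z) = \frac{-2/3}{z - \frac{1}{2}} + \frac{2/3}{z - 2} \implies \left[H(z)\right]_+ = \frac{-2/3}{z - \frac{1}{2}} + \frac{2/3}{-2} = \frac{-2/3}{z - \frac{1}{2}} - \frac{1}{3}.$$

The pole inside the unit circle corresponds to a causal term and is kept in full, while the anticausal term contributes only its $n = 0$ sample.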

Definition 13

For two jointly WSS processes $X_n, Y_n$, the z-cross spectrum is the Z-transform of the cross-correlation function $R_{YX}(k) = \mathbb{E}\left[Y_nX^*_{n-k}\right]$:

$$S_{YX}(z) = \sum_{k\in\mathbb{Z}}R_{YX}(k)z^{-k}.$$

Using this definition, we can see that

$$S_{XY}(z) = S^*_{YX}(z^{-*}).$$

We can also look at the Z-transform of the auto-correlation function of a WSS process $X$ to obtain $S_X(z)$.

Definition 14

For a rational $S_X(z)$ that has finite power $\left(\int_{-\pi}^\pi S_X(e^{j\omega})\,d\omega < \infty \right)$ and is strictly positive on the unit circle, the canonical spectral factorization decomposes $S_X(z)$ into the product of a constant $r_e > 0$ and the transfer function of a minimum-phase system $L(z)$ with $L(\infty) = 1$:

$$S_X(z) = L(z)\,r_e\,L^*(z^{-*}).$$

Because $L(z)$ is minimum phase and $L(\infty)=1$, it must take the form

$$L(z) = 1 + \sum_{i=1}^\infty l[i]z^{-i}$$

since minimum-phase systems are causal. Using Definition 14, we can express $S_X(z)$ as the product of a right-sided and a left-sided factor:

$$S_X(z) = (\sqrt{r_e}L(z))(\sqrt{r_e}L^*(z^{-*})) = S_X^+(z)S_X^-(z).$$

Note that $S_X^-(e^{j\omega}) = \left(S_X^+(e^{j\omega})\right)^*$. Using the assumptions built into Definition 14, we can find a general form for $L(z)$ since we know $S_X(z)$ takes the form

$$S_X(z) = r_e \frac{\prod_{i=1}^m(z-\alpha_i)(z^{-1}-\alpha_i^*)}{\prod_{i=1}^n(z-\beta_i)(z^{-1}-\beta_i^*)}, \quad |\alpha_i| < 1,\ |\beta_i| < 1,\ r_e > 0.$$

If we let the $z - \alpha_i$ and $z-\beta_i$ terms be part of $L(z)$, then

$$L(z) = z^{n-m}\frac{\prod_{i=1}^m(z-\alpha_i)}{\prod_{i=1}^n(z-\beta_i)}.$$
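As a worked example (our choice of spectrum), let $S_X(z) = 5 - 2z - 2z^{-1}$, which is strictly positive on the unit circle since $S_X(e^{j\omega}) = 5 - 4\cos\omega > 0$. Its zeros are $z = \frac{1}{2}$ and $z = 2$; assigning the zero inside the unit circle to $L(z)$ yields

$$S_X(z) = \left(1 - \tfrac{1}{2}z^{-1}\right) \cdot 4 \cdot \left(1 - \tfrac{1}{2}z\right), \qquad L(z) = 1 - \tfrac{1}{2}z^{-1}, \quad r_e = 4,$$

with $L(\infty) = 1$ and $L(z)$ minimum phase, as required.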

Markov Processes

Definition 15

We say that random variables $X, Y, Z$ form a Markov Triplet $X - Y - Z$ if and only if $X$ and $Z$ are conditionally independent given $Y$.

Mathematically, Markov triplets satisfy three properties:

$$p(x, z | y) = p(x|y)p(z|y)$$

$$p(z|x, y) = p(z|y)$$

$$p(x|y, z) = p(x|y)$$

Because of these rules, the joint distribution can be written as $p(x, y, z) = p(x)p(y|x)p(z|y)$.

Theorem 10

Random variables $X, Y, Z$ form a Markov triplet if and only if there exist functions $\phi_1, \phi_2$ such that $p(x, y, z) = \phi_1(x, y)\phi_2(y, z)$.

To simplify notation, we define $X_m^n = \left(X_m, X_{m+1}, \cdots, X_n\right)$ and $X^n = X_1^n$.

Definition 16

A Markov Process is a discrete-time random process $\{X_n\}_{n\geq1}$ where $X_n - X_{n-1} - X^{n-2}$ for all $n\geq 2$.

Because of the conditional independence property, we can write the joint distribution of all states in the Markov process as

$$p(x^n) = \prod_{t=1}^n p(x_t|x^{t-1}) = \prod_{t=1}^np(x_t|x_{t-1}).$$
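Here is a minimal numpy sketch of this factorization at work (the two-state chain and its transition matrix are our choices): we sample a path using only the previous state, then compute the path's joint probability as the product of one-step transitions.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],      # P[i, j] = p(x_{t+1} = j | x_t = i)
              [0.3, 0.7]])
pi0 = np.array([0.5, 0.5])     # distribution of X_1

# Sample X_1, ..., X_n using only the previous state: exactly the
# factorization p(x^n) = prod_t p(x_t | x_{t-1}).
n = 10
x = [rng.choice(2, p=pi0)]
for _ in range(n - 1):
    x.append(rng.choice(2, p=P[x[-1]]))

# Joint probability of the sampled path from the same factorization.
joint = pi0[x[0]] * np.prod([P[x[t - 1], x[t]] for t in range(1, n)])
print(x, joint)
```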

The requirement for $X - Y - Z$ to satisfy $p(x, y, z) = p(x)p(y|x)p(z|y)$ is very strict. If we want a "wider" notion of Markovity, we can settle for $\hat{X}(Y) = \hat{X}(Y, Z)$, where $\hat{X}$ is the best linear estimator of $X$: this property is satisfied by all Markov triplets, but it does not imply a Markov triplet.

Definition 17

Random variables $X, Y, Z$ form a Wide-Sense Markov Triplet $X - Y - Z$ if and only if the best linear estimator of $X$ given $Y$ is identical to the best linear estimator of $X$ given $Y$ and $Z$:

$$\hat{X}(Y) = \hat{X}(Y, Z).$$

Definition 18

A stochastic process $\{Y_i\}_{i=0}^n$ is a Wide-Sense Markov Process if and only if for any $1 \leq i \leq n - 1$, $Y_{i+1} - Y_i - Y^{i-1}$ forms a Wide-Sense Markov Triplet.

All Wide-Sense Markov models have a very succinct representation.

Theorem 11

A process $\boldsymbol{X}$ is Wide-Sense Markov if and only if $\boldsymbol{X}_{i+1} = F_i \boldsymbol{X}_i + G_i \boldsymbol{U}_i$ and

$$\left\langle \begin{bmatrix} \boldsymbol{U}_i \\ \boldsymbol{X}_0 \end{bmatrix}, \begin{bmatrix} \boldsymbol{U}_j \\ \boldsymbol{X}_0 \end{bmatrix} \right\rangle = \begin{bmatrix} Q_i \delta[i-j] & 0\\ 0 & \Pi_0 \end{bmatrix}.$$

Hidden Markov Processes

Definition 19

If $\{X_n\}_{n\geq1}$ is a Markov process, then $\{Y_n\}_{n\geq1}$ is a Hidden Markov Process if we can factorize the conditional probability density as

$$p(y^n|x^n) = \prod_{i=1}^np(y_i|x_i).$$

We can think of $Y$ as a noisy observation of an underlying Markov process. The joint distribution of $\{X_n\}_{n\geq1}$ and $\{Y_n\}_{n\geq1}$ can be written as

$$p(x^n, y^n) = p(x^n)p(y^n|x^n) = \prod_{t=1}^np(x_t|x_{t-1})\prod_{i=1}^np(y_i|x_i).$$
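A minimal numpy sketch of this structure (the two-state chain and Gaussian emissions are our choices): the hidden chain evolves on its own, and each observation is drawn from a distribution that depends only on the current hidden state.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])     # hidden transitions p(x_t | x_{t-1})
means = np.array([0.0, 3.0])     # emissions p(y_i | x_i) = N(means[x_i], 1)

n = 50
x = [rng.choice(2)]
for _ in range(n - 1):
    x.append(rng.choice(2, p=P[x[-1]]))         # Markov state update
y = [rng.normal(means[xi], 1.0) for xi in x]    # y_i depends only on x_i
print(x[:10], np.round(y[:10], 2))
```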

Hidden Markov Models can be represented by undirected graphical models. To create an undirected graphical model,

  1. Create a node for each random variable.

  2. Draw an edge between two nodes if a factor of the joint distribution contains both nodes.

Undirected graphical models of Hidden Markov Processes are useful because they let us derive additional Markov dependencies between groups of variables.

Theorem 12

For three disjoint sets $S_1, S_2, S_3$ of nodes in a graphical model, if every path from $S_1$ to $S_3$ passes through a node in $S_2$, then $S_1 - S_2 - S_3$.

State-Space Models

Suppose we have a discrete-time random process which evolves in a recursive fashion, meaning the current state depends in some way on the previous state. We can express this recursion with a set of equations.

Definition 20

The standard state space model describes random processes in which state vectors $\boldsymbol{X}_i$ and observation vectors $\boldsymbol{Y}_i$ evolve according to the equations

$$\begin{cases} \boldsymbol{X}_{i+1} = F_i \boldsymbol{X}_i + G_i \boldsymbol{U}_i\\ \boldsymbol{Y}_{i} = H_i\boldsymbol{X}_i + \boldsymbol{V}_i \end{cases}$$

with initial condition

$$\left\langle \begin{bmatrix}\boldsymbol{X}_0 \\ \boldsymbol{U}_i \\ \boldsymbol{V}_i\end{bmatrix}, \begin{bmatrix}\boldsymbol{X}_0 \\ \boldsymbol{U}_j \\ \boldsymbol{V}_j\end{bmatrix} \right\rangle = \begin{bmatrix} \Pi_0 & 0 & 0\\ 0 & Q_i\delta[i-j] & S_i\delta[i-j]\\ 0 & S_i^*\delta[i-j] & R_i\delta[i-j] \end{bmatrix}.$$

From Theorem 11, we can easily see that state space models are Wide-Sense Markov. Note that $\boldsymbol{U}_i$ and $\boldsymbol{V}_i$ are white noise, and that the dynamics of the system can change at every time step. From these equations, we can derive six properties. Let $\Pi_i = \langle \boldsymbol{X}_i, \boldsymbol{X}_i \rangle$, $\Phi_{i,j} = \prod_{k=j}^{i-1}F_k$, and $\Phi_{i,i} = I$.

  1. $\forall i \geq j,\ \langle \boldsymbol{U}_i, \boldsymbol{X}_j \rangle = 0,\ \langle \boldsymbol{V}_i, \boldsymbol{X}_j \rangle = 0$

  2. $\forall i > j,\ \langle \boldsymbol{U}_i, \boldsymbol{Y}_j \rangle = 0,\ \langle \boldsymbol{V}_i, \boldsymbol{Y}_j \rangle = 0$

  3. $\forall i,\ \langle \boldsymbol{U}_i, \boldsymbol{Y}_i \rangle = S_i,\ \langle \boldsymbol{V}_i, \boldsymbol{Y}_i \rangle = R_i$

  4. $\Pi_{i+1} = F_i\Pi_iF_i^* + G_iQ_iG_i^*$

  5. $\langle \boldsymbol{X}_i, \boldsymbol{X}_j \rangle = \begin{cases} \Phi_{i,j}\Pi_j & i \geq j \\ \Pi_i \Phi_{j,i}^* & i \leq j \end{cases}$

  6. $\langle \boldsymbol{Y}_i, \boldsymbol{Y}_j \rangle = \begin{cases} H_i \Phi_{i,j+1}N_j & i > j\\ R_i + H_i\Pi_iH_i^* & i=j \\ N_i^*\Phi^*_{j,i+1}H_j^* & i < j \end{cases} \quad \text{where } N_i=F_i\Pi_iH_i^*+G_iS_i$
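As a sanity check on property 4, here is a minimal numpy sketch (a scalar, time-invariant model with $F, G, Q, \Pi_0$ of our choosing) comparing the recursion $\Pi_{i+1} = F_i\Pi_iF_i^* + G_iQ_iG_i^*$ against a Monte Carlo estimate of $\langle \boldsymbol{X}_i, \boldsymbol{X}_i \rangle$:

```python
import numpy as np

rng = np.random.default_rng(0)
F, G, Q, Pi0 = 0.8, 1.0, 1.0, 2.0   # scalar, time-invariant for simplicity
n, trials = 6, 200_000

# Simulate X_{i+1} = F X_i + G U_i across many independent realizations,
# with <X_0, X_0> = Pi0 and U_i ~ WN(0, Q).
X = np.sqrt(Pi0) * rng.standard_normal(trials)
Pi = Pi0
for i in range(n):
    U = np.sqrt(Q) * rng.standard_normal(trials)
    X = F * X + G * U
    Pi = F * Pi * F + G * Q * G       # analytic: Pi_{i+1} = F Pi_i F* + G Q G*
    print(i + 1, Pi, np.mean(X * X))  # analytic vs empirical <X_i, X_i>
```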
