Random Processes

Definition 47

A random (stochastic) process is a sequence of random variables $(X_n)_{n\geq 0}$.

The random variables in a stochastic process do not have to be independent and identically distributed. In fact, when they are not, we gain additional modeling power.

Definition 48

A random process $(X_n)_{n\in\mathbb{N}}$ is stationary if for all $k, n > 0$ and all events $A_1,\dots,A_n$,

$$\text{Pr}\left\{X_1\in A_1,\dots,X_n\in A_n\right\} = \text{Pr}\left\{X_{k+1}\in A_1,\dots,X_{k+n}\in A_n\right\}$$

Stationarity is often a good assumption that can simplify the analysis of systems which have been running for a long period of time.

Discrete Time Markov Chains

Definition 49

$(X_n)_{n\geq 0}$ is a Markov Chain if each random variable $X_i$ takes values in a discrete set $S$ (the state space), and

$$\forall n \geq 0,\ i,j\in S,\ \text{Pr}\left\{X_{n+1}=j|X_n=i,\dots,X_0=x_0\right\} = \text{Pr}\left\{X_{n+1}=j|X_n=i\right\}$$

In words, a Markov Chain is a sequence of random variables satisfying the Markov Property: the probability of being in a state during the next time step depends only on the current state.

Definition 50

A temporally homogeneous Markov Chain is one where the transition probabilities $\text{Pr}\left\{X_{n+1}=j|X_n=i\right\} = p_{ij}$ for all $i,j\in S$ and $n\geq 0$.

Temporally homogeneous Markov Chains don't change their transition probabilities over time. Since the $p_{ij}$ are conditional probabilities, they must satisfy

  1. $\forall i,j\in S,\ p_{ij} \geq 0$
  2. $\forall i\in S,\ \sum_{j\in S}p_{ij} = 1$

Definition 51

The transition matrix of a Markov Chain is a matrix $P$ whose $ij$th entry is $P_{ij} = p_{ij}$ for all $i,j\in S$.

The transition matrix encodes the one-step transition probabilities of the Markov Chain.

Theorem 27 (Chapman-Kolmogorov Equation)

The $n$-step transition probabilities (i.e., the probability of starting in $i$ and ending in $j$ exactly $n$ steps later) of the Markov Chain are given by

$$p_{ij}^{(n)} = P^n_{ij}$$
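As a concrete sketch of the Chapman-Kolmogorov equation, the $n$-step probabilities are just entries of a matrix power. The two-state chain below is hypothetical (its numbers are chosen for illustration, not taken from the text):

```python
import numpy as np

# Hypothetical two-state chain: p01 = 0.3 and p10 = 0.6.
P = np.array([[0.7, 0.3],
              [0.6, 0.4]])

# By Chapman-Kolmogorov, the n-step transition probabilities are the
# entries of the matrix power P^n.
n = 5
Pn = np.linalg.matrix_power(P, n)

# Pn[i, j] = Pr{X_n = j | X_0 = i}; each row is still a distribution.
assert np.allclose(Pn.sum(axis=1), 1.0)
```

For this chain the rows of $P^n$ converge quickly to the stationary distribution $(2/3, 1/3)$, which previews the long-term behavior results later in these notes.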
One useful quantity we can compute with a Markov Chain is when the chain first enters a particular state.

Definition 52

For a set $A \subseteq S$, the hitting time of $A$ is given by

$$T_A = \min \{ n\geq 0: X_n\in A\}$$

Computing the expected hitting time is an example of a broader type of Markov Chain analysis called First Step Analysis. In First Step Analysis, we use the Markov property to set up a system of equations that only looks at the first transition in the chain. For expected hitting times, these equations are
  1. For $i\not\in A$, $\mathbb{E}\left[T_A|X_0 = i\right] = 1 + \sum_j p_{ij} \mathbb{E}\left[T_A|X_0 = j\right]$
  2. For $i\in A$, $\mathbb{E}\left[T_A|X_0 = i\right] = 0$

Properties of Markov Chains

Definition 53

If $\exists n \geq 1$ such that $p_{ij}^{(n)} > 0$, then $j$ is accessible from $i$, and we write $i\rightarrow j$.

Definition 54

States $i$ and $j$ communicate with each other when $i\rightarrow j$ and $j\rightarrow i$. We write this as $i\leftrightarrow j$.

By convention, we say that $i\leftrightarrow i$. It turns out that $\leftrightarrow$ is an equivalence relation on the state space $S$, meaning that

  1. $\forall i\in S,\ i \leftrightarrow i$
  2. $\forall i,j\in S,\ i\leftrightarrow j \Leftrightarrow j \leftrightarrow i$
  3. $\forall i,j,k \in S,\ i\leftrightarrow k,\ k\leftrightarrow j \Rightarrow i \leftrightarrow j$

This means that $\leftrightarrow$ partitions the state space $S$ into equivalence classes (i.e., classes of communicating states).

Definition 55

A Markov Chain is irreducible if $S$ is the only communicating class.

Definition 56

An irreducible Markov Chain is reversible if and only if there exists a probability vector $\pi$ that satisfies the Detailed Balance Equations

$$\forall i,j \in S,\ \pi_i p_{ij} = \pi_j p_{ji}$$

Markov Chains which satisfy the detailed balance equations are called reversible because if $X_0\sim \pi$, then the random vectors $(X_0, X_1, \dots, X_n)$ and $(X_n, X_{n-1}, \dots, X_0)$ are equal in distribution.
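For birth-death chains (where only adjacent states are connected), the detailed balance equations can be solved state by state. A minimal sketch on a hypothetical 4-state birth-death chain (the transition probabilities are illustrative, not from the text):

```python
import numpy as np

# Hypothetical birth-death chain on states 0..3: up-probability 0.4,
# down-probability 0.2, remainder on the self-loop.
P = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.2, 0.4, 0.4, 0.0],
              [0.0, 0.2, 0.4, 0.4],
              [0.0, 0.0, 0.2, 0.8]])

# Detailed balance pi_i p_{i,i+1} = pi_{i+1} p_{i+1,i} gives a
# recursion pi_{i+1} = pi_i * p_{i,i+1} / p_{i+1,i}; normalize last.
pi = np.ones(4)
for i in range(3):
    pi[i + 1] = pi[i] * P[i, i + 1] / P[i + 1, i]
pi /= pi.sum()

# A distribution satisfying detailed balance is also stationary.
assert np.allclose(pi @ P, pi)
```

Here the unnormalized masses double at each step ($1, 2, 4, 8$), so $\pi = (1/15, 2/15, 4/15, 8/15)$.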

Theorem 28

If the graph of a Markov Chain (obtained from the state transition diagram by making edges undirected, removing self-loops, and collapsing multiple edges) is a tree, then the Markov Chain is reversible.

Class Properties

A class property is a property where if one element of a class has the property, all elements of the class have the property. Markov Chains have several of these properties which allow us to classify states.

Definition 57

A state $i\in S$ is recurrent if given that $X_0=i$, the process revisits state $i$ with probability 1.

Definition 58

A state $i\in S$ is transient if it is not recurrent.

Recurrence means that we will visit a state infinitely often in the future if we start in that state, while transience means we will only visit the state finitely many times. Recurrence and transience can often be identified directly from the transition diagram.
  1. Any finite communicating class which has no edges leaving the class is recurrent.
  2. If a state has an edge leading outside its communicating class, then it is transient.
  3. If a state is recurrent, then any state it can reach is recurrent.

We can further break recurrence down if we modify the definition of hitting time to be $T_i = \min \{ n \geq 1 : X_n=i \}$ (the first time the chain enters state $i$ after time 0).

Definition 59

State $i$ is positive recurrent if it is recurrent and $\mathbb{E}\left[T_i|X_0=i\right]$ is finite.

Definition 60

State $i$ is null recurrent if it is recurrent and $\mathbb{E}\left[T_i|X_0=i\right]$ is infinite.

Positive recurrence means we visit a recurrent state so frequently that we spend a positive fraction of time in that state. Null recurrence means we visit a recurrent state so infrequently (but still infinitely many times) that we spend virtually no time in that state.

Theorem 29

Every irreducible finite state Markov Chain is positive recurrent.

Definition 61

For a state $i\in S$, we define the period of the state to be

$$\text{period}(i) = \text{GCD}\{n\geq 1 : p_{ii}^{(n)} > 0 \}.$$

If we start in state $i$, then revisits to $i$ can only occur at integer multiples of the period.

Definition 62

An irreducible Markov Chain is aperiodic if every state has period 1 (since period is a class property, it suffices to check a single state).

All of the above properties are class properties.

Long-Term Behavior of Markov Chains

Since the $p_{ij}$ completely characterize the Markov Chain, we can also describe what happens to the chain in the limit.

Definition 63

A probability distribution $\pi$ over the states is a stationary distribution if

$$\pi = \pi P$$

It is called a stationary distribution because the distribution over states is invariant with time: a Markov Chain is at stationarity if and only if it was started from the stationary distribution. The relationship $\pi = \pi P$ can be expanded for the $j$th element to show that any stationary distribution must satisfy the Global Balance Equations:

$$\pi_j = \sum_i p_{ij}\pi_i.$$
Note that if a distribution $\pi$ satisfies the detailed balance equations from Definition 56, then $\pi$ also satisfies Definition 63.

Both the global balance equations and the detailed balance equations can be conceptualized as statements of flow. If each $\pi_j$ indicates how much mass is placed on state $j$, then the global balance equations tell us that the mass leaving the node (going to each neighbor $i$ in proportion to $p_{ji}$) is equal to the mass entering the node (which must sum to $\pi_j$ since $\pi$ is a stationary distribution). Rather than looking at the flow of the whole chain, the detailed balance equations look at the flow between two states: the mass $i$ gives to $j$ is equal to the mass $j$ gives to $i$.

Theorem 30

If an irreducible Markov Chain is at stationarity, then the flow-in equals flow-out relationship holds for any cut of the Markov Chain where a cut is a partition of the chain into two disjoint subsets.
Theorem 30 is a useful result that can help solve for stationary distributions.

Theorem 31 (Big Theorem for Markov Chains)

Let $(X_n)_{n\geq 0}$ be an irreducible Markov Chain. Then exactly one of the following is true.

  1. Either all states are transient or all states are null recurrent; in this case no stationary distribution exists and $\lim_{n\to\infty}p_{ij}^{(n)} = 0$.
  2. All states are positive recurrent; in this case the stationary distribution exists, is unique, and satisfies

$$\pi_j = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n}p_{ij}^{(k)} = \frac{1}{\mathbb{E}\left[T_j|X_0=j\right]}.$$

If the Markov Chain is additionally aperiodic, then $\lim_{n\to\infty}p_{ij}^{(n)} = \pi_j$.
One consequence of Theorem 31 is that the stationary distribution $\pi$ of a reversible Markov Chain is unique. This makes solving the detailed balance equations a good technique for finding the stationary distribution. If a stationary distribution exists, then we can also say when the chain will converge to it.

Theorem 32 (Convergence Theorem)

If a chain is irreducible, positive recurrent, and aperiodic with stationary distribution $\pi$, then the distribution at time $n$ converges: $\pi_n \to \pi$.

Continuous Time Markov Chains

Definition 64

A process $(X_t)_{t\geq 0}$ taking values in a countable state space $S$ is a temporally homogeneous continuous time Markov Chain (CTMC) if it satisfies the Markov Property

$$\text{Pr}\left\{X_{t+\tau}=j|X_t=i,X_s=i_s, 0 \leq s < t\right\} = \text{Pr}\left\{X_{t+\tau}=j|X_t=i\right\} = \text{Pr}\left\{X_\tau = j | X_0 = i\right\}$$
To characterize how a CTMC functions, we need to define some additional quantities.
  1. $q_i$ is the transition rate of state $i$
  2. $p_{ij}$ is the transition probability between states $i$ and $j$

Every time a CTMC enters a state $i$, it will hold in that state for an $\text{Exp}(q_i)$ amount of time before transitioning to the next state $j$ with probability $p_{ij}$.

Definition 65

The jump chain is a DTMC which describes the transition probabilities between states in the CTMC.

Note that the jump chain cannot have self-loops ($p_{ii}=0$) because otherwise the amount of time spent in state $i$ would not be exponentially distributed. An alternative interpretation of a CTMC is
  1. Define jump rates $q_{ij} = q_i p_{ij}$.
  2. On entering state $i$, jump to $j^\star = \text{argmin}_j T_j$ where $T_j \sim \text{Exp}(q_{ij})$ for all $j\neq i$ are independent of each other.

Essentially, every time we enter a state, we set an alarm clock for all other states, and then jump to the state whose alarm clock goes off first. This equivalent interpretation allows us to summarize a CTMC using the rate matrix

$$Q_{ij} = \begin{cases} -q_i & \text{ if } i=j\\ q_{ij} & \text{ if } i \neq j \end{cases}$$
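The alarm-clock interpretation translates directly into a simulation. A minimal sketch on a hypothetical 3-state CTMC (the jump rates are chosen for illustration, not taken from the text):

```python
import random

# q[i][j] is the jump rate q_ij = q_i * p_ij for j != i.
q = {0: {1: 2.0, 2: 1.0},
     1: {0: 0.5, 2: 0.5},
     2: {0: 1.0, 1: 3.0}}

def step(i, rng):
    """From state i, draw an Exp(q_ij) clock for every j != i and
    jump to the state whose clock rings first."""
    clocks = {j: rng.expovariate(rate) for j, rate in q[i].items()}
    j = min(clocks, key=clocks.get)   # first alarm to go off
    return j, clocks[j]               # next state, holding time

rng = random.Random(0)
state, t = 0, 0.0
for _ in range(1000):
    state, hold = step(state, rng)
    t += hold
```

The minimum of the competing clocks is exactly $\text{Exp}(q_i)$ distributed, and the winner is $j$ with probability $q_{ij}/q_i = p_{ij}$, so this reproduces the hold-then-jump description above.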
Following from the first interpretation, all off-diagonal entries of $Q$ are non-negative, and each row must sum to 0. One useful quantity we can define is how long it takes to come back to a particular state.

Definition 66

The time to first re-entry of state $j$ is

$$T_j = \min \{t \geq 0: X_t=j \text{ and } X_s \neq j \text{ for some } s < t\}$$

Since a CTMC is essentially a DTMC where we hold in each state for an exponential amount of time, we can apply First Step Analysis in essentially the same way that we do for DTMCs. In fact, hitting probabilities will look exactly the same since we can just use the jump chain to compute the transition probabilities. The only differences arise when we consider time-dependent quantities. For hitting times (how long it takes to enter a set $A\subseteq S$),
  1. If $i\in A$, $\mathbb{E}\left[T_A|X_0=i\right] = 0$
  2. If $i \not\in A$, $\mathbb{E}\left[T_A|X_0=i\right] = \frac{1}{q_i} + \sum_{j\in S} p_{ij}\mathbb{E}\left[T_A|X_0=j\right]$

Class Properties

Just like in DTMCs, we can classify states in the CTMC.

Definition 67

States $i$ and $j$ communicate with each other if $i$ and $j$ communicate in the jump chain.

Definition 68

State $j$ is transient if, given $X_0=j$, the process enters $j$ finitely many times with probability 1. Otherwise, it is recurrent.

Definition 69

A state $j$ is positive recurrent if its expected time to first re-entry is finite, and null recurrent otherwise.

Long Term Behavior of CTMCs

CTMCs also have stationary distributions.

Definition 70

A probability vector $\pi$ is a stationary distribution for a CTMC with rate matrix $Q$ if

$$\pi Q = 0 \Leftrightarrow \pi_jq_j = \sum_{i\neq j}\pi_iq_{ij}.$$

The stationary distribution of the CTMC is also related to the jump chain, but we need to normalize for the holding times.

Theorem 33

If $\pi$ is a stationary distribution for a CTMC, then the stationary distribution of the jump chain is given by

$$\tilde{\pi}_i = \frac{\pi_i q_i}{\sum_j \pi_j q_j}$$

To describe how a CTMC behaves over time, first define $p_{ij}^{(t)} = \text{Pr}\left\{X_t=j|X_0=i\right\}$ and $m_j = \mathbb{E}\left[T_j|X_0=j\right]$.

Theorem 34 (Big Theorem for CTMCs)

For an irreducible CTMC, exactly one of the following is true.

  1. All states are transient or null recurrent, no stationary distribution exists, and $\lim_{t\to\infty}p_{ij}^{(t)} = 0$
  2. All states are positive recurrent, a unique stationary distribution exists, and the stationary distribution satisfies

$$\pi_j = \frac{1}{m_jq_j} = \lim_{t\to\infty}p_{ij}^{(t)}$$

Uniformization

Let $P^{(t)}$ denote the matrix of transition probabilities at time $t>0$. By the Markov property, we know that $P^{(s+t)} = P^{(s)}P^{(t)}$. For $h \approx 0$, $P^{(h)} = I + hQ + o(h)$. This approximation allows us to compute the derivative of $P^{(t)}$.

Theorem 35 (Forward Kolmogorov Equation)

$$\frac{\partial}{\partial t}P^{(t)} = \lim_{h\to 0}\frac{P^{(t+h)} - P^{(t)}}{h} = P^{(t)}Q$$
Theorem 35 tells us that the transition probabilities satisfy $P^{(t)} = e^{tQ}$ for all $t \geq 0$. This is why $Q$ is sometimes called the generator matrix: it generates the transition probabilities. However, matrix exponentials are difficult to compute. Instead, we can turn to Uniformization, which allows us to estimate $P^{(t)}$ by simulating it through a DTMC.

Definition 71

Given a CTMC where $\exists M$ such that $q_{i} \leq M$ for all $i\in S$, fix a $\gamma \geq M$. The uniformized chain is then a DTMC with transition probabilities $p_{ij} = \frac{q_{ij}}{\gamma}$ for $i\neq j$ and $p_{ii} = 1 - \frac{q_i}{\gamma}$. In matrix form,

$$P_u = I + \frac{1}{\gamma}Q.$$
It turns out that $P_u^n = \left( I + \frac{1}{\gamma}Q \right)^n \approx e^{\frac{n}{\gamma}Q}$ when $\frac{1}{\gamma}$ is small. This means that we can approximate the transition probabilities of the CTMC using the uniformized chain. Observe that uniformization also helps in finding the stationary distribution, since the stationary distribution of the uniformized chain is identical to that of the original chain:

$$\pi P_u = \pi + \frac{1}{\gamma}\pi Q = \pi \Leftrightarrow \pi Q = 0.$$
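A numerical sketch of the approximation $P^{(t)} \approx P_u^{\gamma t}$, on a hypothetical 3-state rate matrix (numbers chosen for illustration); the matrix exponential is computed here by a truncated Taylor series for comparison:

```python
import numpy as np

# Hypothetical rate matrix: off-diagonals non-negative, rows sum to 0.
Q = np.array([[-2.0, 2.0, 0.0],
              [1.0, -3.0, 2.0],
              [0.5, 0.5, -1.0]])
t = 1.0
gamma = 10000.0                  # gamma >= max_i q_i = 3
n = int(gamma * t)

# Uniformized chain and its n-step transition matrix.
Pu = np.eye(3) + Q / gamma
approx = np.linalg.matrix_power(Pu, n)

# Reference: e^{tQ} via a truncated Taylor series sum_k (tQ)^k / k!.
expm = np.zeros_like(Q)
term = np.eye(3)
for k in range(1, 60):
    expm += term
    term = term @ (t * Q) / k

assert np.allclose(approx, expm, atol=1e-2)
```

Larger $\gamma$ tightens the approximation at the cost of more DTMC steps, since the error of $(I + Q/\gamma)^{\gamma t}$ shrinks as $1/\gamma$.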

Poisson Processes

Definition 72

A counting process $(N_t)_{t\geq 0}$ is a non-decreasing, continuous time, integer valued random process which has right continuous sample paths.

There are two important quantities which describe counting processes.

Definition 73

The $i$th arrival time $T_i$ is given by

$$T_i = \min \{ t \geq 0: \ N_t \geq i \}$$

Definition 74

The $i$th inter-arrival time $S_i$ is given by

$$S_i = T_i - T_{i-1},\quad i > 0$$

Definition 75

A rate $\lambda$ Poisson Process is a counting process with independent and identically distributed inter-arrival times $S_i \sim \text{Exp}(\lambda)$.

The name Poisson comes from the distribution of each variable in the process.

Theorem 36

If $(N_t)_{t\geq 0}$ is a rate $\lambda$ Poisson Process, then for each $t\geq 0$, $N_t\sim \text{Poisson}(\lambda t)$.

A Poisson Process is a special case of a CTMC where the transition rates $q_i = \lambda$ and the transition probabilities $p_{ij}$ are 1 if $j=i+1$ and 0 otherwise. Since the inter-arrival times are memoryless and i.i.d., Poisson Processes have many useful properties.

Theorem 37

If $(N_t)_{t\geq 0}$ is a rate $\lambda$ Poisson Process, then $(N_{t+s} - N_s)_{t\geq0}$ is also a rate $\lambda$ Poisson Process for all $s \geq 0$ and is independent of the process up to time $s$.

Theorem 38

For $t_0 < t_1 <\ldots< t_k$, the increments of a rate $\lambda$ Poisson Process

$$(N_{t_1} - N_{t_0}), (N_{t_2} - N_{t_1}),\ldots,(N_{t_k} - N_{t_{k-1}})$$

are independent, and $N_{t_i} - N_{t_{i-1}} \sim \text{Poisson}(\lambda(t_i - t_{i-1}))$.

Poisson Processes are the only counting processes with these particular properties. It turns out that Poisson Processes can also be connected with the order statistics of uniform random variables.

Theorem 39 (Conditional Distribution of Arrivals)

Conditioned on $N_t = n$, the random vector $(T_1, T_2, \dots, T_n)$ has the same distribution as the order statistics of $n$ i.i.d. random variables $U_i\sim \text{Uniform}(0, t)$.

What Theorem 39 says is that given that $n$ arrivals occur up to time $t$, the distribution of the arrival times is equivalent to taking $n$ i.i.d. uniform random variables and sorting them.

Two other useful properties of Poisson Processes involve combining and separating them.

Theorem 40 (Poisson Merging)

If $N_{1,t}$ and $N_{2,t}$ are independent Poisson Processes with rates $\lambda_1$ and $\lambda_2$, then $N_{1, t} + N_{2,t}$ is a Poisson Process with rate $\lambda_1+\lambda_2$.

Theorem 41 (Poisson Splitting)

Let $p(x)$ be a probability distribution and $N_t$ be a rate $\lambda$ Poisson Process. If each arrival is independently marked with the label $i$ with probability $p(x=i)$, then $N_{i,t}$, the process counting the number of arrivals labeled $i$, is a Poisson Process with rate $\lambda p(x=i)$, and the processes for different labels are independent.
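Poisson splitting can be checked by simulation. A sketch with two labels, where the rate $\lambda = 3$, label probability $p = 0.25$, and horizon are illustrative choices:

```python
import random

lam, p, t_max = 3.0, 0.25, 2000.0
rng = random.Random(2)

# Generate a rate-lam Poisson Process and label each arrival 1 with
# probability p, otherwise 2.
n1 = n2 = 0
t = rng.expovariate(lam)
while t <= t_max:
    if rng.random() < p:
        n1 += 1
    else:
        n2 += 1
    t += rng.expovariate(lam)

# Theorem 41: n1 ~ Poisson(lam * p * t_max) and
# n2 ~ Poisson(lam * (1 - p) * t_max), with means 1500 and 4500 here.
```

The counts land near their predicted means, consistent with the split processes being Poisson with the thinned rates.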