# Random Processes

The random variables in a stochastic process do not have to be independently and identically distributed. In fact, if they are not, then we can get additional modeling power.

Stationarity is often a good assumption that can simplify systems which have been running for a long period of time.

In words, a Markov Chain is a sequence of random variables satisfying the Markov Property where probability of being in a state during the next time step only depends on the current state.

Temporally Homogenous Markov Chains don’t change their transition probabilities over time. Since the

$p_{ij}$

are conditional probabilities, they must satisfy- 1.$\forall i,j\in S,\ p_{ij} \geq 0$
- 2.$\forall i\in S,\ \sum_{j\in S}p_{ij} = 1$

The transition matrix encodes the one-step transition probabilities of the Markov Chain.

One useful thing we can comptue with Markov Chain is when the chain first enters a particular state.

Computing the expected hitting time is an example of a broader type of Markov Chain Analysis called

**First Step Analysis**. In First Step Analysis, we set up a system of equations that relies on the Markov property to generate a system of equations that only look at the first transition in the chain. For expected hitting time, these look like- 1.For$i\not\in A$,$\mathbb{E}\left[T_A|X_0 = i\right] = 1 + \sum_j p_{ij} \mathbb{E}\left[T_A|X_0 = j\right]$
- 2.For$i\in A$,$\mathbb{E}\left[T_A|X_0 = i\right] = 0$

By convention, we say that

$i\leftrightarrow i$

. It turns out that $\leftrightarrow$

is an equivalence relation on the state space $S$

. An equivalence relation means that- 1.$\forall i\in S,\ i \leftrightarrow i$
- 2.$\forall i,j\in S,\ i\leftrightarrow j \Leftrightarrow j \leftrightarrow i$
- 3.$\forall i,j,k \in S, i\leftrightarrow k, k\leftrightarrow j \mathbb{R}ightarrow i \leftrightarrow j$

This means that

$\leftrightarrow$

partitions the state-space $S$

into equivalence classes (i.e classes of communicating states).Markov Chains which satisfy the detailed balance equations are called reversible because if

$X_0\sim \pi$

, then the random vectors $(X_0, X_1, \cdots, X_n)$

and $(X_n, X_{n-1}, \cdots, X_0)$

are equal in distribution.A class property is a property where if one element of a class has the property, all elements of the class have the property. Markov Chains have several of these properties which allow us to classify states.

Recurrence means that we will visit a state infinitely often in the future if we start in that state, while transience means we will only visit the state finitely many times. Recurrence and transience can be easily identified from the transition diagram.

- 1.Any finite communicating class which has no edges leaving the class is recurrent
- 2.If a state has an edge leading outside its communicating class, then it is transient
- 3.If a state is recurrent, then any state it can reach is recurrent

We can further break recurrence down if we modify the definition of hitting time to be

$T_i = \min_n \{ n \geq 1 : X_n=i \}$

(the first time the chain enters state $i$

).Positive recurrence means we visit a recurrent state so frequently that we spend a positive fraction of time in that state. Null recurrencce means we visit a recurrent state so infrequently (but still infinitely many times) that we spend virtually no time in that state.

If we start in state

$i$

, then revists to $i$

only occur at integer multiples of the period.All of the above properties are class properties.

Since the

$p_{ij}$

completely characterize the Markov Chain, we can also describe what happens to the chain in the limit.It is called a stationary distribution because the distribution over states is invariant with time. A Markov Chain is only at stationarity if and only if it has been started from the stationary distribution. The relationship

$\pi = \pi P$

can be expanded for the jth element to show that any stationary distribution must satisfy the **Global Balance Equations**:

$\pi_j = \sum_i p_{ij}\pi_i.$

Note that if a distribution

$\pi$

satisfies the detailed balance equations from Definition 56, then $\pi$

also satisfies Definition 63.Both the global balance equations and detailed balance equations can be conceptualized as statements of flow. If each

$\pi_j$

indicates how much mass is placed on state $j$

, then the global balance equations tell us the mass leaving the node (going to each neighbor $i$

in proportion to $p_{ij}$

) is equal to the mass entering the node (which must sum to $\pi_j$

since it is a stationary distribution. Rather than looking at the flow of the whole chain, the detailed balance equations look at the flow between two states. The mass $i$

gives to $j$

is equal to the mass $j$

gives to $i$

.Theorem 30 is one useful result can help solve for stationary distributions.

Let

$(X_n)_{n\geq 0}$

be an irreducible Markov Chain. Then one of the following is true.- 1.Either all states are transient, or all states are null recurrent, and no stationary distribution exists, and$\lim_{n\to\infty}p_{ij}^{(n)} = 0$.
- 2.All states are positive recurrent and the stationary distribution exists, is unique, and satisfies

$\pi_j = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n}P_{ij}^{(k)} = \frac{1}{\mathbb{E}\left[T_j|X_0=j\right] }.$

If the Markov Chain is aperiodic, then

$\lim_{n\to\infty}p_{ij}^{(n)} = \pi_j$

One consequence of Theorem 31 is that it means the stationary distribution

$\pi$

of a reversible Markov Chain is unique. This makes solving the detailed balance equations a good technique of solving for the stationary distribution. If a stationary distribution exists, then we can also say when the chain will converge to the stationary distribution.A process

$(X_t)_{t\geq 0}$

taking values in a countable state space $S$

is a temporally homogenous continuous time markov chain if it satisfies the Markov Property

$\text{Pr}\left\{X_{t+\tau}=j|X_t=i,X_s=i_s, 0 \leq s \leq t\right\} = \text{Pr}\left\{X_{t+\tau}=j|X_t=i\right\} = \text{Pr}\left\{X_\tau = j | X_0 = i\right\}$

To characterize how a CTMC functions, we need to define some additional quantities.

- 1.$q_i$is the transition rate of state$i$
- 2.$p_{ij}$is the transition probability bewteen states$i$and$j$

Every time a CTMC enters a state

$i$

, it will hold in that state for $\text{Exp}(q_i)$

time before transitioning to the next state $j$

with probability $p_{ij}$

.Note that the jump chain cannot have self-loops (

$p_{ii}=0$

) because otherwise the amount of time spent in state $i$

would not be exponentially distributed. An alternative interpretation of a CTMC is- 1.Define jump rates$q_{ij} = q_i p_{ij}$
- 2.On entering state$i$, jump to$j^\star = \text{argmin}_j T_j$where$T_j \sim \text{Exp}(q_{ij})$for all$j\neq i$and are independent from each other.

Essentially, every time we enter a state, we set an alarm clock for all other states, and then jump to the state whose alarm clock goes off first. This equivalent interpretation allows us to summarize a CTMC using the rate matrix.

$Q_{ij} = \begin{cases} -q_i & \text{ if } i=j\\ q_{ij} & \text{ if } i \neq j \end{cases}$

Following from the first interprentation, all entries of

$Q$

are non-negative, and the rows must sum to 0. One useful quantity which we can define is how long it takes to come back to a particular state.Since a CTMC is essentially a DTMC where we hold in each state for an exponential amount of time, we can apply First Step Analysis in essentially the same way that we do for DTMCs. In fact, hitting probabilities will look exactly the same since we can just use the jump chain to comute the transition probabilities. The only differences will arise when we consider the time dependent quantities. For hitting times (how long it takes to enter a state from

$A\subseteq S$

),- 1.If$i\in A, \mathbb{E}\left[T_A|X_0=i\right] = 0$
- 2.If$i \not \in A, \mathbb{E}\left[T_A|X_0=i\right] = \frac{1}{q_i} + \sum_{j\in S} p_{ij}\mathbb{E}\left[T_A|X_0=j\right]$

Just like in DTMCs, we can classify states in the CTMC.

CTMCs also have stationary distributions.

The stationary distribution of the CTMC is also related to the jump chain, but we need to normalize for the hold times.

To describe how a CTMC behaves over time, first define

$p_{ij}^{(t)} = \text{Pr}\left\{X_t=j|X_0=i\right\}$

and $m_j = \mathbb{E}\left[T_j|X_0=j\right]$

.For an irreducible CTMC, exactly one of the following is true.

- 1.All states are transient or null recurrent, no stationary distribution exists, and$\lim_{t\to\infty}p_{ij}^{(t)} = 0$
- 2.All states are positive recurrent, a unique stationary distribution exists, and the stationary distribution satisfies

$\pi_j = \frac{1}{m_jq_j} = \lim_{t\to\infty}p_{ij}^{(t)}$

Let

$P^{(t)}$

denote the matrix of transition probabiltiies at time $t>0$

. By the Markov property, we know that $P^{(s+t)} = P^{(s)}P^{(t)}$

. For $h \approx 0, P^{(h)} \approx I + hQ + o(h)$

. This approximation allows us to compute the derivative of $P^{(t)}$

.Theorem 35 tells us that the transition probabilties

$P^{(t)} = e^{tQ}$

for all $t \geq 0$

. This is why Q is sometimes called the generator matrix: it generates the transition probabilities. However, matrix exponentials are difficult to compute. Instead, we can turn to **Uniformization**, which allows us to estimate$P^{(t)}$

by simulating it through a DTMC.It turns out that

$P_u^n = \left( I + \frac{1}{\gamma}Q \right)^n \approx e^{\frac{n}{\gamma}Q}$

when

$\frac{1}{\gamma}$

is small. This means that we can approximate the transition probabilties of the CTMC using the uniformized chain. Observe that uniformization also helps in finding the stationary distribution since the stationary distribution of the uniformized chain is identical to the original chain.

$\pi P_u = \pi + \frac{1}{\gamma}\pi Q = \pi \Leftrightarrow \pi Q = 0.$

There are two important metrics which describe counting processes.

The name Poisson comes from the distribution of each varible in the process.

A Poisson Process is a special case of a CTMC where the transition rates

$q_i = \lambda$

and the transition probabilties $p_{ij}$

are 1 if $j=i+1$

and 0 otherwise. Since the inter-arrival times are memoryless and i.i.d, Poisson Processes have many useful properties.Poisson Processes are the only counting processes with these particular properties.

It turns out that Poisson Processes can be connected with the Order Statistics of Uniform Random Variables.

What Theorem 39 says is that given

$n$

arrivals up to time $t$

occur, the distribution of arrival times is equivalent to taking $n$

i.i.d uniform random variables and sorting them.Two other useful properties of Poisson Processes involve combining and separating them.

Let

$p(x)$

be a probability distribution and $N_t$

be a rate $\lambda$

Poisson process. If each arrival is marked with the label $i$

independently with probability $p(x=i)$

, then $N_{i,t}$

, the process counting the number of arrivals labeled $i$

is an independent Poisson Process with rate $\lambda p_i$

.Last modified 6mo ago