# Concentration

In real life, for the most part, we can’t compute probabilities in closed form. Instead, we either bound them, or we want to show that

$P(A) \approx 0$

or $P(A) \approx 1$

.Intuitively, Theorem 18 gives gives a “better” bound than Theorem 17 because it incorporates the variance of the random variable. Using this idea, we can define an even better bound that incorporates information from all moments of the random variable.

After computing the Chernoff bound for a general

$t$

, we can then optimize over it to compute the best bound possible.The idea of convergence brings the mathematical language of limits into probability. The fundamental question we want to answer is given random variables

$X_1, X_2, \cdots$

, what does it mean to compute

$\lim_{n\to\infty}X_n.$

This question is not as straightforward as it seems because random variables are functions, and there are many ways to define the convergence of functions.

One result of almost sure convergence deals with deviations around the mean of many samples.

The strong law tells us that for any observed realization, there is a point after which there are no deviations from the mean.

Convergence in probability can help us formalize the intuition that we have which says probability is the frequency with which an even happens over many trials of an event.

It tells us that the probability of a deviation of

$\epsilon$

from the true mean will go to 0 in the limit, but we can still observe these deviations. Nevertheless, the weak law helps us formalize our intuition about probability. If $X_1, X_2, \cdots, X_n$

are independently and identically distributed according to $X$

, then we can define the empirical frequency

$F_n = \frac{\sum\mathbb{1}_{X_i\in B}}{n} \implies \mathbb{E}\left[F_n\right] = P(X \in B).$

By Theorem 20,

$\lim_{n\to\infty}\text{Pr}\left\{|F_n - P(X\in B)| > \epsilon\right\} = 0,$

meaning over many trials, the empirical frequency is equal to the probility of the event, matching intuition.

An example of convergence in distribution is the central limit theorem.

If

$X_1, X_2, \cdots$

are independently and identically distributed according to $X$

with $\text{Var}\left(X\right) = \sigma^2$

and $\mathbb{E}\left[X\right] = \mu$

, then

$\lim_{n\to\infty}P\left(\frac{\sum_{i=1}^nX_i - n\mu}{\sigma\sqrt{n}} \leq x\right) = \Phi(x)$

In other words, a sequence of random variables converges in distribution to a normal distribution with variance

$\sigma^2$

and mean $\mu$

.These notions of convergence are not identical, and they do not necessarily imply each other. It is true that almost sure convergence implies convergence in probability, and convergence in probability implies convergence in distribution, but the implication is only one way.

Once we know how a random variable converges, we can then also find how functions of that random variable converge.

Last modified 11mo ago