Concentration

In real life, for the most part, we can't compute probabilities in closed form. Instead, we either bound them, or we want to show that $P(A) \approx 0$ or $P(A) \approx 1$.

Concentration Inequalities

Theorem 17 (Markov's Inequality)

For a non-negative random variable $X$,

$$\text{Pr}\left\{X \geq t\right\} \leq \frac{\mathbb{E}\left[X\right]}{t}, \quad t > 0.$$

Theorem 18 (Chebyshev's Inequality)

If $X$ is a random variable with finite variance, then for any $t > 0$,

$$\text{Pr}\left\{|X - \mathbb{E}\left[X\right]| \geq t\right\} \leq \frac{\text{Var}\left(X\right)}{t^2}.$$
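For a quick comparison, suppose $X \sim \text{Exponential}(1)$, so that $\mathbb{E}\left[X\right] = \text{Var}\left(X\right) = 1$. For the tail event $\{X \geq 10\}$, Markov gives

$$\text{Pr}\left\{X \geq 10\right\} \leq \frac{1}{10},$$

while Chebyshev gives

$$\text{Pr}\left\{X \geq 10\right\} \leq \text{Pr}\left\{|X - 1| \geq 9\right\} \leq \frac{1}{81} \approx 0.012.$$

The exact tail is $e^{-10} \approx 4.5 \times 10^{-5}$, so both bounds are loose, but the second is much tighter.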

Intuitively, Theorem 18 gives a “better” bound than Theorem 17 because it incorporates the variance of the random variable. Using this idea, we can define an even better bound that incorporates information from all moments of the random variable.

Definition 36 (Chernoff Bound)

For a random variable $X$, a constant $a\in\mathbb{R}$, and any $t > 0$,

$$\text{Pr}\left\{X \geq a\right\} \leq \frac{\mathbb{E}\left[e^{tX}\right]}{e^{ta}} = e^{-ta}M_X(t).$$

After computing the Chernoff bound for a general $t$, we can then minimize over $t$ to obtain the tightest possible bound.
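As a rough numerical sketch (reusing the illustrative assumption $X \sim \text{Exponential}(1)$ from above, where $M_X(t) = \frac{1}{1-t}$ for $t < 1$ and the minimizing choice works out to $t = 1 - \frac{1}{a}$), the Python snippet below compares the three bounds on $\text{Pr}\left\{X \geq 10\right\}$ against the exact tail, minimizing the Chernoff bound over a grid of $t$:

```python
import numpy as np

# Minimal sketch, assuming X ~ Exponential(1): E[X] = 1, Var(X) = 1, and
# M_X(t) = 1 / (1 - t) for t < 1. Compare Markov, Chebyshev, and the
# optimized Chernoff bound on P(X >= a) with the exact tail probability.
a = 10.0
mean, var = 1.0, 1.0

markov = mean / a                                 # Theorem 17
chebyshev = var / (a - mean) ** 2                 # Theorem 18 applied to |X - E[X]| >= a - E[X]
ts = np.linspace(1e-3, 1 - 1e-3, 10_000)          # grid over valid t in (0, 1)
chernoff = np.min(np.exp(-ts * a) / (1 - ts))     # Definition 36, minimized over t (optimum near t = 0.9)
exact = np.exp(-a)                                # true tail of Exponential(1)

print(f"Markov    : {markov:.3e}")
print(f"Chebyshev : {chebyshev:.3e}")
print(f"Chernoff  : {chernoff:.3e}")
print(f"Exact     : {exact:.3e}")
```

With $a = 10$ this prints roughly $10^{-1}$, $1.2\times 10^{-2}$, $1.2\times 10^{-3}$, and $4.5\times 10^{-5}$, showing how each bound tightens the previous one.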

Convergence

The idea of convergence brings the mathematical language of limits into probability. The fundamental question we want to answer is: given random variables $X_1, X_2, \cdots$, what does it mean to compute

$$\lim_{n\to\infty}X_n?$$

This question is not as straightforward as it seems because random variables are functions, and there are many ways to define the convergence of functions.

Definition 37 (Almost Sure Convergence)

A sequence of random variables $X_1, X_2, \cdots$ converges almost surely to $X$ if

$$P\left(\lim_{n\to \infty}X_n = X\right) = 1.$$

One result of almost sure convergence deals with deviations around the mean of many samples.

Theorem 19 (Strong Law of Large Numbers)

If $X_1, X_2, \cdots, X_n$ are independently and identically distributed according to $X$ with $\mathbb{E}\left[|X|\right] < \infty$, then $\frac{1}{n}\sum_i X_i$ converges almost surely to $\mathbb{E}\left[X\right]$.

The strong law tells us that for almost every observed realization, there is a point after which the sample mean never again deviates from the true mean by more than any fixed $\epsilon > 0$.
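To see this concretely, here is a minimal simulation sketch, assuming $X \sim \text{Bernoulli}(0.3)$ so that $\mathbb{E}\left[X\right] = 0.3$: along a single realization, the running sample mean settles at the true mean.

```python
import numpy as np

# Minimal simulation sketch, assuming X ~ Bernoulli(0.3) (so E[X] = 0.3):
# along a single realization, the running sample mean M_n = (1/n) * sum X_i
# settles at E[X] and stays there, which is what Theorem 19 describes.
rng = np.random.default_rng(0)
p = 0.3
x = rng.random(1_000_000) < p                          # one realization of i.i.d. Bernoulli(p)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,d}   M_n = {running_mean[n - 1]:.4f}   "
          f"|M_n - p| = {abs(running_mean[n - 1] - p):.4f}")
```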

Definition 38 (Convergence in Probability)

A sequence of random variables $X_1, X_2, \cdots$ converges in probability to $X$ if

$$\forall \epsilon > 0, \quad \lim_{n\to\infty}P(|X_n - X| > \epsilon) = 0.$$

Convergence in probability helps formalize the intuition that probability is the frequency with which an event occurs over many trials.

Theorem 20 (Weak Law of Large Numbers)

Let $X_1, X_2, \cdots, X_n$ be independently and identically distributed according to $X$, and let $M_n = \frac{1}{n}\sum_i X_i$. Then for any $\epsilon > 0$,

$$\lim_{n\to\infty} \text{Pr}\left\{|M_n - \mathbb{E}\left[X\right]| > \epsilon\right\} = 0.$$
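Below is a minimal simulation sketch of Theorem 20, again assuming $X \sim \text{Bernoulli}(0.3)$: it estimates $\text{Pr}\left\{|M_n - 0.3| > \epsilon\right\}$ from many independent batches and shows it shrinking as $n$ grows.

```python
import numpy as np

# Minimal simulation sketch, assuming X ~ Bernoulli(0.3) (so E[X] = 0.3):
# estimate Pr{|M_n - E[X]| > eps} from many independent batches of size n.
# The weak law (Theorem 20) says this probability should vanish as n grows.
rng = np.random.default_rng(0)
p, eps, trials = 0.3, 0.02, 2_000

for n in (100, 1_000, 10_000):
    batches = rng.random((trials, n)) < p           # independent runs of length n
    m_n = batches.mean(axis=1)                      # sample mean M_n for each run
    deviation_prob = np.mean(np.abs(m_n - p) > eps)
    print(f"n = {n:>6,d}   estimated Pr{{|M_n - p| > {eps}}} = {deviation_prob:.4f}")
```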

Theorem 20 tells us that the probability of a deviation of $\epsilon$ from the true mean will go to 0 in the limit, but we can still observe such deviations along the way. Nevertheless, the weak law helps us formalize our intuition about probability. If $X_1, X_2, \cdots, X_n$ are independently and identically distributed according to $X$, then for any set $B$ we can define the empirical frequency of the event $\{X \in B\}$ as

$$F_n = \frac{\sum_i\mathbb{1}_{X_i\in B}}{n} \implies \mathbb{E}\left[F_n\right] = P(X \in B).$$

By Theorem 20,

$$\lim_{n\to\infty}\text{Pr}\left\{|F_n - P(X\in B)| > \epsilon\right\} = 0,$$

meaning that over many trials, the empirical frequency converges to the probability of the event, matching our intuition.

Definition 39 (Convergence in Distribution)

A sequence of random variables $X_1, X_2, \cdots$ converges in distribution to $X$ if

$$\lim_{n\to\infty}F_{X_n}(x) = F_X(x)$$

at every point $x$ where $F_X$ is continuous.

An example of convergence in distribution is the central limit theorem.

Theorem 21 (Central Limit Theorem)

If $X_1, X_2, \cdots$ are independently and identically distributed according to $X$ with $\mathbb{E}\left[X\right] = \mu$ and $\text{Var}\left(X\right) = \sigma^2 < \infty$, then

$$\lim_{n\to\infty}P\left(\frac{\sum_{i=1}^nX_i - n\mu}{\sigma\sqrt{n}} \leq x\right) = \Phi(x).$$

In other words, the standardized sum converges in distribution to a standard normal distribution; equivalently, for large $n$, $\sum_{i=1}^n X_i$ is approximately normal with mean $n\mu$ and variance $n\sigma^2$.

These notions of convergence are not equivalent. Almost sure convergence implies convergence in probability, and convergence in probability implies convergence in distribution, but neither implication can be reversed in general.

Once we know how a sequence of random variables converges, we can also determine how functions of those random variables converge.

Theorem 22 (Continuous Mapping Theorem)

If $f$ is a continuous function and $X_n$ converges to $X$, then $f(X_n)$ converges to $f(X)$, where the convergence can be almost sure, in probability, or in distribution.
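Finally, here is a minimal simulation sketch of Theorem 21, assuming $X_i \sim \text{Uniform}(0,1)$ so that $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$: it compares the empirical CDF of the standardized sum with $\Phi$ at a few points.

```python
import math
import numpy as np

# Minimal simulation sketch, assuming X_i ~ Uniform(0, 1), so mu = 1/2 and
# sigma^2 = 1/12. For each trial we form the standardized sum from Theorem 21
# and compare its empirical CDF with the standard normal CDF Phi.
rng = np.random.default_rng(0)
n, trials = 50, 200_000
mu, sigma = 0.5, math.sqrt(1 / 12)

samples = rng.random((trials, n))
z = (samples.sum(axis=1) - n * mu) / (sigma * math.sqrt(n))   # standardized sums

for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = np.mean(z <= x)
    phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))              # Phi(x) via the error function
    print(f"x = {x:+.1f}   P(Z_n <= x) ~ {empirical:.4f}   Phi(x) = {phi:.4f}")
```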