Statistical Inference

Suppose we have a variable $X$ (which may or may not be a random variable) that represents the state of nature. We observe a variable $Y$, which is obtained by some model of the world $P_{Y|X}$.

Figure 2: Inference Setup

Suppose we know *a priori* that $X\sim \pi$, where $\pi$ is a probability distribution. If we observe $Y=y$, then the posterior distribution of $X$ is given by Bayes' Rule:

$\text{Pr}\left\{X=x | Y=y\right\} = \frac{P_{Y|X}(y|x)\pi(x)}{\sum_{\tilde{x}}P_{Y|X}(y|\tilde{x})\pi(\tilde{x})} \propto P_{Y|X}(y|x)\pi(x).$
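As a quick numerical sketch (the model and prior below are made-up numbers, not from the text), the posterior is computed by multiplying each likelihood by the prior and normalizing:

```python
import numpy as np

# Hypothetical discrete model: X takes values {0, 1, 2}, Y takes values {0, 1}.
prior = np.array([0.5, 0.3, 0.2])        # pi(x)
P_Y_given_X = np.array([[0.9, 0.1],      # P(y | x = 0)
                        [0.5, 0.5],      # P(y | x = 1)
                        [0.2, 0.8]])     # P(y | x = 2)

def posterior(y):
    """Pr{X = x | Y = y} for every x, via Bayes' Rule."""
    unnormalized = P_Y_given_X[:, y] * prior   # P(y|x) * pi(x)
    return unnormalized / unnormalized.sum()   # divide by the sum over x-tilde

print(posterior(1))  # proportional to [0.05, 0.15, 0.16]
```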

Since the posterior is proportional to the product of the model and the prior, we don't actually need to compute the normalizing constant to figure out the most likely $X$.

Definition 77

The Maximum A Posteriori (MAP) estimate is given by

$\hat{X}_{MAP}(y) = \text{argmax}_x P_{Y|X}(y|x)\pi(x)$

If we have no prior information on $X$, then we can assume $\pi$ is uniform, which reduces Definition 77 to optimizing only over the model.

Definition 78

The Maximum Likelihood (ML) estimate is given by

$\hat{X}_{ML}(y) = \text{argmax}_x P_{Y|X}(y|x)$
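Continuing in a hypothetical discrete setting (made-up numbers), MAP and ML differ only in whether the prior enters the argmax, so with a skewed prior the two estimates can disagree:

```python
import numpy as np

# Hypothetical model: X in {0, 1, 2}, Y in {0, 1}, with a skewed prior.
prior = np.array([0.7, 0.2, 0.1])
P = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])   # P[x, y] = P(y | x)

def x_map(y):
    return int(np.argmax(P[:, y] * prior))  # argmax_x P(y|x) pi(x)

def x_ml(y):
    return int(np.argmax(P[:, y]))          # same as MAP with a uniform prior

print(x_map(1), x_ml(1))  # → 1 2: the prior pulls the MAP estimate toward x = 1
```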

Binary Hypothesis Testing

Definition 79

A Binary Hypothesis Test is a type of statistical inference where the unknown variable $X\in\{ 0, 1 \}$.

Since there are only two possible values of $X$ in a binary test, there are two “hypotheses” that we have, and we want to accept the more likely one.

The Null Hypothesis $H_0$ says that $Y\sim P_{Y|X=0}$.

Definition 81

The Alternate Hypothesis $H_1$ says that $Y\sim P_{Y|X=1}$.

With two possible hypotheses, there are two kinds of errors we can make.

Definition 82

A Type I error (false positive) is when we incorrectly reject the null hypothesis. The Type I error probability is then

$\text{Pr}\left\{\hat{X}(Y) = 1 | X = 0\right\}$

Definition 83

A Type II error (false negative) is when we incorrectly accept the null hypothesis. The Type II error probability is then

$\text{Pr}\left\{\hat{X}(Y) = 0 | X = 1\right\}$
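To make Definitions 82 and 83 concrete, both error probabilities of a decision rule can be computed by summing the appropriate conditional distribution over the observations the rule misclassifies (the model and rule below are hypothetical examples):

```python
import numpy as np

# Hypothetical conditionals: Y takes values {0, 1, 2}.
P0 = np.array([0.7, 0.2, 0.1])   # P(y | X = 0)
P1 = np.array([0.1, 0.3, 0.6])   # P(y | X = 1)

def x_hat(y):
    return 1 if y >= 1 else 0    # an arbitrary decision rule, for illustration

type1 = sum(P0[y] for y in range(3) if x_hat(y) == 1)  # Pr{X_hat = 1 | X = 0}
type2 = sum(P1[y] for y in range(3) if x_hat(y) == 0)  # Pr{X_hat = 0 | X = 1}
print(round(float(type1), 3), round(float(type2), 3))  # → 0.3 0.1
```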

Our goal is to create a decision rule $\hat{X}: \mathcal{Y} \to \{0, 1\}$ that we can use to predict $X$. Based on what the decision rule is used for, there will be requirements on how large the probabilities of Type I and Type II errors can be. We can formulate the search for a hypothesis test as an optimization: for some $\beta \in [0, 1]$, we want to find

$\hat{X}_\beta(Y) = \text{argmin} \text{Pr}\left\{\hat{X}(Y)=0 | X=1\right\} \quad : \quad \text{Pr}\left\{\hat{X}(Y)=1|X=0\right\} \leq \beta. \qquad (1)$

Intuitively, our test should depend on $p_{Y|X}(y|1)$ and $p_{Y|X}(y|0)$, since these quantities tell us how likely our observations are under each possible ground truth. We can define a ratio that formally compares these two quantities.

Definition 84

The likelihood ratio is given by

$L(y) = \frac{p_{Y|X}(y|1)}{p_{Y|X}(y|0)}$

Notice that we can write the ML estimate as a threshold on the likelihood ratio, since if $L(y) \geq 1$, then we say $X=1$, and vice versa. However, there is no particular reason that $1$ must always be the number at which we threshold our likelihood ratio, so we can generalize this idea to form different kinds of tests.

Definition 85

For some threshold $c$ and randomization probability $\gamma$, a threshold test is of the form

$\hat{X}(y) = \begin{cases} 1 & \text{ if } L(y) > c\\ 0 & \text{ if } L(y) < c\\ \text{ Bernoulli}(\gamma) & \text { if } L(y) = c. \end{cases}$

MAP fits into the framework of a threshold test since we can write

$\hat{X}_{MAP} = \begin{cases} 1 & \text{ if } L(y) \geq \frac{\pi_0}{\pi_1}\\ 0 & \text{ if } L(y) < \frac{\pi_0}{\pi_1} \end{cases}$
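A minimal sketch of the threshold test in Definition 85, assuming the likelihood ratio $L(y)$ has already been evaluated (the function name and interface here are illustrative):

```python
import random

def threshold_test(L_y, c, gamma, rng=random.Random(0)):
    """Randomized threshold test on the likelihood ratio L(y)."""
    if L_y > c:
        return 1
    if L_y < c:
        return 0
    return int(rng.random() < gamma)   # Bernoulli(gamma) on the boundary

print(threshold_test(2.0, c=1.0, gamma=0.5))  # → 1
print(threshold_test(0.5, c=1.0, gamma=0.5))  # → 0
```

The randomization at $L(y) = c$ matters only when $\Pr\{L(Y) = c\} > 0$ (e.g., discrete $Y$); it is what lets the test meet a Type I error budget $\beta$ exactly.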

It turns out that threshold tests are optimal with respect to solving Equation 1.

Theorem 44 (Neyman Pearson Lemma)

Given $\beta\in[0, 1]$, the optimal decision rule for

$\hat{X}_\beta(Y) = \text{argmin} \text{Pr}\left\{\hat{X}(Y)=0 | X=1\right\} \quad : \quad \text{Pr}\left\{\hat{X}(Y)=1|X=0\right\} \leq \beta$

is a threshold test.

When $L(y)$ is monotonically increasing or decreasing, we can make the decision rule even simpler, since it can be turned into a threshold on $y$ itself. For example, if $L(y)$ is monotonically increasing, then an optimal decision rule might be

$\hat{X}(y) = \begin{cases} 1 & \text{ if } y > c\\ 0 & \text{ if } y < c\\ \text{Bernoulli}(\gamma) & \text{ if } y = c. \end{cases}$
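As an illustration (this Gaussian setup is an assumed example, not from the text): if $Y\sim\mathcal{N}(\mu_x, 1)$ with $\mu_1 > \mu_0$, then $L(y)$ is increasing in $y$, and the likelihood-ratio threshold $c = 1$ corresponds to the scalar threshold $y^* = (\mu_0 + \mu_1)/2$:

```python
import math

mu0, mu1 = 0.0, 1.0   # hypothetical means under H0 and H1 (unit variance)

def L(y):
    # ratio of the two Gaussian densities; monotonically increasing in y
    return math.exp((mu1 - mu0) * y - (mu1**2 - mu0**2) / 2)

def decide(y, y_star):
    # Y is continuous, so Pr{Y = y_star} = 0 and no randomization is needed
    return 1 if y > y_star else 0

y_star = (mu0 + mu1) / 2   # the y at which L(y) = 1
print(decide(0.9, y_star), decide(0.1, y_star))  # → 1 0
```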
