# Statistical Inference

Suppose we have a variable $$X$$ (may or may not be a random variable) that represents the state of nature. We observe a variable $$Y$$ which is obtained by some model of the world $$P\_{Y|X}$$.

![Figure 2: Inference Setup](/files/-M_CL1zgzp51G0CZI0Uv)

Suppose we know that $$X\sim \pi$$ where $$\pi$$ is a probability distribution. If we observe $$Y=y$$, then the *a posteriori* estimate of $$X$$ is given by Bayes Rule

$$\text{Pr}\left{X=x | Y=y\right} = \frac{P\_{Y|X}(y|x)\pi(x)}{\sum\_{\tilde{x}}P\_{Y|X}(y|\tilde{x})\pi(\tilde{x})} \propto P\_{Y|X}(y|x)\pi(x).$$

Since the estimate is only dependent on the model and the prior, we don’t actually need to compute the probabilities to figure out the most likely $$X$$.

{% hint style="info" %}

#### Definition 77

The Maximum A Posteriori (MAP) estimate is given by

$$\hat{X}\_{MAP}(y) = \text{argmax}*x P*{Y|X}(y|x)\pi(x)$$
{% endhint %}

If we have no prior information on $$X$$, then we can assume $$\pi$$ is uniform, reducing Definition 77 to only optimize over the model.

{% hint style="info" %}

#### Definition 78

The Maximum Likelihood (ML) estimate is given by

$$\hat{X}\_{ML}(y) = \text{argmax}*x P*{Y|X}(y|x)$$
{% endhint %}

## Binary Hypothesis Testing

{% hint style="info" %}

#### Definition 79

A Binary Hypothesis Test is a type of statistical inference where the unknown variable $$X\in{ 0, 1 }$$.
{% endhint %}

Since there are only two possible values of $$X$$ in a binary test, there are two “hypotheses” that we have, and we want to accept the more likely one.

{% hint style="info" %}

#### Definition 80

The Null Hypothesis $$H\_0$$ says that $$Y\sim P\_{Y|X=0}$$
{% endhint %}

{% hint style="info" %}

#### Definition 81

The Alternate Hypothesis $$H\_1$$ says that $$Y\sim P\_{Y|X=1}$$
{% endhint %}

With two possible hypotheses, there are two kinds of errors we can make.

{% hint style="info" %}

#### Definition 82

A Type I error (false positive) is when we incorrectly reject the null hypothesis. The Type I error probability is then

$$\text{Pr}\left{\hat{X}(Y) = 1 | X = 0\right}$$
{% endhint %}

{% hint style="info" %}

#### Definition 83

A Type II error (false negative) is when we incorrectly accept the null hypothesis. The Type II error probability is then

$$\text{Pr}\left{\hat{X}(Y) = 0 | X = 1\right}$$
{% endhint %}

Our goal is to create a decision rule $$\hat{X}: \mathcal{Y} \to {0, 1}$$ that we can use to predict $$X$$. Based on what the decision rule is used for, there will be requirements on how large the probability of Type I and Type II errors can be. We can formulate the search for a hypothesis test as an optimization. For some $$\beta \in \[0, 1]$$, we want to find

$$\hat{X}\_\beta(Y) = \text{argmin} \text{Pr}\left{\hat{X}(Y)=0 | X=1\right} \quad : \quad \text{Pr}\left{\hat{X}(Y)=1|X=0\right} \leq \beta. \qquad (1)$$

Intuitively, our test should depend on $$p\_{Y|X}(y|1)$$ and $$p\_{Y|X}(y|0)$$ since these quantities give us how likely we are to get our observations if we knew the ground truth. We can define a ratio that formally compares these two quantities.

{% hint style="info" %}

#### Definition 84

The likelihood ratio is given by

$$L(y) = \frac{p\_{Y|X}(y|1)}{p\_{Y|X}(y|0)}$$
{% endhint %}

Notice that we can write MLE as a threshold on the likelihood ratio since if $$L(y) \geq 1$$, then we say $$X=1$$, and vice versa. However, there is no particular reason that $$1$$ must always be the number at which we threshold our likelihood ratio, and so we can generalize this idea to form different forms of tests.

{% hint style="info" %}

#### Definition 85

For some threshold $$c$$ and randomization probability $$\gamma$$, a threshold test is of the form

$$\hat{X}(y) = \begin{cases} 1 & \text{ if } L(y) > c\ 0 & \text{ if } L(y) < c\ \text{ Bernoulli}(\gamma) & \text { if } L(y) = c. \end{cases}$$
{% endhint %}

MAP fits into the framework of a threshold test since we can write

$$\hat{X}\_{MAP} = \begin{cases} 1 & \text{ if } L(y) \geq \frac{\pi\_0}{\pi\_1}\ 0 & \text{ if } L(y) < \frac{\pi\_0}{\pi\_1} \end{cases}$$

It turns out that threshold tests are optimal with respect to solving Equation 1.

{% hint style="info" %}

#### Theorem 44 (Neyman Pearson Lemma) <a href="#theorem-44" id="theorem-44"></a>

Given $$\beta\in\[0, 1]$$, the optimal decision rule to

$$\hat{X}\_\beta(Y) = \text{argmin} \text{Pr}\left{\hat{X}(Y)=0 | X=1\right} \quad : \quad \text{Pr}\left{\hat{X}(Y)=1|X=0\right} \leq \beta$$

is a threshold test.
{% endhint %}

When $$L(y)$$ is monotonically increasing or decreasing, we can make the decision rule even simpler since it can be turned into a threshold on $$y$$. For example, if $$L(y)$$ is monotonically inreasing, then an optimal decision rule might be

$$\hat{X}(y) = \begin{cases} 1 & \text{ if } y > c\ 0 & \text{ if } y < c\ \text{Bernoulli}(\gamma) & \text{ if } y = c. \end{cases}$$


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://aparande.gitbook.io/berkeley-notes/eecs126-0/eecs126-7.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
