Suppose we have a variable $X$ (which may or may not be a random variable) that represents the state of nature. We observe a variable $Y$ which is generated from $X$ according to some model of the world $P_{Y\mid X}$.
Suppose we know that $X \sim \pi$ where $\pi$ is a probability distribution. If we observe $Y = y$, then the a posteriori estimate of $X$ is given by Bayes' rule.
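Written out for a discrete $X$ (for continuous $X$, the sum becomes an integral), Bayes' rule gives

$$P_{X\mid Y}(x\mid y) = \frac{P_{Y\mid X}(y\mid x)\,\pi(x)}{\sum_{x'} P_{Y\mid X}(y\mid x')\,\pi(x')}.$$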
Since the denominator of Bayes' rule does not depend on $x$, the estimate depends only on the model and the prior, so we don't actually need to compute the posterior probabilities themselves to figure out the most likely $X$.
$$\hat{X}_{\text{MAP}}(y) = \arg\max_x P_{Y\mid X}(y\mid x)\,\pi(x)$$
If we have no prior information on $X$, then we can assume $\pi$ is uniform, reducing Definition 77 to optimizing only over the model.
$$\hat{X}_{\text{ML}}(y) = \arg\max_x P_{Y\mid X}(y\mid x)$$
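As an illustration (the biases and prior below are made-up values, not from the notes), here is a minimal Python sketch comparing the MAP and ML estimates for a two-state example, where $Y$ counts heads in $n$ flips of a coin whose bias depends on $X$:

```python
from scipy.stats import binom

# Hypothetical example: X selects the bias of a coin, and Y is the number
# of heads in n flips, so Y | X = x ~ Binomial(n, bias[x]).
bias = {0: 0.5, 1: 0.8}
prior = {0: 0.9, 1: 0.1}  # pi(x)
n = 10

def ml_estimate(y):
    # argmax_x P_{Y|X}(y|x): pick the state whose model best explains y
    return max(bias, key=lambda x: binom.pmf(y, n, bias[x]))

def map_estimate(y):
    # argmax_x P_{Y|X}(y|x) * pi(x): weight each likelihood by the prior
    return max(bias, key=lambda x: binom.pmf(y, n, bias[x]) * prior[x])

y = 7
print(ml_estimate(y))   # 1, since a 0.8-bias coin explains 7/10 heads best
print(map_estimate(y))  # 0, since the strong prior on X = 0 outweighs that
```

The two estimators disagree here precisely because the prior is far from uniform; with a uniform prior they would always coincide.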
A Binary Hypothesis Test is a type of statistical inference where the unknown variable $X \in \{0, 1\}$.
Since there are only two possible values of $X$ in a binary test, we have two "hypotheses," and we want to accept the more likely one.
The Null Hypothesis $H_0$ says that $Y \sim P_{Y\mid X=0}$.
The Alternate Hypothesis $H_1$ says that $Y \sim P_{Y\mid X=1}$.
A Type I error (a "false positive") occurs when we accept $H_1$ even though $H_0$ is true, and has probability $\Pr\{\hat{X}(Y) = 1 \mid X = 0\}$.
A Type II error (a "false negative") occurs when we accept $H_0$ even though $H_1$ is true, and has probability $\Pr\{\hat{X}(Y) = 0 \mid X = 1\}$.
Our goal is to create a decision rule $\hat{X} : \mathcal{Y} \to \{0, 1\}$ that we can use to predict $X$. Based on what the decision rule is used for, there will be requirements on how large the probabilities of Type I and Type II errors can be. We can formulate the search for a hypothesis test as an optimization: for some $\beta \in [0, 1]$, we want to find the rule that minimizes the probability of a Type II error while keeping the probability of a Type I error at most $\beta$.
Intuitively, our test should depend on $p_{Y\mid X}(y\mid 1)$ and $p_{Y\mid X}(y\mid 0)$, since these quantities tell us how likely our observation is under each possible ground truth. We can define a ratio that formally compares these two quantities.
$$L(y) = \frac{p_{Y\mid X}(y\mid 1)}{p_{Y\mid X}(y\mid 0)}$$
Notice that we can write the ML rule as a threshold on the likelihood ratio: if $L(y) \ge 1$, then we say $\hat{X} = 1$, and otherwise $\hat{X} = 0$. However, there is no particular reason that $1$ must always be the number at which we threshold our likelihood ratio, so we can generalize this idea to a broader family of tests.
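For a concrete (illustrative) example, suppose $Y \mid X = x \sim \mathcal{N}(x, \sigma^2)$. Then

$$L(y) = \frac{\exp\left(-(y-1)^2 / 2\sigma^2\right)}{\exp\left(-y^2 / 2\sigma^2\right)} = \exp\left(\frac{2y - 1}{2\sigma^2}\right),$$

which is monotonically increasing in $y$, a fact used at the end of this section.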
For some threshold $c$ and randomization probability $\gamma$, a threshold test is of the form
$$\hat{X}(y) = \begin{cases} 1 & \text{if } L(y) > c \\ 0 & \text{if } L(y) < c \\ \text{Bernoulli}(\gamma) & \text{if } L(y) = c \end{cases}$$
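As a sketch (the function names here are my own, not from the notes), a randomized threshold test can be wrapped up as a higher-order function in Python:

```python
import random

def threshold_test(likelihood_ratio, c, gamma):
    """Build a decision rule that thresholds L(y) at c and breaks
    ties L(y) == c with a Bernoulli(gamma) coin flip."""
    def x_hat(y):
        L = likelihood_ratio(y)
        if L > c:
            return 1
        if L < c:
            return 0
        return 1 if random.random() < gamma else 0  # randomize on ties
    return x_hat
```

The randomization only matters when $\Pr\{L(Y) = c\} > 0$ (e.g., for discrete observations); it is what lets the test achieve a Type I error probability exactly, rather than only approximately.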
The MAP rule is itself a threshold test: it declares $\hat{X} = 1$ exactly when $\pi_1\, p_{Y\mid X}(y\mid 1) \ge \pi_0\, p_{Y\mid X}(y\mid 0)$, which rearranges to

$$\hat{X}_{\text{MAP}}(y) = \begin{cases} 1 & \text{if } L(y) \ge \frac{\pi_0}{\pi_1} \\ 0 & \text{if } L(y) < \frac{\pi_0}{\pi_1} \end{cases}$$

where $\pi_0$ and $\pi_1$ are the prior probabilities of $X = 0$ and $X = 1$.
Given $\beta \in [0, 1]$, the optimal decision rule for

$$\hat{X}_\beta(Y) = \arg\min_{\hat{X}} \left\{ \Pr\{\hat{X}(Y) = 0 \mid X = 1\} \;:\; \Pr\{\hat{X}(Y) = 1 \mid X = 0\} \le \beta \right\}$$

is a threshold test, where $c$ and $\gamma$ are chosen so that the Type I error constraint is met with equality.
When $L(y)$ is monotonically increasing or decreasing, we can make the decision rule even simpler, since it can be turned into a threshold on $y$ itself. For example, if $L(y)$ is monotonically increasing, then an optimal decision rule might declare $\hat{X} = 1$ whenever $y$ exceeds some threshold $t$.
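To make this concrete, here is a small Python sketch (an illustration under assumed Gaussians, not a rule from the notes) for the model $Y \mid X = x \sim \mathcal{N}(x, \sigma^2)$ from the earlier example. Since $L(y)$ is increasing in $y$, the optimal test thresholds $y$ at the point where the Type I error equals $\beta$:

```python
from scipy.stats import norm

def neyman_pearson_gaussian(beta, sigma=1.0):
    """Threshold test for H0: Y ~ N(0, sigma^2) vs H1: Y ~ N(1, sigma^2).

    Because L(y) is increasing in y, the optimal rule declares 1 iff y > t,
    where t makes the Type I error exactly beta: Pr{Y > t | X = 0} = beta.
    """
    t = norm.ppf(1 - beta, loc=0, scale=sigma)  # inverse CDF under H0
    return lambda y: 1 if y > t else 0

beta = 0.05
x_hat = neyman_pearson_gaussian(beta)
print(x_hat(2.1))  # 1: this observation is implausibly large under H0

# The resulting Type II error (missed detection) probability:
t = norm.ppf(1 - beta)
print(norm.cdf(t, loc=1))  # Pr{Y <= t | X = 1}
```

Because $Y$ is continuous here, ties $L(y) = c$ have probability zero, so no randomization $\gamma$ is needed.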