Since the estimate depends only on the model and the prior, we don't actually need to compute the posterior probabilities to figure out the most likely $X$.
Definition 77
The Maximum A Posteriori (MAP) estimate is given by
$$\hat{X}_{\text{MAP}}(y) = \operatorname*{argmax}_x \; p_{Y|X}(y \mid x)\,\pi(x)$$
If we have no prior information on $X$, then we can assume $\pi$ is uniform, reducing Definition 77 to an optimization over the model alone.
Definition 78
The Maximum Likelihood (ML) estimate is given by
$$\hat{X}_{\text{ML}}(y) = \operatorname*{argmax}_x \; p_{Y|X}(y \mid x)$$
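The difference between the two estimates can be sketched in a few lines. The channel $p_{Y|X}$ and prior $\pi$ below are invented for illustration; they are not from the text.

```python
# Likelihood model p_{Y|X}(y|x): outer key is x, inner key is y.
p_y_given_x = {
    0: {0: 0.8, 1: 0.2},  # if X=0, observing Y=0 is likely
    1: {0: 0.3, 1: 0.7},  # if X=1, observing Y=1 is likely
}
prior = {0: 0.9, 1: 0.1}  # prior heavily favors X=0 (assumed for illustration)

def ml_estimate(y):
    # argmax_x p_{Y|X}(y|x): ignores the prior entirely
    return max(p_y_given_x, key=lambda x: p_y_given_x[x][y])

def map_estimate(y):
    # argmax_x p_{Y|X}(y|x) * pi(x): weights the likelihood by the prior
    return max(p_y_given_x, key=lambda x: p_y_given_x[x][y] * prior[x])

# Observing Y=1: ML picks X=1 (0.7 > 0.2), but the strong prior on X=0
# makes MAP pick X=0 (0.2 * 0.9 = 0.18 > 0.7 * 0.1 = 0.07).
print(ml_estimate(1), map_estimate(1))  # -> 1 0
```

With a uniform prior the two estimators would agree, matching the remark above that ML is the special case of MAP with no prior information.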
Binary Hypothesis Testing
Definition 79
A Binary Hypothesis Test is a type of statistical inference where the unknown variable $X \in \{0, 1\}$.
Since there are only two possible values of $X$ in a binary test, there are two "hypotheses" that we have, and we want to accept the more likely one.
Definition 80
The Null Hypothesis $H_0$ says that $Y \sim p_{Y|X=0}$.
Definition 81
The Alternate Hypothesis $H_1$ says that $Y \sim p_{Y|X=1}$.
With two possible hypotheses, there are two kinds of errors we can make.
Definition 82
A Type I error (false positive) is when we incorrectly reject the null hypothesis. The Type I error probability is then
$$\Pr\{\hat{X}(Y) = 1 \mid X = 0\}$$
Definition 83
A Type II error (false negative) is when we incorrectly accept the null hypothesis. The Type II error probability is then
$$\Pr\{\hat{X}(Y) = 0 \mid X = 1\}$$
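Both error probabilities are conditional probabilities, so they can be estimated by simulating each hypothesis separately. The model below is assumed purely for illustration: $Y \sim \mathcal{N}(0,1)$ under $X=0$, $Y \sim \mathcal{N}(1,1)$ under $X=1$, and the rule declares $\hat{X}=1$ iff $y > 0.5$.

```python
import random

random.seed(0)
N = 100_000

def decision_rule(y):
    # a simple (assumed) rule: declare 1 when the observation exceeds 0.5
    return 1 if y > 0.5 else 0

# Type I error: Pr{X_hat(Y) = 1 | X = 0}, estimated by sampling Y under H0
type_1 = sum(decision_rule(random.gauss(0, 1)) == 1 for _ in range(N)) / N

# Type II error: Pr{X_hat(Y) = 0 | X = 1}, estimated by sampling Y under H1
type_2 = sum(decision_rule(random.gauss(1, 1)) == 0 for _ in range(N)) / N

# By symmetry of this particular model, both should be near
# 1 - Phi(0.5) ~ 0.3085.
print(type_1, type_2)
```

Moving the threshold up trades a smaller Type I error for a larger Type II error, which is exactly the tension the optimization below formalizes.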
Our goal is to create a decision rule $\hat{X}: \mathcal{Y} \to \{0,1\}$ that we can use to predict $X$. Based on what the decision rule is used for, there will be requirements on how large the Type I and Type II error probabilities can be. We can formulate the search for a hypothesis test as an optimization: for some $\beta \in [0,1]$,

$$\hat{X}_\beta(Y) = \operatorname*{argmin}_{\hat{X}:\, \Pr\{\hat{X}(Y)=1 \mid X=0\} \le \beta} \Pr\{\hat{X}(Y)=0 \mid X=1\} \tag{1}$$

The likelihoods $p_{Y|X}(y \mid 0)$ and $p_{Y|X}(y \mid 1)$ play a central role here since these quantities give us how likely we are to get our observations if we knew the ground truth. We can define a ratio that formally compares these two quantities.
Definition 84
The likelihood ratio is given by
$$L(y) = \frac{p_{Y|X}(y \mid 1)}{p_{Y|X}(y \mid 0)}$$
Notice that we can write the ML estimate as a threshold on the likelihood ratio: if $L(y) \ge 1$, then we declare $\hat{X} = 1$, and if $L(y) < 1$, we declare $\hat{X} = 0$. However, there is no particular reason that $1$ must always be the number at which we threshold our likelihood ratio, and so we can generalize this idea to form different kinds of tests.
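A threshold test of this kind is straightforward to implement. The two discrete distributions below are made up for illustration and chosen so that a tie $L(y) = c$ actually occurs, exercising the randomized branch.

```python
import random

p0 = {0: 0.5, 1: 0.3, 2: 0.2}   # p_{Y|X}(y|0), assumed for illustration
p1 = {0: 0.2, 1: 0.3, 2: 0.5}   # p_{Y|X}(y|1), assumed for illustration

def likelihood_ratio(y):
    # L(y) = p_{Y|X}(y|1) / p_{Y|X}(y|0)
    return p1[y] / p0[y]

def threshold_test(y, c, gamma, rng=random):
    L = likelihood_ratio(y)
    if L > c:
        return 1
    if L < c:
        return 0
    # L(y) == c: break the tie randomly, declaring 1 with probability gamma
    return 1 if rng.random() < gamma else 0

# At y=2, L(y) = 0.5/0.2 = 2.5 > 1, so the test declares 1;
# at y=0, L(y) = 0.2/0.5 = 0.4 < 1, so it declares 0;
# at y=1, L(y) = 1 exactly, so with c=1 the test flips a gamma-biased coin.
print(threshold_test(2, c=1, gamma=0.5), threshold_test(0, c=1, gamma=0.5))
```

Setting $c = 1$ and ignoring the randomized branch recovers the ML rule described above.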
Definition 85
For some threshold $c$ and randomization probability $\gamma$, a threshold test is of the form

$$\hat{X}(y) = \begin{cases} 1 & \text{if } L(y) > c \\ 0 & \text{if } L(y) < c \\ \text{Bernoulli}(\gamma) & \text{if } L(y) = c. \end{cases}$$
MAP fits into the framework of a threshold test since we can write
$$\hat{X}_{\text{MAP}}(y) = \begin{cases} 1 & \text{if } L(y) \ge \frac{\pi_0}{\pi_1} \\ 0 & \text{if } L(y) < \frac{\pi_0}{\pi_1} \end{cases}$$

where $\pi_0 = \pi(0)$ and $\pi_1 = \pi(1)$.
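This equivalence is easy to check numerically. The model and prior below are invented for illustration; the point is only that the MAP rule and the likelihood-ratio threshold at $c = \pi_0/\pi_1$ agree on every observation.

```python
p0 = {0: 0.7, 1: 0.3}   # p_{Y|X}(y|0), assumed for illustration
p1 = {0: 0.4, 1: 0.6}   # p_{Y|X}(y|1), assumed for illustration
pi0, pi1 = 0.6, 0.4     # prior pi(0), pi(1), assumed for illustration

def map_rule(y):
    # declare 1 iff p(y|1) * pi(1) >= p(y|0) * pi(0)
    return 1 if p1[y] * pi1 >= p0[y] * pi0 else 0

def lr_threshold_rule(y):
    # declare 1 iff L(y) >= pi_0 / pi_1
    return 1 if p1[y] / p0[y] >= pi0 / pi1 else 0

# The two rules agree on every possible observation.
print(all(map_rule(y) == lr_threshold_rule(y) for y in (0, 1)))  # -> True
```

Dividing the MAP comparison $p(y|1)\pi_1 \ge p(y|0)\pi_0$ through by $p(y|0)\pi_1$ is exactly what produces the threshold $\pi_0/\pi_1$.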
It turns out that threshold tests are optimal with respect to solving Equation 1.
Theorem 44 (Neyman-Pearson Lemma)
Given $\beta \in [0,1]$, the optimal decision rule

$$\hat{X}_\beta(Y) = \operatorname*{argmin}_{\hat{X}:\, \Pr\{\hat{X}(Y)=1 \mid X=0\} \le \beta} \Pr\{\hat{X}(Y)=0 \mid X=1\}$$

is a threshold test.
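As a concrete sketch of the Neyman-Pearson construction, take the (assumed) Gaussian shift model $Y \sim \mathcal{N}(0,1)$ under $H_0$ and $Y \sim \mathcal{N}(1,1)$ under $H_1$. Here $L(y)$ is continuous and monotonically increasing in $y$, so thresholding $L(y)$ is equivalent to thresholding $y$ itself and no randomization is needed.

```python
from statistics import NormalDist

beta = 0.05  # allowed Type I error probability (chosen for illustration)

# Choose t so that Pr{Y > t | X=0} = beta, i.e. t = Phi^{-1}(1 - beta).
t = NormalDist(0, 1).inv_cdf(1 - beta)

def np_test(y):
    # optimal rule for this model: declare 1 iff the observation exceeds t
    return 1 if y > t else 0

# Resulting Type II error: Pr{Y <= t | X=1} = Phi(t - 1).
type_2 = NormalDist(1, 1).cdf(t)
print(round(t, 3), round(type_2, 3))
```

The test spends its entire error budget on the Type I constraint (meeting $\beta$ with equality), which is what makes the resulting Type II error as small as possible.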
When $L(y)$ is monotonically increasing or decreasing in $y$, we can make the decision rule even simpler since it can be turned into a threshold on $y$ itself. For example, if $L(y)$ is monotonically increasing, then an optimal decision rule might be