Conditional entropy has a natural interpretation: $H(X \mid Y)$ tells us how surprised we are, on average, to see $X$ given that we know $Y$. If $X$ and $Y$ are independent, then $H(X \mid Y) = H(X)$, since knowing $Y$ gives no additional information about $X$.
Theorem 23 (Chain Rule of Entropy)
For random variables $X$ and $Y$,
$$H(X, Y) = H(X) + H(Y \mid X).$$
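The chain rule can be checked numerically on any small joint distribution. The sketch below uses a hypothetical $2 \times 2$ joint distribution (the values are illustrative, not from the notes) and verifies that $H(X, Y) = H(X) + H(Y \mid X)$:

```python
import math

# Hypothetical joint distribution p(x, y) over {0,1} x {0,1} (illustrative values).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal of X, and H(Y | X) = sum_x p(x) H(Y | X = x).
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
H_Y_given_X = sum(
    px[x] * H({y: joint[(x, y)] / px[x] for y in (0, 1)}) for x in (0, 1)
)

# Chain rule: joint entropy equals marginal entropy plus conditional entropy.
assert abs(H(joint) - (H(px) + H_Y_given_X)) < 1e-12
```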
In addition to quantifying how our surprise about one random variable changes when we observe another, we can quantify how much additional information observing one random variable gives us about another.
For random variables $X$ and $Y$, the mutual information is given by
$$I(X; Y) = H(X) - H(X \mid Y).$$
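As a sanity check, mutual information can be computed directly from a joint distribution. This sketch uses a hypothetical joint distribution (illustrative values) and verifies the standard identity $I(X; Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y)$:

```python
import math

# Hypothetical joint distribution (illustrative values, not from the notes).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginals of X and Y.
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# H(X | Y) = sum_y p(y) H(X | Y = y).
H_X_given_Y = sum(
    py[y] * H({x: joint[(x, y)] / py[y] for x in (0, 1)}) for y in (0, 1)
)

# I(X; Y) via the definition, checked against the symmetric form.
I_XY = H(px) - H_X_given_Y
assert abs(I_XY - (H(px) + H(py) - H(joint))) < 1e-12
assert I_XY >= 0  # mutual information is nonnegative
```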
Source coding deals with finding the minimal number of bits required to represent data; this is essentially the idea of lossless compression. In this case, our message is a sequence of realizations of independently and identically distributed random variables $X_1, X_2, \ldots, X_n$ with common distribution $p$. The probability of observing a particular sequence $x_1, \ldots, x_n$ is then
$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i).$$
Theorem 24 (Asymptotic Equipartition Property)
If $X_1, X_2, \ldots$ is a sequence of independently and identically distributed random variables with distribution $p$, then
$$-\frac{1}{n} \log_2 p(X_1, \ldots, X_n) \to H(X) \quad \text{in probability as } n \to \infty.$$
Theorem 24 tells us that with overwhelming probability, we will observe a sequence that is assigned probability approximately $2^{-nH(X)}$. Using this idea, we can define a subset of possible observed sequences (the typical set) to which, in the limit, our observed sequence must belong with overwhelming probability.
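The AEP is easy to observe empirically. The sketch below (a hypothetical Bernoulli source, chosen for illustration) draws a long iid sequence and checks that the per-symbol log-probability $-\frac{1}{n}\log_2 p(X_1,\ldots,X_n)$ lands close to the entropy $H(X)$:

```python
import math
import random

# Hypothetical source: X ~ Bernoulli(0.2); entropy H(X) in bits.
p = 0.2
H_X = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

random.seed(0)
n = 100_000
xs = [1 if random.random() < p else 0 for _ in range(n)]

# -1/n log2 p(X_1, ..., X_n) for the iid product distribution:
# the sum of per-symbol surprisals, averaged over the sequence.
neg_log_prob = -sum(math.log2(p if x == 1 else 1 - p) for x in xs) / n

# By the AEP, this average should concentrate around H(X) for large n.
assert abs(neg_log_prob - H_X) < 0.05
```

The observed sequence is thus assigned probability close to $2^{-nH(X)}$, which is exactly the statement used to define the typical set.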