Information Theory
Information Theory is a field which addresses two questions:

1. Source Coding: How many bits do I need to losslessly represent an observation?
2. Channel Coding: How reliably and quickly can I communicate a message over a noisy channel?
Intuitively, for a PMF $p$ of a discrete random variable $X$, the surprise associated with a particular realization $x$ is

$$\log \frac{1}{p(x)},$$

since less probable realizations are more surprising. With this intuition, we can quantify the “expected surprise” of a distribution, which we call the entropy:

$$H(X) = \mathbb{E}\left[\log \frac{1}{p(X)}\right] = -\sum_x p(x) \log p(x).$$
Alternative interpretations of entropy are the average uncertainty in $X$ and how random $X$ is. Just like probabilities, we can define both joint and conditional entropies:

$$H(X, Y) = \mathbb{E}\left[\log \frac{1}{p(X, Y)}\right], \qquad H(Y|X) = \mathbb{E}\left[\log \frac{1}{p(Y|X)}\right].$$
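As a quick illustration (not part of the original notes), the sketch below computes the entropy of a Bernoulli PMF for a few hypothetical parameter values and shows it is largest when the outcomes are equally likely, matching the "how random $X$ is" interpretation.

```python
import numpy as np

def entropy(pmf):
    """Entropy in bits of a discrete PMF given as an array of probabilities."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]  # terms with p(x) = 0 contribute nothing
    return -np.sum(p * np.log2(p))

# Entropy of a Bernoulli(p) random variable for a few illustrative values of p.
for p in [0.01, 0.1, 0.5, 0.9]:
    print(f"p = {p:4.2f}  ->  H(X) = {entropy([p, 1 - p]):.3f} bits")
# H(X) is maximized at p = 0.5 (1 bit), when X is "most random".
```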
Conditional entropy has a natural interpretation: it tells us how surprised we are to see $Y$ given that we know $X$. If $X$ and $Y$ are independent, then $H(Y|X) = H(Y)$, because realizing $X$ gives no additional information about $Y$.
In addition to knowing how much our surprise about one random variable changes when we observe another, we can also quantify how much additional information observing one random variable gives us about another. This quantity is the mutual information:

$$I(X; Y) = H(Y) - H(Y|X) = H(X) - H(X|Y).$$
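To make these definitions concrete, here is a minimal sketch (the joint PMF below is a made-up example, not from the notes) that computes $H(Y)$, $H(Y|X)$, and $I(X;Y)$ from a joint distribution table, using the standard chain rule $H(Y|X) = H(X,Y) - H(X)$.

```python
import numpy as np

def H(p):
    """Entropy in bits of an array of probabilities (zeros are ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint PMF p(x, y): rows index x, columns index y.
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])

p_x = p_xy.sum(axis=1)          # marginal of X
p_y = p_xy.sum(axis=0)          # marginal of Y

H_y = H(p_y)                    # H(Y)
H_y_given_x = H(p_xy) - H(p_x)  # chain rule: H(Y|X) = H(X, Y) - H(X)
I_xy = H_y - H_y_given_x        # mutual information I(X; Y)

print(f"H(Y)   = {H_y:.3f} bits")
print(f"H(Y|X) = {H_y_given_x:.3f} bits")
print(f"I(X;Y) = {I_xy:.3f} bits")
# If X and Y were independent (p_xy = np.outer(p_x, p_y)), we would get
# H(Y|X) = H(Y) and I(X;Y) = 0.
```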
Source coding deals with finding the minimal number of bits required to represent data. This is essentially the idea of lossless compression. In this case, our message is the sequence of realizations of independently and identically distributed random variables $X_1, X_2, \ldots, X_n \sim p$. The probability of observing a particular sequence is then

$$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i).$$

Theorem 24 tells us that with overwhelming probability, we will observe a sequence that is assigned probability close to $2^{-nH(X)}$. Using this idea, we can define a subset of possible observed sequences, the typical set, that our observed sequence must belong to with overwhelming probability in the limit:

$$A_\epsilon^{(n)} = \left\{ (x_1, \ldots, x_n) : \left| -\frac{1}{n} \log p(x_1, \ldots, x_n) - H(X) \right| \leq \epsilon \right\}.$$
Two important properties of the typical set are that

1. $\Pr\left\{ (X_1, \ldots, X_n) \in A_\epsilon^{(n)} \right\} \to 1$ as $n \to \infty$,
2. $\left| A_\epsilon^{(n)} \right| \leq 2^{n(H(X) + \epsilon)}$.
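This concentration can be checked numerically. The sketch below (a simulation added for illustration, using a hypothetical Bernoulli source) samples i.i.d. sequences and shows that $-\frac{1}{n}\log_2 p(x_1, \ldots, x_n)$ clusters around $H(X)$, so most sequences land in $A_\epsilon^{(n)}$.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.1                                              # Bernoulli(p) source
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))     # entropy of one symbol
n, trials, eps = 1000, 10000, 0.05

# Sample i.i.d. sequences and compute the empirical "bits per symbol",
# -(1/n) log2 p(x_1, ..., x_n), for each sequence.
x = rng.random((trials, n)) < p                      # True with probability p
ones = x.sum(axis=1)
log_prob = ones * np.log2(p) + (n - ones) * np.log2(1 - p)
bits_per_symbol = -log_prob / n

in_typical_set = np.abs(bits_per_symbol - H) <= eps
print(f"H(X) = {H:.3f} bits, mean of -(1/n) log p = {bits_per_symbol.mean():.3f}")
print(f"fraction of sequences in the typical set: {in_typical_set.mean():.3f}")
# As n grows, this fraction approaches 1 (property 1 above).
```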
The typical set gives us an easy way to do source coding. If I have $N$ total objects, then I only need $\lceil \log_2 N \rceil$ bits to represent each object, so I can define a simple protocol:

1. If $(x_1, \ldots, x_n) \in A_\epsilon^{(n)}$, then describe the sequence using the roughly $n(H(X) + \epsilon)$ bits needed to index the typical set.
2. If $(x_1, \ldots, x_n) \notin A_\epsilon^{(n)}$, then describe the sequence naively with $n \log_2 |\mathcal{X}|$ bits.
This makes the average number of bits required to describe a message approximately $n(H(X) + \epsilon)$, since atypical sequences occur with vanishing probability. This is the first half of a central result of source coding: roughly $H(X)$ bits per symbol suffice, on average, to losslessly describe the source.
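As a rough numerical check (my own sketch, not from the notes), the simulation below estimates the average description length of this two-case protocol for a Bernoulli source, assuming one extra flag bit to tell the decoder which case was used, and compares it to $H(X)$ bits per symbol.

```python
import numpy as np

rng = np.random.default_rng(1)

p, n, eps, trials = 0.1, 2000, 0.05, 5000
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

typical_bits = int(np.ceil(n * (H + eps)))   # enough bits to index the typical set
naive_bits = n                               # |X| = 2, so n * log2|X| = n bits

x = rng.random((trials, n)) < p
ones = x.sum(axis=1)
bits_per_symbol = -(ones * np.log2(p) + (n - ones) * np.log2(1 - p)) / n
typical = np.abs(bits_per_symbol - H) <= eps

# One flag bit marks which case we are in (an assumption of this sketch).
lengths = np.where(typical, typical_bits, naive_bits) + 1
print(f"average bits per symbol: {lengths.mean() / n:.3f}  vs  H(X) = {H:.3f}")
```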
This lends a new interpretation of the entropy $H(X)$: it is the average number of bits required to represent $X$.
Whereas source coding deals with encoding information, channel coding deals with transmitting it over a noisy channel. In general, we have a message, an encoder, a channel, and a decoder, as in Figure 1.

Figure 1: Channel Coding
Each channel can be described by a conditional probability distribution $p(y|x)$ for each time the channel is used. The capacity of the channel is defined as

$$C = \max_{p(x)} I(X; Y).$$

In words, the capacity describes the maximum mutual information between the channel input and output.
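For a concrete example (illustrative, not from the notes), the binary symmetric channel with crossover probability $q$ has capacity $1 - H(q)$; the sketch below recovers this by brute-force maximizing $I(X;Y)$ over input distributions.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input PMF p_x and channel matrix p(y|x) (rows are x)."""
    p_xy = p_x[:, None] * channel                 # joint p(x, y)
    p_y = p_xy.sum(axis=0)
    indep = p_x[:, None] * p_y[None, :]           # product of marginals
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / indep[mask]))

q = 0.1                                           # crossover probability of the BSC
channel = np.array([[1 - q, q],
                    [q, 1 - q]])                  # p(y|x)

# Brute-force search over input distributions p(x) = (a, 1 - a).
best = max(mutual_information(np.array([a, 1 - a]), channel)
           for a in np.linspace(0.001, 0.999, 999))
print(f"capacity ~ {best:.4f} bits per channel use")  # ~ 1 - H(q) ~ 0.531 for q = 0.1
```

The maximum is attained at the uniform input distribution, as expected from the symmetry of the channel.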