Consider the following:
Alice and Bob have figured out how to
transmit messages between their treehouses.
At first, they used flames at night, and shutters during the day.
Then they used a wire, which they plucked in different ways.
Eventually, they electrified this wire to send electrical pulses,
and were now at work on an experimental wireless method.
The problem is in order to pay for their equipment,
they needed money.
So they decided to offer their service --
for a fee to others.
And on the first day, Alice had three new customers
who wanted to transmit messages to their friends
over at Bob's treehouse.
The first customer wanted to send a list of 10 coin flips.
The second customer wanted to send a 6-letter word.
And the third customer wanted to send a poker hand.
The question now is, "How much should she charge?"
Well, the price of a message should depend on
how long it takes Alice to transmit it.
But how could she measure the length
of different types of messages using a common unit?
To find out, let's play a game.
Imagine you are Bob now.
And you know Alice wants to send you these messages.
But all you can do is get the answer to
yes-or-no questions you've arranged.
Alice will answer by sending a sequence
of zeros or ones using some method of variation.
Recall that all of their methods of transmission
involve the exchange of differences.
So, a 1 could be represented by an open flame,
or an open shutter, or an electrical pulse.
No matter how they are manifested,
we can simply call them 'binary digits' --
because a binary digit can have only one of two values -- 0 or 1.
So, let's say 0 represents a 'no,'
and 1 represents a 'yes.'
Your challenge, now, is to always ask
the minimum number of questions
in order to determine the exact message.
First, let's consider the coin flips.
For each symbol, the sender, Alice,
can be thought of as selecting
one of two different symbols --
'heads' or 'tails.'
Now how many questions do you need to ask
to determine which she selected?
One questions -- such as, "Is it heads?" -- will suffice.
For 10 flips, what is the minimum number of questions?
Well, 10 flips times one question per flip
equals 10 questions --
or 10 binary digits to transmit this message.
Next, let's consider the letters.
For each symbol, the sender, Alice,
can be thought of as selecting
1 of 26 different symbols.
Let's start with the simplest message --
which is 1 letter.
How many questions are needed?
Is it A?
Is it B?
Is it C?
Is it D?
And so on.
But that is not the minimum number of questions.
The best you could do is ask questions
which eliminate half of the possibilities.
For example the middle of the alphabet
is between M and N.
So we could first ask, "Is it less than N?"
If we receive a 1 -- yes --
we cut out half of the possibilities --
[so we have] 13 left.
And since we can't split a letter in half,
we divide the possible symbols into sets of 6 and 7,
and ask is it less than G?
We receive a 1, which is yes.
And now we are left with 6 possible letters.
And we can split them in half, and ask,
"Is it less than D?"
We receive a 0, which is no --
leaving us with three possible letters.
And now we can pick a side and ask, "Is it D?"
We receive a 0, which is no.
And finally, we are left with two possibilities.
We ask, "Is it E?"
We receive a no.
And after five questions,
we have correctly identified the symbol: F.
Realize that we will never need to ask
more than five questions.
So the number of questions will be at least 4, and at most 5.
And in general, 2 to the power of number of questions
equals the number of possible messages --
which we previously defined as the 'message space.'
So how can we calculate the exact average --
or expected -- number of questions,
given a message space of 26?
We ask the reverse question.
2 to the power of something equals 26.
And to answer these type of questions,
we naturally use a logarithmic function, base 2 --
because log, base 2, of 26 is the exponent
2 needs to be raised to to give us 26.
Which is approximately 4.7.
So on average, approximately 4.7 questions
will be needed per letter, at minimum.
And since she wants to transmit a word with 6 letters,
Bob can expect to ask, at minimum, 28.2 questions --
Which means Alice will need to send at most 29 binary digits.
Finally, let's apply this formula
to a new message -- the poker hand.
Well, for each symbol, the sender, Alice,
can be thought of as selecting
1 of 52 different symbols.
And in this case, the number of questions
is the same as the number of times
we need to split the deck and ask Alice which pile it is in --
until we are left with one card --
which we will find is usually 6 splits --
or questions -- and sometimes 5.
But we can save time and just use our equation.
Log, base 2, of 52 is approximately 5.7,
since 2 to the power of 5.7 is approximately 52.
So the minimum number of questions,
on average, is 5.7 per card.
A poker hand contains five cards.
So to transmit a poker hand requires
28.5 questions, on average.
We are done.
We now have our unit.
It's based on the minimum number of questions
to define the message --
or the 'height' of the decision tree.
And since Alice transmits this information
as binary digits, we can shorten this [term],
and call our unit the 'bit' --
instead of 'binary digit.'
So we have 10 coin flip require 10 bits.
The 6-letter word requires 28.2 bits.
and the poker hand requires 28.5 bits.
Alice, then, decides to charge one penny per bit,
and begins collecting her fees.
Now this idea emerged in the 1920s.
It was one of the more abstract problems
that communication engineers were thinking about.
Ralph Hartley was a prolific electronics researcher,
who built on the ideas of Harry Nyquist --
both of whom worked at Bell Labs after World War I.
And in 1928, Hartley published an important paper
titled "The Transmission of Information."
And in it, he defines the word 'information'
using the symbol h -- as: h = n × logarithm of s,
where h is our information,
n is the number of symbols --
whether they're notes, letters, numbers, etc. --
and s is the number of different symbols
available at each selection.
And this can also be written as
h = logarithm of s^n.
And Hartley writes,
"What we have done, then, is to take,
as our practical measure of information,
the logarithm of the number of possible symbol sequences.
So, information is the logarithm of the message space.
However, realize that throughout this lesson,
we have been assuming that the symbol selection is random --
a convenient simplification.
However, we know that in reality,
most communication -- such as speech --
isn't always random.
It's a subtle mix of predictability and surprise.
We do not roll dice when we write letters.
And it is this predictability which can result in
significant savings in the length of transmission.
Because when we can predict things in advance,
we shouldn't need to ask
as many yes-or-no questions to define it.
But how could we formally model
this subtle difference?
This question brings us to
the key insight in our story.
Can you think of what it might be?
