The following content is
provided under a Creative
Commons license.
Your support will help MIT
OpenCourseWare continue to
offer high quality educational
resources for free.
To make a donation or to view
additional materials from
hundreds of MIT courses, visit
MIT OpenCourseWare at
ocw.mit.edu.
PROFESSOR: I'm going to spend a
couple of minutes reviewing
the major things that we talked
about last time and
then get into discrete source
coding, which is the major
topic for today.
The first major thing that we
talked about last time, along
with all of the philosophy and
all those other things, was
the sense of what digital
communication really is.
I said that what digital
communication is, is it's
communication where there's
a binary interface between
source and destination.
The source is very
often analog.
The most interesting sources are analog. The channel is often analog -- the most interesting channels are analog -- and we'll say more about what I mean by analog later.
What's important is that you
have this binary interface
between source and
channel coding.
We said a little bit about why
we wanted a binary interface,
aside from the fact that it's
there now, there's nothing you
can do about it even if
you don't like it.
One reason is standardization,
which means it simplifies
implementation, which
means you can do
everything in the same way.
If you have ten different kinds
of channel coding and
you have ten different kinds of
source coding and you have
a binary interface, it means you need to develop 20 different things -- ten on the source side and ten on the channel side.
If you don't have that
standardization with a binary
interface between them, you
need 100 different things.
You need to match every kind of
source with every kind of
destination.
That raises the price of
all chips enormously.
One of the other things we said
is the price of chips is
very much the cost of
development divided by the
number of them that
you stamp out.
That's not quite true,
but it's a good first
approximation.
In other words, standardization
is important.
Layering.
Layering is in many ways very
similar to standardization
because this binary interface is
also a layer between source
and destination.
But the idea there is not that
it standardizes to make things
cheaper, but it simplifies
the conceptualization of
what's going on.
You can look at a source and
only focus on one thing.
How do I take that source and
turn it into the smallest
number of binary digits
possible?
We'll talk a good deal about
what that means later because
there's something stochastic involved in there, and it will take us a while to really understand that.
Finally, using a binary
interface loses nothing in
performance.
That's what Shannon said,
it's what he proved.
There's some questions there
when you get to networks, but
the important thing is the
places where you want to study
non-binary interfaces, you will
never get a clue of what
it is that you're looking at or
why if you don't first very
well understand why
you want a binary
interface to start with.
In other words, if you look at
these other cases, there's
exceptions to the rule, and if
you don't know what the rule
is, you certainly can't
understand what
the exceptions are.
So for today we're going to
start out by studying this
part of the problem in here.
Namely, how do you turn a
source and put a general
source input into binary digits
that you're going to
put into the channel.
How do I study this without
studying that?
Well, one thing is these
are binary digits here.
But the other thing is we're
going to assume that what
binary digits go in here
come out here.
In other words, there
aren't any errors.
It's an error-free system.
Part of the purpose of studying
channel encoding and
channel decoding is to say how
is it that you get that
error-free performance.
You can't quite get error-free
performance, you get almost
error-free performance, but the
idea is when errors come
out here, it's not this guy's
fault, it's this guy's fault.
Therefore, what we're going to
study here is how we do our
job over here.
Namely, how we deal with encoding into the string of bits that goes in there, and how we decode that same string of bits coming out.
So that's where we'll be for
the next three weeks or so.
We talked a little bit last time
about how do you layer
source coding itself.
I want to come back, because we
were talking about so many
things last time, and emphasize
what this means a
little bit.
We're going to break source
coding up into three different
layers again.
You start out with some kind of
input wave form or image or
video or whatever
the heck it is.
You're going to do something
like sampling it or expanding
it in some kind of expansion,
and we'll talk a great deal
about that later.
That's not an obvious thing,
how to do that.
When you finish doing
that, you wind up
with an analog sequence.
In other words, you wind up
with a sequence of real
numbers or sequence of
complex numbers.
Those go into a quantizer.
What the quantizer does is to
turn an uncountably infinite
set of things into a finite
set of things.
When you turn an uncountably
infinite set of possibilities
into a finite set of
possibilities, you get
distortion.
There's no way you
can avoid it.
So that's a part of what
happens there.
Then at this point you have a
finite alphabet of symbols.
That goes into the discrete
coder, goes through what we're
now calling a reliable binary
channel and comes out here.
What we're going to be studying
for the next two
weeks or so is this piece of
the system right in here.
Again, what we're going to be
doing is assuming a reliable
binary channel to the right
of this, which is
what we already assumed.
We're going to assume
that these things do
whatever they have to.
But this problem here, this
isolated problem is important
because this is dealing with
the entire problem of text,
and you know what text is,
it's computer files, it's
English language text, it's
Chinese text, it's whatever
kind of text.
If we understand how to do that,
we can then go on to
talk about quantization because
we'll have some idea
of what we're trying to
accomplish with quantization.
Without that we won't know
what the purpose of
quantization is.
Without the quantization we
won't know what we're trying
to accomplish over here.
There's another reason for
studying this problem, which
is that virtually all the ideas
that come into this
whole bunch of things are all
tucked into this one subject
in the simplest possible way.
One of the nice things about
information theory, which
as I said, we're going to touch on in this course, is that one of
the reasons for studying these
simple things first is that
information theory is really
like a symphony.
You see themes coming out, those
themes get repeated,
they get repeated again with
more and more complexity each
time, and when you understand
the simple idea of the theme,
then you understand
what's going on.
So, that's the other reason
for dealing with that.
To summarize those things --
most of this I already said.
Examples of analog sources are
voice, music, video, images.
We're going to restrict this
to just wave form sources,
which is voice and music.
In other words, an image is
something where you're mapping
from two dimensions this way and
this way into a sequence
of binary digits.
So it's a mapping, after you get done sampling, from R squared -- which is this axis and this axis -- into your output.
Namely, for each point in this
plane, there's some real
number that represents the
amplitude at that point.
Video is a three-dimensional
to one-dimensional thing,
namely, you have time.
You also have this way, you
have this way, so you're
mapping from r cubed into r.
We're not going to deal with
those because really all the
ideas are just contained
in dealing
with wave form sources.
In other words, the conventional
functions that
you're used to seeing.
Namely, things that you can draw
on a piece of paper and
you can understand what's
going on with them.
These are usually sampled
or expanded into series
expansions almost invariably,
and we'll
understand why later.
That, in fact, is a major
portion of the course.
That's where all of the
stuff from signals and
systems comes in.
We'll have to expand that a
whole lot because you didn't
learn enough there.
We need a lot of other things,
and that's what we need to
deal with wave forms.
We'll take the sequence
of numbers that
comes out of the sampler.
We're then going to quantize
that sequence of numbers.
That's the next thing we're
going to study.
Then we're going to get into
analog and discrete sources,
which is the topic we will
study right now.
So we're going to study this.
After we get done with this, we're
going to study this also.
When we study this, we'll have
what we know about this as a
way of knowing how to deal with
the whole problem from
here out to here.
Finally, we'll deal with wave
forms and deal with the whole
problem from here out to here.
So that's our plan.
In fact, this whole course is
devoted to studying this
problem, then this problem, then
this problem -- that's
the source part of the course.
Then dealing with -- if I can
find it again -- with the
various parts of this problem.
So first we study sources,
then we study channels.
Because of the binary interface,
when we're all done
with that we understand
digital communication.
When we get towards the end of
the term we'll be looking at
more sophisticated kinds of
channels than we look at
earlier, which are really models
for wireless channels.
So that's where we're
going to end up.
So discrete source coding,
which is what we want
to deal with now.
What's the objective?
We're going to map a sequence
of symbols into a binary
sequence and we're going to do
it with unique decodability.
I'm not going to define unique
decodability at this point.
I'm going to define it
a little bit later.
But roughly what it
means is this.
We have a sequence of symbols
which come into the encoder.
They go through this
binary channel.
They come out as a sequence
of binary digits.
Unique decodability says if
this guy does his job, can
this guy do his job?
If this guy can always do his
job when these digits are
correct, then you have something
called unique
decodability.
Namely, you can guarantee that
whatever comes out here,
whatever comes in here,
will turn into a
sequence of binary digits.
That sequence of binary digits
goes through here.
These symbols are the same
as these symbols.
In other words, you are
reproducing things error-free
if, in fact, this reproduces
things error-free.
So that's our objective.
There's a very trivial approach
to this, and I hope
all of you will agree
that this is
really, in fact, trivial.
You map each source
symbol into an
l-tuple of binary digits.
If you have an alphabet of size
m, how many different
binary strings are there
of length l?
Well, there are 2 to
the l of them.
If l is equal to 2, you have 00, 01, 10, and 11. If l is equal to 3, you have strings of length 3, which are 000, 001, 010, blah, blah, blah, which comes out to be 2 to the 3, which is equal to 8.
So what we need if we're going
to use this approach, which is
the simplest possible approach,
which is called the
fixed length approach, is you
need the alphabet size to be less than or equal to the number of binary strings of length l, namely 2 to the l.
Now, is that trivial or
isn't it trivial?
I hope it's trivial.
We don't want to waste bits
when we're doing this,
particularly, so we don't want
to make l any bigger than we
have to, because for every
symbol that comes in, we get l
symbols coming out.
So we'd like to minimize l
subject to this constraint
that 2 to the l has to
be bigger, greater
than or equal to m.
So, what we want to do is we
want to choose l as the
smallest integer which
satisfies this.
In other words, when you take
the logarithm to the base 2 of
this, you get log to the base 2
of m has to be less than or
equal to l, and l is then going
to be less than log to
the base 2 of m plus 1.
This is the constraint which
says you don't make l any
bigger than you have
to make it.
So in other words, we're going
to choose l equal to the
ceiling function of log to the base 2 of m.
In other words, this is the
integer which is greater than
or equal to log to
the base 2 of m.
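To make that concrete, here is a minimal Python sketch (my own illustration, not from the lecture, with hypothetical names standing in for the five kinds of the letter a used below) that computes l and assigns the code words.

```python
import math

def fixed_length_code(alphabet):
    """Assign every symbol an l-bit code word, with l = ceil(log2(M))."""
    M = len(alphabet)
    l = math.ceil(math.log2(M))              # smallest l with 2**l >= M
    return {sym: format(i, f'0{l}b') for i, sym in enumerate(alphabet)}

# Five kinds of the letter a, as in the example below: l = ceil(log2 5) = 3.
print(fixed_length_code(['alpha', 'a', 'A', 'script a', 'a bar']))
```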
So let me give you a couple
of examples of that.
Excuse me for boring you with
something which really is
trivial, but there's notation
here you have to get used to.
You get confused with this
because there's the alphabet
size which we call m, there's
the string length which we
call l, and you keep getting
mixed up between these two.
Everybody gets mixed
up between them.
I had a doctoral student the
other day who got mixed up in
it, and I read what she had
written four times and I
didn't catch it either.
So this does get confusing
at times.
If you have an alphabet which is five different kinds of the letter a -- that's one reason why these source codes get messy: you have too many different kinds of each letter, and technical people who like a lot of jargon use all of them.
In fact, when people start
writing papers and books you
find many more than
five there.
In terms of LaTeX, you get math cal, you get math bold, you get math blah, blah, blah.
Everything in little and big.
You get the Greek version.
You get the Roman version and
the Arabic version, if you're
smart enough to know that
language, those languages.
What we mean by code is alpha
gets mapped into 0, 0, 0. a
gets mapped into 0, 0, 1.
Capital A into this
and so forth.
Does it make any difference
what mapping you use here?
Can you find any possible reason
why it wouldn't make a
difference whether I map alpha
into 0, 0, 0, and a into 0, 0,
1 or vice versa?
I can't find any reason
for that.
Would it make any difference
of instead of having this
alphabet I had beta b, capital
B, script b, and capital B
with a line over it?
I can't see any reason
why that would make
a difference either.
In other words, when we're
talking about fixed length
codes, there are only two
things of importance.
One of them is how big is the
alphabet -- that's why we talk
about alphabets all the time.
After you know how big the
alphabet is and after you know
you want to do a fixed length
binary encoding, then you just
assign a binary string to
each of these letters.
In other words, there's nothing
important in these symbols.
This is a very important
principle
of information theory.
It sort of underlines
the whole subject.
I'm not really talking about
information theory here, as I
said, we're talking about
communication.
But communication these days is
built on these information
theoretic ideas.
Symbols don't have any
inherent meaning.
As far as communication is
concerned, all you're
interested in is what is
the set of things --
I could call this a1, a2, a3,
a4, a5, and we're going to
start doing this after awhile
because we will recognize that
the name of the symbols don't
make any difference.
If you listen to a political
speech if it's by a Republican
there are n different things
they might say, and you might
as well number them
a1 to a sub n.
If you listen to one of the
Democratic candidates there
are m different things
they might say.
You can number them 1 to m,
and you can talk to other
people about it and say oh, he
said a1 today, which is how do
we get out of the war in Iraq.
Or he said number 2 today, which
is we need more taxes or
less taxes and so forth.
So it's not what they say as
far as communication is
concerned, it's just
distinguishing the different
possible symbols.
So, you can easily
decode this --
you see three bits and
you decode them.
Can I?
Is this right or is there
something missing here?
Of course, there's something
missing.
You need synchronization if
you're going to do this.
If I see a very long string of
binary digits and I'm going to
decode them into these letters
here, I need to know where the
beginning is.
In other words, if it's a
semi-infinite string of binary
digits, I don't know
how to look at it.
So, inherently, we believe that
somebody else gives us
synchronization.
This is one of these things
we always assume.
When you start building a system
after you decide how to
do this kind of coding, somebody
at some point has to
go through and decide
where do you get the
synchronization from.
But you shouldn't think of the
synchronization first.
If I'm encoding 10 million
symbols and it takes me 1,000
bits to achieve the
synchronization, that 1,000
bits gets amortized over 10
million different symbols, and
therefore, it doesn't make any
difference, and therefore,
we're going to ignore it.
It's an important problem
but we ignore it.
The ASCII code is a more important example of this.
It was invented many,
many years ago.
It was a mapping from 256
different symbols which are
all the letters, all the
numbers, all the things that
people used on typewriters.
Anybody remember what
a typewriter is?
Well, it's something people used
to use before they had
computers, and these typewriters
had a lot of
different keys on them and
they had a lot of special
things you could do with them.
And somebody dreamed up 256
different things that they
might want to do.
Why do they use l equals 8?
Nothing to do with communication
or with
information theory or with
any of these things.
It was that 8 is
a nice number.
It's 2 to the 3.
In other words, this was a
standard length of both
computer words and of lots
of other things.
Everybody likes to deal with 8
bits, which you call a byte,
rather than 7 bits which is
sort of awkward or 6 bits
which was an earlier standard,
which would have been
perfectly adequate for most
things that people wanted.
But no, they had to go
to 8 bits because it
just sounded nicer.
These codes are called
fixed length codes.
I'd like to say more about them
but there really isn't
much more to say about them.
There is a more general version
of them, which we'll
call generalized fixed
length codes.
The idea there is to segment
the source sequence.
In other words, we're always
visualizing now having a
sequence of symbols
which starts at
time zero, runs forever.
We want to segment that into
blocks of length n.
Namely, you pick off the first
n symbols, you find the code
word for those n symbols, then
you find the code word for the
next n symbols, then you find
the code word for the next n
symbols and so forth.
So it's really the
same problem that
we looked at before.
It's just that the alphabet
before had the number of
symbols as the alphabet size.
Now, instead of having an alphabet size which is m, we're looking at blocks of n symbols, and how many possible combinations are there of blocks where every symbol is one of m different things?
Well, if you have two symbols,
the first one can be any one
of m things, the second one can
be any one of m things.
So there are m squared possible
combinations for the
first two symbols, there are m
cubed possible combinations
for the first three symbols
and so forth.
So we're going to have an
alphabet on blocks of m to the
n different n tuples
of source letters.
Well, once you see that we're
done because what we're going
to do is find a binary sequence
for every one of
these blocks of m to
the n symbols.
As I said before, the only
thing important is
how many are there.
It doesn't matter that they're
blocks or that they're stacked
this way or that they're stacked
around in a circle or
anything else.
All you're interested in is how
many of them are there.
So there are m to
the n of them.
So, what we want to do is make
the binary length that we're
dealing with equal to log to the
base 2 of m to the n, the
ceiling function of that.
Which says log to the base 2 of
m is less than or equal to
l bar where l bar is going to be
the bits per source symbol.
I'm going to abbreviate that
bits per source symbol.
I would like to abbreviate it
bps, but I and everyone else
will keep thinking that bps
means bits per second.
We don't have to worry about
seconds here, seconds had
nothing to do with
this problem.
We're just dealing with
sequences of things and we
don't care how often
they occur.
They might just be sitting in
a computer file and we're
doing them offline, so seconds
has nothing to
do with this problem.
So, log to the base 2 of m is
less than or equal to l over
n, which is less than log to the
base 2 of m plus 1 over n.
In other words, we're just
taking this dividing it by n,
we're taking this dividing by
n, the ceiling function is
between log to the base 2 of m
to the n, and log to the base
2 of m to the n plus 1.
When we divide by n, that
1 becomes 1 over n.
What happens when you make n large? l bar approaches log to the base 2 of m from above. Therefore, fixed length coding requires log to the base 2 of m bits per source symbol if, in fact, you make n large enough.
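Here is a tiny sketch (my own illustration, using the m and n notation above) confirming numerically that the bits per source symbol approach log to the base 2 of m from above as n grows.

```python
import math

def bits_per_symbol(m, n):
    """Fixed-length coding of n-blocks: ceil(n * log2(m)) bits, divided by n."""
    return math.ceil(n * math.log2(m)) / n

for n in (1, 2, 5, 20, 100):
    print(n, bits_per_symbol(5, n))   # 3.0, 2.5, 2.4, 2.35, 2.33 -> log2(5) ~ 2.32
```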
In other words, for the example
of five different
kinds of a's, we had
m equal to 5.
So if you have m equal to 5 and take blocks of n equal to 2, that leads to m squared equals 25, which leads to l equals -- what's the ceiling function of log to the base 2 of 25? It's 5. l bar is equal to -- what's half of 5? 2 and 1/2, yes.
As you get older you can't
do arithmetic anymore.
So look what we've
accomplished.
We've gone from three bits per
symbol down to two and a half bits per symbol,
isn't that exciting?
Well, you look at it
and you say no,
that's not very exciting.
I mean yes, you can do it, but
most people don't do that.
So why do we bother with this?
Well, it's the same reason we
bother with a lot of things in
this course, and the whole first
two weeks of this course
will be dealing with things
where when you look at them
and you ask is this important,
you have to answer no, it's
not important, it doesn't really
have much to do with
anything, it's a mathematical
idea.
What it does have to do with is
the principle involved here
is important.
It says that the lower limit of
what you can do with fixed length coding is log to the base 2 of m.
You have an alphabet of size
m, you can get as close to
this as you want to.
We will find out later that
if you have equally likely
symbols when we get to talking
about probability, we will
find out that nothing
in the world can do
any better than this.
That's the more important thing,
because what we're
eventually interested in is
what's the best you can do if
you do things very
complicated.
Why do you want to know what
the best is if you do
something very complicated?
Because if you can do that
simply then you know you don't
have to look any further.
So that's the important thing.
Namely, it lets you do something
simple and know
that, in fact, what you're
doing makes sense.
That's why we do all of that.
But then after we say well
there's no place else to go on
fixed length codes, we say well,
let's look at variable
length codes.
The motivation for variable
length codes is that probable
symbols should probably have
shorter code words than very
unlikely symbols.
And Morse thought of this a
long, long time ago when Morse
code came along.
Probably other people thought of
it earlier, but he actually
developed the system
and it worked.
Everyone since then has
understood that if you have a
symbol that only occurs very,
very, very rarely, you would
like to do something, make a
code word which is very long
for it so it doesn't interfere
with other code words.
Namely, one of the things that
you often do when you're
developing a code is think of a
whole bunch of things which
are sort of exceptions.
They hardly ever happen.
You use the fixed length code
for all the things that happen
all the time, and you make one
extra code word for all these
exceptions.
Then you have this exception code word, and pasted on at the end of it is a number which represents which exception you're looking at.
Presto, you have a variable
length code.
Namely, you have two different possible code lengths -- one of them for all of the likely things plus the indication that there is an exception, and a longer one for all the unlikely things.
There's an important
feature there.
You can't drop the code word saying this is an exception.
If you just have a bunch of
short code words and a bunch
of long code words, then you see
a short code word and you
don't know -- well, if you see
a long code word starting or
you have a short code word, you
don't know which it is and
you're stuck.
So one example of a variable
length code -- we'll use some
jargon here.
We'll call the code
a script c.
We'll think of script c as a
mapping which goes from the
symbols onto binary strings.
In other words, c of x is the
code word corresponding
to the symbol x.
So for each x in the alphabet,
capital X, and we have to
think of what the
capital X is.
But as we say, the only thing
we're really interested in is
how big is this alphabet --
that's the only thing
of importance.
So if we have an alphabet which
consists of the three
letters a, b and c, we might
make a code where the code
word for a is equal to zero, the
code word for b is equal
to 1, zero, and the code word
for c is equal to 1,1.
Now it turns out that's
a perfectly fine
code and that works.
Let me show you another
example of a code.
Let me just show you an example
of a code here so we
can see that not everything
works.
Suppose c of a is zero, c of b
is 1, and c of c is -- this is
a script c, that's a little
c -- is 1, zero.
Does that work?
Well, all of the symbols have
different code words, but this
is an incredibly stupid
thing to do.
It's an incredibly stupid thing
to do because if I send
a b followed by an a, what the
poor decoder sees is 1
followed by zero.
In other words, one of the
things that I didn't tell you
about is when we're using
variable length codes we're
just concatenating all of these
code words together.
We don't put any spaces
between them.
We don't put any commas
between them.
If, in fact, I put a space
between them, I would really
have not a binary alphabet
but a ternary alphabet.
I would have zeros and I would
have 1's and I would have
spaces, and you don't
like to do that
because it's much harder.
When we start to study channels
we'll see that
ternary alphabets are much more
difficult to work with
than binary alphabets.
So this doesn't work,
this does work.
Part of what we're going to be
interested in is what are the
conditions under why
this works and why
this doesn't work.
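To see that concretely, here is a small sketch (my own illustration) that brute-forces every way of splitting a bit string into code words; the first code always gives exactly one parsing, the second can give two.

```python
def parses(code, bits):
    """Return all ways of parsing `bits` as a concatenation of code words."""
    if bits == '':
        return [[]]
    results = []
    for sym, word in code.items():
        if bits.startswith(word):
            for rest in parses(code, bits[len(word):]):
                results.append([sym] + rest)
    return results

good = {'a': '0', 'b': '10', 'c': '11'}
bad  = {'a': '0', 'b': '1', 'c': '10'}

print(parses(good, '100'))   # [['b', 'a']] -- exactly one parsing
print(parses(bad, '10'))     # [['b', 'a'], ['c']] -- ambiguous, not uniquely decodable
```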
Again, when you understand this
problem you will say it's
very simple, and then you come
back to look at it again and
you'll say it's complicated
and then it looks simple.
It's one of these problems that
looks simple when you
look at it in the right way, and
it looks complicated when
you get turned around and you
look at it backwards.
So the successive code words of
a variable length code are
all transmitted just as a
continuing sequence of bits.
You don't have any of these
commas or spaces in them.
If I have a sequence of symbols
which come into the
encoder, those get mapped into
a sequence of bits, variable
length sequences of bits
which come out.
They all get pushed together
and just come out
one after the other.
Buffering can be a problem here,
because when you have a
variable length code --
I mean look at what
happens here.
If I've got a very long string
of a's coming in, I got a very
short string of bits
coming out.
If I have a long string of b's
and c's coming in, I have a
very long string of
bits coming out.
Now usually the way the channels
work is that you put
in bits at a fixed
rate in time.
Usually the way that sources
work is that symbols arrive at
a fixed rate in time.
Therefore, here, if symbols are
coming in at a fixed rate
in time, they're going out at
a non-fixed rate in time.
We have to bring them into a
channel at a fixed rate in
time, so we need a buffer to
take care of the difference
between the rate at which they
come out and the rate at which
they go in.
We will talk about that problem
later, but for now we
just say OK, we have
a buffer, we'll put
them all in a buffer.
If the buffer ever empties out
-- well, that's sort of like
the problem of initial
synchronization.
It's something that doesn't
happen very often, and we'll
put some junior engineer on
that because it's a hard
problem, and senior
engineers never deal with the
hard problems, they always
give those to the junior
engineers so that they can
assert their superiority over
the junior engineers.
It's a standard thing you
find in the industry.
We also require unique
decodability.
Namely, the encoded bit stream
has to be uniquely parsed at the decoder.
I have to have some way of
taking that long string of
bits and figuring out where the
commas would have gone if
I put commas in it.
Then from that I have
to decode things.
In other words, it means that
every symbol in the alphabet
has to have a distinct code
word connected with it.
We have that here.
We have that here.
Every symbol has a distinct
code word.
But it has to be
more than that.
I'm not even going to talk about
precisely what that more
means for a little bit.
We also assume to make life easy
for the decoder that it
has initial synchronization.
There's another obvious
property that we have.
Namely, both the encoder and the
decoder know what the code
is to start with.
In other words, the code is
built into these devices.
When you design a coder and a
decoder, what you're doing is
you figure out what an
appropriate code should be,
you give it to both the encoder
and the decoder, both
of them know what the code is
and therefore, both of them
can start decoding.
A piece of confusion.
We have an alphabet here which
has a list of symbols in it.
So there's a symbol a1,
a2, a3, up to a sub m.
We're sending a sequence of
symbols, and we usually call
the sequence of symbols we're
sending x1, x2, x3,
x4, x5 and so forth.
The difference is the symbols
in the alphabet are all
distinct, we're listing them
one after the other.
Usually there's a finite
number of them.
Incidentally, we could have a
countable number of symbols.
You could try to do everything
we're doing here say with the
integers, and there's a
countable number of integers.
All of this theory pretty much
carries through with various
little complications.
We're leaving that out here
because after you understand
what we're doing, making
it apply to integers is
straightforward.
Putting in the integers to start
with, you'll always be
fussing about various silly
little special cases, and I
don't know a single situation
where anybody deals with a
countable alphabet, except
by truncating it.
When you truncate an infinite
alphabet you
get a finite alphabet.
So, we'll assume initial
synchronization, we'll also
assume that there's
a finite alphabet.
You should always make sure that
you know whether you're
talking about a listing of the
symbols in the alphabet or a
listing of the symbols
in a sequence.
The symbols in a sequence can
all be the same, they can all
be different.
They can be anything at all.
The listing of symbols in the
alphabet, there's just one for
each symbol.
We're going to talk about a very
simple case of uniquely
decodable codes which are called
prefix-free codes.
A code is prefix-free if no code
word is a prefix of any
other code word.
In other words, a code word is
a string of binary digits.
A prefix is an initial part of a string of binary digits.
For example, if we have the
binary string 1, 0, 1, 1, 1.
What are the prefixes of that?
Well, one prefix
is 1, 0, 1, 1.
Another one is 1, 0, 1.
Another one is 1, 0.
Another is 1.
In other words, it's what you
get by starting out at the
beginning and not quite
getting to the end.
All of these things are
called prefixes.
If you want to be general you
could call 1, 0, 1, 1, 1, a
prefix of itself.
We won't bother to do that
because it just is -- that's
the kind of things that
mathematicians do to save a
few words in the proofs
that they give and we
won't bother with that.
We will rely a little more
on common sense.
Incidentally, I prove a lot of
things in these notes here.
I will ask you to prove
a lot of things.
One of the questions that people
always have is what
does a proof really mean?
I mean what is a proof and
what isn't a proof?
When you take mathematics
courses you get one idea of
what a proof is, which
is appropriate
for mathematics courses.
Namely, you prove things using
the correct terminology for
proving them.
Namely, everything that you deal
with you define it ahead
of time so that all of the
terminology you're using all
has correct definitions.
Then everything should follow
from those definitions and you
should be able to follow a
proof through without any
insight at all about
what is going on.
You should be able to follow
a mathematical proof
step-by-step without knowing
anything about what this is
going to be used for, why
anybody is interested in it or
anything else, and that's an
important thing to learn.
That's not what we're
interested in here.
What we're interested in
here for a proof --
I mean yes, you know all of
the things around this
particular proof that we're
dealing with, and what you're
trying to do is to construct
a proof that covers
all possible cases.
You're going to use insight for
that, you're going to use
common sense, you're going to
use whatever you have to use.
And eventually you start to get
some sort of second sense
about when you're leaving
something out that really
should be there.
That's what we're going to be
focusing on when we worry
about trying to be
precise here.
When I start proving things
about prefix codes, I think
you'll see this because you will
look at it and say that's
not a proof, and, in fact,
it really is a proof.
Any good mathematician would
look at it and say yes, that
is a proof.
Bad mathematicians sometimes
look at it and say well, it
doesn't look like proof so
it can't be a proof.
But they are.
So here we have prefix-free
codes.
The definition is no code
word is a prefix of
any other code word.
If you have a prefix-free code,
you can express it in
terms of a binary tree.
Now a binary tree starts at a
root, this is the beginning,
moves off to the right -- you
might have it start at the
bottom and move up or whatever
direction you want to go in,
it doesn't make any
difference.
If you take the zero path
you come to some leaf.
If you take the one path
you come to some
intermediate node here.
From the intermediate
node, you either go
up or you go down.
Namely, you have
a 1 or a zero.
From this intermediate node
you go up and you go down.
In other words, a binary tree,
every node in it is either an
intermediate node, which means
there are two branches going
out from it, or it's a leaf
which means there aren't any
branches going out from it.
You can't, in a binary tree,
have just one branch coming
out of a node.
There are either no branches
or two branches, just by
definition of what we mean
by a binary tree --
binary says two.
So here we have this tree, where we label various ones of the leaves.
It corresponds to the code where
a corresponds to the
string zero, b corresponds
to the string 1, 1, and c
corresponds to the
string 1, 0, 1.
Now, when you look at this as a code, it's not obvious that there's something really stupid about it.
When you look at the tree,
it's pretty obvious that
there's something stupid about
it, because here we have this
c here, which is sitting off on
this leaf, and here we have
this leaf here which isn't doing
anything for us at all.
We say gee, we could still keep
this prefix condition if
we moved this into here
and we drop this off.
So any time that there's
something hanging here without
corresponding to a symbol,
you would really
like to shorten it.
When you shorten these things
and you can't shorten anything
else, namely, when every leaf
has a symbol on it you call it
a full tree.
So a full tree is more than a
tree, a full tree is a code
tree where the leaves correspond
to symbols.
So a full tree has
no empty leaves.
Empty leaves can be shortened
just like I showed you here,
so we'll talk about full trees,
and full trees are sort
of the good trees.
But prefix-free codes don't
necessarily have to worry
about that.
Well, now I'm going to prove
something to you, and at this
point you really should object,
but I don't care.
We will come back
and you'll get
straightened out on it later.
I'm going to prove that
prefix-free codes are uniquely
decodable, and you should cry
foul because I really haven't
defined what uniquely
decodable means yet.
You think you know what uniquely
decodable means,
which is good.
It means physically that you can
look at a string of code
words and you can pick out
what all of them are.
We will define it later
and you'll find out
it's not that simple.
As we move on, when we start
talking about Lempel Ziv codes
and things like that.
You will start to really
wonder what
uniquely decodable means.
So it's not quite as
simple as it looks.
But anyway, let's prove that
prefix-free codes are uniquely
decodable anyway, because
prefix-free codes are a
particularly simple example of
uniquely decodable codes, and
it's sort of clear that you
can, in fact, decode them
because of one of the properties
that they have.
The way we're going to prove
this is we want to look at a
sequence of symbols or a string
of symbols that come
out of the source.
As that string of symbols come
out of the source, each symbol
in the string gets mapped into
a binary string, and then we
concatenate all those binary
strings together.
That's a big mouthful.
So let's look at this code we
were just talking about where
the code words are b, c and a.
So if a 1 comes out of the
source and then another 1, it
corresponds to the
first letter b.
If a 1, zero comes out,
it corresponds to the
first letter c.
If a zero comes out, that
corresponds to the letter a.
Well now the second symbol comes
in and what happens on
that second symbol is if the
first symbol was an a, the
second symbol could be a b or a
c or an a, which gives rise
to this little sub-tree here.
If the first letter is a b,
the second letter could be
either an a, b or a c, which
gives rise to this little
sub-tree here.
If we have a c followed by
anything, that gives rise to
this little sub-tree here.
You can imagine growing this
tree as far as you want to,
although it gets hard
to write down.
How do you decode this?
Well, as many things, you want
to start at the beginning, and
we know where the
beginning is.
That's a basic assumption on
all of this source coding.
So knowing where the beginning
is, you sit there and you look
at it, and you see a zero as
the first letter as a first
binary digit, and zero says I
move this way in the tree, and
presto, I say gee, an a must
have occurred as the first
source letter.
So what do I do?
I remove the a, I print out a,
and then I start to look at
this point.
At this point I'm back where I
started at, so if I can decode
the first letter,
I can certainly
decode everything else.
If the first letter is a
b, what I see is a 1
followed by a 1.
Namely, when I see the first
binary 1 come out of the
channel, I don't know
what was said.
I know either a b
or c was sent.
I have to look at the second
letter, the second binary
digit resolves my confusion.
I know that the first source letter was a b, if it's 1, 1, or a c, if it's 1, zero.
I decode that first source
letter and then where am I?
I'm either on this tree or on
this tree, each of which goes
extending off into the
wild blue yonder.
So this says if I know where the
beginning is, I can decode
the first letter.
But if I can decode the first
letter, I know where the
beginning is for everything
else.
Therefore, I can decode
that also.
Well, aside from any small
amount of confusion about what
uniquely decodable means,
that's a perfectly fine
mathematical proof.
So, prefix-free codes are, in
fact, uniquely decodable and
that's nice.
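That decoding argument fits in a few lines. Here is a sketch (my own illustration, using the code from the decoding example above: a into zero, b into 1, 1, c into 1, zero): read bits until the bits seen so far match a code word, output that symbol, and start over at the root.

```python
def decode_prefix_free(code, bits):
    """Instantaneous decoding: a match is final because no code word
    is a prefix of any other code word."""
    lookup = {word: sym for sym, word in code.items()}
    output, current = [], ''
    for bit in bits:
        current += bit
        if current in lookup:            # end of a code word reached
            output.append(lookup[current])
            current = ''                 # back to the root of the tree
    return output

code = {'a': '0', 'b': '11', 'c': '10'}
print(decode_prefix_free(code, '011100'))   # ['a', 'b', 'c', 'a']
```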
So then there's a question.
What is the condition on the
lengths of a prefix-free code
which allow you to have
unique decodability?
The Kraft inequality is a test on whether or not there is a prefix-free code connected with any given set of code word lengths. This is quite an interesting inequality.
This is one of the relatively
few things in information
theory that was not invented
by Claude Shannon.
You sit there and you wonder
why didn't Claude Shannon
realize this?
Well, it's because I
think he sort of
realized that it was trivial.
He sort of understood it and he
was really eager to get on
to the meat of things, which is
unusual for him because he
was somebody, more than anyone
else I know, who really
understood why you should
understand the simple things
before you go on to the more
complicated thing.
But anyway, he missed this.
Bob Fano, who some of you
might know, who was a
professor emeritus over in
LCS, was interested in
information theory.
Then he was teaching a graduate
course back in the
'50s here at MIT, and as he
often did, he threw out these
problems and said nobody knows
how to figure this out.
What kinds of lengths can you
have on prefix-free codes, and
what kinds of lengths
can't you have?
Kraft was a graduate student
at the time.
The next day he came in with
this beautiful, elegant proof
and everybody's always known who
Kraft is ever since then.
Nobody's ever known what
he did after that.
But at least he made his
mark on the world
as a graduate student.
So, in a sense, those were
good days to be around,
because all the obvious things
hadn't been done yet.
But the other thing is you never
know what the obvious
things are until you do them.
This didn't look like an obvious
problem ahead of time.
The same goes for a number of other obvious things that got knocked off, because somebody was looking at it in a slightly different way than other people were looking at it.
You see, back then people said
we want to look at these
variable length codes because
we want to have some
capability of mapping improbable
symbols into long
code words and probable symbols
into short code words.
You'll notice that I've done
something strange here.
That was our motivation for
looking at variable length
codes, but I haven't said a
thing about probability.
All I'm dealing with now is
the question of what is
possible and what
is not possible.
We'll bring in probability
later, but now all we're
trying to figure out is what
are the sets of code word
lengths you can use, and what
are the sets of code word
lengths you can't use.
So what Kraft said is every
prefix-free code for an
alphabet x with code word
lengths l of x for each letter
in the alphabet x satisfies the sum over x of 2 to the minus l of x, less than or equal to 1.
In other words, you take all
of the code words in the
alphabet, you take the length
of each of those code words,
you take 2 to the minus that length, and you add them all up.
And if this inequality is not
satisfied, your code does not
satisfy the prefix condition,
there's no way you can create
a prefix-free code which
has these lengths, so
you're out of luck.
So you better create a new set
of lengths which satisfies
this inequality.
There's also a simple procedure
you can go through
which lets you construct a code
which has these lengths.
So, in other words, this, in
a sense, is a necessary and
sufficient condition on the
possibility of constructing
codes with a particular
set of lengths.
It has nothing to do
with probability.
So it's, in a sense, cleaner
than these other results.
So, conversely, if this
inequality is satisfied, you
can construct a prefix-free
code, and even more strangely,
you can construct it very, very
easily, as we'll see.
Finally, a prefix-free code is
full -- you remember what a
full prefix-free code is?
It's a code where the tree has nothing that's unused -- if and only if this inequality is satisfied with equality.
So it's a neat result.
It's useful in a lot of places
other than source coding.
If you ever get involved with
designing protocols for
computer networks or protocols
for any kind of computer
communication, you'll find
that you use this all the
time, because this says you can
do some things, you can't
do other things.
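Here is a sketch (my own illustration) of both directions: checking the Kraft sum for a proposed set of lengths, and, when the sum is at most 1, constructing a prefix-free code with exactly those lengths by handing out code words shortest first -- essentially the interval-packing argument that comes next.

```python
def kraft_sum(lengths):
    """Sum of 2**-l over the proposed code word lengths."""
    return sum(2.0 ** -l for l in lengths)

def construct_prefix_free(lengths):
    """Build a prefix-free code with the given lengths, assuming the
    Kraft sum is at most 1.  Lengths are handed out shortest first;
    each code word is the binary expansion of its interval's start."""
    assert kraft_sum(lengths) <= 1, "Kraft inequality violated"
    words, start = [], 0.0
    for l in sorted(lengths):
        words.append(format(int(start * 2 ** l), f'0{l}b'))
        start += 2.0 ** -l               # move past this word's interval
    return words

print(kraft_sum([1, 2, 3, 3]))              # 1.0 -- a full code is possible
print(construct_prefix_free([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```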
So let's see why it's true.
I'll give you another funny
proof that doesn't look like a
proof but it really is.
What I'm going to do is to
associate code words with base
2 expansions.
There's a little Genie that
early in the morning leaves
things out of these slides
when I make them.
It wasn't me, I put it in.
So we're going to prove this by
associating code words with
base 2 expansions, which are
like decimals, but decimals to
the base 2.
In other words, we're going to
take a code word, y1, y2 up to
y sub m where y1 is a binary
digit, y2 is a binary digit.
This is a string of binary
digits, and we're going to
represent this as
a real number.
The real number is the decimal,
but it's not a
decimal, it's a becimal, if you
will, which is dot y1, y2
up to y sub m.
Which means y1 over 2 plus y2 over 4 plus dot dot dot plus y sub m over 2 to the m.
If you think of it, an ordinary decimal, y1, y2 up to
y sub m, means y1 over 10 plus
y2 over 100 plus y3 over 1,000
and so forth.
So this is what people would
have developed for decimals
if, in fact, we lived in
a base 2 world instead
of a base 10 world.
If you were born without fingers
and you only had two
fingers, this is the number
system you would use.
When we think about
decimals there's
something more involved.
We use decimals all the time
to approximate things.
Namely, if I say that a number
is 0.12, I don't mean usually
that it's exactly 12
one hundredths.
Usually I mean it's about
12 one hundredths.
The easiest way to do this is
to round things down to two
decimal points.
In other words, when I say 0.12,
what I really mean is I
am talking about a real number
which lies between 12 one
hundredths and 13
one hundredths.
It's greater than or equal to
12 one hundredths and it's
less than 13 one hundredths.
I'll do the same thing
in base 2.
As soon as I do this you'll see
where the Kraft inequality
comes from.
So I'm going to have this interval here, which is the interval associated with a binary expansion to m digits. There's a number associated with it, which is this number here. There's also an interval associated with it, whose size is 2 to the minus m.
So if I have a code consisting
of 0, 0, 0, 1 and 1, what I'm
going to do is represent zero
zero as a binary expansion, so
0, 0, as a binary expansion
is 0.00, which is zero.
But also as an approximation
it's between zero and 1/4.
So I have this interval
associated with 0, 0, which is
the interval from
zero up to 1/4.
For the code word zero 1, if I'm
trying to see whether that
is part of a prefix code, I map
it into a number, 0.01 as
a binary expansion.
This number corresponds to the number 1/4, and since the code word has length 2, it also corresponds to an interval of size 1/4.
So we go from 1/4 up to 1/2.
Finally, I have 1, which
corresponds to the number 1/2,
and since it's only one binary
digit long, it corresponds to
the interval 1/2 to 1.
Namely, if I truncate things to
one binary digit, I'm talking
about the entire interval
from 1/2 to 1.
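Here is a small sketch (my own illustration) of that association between code words, numbers and intervals, using exact fractions.

```python
from fractions import Fraction

def expansion_interval(word):
    """Map the code word y1 y2 ... ym to its base-2 expansion and the
    interval of size 2**-m that it approximates."""
    value = sum(Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(word))
    return value, value + Fraction(1, 2 ** len(word))

for w in ['00', '01', '1']:
    print(w, expansion_interval(w))   # [0, 1/4), [1/4, 1/2), [1/2, 1)
```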
So where does the Kraft
inequality come from and what
does it have to do with this?
Incidentally, this isn't the
way that Kraft proved it.
Kraft was very smart.
He did this as his Master's
thesis, too, I believe, and
since he wanted it to be his
Master's thesis he didn't want
to make it look quite that
trivial or Bob Fano would have
said oh, you ought to do
something else for a Master's
thesis also.
So he was cagey and made his
proof look a little more
complicated.
So, if a code word x is a prefix of code word y -- in other words, y has some binary expansion and x is the first few letters of that expansion -- then the number corresponding to y falls in the interval corresponding to x. Namely, x covers that entire range of binary expansions which start with x, going up to something which differs from x only in that mth binary digit.
In other words, let me
show you what that
means in terms of here.
If I tried to create a code word
0, 0, 0, 1, 0, 0, 0, 1
would correspond to
the number 1/16.
1/16 lies in that
interval there.
In other words, any time I
create a code word which lies
in the interval corresponding to
another code word, it means
that this code word has that other code word as a prefix.
Sure enough it does --
0, 0, 0, 1, this has
this as a prefix.
In other words, there is a
perfect mapping between
intervals associated
with code words and
prefixes of code words.
So in other words, if we have
a prefix-free code, the
intervals for each of these code
words have to be disjoint.
Well, now we're in nice shape
because we know what the size
of each of these intervals is.
The size of the interval
associated with a code word of
length 2 is 2 to the minus 2.
To be a prefix-free
code, all these
intervals have to be disjoint.
But everything is contained here
between zero and 1, and
therefore, when we add up all
of these intervals we get a
number which is at most 1.
That's the Kraft inequality.
That's all there is to it.
There was one more
thing in it.
It's a full code if and only if the Kraft inequality is satisfied with equality. Where was that? The code is full if and only if the expansion intervals fill up the whole interval from zero to 1.
In other words, suppose this was
1 zero, which would lead
into 0.1 with an interval 1/2
to 3/4, and this was all you
had, then this interval up here
would be empty, and, in
fact, since this interval
is empty you could
shorten the code down.
In other words, you'd have
intervals which weren't full
which means that you would have
code words that could be
put in there which
are not there.
So, that completes the proof.
So now finally, it's time to
define unique decodability.
The definition in the notes is
a mouthful, so I broke it
apart into a bunch of different
pieces here.
A code c for a discrete source
is uniquely decodable if for
each string of source letters,
x1 up to x sub m, these are
not distinct letters of the
alphabet, these are just the
things that might come out of
the source. x1 could be the
same as x2, it could be
different from x2.
If all of these letters coming
out of the source, that
corresponds to some
concatenation of these code
words, namely, c of x1, c of
x2 up to c of x sub m.
So I have this coming out of the
source, this is a string
of binary digits that come out
corresponding to this, and I
require that this differs from
the concatenation of the code
words c of x1 prime up
to c of xm prime.
For any other string, x1 prime
x2 prime, x of m prime of
source letters.
Example of this: the thing that we were trying to construct before, c of a equals 1, c of b equals zero, c of c equals 1 zero, doesn't work because the concatenation of a and b yields 1 zero. Take x1 to be a, take x2 to be b. This concatenation, c of x1, c of x2, is c of a, c of b, which equals 1 zero. C of c also equals 1 zero, and therefore, you don't have something that works.
Note that n here can be
different from m here.
You'll deal with that in the
homework a little bit, not
this week's set.
But that's what unique
decodability says.
Let me give you an example.
Here's an example.
Turns out that all uniquely
decodable codes have to
satisfy the Kraft
inequality also.
Kraft didn't prove this.
In fact, it's a bit of a
bear to prove it, and
we'll prove it later.
I suspect that about 2/3 of you
will see the proof and say
ugh, and 1/3 of you will say
oh, this is really, really
interesting.
I sort of say gee, this is
interesting sometimes, and
more often I say ugh, why
do we have to do this?
But one example of a code which
is uniquely decodable is
first code word is 1, second
code word is 1, 0, third is 1,
0, 0, and the fourth
is 1, 0, 0, 0.
It doesn't satisfy the Kraft inequality with equality, it satisfies it with strict inequality.
It is uniquely decodable.
How do I know it's uniquely
decodable by
just looking at it?
Because any time I see
a 1 I know it's the
beginning of a code word.
So I look at some long binary
string, it starts out with the
1, I just read digits till it
comes to the next one, I say
ah-ha, that next 1 is the first
binary digit in the
second code word, the third 1
that I see is the first digit
in the third code word
and so forth.
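A sketch of that decoding rule (my own illustration): every 1 begins a code word, so you cut the bit stream just before each 1. Notice that the last code word isn't known until the next 1 arrives or the stream ends, which is exactly the non-instantaneous behavior that comes up again later.

```python
def decode_leading_one(bits):
    """Decode the code {1, 10, 100, 1000}: every 1 starts a new code word,
    so count the zeros that follow each 1."""
    words, current = [], ''
    for bit in bits:
        if bit == '1' and current:       # a new 1 ends the previous word
            words.append(current)
            current = ''
        current += bit
    if current:
        words.append(current)            # flush the last word
    return words

print(decode_leading_one('1100100010'))   # ['1', '100', '1000', '10']
```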
You might say why don't I make
the 1 the end of the code word
instead of the beginning of the
code word and then we'll
have the prefix condition
again.
All I can say is because I want to be perverse and I want to give you an example of something that is uniquely decodable but doesn't satisfy the prefix condition.
So it's a question.
Why don't we just stick to
prefix-free codes and forget
about unique decodability?
You won't understand the answer
to that really until we
start looking at things like
Lempel Ziv codes, which are,
in fact, a bunch of different
things all put together which
are, in fact, very, very
practical codes.
But they're not prefix-free
codes, and you'll see why
they're not prefix-free codes
when we study them.
Then you will see why we want
to have a definition of
something which is more
involved than that.
So don't worry about that
for the time being.
For the time being, the correct
idea to take away from
this is that why not just use
prefix-free codes, and the
answer is for quite
a while we will.
We know that anything we can do
with prefix-free codes we
can also do with uniquely
decodable codes, anything we
can do with uniquely decodable
codes, we can do with
prefix-free codes.
Namely, any old code that you
invent has a certain set of
lengths associated with the
code words, and if it
satisfies the Kraft inequality,
you can easily
develop a prefix-free code which
has those lengths and
you might as well do it because
then it makes the
coding a lot easier.
Namely, if we have a prefix-free
code -- let's go
back and look at that because
I never mentioned it and it
really is one of the important
advantages
of prefix-free codes.
When I look at this picture and
I look at the proof of how
I saw that this was uniquely
decodable, what we said was
you start at the beginning and
as soon as the decoder sees
the last binary digit of a code
word, the decoder can say
ah-ah, it's that code word.
So it's instantaneously
decodable.
In other words, all you need to
see is the end of the code
word and at that point you
know it's the end.
Incidentally, that makes
figuring out when you have a
long sequence of code words and
you want to stop the whole
thing, it makes things
a little bit easier.
This example we started
out with of --
I can't find it anymore --
but the example of a uniquely
decodable, but non-prefix-free
code, you always had to look at
the first digit of the next
code word to know that the old
code word was finished.
So, prefix-free codes have
that advantage also.
The next topic that we're going
to take up is discrete
memoryless sources.
Namely, at this point we have
gone as far as we can in
studying prefix-free codes and
uniquely decodable codes
strictly in terms of their
non-probabilistic properties.
Namely, the question of what set
of lengths can you use in
a prefix-free code or uniquely
decodable code, and what sets
of lengths can't you use.
So the next thing we want to do is to start looking at the probabilities of these different symbols.
We want to find out what sort of
lengths we want to choose.
There will be a simple
answer to that.
In fact, there'll be two ways of
looking at it, one of which
will lead to the idea of
entropy, and the other which
will lead to the idea of
generating an optimal code.
Both of those approaches are
extremely interesting.
But to do that we have
to think about a very
simple kind of source.
The simple kind of source
is called a
discrete memoryless source.
We know what a discrete source
is -- it's a source which
spews out a sequence of symbols
from this finite
alphabet that we know and
the decoder knows.
The next thing we have to do
is to put a probability
measure on the output
of the source.
There's a little review
of probability at
the end of this lecture.
You should read it carefully.
When you study probability, you
have undoubtedly studied
it like most students do,
as a way of learning
how to do the problems.
You don't necessarily think of
the generalizations of this,
you don't necessarily think of
why is it that when you define
a probability space you start
out with a sample space and
you talk about elements in the
sample space, which are the sample points.
What do those sample points
have to do with random
variables and all
of that stuff?
That's the first thing you
forget when you haven't been
looking at probability
for a while.
Unfortunately, it's something
you have to understand when
we're dealing with this because
we have a bunch of
things which are not random
variables here.
These letters here are
things which we
will call chance variables.
A chance variable is just like
a random variable but the set
of possible values that it has
are not necessarily numbers,
they're just events,
as it turns out.
So the sample space is just
some set of letters, as we
call them, which are
really events in
this probability space.
The probability space assigns
probabilities to
sequences of letters.
What we're assuming here is that
the sequence of letters
are all statistically
independent of each other.
So for example, if you go to Las
Vegas and you're reporting
the outcome of some gambling
game and you're sending it
back your home computer and your
home computer is figuring
out what your odds are in black
jack or something, then
every time the dice are rolled
you get an independent -- we
hope it's independent if
the game is fair --
outcome of the dice.
So that what we're sending
then, what we're going to
encode is a sequence of
independent, random -- not
random variables because it's
not necessarily numbers that
you're interested in, it's
this sequence of symbols.
But if we deal with the English
text, for example, the
idea that the letters in English
text are independent
of each other is absolutely
ludicrous.
If it's early enough in the
term that you're not
overloaded already, I would
suggest that those of you with
a little time go back and read
at least the first part of
Shannon's original article about
information theory where
he talks about the problem
of modeling English.
It's a beautiful treatment,
because he starts out same way
we are, dealing with sources
which are independent,
identically distributed
chance variables.
Then he goes from there, as we
will, to looking at Markov
chains of source variables.
Some of you will cringe at this
because you might have
seen Markov chains and forgotten
about them or you
might have never seen them.
Don't worry about it, there's
not that much that's peculiar
about them.
Then he goes on from
there to talk about
actual English language.
But the point that he makes is
that when you want to study
something as complicated as the
English language, the way
that you do it is not to start
out by taking a lot of
statistics about English.
If you want to encode English,
you start out by making highly
simplifying assumptions, like
the assumption that we're
making here that we're
dealing with a
discrete memoryless source.
You then learn how to encode
discrete memoryless sources.
You then look at blocks of
letters out of these sources,
and if they're not independent
you look at the probabilities
of these blocks.
If you know how to generate an
optimal code for IID letters,
then all you have to do is take
these blocks of length m
where you'd have a probability
on each possible block, and
you generate a code
for the block.
You don't worry about the
statistical relationships
between different blocks.
You just say well, if I make my
block long enough I don't
care about what happens at the
edges, and I'm going to get
everything of interest.
So the idea is by starting out
here you have all the clues
you need to start looking at
the more interesting cases.
As it turns out with source
coding there's another
advantage involved -- looking
at independent letters is in
some sense a worst case.
When you look at this worst
case, in fact, presto, you
will say if the letters are
statistically related, fine.
I'd do even better.
I could do better if I took that
into account, but if I'm
not taking it into account,
I know exactly
how well I can do.
So what's the definition
of that?
Source output is an unending
sequence -- x1, x2, x3 --
of randomly selected letters,
and these randomly selected
letters are called
chance variables.
Each source output is selected
from the alphabet using a
common probability measure.
In other words, they're
identically distributed.
Each source output, say x k plus 1, is statistically independent of the prior source outputs, x1 up to x k.
We will call that independent
identically distributed, and
we'll abbreviate it IID.
It doesn't mean that the
probability measure is 1 over
m for each letter, it's not what
we were assuming before.
It means you can have an
arbitrary probability
assignment on the different
letters, but every letter has
the same probability assignment
on it and they're
all independent of each other.
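As a concrete picture, here is a tiny sketch (my own illustration, with made-up letters and probabilities) of drawing an IID sequence from such a source.

```python
import random

# A hypothetical discrete memoryless source: three letters with a fixed,
# non-uniform probability assignment; each output is drawn independently
# of all the others (IID).
letters = ['a', 'b', 'c']
probs   = [0.7, 0.2, 0.1]

def dms_output(n, seed=0):
    rng = random.Random(seed)
    return rng.choices(letters, weights=probs, k=n)

print(''.join(dms_output(20)))   # a length-20 string, mostly a's
```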
So that's the kind of source
we're going to be
dealing with first.
We will find out everything we
want to know about how we deal
with that source.
You will understand that source
completely and the
other sources you will half
understand a little later.
