I’m assuming that everybody here is coming
in from part 1.
We were talking about Hamming codes, a way
to create a block of data where most of the
bits carry a meaningful message, while a few
others act as a kind of redundancy in such
a way that if any single bit gets flipped, whether a message bit or a redundancy bit, anywhere in this block, a receiver is going to be able to identify both that there was an error and how to fix it.
The basic idea presented there was how to
use multiple parity checks to binary search
your way down to the error.
Now, in that video the goal was to make Hamming
codes feel as hands-on and re-discoverable
as possible, but as you start to think about
actually implementing this, either in software
or hardware, that framing may actually undersell
how elegant these codes really are.
You might think that you need to write an
algorithm that keeps track of all possible
error locations and cuts that group in half
with each check, but it’s actually way,
way simpler than that.
If you read out the answers to the four parity checks that we did in the last video all as 1’s and 0’s instead of as yeses and nos, it literally spells out the position of the error in binary.
For example, the number 7 in binary looks
like 0111, essentially saying it’s 4+2+1.
And notice where the position 7 sits: It does
affect the first of our parity groups, and
the second, and the third, but not the last.
So reading the results of those four checks
from bottom to top indeed does spell out the
position of the error.
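If you like, here's a tiny Python sketch of what it means to read those four answers as binary digits. This is just my own illustration, with made-up check results:

    # Hypothetical results of the four parity checks, 1 meaning "failed",
    # listed from the first group to the fourth.
    checks = [1, 1, 1, 0]
    # Read them as binary digits, lowest-order bit first.
    error_position = sum(bit << k for k, bit in enumerate(checks))
    print(error_position)  # 7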
There’s nothing special about the example
7, this works in general, and this makes the
logic for implementing the whole scheme in
hardware shockingly simple.
Now if you want to see why this magic happens,
take these 16 index labels for our positions
but instead of writing them in base 10, let's
write them all in binary running from 0000
up to 1111.
As we put these binary labels back into their
boxes, let me emphasize that they are distinct
from the data that's actually being sent;
they’re nothing more than a conceptual label
to help you and me understand where the four
parity groups came from.
The elegance of having everything that we’re
looking at being described in binary is maybe
undercut by the confusion of having everything
we’re looking at being described in binary.
It’s worth it, though.
Focus your attention just on that last bit
of all of these labels, and then highlight
the positions where that final bit is a 1.
What we get is the first of our four parity
groups, which means you can interpret that
check as asking: “hey, if there’s an error,
is the final bit in the position of that error
a 1?”.
Similarly, if you focus on the second to last
bit and highlight all the positions where
that's a 1, you get the second parity group
from our scheme.
In other words, that second check is asking:
“Hey, me again, if there’s an error, is
the second to last bit in its position a 1?”.
And so on, the third parity check covers every
position whose third to last bit is turned
on... and the last one covers the last 8 positions, those whose highest-order bit is a 1.
Everything we did earlier is the same as answering
these four questions, which in turn is the
same as spelling out a position in binary.
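If you want to double-check that for yourself, here's a short sketch (my own variable names, purely illustrative) that rebuilds the four parity groups straight from the binary labels:

    # Parity group k is the set of positions whose k-th binary digit
    # (counting from the lowest-order bit) is a 1.
    for k in range(4):
        group = [pos for pos in range(16) if (pos >> k) & 1]
        print(f"group {k}: {group}")
    # group 0: [1, 3, 5, 7, 9, 11, 13, 15]
    # group 1: [2, 3, 6, 7, 10, 11, 14, 15]
    # group 2: [4, 5, 6, 7, 12, 13, 14, 15]
    # group 3: [8, 9, 10, 11, 12, 13, 14, 15]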
I hope this makes two things clearer: The
first is how to systematically generalize
to block sizes that are bigger powers of 2.
If it takes more bits to describe each position,
like 6 bits to describe 64 spots, then each
of those bits gives you one of the parity
groups that we need to check.
Those of you who watched the chessboard puzzle
I did with Matt Parker might find all this
exceedingly familiar.
It’s the same core logic, but solving a
different problem, and applied to a 64 square
chessboard.
The second thing I hope this makes clear is
why our parity bits are sitting in positions
that are powers of two, for example, 1, 2,
4, and 8.
These are the positions whose binary representation
has just a single bit turned on.
What that means is that each of those parity
bits sits inside one and only one of the four
parity groups.
You can also see this in larger examples,
where no matter how big you get, each parity
bit conveniently touches only one of the groups.
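You can verify this in a couple of lines as well (again, just a sketch):

    # A power of two has exactly one binary digit set, so each of these
    # positions lands in exactly one of the four parity groups.
    for p in [1, 2, 4, 8]:
        groups = [k for k in range(4) if (p >> k) & 1]
        print(p, groups)  # 1 [0], 2 [1], 4 [2], 8 [3]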
Once you understand that these parity checks
that we’ve focused so much of our time on
are nothing more than a clever way to spell
out the position of an error in binary, well
then we can draw a connection with a different
way to think about Hamming codes, one that
is arguably a lot simpler and more elegant,
and which can basically be written down with
a single line of code.
It’s based on the xor function.
Xor, for those of you who don’t know, stands for “exclusive or”: when you take the xor of two bits, it returns a 1 if exactly one of those bits is turned on, and a 0 if both are on or both are off.
Phrased differently, it’s the parity of
these two bits.
As a math person, I prefer to think about
it as addition mod 2.
We also commonly talk about the xor of two
different bit strings, which basically does
this component-by-component; it’s like addition
but where you never carry.
Again, the more mathematically inclined might
prefer to think of this as adding two vectors
and reducing mod 2.
If you open up some Python right now and apply the caret operation between two integers, this is what it’s doing under the hood, but to the bit representations of those numbers.
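For example, at a Python prompt:

    a = 0b0110
    b = 0b1011
    print(bin(a ^ b))  # 0b1101, componentwise parity: addition with no carries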
The key point for you and me is that taking
the xor of many different bit strings is effectively
a way to compute the parities of a bunch of
separate groups, like so with the columns,
all in one fell swoop.
This gives us a rather snazzy way to think
about the multiple parity checks from our
Hamming code algorithm as all being packaged
together into one single operation.
Though at first glance it does look very different.
Specifically, write down the 16 positions
in binary, like we had before, and now highlight
only the positions where the message bit is
turned on to a 1.
And then collect these positions into one
big column and take the xor.
You can probably guess that the four bits
sitting at the bottom, as a result, are the
same as the four parity checks that we’ve
come to know and love, but take a moment to
think about why, exactly.
This last column, for example, is counting
all of the positions whose last bit is a 1,
but we're already limited only to the highlighted
positions, so it’s effectively counting
how many highlighted positions came from the
first parity group.
Does that make sense?
Likewise, the next column counts how many
positions are in the second parity group,
the positions whose second to last bit is
a 1, and which are also highlighted.
And so on, it’s really a small shift in
perspective on the same thing that we’ve
been doing.
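To make that shift in perspective concrete, here's a sketch of my own (illustrative names, any 16 bits will do) that checks the claim directly:

    from functools import reduce

    block = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    highlighted = [i for i, bit in enumerate(block) if bit]

    # xor all the highlighted positions together in one sweep.
    total = reduce(lambda x, y: x ^ y, highlighted, 0)

    # Bit k of the total is exactly the parity of the k-th group.
    for k in range(4):
        group_parity = sum(block[i] for i in range(16) if (i >> k) & 1) % 2
        assert (total >> k) & 1 == group_parity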
And so you know where it goes from here: the sender is responsible for toggling some of the special parity bits to make sure this sum works out to be 0000.
Now, once we have it like this, it gives
a really nice way to think about why these
four resulting bits at the bottom directly
spell out the position of an error.
Let's say some bit in this block gets toggled
from a 0 to a 1, what that means is that the
position of that bit is now going to be included
in the total xor, which changes the sum from
being 0 to instead being this newly included
value, the position of the error.
Slightly less obviously, the same is true
if there’s an error that changes a 1 to
a 0.
You see, if you add the same bit string to the total twice, it’s the same as not having it there at all, basically because in this world 1+1=0.
So adding a copy of this position to the total
sum has the same effect as removing it, and
that effect again is that the total result
at the bottom here spells out the position
of the error.
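In code, that cancellation is nothing more than the fact that anything xor'd with itself is zero:

    total = 0          # a well-prepared block xors to 0
    error = 0b1001     # say the error is at position 9
    assert total ^ error == 9                 # a 0-to-1 flip adds the position in
    assert (total ^ error) ^ error == total   # xoring it again removes it: 1+1 = 0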
To illustrate how elegant this is, let me
show that one line of Python code I referenced
before, which will capture almost all of the
logic on the receiver’s end.
We'll start by creating a random array of
16 1’s and 0’s to simulate the data block,
and I'll go ahead and give it the name "bits".
But of course, in practice, this would be
something that we’re receiving from a sender.
And instead of being random, it would be carrying
11 data bits together with 5 parity bits.
If I call the function “enumerate(bits)”,
what it does is pair together each of those
bits with a corresponding index, in this case
from 0 up to 15.
So if we then create a list that loops over
all of these pairs, pairs that look like (i,
bit), and then we pull out just the i value,
just the index... well, it’s not that exciting,
we just get back those indices 0 through 15.
But if we add on the condition to only do
this “if bit”, meaning if that bit is
a 1 and not a 0, then it pulls out only the
positions where the corresponding bit is turned
on.
In this case it looks like those positions
are at 0, 4, 6, 9, etc.
Remember, we want to collect together all
of those positions, the positions of the bits
that are turned on, and then xor them together.
To do this in Python, let me first import
a couple helpful functions...that way we can
call “reduce” on this list and use the
xor function to reduce it.
This basically eats its way through the list,
taking xors along the way.
If you prefer, you can explicitly write out
that xor function without importing from anywhere.
So at the moment it looks like if we do this
on our random block of 16 bits, it returns
"9", which has the binary representation 1001.
We won’t do it here, but you could write
a function where the sender uses that binary
representation to set the four parity bits
as needed, ultimately getting this block to
a state where running this line of code on
the full list of bits returns a 0.
This would be considered a well-prepared block.
Now what’s cool is that if we toggle any
one of the bits in this list, simulating a
random error from noise, then if you run this
same line of code what it does is it prints
out that error.
Isn’t that neat?
You could get this block out of the blue,
run this single line on it, and what it'll
do is automatically spit out the position
of an error, or a 0 if there wasn't any.
And there’s nothing special about the size
16 here, this same line of code would work
if you had a list of, say, 256 bits.
Needless to say, there is more code to write
here, like doing the meta-parity check to
detect two-bit errors, but the idea is that
almost all of the core logic from our scheme
comes down to a single xor reduction.
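For the curious, here's roughly what that meta-parity check might look like, a sketch under the common convention that position 0 holds the extra parity bit, continuing with the bits and imports from above:

    # If the xor of the on-positions is nonzero but the overall parity of the
    # block is even, that's the signature of two errors: detectable, not fixable.
    syndrome = reduce(xor, [i for i, bit in enumerate(bits) if bit], 0)
    if syndrome != 0 and sum(bits) % 2 == 0:
        print("two errors detected")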
Now, depending on your comfort with binary,
and xors, and software in general, you may
either find this perspective a little bit
confusing or so much more elegant and simple
that you’re wondering why we didn’t just
start with it from the get-go.
Loosely speaking, the multiple parity check
perspective is easier to think about when
implementing Hamming codes in hardware, and
the xor perspective is easiest to think about
when doing it in software, from kind of a
higher level.
The first one is easiest to actually carry
out by hand, and I think it does a better
job instilling the core intuition underlying
all of this, which is that the information
required to locate a single error is related
to the log of the size of the block, or in
other words, it grows one bit at a time as
the block size doubles.
The relevant fact here is that that information
directly corresponds to how much redundancy
we need.
That’s really what runs against most people’s knee-jerk reactions when they first think
about making a message resilient to errors,
where usually copying the whole message is
the first instinct that comes to mind.
And then, by the way, there’s this whole
other way that you sometimes see Hamming codes
presented, where you multiply the message by one big matrix. It's kind of nice because it relates them to the broader family of "linear codes", but I think it gives almost no intuition for where they come from or how they scale.
And speaking of scaling, you might notice
that the efficiency of this scheme only gets
better as we increase the block size.
For example, we saw that with 256 bits, you're
using only 3% of that space for redundancy.
And it just keeps getting better from there.
As the number of parity bits grows one-by-one,
the block size keeps doubling.
And if you take that to an extreme, you could
have a block with, say, a million bits, where
you would quite literally be playing 20 questions
with your parity checks, and it uses only
21 parity bits.
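Just to spell out that arithmetic:

    # A block of 2**20 bits (about a million): 20 parity bits to locate an
    # error, plus 1 more for the meta-parity check.
    print(21 / 2**20)  # roughly 0.002% of the block is redundancy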
And if you step back to think about looking
at a million bits and locating a single error,
that genuinely feels crazy.
The problem, of course, is that with a larger
block, the probability of seeing more than
one or two bit errors goes up, and Hamming
codes do not handle anything beyond that.
So in practice, what you’d want is to find
the right size so that the probability of
too many bit flips isn’t too high.
Also, in practice errors tend to come in little bursts, which would totally ruin a single block, so one common tactic to help spread a burst of errors out across many different blocks is to interleave those blocks, like this, before they're sent out or stored.
Then again, a lot of this is rendered completely
moot by more modern codes, like the much more
commonly used Reed-Solomon algorithm, which
handles burst errors particularly well, and
it can be tuned to be resilient to a larger
number of errors per block.
But that’s a topic for another time.
In his book “The Art of Doing Science and
Engineering”, Hamming is wonderfully candid
about just how meandering his discovery of
this code was.
He first tried all sorts of different schemes
involving organizing the bits into parts of
a higher dimensional lattice and strange things
like this.
The idea that it might be possible to get
parity checks to conspire in a way that spells
out the position of the error only came to
Hamming when he stepped back after a bunch
of other analysis and asked: “okay, what
is the most efficient I could conceivably
be about this?”
He was also candid about how important it was that parity checks were already on his mind, something that would have been way less common back in the 1940s than it is today.
There are like half a dozen times throughout
this book that he references the Louis Pasteur
quote “luck favors a prepared mind."
Clever ideas often look deceptively simple
in hindsight, which makes them easy to underappreciate.
Right now my honest hope is that Hamming codes,
or at least the possibility of such codes,
feels almost obvious to you.
But you shouldn’t fool yourself into thinking
that they actually are obvious, because they
definitely aren’t.
Part of the reason that clever ideas look deceptively easy is that we only ever see the final result: the mess gets cleaned up, the wrong turns never get mentioned, and just how vast the space of explorable possibilities is at the start of the problem-solving process gets undersold, all of that.
But while this is true in general, for some special inventions I think there’s a second, deeper reason that we underappreciate them.
Thinking of information in terms of bits had only really coalesced into a full theory by 1948, with Claude Shannon’s seminal paper on information theory.
This was essentially concurrent with when
Hamming developed his algorithm.
This was the same foundational paper that
showed, in a certain sense, that efficient
error correction is always possible, no matter
how high the probability of bit flips, at
least in theory.
Shannon and Hamming, by the way, shared an
office at Bell Labs, despite working on very
different things, which hardly seems coincidental
here.
Fast forward several decades, and these days
many of us are so immersed in thinking about
bits and information that it's easy to overlook
just how distinct this way of thinking was.
Ironically, the ideas that most profoundly
shape the ways that a future generation thinks
will end up looking, to that future generation,
well, simpler than they really are.
