>> 
Welcome. I'm Peter Allen. I'm the director
of Google University and I would like to welcome
you. Can't hear? Is this helpful? Okay. Great.
Thanks for letting me know. I'm the director
of Google University and I'd like to welcome
you to one of the very first G2G classes,
Googlers teaching Googlers. It's a series
we are piloting this quarter and we're very
excited that Steve is teaching this
class and that you're here. Just
wanted to take five seconds for a few administrative
details; one is there is a sign up sheet going
around. If you would be kind enough to check
your name off or add your name if it's not
there; that would be helpful to us. Another
is that we will send around an internet-based,
web-based evaluation at the end of the course.
That will be also helpful in terms of keeping
track. And lastly, just to encourage you guys
to think about teaching a course yourselves.
We instigated this series because we believe
there's a wealth of knowledge inside the Google
population and a great desire for learning.
And so we have an e-mail alias, G2G_proposals,
and we'll be sending out more requests. But
I just thought, since we're here in person,
we'd ask you to think ahead. So, without any
further ado, welcome to Theory and Practice
of Cryptography and Steve Weis.
>> WEIS: Hi, everybody. My name is Steve Weis
and I'm glad to see all these people out here
for this class. I'm really excited to teach
it. And I think it'll be really interesting
and a lot of fun. This lecture today, excuse
me, is going to be not as technically oriented.
It'll be a little bit historical and kind
of just give you an overview of some of the
modern cryptographic primitive constructions.
So to some of you, this may be a review; some
of it, if you're not an engineer, it might
be new material. So if you're following on
the VC or watching us on video, all the course
material is going to be at this website: "http://go/cryptocourse."
There's a present.ly link right now so you
can follow along with the slides if you're
watching live and all the course material
will be posted here throughout the rest of
the course. There's a first exercise set that's
up and some of you already started on that.
This is, you know, pretty hands on. There's
a couple of parts that I'll talk about in
the middle of the lecture and it should be
fun. And you're encouraged to work with others
on it. There's no deadline, you're not turning
it in. And that's kind of the theme for the
whole course is come to the lectures you're
interested in, do the assignments that interest
you, and feel free to ask questions. There's
also a QDB link on this page and present.ly
has got this little chat window if you want
to join in. There are some people in there
already. So feel free to add a QDB question
or whatever you like as we go through the
lecture. So, you know, as I said, this lecture
is going to be more historical, kind of general.
Next week will be more engineering oriented
and I'll actually talk about using crypto
at Google and, if you're writing code, what
kind of things to avoid, what kind of things
to do. So that'll be definitely more engineering-oriented,
so you're welcome to that if, you know, whether
you're an engineer or not. Third week, I'm
going to focus on more theoretical cryptography
and talk about some more of the math and the
underlying definitional stuff. So it'll be
interesting if you're interested in more of
the math side or the kind of the theory of
computation side. And the last week, it's
going to be a special topic and I'm going
to let you chime in on what kind of things
you're interested in. So if you go to the
QDB page for the final week, which is also
on the main course page, you could submit
some topic ideas and vote on them. And I put
a few up there myself but feel free to add
anything you're interested in and we could
do a lecture on that. Depending what it is,
I might have a guest lecturer that week, so
that will be a good general lecture that people
should be interested in. So, you know, getting
started, like, what is cryptography? And kind
of the classical definition is that the roots
of the word are secret writing. And, you know,
historically, this has been just keeping messages
secret, so it's a military type thing where
you want to send messages out to the field
and don't want your adversaries to intercept
them and know what's going on. And that's
kind of the classic informal definition of
encryption is taking a message and rendering
it unreadable to somebody who doesn't share
a secret. Now, a kind of related thing is
steganography and that comes up a lot in,
like, popular media and you'll see it a lot.
And I'll kind of distinguish that later in
the lecture. That's kind of hiding messages
in plain sight. So I'll give a few examples
of that later in the talk. So, one of the
earliest kind of known ciphers that have come
up in history is called scytale. And this
is basically just a stick that you wrap either
a piece of paper--or on this picture, it's
a piece of leather around it--and write your
message on it across the different wraps.
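As a rough illustration of the scytale idea, here is a minimal Python sketch. It assumes the stick can be modeled as a grid with a fixed number of letters per wrap (the `columns` parameter, a name made up for this sketch) and that short messages are padded with "X"; neither detail comes from the lecture.

```python
def scytale_encrypt(plaintext: str, columns: int) -> str:
    """Write the text in rows of `columns` letters, then read it column by column."""
    pad = (-len(plaintext)) % columns          # pad so the grid is full (assumption)
    text = plaintext + "X" * pad
    rows = [text[i:i + columns] for i in range(0, len(text), columns)]
    return "".join(row[c] for c in range(columns) for row in rows)


def scytale_decrypt(ciphertext: str, columns: int) -> str:
    """Undo the transposition: the inverse is the same operation on the transposed grid."""
    n_rows = len(ciphertext) // columns
    return scytale_encrypt(ciphertext, n_rows)


def brute_force(ciphertext: str) -> list:
    """Try every grid width that divides the length; the 'key' is only the stick size."""
    n = len(ciphertext)
    return [(c, scytale_decrypt(ciphertext, c)) for c in range(1, n + 1) if n % c == 0]


message = "HELPMEIAMUNDERATTACK"
secret = scytale_encrypt(message, 5)
print(secret)
print(scytale_decrypt(secret, 5))
```

Because the only secret is the grid width, `brute_force` needs just a handful of guesses, which is the weakness the lecture points out.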
And if you think about this, the only secret
in the system is the size of the stick so
if you get one of these messages that's just
a piece of leather, piece of paper, just try
a bunch of stick sizes. And, you know, that's
kind of the secret key to share this. If you
and I want to communicate a message with a
cipher, the secret key is the stick size so
it's not too hard to break and it's obviously
very primitive but this is called a transposition
cipher. So you're basically taking these letters
or characters and just reordering. You're
not substituting anything. It's just kind
of the jumble that's in the newspaper every
week, some words that are scrambled up. So
a little bit more advanced is called the Caesar
cipher and this is just a substitution. So
basically, you take a letter and then you
replace it with a letter that's three letters
forward in the alphabet. And this is also--ROT13
is the same thing; it's just a simple substitution.
And if you think about this, the key here
is just how many letters you're shifting over.
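A minimal Python sketch of this shift cipher follows. The function names are invented for illustration, and it assumes uppercase A to Z with everything else passed through unchanged.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text: str, shift: int) -> str:
    """Shift every letter by `shift` positions, wrapping around the alphabet."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)


def brute_force(ciphertext: str) -> list:
    """The key space is only 26 shifts, so just try them all."""
    return [(shift, caesar(ciphertext, -shift)) for shift in range(26)]


print(caesar("HELLO", 3))            # shift of three, as in the lecture
print(caesar(caesar("HELLO", 13), 13))  # ROT13 applied twice returns the input
```

Scanning the 26 candidates from `brute_force` by eye is usually enough to spot the English plaintext.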
In this case, it's the value of three. It
could be minus three, it could be 26. And
this level [INDISTINCT] is one of the assignments
on the exercises that you'll get to do. So
this is a very classic cipher that comes up
a lot. It's very simple and, obviously, finding
the key is very trivial; you can just brute force
through it. Now a little bit more--this is another
example of a cipher. So back in the '70s,
there was a Zodiac Killer in the Bay Area
and he would send in these messages to the
newspaper like the San Francisco Chronicle.
And they had these ciphers in there. And this
is one of the earlier ones that was unbroken
for a while. And some amateur up in Marin
just started, you know, pen and paper, trying
to guess what the symbols were. And so what
he's done is actually taken--a little more
complex. He's taken each alphabet letter and
has multiple symbols that it could be. So,
like, the letter E might have five different
symbols and you can use them interchangeably.
So he was actually kind of smart with it as
he took the most common English letters like
E,I,T,O,N which appear much more than other
letters like Q and Z, and he gave them more
symbols. So if you do a statistical
analysis of this, each of these symbols doesn't appear
as much. As I said, this one has been cracked
and you can read about it online. There are
still some out there that haven't been cracked.
So if you want to go play around with some
ciphers, this is, you know, out there and
it's been kind of sitting there for about
20 years now. And so, there's more complex
ones. So after they broke this, he, you know,
went back and did something else. And, you
know, I imagine that some government agency
was probably able to break this but no one
has really bothered. But anyway, it's out
there and it's something you could look up,
the Zodiac cipher. So a little more complicated
than that is called a--I'm going to butcher
the French name, but it's a polyalphabetic
cipher. So the idea here is that your key
is still basically just doing a substitution.
But you're doing a different substitution
for each one. So in this example here, the
key is GOOGLE and the plaintext is BUYYOUTUBE;
that's our secret that we're transmitting.
And so the first character, you're going to
shift down by G. So in this case, you know,
you go to the row that's
B and go to the corresponding column
that's G, and the letter
that's there is going to be H. And if you
keep doing this and you get to the end of
your key, what's going to happen is it's just
going to reuse your key. So if you're going
to say GOOGLE GOOGLE GOOGLE is your key, it'll
go across your whole message. And so, this is
from about the 19th century in France and
they--somebody published this and there's
different variations on this where you might
have different, you know, amount of shifts
in the table. But it's basically this idea
that you're shifting and it's kind of depending
on the position of where you are in your cipher
and it's kind of just going back and forth.
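The table lookup described above amounts to adding the key letter to the plaintext letter modulo 26. Here is a minimal Python sketch using the lecture's GOOGLE and BUYYOUTUBE example; the function name and the uppercase-only assumption are mine.

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching key letter, repeating the key as needed."""
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - ord("A")   # G -> 6, O -> 14, ...
        if decrypt:
            shift = -shift
        out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)


ciphertext = vigenere("BUYYOUTUBE", "GOOGLE")
print(ciphertext)                                   # first letter is H: B shifted by G
print(vigenere(ciphertext, "GOOGLE", decrypt=True))
```

Note that the two Y's in the plaintext encrypt to different letters because they line up with different key letters, which is what distinguishes this from a single fixed substitution.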
And this has come up in practice too and I'll
talk about that in a minute. So, as I said,
this idea of this polyalphabetic cipher was
used in mechanical cryptosystems as well.
So this is a very famous encryption machine
called the Enigma and it was from World War
II, and it's been in a lot of movies and
books and things like that talking about this.
And what is this machine you're looking at?
It's got this keyboard and, you know, it's
just a regular keyboard and these rotors.
And basically, the configuration of these
rotors is your secret key. So, what's going
on in this picture is that somebody has typed
in the letter A and these rotors are spinning
around, and the signal of that A, its electric
signal, is going to go through in these different
configurations and it's going to come out
an output of G. And so it's just kind of the
same idea as this table. It's, you know, you've
got a key that's in a certain state and before,
it was the position in GOOGLE that you're
in and now it's the position of the rotors
but that's going to change as this goes through.
And so if you'd noticed, the rightmost rotor
has advanced a position. So if you see the
two wires on the bottom are now shifted down
one and the bottom wire has gone to the top
and now when you hit the same letter A again,
a different letter comes out. And at its core,
this is, you know, the same thing as that
previous polyalphabetic cipher and the keys
here were they had a bunch of different rotors
that you could substitute in. So there'd be
many different configurations and people would
plug in these different rotors. So both with
this and kind of the previous, you know, pen
and paper ciphers, the way that people would
break these is that even with, you know, this
complexity here, the underlying kind of patterns
and information in the message would still
be visible. People could do analysis of what
the message looks like. This is part of one
of the exercises on the thing I've posted
is that you're going to actually break a message
by doing a little statistical analysis, trying
to figure out what it is, because you know
that the message is in English. And so with
this particular example, what they would do
in World War II would--you know, the German
submarines would have this and they'd surface
and give their weather report everyday. So,
in the North Atlantic, the weather doesn't
change much, so pretty much everyday, this
submarine would surface and they would say
it's cold and gray outside and, you know,
that's pretty much the weather report every
single day. So every day they knew essentially
what was going into this cipher. They even
know, okay, well, this person is encrypting,
you know, "It's cold and gray out every single
day." And sometimes to do this, they would
also go out and create an event. They would
go have a plane fly over and then, you know,
10 minutes later, you have this, like, encrypted
radio signal that would say something and
they would know what the word plane is in
there and from that, they could try to determine
what rotors had actually generated that
message. And this is something that Alan Turing
was involved in and it led to a lot of the development
of modern computing, in that, you know, by
the end of the war, they had built a machine
that would be able to crack the daily key
by generating enough of this known plain text.
And so it's kind of a--you know, it was how
you would actually break keys in the past
and still kind of a big part of cryptanalysis
is knowing what to look for and knowing, kind
of looking for patterns in there and trying
to actually, you know, maybe give somebody
some plain text that they're going to encrypt
and get them to encrypt something that you
know what they're doing. So kind of a side
topic I wanted to mention because it comes
up a lot--you know, internet people always
talk about it--is
steganography. So kind of classically,
this is the idea that you're hiding a message
in plain sight. And so a classic one is this
Herodotus and he had a slave and what he'd
do is shave his head and tattoo the message
on there and the hair would grow back and
then he'd send the slave with the message.
So, the message is just there but it's--you
have to know where to look for it and they'd
also do this thing where they would take a
tablet and carve in a message on the wood
and then put some wax on the top of it. It
would just look like a wax tablet. But if
you knew what you're looking for, you'd melt
off the wax. And so, you know, this isn't
really something that's been done a lot but
it has come up in kind of history and so some
things like invisible ink and microdots. Microdots,
what they are is basically--it's a message
that's been shrunken down to a very tiny dot.
So I'd send you a letter and the period at
the end of the message would be this incredibly
small, like microfilm picture and then when
you got the letter, you would just magnify
that up and then it would be, you know, microfiche in
there. And that was used in World War II;
people would send letters with microdots in
them. Another one here, I've got the finger.
So this is kind of stretching a little bit
but there was a case when there was some POWs
in the Korean War and they took them out,
like, you know, like, the North Koreans took
a picture of them, you know, as the prisoners
and the guys just flipped the bird in the
camera. And the North Koreans didn't know
what that was but it was kind of like, you
know, a message back. If you know what you're
looking for, if you know what that means,
you know, it's a signal to the U.S. that you're
doing fine or something like that. And prison
gangs used steganography a lot, so they'll
have letters that are censored and read through.
And they'll do things with the letters that
are actually, you know, different fonts or
different size or different kernings. You
know, they're pretty basic and they've--prison
guards know about this stuff but people are
actually doing this in practice. And one that
always comes up, it's in movies all the time
and, you know, there's Slashdot
articles about it, is low-order bits in images.
So you take an image, you tweak the low-order
bits to hold the message and, you know, if you look
at the image, you can't see the message in there, because
you're only changing a little minute part of the
actual image, but you could hide messages in
plain sight that way. So people have done
research where they've gone out and, you know,
looked at a bunch of images and looked at
low-order bits and nobody's really found anything
because, you know, it's probably easier to
just send an encrypted message directly. And
so this is kind of orthogonal to cryptography
as well. You can put in an encrypted message,
you know, in a steganographic message. And
the reason people might want to do that is
if you live in a country where cryptography
is illegal which is still a lot of places.
You can't, you know, send out encrypted messages
from China, as far as I know, or Saudi Arabia.
So there's a lot of countries where you may
want to do this. And, you know, so that's
a little bit of a side note. And, you know,
if you noticed, maybe there's a message in
this slide or something, so you can look at
it later. So, another thing that comes up
that's kind of related is codes. So codes are
a little bit different than ciphers in that
the idea is you're going to have a codebook
of words that are symbols or sentences that
are predetermined. And, you know, rather than
replace individual symbols, you're replacing,
like, strings of the symbols. And so the line
is a little bit blurred because, you know,
they're very similar. But, you know, examples
of codes might be like Paul Revere, like,
one if by sea, two if by land. Like, you've
got these predetermined messages and they're
like "One if by sea" means the British are
invading by sea and "Two if by land" means,
like, you know, British invading by land.
And so you've established this kind of codebook
ahead of time and you have to know the entire
contents of the codebook to, you know, do
this message. And, you know, if the British
did something that wasn't in the codebook,
you don't have any way to convey that. So,
another example is the Beale code, and this
was something back in the 19th century where
this guy named Beale released a pamphlet that
had, you know--it said it was like a map to
gold or something. And what he would do is
take a book, an actual, you know, book or
a source material and the code will basically
say, you know, go to page 55 and take, like,
the 30th word. So the message was just, you
know, 55, 30 and then like 120, 40 and so
on and so on. So, one of those messages was
actually decoded because of the Declaration
of Independence. So that was one and that
was, like, the first part of the message and
then the rest of the Beale code is, I think,
maybe one more has been decrypted, but there's
a couple more that haven't. And there's a
lot of speculation on it, if it's a hoax or
if there's any gold. And people have looked
at it and there's anachronisms because, like,
the version of the Declaration of Independence
he used had some weird misprintings that kind
of said this was actually from after it was supposed
to be published. So that's kind of a code
too, because to do this, you need to actually
have the same copy of the book to decode the
message. Now another one, I'm going to actually
play a little audio here. This is called a
number station. And so basically, you know,
during the Cold War and currently, there would
be these radio stations that would just read
out numbers all the time or at predesignated
times. And I'll try to play one now. So this
is just a--turn it down a little [INDISTINCT].
This is a German voice reading out numbers
and letters. I don't speak German. But there's
all sorts of these stations that are all over
the world and there are still some in existence
if you're a short wave radio operator. And,
you know, the non-intelligence community doesn't
really know what they are but you can--there's
a link to one of them here. You can find more
on that page. But the speculation is that
these are kind of predesignated codes that
the agents would have saying, you know, if
they say 2,3,4,5, then, you know, go assassinate
so-and-so. The other speculation is they might
be one-time pad encrypted things. I'm going
to talk about that in the next slide. And
so the last bullet point here is called ECB
mode and I'm going to talk more about that
next week, but that means electronic codebook
mode and that's a cipher mode that I'll talk
about more in the more technical talk next
week. So, I mentioned one-time pads and I
want to mention this principle about cryptosystems.
And it's Kerckhoffs' Principle and there's
also--it's called Shannon's Maxim. And basically,
the security of any cryptosystem should only
depend on a secret key. Shouldn't depend on
secrecy of the actual, you know, cryptosystem
and how it works. So, if you think about steganography
or codes, all of that relies on somebody not knowing
where to look or not knowing that this
is the codebook. I mean, you could consider
the whole codebook to be a big secret but
kind of the defining thing is that I could
publish this algorithm or say, "Hey, this
is how I'm going to do my encryption, but
I'm going to keep the key secret." And the
key is just the core of the security in the
system. So, the kind of more common way to say this
is like don't rely on obscurity for your security
so, you know, publish your algorithms and,
you know. Basically, the only thing that should
have any--excuse me. The only thing that security
should rely on is a small secret key. And
this kind of brings us to one-time pads. See,
the idea of a one-time pad is that this actually
comes from the idea that you had a physical
pad of paper. And going back to those number
stations, this might be that somebody goes
into a country and they've got this pad of paper
and there's a bunch of letters or values on
there that are completely random. And when
they hear this message, they just add the
message with the values on the paper and decrypt
it. So this was, like, you know, in binary,
this would be an XOR but on the assignment
on the exercise, it's just going to be adding
letters together. So the idea here is that
if your key is actually random and has been
shared securely, this is called information
theoretically secure. So, there's no computation
that could break this. And the idea is that
if my message is a coin flip, you know, if I
have a bit and I have a secret key that we
share and I send heads, where heads is the key plus
the coin, you know, the key could be--it's
50-50 that the coin could be heads or tails, so
there's no information that can be determined
by actually listening to this as long as the keys
are secure. So, what--you know, if this is
information theoretically secure, why isn't
crypto just solved? Why don't we just do one-time
pads and, you know, be done with it. Now,
the problem here is that you have to transmit
a large secret message in order to transmit
a large secret message. So how do you actually
get these one-time pads out to people in the
world to start with? And you do it ahead of
time. You know, maybe you can just do this
a year in advance and have a stack of pads
that you're going to use but it's a major
problem of doing this. And another thing is
that you actually have to do this per message
so, you know, I have to do this for everyone
I talk to, for every message I send. I
can't reuse these pads. And the second exercise
shows what happens when you reuse these pads.
If you do that, you can actually break a message
because somebody reused a one-time pad. And
one of the problems is that these keys are as big
as the messages, so your communication is
now doubled in size. Rather than, you know,
us sharing a very short cipher secret or secret
key, we have to share these huge one-time
pads to keep them secure. So, like, where
do I keep these secure messages if I'm some
spy who's behind enemy lines? So one-time
pads are--what is this? [INDISTINCT]. You
know, one-time pads are information theoretically
secure, but more or less unusable in practice.
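To make the one-time pad and the pad-reuse problem concrete, here is a minimal Python sketch using the binary XOR form mentioned above (the exercises use letter addition instead). The two sample messages and the helper names are made up for illustration.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))


def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    """Encrypt (or decrypt) with a pad that must be at least as long as the message."""
    if len(pad) < len(message):
        raise ValueError("pad must be at least as long as the message")
    return xor_bytes(message, pad)


m1 = b"ATTACK AT DAWN"
m2 = b"RETREAT AT TEN"
pad = secrets.token_bytes(len(m1))   # random key as long as the message

c1 = otp_encrypt(m1, pad)
c2 = otp_encrypt(m2, pad)            # the mistake: the same pad is used twice

# The pad cancels out, so an eavesdropper learns m1 XOR m2 without knowing the pad.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(m1, m2))

# Anyone who knows or guesses one message recovers the other.
print(xor_bytes(leak, m1))
```

Used once, each ciphertext byte is uniformly random and reveals nothing. Used twice, the pad drops out entirely, which is why each pad has to be as long as the message and can never be reused.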
I mean, there may be some applications where
it could make sense but in general practice,
they are secure. So this is like kind of the
one pitfall that a lot of kind of like amateurs
on the internet will fall into. They'll say,
"Oh, it's encrypted with a one-time pad. It's
secure," and blah, blah, blah. But, you know,
there's a lot of things to think about with
key distribution. So, you know, we've talked
about a lot of the classical crypto. And what
are some of the problems with this? So, you
know, going back to, like, those mechanical
ciphers and the, you know, the Zodiac cipher
and all this stuff, one big problem is these
are just weak. You know, these were mechanical
systems, they're pen and paper systems and
now with the advent of modern computation,
these are very weak and easy to break. So,
you know, the Enigma machine was broken
by kind of a modern computer and, you know,
these are just not going to withstand, you
know, modern computing power. Another problem
is that these are pretty informal. Like there's
no--in the past, there was no real formal
way of saying what it means to be secure and
what is a cipher and what is a signature and
what are these things. And that's kind of a
recent development in cryptography, recent
being the last 25, 30 years. And I'll talk
more about that in the 3rd lecture about some
of the actual formal definitions and security
definitions and what it means to actually
be secure and, you know, being able to base
your security on some problem. I'll talk a
little bit about that today. Another issue
in the classical era is that, you know, the knowledge
and the technology was very closed. It was
available to militaries and intelligence agencies
and maybe, you know, large companies like,
you know, the IBMs of the world, and it was not
available to the average citizen. Like,
if you wanted to communicate securely to your
friend, not have your government listen, you
didn't really have any options. I mean, you
could try one of these pen and paper schemes
but, you know, they've got all the resources.
They've got the knowledge, they've got the,
you know, the resources to actually break
a lot of this cryptography. So, finally, a
big problem if you think about the one-time
P--one-time key is key distribution. If you
think about the number of people in the system,
you know, even if we had a cipher where we
shared a secret key, there's N people and
everyone wants to communicate to everybody.
That's N squared keys that have to be distributed
and you still have this problem of how do
you actually get the key from me to you. If
we're, you know, on an insecure channel all
the time, how do I actually ever get my secret
key to somebody else? And especially in a
network setting where I may never meet this
person, I may have no means of communication.
And how do I actually get my keys out to people?
So some things that kind of bring us into
the modern era, one of them is standardization
of strong cryptographic primitives. There's
actually--in modern times, we've got standards
that have kind of been vetted and
that people have actually looked at and have
been subject to public scrutiny. These are
publicly available. And it's kind of Kerckhoffs'
principle again is that everyone knows what
these standards are, everyone can look at
them, everyone can attack them and there's
no secret about what people are doing. All
the secrecy comes back to these keys. Another
big development that kind of leads this modern
era is public key cryptography. And I'll explain
what that is in a couple of slides. And as
I said before, the third thing is formalization
of security definitions, actually saying what
is secure, what is a cryptosystem. And growth,
you know, another thing that kind of brings
us in this modern era is the growth of personal
computers and the internet. You know, in the
past, there was really no way for an average
person to, you know, do any of these ciphers.
I mean, pretty much everything was pen and
paper. Maybe if you're smart, you can build
some rotor machine yourself but this technology
wasn't really even available to an average
person or maybe even, you know, small businesses
or typical people. But now almost everybody
is using crypto every day, you know, even
if you know it or not. And finally, kind of
something that also brings us to this modern
era is more recent, just the liberalization
of laws. I mean, for a long time, crypto was
illegal for regular people in a lot of countries
and it was illegal to export in this country
until recently unless it was in weakened form.
So I think that this is one of the major developments
as well is that governments have kind of relaxed
and realized that it's good for people to
be open about this and have, you know, open
protocols and open standards. So, I mentioned
standardization. I'm just going to talk about
this for one slide so you kind of know what's
out there. And one of the big developments
was the Data Encryption Standard. So this
is a cipher just like the kind of previous
ones we're talking about but it's strong.
It's something that's been designed to be
computed on modern computers and be resistant
to modern computers. And it was originally
designed by IBM, so the NSA kind of went out
and solicited people to design ciphers and,
you know, IBM came up with this one called
Lucifer. And the NSA went and tweaked with
it and then gave it back, said, "Okay, this
is--this is DES." And so people were very
suspicious about this. They thought that maybe
the NSA put in a back door or that they, you
know, did something to it to weaken it. And
one thing they did actually was shorten the
key length and they said, "Don't worry about
it, it's fine." So people were very concerned
about this. And it turns out that about, you
know, 15 years later in 1990, some researchers
discovered this technique for cryptanalysis
called differential cryptanalysis. And then
they realized, looking back, that the original
Lucifer was vulnerable to this and DES was
not or not as much. So, basically, the NSA
knew about things 15 years prior to, like,
kind of public academia doing it and that's
why they, you know, changed some of the things
about it. So, you know, DES was 1970s technology
and by 1999, DES could be cracked in about a
day for $100,000. So now that it's, you know,
2007 and, you know, we've had computing power
become exponentially better, DES can probably
be broken much faster for--or for much less.
Now, there's a variant called triple DES which
is basically, you know, doing DES three times
and you effectively get double the key size
and this is still used quite a bit. This is
still out there but people are generally trying
to move away from it. And the new standard
they're moving to is called the AES and this
is kind of the modern heir to DES and it was
designed in a more public fashion. Basically,
they had a contest where a bunch of researchers
submitted designs and then they kind of went
through a couple of rounds and got comments
on it and went back and forth and the end
result was actually developed by some European
researchers. And that's currently what is
now the government crypto standard and it's
out here and it's in, you know, large key
sizes, 128-bit up to, you know, 256 or bigger
key sizes. So I'm not really going to talk
anymore about particular standards there.
I just kind of want to give everyone an idea
of what's out there. And before I move on,
I'm going to--does anyone have any questions?
I'm just going to look at the chat on here
if anyone has any questions online. No. Okay.
So let's go back to this key distribution problem.
You know, if I want to communicate to somebody,
how do we actually agree on a key? If I'm
in one side of the world and somebody else
is on the other side, how do I get a key to
them in the first place? And what happens
if either party is compromised? If I'm talking
to somebody that I, you know--maybe I mailed
it by mail and we've got a shared key that,
you know--maybe it's a DES key or AES key.
If one of us gets compromised, then both of
our communication is compromised in both directions,
anything that was going back and forth. Oh,
what's happened? Sorry about this. And finally,
you know, I said this before, what happens
when a third person wants to join us? Well,
now each of us needs to establish our own key
with her and if we get N people in there, we
have to have on the order of N squared keys in the system
and I'd need to maintain all these keys for
each of these people and it kind of grows
quadratically. And every person who joins
the system, you need to create a whole new
set of keys for everyone else in the system.
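To put rough numbers on that quadratic growth, here is a small Python sketch. It assumes every pair of people shares exactly one symmetric key, which gives n(n-1)/2 keys, on the order of N squared; the function names are invented for illustration.

```python
def pairwise_keys(n: int) -> int:
    """Number of shared keys when each pair of n people has its own key."""
    return n * (n - 1) // 2


def new_keys_when_joining(n: int) -> int:
    """Keys that must be created when one more person joins a group of n."""
    return n


for people in (2, 3, 10, 1000):
    print(people, pairwise_keys(people))
```

Even at a thousand users the count is close to half a million keys, and each newcomer has to exchange a fresh key with every existing member over some secure channel.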
So, in the '70s, a pair of researchers
named Diffie and Hellman--kind of the
stuff in yellow, you can--well, it can't really
show up as yellow. But, you know, the formula
of the stuff, that's kind of there for--if
you're interested in it. There's a couple
of researchers who came up with this idea
called the Diffie-Hellman key exchange. And
I've also got this guy called Ralph Merkle
on there and Williamson. So this is, again,
a technique that was invented in the intelligence
community by this guy Williamson a couple
of years before it was published in academia.
So, you know, they had the idea at--this is
actually in British intelligence and it's
possible it was invented independently in
the U.S. as well. But basically, this is a
protocol where you can share a secret with
a stranger. I'm going to go into a little
bit of the math and not go too deeply in it
today, but basically, Alice is going to pick
a group, and this is a mathematical group,
and a generator g which is a value in this
group and some random value x. And, you
know, she's going to send Bob basically g
raised to the exponent x. And this is within
a group, so if you've kind of got some familiarity
with kind of, you know, modular math and things
like that, you could think of it as mod a prime,
but we needn't go into too many details. But
she sends this value g to the x to Bob. And
I've got the notation broken here, but he's
going to do the same thing and pick his own
exponent y and send g to the y back to Alice. So Bob
has her capital A value, Alice has his capital
B value, and they raise each of these values
to their own exponent that they picked. So
in Alice's case, she's raising it to x; Bob
is raising it to y, and they end up with the
same value, g raised to the x and y, x times
y. And an eavesdropper, if you think about
it, who's listening on this channel, they
see four values. They see the group and the
generator which is, you know, basically just
some numbers or the group is a description
of a particular group with some numbers and
A and B, which are g to the x and
g to the y. And the question here is how hard
is it for--to actually compute the shared
value that they have, g to the xy? And this
turns out to be, to our knowledge right now,
a hard problem. We don't know how to do this
efficiently. And it's called actually the
computational Diffie-Hellman problem and as
of now, we don't have efficient ways to do
this. And so somebody listening in on this
conversation isn't able to actually determine
the same shared secret as Alice and Bob. So
they know they share this value g to the xy
that no one else shares and they can use that
to determine a, you know, DES or AES key or
any other sort of shared secret. Now, you
know, does anyone see any problems with this
if Eve is not just an eavesdropper? Yeah,
there's a man in the middle attack. And so
what somebody could do on this is actually
sit on the channel and pretend like they're
Bob to Alice and that they're Alice to Bob
and just take their values and then forward
it--a different value and they're going to
know the shared secret that each person is
using. So this is--unresponsive script. You
know, this doesn't really solve the problem
completely. You still would need to establish
N squared keys for N people so you're going
to have to do this protocol with everybody.
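
The exchange described above can be sketched in a few lines of Python. This is a toy sketch under stated assumptions, not a secure implementation: the prime p = 2**127 - 1 and the base g = 3 are picked here only for illustration (g is not checked to be a generator), and real deployments use much larger, carefully chosen groups.

```python
import secrets

# Toy parameters, assumed for illustration only.
p = 2**127 - 1   # a known prime, far too small for real use
g = 3            # arbitrary base, not verified to generate the whole group

x = secrets.randbelow(p - 2) + 1   # Alice's secret exponent
y = secrets.randbelow(p - 2) + 1   # Bob's secret exponent

A = pow(g, x, p)   # Alice sends A = g^x mod p
B = pow(g, y, p)   # Bob sends B = g^y mod p

alice_secret = pow(B, x, p)   # (g^y)^x mod p
bob_secret = pow(A, y, p)     # (g^x)^y mod p

print(alice_secret == bob_secret)
```

An eavesdropper sees p, g, A and B, but neither x nor y, which is the setting of the computational Diffie-Hellman problem.
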
And maybe you do it for every communication
session but, you know, it's not like you could
have one key that everyone in the world can
use. You know, this was invented in the '70s
and at that time, computation over these large
groups was expensive and it still is relatively
expensive today. It's not--it's not especially
cheap to do this all the time. And finally,
as somebody mentioned, it's vulnerable to
a man in the middle attack. This is not, you
know--if it's like a wartime scenario or if
you have somebody active on the line, they
can sit in the middle of the line and switch
messages out and are actually able to completely
break this protocol. Now, since then, there
have been new protocols developed that are
resistant to man-in-the-middle attacks
and there's more modern options out there.
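
To make the man-in-the-middle attack on the unauthenticated exchange concrete, here is a hedged Python sketch. The parameters are the same kind of toy values (assumed, not secure), and "Eve" simply substitutes her own public value in both directions.

```python
import secrets

p = 2**127 - 1   # toy prime, illustration only
g = 3            # arbitrary base, assumed for this sketch

def keypair():
    """Return (secret exponent, public value g^secret mod p)."""
    secret = secrets.randbelow(p - 2) + 1
    return secret, pow(g, secret, p)

x, A = keypair()   # Alice
y, B = keypair()   # Bob
z, E = keypair()   # Eve, sitting on the channel

# Eve intercepts A and B and forwards her own value E to both sides.
alice_key = pow(E, x, p)            # Alice believes she shares this with Bob
bob_key = pow(E, y, p)              # Bob believes he shares this with Alice
eve_key_with_alice = pow(A, z, p)   # Eve derives Alice's key
eve_key_with_bob = pow(B, z, p)     # Eve derives Bob's key

print(alice_key == eve_key_with_alice, bob_key == eve_key_with_bob)
```

Nothing in the plain protocol tells Alice that E did not come from Bob, which is why the later protocols add authentication.
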
But this is kind of innovative because it's
the first time that two complete strangers
can actually, you know, establish a key where
somebody eavesdropping in the middle can't
determine that key. Now, another thing is
that this is based on a hard mathematical
problem. It actually has some basis saying
that if you can't solve this hard problem,
you can't break this protocol, you know, versus
some other scheme out there where, you know,
this is kind of the first time that you'd
actually have some sort of mathematical reason
why this would be secure. And if tomorrow
somebody figures out how to solve the, you
know, the discrete log problem or computational
Diffie-Hellman, you know, this would break.
But that's a hard problem that people have been
looking at for a while, so you've got some
confidence that it's actually going to remain
hard. If you're interested in this, the original
paper is on the Wiki page. You can look at it;
it's a fairly easy read. This is one of the
first kind of papers in the field and, you
know, if you're interested in that, you can
read through that. So we still got this key
problem. So, you know, what if you could have
a single key that's public that anyone in
the world could see and they could use it
to encrypt messages to you but not decrypt
your messages. So the idea here is like everyone
in the world might have a postbox out there
that you can drop letters in but no one else
could open the back of the postbox unless
they have the private key. So anybody can
send to anybody but only you can open your
own box. So this, you know, this crypto system
would consist of three algorithms, G, E and
D. And so the first thing Alice would do would
be to generate what's called a key pair and
so this, you know, hypothetical public key
and a secret key. And she would publish her
public key. She could put it up on her webpage,
in a phonebook. She could, you know, call
somebody over the phone and tell them what
her public key is. So now what Bob can do
is just take that public key and encrypt a
message. So anybody can use this public key.
There's nothing secret about it. You can put
it up on a billboard, it doesn't matter. And
he can, you know, turn it into ciphertext
C. So then he can send ciphertext to Alice
and only she can decrypt it. She's going
to keep that secret key secret and is able
to obtain the original message that Bob sent.
So the question here is, you know, is this
public key encryption even possible? Now,
it's got some nice properties here because
everyone in the world could just publish their
own public key. You know, everyone could have
it in a phonebook or on a, you know, web directory,
and so it's nice. And I can--I could communicate
with a stranger with--you know, I could do
it offline. I could prepare my message and
send in the message. I don't need to have
this interactive protocol. I don't ever need
to do any key exchange with them. I just
need their public key. But one question there is,
you know, is this even possible? I mean, how--before
a certain time, it wasn't even known if you
could do this. Another thing is how do you
actually get Alice's public key? And, you
know, why do you trust that the ciphertext
has not been modified? I mean, just because
you encrypt something doesn't mean that the
ciphertext hasn't been messed with in transit.
And, you know, the second question here is
actually part of the exercises. You're going
to do one thing where you're going to get
my public key and use it to verify a message.
And somebody has already done it before the
class actually. So, yeah, the first question
is was this even possible? And in '77, RSA
which is Rivest, Shamir and Adleman showed
a scheme that could actually do this. But
again, this was actually invented in the intelligence
community in 1973. Clifford Cocks had a very--it
was kind of a rudimentary version. It wasn't
as generalized as RSA, but they knew about
this years in advance and I thought this was pretty
interesting. But some of this is kind of discrete
math, so I'm not going to go too much into it,
but here the public key is going
to be this exponent e and an N. And N is
a large composite. And to encrypt a message,
I'm just going to raise that message to the
exponent and in this case, you know, it's
often three or, like, a Fermat prime, but it's
just--it's kind of a public value that everybody
knows and, you know, then I'm going to take
it mod this N, which is a large composite.
And then to decrypt, I'm just going to raise
it to another exponent, d. And so again, you
can read the paper that's linked online. It's
pretty easy to read through because it's,
you know, again, one of the first papers in
this area and it's very interesting. Now,
one problem with this is that the ciphertext
is a fixed size. It's going to be as big as
this group and so you can't encrypt arbitrary
sized messages this way. Computation is still
relatively expensive. Back in the '70s, it
was very expensive and even today, it's expensive.
You do this over very large groups. And again,
why do we actually trust that the ciphertext
has not been modified? When I get this value
C, somebody might have done something to this.
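
A toy sketch of the textbook scheme just described may help. It assumes tiny primes (61 and 53) and a public exponent of 17, chosen here only so the numbers are readable; real keys use a composite of thousands of bits plus padding. It also shows one way a ciphertext can be changed in transit by someone who never sees the secret key.

```python
# Textbook RSA with tiny primes, assumed here purely for illustration.
p, q = 61, 53
n = p * q                  # public modulus, the "large composite" (tiny here)
e = 17                     # public exponent
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)        # private exponent (needs Python 3.8+)

def encrypt(m):
    """Raise the message to the public exponent mod n."""
    return pow(m, e, n)

def decrypt(c):
    """Raise the ciphertext to the private exponent mod n."""
    return pow(c, d, n)

m = 65
c = encrypt(m)
print(decrypt(c) == m)

# Tampering: multiplying the ciphertext by 2^e doubles the hidden plaintext.
tampered = (c * pow(2, e, n)) % n
print(decrypt(tampered) == (2 * m) % n)
```

The message has to be a number smaller than n, which is the fixed-size limitation mentioned above.
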
So for example, I can multiply values in there
and change the value that you're going to
get that you're going to decrypt. So there's
no notion of authentication here. And, you
know, a more technical thing, this is called--this
is not what's called semantically secure.
And I'll talk about that in the third lecture,
but that has to do with this--with some of
the properties of the crypto system. But,
you know, now we've got a way to actually
do public key cryptography. So I keep coming
back to authentication. You know, what about--how
do we know Alice is actually Alice when we
talk to her? And how do we know that a message
came from her and how do we know that it wasn't
altered in transit? So, you know, if I get
a message claiming to be from Alice in the
previous message, how do I actually know it's
from her? How do I know it hasn't been altered
in transit by somebody else? So, another cryptographic
primitive that didn't really have an analog
in the classical era is authentication primitives
and one of them is a message authentication
code. And this is very similar to a cipher,
so it's kind of similar to DES, AES, where
Alice and Bob will share a secret key. And
there will be two functions, sign and verify,
and basically anybody who shares the secret
key can sign a message and produce a signature.
I'm using sigma for that. And then
if I give you the message and the signature,
you can take the secret key and verify it.
And so these are often constructed from other
primitives. There's a lot of literature on
MACs and they're basically an analog of ciphers.
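
As one concrete instance of a MAC built from another primitive, here is a minimal sketch using HMAC with SHA-256 from Python's standard library. HMAC is swapped in here as a common example; the lecture does not name a specific MAC. The `sign` and `verify` names mirror the two functions described above and are otherwise hypothetical.

```python
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)   # secret key shared by Alice and Bob

def sign(key, message):
    """Produce a tag (the 'signature') over the message under the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key, message, tag):
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, message), tag)

msg = b"meet at noon"
tag = sign(key, msg)
print(verify(key, msg, tag))
print(verify(key, b"meet at one", tag))
```

Anyone holding the key can both produce and check tags, so a valid tag only shows that some key holder sent the message.
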
And so what I could do is I could send you
a message that's encrypted along with, you
know, a signature on it and you'd know
that the message hasn't been altered. And because
you believe that I share a key with you, you
know that somebody who shares a key with you
actually produced that. So it doesn't really
identify that person who sent it but you know
that, you know, whoever sent it shares a key
with you. And again, this has the same key
distribution problems of ciphers. If you want
to be able to sign a message to anybody in
the world, you need to create, you know, N
square or N keys for everyone in the world
and it's the exact same key distribution problems.
So like with public key encryption, we could
talk about public key signatures. So it's
a very close analog and, you know, the idea
here is that anybody in the world should be
able to sign the message with a secret key
this time and that anybody in the world can
verify it with a public key. And so this is
what I've done. I actually sent a message
out to the mailing list that's signed with
my secret key and anybody here can go grab
my public key and verify the message actually
came from the person who knows the--of course,
my secret key. So, you know, like a public
key scheme, this is going to consist of three
algorithms. One of them generates a verifying
key and a secret key or a signing key, and
then Alice will now, rather than publish the
public key, she'll publish the verifying key
so anybody in the world can verify. And to
sign a message, she'll just, it's the same
type of operation; she'll take her secret
key or signing key and apply it to the message
and output a signature. And then Bob--I don't
have the signature there, but he can take
a message and a verifying key and the signature
and it'll say, you know, is this valid or
not valid? And so this is a nice primitive
that allows anybody--yes.
>> I was just wondering why is the problem
created [INDISTINCT]?
>> WEIS: You could do it and I think it just--there's
two issues here. One is that it's just the
complexity of it. I need to manage N keys
instead of--well, you know, actually, public
key system, you would still be managing N
keys yourself but you would need to create
and distribute N keys. So it takes away one
side of it where we each create one key and
need to store N keys for our friends. So kind
of rather than having--everybody has to store,
like I said, everyone does have to store N
keys. So it takes--makes it easier to create
your own key and manage your own secret key,
so instead of having to manage N secret keys,
I need to manage one. Yes?
>> [INDISTINCT]
>> WEIS: I'm going to talk a little bit about
that.
>> [INDISTINCT]
>> WEIS: Yes. So I'm going to talk a little
bit about PKI in a minute and, you know, about
how do we actually trust these public keys.
Yes?
>> One of the reasons is that if you want to
create a particular channel, all parties should
trust [INDISTINCT]?
>> WEIS: So yeah, what he was saying was that
if you wanted to do a key online or if you
want to establish a key with, like, Diffie-Hellman,
both parties would have to be present and
online and using a public key scheme, I can
prepare this offline. I could do it at home
and then when I log on, I can upload my messages
and send them to you. So--okay. So--yes. So
basically, public key signatures are kind
of an analogy of a MAC and it kind of brings
up the same questions. Is this even possible?
I mean, can I do something like this? And
this was a question, you know, early on. When
people kind of proposed these ideas, it wasn't
clear whether it was even possible and again,
we have this distribution problem. How do
I get my verification key out there? Why do
you believe it? And it turns out that this
is possible and RSA can actually also be used
as a signature scheme. And that's a very nice
property in this cryptosystem and it's why
it's gotten so much usage and so much
attention. It actually won a Turing Award
for the creators because it both can encrypt
and sign data. But it's only fixed size, so
I can only sign messages that are so big.
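
Here is a hedged sketch of textbook RSA used as a signature scheme, with the same kind of tiny, assumed parameters (primes 61 and 53, exponent 17). It signs a bare number with no hashing or padding, which is only meant to show the mechanics and the size limit.

```python
# Toy RSA signing key, assumed for illustration only.
p, q = 61, 53
n = p * q
e = 17                                  # verifying (public) exponent
d = pow(e, -1, (p - 1) * (q - 1))       # signing (secret) exponent

def sign(m):
    """Sign a number with the secret exponent; it must fit below n."""
    if not 0 <= m < n:
        raise ValueError("message does not fit in the modulus")
    return pow(m, d, n)

def verify(m, sig):
    """Anyone with the public values (e, n) can check the signature."""
    return pow(sig, e, n) == m

sig = sign(1234)
print(verify(1234, sig), verify(1235, sig))
```

Calling `sign` on anything at or above n raises an error, which is the fixed-size restriction in code form.
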
And how do I actually sign big messages? So
this leads into the next primitive I'm going
to talk about briefly which are message digests.
And there's kind of a lot of overlap with message
digests and hash functions and a lot of kind
of collisions in the naming because there's
hash functions out there that are for hash
tables and, like, universal hash functions
which aren't security related. So message
[INDISTINCT] from message digests kind of
distinguished in that but basically it can
compress an arbitrary length input down to
a fixed size output and there's no keys involved.
So right now, there's standards out there
that are message digests. One of them is called
SHA1, Secure Hash Algorithm, version one,
and one that used to be used more was called
MD5 which is now broken. But there's basically
just algorithms out there that you can use.
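
A short sketch with Python's standard `hashlib` shows the "arbitrary length in, fixed size out" behavior for SHA1. The helper name is made up for this example.

```python
import hashlib

def sha1_hex(data):
    """Return the SHA-1 digest of some bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()

for data in (b"", b"hello", b"x" * 1_000_000):
    digest = sha1_hex(data)
    # Each hex character carries 4 bits, so 40 characters is 160 bits.
    print(len(data), "bytes in ->", len(digest) * 4, "bits out:", digest)
```

No key is involved: anyone can recompute the same digest from the same input.
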
And typically in cryptography, they want to
have one or two properties or both. And these
two properties are actually--you could have
one or the other or both together. It doesn't
necessarily imply each other. And one of them
is called one-wayness and I'm kind of informally
describing it. It basically means that if
I give you the hash output of something, it's
hard for you to invert it and find the pre-image.
And that's kind of often the way that people
think about it, that it's like a one-way
function. The other one that's actually more
important for signatures is collision resistance.
That means that if I hash to a value,
it's hard for somebody to find any other value
that hashes to it. So I don't really care
about inverting it, it's just finding a collision.
And one of the homework or exercises is to
see who can find the--excuse me, who can find
the biggest collision. So it's hard to actually
find a full collision but you can find collisions
in the prefixes. And so we're going to have
like a little contest to see who can get the
largest one. If you get up to the entire thing,
that's a pretty huge breakthrough. But, you
know, I put one up there that's, like, 32-bits
out of 160 so that took, like, under a second.
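
One way to look for a prefix collision like this is a simple birthday search: hash many inputs and remember the prefixes already seen. The sketch below is an assumption about how one might approach the exercise, not the method used for the posted example, and it uses 24 bits so it finishes almost instantly.

```python
import hashlib
from itertools import count

def prefix_collision(bits):
    """Find two different inputs whose SHA-1 digests agree on the first `bits` bits."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        digest = hashlib.sha1(msg).digest()
        # Keep only the top `bits` bits of the 160-bit digest.
        prefix = int.from_bytes(digest, "big") >> (160 - bits)
        if prefix in seen:
            return seen[prefix], msg
        seen[prefix] = msg

a, b = prefix_collision(24)
print(a, b, hashlib.sha1(a).hexdigest()[:6], hashlib.sha1(b).hexdigest()[:6])
```

The expected work roughly doubles for every two extra bits of prefix, and the table of seen prefixes grows with it, so memory becomes the limit long before 160 bits.
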
And so, you know, we'll see who can actually
get the biggest one and then it's going to
go until next Wednesday. So basically the
noon before the next lecture, we can see who
got the biggest one. And so, yeah, I posted
that on the mailing list and you can--you
can check that out. Right. So--and the reason
why message digests are useful is that I can
take an arbitrary size message and hash it
down to this thing that's difficult to find
something else that collides with it. And then
instead of signing the whole message, I just
can sign the small hash value. And this
is nice because we can put together these
primitives and I can now sign arbitrary length
messages. And so I'm relying on both the signature
being hard to forge and the message digest
being hard to find collisions. So we go back
to key distribution and it's still a pretty
major problem. We have this issue of how do
I get your public key? And so, you know, I
put my public key up somewhere for you to
find and it shouldn't be hard to find but
how do you actually believe that I did this?
And at MIT, we did this in the class.
This is the same assignment and what some
students did, they went up and put my public
key up for me, so they were getting the messages
themselves and, like, there's no way to actually
take the key down and, like, let people know
which is which. So it's a problem. And so
one way that you can think of this, now that
we've got these kind of primitives, you've
got public key signature, we have public key
encryption, we have message digests, a tool
that was actually invented by Loren Kohnfelder
who is at Google now is called a certificate.
And what a certificate is, is just a signature
on a public key. So I could put up my key
on a webpage and then somebody else can sign
it and attest to it. And if you go on your
browser, well, you can look at it in your
browser, there's a bunch of certificates and
these are signed by what are, you know, called
trusted authorities or roots. And they're
just people like a VeriSign and AT&T and all
these companies who--their business is basically
signing certificates, signing people's public
keys. And what you end up getting is like
a graph. So you've got these trusted roots
at the top and then they confer trust with
the signature and they sign other people's
keys. And those people can sign other people's
keys with their keys and you end up with these
chains of certificates. And so when I send
you a message that's from me and I say, "Hey,
here's my public key," what you can do is
go and look up and say, "Okay, well, his public
key is signed by Google," and Google's key
is also signed by them and I can get a bunch
of signatures on my key and that's kind of
an attestation that it's a valid key, that
is my key. And so some people like VeriSign,
you know, you actually have to go in and show
them an ID or something and it kind of depends
but, you know, that can give you an idea of,
you know, why you would trust somebody's key.
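
To tie the primitives together, here is a toy sketch of a certificate as "a signature on a public key". Everything in it is assumed for illustration: the keys are built from small Mersenne primes, the certificate is a plain dictionary, and the signature is a bare hash-then-sign with no padding. Real certificates use a standard format and much larger keys.

```python
import hashlib

E = 65537   # public exponent shared by both toy keys

def make_key(p, q):
    """Toy RSA key from two primes; returns (public modulus, secret exponent)."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))

def digest(data, n):
    """Hash arbitrary bytes down to a number that fits below n."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data, n, d):
    return pow(digest(data, n), d, n)

def verify(data, sig, n):
    return pow(sig, E, n) == digest(data, n)

def cert_body(name, key):
    """The bytes the authority actually signs: a name bound to a public key."""
    return f"{name}:{key}".encode()

# A hypothetical trusted root and a hypothetical user key.
root_n, root_d = make_key(2**89 - 1, 2**127 - 1)
user_n, user_d = make_key(2**61 - 1, 2**107 - 1)

# The root attests that this public key belongs to "steve".
cert = {"name": "steve", "key": user_n}
cert["sig"] = sign(cert_body(cert["name"], cert["key"]), root_n, root_d)

# Anyone who trusts root_n can check the binding.
ok = verify(cert_body(cert["name"], cert["key"]), cert["sig"], root_n)
forged = verify(cert_body(cert["name"], user_n + 2), cert["sig"], root_n)
print(ok, forged)
```

A chain is the same check repeated: the user's key could in turn sign someone else's certificate, and a verifier walks the signatures back until it reaches a root it already trusts.
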
And so one kind of early--in the early '90s,
there was this software called PGP which is
now--there's a new one called GPG that you're
going to use for the first exercise, and its
model was a little bit different. And rather
than have, like, one trusted authority, it's
kind of like more of a social graph. It's
like web of trust so you would actually just
sign the keys of your friends and they would
sign yours and your friends would sign their
friends and you'd end up with a chain coming
from different directions. So if I wanted
to communicate with a stranger, we are likely
to have maybe some friend in common who signed
us both or he knows somebody and you basically
could find a path from your friend to another.
So it's kind of a nice analogy with all the
social networks now. It's like all those connections
could be, you know, public key, they can be
certificates. They could be signatures from
one person on somebody else's key saying,
"Yeah, this is so and so, and this is their
key." But, you know, there are still some
problems out there of, like, how do I actually
revoke a key? If I--if I get my laptop broken
into and my secret key stolen, how do I go
tell the world that my public key is no longer
valid? And, you know, one thing here that
you may hear about is called a certificate revocation
list. These are basically just revocations
that are pushed out there. But basically PKI
is still a very hard problem and actually
getting the keys out there is very difficult.
So I--that's pretty much all the material
I have for today and, like I said, the rest
of the course next week is going to be more
kind of practical engineering, like, you know,
how to use crypto in your code. Third week
is going to be more on the theoretical and math side
and talking about security definitions. And
the last week will be, you know, whatever
topic people are most interested in and you
can vote on the QDB page. I'm going to go
look at the--see if anyone has put stuff on
QDB and... Could you repeat any questions
you have for the benefit of the audience in
the VC? Sorry about that. Yeah, pretty much
I'm going to probably answer these offline.
These look a little bit more geared for next
week. But if there are more questions from
the audience, feel free to ask right now.
Yes?
>> [INDISTINCT]
>> WEIS: Could I get my PGP fingerprint?
>> [INDISTINCT] so we can verify your fingerprint.
>> WEIS: I will send a...
>> [INDISTINCT] distribution or anything?
>> WEIS: I will send it out to the mailing
list.
>> [INDISTINCT] trust me.
>> WEIS: I don't have it available right now
on the screen but that would have been a good
idea. How about this? I'll put on the Wiki
page because I've got the--no? Okay. Don't
trust it. You can come to my desk and I'll
get it to you and I'll show you my ID. Yes?
>> It seemed that for message digest algorithms
that if you have--if you fail the one-wayness
test, you would possibly fail the collision
resistance as well?
>> WEIS: It's not necessarily the case actually
and this is kind of an exercise that you can
do is you can construct, you know, a hypothetical
one-way hash function that is not collision
resistant. Usually it's kind of something
stupid where you make it easy to collide on
purpose. And so these are two separate properties
and--yeah?
>> My [INDISTINCT] was that if you have something
that's--that fails the one-wayness test...
>> WEIS: Right.
>> ...you could always just generate a key
randomly and then generate another one that
collides with it if it fails the one-wayness
test, unless it only failed the one-way test
for various questions, a special table.
>> WEIS: I'm going to have to think about
that. I think that, I mean, these are separate
properties. There's actually three different
properties and one is called pre-image resistance,
second pre-image resistance and then the collision
resistance. Collision resistance is actually
two of the three, one-wayness is two of the
three. And I hear what you're saying but I
have to think about it offline. Yes?
>> [INDISTINCT] social way that why is it
so--this is from my experience, why is it
so hard to get people to understand the value
of these in crypto and e-mail [INDISTINCT]
and so forth? And why are they so [INDISTINCT]
>> WEIS: So I think this is kind of a lesson
from the first exercise is that actually using
this stuff is hard and, like, even setting
up that first message took me a while and
then the first person to respond, it took
them half an hour. I think the average MIT
grad student in the security course, it took,
like, 90 minutes to just, you know, decrypt
a message and then send us an encrypted message.
So I think--I don't know if I repeated the
question. So the question was why do people
not use crypto in practice and why is it hard
to actually get people to use this. I think
usability is a huge part of it and, you
know, almost every client, e-mail client out
there like Outlook has S/MIME compatibility
and it has for years. So there's no reason
why people couldn't start using this today.
And I think it's kind of usability and just
actually having the infrastructure there.
It pretty much needs to be invisible for it
to be universal. Like, there's no way people
are going to take the extra step to encrypt
their mail unless it's, you know, a very special
case.
>> So the two-time pad examples, is that base
27?
>> WEIS: Two-time pad examples are, yes, base
27 so it's the alphabet and space. Yes, and
a few--I can post some Python code to do it
if anyone needs that. Are there any more questions?
All right. Well, thank you very much and I'll
see you next week for more practical use of
cryptography.
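
For the base-27 pad mentioned in the last question, a minimal sketch might look like the following. The symbol ordering (A=0 through Z=25, space=26) and all function names are assumptions made for this example; the actual exercise code may use a different encoding.

```python
import secrets

# Assumed ordering: A=0 ... Z=25, space=26.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def pad_encrypt(message, pad):
    """Add each pad symbol to the matching message symbol, mod 27."""
    if len(pad) < len(message):
        raise ValueError("pad must be at least as long as the message")
    return "".join(ALPHABET[(ALPHABET.index(m) + ALPHABET.index(k)) % 27]
                   for m, k in zip(message, pad))

def pad_decrypt(ciphertext, pad):
    """Subtract each pad symbol from the matching ciphertext symbol, mod 27."""
    if len(pad) < len(ciphertext):
        raise ValueError("pad must be at least as long as the ciphertext")
    return "".join(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 27]
                   for c, k in zip(ciphertext, pad))

def random_pad(length):
    """Pick each pad symbol uniformly at random from the 27 symbols."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

msg = "ATTACK AT DAWN"
pad = random_pad(len(msg))
print(pad_decrypt(pad_encrypt(msg, pad), pad) == msg)
```

Reusing the same pad for two messages is what turns this into a two-time pad: subtracting the two ciphertexts symbol by symbol cancels the pad and leaves the difference of the two plaintexts.
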
