The RSA crypto systems is one of
the lovely and really important
applications of number
theory in computer science.
So let's start talking about it.
The RSA crypto system
is what is known
as a public key cryptosystem,
which has the following really
amazing properties--
namely, anyone
can send a secret
encrypted message
to a designated receiver.
This is without there being
any prior contact using only
publicly available information.
Now, if you think
about that, it's
really terrific because
it means that you
can send a secret message to
Amazon that nobody but Amazon
can read even though
the entire world knows
what you know and can see
what you sent to Amazon.
And Amazon knows that
it's the only one can
decrypt the message you sent.
This in fact is hard to
believe if you think about it.
It sounds paradoxical.
How can secrecy be possible
using only public info?
And in fact, the existence of
this public key cryptosystem
has some genuinely paradoxical
consequence, which kind of
are a mind bender.
So let me tell you
about one of them.
I don't know if you've
heard of mental chess,
but it's a standard
thing in the chess world.
Chess masters are so talented
and have such deep insight
into the game that they
don't need a chessboard,
and they don't
need chess pieces.
They can just go for
walk on a country
lane talking to each
other and saying
pond to king 4 and
knight to bishop 3
and just talking chess code
and play an entire chess
game that way.
That's known as mental chess.
It's quite impressive.
In fact, the grand masters
can play multiple games
of mental chess
against opponents
who are staring
at the chessboard
and win the great
majority of the games.
Of course, these are not against
other grand masters, but still.
OK.
So now, this is what I propose.
How about playing mental poker?
If you know how to play
poker, we deal our cards
and we bet and so on.
And my only condition
is that I'll deal.
Now, that sounds like a
joke and an absurd thing
for you to agree to
do, but it's amazing.
It's actually possible.
One of the famous papers
of Rivest and Shamir
was how to play mental poker
using public key crypto.
So I once tried to persuade
an eminent MIT dean who's
a physicist
researcher about this,
and he just wouldn't believe it.
He argued that it was
just impossible logically.
And what he was
thinking about was
that if you know how to compute
a function, then of course
you can figure out
how to invert it.
That is to say if I know how
to compute some function f
of a number and let's
say that the function is
one arrow in-- that
is an injection-- then
if I know what f of n, there's
a unique n that it came from.
So how can I not
be able to find n?
And it's an insight of computer
science and complexity theory
that says it's quite possible.
It's not that you can't find
the n that produced f of n.
It's that the search for
it will be prohibitive.
There are, in short, one-way.
That is, functions that are
easy to compute in one direction
but hard to invert.
They're easy to compute
but hard to invert.
In particular, we're
thinking about multiplying
and factoring.
It's an observation that it's
easy to compute the product
of two large prime numbers.
We all know how to multiply.
And in fact, there are faster
ways to multiply than you know.
But the current state
of our knowledge
of number theory and
complexity theory
is that given a
number n that happens
to be the product
of two primes, it
seems to be hopelessly
hard in general to factor
n into the components p and q.
Now, this is an open problem.
It's similar to the p
equals np question--
that famous open problem.
It's actually a weaker--
it's quite possible
that you could factor, and
np would not equal to np.
But nevertheless, it's
the same kind of problem.
And more generally,
the existence
of one way functions is
closely related to that p
equals np question.
Nevertheless, even though
it's an open problem
and theoretically has not
been settled either way,
it's widely believed-- the
banks, the governments,
and the commercial world have
really bet the family jewels
on the difficulty of factoring
when they use the RSA protocol.
So I like to make the joke that
my most important contribution
to MIT was being involved in
the hiring of our S and A.
So this is A, Adi Shamir, R,
Ron Rivest, and A, Len Adleman
back in the late '70s when they
first came up with these ideas.
So let's look at the way this
RSA protocol actually works.
So here's what happens.
To begin with, you have to
make some information public
so that people can
communicate with you.
We're looking at
two players here.
There's a receiver who's going
to get encrypted messages,
and there's a
sender who is trying
to send an encrypted
message to the receiver.
So what the receiver
does before hand is
generates two primes, p and q.
Now, in practice, you want
these to be pretty big primes--
hundreds of digits.
And we'll examine it in
a moment, the question
of how you find them.
But the receiver's job is to
find two quite substantial
large primes, p and
q, chosen more or less
randomly because if you have any
kind of predictable procedure
for how you got them, that
would be a vulnerability.
But if you just
choose them at random,
then there's enough primes
in the hundreds of digits
that it's hopeless
that people would guess
which one you wound up with.
OK.
What do you do to begin with
is multiply p and q together,
which is easy to do.
Let's call that number n.
And now the other thing
the receiver is going to do
is find a number e
that's relatively
prime to this peculiar
number p minus 1, q minus 1.
Now as a hint, you might notice
that p minus 1, q minus 1
is in fact Euler's
function of n-- phi of n.
But for now, we don't
need to understand
that this is Euler's function.
It's just the recipe of
what the receiver has to do.
Find a number e that's
relatively prime to p minus 1,
q minus 1.
Again, you don't want
e to be too small,
and we'll discuss in a moment
how do you find such an e.
But the receiver's job
is to find such an e.
This pair of
numbers e and n will
be the public key which the
receiver publishes widely
where it can easily
be found by anyone
who cares to look for it.
Basically there's
a phone directory
where if you want to know how to
send somebody a secret message,
you look them up, and you find
the receivers name in there.
And then you see
his public e and n,
and that's what you use
to send him a message.
Now, how do you use it
to send him a message?
Well, I'll explain
that in a minute,
but let's look at one more
thing that the receiver needs
to do to set himself up.
The receiver is going to find
an inverse of this number e that
he's published-- the part of
his public -- modulo p minus 1,
q minus 1.
That is, this e since it's
relatively prime to p minus 1,
q minus 1, it will have
an inverse in Z star p
minus 1, q minus 1.
Let's let that inverse be d.
And of course, we know
how to find d because you
can do that with a Pulverizer.
D is the private key.
That's this crucial
piece of information
that the receiver has and
that the receiver is not
going to tell anybody.
Only the receiver knows
that because the receiver
chose the p and the q and
the e more or less randomly--
maybe even as randomly
as they can manage--
and then they find the d.
And that's their secret.
OK.
That's what the receiver does.
How does the sender
send a message?
Well, to send a message,
what the sender wants to do
is choose a message that is
in fact a number in the range
from 1 to n where-- we're
thinking again, of n,
if it's a product of two
primes of a couple of hundred
digits each, then the
product is around 400 digits.
And so you can
pick any message m
that can be represented
by a 400 digit number.
Now, there's a lot
of messages that
will fit within 400 digits.
And of course, if
it's bigger, you just
break it up into
400 digit pieces.
So that's the kind of
message you're going to send.
So the message is
going to be a number
in this range from 1 to n.
And what the sender is going to
do is look up the public key e
and the other part
of the public key
n and raise the secret
message to the power e in Z n.
So we're going to
compute m to the e in Zn
and send that encoded
message m hat.
So m hat is what we think
of as the encrypted version
of the message m.
So then we have the
problem if that's
what the sender sends
to the receiver,
how does the receiver
decode the m hat,
and the answer is the
receiver just computes
m hat to the power d--
the secret key-- also
in the ring Zn.
And the claim is that in
fact, that's equal to m.
Now, you can check
in class problem,
and it's easy to see
that the reason why
that method of decrypting
works is precisely
an application of Euler's
theorem-- at least
when m happens to be
relatively prime to n.
Now, the odds of
finding an m that's
not relatively prime
to n are basically
negligible because if
you'd find such an m,
it would enable
you to factor them.
And we believe
factoring is very hard.
But in fact, it actually
works for all m, which
is a nice theoretical results.
And you'll work this
out in class problem.
OK.
That's how it works.
The receiver publishers e
and n, keeps a secret key d.
The sender exponentiates
the message to the power e.
The receiver simply
decodes by raising
the received
message to the power
d and reads off what
the original was.
OK.
So we need to think about the
feasibility of all of this
because we believe that
it's impossible to decrypt,
but there's a lot of other stuff
going on there that the players
have to be able perform.
And let's examine what their
responsibilities and abilities
have to be.
So the receiver
to begin with has
to be able to find large primes.
And how on earth
do they do that?
Well, without going
into too much detail,
we can make the remark that
there are lots of primes.
That is to say by appealing
to the prime number theorem,
we know that among the n digit
numbers, about log n of them
are going to be
primes so that you
don't have to go
too long before you
stumble upon a random prime.
That is, if you're
dealing with a 200 digit n
and you're searching for a
prime of around that size,
you're not going
to have to search
more than a few hundred
numbers before you're
likely to stumble on a prime.
And of course, how do you know
that you stumbled on a prime?
Well, you need to be able to
check whether a number is prime
or not-- and efficientlY--
in order for this whole thing
to be feasible.
So we'll have to
discuss that brieflY--
how do you test whether
or not a number is
prime in an efficient way?
The other thing the
receiver has to do
is find an e that's relatively
prime to p minus 1, q minus 1.
But that's easy.
Well, it's easy
because first of all,
if you just kind of randomly
guess a medium sized e
and then search consecutively
from some random number you've
chosen somewhere in the
middle of the interval
up to p minus 1, q minus 1.
Again, you're very likely to
find in a few steps a number
e that is relatively prime
to p minus 1, q minus 1.
How do you recognize that
it's relatively prime?
Well, you just
compute the GCD, which
we know how to do using
Euclid's algorithm.
So that's really
quite efficient.
Recognizing that it's
relatively prime is easy,
you just don't have to
search very many numbers
until you stumble on an e.
OK.
The other thing you have to do
is find the d that an e inverse
modulo p minus 1, q minus 1.
And again, that is the
extended Euclidean algorithm,
the extended GCD,
namely the Pulverizer.
So those are the pieces
that the receiver has to do.
Now, let's look at
this a little bit more
and think about the
information about the prime.
So the famous theorem
about the primes
is their density, which
is if you let a pi of n
be the number of primes
less than or equal to n,
then it's a deep
theorem of number theory
that pi event
actually approaches
a limit in an asymptotic sense--
which we'll discuss in more
detail-- that pi of n
as n grows gets to be
very close to n over log n.
That's the natural log of n.
Now, that's a deep theorem.
But in fact, if we want a
self-contained treatment
for our purposes,
there's an exercise
that will be in
the text where we
can derive Chebyshev's bound,
which is weaker than they tight
prime number theorem.
But Chebyshev's
bound, which can be
proved by more elementary means
that's within our own ability
at this point with the
number theory we have--
to be able to show
that n over 4 log n
is a lower bound on pi of n.
So basically that
says that if you're
dealing with numbers
of size n, which
means they're of length
log n a few hundred digits,
then you only have
to search maybe 1,000
digits before your very
likely to stumble on a prime.
And if you search 2,000 digits,
it becomes extremely likely
that you'll stumble on a prime.
So the primes are
dense enough that we
can afford to look
for them, providing
we can have a reasonably
fast way to recognize
when a number is prime.
Well, one simple
way that it almost
is perfect-- but
works pragmatically
pretty well-- is
called the Fermat test.
But let me just reemphasize
this -- I got ahead of myself--
that if I'm dealing
with 200 digit numbers,
then about one in 1,000 is
prime using just the weaker
Chebyshev's bound.
And that says that I
don't have to search
too long-- only a
few thousand numbers
to be able to find a prime.
And a few thousand
numbers is well
within the ability of a
computer to carry out,
providing that the test for
recognizing that a number is
prime isn't too time consuming.
So one naive way that
the really almost
works to be a reliable
primality test
is to check whether
Fermat's theorem is obeyed.
Fermat's theorem-- the special
case of Euler's theorem--
says that if n is
prime, then if I
compute a number a
to the n minus 1,
it's going to equal 1 in Z n.
And that's going to
be the case for all a
that are not 0 if n is prime.
Now that means that if
this equality fails in Z n,
then I immediately
know a is not prime.
Go on.
Search for another one.
OK.
So suppose I'm
unlucky-- or lucky--
and I choose an a to
test and it turns out
that a to the n minus 1 is 1,
does that mean that n is prime?
Unfortunately not.
It might be that I
just hit an n that
happened to satisfy Fermat's
equation even though n was not
prime.
But it's not a very
hard thing to prove
that if n is not prime, then
half of the numbers from 1 to n
are not going to
pass the Fermat test.
So if half of the
numbers are not
going to pass the Fermat
test, then what I can do
is just choose a random
nonzero number in the interval
from 1 to n, raise it to
the n minus first power,
and see what happens.
And if n is not
prime, the probability
that this random numbers that
I've chosen fails this test
is at least a half.
So I try it 50 times.
And if in fact 50 randomly
chosen a's in the interval 1
to n all satisfy
Fermat's theorem,
then there's one chance in 2 to
the 50th that n is not prime.
That's a great bet.
Leap for it.
So that basically is the idea of
a probabilistic primarily test.
Now, there's a
small complication
which is that there are
certain numbers n where
this property that half the
numbers will fail to satisfy
Fermat's theorem doesn't hold.
They're known as the
Carmichael numbers,
and they're known
to be pretty sparse.
So that really if
you're choosing an n
at random, which is kind of
what we're doing when we choose
random primes p and
q, the likelihood
that you'll stumble
on a Carmichael number
is another thing that you just
don't have to worry about.
So really, the
Fermat primality test
is a plausible
pragmatic test that you
could use to pretty reliably
detect whether or not
a number was prime-- what
was the last component
of the powers that we
needed the receiver to have.
OK.
So now we come to
the question of why
do we believe that the
RSA protocol is secure?
And the first thing to notice
is that if you could factor n,
then it's easy to break.
Because if you can factor n,
then you have the p and the q.
And that means you know what
p minus 1 times q minus 1 is.
And therefore you can
use the Pulverizer
in exactly the same
way the receiver did
to find the inverse
of the public key e.
You could find d easily.
So surely if you can
factor, then RSA breaks.
No question about that.
What about the converse?
Well, what you can approve--
and there's an argument that's
sketched in class problem,
not fully, in the book--
is that if I could find the
private key d, then in fact,
I can also factor n.
So if I believe that
factoring is hard,
then in fact finding the
secret key is also hard.
And we could try to be confident
that our secret key is not
going to be found
even given the public.
Now, unfortunately this
is not the strongest kind
of security guaranteed
you'd like because there's
a logical possibility
that you might
be able to decrypt messages
without knowing the secret key.
Maybe there's some other
walk around whereby
you can decrypt the secret
message m hat by a method other
than raising it
to the dth power.
And what you'd really like
is a theorem of security
that said that
breaking RSA-- reading
RSA messages by any
means whatsoever--
would be as hard as factoring.
That's not known for RSA.
It's an open problem.
And so RSA doesn't have the
theoretically most desirable
security assurance, but
we really believe in it.
And the reason we
really believe in it
is that for 100 or more years,
mathematicians and number
theorists have been trying to
find efficient ways to factor.
And more pragmatically, the most
sophisticated cryptographers
and decoders in the world using
the most powerful networks
of supercomputers have been
attacking RSA for 35 years
and have yet to crack it.
Now, the truth is that in
the course of the 35 years,
various kinds of
glitches were found
that required some added
rules about how you found
the p and the q and
how you found the e,
but they were easily
identified and fixed.
And RSA really is a robust
public key encryption
method that has withstood
attack for all these years.
That's why we believe in it.
