[Evans] The key point is that this was for weak collision resistance.
For weak collision resistance, our assumption,
an attacker needs to do about 2 to the b work where b is the number of hash bits
to have a good chance of finding a collision.
Strong collision resistance is harder than weak.
To obtain strong collision resistance is actually much harder,
and we'll see that we need about twice as many bits for that--
that the attacker's effort is more like 2 to the b over 2,
so we'll need about twice as many output bits in our hash function to provide this.
The reason for this is because of what's known as the birthday paradox.
It's called a paradox. It's not really a paradox.
It's a surprise to people who don't follow the mathematics on this,
but there's nothing paradoxical about it.
It's not like the Berry paradox that leads to a contradiction.
The way this is usually framed is you have some group of k people--
let's say it's a kindergarten classroom--
and you want to know what the probability that 2 people have the same birthday.
The reason this is called a paradox is the answer is actually much higher
than most people expect. So let's compute this.
We'll assume that there are no leap years and that birthdays are uniformly distributed.
We assume the same property for our outputs of our ideal hash function.
So we'll compute the complement probability that there are no duplicates,
and then the probability that there is at least 1 duplicate is 1 minus this.
And the way to think about that is we can go through the people in the class.
We can assign some birthday to each one.
So we'll assign b0 to the first one, b1 to the second one,
keep assigning birthdays.
There are lots of different ways we could do that.
In order to assign birthdays with no duplicates,
then there's a limited number of ways that we can pick any of 365 days
for the first birthday.
For the second one, if we want no duplicates,
we can't use whatever day we pick for the first one.
So it will be that times 364 times 363 and so forth,
and that's the number of ways to assign birthdays with no duplication.
We're trying to compute the probability, so we're going to divide that
by the number of ways to assign with duplication,
which is just 365 choices for each person.
So in general, this first value is n factorial divided by n minus k factorial,
and the bottom result is just n to the k.
N is the number of possible days or the number of possible hash outputs.
K is the number of trials.
And so our probability that there are duplicates is just 1 minus this.
This is what we need for strong collision resistance.
It's higher than what we needed for weak collision resistance,
which was 1 minus 1 minus 1 over n to the k.
So to actually compute this, these numbers would get really big
if we tried to actually compute the factorials.
Instead we need to use approximations to do this.
But to give you some idea what the results are like,
if we have the duplicate birthday question where we have 365 days and 20 people,
the probability that there's at least 1 duplicate exceeds 0.4.
If we're thinking about hash collisions, if we only had a 64-bit hash
and our attacker was much weaker than the one we hypothesized--
let's say they can only do 2 to the 32--that's already almost a 40% chance of success.
This success goes up quite quickly.
If the attacker can do 2 to the 34 hashes, then the success probability is very close to 1.
So the conclusion is that given an ideal hash function
with N possible outputs, an attacker needs about N guesses to find an input
that hashes to a particular value but only needs about the square root of N guesses
to find a pair that collide.
This assumes the attacker can store all those hash values as they try the guesses
and compare it to all the previous ones.
This is the reason why hash functions need to have large outputs.
SHA-1, which was a widely used hash function, used 160 bits in its output.
This was actually broken.
There is a way to find collisions using SHA-1 with only 2 to the 51 operations--
much fewer than the 2 to the 80 that one would expect from this square root estimate,
and that's because of mathematical weaknesses in the way SHA-1 works.
SHA-2, the output can be either 256 or 512 bits.
As long as there aren't mathematical weaknesses in the cipher,
if it was a really ideal hash function, this would be big enough to defeat any realistic attacker,
but there are beginning to be suggestions that there may be ways to break this.
No practical breaks have been found yet.
And SHA-3, the winner is expected to be announced later this year.
So for the rest of this class we're going to assume that we do have an ideal hash function
and that it has enough outputs to provide strong collision resistance
against a motivated attacker.
