Hashing is widely used in digital signatures and cryptocurrencies.
A hash function takes an arbitrary-sized
input
and converts that input into a fixed-size output. As shown in the figure, a hash function
takes an "arbitrary sized" input x and produces an output h of "fixed size".
However, this is not entirely true.
The second figure shows the well-known SHA-512
hash function: it takes a 1024-bit block as input
and produces an output of 512 bits.
Clearly, this function does not accept an arbitrary-sized input.
Then why do we say that a hash function takes an arbitrary-sized input?
It is because when we talk about
hash functions, we usually mean hash
functions built with the Merkle-Damgard construction.
The Merkle-Damgard construction looks something
like this.
We have an arbitrary-sized input, and we split
that input into fixed-size chunks.
The size of the chunks depends on the compression
function.
If the last chunk is smaller than the required
size, we add padding to that chunk,
making it the same size as all the other chunks.
We start with an initial value.
The initial value and the first chunk go
into the compression function, producing the first
hash h.
This newly created hash and the next
chunk then go into the compression function, producing
a new hash.
The process continues until the last chunk
produces the last hash, which becomes our fixed-size
output.
In this way, we can take any arbitrary-sized
input and produce a fixed-size output using
a hashing (compression) function.
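The chaining just described can be sketched in a few lines of Python. This is only a toy illustration, not SHA-512 itself: the chunk size, the all-zero initial value, the padding rule, and the names compress, pad and toy_md_hash are assumptions made for this sketch, and SHA-256 merely stands in for "some compression function with a fixed input size".

```python
import hashlib

BLOCK_SIZE = 64                     # bytes per chunk (assumed for this toy example)
STATE_SIZE = 32                     # bytes of chaining value and of the final output
INITIAL_VALUE = bytes(STATE_SIZE)   # all-zero initial value, chosen arbitrarily


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: fixed-size state + fixed-size chunk -> fixed-size state."""
    assert len(state) == STATE_SIZE and len(block) == BLOCK_SIZE
    # SHA-256 is only a stand-in for a real compression function here.
    return hashlib.sha256(state + block).digest()


def pad(message: bytes) -> bytes:
    """Append a 0x80 marker byte, then zero bytes, up to a multiple of BLOCK_SIZE."""
    padded = message + b"\x80"
    remainder = len(padded) % BLOCK_SIZE
    if remainder:
        padded += bytes(BLOCK_SIZE - remainder)
    return padded


def toy_md_hash(message: bytes) -> bytes:
    """Chain the compression function over all chunks, starting from the initial value."""
    padded = pad(message)
    state = INITIAL_VALUE
    for offset in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[offset:offset + BLOCK_SIZE])
    return state


short_digest = toy_md_hash(b"hi")
long_digest = toy_md_hash(b"x" * 5120)   # a 5-kilobyte input
print(len(short_digest), len(long_digest))
```

Both inputs, 2 bytes and 5 kilobytes, end up as an output of the same length. Real designs such as SHA-512 additionally encode the message length in the padding, which this sketch leaves out.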
Another question is: why do we need hashing at all?
To answer this question, let me first clear
the board.
Here is an example: we have Alice (the sender)
and Bob (the receiver).
Alice wishes to send an email of 5 kilobytes
to Bob.
She wants the email to be signed first
with her digital signature.
Now assume that Alice is using RSA digital
signatures.
In RSA-DS, we cannot sign a message that is larger
than the RSA modulus.
In this case, assume that the modulus n is 3072
bits long.
So we cannot sign a message longer than 3072 bits.
In fact, we also need some space for RSA padding,
so in reality we can sign even less
data than the 3072 bits of the modulus.
For this example, we assume there is no padding, for the sake of simplicity.
Alice wishes to sign her email (say x), but
she cannot sign the whole of x at once,
because x is larger than the modulus (3072
bits).
Therefore, Alice has to split message x
into parts.
Precisely, in this example Alice has to split
her email into 14 parts,
x1 to x14.
Each of the 14 parts of her email then has
to be signed separately.
Hence, we have (x1, s1), (x2, s2), ..., (x14,
s14).
Each signature, along with its corresponding
email part, is then dispatched to Bob.
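To see where the number 14 comes from, here is the arithmetic as a small Python sketch. It assumes that 1 kilobyte means 1024 bytes and, as stated in the example, that there is no RSA padding, so each part can be as long as the modulus.

```python
import math

EMAIL_BYTES = 5 * 1024   # "5 kilobytes", assuming 1 kilobyte = 1024 bytes
MODULUS_BITS = 3072      # size of the RSA modulus n from the example

email_bits = EMAIL_BYTES * 8
parts = math.ceil(email_bits / MODULUS_BITS)
signature_bits = parts * MODULUS_BITS   # each RSA signature is as long as the modulus

print(email_bits, parts, signature_bits)   # 40960 14 43008
```

The email is 40960 bits, which needs 14 parts of at most 3072 bits each. Note that the 14 signatures together take 43008 bits, slightly more than the email itself.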
This approach has three key drawbacks.
Number one: computational overhead.
Alice has to compute 14 different
signatures,
so she pays the cost of signing 14 times.
The second drawback is wasted bandwidth.
Alice also has to send these
14 digital signatures to Bob over the network,
which wastes network bandwidth.
So basically she spends 14 times n bits of
bandwidth just to send these 14 signatures.
The last drawback is the lack of security.
Because these parts are signed individually,
instead of signing the whole email, this gives
Oscar an opportunity to manipulate
the message.
For instance, Oscar can stop one of the 14
parts from reaching Bob, and the signatures on the rest of the parts
will still be valid.
To overcome these disadvantages, we use hashing.
The key idea is that instead of signing each part separately, Alice signs all 14 parts
together by signing a hash of all of them.
So basically Alice computes one common hash h = H(x1,
x2, ..., x14) over the complete email.
She then signs that hash,
producing a single signature based on the hash of the whole email.
The single signature and all the parts are dispatched to Bob.
If any of the parts is altered,
the signature will no longer verify
for Bob.
Using a hash, we have created only minimal overhead.
That is, instead of computing 14 signatures,
we have to compute only a single signature.
Instead of wasting bandwidth on 14 signatures,
we are now sending a single signature
for the whole 5 KB email.
We also no longer lack security, because if any one of the parts is altered, the signature
becomes invalid.
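The hash-then-sign idea can be sketched in Python as follows. This is a toy under stated assumptions, not a secure implementation: the RSA key is built from two small known primes, there is no padding (as in the example), the digest is reduced modulo the toy modulus only so that it fits, and SHA-256 plus the names digest_as_int, sign and verify are choices made for this sketch.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes (far too small for real use).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                                  # public modulus
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the whole message, then reduce it so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Alice: one signature over the hash of the complete email."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Bob: recover the hash with the public key and compare it with his own hash."""
    return pow(signature, E, N) == digest_as_int(message)


email = b"A" * 5120                        # stand-in for the 5-kilobyte email
signature = sign(email)
tampered = email[:-1] + b"B"               # one byte changed in transit

print(verify(email, signature))
print(verify(tampered, signature))
```

One signature now covers all 5 kilobytes, and changing even a single byte makes verification fail. With a real 3072-bit modulus the 256-bit digest fits without the extra reduction step.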
A good hash function must have two key properties.
1) Preimage resistance (a.k.a. one-wayness). Preimage resistance says that producing a hash
from an input should be easy and quick.
On the contrary, recovering an input from a hash
should be computationally infeasible.
So you can say that
computing H(x) should be easy and fast,
whereas computing H^{-1}(h) should be computationally infeasible.
The second property of hash functions is collision
resistance.
It says that finding two different inputs
(say xj and xk) with the same hash should be infeasible.
Collisions are unavoidable in hashing because we
are mapping very large inputs to smaller
outputs.
But collision resistance says that
it should be infeasible for Oscar to compute two different
inputs that have the same hash.
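To make the point about collisions concrete, here is a small Python sketch that searches for a collision in a deliberately tiny hash. The 16-bit truncation of SHA-256 and the names truncated_hash and find_collision are assumptions for this illustration only. With only 65536 possible outputs, a collision must appear within 65537 attempts, and in practice it usually appears much sooner.

```python
import hashlib


def truncated_hash(data: bytes, num_bytes: int = 2) -> bytes:
    """Keep only the first num_bytes of SHA-256 to get a deliberately tiny hash."""
    return hashlib.sha256(data).digest()[:num_bytes]


def find_collision(num_bytes: int = 2):
    """Try inputs b"0", b"1", ... until two of them share a truncated hash."""
    seen = {}                      # truncated hash -> first input that produced it
    counter = 0
    while True:
        candidate = str(counter).encode()
        tag = truncated_hash(candidate, num_bytes)
        if tag in seen:
            return seen[tag], candidate, counter + 1
        seen[tag] = candidate
        counter += 1


x_j, x_k, attempts = find_collision()
print(x_j, x_k, attempts)
```

The same search against a full 512-bit output is far out of reach, which is what collision resistance asks for: collisions exist, but Oscar should not be able to find one.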
I will demonstrate the importance of these
two properties using some examples from digital
signatures.
Assume that you have a hash function without one-wayness.
Alice wants to send some data to Bob with a digital signature.
Alice has encrypted the data, but by mistake
the signature was computed on the unencrypted data.
Basically, she first computed the hash of
the data and then signed that hash.
Later on, she encrypted her data and sent
the encrypted data and the signature to
Bob.
In this case, Oscar can first obtain the hash
from the signature, and because the hash function
does not have one-wayness, Oscar can retrieve
the data from the hash.
Specifically, Oscar first obtains the hash
from the signature s using the public key e: h = s^e mod n.
Subsequently, Oscar uses x = H^{-1}(h),
the inverse of the hash function, to get the data.
The inverse function can be computed easily
due to the absence of one-wayness.
The lack of one-wayness renders the hash function
useless and allows anyone to recover the data from any
available hash.
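Oscar's two steps can be sketched in Python. Everything here is a toy assumption: the RSA key is tiny and unpadded, and weak_hash is a hypothetical "hash" invented for this sketch that simply encodes the bytes as a number, so it has no one-wayness at all.

```python
# Toy RSA key (tiny and unpadded, for illustration only).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # Alice's private exponent (needs Python 3.8+)


def weak_hash(data: bytes) -> int:
    """Hypothetical hash with no one-wayness: it just encodes the bytes as a number."""
    return int.from_bytes(data, "big")


def weak_hash_inverse(h: int) -> bytes:
    """H^{-1}: trivially turns the hash back into the original bytes."""
    return h.to_bytes((h.bit_length() + 7) // 8, "big")


# Alice signs the hash of the plaintext; the data itself is sent encrypted.
secret = b"PIN is 4711"
signature = pow(weak_hash(secret), D, N)

# Oscar knows only the signature and Alice's public key (E, N).
recovered_hash = pow(signature, E, N)          # h = s^e mod n
recovered_data = weak_hash_inverse(recovered_hash)

print(recovered_data)
```

Oscar never touches the encrypted data or the private key, yet he ends up with the plaintext, because the signature leaks the hash and the hash leaks the data.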
Now I will show an example to
emphasize the second property, collision resistance.
In this example, we have a hash function
with no collision resistance.
Alice wishes to send some data to Bob.
So Alice first computes the hash of the data and
then signs that hash.
In this example, we do not have any encryption.
So Alice sends the signature and the data (in
plaintext) to Bob.
This is done many times in practice, when we
do not need to hide our data.
Oscar first gets the hash from the signature
using the public key and the RSA modulus.
Subsequently, Oscar produces new data x_o that has the same hash as the original data.
Oscar intercepts the original message and
instead sends his own malicious message
x_o with the valid signature.
Hence, due to the lack of collision resistance, these kinds of attacks can be realized.
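This forgery can also be sketched in Python. Again the setting is a toy: the RSA key is tiny and unpadded, and weak_hash is a hypothetical checksum (sum of the bytes modulo 256) chosen for this sketch because matching inputs are trivial to construct. The messages are made up for illustration.

```python
# Toy RSA key (tiny and unpadded, for illustration only).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # Alice's private exponent (needs Python 3.8+)


def weak_hash(data: bytes) -> int:
    """Hypothetical hash with no collision resistance: sum of the bytes mod 256."""
    return sum(data) % 256


def verify(data: bytes, signature: int) -> bool:
    """Bob's check: the hash recovered from the signature must match the data's hash."""
    return pow(signature, E, N) == weak_hash(data)


# Alice signs the hash of her plaintext message.
original = b"Pay Bob 100"
signature = pow(weak_hash(original), D, N)

# Oscar recovers the target hash with the public key and crafts x_o to match it.
target = pow(signature, E, N)
forged_body = b"Pay Oscar 999"
filler = (target - weak_hash(forged_body)) % 256   # one extra byte that fixes the sum
x_o = forged_body + bytes([filler])

print(verify(original, signature), verify(x_o, signature))
```

Bob's check passes for both messages, so he cannot tell Alice's message from Oscar's. Oscar did not need Alice's private key, only a second input with the same hash.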
So far we have only considered hashing from the perspective of digital signatures, but
hashing is also widely used in cryptocurrencies.
This is the subject of the next lecture.
So stay tuned.
