So, in today's class we shall discuss the
implementation of elliptic curve cryptography.
As we have seen in the previous classes,
elliptic curve cryptography essentially relies
upon point operations, namely point addition and
point doubling. These are all quite computationally
intensive operations. Therefore, whether we want to
develop software or hardware, we need to look at
some interesting developments that show how these
operations can be implemented much more efficiently.
Today's discussion will be based on this.
So, we shall discuss scalar multiplication, and we
shall compare two approaches to it, LSB first and
MSB first. Then, we will discuss the Montgomery
technique of scalar multiplication and fast scalar
multiplication without pre-computation. Before this,
there were implementations based on pre-computed
values. Suppose I want to compute lambda P, where
lambda is a scalar and P is a point. One approach is
to take the binary encoding of lambda, keep certain
multiples of P stored, say 2P, 4P and 8P, and then
combine them to obtain the value of lambda P.
So, these are scalar multiplications using
pre-computed tables, but there may be applications
where you are not interested in spending that much
memory. There is a nice paper by Lopez and Dahab
which shows how you can do fast scalar multiplication
without pre-computation. So, we will study that. We
will also study the Lopez and Dahab projective
transformation, which reduces the number of finite
field inversions that are necessary. We will discuss
a little bit about mixed coordinates without going
into much detail, and rather concentrate on some
parallelization techniques, which can accelerate
the scalar multiplication operation.
So, let us go step by step. This is how the broad
diagram of the elliptic curve hierarchy looks. You
can see that the top-level elliptic curve operation
is point multiplication. I call it kP, where k is a
scalar and P is a base point. It is based upon the
group operations, which are point addition and point
doubling. Those, in turn, are based upon the
underlying finite field operations like addition,
subtraction, inversion and so on. That means there
are various levels in how an elliptic curve
cryptography operation works.
So, if I want to obtain accelerators, I will try to
parallelize this architecture at each level. I will
try to parallelize the point multiplication; I will
try to parallelize the group operations; I will try
to parallelize the underlying arithmetic operations.
What is important, therefore, is to understand where
the scope for parallelization lies and how we can
actually obtain these accelerations.
So, first, let us concentrate upon how to do the
scalar multiplication operation. There is a point P
and we are interested in computing k multiplied by P.
k is my scalar, which can be encoded in a binary
format; call the bits k_0, k_1, up to k_(m-1). Here
k_(m-1) is the most significant 1, that is, the first
time a 1 is encountered when reading from the left;
everything before that is 0. For example, if I want
to encode 7, it is 1 1 1; if k is 5, it is 1 0 1;
before that, it is all 0s.
A very simple way is what is called the double and
add algorithm. We take P and store it in a temporary
register, call it Q. This accounts for the leading 1,
k_(m-1). Then, for i going from m minus 2 down to 0,
we always do a doubling operation, Q equal to 2Q, and
whenever k_i is 1, we also do Q equal to Q plus P,
that is the point addition operation.
So, in this case, you are parsing the scalar from the
MSB to the LSB, that is, from the most significant
bit to the least significant bit; therefore, this
algorithm is called MSB first. What is the total
number of computations necessary? If the length of k
is m bits, it requires roughly m point doublings,
because you are always doing a doubling operation.
Since the loop runs from m minus 2 down to 0, it is
exactly m minus 1, but I have written it
approximately, throwing away the constant term. What
about the number of additions required?
If you assume a random k and consider the average
complexity of the algorithm, k will have about half
of its bits as 1s and half as 0s. In that case, about
(m minus 1) by 2 point additions are necessary.
Therefore, on average the number of point additions
is half the number of point doublings; that is the
rough cost of this algorithm.
So, you can consider an example. If I want to compute
7P, I will encode 7 as 1 1 1. The leading 1 is the
first 1 that you consider when you go MSB first, and
it is handled by initializing Q to P. The next bit is
a 1, so you not only have to double to get 2P, but
also add P; that is, you do a doubling as well as an
addition, which gives 3P. The last bit is again a 1.
So, again you double and add P: 3P into 2 is 6P, and
6P plus P is 7P.
So, you see that two iterations are required, and in
each one the principle is double and then add: first
you do a doubling, and then you do an addition or
accumulation operation. Similarly, if you want to
compute 6P, the encoding is 1 1 0. Because of the
second 1, you compute 2 into P plus P, which is 3P,
but since the last bit is 0, you only do a doubling
operation; you do not need the addition. So, in this
case, it is 3P into 2, which is 6P. That is the MSB
first approach of obtaining the scalar multiplication.
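As a rough illustration, the MSB first double and add loop described above can be sketched in Python. This is only a sketch under stated assumptions: the function name `double_and_add_msb` and the callbacks `add` and `double` are hypothetical names chosen here, and plain integers stand in for curve points (P is modelled as the integer 1, so that kP is simply k).

```python
def double_and_add_msb(k, P, add, double):
    """Compute kP by scanning the bits of k from MSB to LSB (requires k >= 1)."""
    bits = bin(k)[2:]        # binary string of k; bits[0] is the leading 1
    Q = P                    # the leading 1 is handled by starting from Q = P
    for bit in bits[1:]:     # bits k_(m-2) down to k_0
        Q = double(Q)        # always double
        if bit == "1":
            Q = add(Q, P)    # add P only when the bit is 1
    return Q


# Stand-in group for illustration only: integers under addition, P modelled as 1.
def int_add(a, b):
    return a + b


def int_double(a):
    return 2 * a


print(double_and_add_msb(7, 1, int_add, int_double))  # prints 7, standing for 7P
print(double_and_add_msb(6, 1, int_add, int_double))  # prints 6, standing for 6P
```

With real curve arithmetic one would pass point addition and point doubling routines in place of the integer stand-ins; the control flow stays the same.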
Similar to this, you can also have an LSB first
algorithm, where you start from k_0 and go towards
the MSB. Again, k_(m-1) is 1 and the other k_i are
the corresponding binary values. Now, I want to
compute again Q equal to kP. For this, instead of
having one register, we choose two registers; call
one of them Q and the other one R. We initialize Q
to 0, that is, the point at infinity, and R to the
value P. Now, for i equal to 0 to m minus 1, if k_i
is 1, then we add onto the register Q, that is, Q is
equal to Q plus R, and in the other register we
always do a doubling operation, R equal to 2R.
So, here, you see the difference from the previous
algorithm. In the previous algorithm you were doing a
doubling and then an addition on the same register;
that means there are two steps which you cannot
parallelize; they are sequential. But in this case,
because you are doing the operations on two different
registers, you are giving more space and saving time.
We can parallelize these two operations and do the
accumulation and the doubling in two different
registers simultaneously. You can do Q equal to Q
plus R, and in the other register you can do R equal
to 2R; and this you repeat for all the bits.
So, the accumulation and the doubling are stored in
separate registers. On average, the time required is
that of m by 2 point additions and m by 2 point
doublings. Why so? If we assume that half of the bits
of k are 1, then for half of the iterations, that is
m by 2 times, you need an addition operation. A
doubling runs in parallel with it, but addition
generally takes more time than doubling, and with two
parallel steps you have to wait for the longer one.
So, those m by 2 iterations cost m by 2 point
additions. In the remaining m by 2 iterations you do
not need an addition, only a doubling, which costs m
by 2 point doublings. That is why the total time is m
by 2 point additions plus m by 2 point doublings, and
not m point doublings on top of the additions. So,
you see that there is some advantage, because you
can parallelize the algorithm in this way. This is
the LSB first approach.
So, this can also be seen with the same 7P and 6P
examples. For 7, the encoding is 1 1 1; Q is equal to
0 and R is equal to P. In three time steps you can do
Q equal to Q plus R and, in parallel, R equal to 2R.
You are doing LSB first, so you start with the
rightmost bit, which is a 1: you do an addition and a
doubling, so Q is P and R is 2P. Next, you again get
a 1: you again do an addition and a doubling, so Q is
3P and R is 4P. Next, you again see a 1: you do an
addition and a doubling, so Q is 7P. The result is in
the register Q.
Similarly, for 1 1 0, the first bit is 0. Therefore,
you do not do the addition; Q is unaltered and R is
doubled, so R becomes 2P. Next, you get a 1.
Therefore, you add the current value of Q, which is
0, and the current value of R, which is 2P, and Q
becomes 2P. You double R and it becomes 4P. Next, you
add 2P and 4P, because the bit is a 1, and you double
R; that doubling is inconsequential, because the
final result, 6P, is stored in Q. So, you see that
you can still compute 7P and 6P, but in this case you
are parsing from the right and going to the left.
There are small implications of whether you go from
the left to the right or from the right to the left,
which you can use to your advantage depending upon
the platform you are implementing on and your
constraints.
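The LSB first procedure with the two registers Q and R can be sketched in the same way. Again this is only an illustrative sketch: `add_and_double_lsb`, `add`, `double` and `zero` are hypothetical names, and integers stand in for curve points (P is modelled as 1, and the point at infinity as 0).

```python
def add_and_double_lsb(k, P, add, double, zero):
    """Compute kP by scanning the bits of k from LSB to MSB (k >= 0)."""
    Q, R = zero, P           # Q accumulates the result, R holds P, 2P, 4P, ...
    while k > 0:
        if k & 1:
            Q = add(Q, R)    # accumulate; uses R before it is doubled
        R = double(R)        # independent of Q, so it could run in parallel
        k >>= 1
    return Q


# Stand-in group for illustration only: integers under addition, P modelled as 1.
def int_add(a, b):
    return a + b


def int_double(a):
    return 2 * a


print(add_and_double_lsb(7, 1, int_add, int_double, 0))  # prints 7, standing for 7P
print(add_and_double_lsb(6, 1, int_add, int_double, 0))  # prints 6, standing for 6P
```

In software the two updates still run one after the other; the point of the sketch is only that neither update reads the other's output within an iteration, which is what a hardware design can exploit.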
So, one example here shows the computation of 31P
with the MSB first approach and with the LSB first
approach. You see that there is a significant
reduction in the number of time steps required by the
LSB first approach. But the cost that you pay is two
registers. Therefore, we can say that there is more
scope for parallelization when you take an LSB first
approach.
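To get a feel for the 31P comparison, one can count time steps under a deliberately simplified model. The model is an assumption made here, not something taken from the lecture slide: every point addition and every point doubling takes one time step, MSB first performs them one after another, and LSB first overlaps the addition and the doubling of each iteration completely. The function names are hypothetical.

```python
def msb_first_steps(k):
    """Sequential steps for MSB first: every doubling and every addition counts."""
    bits = bin(k)[2:]
    doublings = len(bits) - 1            # one doubling per bit after the leading 1
    additions = bits[1:].count("1")      # one addition per 1 after the leading 1
    return doublings + additions


def lsb_first_steps(k):
    """Steps for LSB first when addition and doubling overlap in each iteration."""
    return k.bit_length()                # one step per bit of k


print(msb_first_steps(31), lsb_first_steps(31))  # prints: 8 5
```

Since 31 is 1 1 1 1 1, every iteration of the MSB first loop needs both a doubling and an addition, which is the worst case for the sequential version. As noted earlier, an addition usually takes longer than a doubling, so the unit-time model only indicates the trend.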
So, that was a small point about scalar
multiplication. Now, let us go into the
implementation of the elliptic curve point addition
and point doubling operations. This is a
recapitulation of the equations that we got in the
first class. This is the addition of P and Q when P
and Q are not the same, that is, the equation for
adding (x1, y1) and (x2, y2), and if the points P and
Q are the same, then this is the corresponding
doubling equation. Similarly, the corresponding y
coordinates are shown here. Now, the question is, how
do we do these operations? Another point to be noted
here is that if P is equal to (x1, y1), then minus P
is (x1, x1 + y1).
How do you get this minus P? You take the curve, take
any point (x1, y1) and draw a vertical line through
it. Wherever that line intersects the elliptic curve
again, that is the negative point. So, you take the
vertical line x equal to x1, solve it simultaneously
with the curve equation, and you can easily show that
minus P is equal to (x1, x1 + y1). I am not going
into the derivation; you can take it as an exercise.
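The negation rule can be checked numerically on a toy example. The sketch below assumes the curve form y^2 + xy = x^3 + a x^2 + b used later in this lecture; the field GF(2^4), the reduction polynomial x^4 + x + 1 and the coefficients a = b = 1 are arbitrary choices made here for illustration, and all function names are hypothetical.

```python
M = 4             # toy field GF(2^4); practical curves use a much larger m
POLY = 0b10011    # x^4 + x + 1, irreducible over GF(2)
A, B = 1, 1       # hypothetical curve coefficients (b must be nonzero)


def gf_mul(a, b):
    """Multiply two GF(2^M) elements stored as integers (bit i = coefficient of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:        # degree reached M: reduce modulo POLY
            a ^= POLY
    return r


def on_curve(x, y):
    """Check y^2 + xy = x^3 + a x^2 + b; addition in GF(2^M) is XOR."""
    xx = gf_mul(x, x)
    lhs = gf_mul(y, y) ^ gf_mul(x, y)
    rhs = gf_mul(xx, x) ^ gf_mul(A, xx) ^ B
    return lhs == rhs


def negate(P):
    """Return -P = (x1, x1 + y1)."""
    x, y = P
    return (x, x ^ y)


# Brute-force list of affine points on the toy curve.
points = [(x, y) for x in range(2 ** M) for y in range(2 ** M) if on_curve(x, y)]
print(all(on_curve(*negate(P)) for P in points))
```

The check works because (x + y)^2 + x(x + y) expands to y^2 + xy in characteristic 2, so the left-hand side of the curve equation is unchanged.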
So, now, I want to add two points P and Q and store
the result in the register R. Note that point
addition and point doubling each require one
inversion and two multiplications. If you look at the
addition, you need 1 by (x1 + x2); that is one
inversion. You need to multiply that with (y1 + y2);
that is one multiplication. For the y coordinate, you
also need to multiply with (x1 + x3); that is one
more multiplication. Note that multiplications with
constants, squarings and additions are neglected,
because we believe that they can be done without
general multipliers.
If you look at the corresponding doubling equation,
there also you need 1 by x1; that is, one inversion
is necessary, and the same 1 by x1 can be reused
wherever it appears. You need to multiply it with y1;
that is one multiplication. The other one is the
multiplication with x3. So, that means you need two
multiplications and one inversion, both for point
addition and for point doubling. We neglect the costs
of squaring and addition, and we also neglect the
cost of multiplying with constants; for example, here
you have multiplied with b, and that is neglected. If
b is a previously known, fixed value for the curve,
then you can develop an architecture for that step
which is devoid of a general multiplier.
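The operation counts can be read off a small sketch of the affine formulas. The formulas used are the standard ones for the curve y^2 + xy = x^3 + a x^2 + b over GF(2^m); the toy field GF(2^4), the coefficients a = b = 1 and all function names are assumptions made here for illustration, and the special cases (point at infinity, P equal to minus Q, x1 equal to 0) are deliberately left out.

```python
M = 4             # toy field GF(2^4); practical curves use a much larger m
POLY = 0b10011    # x^4 + x + 1, irreducible over GF(2)
A, B = 1, 1       # hypothetical curve coefficients


def gf_mul(a, b):
    """Multiply two GF(2^M) elements stored as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r


def gf_inv(a):
    """Invert a nonzero element as a^(2^M - 2); slow, but fine for a toy field."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^M)")
    r = 1
    for _ in range(2 ** M - 2):
        r = gf_mul(r, a)
    return r


def on_curve(x, y):
    xx = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(xx, x) ^ gf_mul(A, xx) ^ B)


def point_add(P, Q):
    """P + Q for x1 != x2: one inversion, two general multiplications."""
    (x1, y1), (x2, y2) = P, Q
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))     # inversion + multiplication 1
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A  # squaring and additions only
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1        # multiplication 2
    return (x3, y3)


def point_double(P):
    """2P for x1 != 0: one inversion, two general multiplications."""
    x1, y1 = P
    inv = gf_inv(x1)                            # inversion
    xx = gf_mul(x1, x1)                         # squaring
    x3 = xx ^ gf_mul(B, gf_mul(inv, inv))       # b / x1^2: squaring, constant multiply
    lam = x1 ^ gf_mul(y1, inv)                  # multiplication 1
    y3 = xx ^ gf_mul(lam, x3) ^ x3              # multiplication 2
    return (x3, y3)


print(point_add((0, 1), (1, 6)))   # prints (1, 7)
print(point_double((1, 6)))        # prints (0, 1)
```

In this software sketch squarings and constant multiplications go through the same `gf_mul` routine; the comments mark which calls would count as general multiplications under the lecture's convention.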
Now, the first interesting thing that Montgomery
noticed is that the x coordinate of 2P does not
depend on the y coordinate of P. If you look at the
case P equal to Q, that is, the doubling operation,
the x coordinate of the output does not depend upon
y1; it is only a function of x1.
Based upon this very important observation,
Montgomery developed a faster method to perform the
scalar multiplication. It is based upon an invariant
property. There are two points P1 and P2, and I want
to compute kP. I generate P1 and P2 at each
iteration such that the difference of P2 and P1 is
always maintained to be P. So, I first encode k in
binary as before, and I set P1 as P and P2 as 2P, so
that P2 minus P1 is P. Then, we vary i from l minus 2
down to 0, where l is the bit length of k. So, what
is this? This is the MSB first approach.
Then, if k_i is equal to 1, in P1 you are adding,
that is, P1 becomes P1 plus P2, and in P2 you are
doubling, that is, P2 becomes 2P2. But if k_i is 0,
you are adding in the P2 register, that is, P2
becomes P1 plus P2, while you are doubling in the P1
register, that is, P1 becomes 2P1. So, whether k_i is
1 or 0, P2 minus P1 is always invariant. When k_i is
1, the new P2 minus the new P1 is 2P2 minus (P1 plus
P2), which is P2 minus P1. Similarly, when k_i is 0,
it is (P2 plus P1) minus 2P1, which is again P2 minus
P1. Therefore, P2 minus P1 is an invariant of this
algorithm.
So, we are doing the addition and doubling in this
fashion, and the idea is that P1 finally stores the
value of kP. What are the implications of this
algorithm? That is, how can we implement these
operations efficiently?
So, first of all, let us see that it indeed computes
what we want; that is, the correctness of the
algorithm. I want to compute 7P, which is 1 1 1. I
choose P1 as P and P2 as 2P, as we have seen
previously. We start with the bit after the leading
1. It is a 1, so in P1 you are adding P with 2P and
getting 3P, and P2 is doubled to 4P. Next, you again
get a 1. So, in P1 you are adding 3P plus 4P, which
is 7P, and P2 is doubled; that doubling is
inconsequential in this case. For 6P, it is 1 1 0.
For the first 1, you again get 3P and 4P as
previously, but the next bit is 0. So, you are adding
in P2, and P2 is 3P plus 4P, which is 7P, while P1 is
the double of P1, that is 6P.
So, again you return P1 as the result. You see that
the correctness of the algorithm is maintained;
indeed P1 stores the value of k multiplied by P. Now,
the important question is, why is this efficient?
Why do we do it in this fashion?
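The ladder and its invariant can be sketched as follows. As before, this is only an illustration: `montgomery_ladder` and its callbacks are hypothetical names, integers stand in for curve points (P is modelled as 1), and the `sub` callback exists only so that the sketch can assert the invariant P2 minus P1 equal to P after every step.

```python
def montgomery_ladder(k, P, add, double, sub):
    """Compute kP with the Montgomery ladder, MSB first (requires k >= 1)."""
    P1, P2 = P, double(P)                  # invariant: P2 - P1 == P
    for bit in bin(k)[3:]:                 # skip the '0b' prefix and the leading 1
        if bit == "1":
            P1, P2 = add(P1, P2), double(P2)
        else:
            P1, P2 = double(P1), add(P1, P2)
        assert sub(P2, P1) == P            # the invariant holds after every step
    return P1


# Stand-in group for illustration only: integers under addition, P modelled as 1.
def int_add(a, b):
    return a + b


def int_double(a):
    return 2 * a


def int_sub(a, b):
    return a - b


print(montgomery_ladder(7, 1, int_add, int_double, int_sub))  # prints 7, standing for 7P
print(montgomery_ladder(6, 1, int_add, int_double, int_sub))  # prints 6, standing for 6P
```

Note that every iteration performs exactly one addition and one doubling, whichever value the bit has; only the registers that receive the results change.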
So, that brings us to the topic of fast
multiplication on elliptic curves without
pre-computation; that is, I do not have any
pre-computed values, and still I am accelerating the
computation of lambda P or kP.
For that, there are certain results which are
important, which I will try to discuss here. First of
all, let us observe the corresponding sums.
I am just writing down the results. If I add (x1, y1)
and (x2, y2), the x coordinate of the sum is
x3 = ((y1 + y2)/(x1 + x2))^2 + (y1 + y2)/(x1 + x2)
     + x1 + x2 + a,
and the corresponding y coordinate is
y3 = ((y1 + y2)/(x1 + x2)) (x1 + x3) + x3 + y1.
So, at this point, let us just concentrate on the
first term, that is, the x coordinate of the result.
I want to add the point (x1, y1) and the point
(x2, y2). Let us also make a restriction at this
point: let us only be concerned with elliptic curves
over fields of characteristic 2; that is, the
elements are chosen from GF(2^m) for some m.
So, now, if I want to obtain the sum of (x1, y1) and
(x2, y2), I can simplify this by putting everything
over the denominator (x1 + x2)^2. The first term of
the numerator is (y1 + y2)^2, which is y1^2 + y2^2,
because 2 y1 y2 is 0 in characteristic 2. Then you
have (x1 + x2)(y1 + y2), which becomes
x1 y1 + x1 y2 + x2 y1 + x2 y2. Then (x1 + x2) times
(x1 + x2)^2 is (x1 + x2)^3, which is
x1^3 + x2^3 + x1^2 x2 + x2^2 x1. Finally, a times
(x1 + x2)^2 is a x1^2 + a x2^2. Now, note that
(x1, y1) and (x2, y2) are points on the elliptic
curve, so you know that
y1^2 + x1 y1 = x1^3 + a x1^2 + b.
Similarly,
y2^2 + x2 y2 = x2^3 + a x2^2 + b.
So, we can regroup the numerator. Adding b twice
changes nothing in characteristic 2, so the numerator
can be written as
x1 y2 + x2 y1 + x1^2 x2 + x2^2 x1
  + (y1^2 + x1 y1 + x1^3 + a x1^2 + b)
  + (y2^2 + x2 y2 + x2^3 + a x2^2 + b).
The two bracketed terms go to 0, because the points
are on the curve and we are in a characteristic 2
field. So, you have (x1 + x2)^2 in the denominator
and x1 y2 + x2 y1 + x1^2 x2 + x2^2 x1 in the
numerator. That is the first result: if P1 = (x1, y1)
and P2 = (x2, y2) are points on the elliptic curve,
then the x coordinate of P1 plus P2 can be computed
as
x3 = (x1 y2 + x2 y1 + x1^2 x2 + x2^2 x1) / (x1 + x2)^2.
So, remember that the field has characteristic 2 and
that P1 and P2 are points on the curve. These are the
two things that we have used in this result.
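The first result can be checked exhaustively on a toy curve by comparing it with the chord formula for x3. The field GF(2^4), the polynomial x^4 + x + 1, the coefficients a = b = 1 and all function names below are assumptions made here purely for illustration.

```python
M = 4             # toy field GF(2^4); practical curves use a much larger m
POLY = 0b10011    # x^4 + x + 1, irreducible over GF(2)
A, B = 1, 1       # hypothetical curve coefficients


def gf_mul(a, b):
    """Multiply two GF(2^M) elements stored as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r


def gf_inv(a):
    """Invert a nonzero element as a^(2^M - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^M)")
    r = 1
    for _ in range(2 ** M - 2):
        r = gf_mul(r, a)
    return r


def on_curve(x, y):
    xx = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(xx, x) ^ gf_mul(A, xx) ^ B)


def x3_chord(P1, P2):
    """x3 = lam^2 + lam + x1 + x2 + a with lam = (y1 + y2) / (x1 + x2)."""
    (x1, y1), (x2, y2) = P1, P2
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    return gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A


def x3_result1(P1, P2):
    """x3 = (x1 y2 + x2 y1 + x1^2 x2 + x2^2 x1) / (x1 + x2)^2."""
    (x1, y1), (x2, y2) = P1, P2
    num = (gf_mul(x1, y2) ^ gf_mul(x2, y1)
           ^ gf_mul(gf_mul(x1, x1), x2) ^ gf_mul(gf_mul(x2, x2), x1))
    den = gf_mul(x1 ^ x2, x1 ^ x2)
    return gf_mul(num, gf_inv(den))


points = [(x, y) for x in range(2 ** M) for y in range(2 ** M) if on_curve(x, y)]
pairs = [(P1, P2) for P1 in points for P2 in points if P1[0] != P2[0]]
print(all(x3_chord(P1, P2) == x3_result1(P1, P2) for P1, P2 in pairs))
```

If either point is taken off the curve the two expressions generally disagree, which reflects the fact that the derivation uses the curve equation for both points.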
So, now, in Montgomery's algorithm we know that P was
maintained to be equal to P2 minus P1; that was the
invariant. Let P be (x, y). P2 is the point (x2, y2),
and minus P1 is (x1, x1 + y1). Therefore, the sum of
these two points, (x2, y2) and (x1, x1 + y1), is
nothing but (x, y). So, using the previous result
with y1 replaced by x1 + y1, you can write
x = (x1 y2 + x2 (x1 + y1) + x1 x2^2 + x2 x1^2)
    / (x1 + x2)^2.
Similarly, if you want the sum of P1 and P2, call it
(x3, y3), then from what we previously got,
x3 = (x1 y2 + x2 y1 + x1^2 x2 + x2^2 x1) / (x1 + x2)^2.
You can add these two equations. You get x + x3 with
(x1 + x2)^2 in the denominator, and in the numerator
x1 y2, x2 y1, x1^2 x2 and x2^2 x1 all cancel. What
remains is only x1 x2.
So, you can rearrange the terms and express x3 as
x3 = x + x1 x2 / (x1 + x2)^2.
Now, how many multiplications do you need for this?
There is another, equivalent way of writing it:
x3 = x + (x1/(x1 + x2))^2 + x1/(x1 + x2).
This is simple to check. If you put both terms over
(x1 + x2)^2, the first gives x1^2 and the second
gives x1 (x1 + x2), that is x1^2 + x1 x2. The two
x1^2 terms cancel, and you are left with x1 x2.
Now, what is the advantage of doing it in this
fashion? How many inversions do you need? You need
one inversion, because you need to compute 1 by
(x1 + x2). And how many multiplications? You multiply
that inverse with x1; that is one multiplication.
Then, you need a squaring. So, in total, one
inversion, one multiplication and one squaring. But
in the earlier form, x1 x2 / (x1 + x2)^2, you have to
multiply x1 with x2 and then multiply with the
inverse, so you need two multiplications. It is quite
interesting: it is the same thing, but if you rewrite
and expand a little bit, you save on the number of
multiplication operations.
And that is your result 2; that is, if P is equal to
P2 minus P1, then the x coordinate of P1 plus P2, x3,
can be computed purely in terms of x coordinates as
above. So, whenever you are computing P1 plus P2, you
are not bothered about the y coordinate; that means
you can do the operation entirely devoid of y
coordinates. Previously, Montgomery had already
noticed that the doubling was devoid of the y
coordinate. What is used here in addition is that if
you maintain the invariant P equal to P2 minus P1,
then the sum P1 plus P2 is also made devoid of the y
coordinate; that means you are not operating on the
two y coordinates at all, only on the x coordinates.
So, you need not care about the y coordinate as long
as you maintain this invariant, and that gives you a
gain in efficiency.
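Result 2 can be checked on the same kind of toy curve: for every pair of points with different x coordinates, the x coordinate of P1 plus P2 obtained from x, x1 and x2 alone should agree with the chord formula, where x is the x coordinate of P2 minus P1. The field, the coefficients a = b = 1 and the function names are assumptions made here for illustration.

```python
M = 4             # toy field GF(2^4); practical curves use a much larger m
POLY = 0b10011    # x^4 + x + 1, irreducible over GF(2)
A, B = 1, 1       # hypothetical curve coefficients


def gf_mul(a, b):
    """Multiply two GF(2^M) elements stored as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r


def gf_inv(a):
    """Invert a nonzero element as a^(2^M - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^M)")
    r = 1
    for _ in range(2 ** M - 2):
        r = gf_mul(r, a)
    return r


def on_curve(x, y):
    xx = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(xx, x) ^ gf_mul(A, xx) ^ B)


def x_chord(P1, P2):
    """x coordinate of P1 + P2 from the full affine chord formula (x1 != x2)."""
    (x1, y1), (x2, y2) = P1, P2
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    return gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A


def x_of_sum(x, x1, x2):
    """Result 2: x3 = x + t^2 + t with t = x1 / (x1 + x2); no y coordinates used."""
    t = gf_mul(x1, gf_inv(x1 ^ x2))   # one inversion, one multiplication
    return x ^ gf_mul(t, t) ^ t       # one squaring, two additions


points = [(x, y) for x in range(2 ** M) for y in range(2 ** M) if on_curve(x, y)]
ok = True
for P1 in points:
    for P2 in points:
        if P1[0] == P2[0]:
            continue                               # P1 = +/- P2 is excluded
        minus_P1 = (P1[0], P1[0] ^ P1[1])
        x = x_chord(P2, minus_P1)                  # x coordinate of P = P2 - P1
        ok = ok and x_of_sum(x, P1[0], P2[0]) == x_chord(P1, P2)
print(ok)
```

Only x, x1 and x2 enter `x_of_sum`, which is exactly why the ladder can drop the y coordinates until the very end.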
Then, you also need to obtain the y coordinate
finally. For that, this result is used: if P is
(x, y), P1 is (x1, y1) and P2 is (x2, y2) are
elliptic curve points, P2 minus P1 is again equal to
P, and x is not 0, then the y coordinate of P1 can be
expressed in terms of P and the x coordinates of P1
and P2 as follows.
Here, you write the invariant in a different way. You
know that your invariant is P equal to P2 minus P1.
You can rewrite this as P2 being equal to P plus P1,
where P is (x, y), P1 is (x1, y1) and P2 is (x2, y2).
So, that means you can use the first result to
express x2 as the x coordinate of P plus P1:
x2 = (x1 y + x y1 + x1 x^2 + x x1^2) / (x1 + x)^2.
If I target the term x y1, I can express it as
x y1 = x2 (x1 + x)^2 + x1 y + x1 x^2 + x x1^2
     = x2 (x1^2 + x^2) + x1 y + x1 x^2 + x x1^2.
Now, this can be written in terms of two factors.
Take (x1 + x) times the bracket
(x1 + x)(x2 + x) + x^2 + y.
Inside the bracket, (x1 + x)(x2 + x) is
x1 x2 + x1 x + x x2 + x^2, and the two x^2 terms
cancel, so the bracket is x1 x2 + x1 x + x x2 + y.
Multiplying by x1 gives
x1^2 x2 + x1^2 x + x x1 x2 + x1 y, and multiplying by
x gives x x1 x2 + x^2 x1 + x^2 x2 + x y. The two
x x1 x2 terms cancel, and what is left is exactly the
expression for x y1 plus one extra term, x y. So, you
can write
x y1 = (x1 + x) [(x1 + x)(x2 + x) + x^2 + y] + x y.
So, now you can obtain y1 from here. You just divide
by x, which is why x must not be 0:
y1 = (x1 + x) [(x1 + x)(x2 + x) + x^2 + y] / x + y,
where the last term comes from x y by x, which is y.
So, throughout Montgomery's algorithm you are only
bothered about the x coordinates; finally, at the end,
when you need the y coordinate of the output also,
you can apply this equation to get the y coordinate.
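The y recovery formula can also be checked on a toy curve: pick P and P1 with different x coordinates and x not 0, form P2 as P plus P1 with the ordinary affine formulas, and confirm that y1 is recovered from (x, y), x1 and x2 alone. The field, the coefficients a = b = 1 and the function names are again assumptions made here for illustration.

```python
M = 4             # toy field GF(2^4); practical curves use a much larger m
POLY = 0b10011    # x^4 + x + 1, irreducible over GF(2)
A, B = 1, 1       # hypothetical curve coefficients


def gf_mul(a, b):
    """Multiply two GF(2^M) elements stored as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r


def gf_inv(a):
    """Invert a nonzero element as a^(2^M - 2)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^M)")
    r = 1
    for _ in range(2 ** M - 2):
        r = gf_mul(r, a)
    return r


def on_curve(x, y):
    xx = gf_mul(x, x)
    return (gf_mul(y, y) ^ gf_mul(x, y)) == (gf_mul(xx, x) ^ gf_mul(A, xx) ^ B)


def point_add(P, Q):
    """Affine P + Q for x1 != x2."""
    (x1, y1), (x2, y2) = P, Q
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    return (x3, gf_mul(lam, x1 ^ x3) ^ x3 ^ y1)


def recover_y1(x, y, x1, x2):
    """y1 = (x1 + x) [(x1 + x)(x2 + x) + x^2 + y] / x + y, valid for x != 0."""
    d = x1 ^ x
    bracket = gf_mul(d, x2 ^ x) ^ gf_mul(x, x) ^ y
    return gf_mul(gf_mul(d, bracket), gf_inv(x)) ^ y


points = [(x, y) for x in range(2 ** M) for y in range(2 ** M) if on_curve(x, y)]
ok = True
for P in points:
    for P1 in points:
        if P[0] == 0 or P[0] == P1[0]:
            continue
        P2 = point_add(P, P1)                      # so that P2 - P1 = P
        ok = ok and recover_y1(P[0], P[1], P1[0], P2[0]) == P1[1]
print(ok)
```

For instance, with P = (1, 6) and P1 = (0, 1) on this toy curve, P2 is (1, 7), and `recover_y1(1, 6, 0, 1)` returns 1, the y coordinate of P1.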
So, if you write all of this in the form of
an algorithm, this is how it looks. We start
with x1 = x, the x coordinate of the point P.
And what does x2 = x^2 + b/x^2 indicate? It
is the x coordinate of 2P; I am not bothered
about the y coordinate. Remember the Montgomery
algorithm: P1 stores P and P2 stores 2P, and
since only the x coordinates matter in this
operation, we store just x and x^2 + b/x^2.
Now, in each iteration we calculate
t = x1/(x1 + x2). If the key bit is 1, we do
the addition in x1: the x coordinate of the
sum P1 + P2 is x + t^2 + t. And we do the
doubling in x2. When the bit is 0, we do the
doubling in x1 and the addition in x2. And do
you understand why it is x + t^2 + t?
That comes from result 2, where this operation
was derived: this quantity is your t, so the
sum is x plus t^2 + t.
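The steps described above can be sketched in
Python as follows. This is only an illustration
under stated assumptions: the toy field GF(2^4)
with reduction polynomial z^4 + z + 1, the curve
constant B = 1, and all function names are
chosen here and are not from the lecture. The
scalar k is assumed to be at least 1 and smaller
than the order of P minus 1, and x is assumed
nonzero.

```python
# Toy binary field GF(2^4), reduction polynomial z^4 + z + 1 (assumed).
M, POLY = 4, 0b10011
B = 1  # curve constant b of y^2 + xy = x^3 + a x^2 + b (assumed value)

def mul(a, b):
    """Field multiplication: shift-and-add with modular reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r

def inv(a):
    """Field inversion as a^(2^M - 2); only meaningful for a != 0."""
    r = 1
    for _ in range(2 ** M - 2):
        r = mul(r, a)
    return r

def x_double(x1):
    """x coordinate of 2*P1: x1^2 + b / x1^2 (one inversion)."""
    s = mul(x1, x1)
    return s ^ mul(B, inv(s))

def x_add(x, x1, x2):
    """x coordinate of P1 + P2 when P2 - P1 = P: x + t^2 + t."""
    t = mul(x1, inv(x1 ^ x2))  # t = x1 / (x1 + x2), one inversion
    return x ^ mul(t, t) ^ t

def montgomery_affine(k, x, y):
    """Return (x1, y1) = k*P for P = (x, y), using x coordinates only."""
    x1, x2 = x, x_double(x)          # P1 = P, P2 = 2P
    for bit in bin(k)[3:]:           # key bits after the leading 1
        if bit == "1":
            x1, x2 = x_add(x, x1, x2), x_double(x2)
        else:
            x1, x2 = x_double(x1), x_add(x, x1, x2)
    # Recover y1 from x1, x2 and the base point (one more inversion).
    r1, r2 = x1 ^ x, x2 ^ x
    y1 = mul(mul(r1, mul(r1, r2) ^ mul(x, x) ^ y), inv(x)) ^ y
    return x1, y1
```

Note that the curve constant a never appears:
the x-only doubling and addition formulas
involve only b. Each loop iteration calls inv
twice, once in x_add and once in x_double.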
So, the t^2 + t is there only to do the
addition. Finally, you need the y coordinate,
and you obtain it with the equation we just
saw. So, what is the cost? The number of
inversions, multiplications, additions and
squarings required is shown here, and you can
work it out. Take the inversions: computing
1/(x1 + x2) always costs one inversion. And
whichever branch you take, you do one more
inversion, either here or here, for the
doubling. So, that means two inversions are
necessary in every iteration.
So, there is one here and either this one or
this one: two inversions, multiplied by the
number of times you run the loop, plus 1,
because there is an additional inversion in
the final step. Similarly, you can count the
multiplications, additions and squarings. So,
you still have to do the inversion a large
number of times, and inversion is a very
costly step in finite field arithmetic.
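As a small worked count, the rule "two
inversions per loop iteration, plus one" can
be written out as below. The function name is
made up for illustration, and it assumes the
loop runs once for every key bit after the
leading 1.

```python
def affine_inversions(k):
    """Inversions used by the affine x-only algorithm for scalar k >= 1."""
    iterations = k.bit_length() - 1   # one pass per bit after the leading 1
    return 2 * iterations + 1         # two per pass, one for the final y

# Example: a 163-bit scalar needs 162 passes through the loop.
count_163 = affine_inversions(1 << 162)
```

For a 163-bit scalar this comes to 325 field
inversions, which is why reducing them matters.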
So, in order to reduce the number of
inversions, the affine coordinate system is
converted into what are called projective
coordinates. Basically, a three-dimensional
space is borrowed to reduce the number of
inversions. For example, when n is greater
than or equal to 128, each inversion costs
roughly as much as 7 multiplications in a
hardware design; so there is a lot of cost,
and it is important to reduce the number of
inversions. One such system is the Lopez-Dahab
projective coordinates, where X, Y and Z are
the three projective coordinates; the affine
x is X/Z and the affine y is Y/Z^2. The point
is written as the equivalence class
(X : Y : Z).
So, the motivation is to replace these
inversions by multiplications and then perform
one inversion at the end to get back the
affine coordinates. In projective coordinates,
the addition and the doubling are done without
any inversions, at the cost of more additions
and more multiplications. Finally, the output
has to be converted from projective coordinates
back to affine coordinates, and there we need
one more inversion step.
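A minimal sketch of this conversion, assuming
the same kind of toy field GF(2^4) with
polynomial z^4 + z + 1 (an assumption made
here, as are the function names). It shows that
every member of an equivalence class maps to
the same affine point, and that the conversion
back costs a single inversion.

```python
M, POLY = 4, 0b10011  # toy field GF(2^4), z^4 + z + 1 (assumed)

def mul(a, b):
    """Field multiplication with reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r

def inv(a):
    """Field inversion as a^(2^M - 2); only meaningful for a != 0."""
    r = 1
    for _ in range(2 ** M - 2):
        r = mul(r, a)
    return r

def to_affine(X, Y, Z):
    """Lopez-Dahab projective to affine: x = X/Z, y = Y/Z^2 (Z != 0)."""
    zi = inv(Z)                              # the single inversion
    return mul(X, zi), mul(Y, mul(zi, zi))

def scale(X, Y, Z, lam):
    """Another representative of the same class: (lam*X, lam^2*Y, lam*Z)."""
    return mul(lam, X), mul(mul(lam, lam), Y), mul(lam, Z)
```

For example, to_affine(*scale(5, 9, 1, lam))
gives back (5, 9) for any nonzero lam.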
So, for example, recall the doubling equations
in the affine Montgomery system. These are the
values shown here: you need 2 inverses, 1
general field multiplication, and 4 additions
and squarings. If you convert this into
projective coordinates using the transformation
we have just seen, you need 0 inverses. There
are no inverses at all, but you need more
multiplications, additions and squarings. So,
depending on your I to M ratio, that is, the
number of multiplications one inversion is
worth, this may come in handy and make your
implementation more efficient.
So, if you convert the entire system into
projective coordinates, this is how
Montgomery's algorithm looks. Here we set
X1 = x, Z1 = 1, X2 = x^4 + b and Z2 = x^2.
Dividing, X2/Z2 is x^2 + b/x^2. So, what are
X2 and Z2 storing? They store the doubled
point, and (X1, Z1) stores the single point.
Now, if k_i is 1, you do the addition, in
projective coordinates, in (X1, Z1) and the
doubling in (X2, Z2).
So, you see that you never work on the Y
coordinate. Why? Because Montgomery's technique
does not need the y coordinate; it needs only
the x coordinate. And when k_i is 0, you do
the addition in the (X2, Z2) register and the
doubling in the (X1, Z1) register. Finally,
when you have the result, you convert it back
into affine coordinates.
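The projective loop can be sketched as below.
Again this is an illustration, not the slide's
exact listing: the toy field GF(2^4), the value
B = 1 and the names m_add, m_double and
ladder_projective are assumptions made here.
The formulas follow from substituting x1 = X1/Z1
and x2 = X2/Z2 into x^2 + b/x^2 and x + t^2 + t.

```python
M, POLY = 4, 0b10011  # toy field GF(2^4), z^4 + z + 1 (assumed)
B = 1                 # curve constant b (assumed value)

def mul(a, b):
    """Field multiplication with reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r

def sq(a):
    """Field squaring."""
    return mul(a, a)

def inv(a):
    """Field inversion as a^(2^M - 2); used only to check results."""
    r = 1
    for _ in range(2 ** M - 2):
        r = mul(r, a)
    return r

def m_double(X, Z):
    """Doubling without inversion: X' = X^4 + b Z^4, Z' = X^2 Z^2."""
    x2, z2 = sq(X), sq(Z)
    return sq(x2) ^ mul(B, sq(z2)), mul(x2, z2)

def m_add(x, X1, Z1, X2, Z2):
    """Addition without inversion, valid when P2 - P1 = P = (x, y)."""
    a, b = mul(X1, Z2), mul(X2, Z1)
    Z3 = sq(a ^ b)                       # (X1 Z2 + X2 Z1)^2
    return mul(x, Z3) ^ mul(a, b), Z3    # X3 = x Z3 + (X1 Z2)(X2 Z1)

def ladder_projective(k, x):
    """Return (X1, Z1, X2, Z2) with X1/Z1 = x(kP), X2/Z2 = x((k+1)P)."""
    X1, Z1 = x, 1
    X2, Z2 = sq(sq(x)) ^ B, sq(x)
    for bit in bin(k)[3:]:               # key bits after the leading 1
        if bit == "1":
            (X1, Z1), (X2, Z2) = m_add(x, X1, Z1, X2, Z2), m_double(X2, Z2)
        else:
            (X1, Z1), (X2, Z2) = m_double(X1, Z1), m_add(x, X1, Z1, X2, Z2)
    return X1, Z1, X2, Z2
```

The loop itself never calls inv; a division
by Z is needed only when the result is turned
back into affine form.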
So, I am not going through this in detail,
but one inversion is necessary at that point.
It is straightforward, because it is exactly
like the last step of the previous algorithm.
So, if you look at the y1 equation, it was
y1 = (x1 + x)[(x1 + x)(x2 + x) + x^2 + y]/x + y;
that gave your y coordinate. Now, the
transformation to the projective coordinate
system is x = X/Z and y = Y/Z^2. So, to do the
conversion, you write x1 as X1/Z1, so x1 + x
becomes X1/Z1 + x, and you write x2 as X2/Z2,
so x2 + x becomes X2/Z2 + x. That gives
y1 = (X1/Z1 + x)[(X1/Z1 + x)(X2/Z2 + x)
+ x^2 + y]/x + y.
So, you can simplify this. The first factor
stays as x + X1/Z1, and the product becomes
(X1 + x Z1)(X2 + x Z2), with Z1 Z2 at the
bottom. Bringing x^2 + y over the same
denominator gives (x^2 + y) Z1 Z2, and
everything is multiplied by the inverse of
x Z1 Z2, plus the term y:
y1 = (x + X1/Z1)[(X1 + x Z1)(X2 + x Z2)
+ (x^2 + y) Z1 Z2] (x Z1 Z2)^(-1) + y.
So, you see that all the terms X1, X2, Z1 and
Z2 are in projective coordinates, and you
apply only one inversion to finally convert
the projective result back into affine
coordinates. The x coordinate is very simple:
you just divide X by the corresponding Z.
So, to obtain x3, the x coordinate of the
result, you just divide the projective X1 by
Z1; for the y coordinate you need more
operations. In all, the final transformation
takes 10 multiplications and 1 inversion.
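The final conversion can be sketched as a
single function. The name mxy, the toy field
GF(2^4) and the way the multiplications are
arranged are assumptions made here; in
particular, this sketch does not try to reach
the count of 10 multiplications, it only shows
that one inversion is enough for both x3 and
y3.

```python
M, POLY = 4, 0b10011  # toy field GF(2^4), z^4 + z + 1 (assumed)

def mul(a, b):
    """Field multiplication with reduction modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r

def inv(a):
    """Field inversion as a^(2^M - 2); only meaningful for a != 0."""
    r = 1
    for _ in range(2 ** M - 2):
        r = mul(r, a)
    return r

def mxy(x, y, X1, Z1, X2, Z2):
    """Affine (x3, y3) of P1 from x-only projective P1 and P2,
    where P2 - P1 = P = (x, y), x != 0 and Z1, Z2 != 0."""
    z1z2 = mul(Z1, Z2)
    d = inv(mul(x, z1z2))                # the only inversion: 1/(x Z1 Z2)
    z1_inv = mul(d, mul(x, Z2))          # 1/Z1, obtained by multiplication
    x3 = mul(X1, z1_inv)                 # x3 = X1 / Z1
    bracket = (mul(X1 ^ mul(x, Z1), X2 ^ mul(x, Z2))
               ^ mul(mul(x, x) ^ y, z1z2))
    y3 = mul(mul(x ^ x3, bracket), d) ^ y
    return x3, y3
```

The value 1/Z1 is recovered from the single
inverse 1/(x Z1 Z2) by multiplying with x Z2,
so no second inversion is spent on x3.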
So, if you do a neck-to-neck comparison of
affine and projective coordinates, this is
how it looks. The inversions have been
reduced, but the multiplications, additions
and squarings have of course increased.
Therefore, whether you go for affine or
projective coordinates depends on your I to M
ratio, that is, how many multiplications one
inversion is worth. That dictates which of
the two is more efficient.
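A rough way to see the trade-off is to express
both variants in multiplication-equivalents.
The model below is a simplification made here,
not the table from the slide: it ignores
additions and squarings, ignores the few
multiplications of the affine y recovery, and
takes the per-iteration multiplication counts
(2 for affine, 6 for projective) from the
formulas themselves. The 2 inversions per
affine iteration and the 10 multiplications
plus 1 inversion for the final projective
conversion are the figures quoted above.

```python
def affine_cost(iterations, ratio):
    """Cost in multiplications when one inversion is worth `ratio` of them."""
    per_iteration = 2 * ratio + 2        # 2 inversions + 2 multiplications
    return iterations * per_iteration + ratio   # plus the final inversion

def projective_cost(iterations, ratio):
    """Same measure for the projective variant."""
    per_iteration = 6                    # 4 in the addition, 2 in the doubling
    return iterations * per_iteration + 10 + ratio  # final conversion

# Example: 162 iterations (a 163-bit scalar), inversion worth 7 multiplications.
example = (affine_cost(162, 7), projective_cost(162, 7))
```

Under this model the projective variant wins
as soon as one inversion costs more than about
two multiplications; the exact break-even point
depends on the terms ignored here.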
So, I will not go into this, but there are
approaches that keep one of the points in
projective coordinates and the other in affine
coordinates; these are called mixed coordinate
systems. These are some equations we have
derived, which you can check offline. The main
point here is that the number of
multiplications is further reduced; the
squarings increase a bit, but they are cheap
in GF(2^n). There is a 10 percent improvement
if a is not equal to 0, and a 12 percent
improvement if a is 0. So, mixed coordinate
systems are at times more efficient than even
projective coordinate systems.
So, I will conclude with some comments on
parallelizing strategies for point doubling
and point addition. I am considering projective
coordinates here. In point doubling, these are
the operations you did. Think of (X1, Z1): you
take X1 and square it to get t; the new Z is
t times Z1 squared; and finally the new X is
t squared plus m squared. That is your doubled
point. You can check that this is exactly the
same as what we did previously.
So, how many multiplications are necessary
here? One. The multiplication by a constant I
am not counting as a multiplication. So, if I
have one multiplier, that is sufficient for
the doubling. But what about the addition?
There are more steps. Here there is one
multiplication, and here is another; and again
a multiplication here and another here. So,
if I have 2 multipliers, I can parallelize the
first two and also the last two. Therefore,
depending on my resource constraint, I can
parallelize the point addition as well.
And if you look at the structure of
Montgomery's algorithm, I can parallelize it
at various levels. I can parallelize at the
level of kP, that is, run the addition and the
doubling of one iteration together, or I can
parallelize inside the doubling and the
addition themselves. That is, I can parallelize
either level 1 or level 2.
So, here are some comments. Suppose we
allocate one multiplier to each of Madd and
Mdouble; then we can parallelize steps 5a and
5b. We have two multipliers, and we are giving
one to the addition and one to the doubling.
So, here you do the addition and the doubling
in parallel: with two multipliers, one goes
to each, and both run concurrently. The cycle
length is then that of the addition, because
the addition takes longer. And with one
multiplier instead of two inside the addition,
you need two time steps for the first pair:
first you do this one, then that one; you
cannot do the two in parallel. Similarly for
the second pair. So, the clock cycles are 1,
2, 3 and 4: the addition needs 4 clock cycles.
Therefore, every iteration takes 4 clock
cycles, and if l is the number of times you
iterate the loop, you need nearly 4l clock
cycles.
But suppose you parallelize inside Madd and
Mdouble instead. Then you cannot parallelize
level 1, because you again have a constraint
of two multipliers. Parallelizing inside Madd
means giving both multipliers to it, so you
cannot do the addition and the doubling
simultaneously.
You have a constraint of two multipliers; you
do not have three. With three multipliers you
could parallelize inside the addition and also
perform the doubling and the addition in
parallel. With two, the addition and the
doubling go in sequence. The addition needs
two clock cycles, because both pairs of
multiplications are parallelized, and the
doubling needs one more clock cycle. Since
they are done in sequence, one iteration takes
three clock cycles, and the entire operation
takes 3l clock cycles.
Suppose you have three multipliers, that is,
more resources; then you can do this in less
time, in 2l clock cycles. So, you see that
Montgomery's algorithm is highly
parallelizable. Depending on your constraints
and your requirements of power and throughput
(there is a spelling mistake on the slide
here; it should read throughput), you can make
high performance designs for scalar
multiplication.
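The three cycle counts can be reproduced with
a small scheduling sketch. The task lists are
an assumption made here: they take the four
multiplications of the projective addition and
the two of the doubling (including the one by
a constant) from the formulas X3 = x Z3 +
(X1 Z2)(X2 Z1) and X' = X^4 + b Z^4, Z' =
X^2 Z^2, and they treat every multiplication
as one clock cycle. Counting the doubling as a
single multiplication gives the same totals.

```python
# Each multiplication maps to the multiplications it has to wait for.
MADD = {
    "X1*Z2": [],
    "X2*Z1": [],
    "(X1*Z2)*(X2*Z1)": ["X1*Z2", "X2*Z1"],
    "x*Z3": ["X1*Z2", "X2*Z1"],
}
MDOUBLE = {"c*Z^2": [], "X^2*Z^2": []}

def cycles(tasks, multipliers):
    """Greedy schedule: each cycle starts up to `multipliers` ready tasks."""
    done, count = set(), 0
    while len(done) < len(tasks):
        ready = [t for t in tasks
                 if t not in done and all(d in done for d in tasks[t])]
        done.update(ready[:multipliers])
        count += 1
    return count

# Level 1: one multiplier each, addition and doubling side by side.
one_each = max(cycles(MADD, 1), cycles(MDOUBLE, 1))
# Level 2: both multipliers inside each operation, operations in sequence.
two_inside = cycles(MADD, 2) + cycles(MDOUBLE, 2)
# Three multipliers shared by both operations.
three_shared = cycles({**MADD, **MDOUBLE}, 3)
```

Per iteration this gives 4, 3 and 2 clock
cycles respectively, matching the 4l, 3l and
2l totals for l iterations.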
So, there is a lot to think about here. One
exercise: the Koblitz curves standardized by
NIST are of the form y^2 + xy = x^3 + a x^2
+ 1 over a binary field. Work out the addition
and doubling equations for points on these
curves, and again compute the number of
multiplications and inversions to be performed
in the affine, projective and mixed coordinates
we have seen. So, you can rework this
specifically for these curves, where the
constant term is equal to 1.
So, here are some of the references. Among
the classic papers I have followed is the one
by Lopez and Dahab, published in CHES 1999,
and some other important references are
mentioned here. The books are the standard
ones. So, next time we will take up the topic
of secret sharing schemes.
