Homomorphic encryption, or HE, is one of the technologies that we at OpenMined use
to help answer questions from data you cannot see.
In this video, we will explore what HE is, what its pros and cons are,
and we'll introduce the theory behind why it works. So, let's begin.
Homomorphic encryption (HE) is a form of encryption that allows computation on encrypted data.
In short, HE ensures that performing operations on encrypted data
and decrypting the result is equivalent to performing analogous operations
without any encryption.
This technology has numerous applications that range from healthcare to smart electric grids,
from education to machine learning as a service:
all sectors where privacy is crucial
and where making use of the data is usually very complex
due to regulations or just the significance of the data.
So, clearly the main advantage of HE is that it enables
model owners to perform inference and analysis
on encrypted data, keeping the results encrypted as well,
so that, for example, in an MLaaS scenario
the owner of the data doesn't need to fully trust the service that they use.
In addition, unlike related technologies such as secure multiparty computation,
HE does not require interactivity,
so again in an MLaaS scenario
data owners can send their encrypted data to model owners
and simply wait to receive the results.
Meanwhile, the disadvantages of homomorphic encryption are primarily its inefficiency
and the overhead that the technique adds compared with performing
the same operations on non-encrypted data.
Alright, so now that we have a clearer idea of what HE is,
we might want to apply it to our own data,
and that's where PySyft, one of OpenMined's libraries, comes in handy,
providing an implementation of the Paillier scheme,
which is limited to addition, and soon of the CKKS encryption scheme,
which supports more complex operations.
Using these schemes is relatively straightforward.
For Paillier, we start by importing just syft and torch;
then we "hook" torch to add to the objects
of this library, like tensors, the functionality that we need from syft.
After these steps, we generate the public key that we'll use for encryption
and the private key that we'll use for decryption.
At this point, we can initialize and encrypt two tensors
x and y and sum them together.
This scheme doesn't enable other homomorphic operations
but it is much faster than more complex schemes and it has a small overhead.
The sum tensor decrypts, of course, to the correct result.
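
The additive property of Paillier can also be illustrated without PySyft. What follows is a toy, pure-Python sketch of the textbook scheme, not PySyft's API: the function names (`keygen`, `encrypt`, `decrypt`, `add`) and the tiny primes are assumptions chosen for readability and give no real security.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy Paillier key generation from two small primes (illustrative only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                  # common simple choice of generator
    mu = pow(lam, -1, n)       # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)   # (public key, private key)

def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n as g^m * r^n mod n^2 with random r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add(pub, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen()
cx, cy = encrypt(pub, 3), encrypt(pub, 4)
total = decrypt(pub, priv, add(pub, cx, cy))   # total == 7
```

Note that the only homomorphic operation on two ciphertexts here is addition, which matches the limitation of the scheme mentioned above. The modular inverse via `pow(lam, -1, n)` needs Python 3.8 or newer.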
Using CKKS will be only slightly more complex.
Besides syft and torch, we'll also have to import a Python wrapper
of the SEAL library from Microsoft, which implements the scheme in fast C++ code.
Then we'll proceed by hooking torch,
generating the keys, and initializing the encrypted tensors.
Syft, at first, will support efficient addition, subtraction, and multiplication
on CKKS tensors although the scheme is not limited in the operations it enables.
Understanding the difference between
partially homomorphic encryption schemes,
those limited to certain specific operations, and
fully homomorphic encryption schemes,
those that support arbitrary functions, is crucial. So, let's focus more
specifically on fully homomorphic
encryption
and explain what it means to "compute
arbitrary functions".
Well, if we want to compute a function on a computer, we'll have to find a way to
express it as a sequence of combinations of logic gates.
Writing out the 16 possible functions
that take two binary inputs, we can see that the AND and XOR gates,
together with the constant 1, are enough to express all of them, and
thus they form
a complete set of gates. More interestingly, AND and XOR are
respectively the operations of multiplication and addition on bits (modulo 2).
From this we can infer that if a homomorphic encryption scheme supports
sequences of additions and multiplications, then it
should be able to "compute arbitrary functions".
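
As a small sanity check of this claim, the sketch below builds NOT, OR, and NAND out of nothing but multiplication and addition modulo 2 and compares them with Python's bitwise operators. The helper names are illustrative, and the constant 1 is assumed to be available alongside AND and XOR.

```python
from itertools import product

def AND(a, b):
    return (a * b) % 2          # binary multiplication

def XOR(a, b):
    return (a + b) % 2          # binary addition

def NOT(a):
    return XOR(a, 1)            # a + 1 mod 2 flips the bit

def OR(a, b):
    return XOR(XOR(a, b), AND(a, b))   # a + b + a*b mod 2

def NAND(a, b):
    return NOT(AND(a, b))       # NAND alone is already a universal gate

# Compare against the built-in bitwise operators on every input pair.
gates_match = all(
    OR(a, b) == (a | b) and NAND(a, b) == 1 - (a & b) and NOT(a) == 1 - a
    for a, b in product((0, 1), repeat=2)
)
```

Since NAND on its own is enough to build any Boolean circuit, being able to express it with additions and multiplications is one way to see why these two operations suffice.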
The operations of addition and multiplication suggest the use of a ring as the
underlying algebraic structure.
An example of a ring is the set of
integers modulo N, let's say N = 5. We can imagine this set
projected on a circle split into 5 sectors.
This set is a ring because it has both neutral elements,
0 for addition and 1 for multiplication, and both the usual addition and multiplication operations.
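
To make the wheel picture concrete, here is a minimal sketch of the integers modulo 5 in Python. The names `add` and `mul` are illustrative, and only a few of the ring properties are checked, not all of the axioms.

```python
N = 5
elements = range(N)             # the five sectors of the wheel: 0, 1, 2, 3, 4

def add(a, b):
    return (a + b) % N          # addition wraps around the wheel

def mul(a, b):
    return (a * b) % N          # multiplication wraps around too

# Results always land back inside the set.
closed = all(add(a, b) in elements and mul(a, b) in elements
             for a in elements for b in elements)

# 0 is neutral for addition, 1 is neutral for multiplication.
identities = all(add(a, 0) == a and mul(a, 1) == a for a in elements)

# Multiplication distributes over addition.
distributive = all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
                   for a in elements for b in elements for c in elements)

wrapped_sum = add(3, 4)         # 7 wraps around to 2
wrapped_product = mul(3, 4)     # 12 wraps around to 2
```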
Besides homomorphic addition and multiplication, what are the other
characteristics common to most FHE schemes?
They tend to be based on schemes that are capable of "somewhat" homomorphic encryption.
These schemes can only perform a limited number of successive
multiplication and addition operations on ciphertext
before the results become unreliable and impossible to decrypt.
This limitation arises directly from the way these systems guarantee security
by relying on random noise to make relatively simple problems computationally intractable.
From here we'll build our own scheme
to see more practically just how hard it is to support long sequences
of additions and multiplications on secure ciphertext.
A very interesting talk from Craig Gentry, one of the most prominent researchers
in the field, will be our guide. In it, among other things,
he introduces an interesting framework to approach this form of encryption.
The Polly Cracker Framework aims to use encryptions of 0 to disguise any message.
In practice, we can use this idea to build a simple scheme that
encrypts a 1-bit message, so it can be either 0 or 1,
and that operates in the ring of integers modulo a large prime p
that should remain hidden: it is, in fact, the secret key.
Let's start with the encryption procedure. If we want to encrypt a
generic message m, we first randomly choose both a small
noise term r and a large q.
The encryption of m, Cp(m), will be simply m + 2r + qp.
So essentially we are circling the p-sized wheel q times
and then advancing by m plus an even value 2r. In the end, the parity of the point on the
wheel on which we end up represents the message.
Decryption is as easy as taking the ciphertext mod p and then
taking the result mod 2.
The mod p operation removes the qp term, and then we are left with
m + 2r, provided it is smaller than p, which is even if m = 0
and odd if m = 1. As simple as that.
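
The encryption and decryption steps just described can be sketched in a few lines of Python. This is a toy under stated assumptions: the key `P = 10007`, the noise bound, and the size of q are hypothetical values picked for illustration and are far too small to be secure.

```python
import secrets

P = 10007   # hypothetical secret key: a prime that must stay hidden

def encrypt(m, p, noise_bound=4, q_bits=32):
    """Encrypt one bit m as m + 2r + q*p with small random r and large random q."""
    assert m in (0, 1)
    r = secrets.randbelow(noise_bound)   # small noise term, so m + 2r stays below p
    q = secrets.randbits(q_bits) + 1     # large multiplier: laps around the wheel
    return m + 2 * r + q * p

def decrypt(c, p):
    """Remove the q*p term with mod p, then read the parity of m + 2r."""
    return (c % p) % 2

# Every encryption of a bit decrypts back to that bit, whatever r and q were drawn.
round_trip_ok = all(decrypt(encrypt(m, P), P) == m
                    for m in (0, 1) for _ in range(100))
```

Decryption is only correct while m + 2r is smaller than p, which is why the noise bound is kept tiny relative to the key here.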
Addition works intuitively. Let's say that we have
C(m1) and C(m2), the encryptions of two messages,
but we don't know the secret key p. We can still compute a valid encryption of
m1 + m2, albeit with more noise, by adding together the encryptions of
the two messages that we have available.
We can verify it
by retracing the encryption as if it was applied to m1+m2 directly.
Of course, multiplication works in a similar way but we can see that in that
case, the noise grows much faster.
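
A short numeric sketch can show both operations and the difference in noise growth. The key `P` and the particular values of r and q below are hypothetical, fixed by hand so the arithmetic is easy to follow; `noise` is an illustrative helper that needs the secret key and is used here only for inspection.

```python
P = 10007                      # hypothetical secret key

def decrypt(c, p=P):
    return (c % p) % 2

def noise(c, p=P):
    return c % p               # the m + 2r part, valid while it stays below p

# Two encryptions of the bit 1, written out as m + 2r + q*p.
c1 = 1 + 2 * 3 + 5 * P         # m = 1, r = 3, q = 5  -> noise 7
c2 = 1 + 2 * 2 + 7 * P         # m = 1, r = 2, q = 7  -> noise 5

c_sum = c1 + c2                # noise terms add:      7 + 5 = 12
c_prod = c1 * c2               # noise terms multiply: 7 * 5 = 35

sum_bit = decrypt(c_sum)       # 1 + 1 mod 2 = 0, i.e. XOR of the bits
prod_bit = decrypt(c_prod)     # 1 * 1 mod 2 = 1, i.e. AND of the bits
```

Adding ciphertexts adds their noise terms, while multiplying ciphertexts multiplies them, so a chain of multiplications reaches the size of p much sooner.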
Before talking about why a growing noise term is such a crucial issue
let's focus on why we need it in the first place. If we were to remove the
noise from our scheme it would still be capable of performing
homomorphic addition and multiplication.
However, to break the noise-less scheme, an attacker would just need to get hold
of two encrypted messages, say two encryptions of 0, and simply calculate their greatest common divisor.
If we add noise, the problem becomes much more difficult;
in the literature it is known as the Approximate GCD
or Approximate Common Divisor (ACD) problem, and with reasonable parameters
it is considered to be hard to solve.
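
The attack on the noise-less variant can be tried directly with `math.gcd`. The numbers below are hypothetical: a tiny key p = 29 and two hand-picked multipliers, 68 and 75, that share no common factor.

```python
import math

p = 29                                   # tiny secret key, for illustration only

# Noise-less encryptions of 0 are plain multiples of p.
clean_1 = 68 * p                         # 1972
clean_2 = 75 * p                         # 2175
recovered = math.gcd(clean_1, clean_2)   # 29: the secret key leaks immediately

# With noise 2r added (r = 5 and r = 2), the same trick no longer works.
noisy_1 = clean_1 + 2 * 5                # 1982
noisy_2 = clean_2 + 2 * 2                # 2179
failed = math.gcd(noisy_1, noisy_2)      # 1: no useful common divisor
```

If the two multipliers happened to share a factor, the GCD would be a multiple of p instead of p itself, but a few more ciphertexts would typically narrow it down.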
Approximate GCD is not the only problem used to secure fully homomorphic encryption;
in fact, numerous recent schemes employ
the Learning With Errors or LWE problem, which is also conjectured to be hard to solve.
Even schemes that rely on LWE however,
see the noise rise as successive additions are performed and even more so with multiplications.
To understand why that is a problem
we'll go back to our scheme with an example.
Let's say, to keep things simple, that we would like to add together three copies of a message
m = 0. 1982 is a valid encryption of zero using secret key p = 29, since 1982 = 0 + 2*5 + 68*29.
Three times 1982 is 5946
and, as always, 5946 by itself
doesn't tell us much about the result of
our calculation, but with p we can decrypt it as we detailed before.
We start with 5946 mod 29,
which is equal to 1, and then we check
whether the result
is even or odd. 1 is odd, so we conclude that
3 times m is equal to 1, but of course we know that is not the case, since
m is equal to 0.
So what went wrong?
The error term has gotten too big: each copy of 1982 carries an error of 10, so the sum carries 30.
Since 30 is bigger than p = 29, the modulo p operation, which usually
doesn't affect the error because the error is smaller than p,
has now changed it from even to odd, corrupting the decryption.
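
The failing example can be replayed step by step in Python. The numbers are the ones from the example; only the helper name `decrypt` is an assumption.

```python
p = 29                         # secret key from the example
c = 1982                       # an encryption of m = 0: 0 + 2*5 + 68*29

def decrypt(ciphertext, key):
    return (ciphertext % key) % 2

single_noise = c % p           # 10: even and well below p
single_bit = decrypt(c, p)     # 0: correct

tripled = 3 * c                # 5946, i.e. c + c + c
true_noise = 3 * single_noise  # 30: still even, but now larger than p
seen_noise = tripled % p       # 1: mod p has wrapped 30 around to 30 - 29
tripled_bit = decrypt(tripled, p)   # 1: wrong, since 0 + 0 + 0 = 0
```

Adding just two copies would still decrypt correctly, because the combined error of 20 remains below 29.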
So is there a way to decrease the error? Yes, bootstrapping, a technique that
involves running the decryption procedure homomorphically
without revealing the message and using an encrypted version of the secret key.
To give a sense of how this is possible
we should keep in mind that we can produce a series of additions and
multiplications that perform the decryption operation
in our scheme, modulo p followed by modulo 2.
Furthermore, the scheme we presented here encrypts binary messages,
so to get an encryption of the secret key we simply need to have an ordered set
of the encryptions of its bits.
With these two ingredients, we can perform homomorphic decryption
and eliminate the noise produced by previous operations.
Homomorphic decryption, however, introduces some noise
of its own, like any other function, but as long as we can still reliably perform
one operation of addition or of multiplication before needing
bootstrapping again, we have reached fully homomorphic encryption.
And so this is the recipe for most proposed
FHE schemes, an underlying somewhat
homomorphic encryption scheme
that supports addition and multiplication, usually secured by adding noise,
and a way to reduce the noise when it
grows too large, usually bootstrapping.
So hopefully, now you know a bit more about this great technology
and if you want to dive deeper, in the article accompanying this video
on the OpenMined blog we have gathered useful links to other learning resources
that focus on specific protocols and expand on the topics
we mentioned today. For bootstrapping in
particular
we are working on a dedicated post to elaborate on the high-level introduction
of this video. This post will be published as part of
the Privacy-Preserving Data Science explained series
along with articles on privacy by shuffling, secure multiparty computation,
and federated learning, which are already online,
and a comprehensive mini-series on differential privacy, which is in the works.
Our blog is also a great place to learn
more about other Privacy-preserving technologies
and to find practical tutorials on all of our libraries.
So check it out and share your favorite article, tutorial, or video
to let other people know about OpenMined.
Lastly a big thank you to all the people
from our community that made this video
possible
and thank you for watching!
