Okay. Let's get started, guys.
So welcome to lecture number 4.
Today we will go over two topics that are not discussed
in the Coursera videos.
You've been learning C2M1 and C2M2,
if I'm not mistaken.
So you've learned about
what an initialization is,
how to tune neural networks,
and what test, validation and train sets are.
Today, we're going to go a little further.
You should have the background to understand 80 percent of this lecture.
There is gonna be 20 percent that I want you to look back at
after you've seen the BatchNorm videos, for those of you who haven't seen them.
So we split the lecture in two parts,
and I'll put the attendance code back up
at the very end of the lecture, so don't worry.
One topic is attacking
neural networks with adversarial examples.
The second one is generative adversarial networks.
[NOISE] And although these two topics have a common word which is adversarial,
they are two separate topics.
You will understand why it's called adversarial in both cases.
So let's get started with adversarial examples.
In 2013,
Christian Szegedy and his team
published a paper called Intriguing Properties of Neural Networks.
What they noticed is that
neural networks have kind of a blind spot:
several machine learning models, including
the state-of-the-art ones that you will learn about,
VGG-16, VGG-19, Inception
networks and residual networks,
are vulnerable to something called adversarial examples.
You're going to learn what these adversarial examples are in three parts.
First, how these examples, in
the context of images, can attack a network in its blind spots
and make the network classify these images as something totally wrong.
Second, how to defend against these types of examples.
And third, why networks are vulnerable to these types of examples.
This is a little bit more theoretical,
and we're going to go over it on the board.
The papers that are listed at the bottom are the two big papers
that started this field of research.
So I would advise you to go
and read them, because we have only an hour and a half to go over two big topics
in deep learning, and
we will not have the time to go into the details of everything.
Okay. So let's set up the goal.
The goal is this: we are given a pre-trained network,
so a network trained on ImageNet, on 1,000 classes and millions of images.
We want to find an input image that is not an iguana,
so it doesn't look like the animal iguana,
but will be classified by the network as an iguana.
We will call this an adversarial example if we manage to find it.
Okay. Yeah, one question.
Ah, what was the magic code for those that came in this late?
Uh, let me- so 284889,
let me write it down on the board so that you can-
Thank you.
Can you guys see? [NOISE] Okay. Let's move on.
So we have a network pre-trained on ImageNet, and it's a very good network.
What I want is to fool it
by giving it an image that doesn't
look like an iguana but is classified as an iguana.
So if I give it a cat image to start with.
The network is obviously going to give me a vector of
probabilities that has the maximum probability for a cat,
because it's a good network.
And you can guess what the output layer of this network is:
it's probably a softmax, since it's a classification network.
Now what I want is to find
an image x that is going to be classified as an iguana by the network.
Okay. Does the
setting make sense to everyone?
Okay. Now,
this might remind you of what
we've seen together about neural style transfer.
You remember the art generation thing,
where we wanted to generate an image based on
the content of a first image and the style of another image.
And in that problem,
the main difference with classic
supervised learning was that we fixed the parameters of the network,
which was also pre-trained,
and we back-propagated the loss all the
way back to the input image to update the pixels,
so that it had the content of the content image
and the style of the style image.
The first thing we did was to rephrase the problem.
We tried
to phrase what exactly we want.
So what would you say is a sentence that defines our loss function, let's say?
Any ideas?
Okay. Complicated. Yep.
An image that provides minimum cost.
An image that provides minimum cost.
Okay. What's the cost you're talking about?
Cost of the, the difference between the expected iguana and non-expected iguana.
Expected iguana and non-expected iguana.
Wha- what do you mean exactly by that?
So if we're sort of going back in the training session,
we're trying to train it on
an image and we wanted to think that [NOISE] this is a cat and iguana.
Yeah. Okay. So you want
this image to minimize a certain loss function,
and the loss function would be a distance metric
between the output you get and the output you want.
Okay. Yeah. So I would say,
we want to find x, the image,
such that y hat of x,
which is the result of the forward propagation of x in the network is equal to y-iguana,
which is a one-hot vector with the one at the position of iguana.
Does that make sense? So now based on that we define our loss function,
which can be an L2 loss,
can be an L1 loss,
can be a cross-entropy. In practice,
this one works better.
So you see that minimizing this loss function
would lead our image x to be classified as an iguana by the network.
Does that make sense?
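As a rough illustration of the three candidate losses just mentioned, here is a minimal Python sketch. The three-class setup (cat, dog, iguana), the probability vectors and the function names are made up for illustration; the real network has 1,000 classes.

```python
import math

def one_hot(index, num_classes):
    """One-hot target vector with a 1 at the position of the target class."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def l2_loss(y_hat, y_target):
    """Half the squared Euclidean distance between prediction and target."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(y_hat, y_target))

def l1_loss(y_hat, y_target):
    """Sum of absolute differences between prediction and target."""
    return sum(abs(p - t) for p, t in zip(y_hat, y_target))

def cross_entropy(y_hat, y_target, tiny=1e-12):
    """Cross-entropy; tiny avoids log(0) for zero probabilities."""
    return -sum(t * math.log(p + tiny) for p, t in zip(y_hat, y_target))

# Hypothetical 3-class example: classes are (cat, dog, iguana).
y_iguana = one_hot(2, 3)
y_hat_cat = [0.92, 0.05, 0.03]     # network currently says "cat"
y_hat_fooled = [0.03, 0.05, 0.92]  # network says "iguana"

for loss in (l2_loss, l1_loss, cross_entropy):
    print(loss.__name__, loss(y_hat_cat, y_iguana), loss(y_hat_fooled, y_iguana))
```

Whichever of the three is used, the loss is lower when the output is close to the iguana one-hot vector, which is all the attack needs.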
And then the process is very similar to neural style transfer,
where we will optimize the image iteratively.
So we will start with x,
we will forward propagate it,
compute the loss function that we just defined.
And remember, we're not training the network, right?
We'll just take the derivative of the loss function all the way back to the input,
and update the input using a gradient descent algorithm until we
get something that is classified as iguana.
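The loop just described can be sketched in a few lines of Python. This is only a toy, under stated assumptions: the "network" is a single softmax layer with made-up weights `W` and `b`, the loss is cross-entropy (for which the gradient with respect to the input has a closed form), and the names `forge`, `forward` and `input_gradient` are hypothetical. A real attack would back-propagate through a deep pre-trained model instead.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(W, b, x):
    """y_hat = softmax(W x + b) for a single linear layer."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(W, b)]
    return softmax(z)

def input_gradient(W, y_hat, y_target):
    """Gradient of the cross-entropy w.r.t. x for this model: W^T (y_hat - y_target)."""
    diff = [p - t for p, t in zip(y_hat, y_target)]
    return [sum(W[i][j] * diff[i] for i in range(len(W)))
            for j in range(len(W[0]))]

def forge(W, b, x, target, lr=0.1, max_steps=1000):
    """Gradient descent on the input x only; W and b are never updated."""
    y_target = [1.0 if i == target else 0.0 for i in range(len(W))]
    x = list(x)
    for _ in range(max_steps):
        y_hat = forward(W, b, x)
        if y_hat.index(max(y_hat)) == target:
            break  # the model now predicts the target class
        g = input_gradient(W, y_hat, y_target)
        x = [x_j - lr * g_j for x_j, g_j in zip(x, g)]
    return x

# Toy weights, assumed for illustration: 3 classes, 4 "pixels".
W = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5],
     [-0.5, 0.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.0]
x_start = [2.0, 0.0, 0.0, 0.0]   # classified as class 0 by this toy model

x_adv = forge(W, b, x_start, target=2)
print(forward(W, b, x_start))
print(forward(W, b, x_adv))
```

Nothing in this loop asks `x_adv` to look like anything in particular; it only asks for the target class to win.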
Yeah, any question on that?
But this doesn't necessarily mean that the x that you get in-
Okay. So you mentioned that it
doesn't guarantee that x is going to look like something.
The only thing it guarantees is that
this x will be classified as an iguana if we optimize properly.
[NOISE] We will talk about that now.
Another question in the back, I thought. Yeah.
For the last question we miss the one that for logistic regression.
Oh yeah, it could be cross-entropy.
Yeah. So in this case not binary cross-entropy, because we have
a vector of n classes,
but it could have been cross-entropy.
Okay. So yeah that's true.
Are we guaranteed that the forged image x,
this one, is going to look like an iguana?
Who thinks it's going to look like an iguana?
Who thinks it's not going to look like an iguana?
Okay. The majority of people.
So can someone tell me why it's not going to look like an iguana?
[NOISE].
[inaudible] making a vector through a vector.
Okay. So you're saying
the loss function is
very unconstrained, so we didn't
put any constraints on what the image should look like.
That's true. Actually, the answer to this question is,
it depends. We don't know.
Maybe it looks like an iguana or maybe it doesn't.
But in terms of probabilities,
there's a high chance that it doesn't look like an iguana.
So the reason is here. Let's say this is our space of input images.
And the interesting thing is that as humans, on
a daily basis, we deal with images of the real world.
I mean,
if you look at a TV
that is totally buggy,
you see random pixels,
but in other contexts,
we usually see images from the real-world distribution.
A network is deterministic:
it takes an image,
and any input image that fits
the first layer
would produce an output, right?
So this is the whole space of input images that the network can see.
This is the space of real images,
it's a lot smaller.
Can someone tell me what's the size of
the space of possible input images for a network?
[NOISE].
Infinite.
Huh? Sorry.
Infinite.
Infinite?
Yeah.
Uh, It's not infinite.
It's, it's a lot but not- [NOISE]
It's the number of the pixels to the power of the number of things it could be.
Okay. Uh, yeah, there is an idea here. Someone there?
I also said the same thing with just number of possible pixel permutations.
Yeah, that's true.
So more precisely, you would start with how many pixel values there are.
There are 256 pixel values.
And then what's the size of an image?
Let's say 64 by 64 by 3.
So you fix the first pixel,
256 possible values; then the second one can be any of 256 values again,
then the third one can be any of 256 values,
and you end up with 256 to the power of 64 times 64 times 3.
So this is a huge number.
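To get a feel for how huge, the count can be computed exactly with Python integers. This small sketch assumes 8-bit values (256 levels) for each of the 64 by 64 by 3 entries, as in the example above.

```python
import math

num_values = 256            # possible values per pixel channel (8 bits)
num_entries = 64 * 64 * 3   # height x width x channels

num_images = num_values ** num_entries  # exact Python integer
# Number of decimal digits, computed with logarithms to avoid printing the number.
num_digits = math.floor(num_entries * math.log10(num_values)) + 1

print(num_entries)               # 12288
print(num_images.bit_length())   # 98305, since 256**12288 == 2**98304
print(num_digits)                # 29593
```

So the number of possible inputs is finite, but it has roughly thirty thousand decimal digits.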
And the space of real images is here.
Now if we had to plot the space of images classified as an iguana,
it would be something like that.
Right. And you see that there is a small overlap between the space
of real images and the space of images classified as an iguana by the network.
And this is where we probably are not.
We're probably in the green part that is not overlapping with the red part,
because we didn't constrain our optimization problem.
Does that make sense? Okay. Now we're
going to constrain it a little bit more, because in practice,
these types of attacks are not too dangerous: as
a human, we would see that the pictures look like garbage.
The dangerous attack is if the picture looks like a cat to a human,
but the network sees it as an iguana.
Can someone think of, uh,
of like malicious applications of that?
[NOISE] Face recognition: you
could show a
picture of your face and push the network [NOISE] to think it's the face of someone else.
What else? Yeah.
Breaking CAPTCHAs and breaking like against bot detection.
Yeah. Breaking CAPTCHAs.
If you know
what output you want, you can force the network to think that this
input CAPTCHA is the output it's looking for.
Or in general, I would say, social media:
if someone is malicious and wants to put
violent content online,
all these companies have algorithms to check for this violent content.
If people can use this methodology to make adversarial examples that still look violent
but are not detected as violent by the algorithms,
they could still publish their violent pictures.
Think about self-driving cars:
a stop sign that looks like a stop sign to everyone,
but when the self-driving car sees it, it's not a stop sign.
So these are malicious applications of adversarial examples, and there are a lot more.
Okay. And in fact, the picture we generated
previously would look like that. It's nothing special.
So now let's constrain our problem a little bit more.
We're going to say we want the picture to look like a cat but be classified as an iguana.
Okay. So now say we have our neural network.
If we give it a cat it's going to predict that it's a cat.
What we want is still give it a cat but predict that it's an iguana.
Okay. I'll go quickly over that because it's very similar to what we did before,
so I just put back what we had on the previous slide.
Okay, exactly the same thing.
Now, the way we rephrase our problem will be a little different.
Instead of saying we want only y hat of x equals y-iguana,
we have another constraint.
What's the other constraint?
The picture x should be close to a picture of a cat.
So we want x equal or very close to x-cat.
And in terms of loss function,
what it does is that it adds
another term which is going to decide how close x should be to x-cat.
If we minimize this loss now,
we should have an image that looks like a cat because of the second term,
and that is predicted as an iguana because of the first term.
Does that make sense? So we're just building up our loss functions,
and I guess you guys are very familiar with this type of thought process now.
Okay, and same process,
we optimize until we hopefully get a cat picture that is classified as an iguana.
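A minimal sketch of such a two-term loss, assuming squared distances for both terms and a weighting factor `lam` between them. The weighting factor, the function names and the tiny three-pixel "images" are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def constrained_loss(y_hat, y_target, x, x_cat, lam=1.0):
    """First term: push the prediction toward the target class.
    Second term: keep the image x close to the reference cat image x_cat.
    lam (an assumed hyperparameter) trades off the two terms."""
    return 0.5 * sq_dist(y_hat, y_target) + lam * sq_dist(x, x_cat)

# Hypothetical tiny example: 3 classes (cat, dog, iguana), 3 "pixels".
y_iguana = [0.0, 0.0, 1.0]
x_cat = [0.2, 0.4, 0.6]
y_hat = [0.1, 0.1, 0.8]            # same prediction for both candidates below
x_near = [0.21, 0.39, 0.60]        # almost the cat image
x_far = [0.9, 0.9, 0.1]            # far from the cat image

print(constrained_loss(y_hat, y_iguana, x_near, x_cat))
print(constrained_loss(y_hat, y_iguana, x_far, x_cat))
```

For the same prediction, the candidate that stays near the cat image gets the lower loss, which is the role of the second term.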
Now our question is,
what should be the initial image we start with?
We didn't talk about that in the previous example [NOISE].Yeah.
White noise?
White noise.
Yeah, possibly white noise.
Any other, uh, proposals?
Maybe a cat.
A cat? Yeah, which cat?
The [inaudible] [NOISE].
I don't know. Probably the cat that we put in the loss function, right?
Because it's- is the closest one to what we want to get.
So if we want to have a fast process,
we'd better start with exactly this cat,
which is the one we put in our loss function here, right?
If we put another cat,
it's going to take a little longer because we have to
change the pixels of the other cat to look like this cat.
That's what we told our loss function.
If we start with white noise,
it will take even longer because we have to change the pixels all the way
so that it looks real and then it looks like the cat that we defined here.
So yeah, the best thing would probably be to start with the picture of the cat.
Does that make sense? And then move the pixels so that
this term is also minimized. Yeah.
So when you write that loss function,
it seems like you are implicitly saying that what a human sees as a cat will
just be like minimizing the RMSE error to the actual cat picture, right?
Yeah?
Is that- I mean, I thought that RMSE error was
actually a really bad way to gauge whether or not a human,
like saw two images as similar.
Yeah. It's an empirical choice,
the fact that we use that type of loss function.
But in practice, it could have been any distance between X and X cat,
and any distance between Y hat
and Y iguana. Yes.
So when you say X cat is [inaudible] just one specific cat.
Yeah.
[inaudible].
Exactly.
I can't think of a way of making a constrained,
like a complex loss function that takes a bunch of cats.
And then it puts like something like a minimum of it, right?
The minimum distance between [inaudible]
Can we just look at this wide [inaudible]
probability of like 0.55 probability of iguana and cat and then try to [inaudible]
I'm not sure about the second method.
But just to repeat the point you mentioned:
here we had to choose a cat.
It means X cat is actually an image of a cat.
So what if we don't know what the cat should look like,
and we just want a random cat to come out and be classified as an iguana?
We're going to see generative networks
afterwards, which can be used to do that type of stuff.
But for the second part of the question,
I'm not sure what the optimization process would look like.
Okay, let's move on.
So yeah, it's probably a good idea to start
with the cat image that we specified in the loss function.
Okay. And so then we have an image of a cat that originally
was classified as 92 percent cat, and we modified a few pixels.
So you can see that this image looks a little blurry.
By doing this modification,
the network will think it's an iguana.
Okay? And sometimes this modification can be very slight and we
may not even be able to notice it. Sounds good.
Now, let's add something else
to this
graph.
We add a third set, which is the space of images that look real to a human.
So that's interesting because
the space of images that look real to a human is
actually bigger than the space of real images.
An example is this one.
This is probably an image that looks real to a human,
but it's not an image that we could see
in daily life, because of these slight pixel changes.
Okay? So this is the space of dangerous adversarial examples.
They look real to a human but they're not actually real.
They might be used to fool a model.
Okay. Now let's see a video
by Kurakin et al.
on a real-world instance of adversarial examples.
So for those who cannot read it,
they're taking a camera
which has a classifier.
And the classifier classifies the first image as
a library and the second image, which shows the same thing, as a prison.
So the second image has slightly different pixels, but
it's hard for a human to see. Same here.
The classifier on the phone classifies
the first image as a washer with 52 percent
confidence, and the second one as a door mat.
Yeah. So this is
a small example of what can be done.
Okay. Now,
we've seen how to generate these adversarial examples.
It's an optimization process.
We will see
what types of attacks we can
mount and what the defenses against these adversarial examples are.
So we would usually
split the attacks into two types:
non-targeted attacks and targeted attacks.
A non-targeted attack means that
we just want to find an adversarial example that is going to fool the model.
A targeted attack means that we want to force
this adversarial example to output a specific class that we chose.
These are two different types of attacks that
are widely discussed in the research.
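The difference between the two attack types comes down to the success criterion. Here is a small sketch with hypothetical helper names and a made-up three-class output (cat, dog, iguana):

```python
def predicted_class(probs):
    """Index of the most probable class."""
    return probs.index(max(probs))

def non_targeted_success(probs, true_class):
    """Non-targeted: any wrong prediction counts as a success for the attacker."""
    return predicted_class(probs) != true_class

def targeted_success(probs, target_class):
    """Targeted: only the class chosen by the attacker counts."""
    return predicted_class(probs) == target_class

CAT, DOG, IGUANA = 0, 1, 2
probs_after_attack = [0.10, 0.70, 0.20]   # the cat image is now predicted as "dog"

print(non_targeted_success(probs_after_attack, true_class=CAT))    # True
print(targeted_success(probs_after_attack, target_class=IGUANA))   # False
```

In this example the model is fooled, so a non-targeted attack has succeeded, but an attacker who specifically wanted "iguana" has not.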
Knowledge of the attacker is something very important.
For those of you who did some crypto,
you know that we talk about white-box attacks and black-box attacks.
So one interesting thing is that
a white-box attack is when you have access to the network.
So we have our image and the pre-trained network.
We have full access
to all the parameters and the gradients.
So it's probably an easier attack.
Right? We can back-propagate all the way
back to the image and update the image, like we did.
A black-box attack is when the model is probably encrypted or something like that,
so that we don't have access to its parameters,
activations, and architecture.
So the question is how do we attack in
the black-box setting if we cannot back-propagate because we don't have access to the layers?
Any ideas? Yeah.
Numerical gradient.
Numerical gradient. Yeah, good idea.
So you tweak the image a little
bit and you see how it changes the loss.
Looking at these changes, you
can get an estimate of the numerical gradient,
even if the model is a black-box model.
This assumes that you can query the model,
right? You can query it.
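The numerical gradient idea can be sketched with central finite differences. The sketch assumes the attacker can call a function that returns the loss for any input; `black_box_loss` below is a made-up stand-in whose weights are "hidden" only by convention, and the step size `h` is an arbitrary choice.

```python
def estimate_gradient(query_loss, x, h=1e-4):
    """Estimate d(loss)/dx using only queries: two queries per input entry."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((query_loss(x_plus) - query_loss(x_minus)) / (2 * h))
    return grad

def black_box_loss(x):
    """Stand-in for a model we can query but not inspect (illustrative only)."""
    hidden_w = [1.0, 3.0, -1.0]
    y_hat = sum(w_i * x_i for w_i, x_i in zip(hidden_w, x))
    return (y_hat - 5.0) ** 2

x = [1.0, 0.0, 2.0]
print(estimate_gradient(black_box_loss, x))  # close to [-12, -36, 12]
```

The cost is two queries per input entry, so for a 64 by 64 by 3 image a single gradient estimate already takes 24,576 queries.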
What if you cannot even query the model, or you can query it only one time,
which is to send the adversarial example?
How would you do that? So this becomes more complicated.
So, there is a very complex property of
these adversarial examples, which is that they're highly transferable.
It means I have a model here that is
an animal classifier, okay?
I don't have access to it.
I cannot even query it.
I still wanna fool it.
What I'm going to do is that I'm going to build my own animal classifier,
and forge an adversarial example on it.
It's highly likely that it's going to
be an adversarial example for the other one as well.
So, this is called transferability,
and it's still a research topic, okay?
We're trying to understand why this happens
and also how to defend against that.
You know, maybe a defense against that
is... we're going to see it after, I'm not gonna say it now, sorry.
Does that make sense or not, this transferability?
It's probably because two animal classifiers look at the same features in images, right?
And maybe these pixels that we're playing
with are also changing the output of the other network.
Let's go over some kind of defenses.
So, one solution to defend against
these adversarial examples is to create a Safety Net. A Safety Net is what?
It's a net that works like a firewall:
you would put it before your network.
Every image that comes in will be classified as forged or real
by this net, and you only keep those which are real and not adversarial.
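Structurally, this defense is a gate in front of the classifier. A minimal sketch, in which both the detector and the classifier are toy stand-in functions invented for illustration (a real Safety Net would itself be a trained network):

```python
def guarded_predict(image, detector, classifier):
    """Run the classifier only on images the detector accepts as real.
    Returns None when the image is rejected as forged."""
    if detector(image):
        return None
    return classifier(image)

# Toy stand-ins, assumed for illustration only.
def toy_detector(image):
    """Flags an image as forged if any value falls outside [0, 1]."""
    return any(p < 0.0 or p > 1.0 for p in image)

def toy_classifier(image):
    return "cat" if sum(image) / len(image) > 0.5 else "iguana"

print(guarded_predict([0.9, 0.8, 0.7], toy_detector, toy_classifier))   # cat
print(guarded_predict([0.9, 1.3, 0.7], toy_detector, toy_classifier))   # None
```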
Does that make sense? So, you could say that,
okay, but we can also build an adversarial example
that fools this network, right?
Whether it's black-box or white-box,
we can just create an adversarial example for this network.
It's true. But the issue is that now we have two constraints.
We have to fool the first one and the second one at the same time.
You know, maybe if you fool the first one,
there is a chance that the second one is going to be fooled.
We don't know, okay?
It just makes it more complex.
There is no good defense at this point against all types of adversarial examples.
This is an option that people are researching.
So, the paper is here if you want to check it out.
Can you guys think of another solution?
[NOISE].
I've got one.
Yeah.
Just like multiple in terms of loss functions [inaudible]
adversarial examples loss functions and train them.
Train on multiple loss functions of different networks?
Yes.
So, you're talking about ensembling.
Maybe we can create five networks to do our task,
and it's highly unlikely that the adversarial example is going
to fool the five networks the same way, right?
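One simple way to combine five networks is a majority vote. The sketch below uses constant stand-in functions in place of trained networks, and majority voting is only one assumed way of combining them:

```python
from collections import Counter

def ensemble_predict(image, classifiers):
    """Return the label predicted by the largest number of classifiers."""
    votes = Counter(clf(image) for clf in classifiers)
    label, _count = votes.most_common(1)[0]
    return label

# Stand-ins for five trained networks: one of them is fooled by the input.
def robust(image):
    return "cat"

def fooled(image):
    return "iguana"

classifiers = [robust, robust, fooled, robust, robust]
print(ensemble_predict([0.2, 0.4, 0.6], classifiers))   # cat
```

Keep in mind the transferability property discussed earlier: an example forged against one network may fool the others too, so an ensemble makes the attack harder rather than impossible.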
Any other idea? Yes.
Uh, generate adversarial examples and train on them.
Exactly. Generate adversarial examples and train on those, okay?
So, you will generate a cat image that is adversarial.
So, some pixels have been changed to fool a network.
You will label it as the human sees it.
So as a cat because you want the network to still
see that as a cat and you will train on those.
The downside of that is that it's very costly.
We've seen that generating adversarial examples is super
costly, and also we don't know if it can generalize to other adversarial examples.
Maybe we are going to overfit to the ones we have.
So, it is another optimization problem.
Now, another solution is to
train on adversarial examples at the same time as we train on normal examples.
So, look at this loss function.
This loss function, the new loss, is a sum of two loss functions.
One is the classic loss function we would use,
so let's say cross-entropy in the case of
a classification, and the second one is
the same loss function, but we give it the adversarial version of x.
So, what's the complexity of that at every gradient descent step?
For every iteration of our gradient descent,
we're going to have to iterate enough to forge
an adversarial example, right?
Because we have x, what we wanna do is
forward propagate x through the network to compute the first term,
generate x adversarial with the optimization process, forward propagate
it to calculate the second term, and then back-propagate over the weights of the network.
This is super costly as well, and it is very similar to what you said,
it's just done online, all the time, okay?
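Written out, the combined loss described here has roughly the following form. The notation is an assumption: $x_{\text{adv}}$ is the adversarial version of $x$ forged against the current weights, and $\lambda$ is an optional weighting factor (with $\lambda = 1$ it is the plain sum of the two terms).

```latex
L_{\text{new}}(W, b, x, y) \;=\; L(W, b, x, y) \;+\; \lambda \, L(W, b, x_{\text{adv}}, y)
```

Both terms use the same label $y$, the one a human would give, so the network is trained to predict "cat" for the clean cat image and for its adversarial version.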
So, we're going to delve a little more.
There's another technique called logit pairing, I just put it here.
We're not going to talk about it.
There is the paper here if you want to check it.
It's another way to do adversarial training.
But what I would like to talk about is more,
from a theoretical perspective,
why are neural networks vulnerable to adversarial examples?
So, let's, let's do some,
some work on the board.
Yeah, one question.
Let's say, uh, so, when you want to expose the [inaudible] probably look like a cat, all right?
So, you expect to be able to [inaudible] can't you just [inaudible] denoise it [inaudible]?
Denoising is also a method that's interesting, but the thing is that it's just like in crypto:
every time you come up with a defense,
someone will come up with an attack, and it's a race between humans, you know.
So, this is the same type of problem.
And security problems are open-ended.
Okay. So, let's go over
something interesting that is more on the intuition side of adversarial examples.
So, let me write down something.
So, one question we ask ourselves
is why do adversarial examples exist? What's the reason?
And Ian Goodfellow and his team came up
with one of the seminal papers on adversarial examples,
where they argue as follows. Many people in the past have
attributed the existence of adversarial examples to
the high non-linearity of neural networks and to overfitting.
So, because we overfit to a specific dataset,
we actually don't understand what cats are.
We just understand
what we've been trained on.
They argue instead that it's actually the linear parts of networks that
are the cause of the existence of adversarial examples. So, let's see why.
And the example I'm gonna look at is linear regression.
So, together we've seen logistic regression.
Linear regression is basically the same thing without the sigmoid.
So, before the sigmoid,
we have y-hat equals wx plus b.
So, the forward propagation of our network is going
to be y-hat equals wx plus b, okay?
And our first example is going to be a six-dimensional input.
Okay. We have a neuron here,
but the neuron doesn't have any activation because we're in linear regression.
So here what happens is simply wx plus b.
Okay? And then we get y-hat.
And we probably use an L1 or L2 loss, because it's a regression problem,
to train this network.
Now let's look at our first example.
In our first example,
we have trained our network.
So the network has been trained
and has
converged to
w equals one,
three, minus one, two, two, three.
This is w. And you know,
because we defined x to be a vector of size 6,
a column vector, w has to be a row vector of size 6.
So the network converged to this value of w, and b equals 0.
So now, we're going to look at this input.
We're giving a new input to the network.
And the input is going to be one,
minus one, two, zero, three, minus two.
Okay. So I'm going to forward propagate this to get y-hat equals wx plus b.
[NOISE].
And this value is going to be 1 times 1, minus 3,
minus 2, plus 0, plus 6, minus 6.
If I didn't make a mistake,
okay.
[NOISE] Okay.
And so we basically get minus 4.
And so this is the first example that was propagated.
Now, the question is [NOISE] how to change x
into x-star
such that y-hat changes
radically but x-star is close to x?
So this is basically the problem of adversarial examples.
Can we find an example that is very close to
x but radically changes the output of our network?
And we're trying to build intuition on adversarial examples.
So the interesting part is to identify how we should modify x.
And the intuition comes from the derivative.
If you take the derivative of y-hat with respect to x,
you know that this term measures
the impact on y-hat of
small changes of x, right?
What's the impact of small changes of x on the output?
And if you compute it, what do you get?
W.
W? Everybody agrees?
What's the shape of this thing?
The shape of that is the same as the shape of x.
So it should be w-transpose.
Remember, the derivative of a scalar with respect to a vector has the shape of the vector.
Okay. Now it's interesting to see this, because we can compute x-star to be,
let's say, x plus a small perturbation.
I will call it the perturbation value.
Can you write bigger?
Yeah. Sorry. And can you see the top one?
Yeah.
You said yes or no?
Yes.
Okay. [NOISE]. So what if x-star equals x plus epsilon times w-transpose?
You know, and this epsilon,
I will call it the value of the perturbation.
Now, if we forward propagate x-star,
it means we do y-hat-star equals w x-star plus b,
and b is zero at this point.
We're going to get w x plus epsilon times w times w-transpose.
And w times w-transpose is a dot product, right?
So this is the same as the squared norm of w.
So what is interesting?
It's interesting because
the smart part is that this term is always going to be positive.
It means we moved x a little bit, because we
can make this change small by setting epsilon to a small value.
But it's going to push y-hat to a larger value for sure. You know?
And if I had a minus here instead of a plus,
it would push y-hat to a smaller value.
And the interesting thing is, now,
if we compute x-star to be x plus epsilon times w-transpose,
and we take epsilon to be a small value like, let's say, 0.2,
you can make the calculation.
What we get is this.
So 1, minus 1, 2, 0,
3, minus 2, plus 0.2 times 1,
0.2 times 3, minus 0.2,
plus 0.4, plus 0.4, and plus 0.6.
So let's finish the calculation and I'll give the insight after.
We get 1.2, minus 0.4,
1.8, 0.4, 3.4, and minus 1.4.
So this is our x-star that we hope to be adversarial.
Okay. Let's compute y-hat-star to see what happens.
It's w x-star plus b, and b is zero.
So what we get when we multiply w by x-star is
[NOISE]
1.2 minus 1.2,
minus 1.8 plus 0.8,
plus 6.8 and minus 4.2,
[NOISE] which is going to give us 1.6.
All right.
So we see that a very slight change in x has pushed y-hat from minus four to 1.6.
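The board arithmetic can be checked with a few lines of Python, using exactly the w, x, b and epsilon from the example. Since w times w-transpose is 1 + 9 + 1 + 4 + 4 + 9 = 28, the output should move by 0.2 times 28 = 5.6, from minus 4 to 1.6.

```python
w = [1, 3, -1, 2, 2, 3]      # trained weights from the example
b = 0
x = [1, -1, 2, 0, 3, -2]     # original input
eps = 0.2                    # value of the perturbation

def predict(w, b, x):
    """Linear regression forward pass: y_hat = w . x + b."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

x_star = [x_i + eps * w_i for x_i, w_i in zip(x, w)]   # x + eps * w^T
w_dot_w = sum(w_i * w_i for w_i in w)                  # squared norm of w

y_hat = predict(w, b, x)
y_hat_star = predict(w, b, x_star)

print(x_star)                 # approximately [1.2, -0.4, 1.8, 0.4, 3.4, -1.4]
print(y_hat, y_hat_star)      # -4 and approximately 1.6
print(y_hat + eps * w_dot_w)  # same value, from y_hat + eps * ||w||^2
```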
And so a few things we want to notice here.
[NOISE].
So insights on this small example.
The first one is that
if W is large,
then X star is not similar to X, right?
The larger the W, the less X star is likely to be like X.
And specifically, if one entry of W is very large,
XI, the pixel corresponding to this entry, is going to be very different from XI star.
So what we're going to do is that we are going to take
the sign of W instead of taking W. What's the reason why we do that?
Because the interesting part is the sign of W. It means,
if we play correctly with the sign of W,
we will always push
this term W X star to the positive side,
because for every entry here,
this multiplication is going to give us a positive number, right?
And every entry of X moves by at most epsilon, however large W is.
And the second insight is that as X grows in dimension,
the impact of plus epsilon sign of W increases.
Does that make sense? So the impact
of sign of W on Y hat increases.
And so what's interesting to notice is that we can keep epsilon as small as possible.
It means X and X star will be very similar, but as we grow in dimension,
we're going to get more terms in this sum, a lot more terms.
And the change in Y hat is going to grow and grow and grow.
And so one reason why adversarial examples
exist for images is that the dimension is very high,
64 by 64 by three.
So we can make epsilon very small and take the sign of W,
and we will still get Y hat to be far from the original value that it had.
Does that make sense? Yeah. Do you guys have any questions on that?
So epsilon doesn't grow with the dimension,
but the impact of this term increases with the dimension.
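A small numerical sketch of this second insight. With the perturbation epsilon times sign of W, the output of the linear model moves by epsilon times the sum of the absolute values of W, which keeps growing as entries are added. The random weights, the seed and the value of epsilon below are arbitrary choices for illustration.

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def output_shift(w, eps):
    """Change in w . x when x moves to x + eps * sign(w): eps * sum(|w_i|)."""
    return sum(w_i * eps * sign(w_i) for w_i in w)

random.seed(0)
eps = 0.01          # every input entry moves by at most 0.01
shifts = {}
for n in (6, 100, 64 * 64 * 3):
    w = [random.uniform(-1.0, 1.0) for _ in range(n)]
    shifts[n] = output_shift(w, eps)
    print(n, shifts[n])
```

On the six-dimensional example from the board, with epsilon equal to 0.2, the shift is 0.2 times (1 + 3 + 1 + 2 + 2 + 3) = 2.4, while no entry of x moves by more than 0.2.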
[NOISE] Okay.
[NOISE].
The one hot encoder changes what into what? So you have the input image cat, right?
Yeah.
It puts it right between these two that gives [inaudible].
Okay. So you like- you try to unadversarially [inaudible] the cat?
Yeah.
Yeah. I don't know if that has been done.
I don't think that has been done.
So you're talking about taking an encoder that takes the adversarial example,
converts it into a normal image of the cat, and then gives the network the cat.
Yeah.
Maybe, yeah. I don't know.
So it's a topic of research.
Okay, let's move on because we don't have too much time.
So just to conclude,
what we're going to count as
a general way to generate adversarial examples is this formula.
[NOISE]
This is going
to be a fast way to generate adversarial example.
So this method is called the fas- Fast Gradient Sign Method.
So basically what we're doing is that we can- we- we are linearizing
the cost function in- in the proximity of, uh, the parameters.
And we're saying that what's applied to linear networks here
is going to also apply for this general formula for deeper networks.
So we're pushing the pixels of the image in one direction
that is going to have a high impact on the output, okay?
So that's the intuition behind it.
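As a rough illustration of the Fast Gradient Sign Method, the sketch below applies one step, x plus epsilon times the sign of the gradient of the cost with respect to x, to a logistic-regression model. This is an assumption made for simplicity: the model, the toy numbers, and the helper names `predict` and `fgsm_logistic` are hypothetical and not taken from the lecture.

```python
import math

def sign(v):
    """Return 1, -1 or 0 depending on the sign of v."""
    return (v > 0) - (v < 0)

def predict(x, w, b):
    """Logistic regression output y_hat = sigmoid(w.x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_logistic(x, y, w, b, eps):
    """One FGSM step on a logistic-regression model.

    For binary cross-entropy, dJ/dx = (y_hat - y) * w, so the
    adversarial input is x + eps * sign((y_hat - y) * w).
    """
    y_hat = predict(x, w, b)
    grad_x = [(y_hat - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]

# Toy values, made up for illustration.
w = [1.0, -2.0, 0.5]
b = 0.0
x = [0.2, -0.1, 0.4]
y = 1.0
x_adv = fgsm_logistic(x, y, w, b, eps=0.1)
print(predict(x, w, b), predict(x_adv, w, b))
```

With the true label equal to one, the step moves every input against the sign of its weight, so the prediction on `x_adv` is lower than on `x`.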
Now you might say that, okay,
we did this example on the linear network,
but neural networks are not linear,
they are highly non-linear.
In fact, if you look where the research has been going for the past few years,
we are trying to linearize all the behaviors of these neural networks.
With ReLU for example, or with Xavier initialization.
All that type of methods,
even the sigmoid, when we train on sigmoid,
we do all we can to put sigmoid in the linear regime,
because we want fast training.
Okay? And one last thing that I'll mention for
adversarial examples is if I have a network like this.
[NOISE]
So fully connected
with three-dimensional inputs, up, yeah.
And then one here and then the output.
What's interesting is that computing the chain rule on this neuron
will give you that the derivative of the loss function with respect to, let's say,
X is equal to the derivative of the loss function with respect to Z one, one,
here, times the derivative of Z one,
one with respect to X.
Let's say we're- we're going- we're going,
there is actually a summation here.
But anyway. Uh, just let me illustrate the point.
Uh, what we're- what we're saying is that- what we're- what we
try to do with neural networks is to have this gradient be high.
Because if this gradient is not high,
we're not able to train the parameters of
this neuron and we need this gradient to be high.
Because if you want to do the same thing with the- with W one,
one, which is the parameter related to this neuron,
you would need to go to this chain rule.
Correct? So we need this gradient to be high.
And if this gradient is high,
the gradient with respect to the input is also going to be high.
Because you use the same gradient in the chain rule.
So networks that have
high gradients and that are operating in the linear regime are even more
vulnerable to adversarial examples because of this observation.
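A tiny sketch of that observation, under simplifying assumptions: a single sigmoid neuron with a binary cross-entropy loss instead of the network drawn on the board. The function name `grads_single_neuron` is hypothetical. It shows that the gradient used to train the weights and the gradient used to attack the input share the same upstream factor.

```python
import math

def grads_single_neuron(x, w, b, y):
    """Gradients of binary cross-entropy for one sigmoid neuron.

    z = w.x + b, a = sigmoid(z). Both gradients reuse dL/dz = a - y.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    a = 1.0 / (1.0 + math.exp(-z))
    dL_dz = a - y                      # shared upstream gradient
    dL_dw = [dL_dz * xi for xi in x]   # used to train the weights
    dL_dx = [dL_dz * wi for wi in w]   # used to attack the input
    return dL_dz, dL_dw, dL_dx

dL_dz, dL_dw, dL_dx = grads_single_neuron([1.0, 3.0], [3.0, -1.0], 0.0, 1.0)
print(dL_dz, dL_dw, dL_dx)
```

If `dL_dz` is large, both lists are scaled up together, which is the link between easy training and vulnerability described above.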
So any question on, on adversarial examples?
Before we move on, I think we don't have time and I would like to,
to go over the, the GANs with you guys.
So let's move on to GANs.
I'll stick around to answer questions on that part.
So the general question we're asking now is,
uh, do neural networks understand the data?
Because we've seen that some,
some data points look like they would be real,
uh, but the neural networks don't understand it.
So more generally, uh, can we build generative networks that
can mimic the real-world distribution of, let's say, images?
And this is what we will call generative adversarial networks.
We'll start by motivating it,
and then we look at something called the minimax game between two networks,
a generator and a discriminator,
that are going to help each other improve,
and finally we'll see that GANs are hard to train, uh,
we'll see some tips to train them, and finally,
go over some nice results and methods to evaluate GANs, okay?
So, uh, the motivation behind generative adversarial networks is to endow
computers with an understanding of our world, okay?
So by, by that we mean that we want to collect a lot of data,
use it to train a model that can generate
images that look like they're real even if they're not,
so a dog that has never existed can be generated by this network.
Um, and finally, uh,
the number of parameters of the model, uh,
is smaller than the amount of data,
we already talked about that,
and this is the intuition behind why a generative network can exist.
It's because there is too much data in the world,
any image counts as data for a generative network,
and there are not enough parameters to memorize this data.
You know, you have- the network needs to understand the salient features of the dataset,
because it doesn't have enough parameters to overfit everything.
So let's talk about probability distributions.
So these are samples from real images that have been taken,
and if you plot this real data distribution in a 2-D map,
uh, it would look like something like that.
I made it up, but this is
the image space similar to what we talked about in adversarial examples,
and this green shape is the space of real-world images.
Now, uh, if you train a generator and generate some images that look like this,
and these images come from StackGAN, uh, from Zhang et al.
Uh, this distribution, if the generator is not good,
is not going to match the real world distribution.
So our goal here is to do something so
that the red distribution matches the real-world distribution,
then to train the network so that it realizes what we want.
So this is our generator and it's what counts,
is what, what we want to train ultimately.
We want to give it, let's say,
a random number or a random latent code of 100 scalar numbers,
and we want it to output an image.
But of course because it's not trained initially,
it's going to output a random image,
something like that, random pixels.
Now, this image doesn't look very good.
What we want is these images to look like
generated images that are very similar to the real world.
So how are we going to help this generator train?
It's not like what we did in classic supervised learning,
because we don't have,
uh, we don't really have inputs and labels,
you know, there is no label.
We could maybe give it an image of a cat and ask it to output another cat,
but we want the network to be able to output things that don't exist,
things that we've never seen.
Right. So we want the network to understand what a cat
is but not overfit to the cat we give it.
So the way we're going to do it is through
a small game between this network, called the generator G,
and another network called the discriminator D. Let's,
let's look at how it works.
We have a database of real images,
and we're going to start with this distribution on the bottom,
which is the real-world data distribution,
is the distribution of the images in this database.
Now our generator has this distribution initially,
it means the pixels that you see here
probably follow a distribution that doesn't match the real world.
We'll define the discriminator D,
and the goal of the discriminator will be to detect if an image is real or not.
So we're going to give several images to this discriminator,
sometimes we will give it generated images,
and sometimes we will give it real-world images.
What we want is that this discriminator is a binary classifier that outputs
one if the image is real and zero if the image was generated, okay?
So let's say we give it an x coming from the generator; it should give us zero,
because we want the discriminator to detect that x was actually G of z.
If the image came from our database of real images,
we want the discriminator to say one.
So it seems like the discriminator would be easy to train, right?
It's just a binary classification.
We can define a loss function.
That is the binary cross entropy.
And the good thing is we can have as many labels as we want,
like it's, it's unsupervised but a little bit supervised, you know,
we have this database and we label it all as one,
it's just this image exists,
let's label them as one for discriminator,
and everything that comes out of the generator let's label it as zero for discriminator.
So basically, data is not costly at all in this point.
The way we will train is that we will backpropagate
the gradient to the discriminator to train the discriminator,
using a binary cross entropy.
But what we ultimately want is to train the generator, that's what we want.
At the end, we are not going to use the discriminator,
we just want to generate images.
So we are going to direct the gradient to go back to the generator.
And why can this gradient go back to the generator?
The reason is that x is G of z,
it means we can backpropagate
the gradient all the way back to the input of the discriminator.
But this input depends on the input of the generator if the image was generated.
So we can also backpropagate and direct
the gradient to the generator. Does it make sense?
There is a direct relation between z and the loss function,
in the case where the image was generated.
If the image was real,
then the generator couldn't get the gradient,
because x doesn't depend on z or on the features and parameters of the generator.
Okay? So we would run an algorithm such as Adam,
um, simultaneously on two minibatches,
one for the true data and one for generated data.
Does this scheme make sense to everyone?
Yeah, one question?
So you said there were two minibatches, you're not mixing true and generated together?
So there's many methods of, your question is about mixing the minibatches.
Usually we would use, uh, we would,
we would use one minibatch for the real data and one minibatch for the fake data.
But in, in practice,
you can try other things.
Yeah. So there are many methods that are being tried to train GANs properly.
We're going to delve a little more into
the details of that when we see the loss functions.
So we hope that the probability distributions will match at the end,
and if it matches,
we're going to just take the generator and generate images,
normally it should be able to generate images that look real,
[NOISE] that looked like they came from this distribution.
Okay? Sounds good?
So now let's talk more about the training procedure and
try to figure out what the loss functions should be in this case.
What should be the cost of the discriminator?
Assuming, assuming we give two minibatches,
one for real data, so real images,
and one for generated data that come from G [NOISE].
Yes.
The same basic loss function we use for every binary classes, right?
The same basic loss function we use from binary class- for binary class case.
It's true we're going to tweak it a tiny bit,
but it's the same idea.
So this is what it can look like.
We're going to call it JD,
cost function of the discriminator.
It has two terms. What does the first term say?
What does the second term say?
And you can recognize the binary cross-entropy here.
[NOISE].
The only difference is that we have
a label that is Y_real and a label that is Y_generated.
In practice, Y_real and Y_generated are always going to be set to values.
We know that Y_generated is zero and we know that Y_real is one.
So we can just remove these two factors, Y_real and one minus Y_generated, because they're both equal to one.
The first term is telling us D should correctly
label real data as one, the cross-entropy term.
The first term of a binary cross-entropy.
The second term is going to tell us,
D should correctly label generated data as zero.
So the difference with classic cross-entropy we've seen is that,
this summation is the summation over the real mini-batch.
And the summation on the second cross-entropy is a summation on generated mini-batch.
Does that make sense?
So we want D both to correctly identify real data,
and also correctly identify fake data.
That's why we have two terms.
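The two-term cost just described can be sketched as follows, assuming the labels are already fixed to one for real images and zero for generated images. The function name `discriminator_cost` and the input lists are hypothetical; each list holds the discriminator outputs for one mini-batch.

```python
import math

def discriminator_cost(d_real, d_fake):
    """J_D for one mini-batch of real and one of generated images.

    d_real: D(x) for each real image (label 1)
    d_fake: D(G(z)) for each generated image (label 0)
    """
    # First term: averaged over the real mini-batch only.
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    # Second term: averaged over the generated mini-batch only.
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

print(discriminator_cost([0.9, 0.8], [0.2, 0.1]))  # fairly good discriminator
print(discriminator_cost([0.5, 0.5], [0.5, 0.5]))  # discriminator that guesses
```

A discriminator that separates the two mini-batches well gets a lower cost than one that outputs one-half everywhere.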
Now, what about the generator?
What do you think should be the cost function of the generator? Yes.
So just about that cost function.
If I've been putting data that's from the generator,
I won't run the first pass because I don't have a,
uh, a Y_real if I have the- an input that's coming in from the generator.
Yeah. Exactly.
It's about half of this.
Yeah. But in your batch, we have had, like,
a certain number of real example,
a certain number of generated examples.
The generated examples have no impact on the first cross-entropy,
and same for the real examples on the second cross-entropy. Any other questions?
Okay. So coming back to the cross- to the- to the cost of the generator.
What should it be? This is a tiny bit complicated.
Let's move- let's move on because we don't have too much time.
The cost of the generator basically should say that G should try to
fool D. [NOISE] The goal is for G to generate realistic samples.
And in order to generate real samples,
we want to fool D. If G managed to fool D and D is very good,
it means G is very good, right?
The problem is that it's a game.
Because if D is bad and G fools D,
it doesn't mean that G is good.
Because G- because D is bad,
it doesn't detect very well the real versus fake examples.
We want D to go up to- to be very good and G to go up at the same time.
Until the equilibrium is reached at a certain point where D will always output one-half,
like, random probabilities because it cannot
distinguish the samples coming from G versus the real samples.
So this cost function is basically saying, uh,
for generated images, we want D to classify them as one.
That's what it's saying. We want to fool D,
okay? Yeah. One question.
Uh, just a little bit of a side question, um, I
can kind of see- so if you're implementing this,
I can kind of see how you would, uh, you know,
implement for D, but how would you implement for G if you're actually implementing this?
Um, is there- has there been a module to dot train this
because it's not immediately obvious how you do this setup?
So, you know, like, if you- if,
if you're using- so how to implement that?
If you're using a deep learning framework,
you've been building a graph, right?
And at the end of your graph,
you've been building your cost function for D that is very close to a binary cross-entropy.
Uh, what you're going to just do is to define a node that is going to be minus
the cost function of D. Every time you are going to call the function J of G,
it's going to run the graph that you defined for J of D and run,
uh, a negation operation. Yeah.
So now you have two different cost functions.
How can they propagate gradients back the same way?
These are two different cost functions.
Propagate gradients back the same way?
Yeah.
We're not going to propagate the same way.
We are just going to add [OVERLAPPING]
a minus sign for the generator.
So, you know, you- you- you backpropagate on the- on
the- on- on D. And when you backpropagate on G,
you would flip- you would flip the sign. That's all we do.
The same thing with the sign flipped.
In terms of implementation it's just, uh, another operation.
Okay. Now, let's look at someth- something interesting is that this, uh, log- logarithm.
Let's look at [NOISE] at the graph of the logarithm.
So I'm going to plot against the axes, axis G,
oh sorry, D of G of z.
So what does this mean?
This axis is the output of D when given a generated example, G of z.
It's going to be between zero and one because it's a probability.
D is a binary classifier with a sigmoid, uh, output probably.
Um, if we plot logarithm of X.
So, like, this type of thing.
This would be log of D of G of z.
Does it make sense? That's the logarithm function.
Um, if I plot minus that, minus that.
So let me- let me plot minus logarithm of D of G of z or,
or let me- let me do something else.
Let me plot logarithm of minus D of G of z.
This is it. Do, do you guys agree?
Now, what I'm going to do is that I'm going to plot another function that is this one.
That is logarithm of one minus D of
G of z, okay?
So the question is,
right now, what we're doing is that we're saying the,
the cost function of the generator is logarithm of 1 minus D of G of z.
So it looks like this,
right? It looks like this one.
[NOISE] What's the issue with this one?
What do you think is the issue with this cost function looking at it like that?
It goes to negative infinity?
Sorry.
It goes to negative infinity?
Can you say it louder?
I mean, it go- goes to negative in- infinity.
It goes to negative infinity in,
in one, that's what you mean?
Yeah.
Yeah. And so the, the consequence of that is that
the gradient here is going to be very large,
the closer we go to one.
But the closer we are to zero,
the lower is the gradient.
And it's the reverse phenomenon for this lo- logarithm.
The gradient is very high,
and very high I mean in absolute value.
Very high when we're close to zero,
but it's very low when we go close to one, okay?
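To compare the two curves numerically, here is a short sketch, not from the lecture, that evaluates the absolute value of the slope of log(1 - d) and of log(d), where d stands for D of G of z. The function names are hypothetical.

```python
def abs_grad_log_one_minus(d):
    """|d/dd log(1 - d)| = 1 / (1 - d): small near 0, large near 1."""
    return 1.0 / (1.0 - d)

def abs_grad_log(d):
    """|d/dd log(d)| = 1 / d: large near 0, small near 1."""
    return 1.0 / d

for d in (0.01, 0.5, 0.99):
    print(d, abs_grad_log_one_minus(d), abs_grad_log(d))
```

Near d equal to zero the first slope is close to one while the second is around a hundred, and the situation is reversed near d equal to one.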
So which loss function do you think would be better?
A loss function that looks like this one or a loss function that looks like
this one to train our generator?
The broader question is where are we early in the training?
Are we close to here or are we close to there?
What does it mean to be close there?
Close to one? [NOISE].
You're fooling the network.
Hmm?
You're fooling the network.
You're fooling the network. It means that D thinks that generated,
uh, samples are real.
They're here. This place is the contrary.
D thinks that generated samples are fake.
It means D correctly finds out that they're fake.
Early on, we're generally here.
Because the discriminator is better than the generator.
Generator outputs garbage at the beginning,
and it's very easy for the discriminator to figure out that it's fake
because this garbage looks very different from real world data.
So early on, we're here.
So which function is the best one to be our cost?
[inaudible].
Yeah. So probably, this one is better.
So we have to use a mathematical trick to change this into that.
Right. And the mathematical trick is pretty standard.
Right now, we're minimizing something that is in log of one minus X.
We can say that doing so is the same as maximizing something that is in log of X.
Do you agree? Simple flip.
I mean, min-max flip.
And we can also say that it is the same as minimizing something in minus log of X.
Does it make sense? So we are going to use this mathematical trick
to convert our function that is a saturating cost,
we would say, into a non-saturating cost that is going to look more like this.
Let's see what it looks like.
So to sum up,
our cost function currently looks like that.
It's a saturating cost.
Because early on, the gradients are small.
We cannot train G. We're going to do the flip that I just talked about on the board,
and convert this into another function that is a non-saturating cost.
Okay. Yeah. Well, actually, yeah.
So the reason the blue one is like that is because I added a minus sign here.
So I'm flipping this.
Okay? And it's the same thing,
it's just the- the sign of the gradient that is going to be different.
Like that, the gradient is high at the beginning and low at the end. That makes sense?
[NOISE] So we're going to do the- use this flip.
And so we have a new training procedure now where J of D
didn't change but J of G changed.
We have a minus sign here and instead of the log of one minus D of G of Z,
we have the log of D of G of Z.
Does that make sense to everyone?
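The two generator costs can be written side by side as a minimal sketch. The function names are hypothetical, and `d_fake` is assumed to be the list of discriminator outputs D of G of z on one generated mini-batch.

```python
import math

def generator_cost_saturating(d_fake):
    """Minimax (saturating) cost: mean of log(1 - D(G(z)))."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

def generator_cost_non_saturating(d_fake):
    """Non-saturating cost: mean of -log(D(G(z)))."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

early = [0.01, 0.02]   # made-up outputs early in training: D rejects the fakes
late = [0.90, 0.95]    # made-up outputs later on: D is mostly fooled
for batch in (early, late):
    print(generator_cost_saturating(batch), generator_cost_non_saturating(batch))
```

Both costs go down when D is fooled more often, so they push G in the same direction; the difference is where the slope is steep.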
Good. And actually, so this is a fun thing
if you check this paper, which is really cool: "Are GANs
Created Equal? A Large-Scale
Study", a study of many, many different GANs.
It shows what people have tried.
And you can see that people have tried all types of loss to make GANs trainable.
So it looks- it looks complicated here.
But actually, the MM GAN is the first one we saw together.
It's the mini-max loss function.
The second one is the non-saturating one that we just saw.
So you see between the first two.
The only difference is that on the generator,
we get log of one minus D of X hat becoming minus log of D of X hat.
Okay. Now, another trick to train GANs is
to use the fact that D is usually easier to train
than G. But as D improves, G can improve.
If D doesn't improve, G cannot improve.
So you can see the performance
of D as an upper bound to what G can achieve.
Because of that, we will usually train D more times than we will train G.
So we will basically train, for num_iteration iterations,
K times D, one time G. K times D,
one time G, and so on.
So that the discriminator becomes better, then the generator can catch up.
Better, then catch up,
and so on. That make sense?
There are also methods like using
different learning rates for D and G to take this into account,
to train the discriminator faster.
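The K-to-one schedule can be sketched as a plain loop. This is only a skeleton under stated assumptions: `update_d` and `update_g` are hypothetical callbacks standing in for one optimizer step on the discriminator and on the generator, and here they just record which network was updated.

```python
def train_gan(num_iterations, k, update_d, update_g):
    """Alternate k discriminator updates with one generator update."""
    for _ in range(num_iterations):
        for _ in range(k):
            update_d()   # one step on J_D (real + generated mini-batches)
        update_g()       # one step on J_G (generated mini-batch)

# Stand-in callbacks that only log the order of the updates.
calls = []
train_gan(num_iterations=3, k=2,
          update_d=lambda: calls.append("D"),
          update_g=lambda: calls.append("G"))
print(calls)
```

In a real training script the two callbacks would sample mini-batches and apply gradient steps; the value of k is a hyperparameter to tune.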
Okay. Uh, because we don't have too much time,
I'm going to skip the BatchNorm with GANs.
We are going to see it probably next week, uh,
together after you guys have seen the BatchNorm videos.
Okay. It's cool. So just to sum up.
Some tips to train GANs: modify the cost function.
We've seen one modification, there are many more.
Uh, keeping D up-to-date with respect to G, so updating D
more than you update G. Using Virtual BatchNorm, which is a variant of BatchNorm,
so it's a different type of BatchNorm that is used here.
And something called one-sided label
smoothing that I'm not going to talk about today because we don't have time.
So let's see some nice results now,
and that's the most fun part.
Um, so some of you have worked with word embeddings,
and you- you might know that word embeddings
are vectors that can encode the meaning of a word.
And you can compute operations sometimes on these- on these words.
So if you take, um,
if you take king minus queen,
it should be equal to man minus woman.
Operations like that.
That happens in the space of encodings. So here's the thing.
You can use a generator to generate faces,
and the paper is listed on the bottom here.
So you give a code that is a random code and it will give you an image of a- a face.
You can give it a second code,
it's going to give you a second image that is
different from the first one because the code was different.
You can give it a third one,
it's going to give you a third fa- third face.
The fun part is,
if you take code one minus code two plus code three.
So basically, image of a man with glasses minus image of
a man plus image of a woman will give you an image of a woman with glasses.
So [OVERLAPPING].
So this is interesting because it means that linear operations in
the latent space of codes have a direct impact on the image space.
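The arithmetic on codes itself is just element-wise vector algebra. The sketch below uses made-up two-dimensional codes in which the first coordinate is imagined to mean "glasses" and the second to mean "woman"; real latent codes are not labelled this way, and `combine_codes` is a hypothetical helper. A trained generator, which is not included here, would then be applied to the resulting code.

```python
def combine_codes(code_a, code_b, code_c):
    """Return code_a - code_b + code_c, element by element."""
    return [a - b + c for a, b, c in zip(code_a, code_b, code_c)]

# Toy, hand-labelled codes purely for illustration.
man_with_glasses = [1.0, 0.0]
man = [0.0, 0.0]
woman = [0.0, 1.0]

new_code = combine_codes(man_with_glasses, man, woman)
print(new_code)  # the code one would feed to the generator
```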
Okay. Let's look at something even better.
So you can use GANs for image generation.
Of course, these are very nice samples.
You see that sometimes,
GANs have problems with- [LAUGHTER] I don't know. I don't think that's a dog.
But these
are from StackGAN++, which is a very impressive GAN
that has been state of the art for a long time.
Okay. So let's see something fun.
Something called image-to-image translation.
So, uh, actually, the
project winner last quarter, in Spring, was a project dealing with exactly that.
Generating satellite images based on the map image.
So given the map image, generate the satellite image using a GAN.
So you see that instead of giving a latent code that was 100 dimensional,
you could give a very detailed code.
The code can be this image.
Right? And you have to find a way to constrain your network in
a certain way to push it to output
exactly the satellite image that corresponds to this map image.
There are many other results that are fun.
Converting horses to zebras and zebras to horses.
Um, and apples to oranges and oranges to apples.
So let's do a- a case study together.
Let's say our goal is to convert horses to zebras on images and vice versa.
Can you tell me what data we need?
Let's go quickly so that we have some time.
Horses and zebras?
Yeah. Horses and zebras.
Do you need paired images?
You know, like, do you need to have the same image of a horse as a zebra?
No.
Yeah. So the problem is, uh, okay,
we could have labeled images, you know,
like uh, a horse and its,
uh, zebra doppelganger in the same position.
Uh, and we could train a network to take one and output the other.
Unfortunately, not every horse has
a doppelganger that is a zebra, so we cannot do that.
Uh, so instead, we're going to do unpaired,
uh, unpaired generative adversarial networks.
It means we have a database of horses and a database of zebras.
But these are different horses and different zebras.
There's no one-to-one mapping between them.
There's no mapping at all. What architecture do you wanna use?
GAN?
Nice.
[LAUGHTER] GAN, not a [inaudible].
Okay. So let's see about the architecture and the cost.
So I'm going over it very quickly because it's a-
it's a very fun GAN with- it's called CycleGAN.
So the way we are going to work it out is we have a horse called
capital H. We want to generate the zebra version of this horse, right.
So we give it to a generator that we call G1.
You can call it H2Z,
like horse to zebra.
It should give us this horse H as a zebra, right.
And in fact, if we're training a GAN,
we need a discriminator.
So, we will add a discriminator that is going to be a binary classifier to tell us
if this image outputted by Generator 1 is real or not.
So this discriminator is going to take in some images of zebras, probably,
or-yeah, zebras or horses [NOISE],
and it's going to also take the generated images
and see which one is fake and which one is real.
On the other hand, we're going to do- and the vice versa is very important.
We need to enforce the fact that this horse G1 of H
should be the same horse as H. In order to do that,
we're going to create another gen- generator which is going to take the generated image,
and generate back the input image.
And this is where we will be able to enforce the constraints that G2 of G1 of
H should be equal to H. Do you see why this loop is super important?
Because if we don't have this loop,
we don't have a constraint on the fact that the horse
should be the- the zebra should be the horse as a zebra,
the same horse as H. So we'll do that and
we have a second discriminator to decide if this image is real.
This is one step, H2Z.
Another step might be Z2H where we start with a zebra,
give it to Generator 2,
generate the horse version of the zebra.
Discriminate, generate back the zebra version
of that horse and discriminate. Does that make sense?
So this is the general pattern using CycleGANs.
And what I'd like to go over is what loss should we minimize in
order to enforce the fact that we want
the horse to be converted to a zebra that is the same as the horse.
Can someone give me the terms that we need?
Someone wants to give it a try?
Go for it. Two minutes. Yes.
So you want to make sure that the picture in
the end that is of the zebra that you started off with,
matches the zebra that you started it with or
the horse that you started off matches the horse that you had originally.
Okay.
But at the same time, you also need to have Discriminator 2
identifying that the image is a real zebra or a real horse-
Yeah.
-because you don't want it to just sort of input
in the sample image and it output back to you the sample image.
Yeah, correct.
So I think you'd want to add the output of the cost function for Discriminator
2 to the cost that you get at for comparing the starting images.
Okay, that's correct. So you're saying we need
the classic cost functions that we've seen previously,
plus another one that is the matching between H and G2 of G1 of H,
and Z and G1 of G2 of Z.
Yes.
Correct. So we have all these terms.
One term to train D1,
which is the classic term we've seen,
differentiate real images from generated images.
G1 is what? Same. We are using the non-saturating cost on generated images.
Same for D2. Same for G2. These are classics.
The one we need to add to all of this is
the cycle cost, which is the distance between these terms,
G2 of G1 of H and H,
and the same thing for zebras.
Does that make sense? So you have the intuition to build that type of loss.
We just sum everything and it gives us the cost function we're looking for. Yeah.
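The sum of terms can be sketched as below, with several assumptions that go beyond what was said: images are flat lists of pixel values, the cycle distance is taken to be a mean absolute difference (the lecture only says "distance"), and all function and argument names are hypothetical. The discriminator outputs and the round-trip images G2 of G1 of H and G1 of G2 of Z are assumed to be computed elsewhere.

```python
import math

def mean(values):
    return sum(values) / len(values)

def d_cost(d_real, d_fake):
    """Discriminator cost: real images labelled 1, generated ones 0."""
    return (-mean([math.log(p) for p in d_real])
            - mean([math.log(1.0 - p) for p in d_fake]))

def g_cost(d_fake):
    """Non-saturating generator cost: mean of -log D(G(.))."""
    return -mean([math.log(p) for p in d_fake])

def cycle_cost(original, reconstructed):
    """Mean absolute difference between an image and its round trip.

    The L1 distance is an assumption; the lecture only says "distance".
    """
    return mean([abs(a - b) for a, b in zip(original, reconstructed)])

def cycle_gan_cost(d1_real, d1_fake, d2_real, d2_fake, h, h_back, z, z_back):
    """J_D1 + J_G1 + J_D2 + J_G2 + cycle cost for horses and zebras."""
    return (d_cost(d1_real, d1_fake) + g_cost(d1_fake)
            + d_cost(d2_real, d2_fake) + g_cost(d2_fake)
            + cycle_cost(h, h_back) + cycle_cost(z, z_back))

total = cycle_gan_cost([0.5], [0.5], [0.5], [0.5],
                       h=[0.0, 1.0], h_back=[0.0, 1.0],
                       z=[1.0, 0.0], z_back=[1.0, 0.0])
print(total)
```

In practice each network is updated on its own terms of this sum, and the cycle term is often given its own weight; both details are left out of this sketch.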
Can we use the same,
uh, D1 as D2?
It's the same [inaudible] recognized [inaudible]
Oh, the same cost function for D1 and D2?
Yeah. Could you use the same-
So, the, the- you could but it's not going to work that well.
I think- So I think there's a- there's a tiny mistake here,
is that, uh, the Zi here,
the small Zi should be small Hi,
and the small Hi on top should be a small Zi.
Because the Discriminator 1 is going to receive
generated samples that look like zebras because it came out of G1.
So you want the real database that you give it to be zebras as well.
To force generator one to output things that look like zebras,
and vice versa for the second one.
Okay? And this is my favorite.
So you can convert the ramen to a face and back to a ramen.
[LAUGHTER] It's the most fun application I found.
It's from Naritomi et al, and Takuya Tako.
So a Japanese research lab is working hard
to do face2ramen [LAUGHTER].
And actually, in two- in two to three weeks,
you will learn, um,
object detection, you know, to detect faces.
And if you learn that, maybe you can start a project to like
detect the face and then replace it by a ramen.
[LAUGHTER] Because I don't know, this is also a funny,
funny work by Naritomi.
Okay. Oh, this is a super cool application as well.
So let's look at that.
Okay. So we have- so this model is a conditional GAN that was conditioned on learning,
um, learning edges and generating cats based on the edges.
So I'm gonna try to draw a cat.
[LAUGHTER] Okay, sorry.
I cannot see [LAUGHTER].
Again, I'm not a good drawer- [LAUGHTER]. It's a cat.
Okay. It's going to download the model.
I hope it's gonna work. [LAUGHTER] Okay.
Yeah, I don't think it worked,
but it's supposed to work.
So you can generate cats based on,
on edges and you can do it for different things.
You can do it for a shoe.
So all these models have been trained for that. Okay.
Yeah, I have a question.
Yes, go for it.
[NOISE] So, so for this model,
would you have the specific things for the things that you want it to generate?
Like two things, so cats and shoes in this case?
Uh, sorry. Can you repeat?
Is it generalizable or do you have to train it specifically for the domains?
You have to train it specifically for the domain.
So like these models are different models that have [NOISE] been trained.
Okay.
Okay. I'm looking for my presentation,
[NOISE] I missed it. The presentation disappeared.
Okay. Another application is super resolution.
You can give a lower resolution image and generate
the super resolution version of it using GANs.
And this is pretty cool because you can get,
uh, a high resolution image,
down-sample it, and use this as the minimax game, you know.
[NOISE] Like you have
the high resolution version of the lower ver- ver- lower-resolution image.
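Building such training pairs can be sketched as follows. This is only an illustration under assumptions: the image is a grayscale grid stored as a list of rows, the down-sampling is a simple two-by-two average, and `downsample_2x` is a hypothetical helper, not the method of any specific super-resolution paper.

```python
def downsample_2x(image):
    """Average each 2x2 block of a grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]

high_res = [
    [1.0, 3.0, 0.0, 0.0],
    [5.0, 7.0, 0.0, 4.0],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 2.0, 8.0, 8.0],
]
low_res = downsample_2x(high_res)
pair = (low_res, high_res)  # (input to the generator, target image)
print(low_res)
```

The generator would receive `low_res` and be trained so that its output is hard to tell apart from `high_res`.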
Um, other applications can be privacy-preserving.
So some people have been working on- you know in medical- uh,
in the medical space privacy is a huge issue.
You cannot share a dataset among hospitals,
among medical teams it's common,
so people have been looking at generating a dataset that looks like a medical dataset.
If you train a model on this dataset,
it's going to give you the same type of parameters as the other one,
but this dataset is anonymized.
So they can share the anonymized data with each
other and train their model on that, without
being able to access, uh,
the information of the patient and who it is.
Um, manufacturing is important as well,
so GANs can generate, um,
very specific, uh, objects that can replace bones for humans,
personalized to, to the human body.
So same for dental.
If you lose a tooth, uh,
the technician can take a picture and decide what
the crown should look like.
The GAN can generate it.
Um, another topic is how to evaluate GANs, you know.
Um, you might say we can just look at the images and see if they
look real and it will give us an idea if the GAN is working well.
In fact, this is hard because maybe the images you're looking at are overfit to
the real samples you gave to the discriminator.
Uh, so how do you check that?
It's very complicated.
So human annotation is a big one,
where you would, uh,
[NOISE] you would build some software,
push it on the cloud and people all around the world are
gonna select which images look generated,
which images look not generated to see if a human can, can,
can compare your GAN to real-world data,
and how your GAN performs.
So it would look like that.
A web app indicates which image is fake, which image is real.
You can- you can do different experiments like you can show very quickly
an image for a fraction of a second and ask them was it real or not,
or you can give them unlimited time.
Different experiments can be led.
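One way the annotators' answers could be summarized, as a rough sketch: the fraction of generated images that humans judged to be real, next to the fraction of real images they recognized as real. The function name `human_eval` and the vote format are assumptions made for illustration, and the votes below are made-up values.

```python
def human_eval(votes):
    """Summarize annotations given as (is_real, judged_real) booleans, one per shown image."""
    fake_judgments = [judged for is_real, judged in votes if not is_real]
    real_judgments = [judged for is_real, judged in votes if is_real]
    # Share of generated images that annotators believed were real.
    fooling_rate = sum(fake_judgments) / len(fake_judgments)
    # Share of real images that annotators correctly called real.
    real_recognized = sum(real_judgments) / len(real_judgments)
    return fooling_rate, real_recognized

# Hypothetical answers from a web app: 4 real images and 4 generated ones.
votes = [
    (True, True), (True, True), (True, False), (True, True),
    (False, True), (False, False), (False, False), (False, True),
]
fooling_rate, real_recognized = human_eval(votes)
print(fooling_rate, real_recognized)
```

A fooling rate close to the rate at which real images are called real would suggest the annotators cannot tell the two sets apart. Running the same summary separately for the "fraction of a second" and "unlimited time" conditions keeps the two experiments comparable.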
Uh, there is another one that is more scalable because human annotation is very painful.
You know, every time you train a GAN,
you want to do that to verify if the GAN is working well. It takes a lot of time.
So instead of using humans,
why don't we use a network that is very good at classification?
In fact, the Inception network is a tremendous network that does classification.
We're going to give our image samples to
this Inception network and see what the network thinks of this image.
Does it think that it's a dog or not?
Does it look like a dog for the network or not?
And we can scale it and make it very quick.
And there is an Inception score that
we can talk about next week if we have time.
Uh, it measures the quality of
the samples and also the diversity of the samples.
I'll go over it next week, hopefully.
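The lecture does not define the score here, so the following is a sketch of the commonly used definition: the exponential of the average KL divergence between the classifier's prediction for each image, p(y|x), and the average prediction over all images, p(y). The probabilities would come from the Inception network; the small arrays below are hand-made stand-ins, and the function name and `eps` constant are assumptions.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (N, num_classes) array, each row the classifier's p(y|x) for one generated image."""
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal class distribution p(y) over all generated images.
    marginal = probs.mean(axis=0, keepdims=True)
    # KL(p(y|x) || p(y)) per image; eps only guards against log(0).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Confident and diverse: each image is clearly a different class.
sharp_and_diverse = np.eye(4)
# Confident but not diverse: every image is the same class.
sharp_but_collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))
# Not confident: the classifier cannot tell what any image is.
blurry = np.full((4, 4), 0.25)

for name, p in [("diverse", sharp_and_diverse),
                ("collapsed", sharp_but_collapsed),
                ("blurry", blurry)]:
    print(name, inception_score(p))
```

The score is high only when individual predictions are confident (quality) and the predictions spread across classes (diversity). Either failure alone pulls it down toward 1.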
Uh, there is another distance that is very popular, uh,
that has become increasingly popular recently, called the Fréchet Inception Distance.
And I'd advise you to check some of
these papers if you're interested in it for your projects.
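As a rough sketch of what this distance computes: it fits a Gaussian to feature vectors of real images and another to feature vectors of generated images, then takes the Fréchet distance between the two, ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). In practice the features come from an Inception network layer; the random features below are placeholders, and the function name is an assumption. The trace of the matrix square root is obtained here from the eigenvalues of the covariance product, which is one of several ways to compute it.

```python
import numpy as np

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (N, D) feature sets, with D >= 2."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^(1/2)) equals the sum of square roots of the eigenvalues of S_r S_g,
    # which are real and non-negative up to numerical noise.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)

# Placeholder features (hypothetical): 500 samples of 8-dimensional vectors.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 8))
shifted_feats = real_feats + 3.0  # same spread, different mean

print(frechet_distance(real_feats, real_feats))
print(frechet_distance(real_feats, shifted_feats))
```

Lower is better: identical feature statistics give a distance near zero, while a shift in the mean or a change in the covariance increases it.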
So just to end, um, for next Wednesday,
we'll have, uh, C2M3 and also the whole C3 modules.
Uh, you'll have three quizzes.
Be careful, these two quizzes,
C3M1 and C3M2, are longer than normal quizzes.
They're like wide case studies, so take your time
and go over them, um,
and you have one programming assignment.
Uh, make sure you understand the BatchNorm videos,
so that we can go over virtual BatchNorm, hopefully next week, together.
Um, and there is a hands-on section this Friday. Uh,
you will receive your project proposal as soon as possible, uh,
and meet with your project TAs to go over the proposal and
to make decisions regarding the next steps for your projects.
Uh, I'll stick around in case you have any questions. Okay. Thanks, guys.
