(engaging music)
- Alright, hi everyone. Hi.
So if you ever watch the TV show CSI,
you may have noticed that
while in entertainment
value it's pretty good,
in terms of accuracy
of portraying what
computers can actually do,
it is not as good.
(audience laughs)
I found this great episode
where they're like,
we need to find the killer's IP address,
and someone says, I know, I'll code a GUI
in Visual Basic to do that.
(audience laughs)
And, I have no words.
Well, part of me wants to say,
have you heard of React Native?
(audience laughs)
Anyway, the most kind of egregious
and consistent offense of this type
that you see on TV shows is of course
the infamous zoom and enhance,
where you have this low-resolution photo
of a license plate or the criminal's face
and you need to upscale it magically
into this perfectly detailed image
that you can use to solve the crime.
Now, as people who know
something about computers,
you probably see this and think
that is completely impossible.
But as it turns out,
with recent developments
in machine learning, it's
actually kind of possible.
We're gonna show, I'm gonna
show kind of how that works.
So imagine that you're an engineer
at the tech department
at the police department
in CSI, and you're sitting in your office,
questioning your coworker's sanity,
and your boss walks in and says,
hey we have this image of this face
and it looks like that, can you please
just enhance it like we do all the time?
(audience laughs)
And you say well, it
doesn't really work that...
Okay, we'll give it a shot.
So where do we even start?
Well, I know what you're all thinking.
(audience laughs)
No, the answer is not to use...
That was yesterday, enough
Excel for one weekend.
Alright, but seriously,
what are we gonna use,
I guess Photoshop?
So you know, in Photoshop
you can make an image bigger
just by changing the
number of pixels in it.
And it uses this thing
called bicubic interpolation,
which is a really simple algorithm.
All it does is, you kind
of put some new blank
pixels in between the pixels you already
have and for each pixel
you kind of fill it in
based on the pixels around it.
Just do that for every pixel
and you get this new image.
So if we do that we get
something that looks like that.
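The "fill in the new pixels from the pixels around them" idea can be sketched in a few lines of NumPy. For simplicity this sketch is the bilinear version, where each new pixel is a weighted average of its 2x2 neighborhood; Photoshop's bicubic does the same thing with a 4x4 neighborhood and cubic weights, but the structure is identical:

```python
import numpy as np

def bilinear_upscale(img, factor):
    """Upscale a 2D grayscale image: each new pixel is filled in
    as a weighted average of the source pixels around it."""
    h, w = img.shape
    out = np.zeros((h * factor, w * factor))
    for i in range(h * factor):
        for j in range(w * factor):
            # Map the new pixel back to (fractional) source coordinates.
            y, x = i / factor, j / factor
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            # Weighted average of the four surrounding source pixels.
            out[i, j] = (img[y0, x0] * (1 - dy) * (1 - dx)
                         + img[y0, x1] * (1 - dy) * dx
                         + img[y1, x0] * dy * (1 - dx)
                         + img[y1, x1] * dy * dx)
    return out

small = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
big = bilinear_upscale(small, 2)   # a 4x4 image, smoothly blended
```

Notice there's no new information here: every output pixel is just a blend of inputs, which is exactly why the result looks soft.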
You know, it's not great,
but it's kind of the
best we have in the real world.
So we show it to our boss...
(audience laughs)
Alright, it's not gonna cut it.
Alright, so it's 2017, we
have other things we can try.
Like maybe a neural network?
So what is a neural network, right?
All a neural network is
is a function that takes
some sort of input and turns
it into some sort of output.
Except instead of
actually writing the code
for the function, we
just train the function
based on a bunch of examples.
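As a toy illustration of "training a function from examples instead of writing it": here a one-weight "network" f(x) = w * x learns the rule y = 2x purely by nudging its weight against examples. This is obviously nothing like the real upscaling network, just the training idea at its smallest:

```python
# We never write the rule y = 2x ourselves; the function learns it
# from input/output examples, the same way the upscaler will learn
# from (low-res, high-res) pairs.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0      # the network's single parameter, starting from a bad guess
lr = 0.05    # learning rate

for _ in range(200):
    for x, y in examples:
        guess = w * x
        error = guess - y     # how far off was the guess?
        w -= lr * error * x   # nudge w to shrink the error
```

After training, w lands very close to 2 without anyone ever coding that in.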
So in this case what we
want is a neural network
that takes this
low-resolution photo and turns
it into the high-resolution
version of the same photo.
And all we need to train
that function is a bunch
of pairs of low-resolution
face, high-resolution face.
So we can actually get those online.
There's this great data
set of like 200,000
photos of celebrity faces.
So that gives us our high-res versions,
and for each one we can just downscale it
to get the low-res version,
and that actually gives us
a pretty nice training set.
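The pair-building step is simple enough to sketch. Here's one way to do the downscaling in NumPy, by averaging each 2x2 block of pixels (block-averaging is just one reasonable choice of downscale method; the array here is a stand-in for a real face photo):

```python
import numpy as np

def downscale(img, factor=2):
    """Make a low-res version of an image by averaging each
    factor x factor block of pixels."""
    h, w = img.shape
    return img.reshape(h // factor, factor,
                       w // factor, factor).mean(axis=(1, 3))

high_res = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a photo
low_res = downscale(high_res)                        # the network's input

# (low_res, high_res) is one training pair; repeat for every photo.
```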
So what we're gonna do
is we feed each of these
into our neural network
and for each training pair
what we do is we give the
low-res version to our
neural network and then it
takes its current best guess
at what the up-scaled
version should look like.
But because we're in the training process
we actually have the real high-res
version available as well.
What we wanna do is on
each iteration of training
make the neural network
a little bit better
at actually producing the real answer.
So what we can do is
kind of do a per-pixel
subtraction between these two images
and see where they're different.
And you notice there's
this bright red pixel here?
That's because the neural network guessed
that that bottom left pixel
is black but it's actually
white in the real
answer, so it was way off
on that pixel so we
kind of want to tell it,
hey you should fix that next time.
And if we take the total
per-pixel difference
for the entire image, that gives us
what's called our loss
function which is basically
just what we're gonna
try to tell the network
to minimize as we train it.
Which makes sense, right,
because if that difference
goes to zero then that
means our neural network
is perfect at reproducing the real
high-resolution image every single time.
And it's never gonna get
to zero, but we train
for a while with that loss function
and we just tell it to minimize
that total per-pixel difference
and it can get pretty low.
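That loss function can be sketched directly. The example below mirrors the bright-red-pixel situation from a moment ago: one pixel in the guess is way off, and the per-pixel difference flags it (this version uses the mean absolute difference; a summed or squared version works the same way):

```python
import numpy as np

def pixel_loss(guess, target):
    """Per-pixel difference between the network's guess and the
    real high-res image -- the quantity training tries to minimize."""
    return np.abs(guess - target).mean()

target = np.array([[1.0, 0.0],
                   [0.0, 1.0]])   # the real high-res answer
guess = np.array([[1.0, 0.0],
                  [1.0, 1.0]])    # bottom-left pixel is way off

loss = pixel_loss(guess, target)  # nonzero: "fix that next time"
```

A perfect guess gives a loss of exactly zero, which is why driving this number down is the training objective.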
So great, that was pretty easy.
So let's see how it worked out.
So we give it our low-res image...
(audience laughs)
Not quite the silver
bullet that neural networks
were supposed to be, interesting.
So we compare it to the
bicubic interpolation result
and it doesn't really look
that much better, right?
But you know, whatever,
we used a neural network,
it was cool, so let's show it to the boss.
No no, not good enough.
This is not gonna cut it for TV quality.
We need to figure out what's going on.
So why didn't the neural
network work that well?
Well it actually comes back to
our choice of loss function.
So if we think about what's going on here,
it's kind of illustrative
to use an example
where there's a sharp edge in the image.
So you see here the low-resolution image
you can kind of see that edge,
and if our neural network
goes out on a limb
and kind of draws a sharp edge
in the high-res image,
what you see here is that
it actually got it a little bit off
from where the real edge was.
And when that happens, our
loss function gives a really
high loss because the
sharp edge that it drew
wasn't exactly where
the real sharp edge is.
So if our neural network
kind of goes out on a limb
and tries to draw a
sharp edge, it's never
gonna get it exactly
right and so it really
gets penalized for that.
And so what we end up
incentivizing our neural network
to do with the loss function we chose
is to kind of generate this blurry blob
that will never be too far
off from the real answer.
So it kind of limits
how far off it can be.
But that's not what we want for our
TV quality perfect face, right?
We want it to generate
this realistic sharp image,
so how can we do that?
Well, we can come up
with a new loss function
that minimizes that total
per-pixel difference
but also makes the guess look
kind of sharp and realistic.
So how can we do that?
Well we wanna add
something else to the loss.
And that's where we come to
generative adversarial networks.
So this is a cool idea
that was invented in 2014
by a graduate student
named Ian Goodfellow.
And it was inspired by adversarial games.
So as an example of an adversarial game,
you might imagine the example of people
counterfeiting money.
So the counterfeiters make some fake bills
and they look okay, and
they work for a while
but eventually the police catch on
and they develop these better techniques
for catching these counterfeit bills.
But then the counterfeiters
catch onto that
and they're gonna develop
some better counterfeit bills,
and the police catch onto
that, and so on, and so on.
And you see this sort
of adversarial dynamic
in lots of different
places like online fraud
or DRM in music, things like that.
(audience laughs)
And so what if there was a way to apply
this idea to our neural network?
And it turns out there is.
So what we do is we have
the same low-res high-res
training example and we
take the neural network
we had before and we
call it the generator,
kind of like the counterfeiter.
And so it's still trying
to produce that high-res
image and we have our real high-res image
but the difference is
that instead of doing
the per-pixel difference, we add a new
neural network into the system
called the discriminator.
And it's kind of like the police.
So its goal is, it
gets an image as input
and its output is the probability
that that image is actually
a real high-res image
rather than a fake high-res image
generated by the generator.
And so we train it to
get really good at that,
and its goal is to get
it right every time.
Is this real or is this fake?
We just kind of randomly give it either
a real or a fake each time.
Now the generator on the other hand,
its goal now becomes to
minimize the accuracy
of the discriminator, that is,
to fool that other neural network.
And so what that ends up
doing is incentivizing
the generator network to produce images
that look realistic and
indistinguishable from real ones.
So the one last step is
that every time we train
the discriminator gives
feedback back to the generator
that says hey, I detected
that this image was fake.
And here's how you can make
it a little better this time
that would make it harder for me to tell.
It would be kind of as if the police were
telling the counterfeiters
like, what they were doing,
and the counterfeiters
were learning from that.
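The two objectives can be sketched numerically. This is a minimal sketch with made-up discriminator outputs, using binary cross-entropy (the standard GAN loss); the 0.001 weight on the adversarial term is just an illustrative value, not the one from any particular paper:

```python
import numpy as np

def bce(prob, label):
    """Binary cross-entropy for one prediction: low when the
    predicted probability matches the true label."""
    eps = 1e-12  # guard against log(0)
    return -(label * np.log(prob + eps)
             + (1 - label) * np.log(1 - prob + eps))

# Suppose the discriminator outputs the probability an image is real:
d_on_real = 0.9   # fairly sure the real photo is real
d_on_fake = 0.2   # fairly sure the generator's image is fake

# Discriminator ("police") loss: be right on both kinds of input.
d_loss = bce(d_on_real, 1.0) + bce(d_on_fake, 0.0)

# Generator ("counterfeiter") adversarial loss: make the
# discriminator call the fake real.
g_adversarial = bce(d_on_fake, 1.0)

# The full generator loss mixes both objectives: the pixel term keeps
# the guess close to the real image, the adversarial term keeps it sharp.
pixel = 0.05                             # per-pixel loss from before
g_loss = pixel + 0.001 * g_adversarial   # illustrative weighting
```

Training alternates between the two: a step that improves d_loss, then a step that improves g_loss, each network getting better at beating the other.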
So we do that, and we set that up.
And we train for a while
and you can see, this is
as we're training the results
are getting better and better.
And they start out looking pretty weird
but as we go, they start
getting kind of sharper
and converging on something
that looks pretty realistic.
And so now, let's see how it worked.
So we just pull out our
generator network now
which has been trained in
the GAN structure, and...
Isn't that pretty cool?
So we kind of just imagined this face
just out of these pixels.
If you compare it to the original result
from the earlier neural
network it's way better.
Alright, boss is happy.
(audience laughs)
So we can kind of see what this looks like
with a couple other faces.
And you see it produces these results
that look pretty realistic.
If you look at what the original
high-res image actually
was, you can see that
it doesn't quite match up.
So that's kind of the catch here, right?
Is that the information
needed to reconstruct
the face isn't actually in the image.
So we're just kind of making it up.
But we don't really care
about that, it's just TV,
it doesn't really matter if
we catch the wrong person.
(audience laughs)
So it's not CSI ready
yet in the real world.
You can also, I tried it on my own face.
I found out that it doesn't
always work that well.
So this is me.
(audience laughs)
It gave me like a really long nose
and no mouth, I don't know why.
The other catch is that
we only train it on faces,
so if we try to upscale the Bang Bang Con
logo, for example, it's gonna try to like,
make a face out of it.
(audience laughs)
I think I see a face in there somewhere,
I'm not really sure.
So people have done all
sorts of crazy stuff
with this technique, it's
blowing up in ML these days.
You can dump a bunch of
photos of zebras and horses
into a GAN and it'll learn
how to convert between them.
It doesn't always work, either.
(audience laughs)
Nice try, I guess.
What if the input was a sketch
instead of another photo?
You can generate a
realistic photo of a cat.
You may have seen this demo floating
around online, it uses GANs.
I tried drawing a cat myself
and it actually worked pretty well.
(audience laughs)
And you know, one other
crazy example out of many.
What if you take captioned
images and try to convert
just from text describing a
bird to a photo of that bird?
And that actually, people
have gotten that to work.
And so this is just scratching the surface
of what people are doing
with GANs these days,
it's really cool.
I'd encourage you to go on Google
and kind of check out the
other stuff that's happening.
That's it, thanks.
(audience applause)
