Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
We have had many episodes about wondrous new AI-related
algorithms, but today, we are going to talk
about AI safety, which is an increasingly
important field of AI research.
Deep neural networks are excellent classifiers,
which means that after we train them on a
large amount of data, they will be remarkably
accurate at image recognition.
So generally, accuracy is the quantity we try to maximize.
But no one has said a word about robustness.
And this is where these new neural-network-defeating
techniques come into play.
Earlier, we showed that we can fool neural
networks by adding carefully crafted noise
to an image.
If done well, this noise is barely perceptible
and can fool the classifier into looking at
a bus and thinking that it is an ostrich.
We often refer to this as an adversarial attack
on a neural network.
This is one way of doing it, but note that
we have to change many, many pixels of the
image to perform such an attack.
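The episode does not name a specific method, but one well-known way to craft such noise is the fast gradient sign method (FGSM). Here is a minimal sketch in PyTorch; `model`, `image`, and `true_label` are placeholders for any differentiable classifier, an input batch, and its label.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.007):
    # `model`, `image`, and `true_label` are placeholders: any
    # differentiable PyTorch classifier, an input batch, and its label.
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Nudge every pixel slightly in the direction that increases the loss;
    # for small epsilon, the change is barely perceptible to a human.
    return (image + epsilon * image.grad.sign()).detach()
```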
So the next question is clear.
What is the lowest number of pixel changes
that we have to perform to fool a neural network?
What is the magic number?
One would think that a reasonable number would
be at least a hundred.
Hold on to your papers because this paper
shows that many neural networks can be defeated
by only changing one pixel.
By changing only one pixel in an image that
depicts a horse, the AI becomes 99.9% sure
that it is seeing a frog.
A ship can also be disguised as a car, or,
amusingly, almost anything can be seen as
an airplane.
So how can we perform such an attack?
As you can see here, these neural networks
typically don't output a class directly,
but a set of confidence values.
What does this mean exactly?
The confidence values denote how sure the
network is that it sees, for instance, a
Labrador or a tiger cat.
To come to a decision, we usually look at
all of these confidence values and choose
the object type that has the highest confidence.
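To make this concrete, here is a tiny sketch of that decision rule; the class names and confidence values below are made up purely for illustration.

```python
import numpy as np

# Hypothetical confidence values (e.g., softmax outputs) for one image.
class_names = ["Labrador", "tiger cat", "ostrich", "bus"]
confidences = np.array([0.72, 0.21, 0.04, 0.03])

# The decision is simply the class with the highest confidence.
print(class_names[np.argmax(confidences)])  # -> Labrador
```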
Now clearly, we have to know which pixel position
to choose and what color it should be to perform
a successful attack.
We can do this by making a number of random
changes to the image and checking how well
each of these changes decreases the
network's confidence in the correct
class.
After this, we filter out the bad ones and
continue our search around the most promising
candidates.
We refer to this process as differential evolution,
and if we perform it properly, in the end,
the confidence value for the correct class
will be so low that a different class will
take over.
If this happens, the network has been defeated.
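As a rough illustration of this search, here is a toy differential-evolution sketch of a one-pixel attack; `predict` is a placeholder for a function returning the network's per-class confidence values, and the loop is a simplification rather than the paper's exact setup.

```python
import numpy as np

def one_pixel_attack(predict, image, true_class, pop_size=50, iters=100):
    # Toy differential-evolution sketch, not the paper's exact setup.
    # `predict(image)` is assumed to return per-class confidence values.
    h, w, _ = image.shape
    bounds = np.array([w - 1, h - 1, 255, 255, 255])
    # Each candidate is (x, y, r, g, b): one pixel position and color.
    pop = np.random.rand(pop_size, 5) * bounds

    def fitness(cand):
        x, y, r, g, b = cand
        perturbed = image.copy()
        perturbed[int(y), int(x)] = [r, g, b]
        # The attacker wants low confidence in the correct class.
        return predict(perturbed)[true_class]

    scores = np.array([fitness(c) for c in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # DE mutation: combine three other random candidates.
            a, b, c = pop[np.random.choice(pop_size, 3, replace=False)]
            trial = np.clip(a + 0.5 * (b - c), 0, bounds)
            score = fitness(trial)
            if score < scores[i]:  # keep whichever lowers confidence more
                pop[i], scores[i] = trial, score
    return pop[np.argmin(scores)]  # best single-pixel change found
```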
Now, note that this also means that we need
access to the confidence values the network
outputs for each class.
There is also plenty of research on
training more robust neural networks that
can withstand as many adversarial changes
to their inputs as possible.
I cannot wait to report on these works as
well in the future!
Also, our next episode is going to be about adversarial
attacks on the human visual system.
Can you believe that?
That paper is absolutely insane, so make sure
to subscribe and hit the bell icon to get
notified.
You don't want to miss that one!
Thanks for watching and for your generous
support, and I'll see you next time!
