Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
As convolutional neural network-based image
classifiers correctly identify objects in
images and become ever more pervasive, scientists
at the University of Tübingen set out to
learn more about the inner workings of
these networks.
Their key question was whether they really
work similarly to humans or not.
Now, one way of doing this is visualizing
the inner workings of the neural network.
This is a research field of its own; I try
to report on it to you every now and then,
and we have talked about some damn good papers
on this, with more to come.
A different way would be to disregard the
inner workings of the neural network, in other
words, to treat it like a black box, at least
temporarily.
But what does this mean exactly?
Let’s have a look at an example!
And in this example, our test subject shall
be none other than this cat.
Here we have a bunch of neural networks that
have been trained on the classical ImageNet
dataset, and a group of human subjects.
This cat is successfully identified by all
classical neural network architectures and
most humans.
Now, onwards to a grayscale version of the
same cat.
The neural networks are still quite confident
that this is a cat, some humans faltered,
but still, nothing too crazy going on here.
Now let’s look at the silhouette of the
cat.
Whoa!
Suddenly, humans are doing much better at
identifying the cat than neural networks.
This is even more pronounced when we're only
given the edges of the image.
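The grayscale and edge stimuli in these experiments can be approximated with very simple image processing. Here is a minimal pure-Python sketch on a tiny raster of RGB tuples — illustrative only, not the stimulus-generation pipeline the authors actually used:

```python
# Sketch: grayscale conversion and a crude edge map for an RGB raster,
# stored as a list of rows of (R, G, B) tuples.

def to_grayscale(image):
    """Luminance-weighted grayscale (ITU-R BT.601 coefficients)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]

def edge_map(gray, threshold=30.0):
    """Mark pixels where the horizontal or vertical intensity
    gradient exceeds a threshold, similar in spirit to an edge filter."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = abs(gray[y][x + 1] - gray[y][x])
            gy = abs(gray[y + 1][x] - gray[y][x])
            if gx + gy > threshold:
                edges[y][x] = 1
    return edges

# A 2x2 "image": black next to white produces a strong vertical edge.
img = [[(0, 0, 0), (255, 255, 255)],
       [(0, 0, 0), (255, 255, 255)]]
gray = to_grayscale(img)
print(edge_map(gray))  # → [[1, 0], [0, 0]]
```

An edge-only stimulus like this keeps the shape information while discarding nearly all texture, which is exactly what makes it a useful probe.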
However, when looking at a heavily zoomed-in
image of the texture of an Indian elephant,
neural networks are very confident in their
correct guess, while some humans falter.
Ha!
We have a lead here.
It may be that as opposed to humans, neural
networks think more in terms of textures than
shapes.
Let’s test that hypothesis.
Step number one: Indian elephant.
This is correctly identified.
Now, cat — again, correctly identified.
And now, hold on to your papers — a cat
with an elephant texture.
And there we go: a cat with an elephant texture
is still a cat to us, humans, but, is an elephant
to convolutional neural networks.
After looking some more at the problem, they
found that the most common convolutional neural
network architectures that were trained on
the ImageNet dataset vastly overvalue textures
over shapes.
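This preference can be quantified on cue-conflict images such as the elephant-textured cat: among the trials where the classifier picked either the shape category or the texture category, count the fraction of shape choices. A small sketch of such a shape-bias metric, in the spirit of the paper's analysis (the trial data below is made up for illustration):

```python
def shape_bias(decisions):
    """Fraction of shape-consistent choices among trials where the
    prediction matched either the shape or the texture category.
    `decisions` is a list of (shape_label, texture_label, prediction)."""
    shape_hits = sum(1 for s, t, p in decisions if p == s)
    texture_hits = sum(1 for s, t, p in decisions if p == t)
    relevant = shape_hits + texture_hits
    return shape_hits / relevant if relevant else float("nan")

# Toy cue-conflict trials, e.g. a cat shape with elephant texture.
trials = [
    ("cat", "elephant", "elephant"),  # texture wins
    ("cat", "elephant", "elephant"),  # texture wins
    ("dog", "clock", "dog"),          # shape wins
    ("car", "bear", "bird"),          # neither cue: ignored
]
print(shape_bias(trials))  # → 0.333...
```

A texture-biased classifier scores near 0 on this metric; a shape-biased observer, like most humans, scores near 1.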
That is fundamentally different from how we
humans think.
So, can we try to remedy this problem?
Is this even a problem at all?
Neural networks need not think like humans,
but who knows, it's research - we might
find something useful along the way.
So how could we create a dataset that would
teach a neural network a better understanding
of shapes?
Well, that’s a great question, and one possible
answer is — style transfer!
Let me explain.
Style transfer is the process of fusing together
two images, taking the content of one image
and the style of the other.
So now, let’s take the ImageNet dataset,
and run style transfer on each of these images.
This is useful because it repaints the textures,
but the shapes are mostly left intact.
The authors call it the Stylized-ImageNet
dataset and have made it publicly available
for everyone.
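Conceptually, building such a dataset is a loop over the content images, repainting each one with a randomly chosen painting's style. A hedged sketch of that loop only — `apply_style_transfer` below is a hypothetical placeholder standing in for a real style-transfer model, not the authors' actual code:

```python
import random

def apply_style_transfer(content, style):
    """Hypothetical stand-in for a real style-transfer model;
    here it merely records which style was applied to which content."""
    return {"content": content, "style": style}

def stylize_dataset(content_images, style_images, seed=0):
    """Repaint every content image with a randomly drawn style,
    so textures change while shapes are (ideally) left intact."""
    rng = random.Random(seed)
    return [apply_style_transfer(img, rng.choice(style_images))
            for img in content_images]

stylized = stylize_dataset(["cat.jpg", "elephant.jpg"],
                           ["kandinsky.jpg", "monet.jpg"])
print(len(stylized))  # one stylized image per content image
```

The key design point is that every content image keeps its original class label: the shapes still say "cat", even though the textures now say "painting".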
This new dataset should coerce the neural
network into building a better understanding
of shapes, which would bring it closer to
human thinking.
We don’t know if that is a good thing yet,
so let’s look at the results.
And here comes the surprise!
When training a neural network architecture
by the name of ResNet-50 jointly on the regular
and the stylized ImageNet datasets, after a
little fine-tuning, they found two remarkable
things.
One, the resulting neural network now sees
the world more similarly to humans.
The blue squares on the right show that the
old networks' thinking is texture-based, while
the new neural networks, denoted with the orange
squares, are now much closer to the shape-based
thinking of humans, which is indicated with
the red circles.
And now hold on to your papers, because two,
the new neural network also outperforms the
old ones in terms of accuracy.
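Training jointly on the two datasets can be as simple as drawing batches from both sources in turn. Here is a pure-Python sketch of that batch-mixing logic only; the actual ResNet-50 training loop runs in a deep learning framework and is not shown:

```python
from itertools import islice

def batches(data, batch_size):
    """Split a dataset into consecutive batches."""
    it = iter(data)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def joint_batches(imagenet, stylized_imagenet, batch_size):
    """Alternate batches from the two datasets, so training
    sees both ordinary textures and repainted ones."""
    a = batches(imagenet, batch_size)
    b = batches(stylized_imagenet, batch_size)
    for batch_a, batch_b in zip(a, b):
        yield batch_a
        yield batch_b

mixed = list(joint_batches([1, 2, 3, 4], ["a", "b", "c", "d"], 2))
print(mixed)  # → [[1, 2], ['a', 'b'], [3, 4], ['c', 'd']]
```

Whether one alternates batches, concatenates the datasets, or fine-tunes afterwards is an implementation choice; the point is that the network can no longer rely on texture alone.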
Dear Fellow Scholars, this is research at
its finest - the authors explored an interesting
idea, and look where they ended up.
Amazing.
If you enjoyed this episode and you feel that
a bunch of these videos a month are worth
3 dollars, please consider supporting us on
Patreon.
This helps us get more independent and create
better videos for you.
You can find us at Patreon.com/twominutepapers,
or just click the link in the video description.
Thanks for watching and for your generous
support, and I'll see you next time!
