Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
In machine learning, we often encounter classification
problems where we have to decide whether an
image depicts a dog or a cat. Let's look at an intuitive but simplified example where we imagine that the red dots represent the dogs, and the green ones are the cats. We first
start learning on a training set, which means
that we get a bunch of images that are
points on this plane, and from these points
we try to paint the parts of the plane red
and green. This way, we can specify which
regions correspond to the concept of dogs
and cats.
After that, we'll get new points that we don't
know anything about, and we'll ask the algorithm,
for instance, a neural network to classify
these unknown images, so it tells us whether
it thinks that it is a dog or a cat. This
is what we call a test set.
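To make this train-and-test idea concrete, here is a minimal sketch in Python with scikit-learn; the synthetic 2D points standing in for the dog and cat images, the cluster positions, and the tiny network are all illustrative assumptions, not the exact setup shown in the video.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 2D points standing in for the images: dogs (label 0)
# cluster on the left of the plane, cats (label 1) on the right.
rng = np.random.default_rng(42)
dogs = rng.normal(loc=[-1.0, 0.0], scale=0.8, size=(100, 2))
cats = rng.normal(loc=[+1.0, 0.0], scale=0.8, size=(100, 2))
X = np.vstack([dogs, cats])
y = np.array([0] * 100 + [1] * 100)

# Learn on the training set, then ask the network to classify
# points it has never seen before -- the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy: ", clf.score(X_test, y_test))
```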
We have had a lot of fun with neural networks and deep learning in previous Two Minute Papers episodes; I've put some links in the description box, check them out!
In this example, it is reasonably easy to
tell that the reds roughly correspond to the
left, and the greens to the right. However,
if we just jump on the deep learning hype train and don't know much about neural networks, we may get extremely poor results
like this.
What we see here is the problem of overfitting.
Overfitting means that our beloved neural network does not learn the concept of dogs or cats; it just tries to adapt as much as possible to the training set.
As an intuition, think of poorly made real-life
exams. We have a textbook where we can practice
with exercises, so this textbook is our training
set.
Our test set is the exam. The goal is to learn
from the textbook and obtain knowledge that
proves to be useful at the exam.
Overfitting means that we simply memorize
parts of the textbook instead of obtaining
real knowledge. If you're on page 5, and you
see a bus, then the right answer is B. Memorizing
patterns like this is not real learning.
The worst case is if the exam questions are
also from the textbook, because you can get
a great grade just by overfitting. So, this
kind of overfitting has been a big looming
problem in many education systems.
Now the question is, which kind of neural
network do we want? Something that works like
a lazy student, or one that can learn many
complicated concepts?
If we're aiming for the latter, we have to
combat overfitting, which is the bane of so
many machine learning techniques.
Now, there are several ways of doing that, but today we're going to talk about one possible solution by the name of L1 and L2 regularization.
The intuition here is that the deeper
and bigger neural networks we train, the more
potent they are, but at the same time, they
get more prone to overfitting. The smarter
the student is, the more patterns he can memorize.
One solution is to hurl a smaller neural network
at the problem. If this smaller version is
powerful enough to take on the problem, we're
good. A student who cannot afford to memorize
all the examples is forced to learn the actual
underlying concepts.
However, it is very possible that this smaller
neural network is not powerful enough to solve
the problem. So we need to use a bigger one.
But, bigger network, more overfitting. Damn.
So what do we do?
Here is where L1 and L2 regularization comes
to save the day. It is a tool to favor simpler
models instead of complicated ones. The idea
is that the simpler the model is, the better
it transfers the textbook knowledge to the
exam, and that's exactly what we're looking for.
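In code form, the idea is simply to add a penalty on the size of the weights to whatever loss the network already minimizes. The sketch below is a generic illustration of the two penalty terms, not any particular library's API; the name `lam` for the regularization strength is an assumption for this example.

```python
import numpy as np

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Add an L1 or L2 complexity penalty to a plain data loss.

    L1 sums the absolute values of the weights and tends to push
    many of them to exactly zero; L2 sums the squared weights and
    keeps all of them small. `lam` controls how strongly we favor
    the simpler model over fitting the training set perfectly.
    """
    w = np.concatenate([layer.ravel() for layer in weights])
    if kind == "l1":
        penalty = lam * np.sum(np.abs(w))
    else:
        penalty = lam * np.sum(w ** 2)
    return data_loss + penalty
```

The larger `lam` is, the more the optimizer is rewarded for shrinking the weights rather than memorizing the training set, which is exactly what favoring a simpler model means in practice.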
Here you see images of the same network with
different regularization strengths. The first
one barely helps at all, and as you can see,
overfitting is still rampant. With a stronger
L2 regularization, you see that the model
is simplified substantially, and is likely
to perform better on the exam. However, if
we add more regularization, it might be that
we simplified the model too much, and it is
almost the same as a smaller neural network
that is not powerful enough to grasp the underlying
concepts of the exam. Keep your neural network
as simple as possible, but not simpler.
One has to find the right balance, which is an art in itself, and it shows that training deep neural networks takes a bit of expertise. It is more than just a plug-and-play tool that solves every problem by magic.
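One common way to search for that balance, sketched below under the same synthetic-data assumptions as the earlier snippet, is to sweep the regularization strength and watch where training and test accuracy diverge. In scikit-learn's MLPClassifier, the L2 strength is the `alpha` parameter.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Same synthetic dogs-vs-cats points as in the earlier sketch.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([-1.0, 0.0], 0.8, (100, 2)),
               rng.normal([+1.0, 0.0], 0.8, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Sweep the L2 strength: too little and the big network overfits
# (train accuracy far above test accuracy); too much and the model
# is oversimplified and underfits both sets.
for alpha in [1e-4, 1e-2, 1.0, 100.0]:
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), alpha=alpha,
                        max_iter=5000, random_state=0)
    clf.fit(X_train, y_train)
    print(f"alpha={alpha:8.4f}  train={clf.score(X_train, y_train):.2f}"
          f"  test={clf.score(X_test, y_test):.2f}")
```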
If you want to play with the neural networks
you've seen in this video, just click on the
link in the description box. I hope you'll
have at least as much fun with it as I had!
Thanks for watching, and for your generous
support, and I'll see you next time!
