Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
A quick recap for the Fellow Scholars out
there who missed some of our earlier episodes.
A neural network is a machine learning technique
that was inspired by the human brain. It is
not a brain simulation by any stretch of the
imagination, but it was inspired by the inner
workings of the human brain. We can train
it on input and output pairs, such as images and their descriptions, for instance whether the images depict a mug or a bus.
The goal is that after training, we can give unknown images to the network and expect it to recognize whether there is a mug or a bus in them.
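As a rough illustration of that training setup, here is a toy supervised-learning sketch; the tiny feature vectors and the scikit-learn classifier are illustrative stand-ins for real images and a real network.

```python
from sklearn.neural_network import MLPClassifier

# Toy stand-ins for image features and their labels ("mug" or "bus").
train_X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = ["mug", "mug", "bus", "bus"]

# A small neural network; the architecture here is arbitrary.
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(train_X, train_y)           # learn from input-output pairs

# An "unknown" input the network has never seen during training.
print(net.predict([[0.85, 0.15]]))  # we hope it generalizes: "mug"
```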
It may happen that during training, it seems
that the neural network is doing quite well,
but when we provide the unknown images, it
falters and almost never gets the answer right.
This is the problem of overfitting, and intuitively,
it is a bit like students who are not preparing
for an exam by obtaining useful knowledge,
but students who prepare by memorizing answers
from the textbook instead. No wonder their
results will be rubbish on a real exam!
But no worries, because we have dropout, which
is a spectacular way of creating diligent students.
This is a technique where we create a network in which each neuron has a chance to be activated or disabled. A network that is filled with unreliable units. And I really
want you to think about this. If we could
have a system with perfectly reliable units,
we should probably never go for one that is
built from less reliable units instead. What is more, this piece of work proposes
that we should cripple our systems, and seemingly
make them worse on purpose. This sounds like
a travesty. Why would anyone want to try anything
like this?
And what is really amazing is that these unreliable
units can potentially build a much more useful
system that is less prone to overfitting.
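To make this concrete, here is a minimal NumPy sketch of what one dropout training pass does to a layer's activations; the function name and the drop probability of 0.5 are illustrative choices.

```python
import numpy as np

def dropout_train_step(activations, p_drop=0.5):
    """Classic dropout during training: each unit is independently
    disabled (zeroed) with probability p_drop for this pass."""
    mask = np.random.rand(*activations.shape) >= p_drop  # True keeps a unit
    return activations * mask

# Illustrative use: one layer's activations during a training pass.
h = np.array([0.8, 1.5, 0.3, 2.0])
print(dropout_train_step(h))  # roughly half the units are zeroed out
```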
If we want to win competitions, we have to
train many models and average them, as we
have seen with the Netflix prize winning algorithm
in an earlier episode. It also relates back
to the committee of doctors example that is
usually more useful than just asking one doctor.
And the absolutely amazing thing is that this
is exactly what dropout gives us.
It gives us the average of a very large number of possible neural networks, and to obtain it, we only have to train one network that we cripple here and there.
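As a rough sketch of how that averaging is realized in practice, in the classic formulation, no units are dropped at test time; instead, each activation is scaled by the keep probability, so one forward pass approximates the average of the exponentially many thinned networks sampled during training. The values below are illustrative.

```python
import numpy as np

def dropout_test_time(activations, p_drop=0.5):
    """Test-time pass of classic dropout: no units are dropped.
    Scaling by the keep probability (1 - p_drop) makes each unit's
    expected output match training, so this single pass stands in
    for averaging ~2^n possible sub-networks."""
    return activations * (1.0 - p_drop)

# Illustrative use: the same layer's activations at test time.
h = np.array([0.8, 1.5, 0.3, 2.0])
print(dropout_test_time(h))  # every unit active, each scaled by 0.5
```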
This procedure, without dropout, would normally take years, an exorbitant timeframe, and would also raise all kinds of pesky problems we really don't want to deal with.
To put it modestly, let's say that if we
are struggling with overfitting, we could
do a lot worse than using dropout. It indeed
teaches slacking students how to do their
homework properly.
Please keep in mind that using dropout also leads to longer training times; in my experience, between 2 and 10 times longer, but of course, it heavily depends on other external factors. So it is
indeed true that dropout is slow compared
to training one network, but it is blazing
fast at what it actually approximates, which
is training an exponential number of models.
I think dropout is one of the greatest examples
of the beauty and the perils of research,
where sometimes the most counterintuitive
ideas give us the best solutions.
Thanks for watching, and for your generous
support, and I'll see you next time!
