Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
As we know from the series, neural network-based
techniques are extraordinarily successful
in defeating problems that were considered
to be absolutely impossible as little as ten
years ago.
When we'd like to use them for something,
choosing the right kind of neural network
is one part of the task, but usually the even
bigger problem is choosing the right architecture.
Architecture typically, at a bare minimum,
means the type and number of layers in the
network, and the number of neurons to be used
in each layer.
Bigger networks can learn solutions for more
complex problems.
So it seems that the answer is quite easy:
just throw the biggest possible neural network
we can at the problem and hope for the best.
But if you think that it is that easy or trivial,
you need to think again.
Here's why.
Bigger networks come at a cost: they take
longer to train, and even worse, if we have
a network that is too big, we bump into
the problem of overfitting.
Overfitting is the phenomenon when a learning
algorithm starts essentially memorizing the
training data without actually doing the learning.
As a result, its knowledge is not going to
generalize for unseen data at all.
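To make this concrete, here is a small illustrative sketch (not from the paper): a model with too much capacity, a degree-9 polynomial fit to just 10 noisy points, essentially memorizes the training data, while a more modest degree-3 model generalizes better. The underlying function and noise level are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# A handful of noisy training points from a simple underlying function.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)

# Held-out test points from the same (noise-free) function.
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test)

def fit_and_errors(degree):
    # Least-squares polynomial fit of the given degree,
    # returning (training error, test error) as mean squared errors.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

small_train, small_test = fit_and_errors(3)   # modest model
big_train, big_test = fit_and_errors(9)       # one coefficient per data point

# The degree-9 fit essentially memorizes the training points
# (lower training error), yet generalizes worse to unseen data.
print(big_train < small_train)
print(big_test > small_test)
```

The bigger model wins on the training set and loses on the held-out set, which is exactly the memorizing-student behavior described above.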
Imagine a student in a school who has a tremendous
aptitude in memorizing everything from the
textbook.
If the exam questions happen to be the same,
this student will do extremely well, but in
the case of even the slightest deviations,
well, too bad.
Even though people like to call this rote
learning, there is nothing about the whole
process that resembles any kind of learning
at all.
A smaller neural network, like a less knowledgeable
student who has done their homework properly,
would do way, way better.
So this is overfitting, the bane of so many
modern learning algorithms.
It can be kind of defeated by using techniques
like L1 and L2 regularization or dropout;
these often help, but none of them is a silver
bullet.
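As a quick illustrative sketch of one of these techniques (again, not the paper's method): L2 regularization adds a penalty on large weights, which pulls the learned parameters toward zero and discourages the model from fitting every quirk of the training noise. Here it is in its simplest setting, ridge regression, on made-up data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy linear data with many features, only a few of which matter.
n, d = 20, 15
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.normal(0, 0.1, n)

def ridge(X, y, lam):
    # Closed-form L2-regularized least squares:
    #   w = (X^T X + lam * I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_plain = ridge(X, y, 0.0)    # ordinary least squares
w_reg = ridge(X, y, 10.0)     # L2-regularized version

# The penalty shrinks the weight vector toward zero.
print(np.linalg.norm(w_reg) < np.linalg.norm(w_plain))  # True
```

Dropout works differently (it randomly switches off neurons during training), but the goal is the same: keep the network from leaning too heavily on any particular pattern in the training data.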
If you would like to hear more about these,
we've covered them in an earlier episode,
actually, two episodes. As always, the links
are in the video description for the more
curious Fellow Scholars out there.
So, the algorithm itself is learning, but
for some reason, we have to design its architecture
by hand.
As we discussed, some architectures, like
some students, of course, significantly outperform
other ones, and we are left to perform a lengthy
trial and error process to find the best ones by hand.
So, speaking about learning algorithms, why
don't we make them learn their own architectures?
And this new work is about an architecture
search algorithm that does exactly that.
I'll note that this is far from the first
crack at this problem, but it definitely is
a remarkable improvement over the state of
the art.
It represents the neural network architecture
as an organism and makes it evolve via genetic
programming.
This is just as cool as you would think it
is, and not half as complex as you may imagine
at first. We have an earlier episode on genetic
algorithms, and I wrote some source code as well,
which is available free of charge for everyone.
Make sure to have a look at the video description
for more on that, you'll love it!
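To get a feel for the idea, here is a tiny toy sketch of evolving architectures. Everything in it is a stand-in: an "architecture" is just a list of layer widths, and the fitness function is a made-up proxy, whereas the real algorithm trains and evaluates each candidate network on actual data.

```python
import random

random.seed(0)

def fitness(arch):
    # Hypothetical proxy fitness: pretend the sweet spot is
    # 3 layers of width 64. A real search would train the network
    # and use its validation accuracy here.
    depth_term = -abs(len(arch) - 3)
    width_term = -sum(abs(w - 64) for w in arch) / 64.0
    return depth_term + width_term

def mutate(arch):
    # Randomly widen/narrow a layer, or add/remove one.
    child = list(arch)
    op = random.choice(["widen", "narrow", "add", "remove"])
    if op == "widen":
        i = random.randrange(len(child))
        child[i] = min(child[i] * 2, 512)
    elif op == "narrow":
        i = random.randrange(len(child))
        child[i] = max(child[i] // 2, 1)
    elif op == "add":
        child.insert(random.randrange(len(child) + 1),
                     random.choice([8, 16, 32, 64]))
    elif op == "remove" and len(child) > 1:
        child.pop(random.randrange(len(child)))
    return child

# Evolution loop: repeatedly pick two individuals at random, discard
# the weaker one, and replace it with a mutated copy of the stronger.
population = [[8] for _ in range(16)]          # start with tiny networks
for step in range(2000):
    a, b = random.sample(range(len(population)), 2)
    if fitness(population[a]) < fitness(population[b]):
        a, b = b, a                            # a is now the fitter one
    population[b] = mutate(population[a])

best = max(population, key=fitness)
print(best, round(fitness(best), 2))
```

Even with this crude tournament scheme, the population drifts from tiny single-layer networks toward the (made-up) sweet spot, which is the core intuition behind evolving architectures instead of hand-designing them.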
In this chart, you can see the number of evolution
steps on the horizontal x axis, and the performance
of these evolved architectures over time on
the vertical y axis.
Finally, after taking about 1.5 days to perform
these few thousand evolutionary steps, the
best architectures found by this algorithm
are only slightly inferior to the best existing
neural networks for many classical datasets,
which is bloody amazing.
Please refer to the paper for details and
comparisons against state of the art neural
networks and other architecture search approaches;
there are lots of very easily readable results
reported there.
Note that this is still a preliminary work
and uses hundreds of graphics cards in the
process.
However, if you remember how it went with
AlphaGo, the computational costs were cut
down by a factor of ten within a little more
than a year.
And until that happens, we have learning algorithms
that learn to optimize themselves.
This sounds like science fiction.
How cool is that?
Thanks for watching and for your generous
support, and I'll see you next time!
