Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
Hold on to your papers, because this work
on AlphaGo is absolute insanity.
In the game of Go, the players put stones
on a board, where the objective is to surround
more territory than the opponent.
This is a beautiful game that is particularly
interesting for AI research, because the space
of possible moves is vastly larger than in
chess, which means that using any sort of
exhaustive search is out of the question and we
have to resort to smart algorithms that are
able to identify a small number of strong
moves within this stupendously large search
space.
The first incarnation of DeepMind's Go AI,
AlphaGo, uses a combination of a policy network
that is responsible for predicting promising moves,
and a value network that predicts the winner
of the game after playing it out to the end against
itself.
These are both deep neural networks and they
are then combined with a technique called
Monte Carlo Tree Search to be able to narrow
down the search in this large search
space.
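To make this a bit more concrete, here is a minimal sketch in Python of how a policy prior and a value estimate can guide a Monte Carlo Tree Search. The Node class, the placeholder "networks" and the tiny demo game are my own illustrative stand-ins, not DeepMind's actual implementation, which uses deep convolutional networks and the full PUCT machinery described in the paper.

import math
import random

class Node:
    def __init__(self, prior):
        self.prior = prior       # P(s, a): move probability from the policy network
        self.visits = 0          # N(s, a): visit count
        self.value_sum = 0.0     # W(s, a): accumulated value estimates
        self.children = {}       # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(parent, child, c_puct=1.5):
    # Exploitation (Q) plus an exploration bonus weighted by the policy prior.
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + u

def policy_network(state, legal_moves):
    # Stand-in: uniform priors; the real policy net singles out strong moves.
    return {m: 1.0 / len(legal_moves) for m in legal_moves}

def value_network(state):
    # Stand-in: random guess in [-1, 1]; the real value net predicts the winner.
    return random.uniform(-1.0, 1.0)

def search(root_state, legal_moves_fn, play_fn, simulations=200):
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, state, path = root, root_state, [root]
        # Selection: walk down the tree along the highest PUCT score.
        while node.children:
            move, child = max(node.children.items(),
                              key=lambda kv: puct_score(node, kv[1]))
            state, node = play_fn(state, move), child
            path.append(node)
        # Expansion: create children with priors from the policy network.
        moves = legal_moves_fn(state)
        for move, prior in (policy_network(state, moves).items() if moves else []):
            node.children[move] = Node(prior)
        # Evaluation: the value network scores the leaf position.
        value = value_network(state)
        # Backup: propagate the value up, flipping sign between the two players.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value = -value
    # Play the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

# Tiny demo on a made-up "game": states are integers, three moves per turn.
print(search(0, lambda s: [0, 1, 2] if s < 5 else [], lambda s, m: s + 1))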
This algorithm started out with a bootstrapping
process where it was shown thousands of games
that were used to learn the basics of Go.
Based on this, it is clear that such an algorithm
can learn to be as good as formidable human
players.
But the big question was, how could it possibly
become even better than the professionals
that it has observed?
How could the disciple become better than
its master?
The solution is that after it has learned
what it can from these games, it plays against
itself many, many times to improve its skills.
This second phase is the main part of the
training and the one that takes the most time.
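As a rough illustration of this second phase, here is what such a self-play loop might look like in Python. The play_one_game and update_networks functions below are trivial placeholders that I made up; in the real system, each game is generated with the tree search sketched above, and the update step runs gradient descent on the policy and value networks.

import random

def play_one_game(networks):
    # Placeholder: the real system plays a full game of Go against itself and
    # records (position, chosen move, final outcome) triples for training.
    outcome = random.choice([+1, -1])
    return [(("position", t), ("move", t), outcome) for t in range(10)]

def update_networks(networks, replay_buffer):
    # Placeholder: the real update step fits the policy and value networks to
    # the freshly generated self-play data by gradient descent.
    return networks

def self_play_training(networks, num_games=10_000):
    replay_buffer = []
    for game_index in range(num_games):
        replay_buffer.extend(play_one_game(networks))
        # Refit periodically, so later games are played by a stronger player.
        if (game_index + 1) % 1_000 == 0:
            networks = update_networks(networks, replay_buffer)
    return networks

networks = self_play_training(networks=None)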
Let's call this base algorithm AlphaGo Fan,
which was used to play against Fan Hui, a 2-dan
European Go champion, who was defeated 5 to
0.
This was a historic moment and the first time
an AI beat a professional Go player without
a handicap.
Fan Hui described his experience as playing
against a very strong and stable player and
he also mentioned that the algorithm felt
very human-like.
Some voiced their doubts within the Go community
and noted that the algorithm would never be
able to beat Lee Sedol, a 9-dan world champion
and winner of 18 international titles.
Just to give you an intuition of the difference,
based on their Elo ratings, Lee Sedol is expected
to beat Fan Hui 97 times out of 100 games.
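As a quick sanity check on that number, the standard Elo formula gives the expected score from the rating gap. The concrete ratings below are only illustrative choices of mine, but a gap of roughly 600 points indeed works out to about a 97% win rate.

def elo_win_probability(rating_a, rating_b):
    # Expected score of player A against player B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Illustrative ratings with a gap of about 600 points: roughly 97 wins in 100.
print(round(elo_win_probability(3500, 2900), 2))  # prints 0.97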
So a few months later, DeepMind organized
a huge media event where they challenged
Lee Sedol to play against AlphaGo.
This was a slightly modified version of the
base algorithm that used a deeper neural network
with more layers and was trained using more
resources than the previous version.
There was also an algorithmic change to the
policy networks; the details are available
in the paper in the video description. It is
a great read, make sure to have a look.
Let's call this algorithm AlphaGo Lee.
This event was watched all around the world
and can perhaps be compared to Kasparov's
public chess games against Deep Blue.
I have the fondest memories of waking up super
early in the morning, jumping out of bed
in excitement to watch all these Go matches.
And in a long and nail-biting series, Lee Sedol
was defeated 4 to 1 by the AI.
With significantly less media attention, the
next phase came bearing the name AlphaGo Master,
which used around ten times fewer tensor processing
units than AlphaGo Lee and became an even
stronger player.
This algorithm played against human professionals
online in January 2017 and won all 60 matches
it had played.
This is insanity, but if you think that's
it, well, hold on to your papers now.
In this newest work, AlphaGo has reached its
next form, AlphaGo Zero.
This variant does not have access to any human-played
games in the first phase and learns
completely through self-play.
It starts out from absolutely nothing, with
just the knowledge of the rules of the game.
It was trained for 40 days, and by day 3,
it reached the level of AlphaGo Lee, which
is above world champion level.
Around day 21, it hits the level of AlphaGo
Master, which is practically unbeatable for
any human player.
And get this: at 40 days, this version surpasses
all previous AlphaGo versions and defeats
the previously published, world-beating version
100 games to 0.
This has kept me up for several nights now
and I am completely out of words.
In this version, the two neural networks are
fused into one, which can be trained more
efficiently.
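To give a feel for what such a fused network can look like, here is a small PyTorch-style sketch with one shared trunk and two heads. The layer sizes and the 17 input planes are illustrative choices on my part; the real AlphaGo Zero network is a much deeper residual tower.

import torch
import torch.nn as nn

BOARD = 19  # standard Go board size

class DualHeadNet(nn.Module):
    def __init__(self, in_planes=17, channels=64):
        super().__init__()
        # Shared trunk: convolutional features reused by both heads.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Policy head: one logit per board point plus one for passing.
        self.policy_head = nn.Linear(channels * BOARD * BOARD, BOARD * BOARD + 1)
        # Value head: a single number in [-1, 1] predicting the winner.
        self.value_head = nn.Linear(channels * BOARD * BOARD, 1)

    def forward(self, x):
        features = self.trunk(x).flatten(start_dim=1)
        policy_logits = self.policy_head(features)
        value = torch.tanh(self.value_head(features))
        return policy_logits, value

net = DualHeadNet()
planes = torch.zeros(1, 17, BOARD, BOARD)     # dummy input position
policy_logits, value = net(planes)
print(policy_logits.shape, value.shape)       # (1, 362) and (1, 1)

In the paper, both heads are trained jointly with one combined loss: cross-entropy against the search probabilities for the policy head and mean squared error against the game outcome for the value head.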
It is beautiful to see these curves as they
show this neural network starting from a random
initialization.
It knows the rules, but beyond that, it is
completely clueless about the game itself,
and it rapidly becomes practically unbeatable.
And I left the best part for last: it uses
only a single machine.
I think it is fair to say that this is history
unfolding before our eyes.
What a time to be alive!
Congratulations to the DeepMind team for this
remarkable achievement.
As for me, I love talking about research
to a wider audience, and it is a true privilege
to be able to tell these stories to you.
Thank you very much for your generous support
on Patreon, and for making it possible for me
to spend more and more time on what I love most.
Absolutely amazing.
And now, I know it's a bit redundant, but
from muscle memory, I'll sign out the usual
way.
Thanks for watching and for your generous
support, and I'll see you next time!
