Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
A few months ago, AlphaGo played and defeated
Fan Hui, a 2 dan master and European champion
player in the game of Go. However, the next
opponent, Lee Sedol is a 9 dan master and
world champion player. Just to give an intuition
of the difference, Lee Sedol is expected to
beat Fan Hui 97 times out of 100 games. Google
DeepMind had 6 months of preparation for this bout.
Five matches were played over five days. In
my timezone, the matches started around 4
am, and the results would usually pop up exactly
a few minutes after I woke up. It was amazing.
I could barely fall asleep I was so excited
for the results, and when I woke up, I kissed
my daughter and immediately ran to my computer
to see what was going on.
Most people were convinced that Lee Sedol
was going to beat the machine 5-0, and I was
stunned to see AlphaGo triumph over Lee
Sedol in the first match, and then the second,
and then the third. Huge respect for both
Google DeepMind for putting together such
a spectacular algorithm and for Lee Sedol
who played extremely well under enormous pressure.
He is indeed a true champion.
The game of Go has a stupendously large search
space that makes it completely impossible
to check every move and choose the best.
What is also not often talked about is that
processing through many moves is one thing,
but judging which move is advantageous and
which is not is just as difficult as the
search itself. The definition of the best
move is not clear-cut by any stretch of the
imagination.
We also have to look into the future and simulate
the moves of the opponent. I think it is easy
to see that the difficulty of this problem
is completely out of this world.
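To get a feel for the scale, here is a back-of-the-envelope calculation in Python. The branching factor (~250 legal moves per turn) and game length (~150 moves) are commonly cited approximations, not exact figures.

```python
# Rough estimate of the size of Go's game tree, assuming an average
# branching factor of ~250 legal moves and a typical game length of
# ~150 moves (both commonly cited approximations).
branching_factor = 250
game_length = 150

game_tree_size = branching_factor ** game_length
print(f"roughly 10^{len(str(game_tree_size)) - 1} move sequences")
```

That is vastly more than the estimated number of atoms in the observable universe (around 10^80), which is why checking every move is hopeless.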
A neural network is a crude approximation
of the human brain, just like a stick figure
is a crude approximation of a human being.
In this work, policy networks are used to
reduce the size of the search space, and value
networks are used to predict the expected
outcome of a move. This value network basically
tries to determine who will win if a sequence
of moves is made.
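As a rough sketch of what a value network computes, here is a toy feed-forward network in Python with NumPy. The board size, architecture, and random weights are all illustrative, so the output is meaningless; AlphaGo's real value network is a deep convolutional network trained on millions of positions. Only the shape of the computation is shown.

```python
import numpy as np

# Toy "value network": maps a board position to an estimated win
# probability. A 9x9 board and one hidden layer, purely illustrative.
rng = np.random.default_rng(0)

BOARD_CELLS, HIDDEN = 9 * 9, 32
w1 = rng.standard_normal((BOARD_CELLS, HIDDEN)) * 0.1
w2 = rng.standard_normal((HIDDEN, 1)) * 0.1

def value_network(board):
    x = board.reshape(-1).astype(float)   # flatten board to a vector
    h = np.tanh(x @ w1)                   # hidden layer
    logit = (h @ w2)[0]
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid -> P(current player wins)

board = rng.integers(-1, 2, size=(9, 9))  # -1 white, 0 empty, 1 black
p_win = value_network(board)
print(f"estimated win probability: {p_win:.2f}")
```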
To defeat AlphaGo, or any computer opponent,
playing non-traditional moves that it surely
hasn't practiced sounds like a great idea.
However, there is no database involved per
se; this technique simulates the moves
until the very end of the game, so non-traditional
"weird" moves won't throw it off.
It is also very important to know that the
structure of AlphaGo is not like Deep Blue
for chess. Deep Blue was specifically designed
to maximize metrics that are likely to lead
to victory, such as pawn advantage, king safety,
tempo and more. AlphaGo doesn't do any of
that. It is a general technique that can learn
to solve a large number of different problems.
I cannot overstate the significance of this.
Almost the entirety of computer science research
revolves around creating algorithms that are
specifically tailored to one task. Different
task, different research projects, different
algorithm. Imagine how empowering it would
be to have a general algorithm that can solve
a large amount of problems. It's incredible!
Just as people who don't speak a word of Chinese
can write an artificial intelligence program
to recognize handwritten Chinese text, someone
who hasn't played more than a few games can
write a chess or Go program that is beyond
the skill of most professional players. This
is a wonderful testament of the power of mathematics
and science.
It was quite surprising to see that AlphaGo
played seemingly suboptimal moves when it
was ahead to reduce variance and maximize
its chance of victory. Take a look at DeepMind's
other technique by the name Deep Q-Learning
that plays space invaders on a superhuman
level. This shot, at first, looks like a blunder,
but if you wait it out, you'll see how brilliant
it really is.
A move that seems like a blunder at a time
may be the optimal move in the grand scheme
of things. It is not a blunder. It is a move
from someone whose brilliance is way beyond
the capabilities of even the best human players.
There is an excellent analysis of this phenomenon
on the Go subreddit; I've put a link in the description
box, check it out.
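This behavior has a simple explanation: AlphaGo maximizes the probability of winning, not the margin of victory. A tiny sketch with invented numbers makes the distinction concrete:

```python
# AlphaGo picks the move with the highest win probability, even when
# another move promises a much bigger margin of victory. The numbers
# below are invented purely to illustrate the distinction.
moves = {
    "safe move":       {"p_win": 0.95, "expected_margin": 1.5},
    "aggressive move": {"p_win": 0.80, "expected_margin": 20.0},
}

best = max(moves, key=lambda m: moves[m]["p_win"])
print(best)  # a human might chase the big margin; AlphaGo does not
```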
I'd like to emphasize that the technique learns,
at first, by looking at a large number of
games played by amateurs. But the question is, how
can it get beyond the level of amateurs? After
learning the basics from these games, it
plays millions of games against
itself and learns from them.
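That self-play loop can be sketched as follows. This toy version plays single-pile Nim against itself and nudges move preferences toward whatever the winner did; it is a crude stand-in for AlphaGo's policy-gradient reinforcement learning, with every detail simplified.

```python
import random

# Toy self-play loop: the "policy" is just a preference weight for
# taking 1, 2, or 3 stones. After each self-play game, the winner's
# moves are reinforced and the loser's dampened. A drastic
# simplification of AlphaGo's reinforcement learning.
policy = {take: 1.0 for take in (1, 2, 3)}

def choose(stones):
    legal = [t for t in policy if t <= stones]
    return random.choices(legal, weights=[policy[t] for t in legal])[0]

def self_play_game(stones=10):
    moves, turn = [], 0
    while stones > 0:
        take = choose(stones)
        moves.append((turn, take))
        stones -= take
        turn = 1 - turn
    return moves, 1 - turn   # player who took the last stone wins

random.seed(0)
for _ in range(5000):        # self-play training loop
    moves, winner = self_play_game()
    for player, take in moves:
        policy[take] *= 1.01 if player == winner else 0.99
        policy[take] = max(policy[take], 1e-3)

print({t: round(w, 2) for t, w in sorted(policy.items())})
```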
And, it bears emphasizing: nothing in this algorithm
is specific to Go. Nothing. It can be used
to solve a number of different problems without
significant changes. It would be immensely
difficult to overstate the significance of
that.
Shoutout to Brady Daniels who has an excellent
Go educational channel. He has very fluid,
enjoyable and understandable explanations,
highly recommended, check it out. There is
a link to one of his videos in the description
box.
It is a possibility that the first Go grandmaster
to reach 10 dan may not be a human, but a
computer. My mind is officially blown. Insanity.
One more cobblestone has been laid on the
path to artificial general intelligence.
I find this achievement to be of a magnitude
equivalent to landing on the Moon. And this
is just the beginning. I can't wait to see
this technique being used for research in
medicine.
Huge respect for Demis Hassabis and Lee Sedol,
who were both respectful and humble both in
victory, and in defeat. They are true champions
of their craft.
Thanks so much to DeepMind for creating
this rivetingly awesome event.
My daughter, Jázmin was born one day before
this glorious day. What an exciting time to
be alive!
Thanks for watching and for your generous
support, and I'll see you next time!
