Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
After defeating pretty much every highly ranked
professional player in the game of Go, Google
DeepMind has now ventured into the realm of chess.
They recently challenged not the best humans,
no-no-no, that was long ago.
They challenged Stockfish, the best computer
chess engine in existence in quite possibly
the most exciting chess-related event since
Kasparov's matches against Deep Blue.
I will note that I was told by DeepMind that
this is the preliminary version of the paper,
so now we shall have an initial look, and
perhaps make a part 2 video with the newer
results when the final paper drops.
AlphaZero is based on a neural network and
reinforcement learning and is trained entirely
through self-play after being given the rules
of the game.
It is not to be confused with AlphaGo Zero
that played Go.
It is also noted that this is not simply AlphaGo
Zero applied to chess.
This is a new variant of the algorithm.
The differences include:
- one, the rules of chess are asymmetric,
for instance pawns only move forward, and castling
is different on the kingside and queenside, which
means the board symmetries that neural network-based
techniques could exploit in Go are absent here.
- two, the algorithm not only has to predict
a binary win or loss probability for a given
move - draws are also a possibility, and
that has to be taken into account.
Sometimes a draw is the best we can do, actually.
There are many more changes to the previous
incarnation of the algorithm, please make
sure to have a look at the paper for details.
Before we start with the results and more
details, a word on Elo ratings for perspective.
The Elo rating is a number that measures the
relative skill level of a player.
Currently, the human player with the highest
Elo rating, Magnus Carlsen, is hovering around
2800.
This man played chess blindfolded against
10 opponents simultaneously in Vienna a couple
of years ago and won most of these games.
That's how good he is.
And Stockfish is one of the best current chess
engines, with an Elo rating of over 3300.
A difference of 500 Elo points means that
if it were to play against Magnus Carlsen,
it would be expected to win at least 95 games
out of 100.
Though it is noted that rating rules typically
cap the difference used in such calculations
at around 400 points.
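To see where that 95-out-of-100 figure comes from, here is a quick sketch of the standard Elo expected-score formula. This is a general fact about the Elo model, not something from the paper, and the 3300 and 2800 ratings are just the rough numbers mentioned above.

```python
# Expected score of player A against player B under the standard Elo model:
# E_A = 1 / (1 + 10^((R_B - R_A) / 400))
def expected_score(rating_a, rating_b):
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# A 500-point gap, roughly Stockfish's ~3300 versus Carlsen's ~2800:
print(round(expected_score(3300, 2800), 3))  # about 0.95
```

With equal ratings the formula gives exactly 0.5, and every 400-point gap multiplies the odds in the stronger player's favor by ten.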
The two algorithms then played each other.
AlphaZero versus Stockfish.
They were both given 60 seconds of thinking
time per move, which is considered to be plenty,
given that both algorithms take at most around
10 seconds per move.
And here are the results.
AlphaZero was able to outperform Stockfish
in about 4 hours of learning from scratch.
They played 100 games - AlphaZero won 28
times, drew 72 times, and never lost to Stockfish.
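As a back-of-the-envelope sketch (this calculation is mine, not from the paper), we can invert the same Elo expected-score formula to estimate the rating gap implied by that match result:

```python
import math

# Score the 100-game match: a win counts 1 point, a draw 0.5, a loss 0.
wins, draws, losses = 28, 72, 0
score = (wins + 0.5 * draws) / (wins + draws + losses)

# Invert the standard Elo expected-score formula, E = 1/(1 + 10^(-gap/400)),
# to get the rating gap implied by the observed score.
implied_gap = -400.0 * math.log10(1.0 / score - 1.0)
print(round(score, 2), round(implied_gap))  # 0.64, about 100 Elo points
```

So even with all those draws, a 64% score suggests AlphaZero played roughly 100 Elo points above an engine already rated over 3300.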
Holy mother of papers, do you hear that?
Stockfish is already unfathomably powerful
compared to even the best human prodigies,
and AlphaZero basically crushed it after four
hours of self-play.
And, it was run on similar hardware to
AlphaGo Zero: one machine with 4 Tensor Processing
Units.
This is hardly commodity hardware, but given
the trajectory of the improvements we've seen
lately, it might very well be in a couple
of years.
Note that Stockfish does not use machine learning
- it is a handcrafted algorithm.
People like to refer to computer opponents
in computer games as AI, but an algorithm like
this is not doing any sort of learning.
So, you know what the best part is?
AlphaZero is a much more general algorithm
that can also play Shogi, also referred to
as Japanese chess, at an extremely high
level.
And this is one of the most interesting points
- AlphaZero would be highly useful even if
it were slightly weaker than Stockfish, because
it is built on more general learning algorithms
that can be reused for other tasks without
investing significant human effort.
But in fact, it is more general, and it also
crushes Stockfish.
With every paper from DeepMind, the algorithm
becomes better AND more and more general.
I can tell you, this is very, very rarely
the case.
Total insanity.
Two more interesting tidbits about the paper:
one, all the domain knowledge the algorithm
is given is stated precisely for clarity.
two, one might think that as computers and
processing power improve over time, all
we have to do is add more brute force to the
algorithm and just evaluate more positions.
If you think this is the case, have a look
at this - it is noted that AlphaZero was able
to reliably defeat Stockfish WHILE evaluating
ten times fewer positions per second.
Maybe we could call this the AI equivalent
of intuition, in other words, being able to
identify a small number of promising moves
and focusing on them.
Chills run down my spine as I read this paper.
Being a researcher is the best job in the
world.
And we are even being paid for this.
Unreal.
This is a hot paper, and there is a lot of
discussion out there on it - lots of chess experts
are analyzing the games and trying to make sense of them.
I had a ton of fun reading and watching through
some of these. As always, Two Minute Papers
encourages you to explore and read more, and
the video description is full of useful materials.
You will find videos with some really cool
analysis from Grandmaster Daniel King, International
Chess Master Daniel Rensch, and the YouTube
channel ChessNetwork.
All quality materials.
And, if you have enjoyed this episode and
you think that 8 of these videos a month is
worth a few dollars, please throw a coin our
way on Patreon, or, if you favor cryptocurrencies
instead, you can throw Bitcoin or Ethereum
our way.
Your support has been amazing as always, and
thanks so much for sticking with us through
thick and thin, even in times when weird Patreon
decisions happen.
Luckily, this last one has been reverted.
I am honored to have supporters like you Fellow
Scholars.
Thanks for watching and for your generous
support, and I'll see you next time!
