Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
Some papers come with an intense media campaign
and a lot of nice videos, and some other amazing
papers are at risk of slipping under the
radar because of the lack of such a media
presence. This new work from DeepMind is indeed
absolutely amazing, you’ll see in a moment
why, and is not really talked about. So in
this video, let’s try to reward such a work.
In many episodes, you get ice cream for your
eyes, but today, you get ice cream for your
mind. Buckle up.
In the last few years, we have seen DeepMind’s
AI defeat the best Go players in the world,
and after OpenAI’s venture into the game of
Dota 2, DeepMind embarked on a journey to defeat
pro players in StarCraft 2, a real-time strategy
game. This is a game that requires a great
deal of mechanical skill and split-second decision
making, and it gives us imperfect information, as
we only see what our units can see. A nightmare
situation for any AI. You see some footage
of its previous games here on the screen.
And, in my opinion, people seem to pay too
much attention to how well a given algorithm
performs, and too little to how general it
is. Let me explain.
DeepMind has developed a new technique that
tries to rely more on its predictions of the
future, and generalizes to many many more
games than previous techniques. Those previous
techniques include AlphaZero, also from DeepMind,
which was able to play Go, Chess, and Shogi,
or Japanese Chess, and confidently beat any
human player at these games.
This new method is so general that it does
as well as AlphaZero at these games, and
it can also play a wide variety of Atari
games. And that is the key here: writing
an algorithm that plays chess well has been
a possibility for decades. For instance, if
you wish to know more, make sure to check
out Stockfish, which is an incredible open-source
project and a very potent algorithm. However,
Stockfish cannot play anything else - whenever
we look at a new game, we have to derive a
new algorithm that solves it. Not so much
with these learning methods, which can generalize
to a wide variety of games! This is why I
would like to argue that the generalization
capability of these AIs is just as important
as their performance. In other words, if I
had to choose between a narrow algorithm that is
the best Chess algorithm that ever existed, and a
somewhat below world-champion level AI that can
play any game we can possibly imagine, I would
take the latter in a heartbeat.
Now, speaking about generalization, let’s
see how well it does at these Atari games,
shall we? After 30 minutes of time on each
game, it significantly outperforms humans
on nearly all of these games. The percentages
shown here tell you what kind of outperformance
we are talking about. In many cases, the algorithm
outperforms us several times over, and in some
cases by up to several hundred times. Absolutely incredible.
As you see, it has a more than formidable
score on almost all of these games, and therefore
it generalizes quite well. I’ll tell you
in a moment about the games it falters at,
but for now, let’s compare it to three other
competing algorithms. You see one bold number
per row, which always highlights the best
performing algorithm for your convenience.
The new technique beats the others, including
the Recurrent Experience Replay technique, or
R2D2 for short, on about 66% of the games.
Yes, this is another one of those crazy paper
names. And even when it falls short, it is
typically very close. As a reference, humans
triumphed on less than 10% of the games.
We still have a big fat zero on the games Pitfall
and Montezuma’s Revenge. So why is that?
Well, these games require long-term planning,
which is one of the more difficult cases for
reinforcement learning algorithms. In an earlier
episode, we discussed how we can successfully
infuse an AI agent with the curiosity to go out
there and explore some more. However,
note that these algorithms are more narrow
than the one we’ve been talking about today.
So there is still plenty of work to be done,
but I hope you see that this is incredibly
nimble progress in AI research. Bravo DeepMind!
What a time to be alive!
This episode has been supported by Linode.
Linode is the world’s largest independent
cloud computing provider. They offer affordable
GPU instances featuring the Quadro RTX 6000
which is tailor-made for AI, scientific computing
and computer graphics projects. Exactly the
kind of works you see here in this series.
If you feel inspired by these works and you
wish to run your experiments or deploy your
already existing works through a simple and
reliable hosting service, make sure to join
over 800,000 other happy customers and choose
Linode. To spin up your own GPU instance and
receive a $20 free credit, visit linode.com/papers or
click the link in the description and use
the promo code “papers20” during signup.
Give it a try today! Our thanks to Linode
for supporting the series and helping us make
better videos for you.
Thanks for watching and for your generous
support, and I'll see you next time!
