Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
After having a look at OpenAI's effort to
master the DOTA 2 game, of course, we all
know that scientists at DeepMind are also
hard at work on an AI that beats the Capture
The Flag game mode in Quake 3.
Quake III Arena is an iconic first person
shooter game and Capture the Flag is a fun
game mode where each team tries to take the
other team's flag and carry it back to their
own base while defending their own flag.
This game mode requires good aiming skills,
map awareness, reading the opponents well,
and tons of strategy.
A nightmare situation for any kind of AI.
Not only that, but in this version, the map
changes from game to game, therefore the AI
has to learn general concepts and be able
to pull them off in a variety of different
previously unseen conditions.
This hardly seems to be within the realm of
possibility.
The minimaps here always show the locations
of the players, each color coded blue or red
to indicate their team.
Much like humans, these AI agents learned
by looking at the video output of the game
and were never told anything about the game
or its rules.
These scientists at DeepMind ran a tournament
with 40 human players who were matched up
against these agents randomly, both as opponents
and teammates.
In this tournament, a team of average human
players had a win probability of 43%, while
a team of strong players won slightly more
than half, 52%, of their games.
And now hold on to your papers, because the
agents were able to win 74% of their games.
So the difference between the average and
strong human players' win rates is 9%, and the
difference between the strongest humans and
the AI is more than twice that margin, 22%.
This is insanity.
And as you see, it barely matters what the
size or the layout of the map is, or how many
teammates there are, the AI's win rate is
always remarkably high.
These agents showcase many humanlike behaviors
such as staying at their own base to defend
it, camping within the opponent's base, or
following teammates.
This builds on a new architecture by the name
of For The Win, or FTW in short. Good work, folks!
Instead of training one agent, it uses a population
of agents that train against and evolve from
each other to make sure that a diverse set
of playstyles is discovered.
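To get a feel for the population idea, here is a minimal sketch of population-based training. Everything in it, the agent fields, the numbers, the perturbation factors, is illustrative and not DeepMind's actual implementation; real training would update neural network weights rather than a toy fitness value.

```python
import random

# Minimal, illustrative sketch of population-based training (PBT),
# the idea behind FTW's agent population. All names and numbers
# here are assumptions for demonstration, not DeepMind's code.

POPULATION_SIZE = 8

def make_agent(agent_id):
    # Each agent carries its own hyperparameters (e.g. learning rate),
    # which is part of what lets diverse playstyles emerge.
    return {"id": agent_id,
            "learning_rate": random.uniform(1e-5, 1e-3),
            "fitness": 0.0}

def evaluate(agent):
    # Stand-in for tournament results; in FTW this would come from
    # an Elo-like rating earned in matches against the population.
    return agent["fitness"]

def exploit_and_explore(population):
    # Bottom agents copy a top agent ("exploit"), then perturb the
    # copied hyperparameters ("explore").
    ranked = sorted(population, key=evaluate, reverse=True)
    top, bottom = ranked[:2], ranked[-2:]
    for loser in bottom:
        winner = random.choice(top)
        loser["learning_rate"] = winner["learning_rate"] * random.choice([0.8, 1.2])
        loser["fitness"] = winner["fitness"]  # inherits the copied agent's skill
    return population

population = [make_agent(i) for i in range(POPULATION_SIZE)]
for step in range(100):
    for agent in population:
        agent["fitness"] += random.gauss(0.01, 0.05)  # stand-in for training
    population = exploit_and_explore(population)
```

The key design point is that selection pressure acts on whole agents, so weaker configurations are continually replaced by perturbed copies of stronger ones.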
It uses recurrent neural networks, which are
neural network variants that are able to learn
and produce sequences of data.
Here, two of these are used, a fast and a
slow one that operate on different timescales
but share a memory module.
This means that one of them has a very accurate
look at the near past, while the other one has
a coarser look, but can look further back into
the past in return.
If these two work together correctly, decisions
can be made that are both good locally, at
this point in time, and globally, to maximize
the probability of winning the whole game.
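The fast/slow idea can be sketched in a few lines. This is a loose illustration under my own assumptions, the real FTW agent uses LSTMs and a learned objective, so the update rules and the five-step period below are made up purely to show the two timescales sharing state.

```python
# Hedged sketch of a two-timescale recurrent core, loosely inspired
# by FTW's fast/slow RNN pair. The update equations and SLOW_PERIOD
# are illustrative assumptions, not the published architecture.

SLOW_PERIOD = 5  # the slow core only updates every 5 environment steps

class TwoTimescaleCore:
    def __init__(self):
        self.fast_state = 0.0   # fine-grained view of the recent past
        self.slow_state = 0.0   # coarse view reaching further back
        self.step_count = 0

    def step(self, observation):
        # The fast core reacts on every tick, conditioned on the
        # slow core's state, so long-term context shapes short-term action.
        self.fast_state = 0.5 * self.fast_state + observation + 0.1 * self.slow_state
        self.step_count += 1
        # The slow core updates only periodically, summarizing the fast
        # core, so its state decays slower and spans a longer horizon.
        if self.step_count % SLOW_PERIOD == 0:
            self.slow_state = 0.9 * self.slow_state + 0.1 * self.fast_state
        return self.fast_state, self.slow_state

core = TwoTimescaleCore()
for t in range(20):
    fast, slow = core.step(observation=1.0)
```

Because the slow state changes rarely, it can carry information across many more steps than the fast state, which is the mechanism that enables longer-horizon planning.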
This is really huge because this algorithm
can perform long-term planning, which is one
of the key reasons why many difficult games
and tasks still remain unsolved.
Well, as it seems now, not for long.
An additional challenge is that, unlike in most
games, the game score is not what gets maximized
directly; instead, there is a mapping from the
score into an internal reward, which means that
the algorithm has to be able to predict its own
progress towards winning.
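A tiny sketch of that mapping idea follows. The event names and weight values are hypothetical, invented for illustration; in FTW these valuations are learned per agent and evolved with the population rather than hand-written.

```python
# Illustrative sketch of mapping raw game-score events into an
# internal reward. The event names and weights below are made-up
# assumptions; FTW learns and evolves such valuations per agent.

# Hypothetical in-game score events:
internal_weights = {
    "flag_capture": 1.0,
    "flag_return": 0.3,
    "tag_opponent": 0.2,
    "death": -0.1,
}

def internal_reward(events):
    """Sum the agent's own valuation of the score events seen this step."""
    return sum(internal_weights[e] for e in events)

r = internal_reward(["tag_opponent", "flag_capture"])  # 1.0 + 0.2
```

The point is that two agents can value the same scoreboard events differently, giving each its own dense learning signal even though the match outcome is the only true objective.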
And note that even though Quake 3 and Capture
The Flag are an excellent way to demonstrate
the capabilities of this algorithm, this architecture
can be generalized to other problems.
I am going to give you a few more tidbits
that I have found super interesting, but before,
if you are enjoying this episode and would
like to pick up some cool perks like early
access, deciding the topic of future episodes
or getting your name listed in the video description
as a key supporter, why not support the show
on Patreon?
With this, you also help us make better videos
in the future.
You can find us at Patreon.com/TwoMinutePapers
and we also support Bitcoin and other cryptocurrencies,
the addresses are available in the video description.
And now, onwards to the cool tidbits:
- A human+agent team has been able to defeat
an agent+agent team 5% of the time, indicating
that these AIs are able to coordinate and
play together with anyone they are paired with.
I get goosebumps from this.
Love it.
- The reaction time and accuracy of the agents
are better than those of humans, but nowhere
near as perfect as many people would think.
However, they outclass humans even if we artificially
reduce their accuracy and reaction times.
- In another experiment, two agents were paired
up against two professional game tester humans
who could freely communicate and train against
the same agents for 12 hours to see if they
can learn their patterns and force them to
make mistakes.
Even with this, the humans only won 25% of
these games.
Given the other numbers we have, it is very
likely that this unfair advantage made no
difference whatsoever.
How about that.
If there are any more questions, make sure
to have a look at the paper, which describes
every tidbit you can possibly imagine.
Thanks for watching and for your generous
support, and I'll see you next time!
