Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér.
The paper that we are going to cover today,
in my view, is one of the more important things
that happened in AI research lately.
In the last few years, we have seen DeepMind’s
AI defeat the best Go players in the world,
and after OpenAI’s venture in the game of
Dota 2, DeepMind embarked on a journey to defeat
pro players in StarCraft 2, a real-time strategy
game.
This is a game that requires a great deal
of mechanical skill and split-second decision
making, and we have imperfect information, as
we only see what our units can see.
A nightmare situation for any AI.
The previous version of AlphaStar we covered
in this series was able to beat at least mid-grandmaster
level players, which is truly remarkable,
but, as with every project of this complexity,
there were limitations and caveats.
In our earlier video, the paper was still
pending, and now, it has finally appeared,
so my sleepless nights have officially ended,
at least for this work, and now, we can look
into some more results.
One of the limitations of the earlier version
was that DeepMind needed to further tune some
of the parameters and rules to make sure that
the AI and the players play on an even footing.
For instance, the camera movement and the
number of actions the AI can make per minute
have been limited some more and are now more
human-like.
TLO, a professional StarCraft 2 player, noted
that this time around, it indeed felt very
much like playing another human player.
The second limitation was that the AI was
only able to play Protoss, which is one of
the three races available in the game.
This new version can now play all three races,
and here you see its MMR ratings, a number
that describes the skill level of the AI,
and for non-experts, win percentages for each
individual race.
As you see, it is still the best with Protoss;
however, all three races are well over the
99% winrate mark.
Absolutely amazing.
In this version, there is also more emphasis
on self-play, and the goal is to create a
learning algorithm that is able to learn how
to play really well by playing against previous
versions of itself millions and millions of
times.
This is, again, one of those curious cases
where the agents train against themselves
in a simulated world, and then, when the final
AI was deployed on the official game servers,
it played against human players for the very
first time.
I promise to tell you about the results in
a moment, but for now, please note that relying
more on self-play is extremely difficult.
Let me explain why.
Self-play agents have the well-known drawback
of forgetting, which means that as they improve,
they might forget how to win against a previous
version of themselves.
Since StarCraft 2 is designed in a way that
every unit and strategy has an antidote, we
have a rock-paper-scissors kind of situation
where the agent plays rock all the time because
it encountered a lot of scissors lately.
Then, when a lot of papers appear, it will
start playing scissors more often, and completely
forget about the olden times when the rock
was all the rage.
And, on and on this circle goes without any
real learning or progress.
This doesn’t just lead to suboptimal results
- this leads to disastrously bad learning,
if any learning at all.
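To make this circle a little more tangible, here is
a minimal toy sketch, assuming plain rock-paper-scissors
instead of StarCraft 2. The names naive_self_play and
win_rate are made up for this illustration and do not come
from the paper. Each new version only learns to counter
the version right before it.

```python
# Toy illustration only: rock-paper-scissors stands in for StarCraft 2.
# BEATS[a] == b means move a beats move b.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
# COUNTER[b] is the move that beats b.
COUNTER = {loser: winner for winner, loser in BEATS.items()}


def naive_self_play(start, generations):
    """Each new version plays the counter to its immediate predecessor only."""
    versions = [start]
    for _ in range(generations):
        versions.append(COUNTER[versions[-1]])
    return versions


def win_rate(move, opponents):
    """Fraction of the given opponents that this move beats."""
    wins = sum(1 for other in opponents if BEATS[move] == other)
    return wins / len(opponents)


versions = naive_self_play("rock", 9)
latest, past = versions[-1], versions[:-1]

print(versions)
print("vs. previous version:", win_rate(latest, past[-1:]))
print("vs. all past versions:", win_rate(latest, past))
```

The newest version always beats its predecessor, yet
against the whole pool of its earlier selves it wins
only about a third of the time, so nothing was really
learned along the way.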
But it gets even worse.
This situation opens up the possibility for
an exploiter to take advantage of this information
and easily beat these agents.
In concrete StarCraft terms, such an exploit
could be trying to defeat the AlphaStar AI
early by rushing it with workers and warping
in photon cannons to their base.
This strategy is also known as a cannon rush,
and as you can see here, the red agent performing
it can quickly defeat the unsuspecting
blue opponent.
So, how do we defend against such exploits?
DeepMind used a clever idea here, by trying
to turn the whole thing around and use these
exploits to its advantage.
How?
Well, they proposed a novel self-play method
where they additionally insert these exploiter
AIs to expose the main AI’s flaws and create
an overall, more knowledgeable and robust
agent.
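The flavor of this idea can be sketched on the same toy
game. Please note that this is not DeepMind's league
training: it is classic fictitious play on
rock-paper-scissors, swapped in as a stand-in, and every
name in it (train_with_exploiters, exploitability, and so
on) is hypothetical. In each round, a new exploiter picks
the best reply to the main agent's current mix of moves,
and the main agent then adapts to the whole pool of
exploiters found so far.

```python
# Toy sketch only: fictitious play on rock-paper-scissors, standing in for
# the far richer exploiter-based training described in the video.
MOVES = ["rock", "paper", "scissors"]
# PAYOFF[i][j]: +1 if move i beats move j, -1 if it loses, 0 on a draw.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def payoffs_against(counts):
    """Average payoff of each pure move against an empirical mix of moves."""
    total = sum(counts)
    return [
        sum(PAYOFF[i][j] * counts[j] for j in range(3)) / total
        for i in range(3)
    ]


def best_response(counts):
    """Index of the pure move that scores best against the given mix."""
    values = payoffs_against(counts)
    return values.index(max(values))


def exploitability(counts):
    """Average winnings per game of the best possible exploiter."""
    return max(payoffs_against(counts))


def train_with_exploiters(rounds):
    main_counts = [1, 0, 0]       # assumed start: the main agent only plays rock
    exploiter_counts = [0, 0, 0]  # pool of exploiters found so far
    history = [exploitability(main_counts)]
    for _ in range(rounds):
        # 1. A new exploiter targets the main agent's current weakness.
        exploiter_counts[best_response(main_counts)] += 1
        # 2. The main agent adapts to the whole pool of exploiters.
        main_counts[best_response(exploiter_counts)] += 1
        history.append(exploitability(main_counts))
    return main_counts, history


main_counts, history = train_with_exploiters(300)
total = sum(main_counts)
print("exploitability at start:", history[0])
print("exploitability at end:", round(history[-1], 3))
print("main agent's move mix:", [round(c / total, 2) for c in main_counts])
```

In this toy setting, the amount an ideal exploiter can
win against the main agent shrinks as more exploiters are
added to the pool, which is the kind of robustness the
exploiters are meant to buy.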
So, how did it go?
Well, as a result, you can see how the green
agent has learned to adapt to this by pulling
its worker line and successfully defending against
the cannon rush of the red AI.
This is proper machine learning progress happening
right before our eyes.
Glorious!
This is just one example of using exploiters
to create a better main AI, but the training
process continually creates newer and newer
kinds of exploiters. For instance, you will
see in a moment that it later came up with
a nasty strategy that involves attacking the main
base with cloaked units.
One of the coolest parts of the work, in my
opinion, is that this kind of exploitation
is a general concept that will surely come in
useful for completely different test domains
as well.
We noted earlier that it finally started playing
humans for the first time on the official
servers.
So, how did that go?
In my opinion, given the difficulty and the
vast search space we have in StarCraft 2,
creating a self-learning AI that has the skills
of an amateur player is already incredible.
But that’s not what happened.
Hold on to your papers, because it quickly
reached grandmaster level with all three races
and ranked above 99.8% of the officially ranked
human players.
Bravo, DeepMind.
Stunning work.
Later, it also played Serral, a decorated,
world champion Zerg player, one of the most
dominant players of our time.
I will not spoil the results, especially given
that there were limitations, as Serral wasn’t
playing on his own equipment, but I will note
that Artosis, a well-known and beloved StarCraft
player and commentator, analyzed these matches
and said “The results are so impressive
and I really feel like we can learn a lot
from it.
I would be surprised if a non-human entity
could get this good and there was nothing
to learn”.
His commentary is excellent and is tailored
towards people who don’t know anything about
the game.
He’ll often pause the game and slowly explain
what is going on.
In these matches, I loved the fact that so
many times it makes so many plays that we
consider to be very poor and somehow, overall,
it still plays outrageously well.
It has unit compositions that nobody in their
right mind would play.
It is kind of like a drunken kung fu master,
but in StarCraft 2.
Love it.
But no more spoilers - I think you should
really watch these matches and, of course,
I put a link to his analysis videos in the
video description.
Even though both this video and the paper
appear to be laser focused on playing StarCraft
2, it is of utmost importance to note that
this is still just a testbed to demonstrate
the learning capabilities of this AI.
As amazing as it sounds, DeepMind wasn’t
just looking to spend millions and millions
of dollars on research to just play video
games.
The building blocks of AlphaStar are meant
to be reasonably general, which means that
parts of this AI can be reused for other things.
For instance, Demis Hassabis mentioned weather
prediction and climate modeling as examples.
If you take only one thought from this video,
let it be this one.
There is really so much to talk about, so
make sure to head over to the video description,
watch the matches and check out the paper
as well.
The evaluation section is as detailed as it
can possibly get.
What a time to be alive!
Thanks for watching and for your generous
support, and I'll see you next time!
