Hello world, it's Siraj, and today we're gonna talk about DeepMind's StarCraft II AI environment. Recently they released a StarCraft II environment that lets you train reinforcement learning models, and that's what you're seeing right now, a demo of it happening. Basically, you can use StarCraft II the game as a testbed to train and run your AI models. These can be reinforcement learning models, they can be deep learning models, they can be really anything; they can even just be hard-coded scripted bots that aren't machine learning at all. The point is that it's meant to be a testbed for people to train and test their AI models on. It's a really exciting time right now for deep reinforcement learning as a field, because OpenAI recently beat the world champion at Dota 2, and after that DeepMind released this StarCraft II AI environment. So there's a lot of exciting things happening right now in deep reinforcement learning, and there's a lot of low-hanging fruit in this
area of machine learning, as opposed to supervised learning, where a lot of it has been solved, more or less. With gradient-based optimization, you compute the gradient, you update your weights, and you know your labels; it's been tried and done before. But for deep reinforcement learning there are a lot of unanswered questions, so it's a really exciting time right now.
What we're gonna do is run a pre-trained model. On my machine I'm going to set up and install all the required dependencies and the script, everything you need, basically, to go from zero to having StarCraft II running on your computer with DeepMind's environment installed and a pre-trained model running, okay?
So what the model is, is it's called a deep Q-learner. I'll talk about what that is, but it's a deep Q-learner, and it's going to be running on the CollectMineralShards minigame of StarCraft II, which means it's just a bot collecting little trinkets called mineral shards, and it'll do this autonomously without you needing to do anything. From there you could modify it or run your own algorithms, but once you have something set up, it'll be a lot easier to get into the bottom of things. And what am I doing with my hands? Okay, so let's get to this for a second.
Okay, so first things first, what's the history here? DeepMind's first attempt at running game simulations came from Atari games; that's why Google bought them. They created an algorithm called the deep Q-learner, which is the algorithm we're running, and they used that algorithm to beat many Atari games. The way they did this is they combined two different ideas in machine learning. The first is deep learning, which is all about learning features: you don't have to engineer what those features are, like "I'm looking for a dog that has long ears and brown fur." No, it will learn what the necessary features are to map what it sees to some label. They used a convolutional neural network for this, to learn dense feature representations from game screens. So all it got were the pixels of the game, and it learned dense representations from those pixels, and then it converted what it saw into an output, and that output was an up, down, left, or right value, anything that you can use on a joystick for an Atari game. But it didn't just take in the input from what it saw in the game; the second idea it used is what's called Q-learning.
Q-learning is a type of reinforcement learning where we initialize what's known as a Q matrix, and the Q matrix has a collection of possible actions that an agent can take in a game. All these actions are weighted, like "this is an okay action, this is a better action, this action could be the best action." What it does is it picks an action from the Q matrix using some strategy that you decide. It could be random, it could be based on some pre-weighted value like an epsilon, whatever, but you pick an action from the Q matrix, you perform it in the game, you observe what happens, and then you see if you got a reward or not, a plus one or a minus one. Based on that reward, you'll update the Q matrix so that the actions are all weighted differently. The idea is that eventually the Q matrix will have the best actions for you to perform at whatever time step you're in. The Q matrix acts like a weight: similar to how in a neural network the weights improve over time, in Q-learning the Q matrix improves over time. So they combined both of those ideas together.
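To make that concrete, here's a minimal sketch of tabular Q-learning on a made-up toy problem. The state and action counts, constants, and function names here are all illustrative, not from DeepMind's code:

```python
import random

# Toy tabular Q-learning sketch on a made-up 3-state, 2-action problem.
# Q[state][action] holds the estimated long-term reward for taking that action.
n_states, n_actions = 3, 2
Q = [[0.0] * n_actions for _ in range(n_states)]

alpha = 0.1    # learning rate
gamma = 0.9    # discount factor for future rewards
epsilon = 0.2  # exploration probability

def choose_action(state):
    # Epsilon-greedy strategy: usually pick the best-known action, sometimes explore.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    # Q-learning update: nudge Q toward reward plus discounted best future value.
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```

Every time the agent acts and observes a reward, `update` re-weights that entry in the Q matrix, which is exactly the "actions get weighted differently over time" idea.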
So, deep Q-learning. I've got a little bit of pseudocode here for how it works, and this is the full pseudocode for how their algorithm works. Now keep in mind that their algorithm wasn't just a simple "take a convolutional network and then run Q-learning on top of it." It's also got two different features inspired by neuroscience. The first one is called replay memory, and replay memory is essentially a temporary buffer in memory that stores states, actions, and rewards, basically your experience of what has happened in the game. Using this concept of replay memory improved their AI: it acted as a temporary buffer that they could pull actions from, as well as the Q matrix.
So if we look at this pseudocode, we'll see that first it initializes some replay memory matrix, and it initializes a Q matrix randomly. It observes the initial state of the game, where are we in the game, and that would be the collection of pixels that it first sees. Then it runs the training loop. In the training loop, it will select an action from the Q matrix, either randomly, by using some probability value epsilon, or by selecting the optimal action from the Q matrix that it sees. Then it will execute that action; in OpenAI's Universe environment they call this the step, the environment's step function. So it'll execute the action, then it'll observe the reward that it receives, and it will store the new state, the reward, the old state, and the action, all four of those values, into replay memory. The next step is for it to compute a loss function, and we can see the loss function here; it's also called the Bellman equation. Okay, this is also called the Bellman equation, not to get you too confused, but that's just what it's called as well. Basically, it will randomly sample from the replay memory, then use the sample that it retrieved to compute a loss function, and it will minimize the square of that loss at every iteration. As the loss function is minimized, the Q matrix's values improve at every time step, which means that eventually, every time the agent pulls an action from the Q matrix, it's going to be more and more optimal, such that minimizing the loss maximizes the reward it receives. I know that's quite a mouthful to say in 30 seconds, but that's how the deep Q-learner works at a high level.
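The loop just described can be sketched in a few lines. This is a hand-rolled illustration, not DeepMind's actual implementation; the buffer size, discount factor, and function names are all made up:

```python
import random
from collections import deque

GAMMA = 0.99                   # discount factor (illustrative value)
replay = deque(maxlen=10000)   # replay memory: a bounded temporary buffer

def store(state, action, reward, next_state):
    # Save one transition (one piece of experience) into replay memory.
    replay.append((state, action, reward, next_state))

def sample_batch(batch_size):
    # Random sampling breaks the correlation between consecutive game frames.
    return random.sample(list(replay), min(batch_size, len(replay)))

def bellman_target(reward, next_q_values):
    # Bellman target y = r + gamma * max_a' Q(s', a');
    # the network is trained to minimize the squared difference to this.
    return reward + GAMMA * max(next_q_values)
```

In the real algorithm, `next_q_values` would come from the convolutional Q-network evaluated on the sampled next state, and the squared difference between `bellman_target` and the current Q estimate is the loss being minimized.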
The next step for them was to try it out on Go, the ancient Chinese game of Go. A lot of AI experts said that it would take 10, 20, 30 years for an AI to be able to beat the game of Go, because there are so many possibilities, and the search space is far too vast for an AI to just brute-force through all the options. There are so many different combinations of game states that it's just too hard for an AI to compute with the limits of the computing power we have now. That was their thinking, but they were wrong, because they're always wrong when it comes to deep learning and all these new technologies.
Basically, AlphaGo was their attempt, their successful attempt, at beating the game of Go. They used multiple neural networks here; the two key ones were a policy network and a value network, and these compute two different values: one is the policy, and the other is the value. It used both the policy and the value to help guide what is essentially a gigantic tree search, called Monte Carlo tree search. Here is a brief description of how it works: the Monte Carlo tree search, or MCTS, simulates a search tree, and the AI selects an action at each time step based on the action value, a prior probability (which is the output of the policy network), and some exploration parameter. So it uses the outputs of the policy network and the value network as guides to help it search the tree of possible moves that it can play at every time step.
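That selection rule, action value plus a prior-weighted exploration bonus, can be sketched like this. The exact formula and constants AlphaGo uses differ, so treat this as an illustration of the idea, with made-up names:

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c_explore=1.0):
    # Action value (exploitation) plus an exploration bonus that is large for
    # moves the policy network likes (high prior) and that haven't been visited much.
    return q + c_explore * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_action(stats, c_explore=1.0):
    # stats: one (action_value, prior, visit_count) tuple per candidate move.
    parent_visits = sum(visits for _, _, visits in stats) or 1
    scores = [puct_score(q, p, parent_visits, v, c_explore) for q, p, v in stats]
    return scores.index(max(scores))
```

The bonus term shrinks as a move gets visited more, so the search naturally shifts from exploring promising-looking moves to exploiting the ones that actually scored well.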
They trained AlphaGo on tens of thousands of hours of expert gameplay, and then they put AlphaGo up against the world champion, Lee Sedol, and it beat Lee Sedol. So beating the game of Go was a much harder challenge than beating the 20 different Atari games, given just the pixel values of the game screen.
But now, more recently, they decided to up the ante even more, and they decided not to just do this alone but to open-source it, so that everybody gets to use this technology. So, StarCraft. StarCraft is arguably one of, if not the, greatest PC games of all time. PC fanboys, come at me, but anyway, StarCraft is one of the best PC games of all time, and it's got hundreds of thousands of players across the world. There are people whose day job is to just play StarCraft competitively all day, in South Korea specifically. Much love to South Korea if you're out there. But anyway, StarCraft II: huge, awesome game. If you've never played it before, this is a great opportunity to download it; it's free, and I'll show you a little bit about that in a second. And if you have played it before, this is a great way to help improve your own strategy, because when you're building an AI for StarCraft II, you're thinking about all the things it requires to be a good StarCraft player: when you should spend your wealth, how you should build your army, where you should invest your resources, time, and energy. All of these things you're gonna want to replicate in an AI that you build.
So if you think about an AI for StarCraft, it has to be able to do a bunch of things that are quite difficult. First of all, it's got to have an effective use of memory: it's got to be able to remember not just the things that have happened in the short term, but also the things that happened long ago in the past. And not just remember the past; it's got to be able to plan over a long period of time. Sometimes you want to make decisions that maximize your current value, like killing an enemy because the enemy is next to your vulnerable troops. But other times you want to take an action that isn't as intuitive in the short term but pays off in the long term, like spending a lot of money on some resource, so you'll have little money right now, but in the long term that resource you purchased is gonna help you a lot more. So it's not as obviously intuitive as an Atari game, where all you have to do is get from point A to point B, or remove some block, or kill all the aliens on the screen; it's not that simple. Even a task as simple as "expand your base to some location" is actually pretty complicated: you have to coordinate mouse clicks, your camera, and your available resources, and what this does is it makes actions and planning hierarchical. This is generally very hard for reinforcement learning algorithms; the concept of hierarchy is quite hard for them to grasp, because you're just performing an action and receiving a reward, the agent-environment loop. It's not like deep learning, where we have all of these layers and there's all this structure that's built over time. Deep Q-learning was one good example of having a hierarchical model in a reinforcement learning environment, and I think it was one of the first.
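That agent-environment loop is the core pattern every reinforcement learning setup shares. Here's a generic sketch using a Gym-style reset()/step() interface; the `env` and `agent` objects and the function name are placeholders I made up, not real PySC2 objects:

```python
# Generic agent-environment loop with a Gym-style interface.
# `env` must expose reset() -> state and step(action) -> (state, reward, done);
# `agent` must expose act(state) -> action. Both are stand-ins for real objects.
def run_episode(env, agent, max_steps=1000):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # agent picks an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward                  # accumulate the reward signal
        if done:
            break
    return total_reward
```

Everything from Atari to StarCraft fits this loop; what changes is how rich the states and actions are, and how much structure the agent needs to handle them.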
We're gonna see a lot more, and a lot of the key discoveries that are gonna come out of the entire field of machine learning this year and next year are going to come from deep reinforcement learning, when some really smart people combine the ideas that come from deep learning, mainly hierarchical feature learning, with the ideas of reinforcement learning, learning from an environment in real time. Andrej Karpathy gave a recent talk at Y Combinator, I think it was called YC Conf, where he said that AGI, artificial general intelligence, is going to result from having simulations, from creating an AI that can adapt in a simulation, similar to how we adapt in the real world. And this is a simulation? Enter Twilight Zone music. But anyway.
So reinforcement learning, deep reinforcement learning, is a super hot field, and this is your chance to get into it. You don't have to work at DeepMind, you don't have to work at OpenAI; you can just be some kid who has the time and energy to work on this stuff. If you have internet access and the time to work on this, you too can make an amazing algorithm. Post it on GitHub, post it on Hacker News or the machine learning subreddit, and you'll get great feedback. You can join some online research groups on a Slack channel or on several of the forums online, and you can just do great work, and all of this can be added to your portfolio, your GitHub, your resume, for future prospects, whether that be studying at a university or working in one of these fields. The point is, in order to get anywhere, you've got to do something. And StarCraft II is a great testbed, a great set of tools. I've tested it out myself, and I think it's a really great place to get started with deep reinforcement learning, okay?
Okay, now on to the code. So this was a joint collaboration with Blizzard. Blizzard already released an API that lets a user create scripted bots, machine-learning-based bots running from pickle files (you know, pre-trained models), replay analysis, and tool-assisted human play. DeepMind's environment, its repository, is called PySC2, Python StarCraft II, so it's all in Python, thank the gods. It's in Python, and it has four components to it. The first is the Blizzard API that it wraps in Python. The next is a dataset of anonymized game replays, okay? So it's got a lot of these anonymized game replays that you can download right here; I'll go through that in a second. And it's got a series of simple RL minigames, one of which is what we're going to use for this demo, to test out different algorithms in different environments. So this was Blizzard's initial API, and then DeepMind wrapped it with their own Python repository. What we're gonna do is just set up everything. There are seven installation steps here, and I'm gonna go through all of them to get started, okay?
So first of all, before you do anything, you've got to download StarCraft II through the Blizzard client. It's free: you just sign up on Blizzard and select the Starter Edition. That's it, just get the Starter Edition, it's free, you can just download that, and then you can play it just like that. It'll take, you know, depending on your bandwidth, but it took me about an hour to download and set up, and already I was running through the tutorial in StarCraft II. Just an hour for me, so definitely download that. Once you've downloaded StarCraft II, then move on to these seven steps. The first one is to install PySC2, and luckily they have wrapped it into a nice little Python library for us, so I can go ahead and install it using pip. So I'll say sudo pip3 install pysc2, okay, and it's downloading, and it's pulling in all of those dependencies; it's built on top of the Blizzard API that I just talked about, and some other things.
Okay, so that was step one: we've installed PySC2. The next step is to install the sample code. The sample code you can just clone directly from my GitHub, just git clone, and it's gonna download it just like that. The sample code contains the pre-trained model, and it contains all the Python files you need to run this very simple reinforcement learning bot. So once you've got that, that's step two. Step three is to download the minigames, the StarCraft II maps. We can click on this link, and just like that, here are our minigame maps, okay? So we can move all of these into our folder for StarCraft II, and it's got to be in the Maps folder. Here we go, in Maps, so I'll just copy and paste it just like that. So now my StarCraft II application, the one I downloaded with the Blizzard client, has these maps, and once I have these maps, I can install TensorFlow and OpenAI Baselines.
Okay, so if you don't have TensorFlow, you can install TensorFlow with pip3 or pip install. I've already got TensorFlow. TensorFlow is there to be able to train and run these machine learning models, and then you have to install OpenAI Baselines. Baselines is a collection of high-quality reinforcement learning algorithms: the deep Q-network, the one we're going to be using, is one of them, and it's got policy gradients, which is another popular reinforcement learning algorithm. Basically, it's your way of being able to use these reinforcement learning algorithms without having to code them from scratch, and then you can modify existing ones, tweak them, to see if you can get better results. So it's a good way to experiment, to test some different effects. Okay, so then I can download the Baselines environment.
And once I've downloaded Baselines and put my maps into my StarCraft II folder, then I can go ahead and open the project with IntelliJ. The reason I say use IntelliJ for this, and not just open it with Sublime or some regular text editor, is because, oh hold on, permission error, of course, we've got to use sudo, of course, there we go. So the reason I say IntelliJ is because there are some logs that are really nice to view when it comes to how your agent is running, which is easy to do with IntelliJ. If you've never used an IDE, an integrated development environment, like IntelliJ, this is a great reason to do it, so go ahead and download IntelliJ. You can download it from here; there is a paid version, but there's also a free Community version. Get the Community version, so you don't have to pay. Then, if you want to train the model from scratch, you can run python3 train_mineral_shards.py, but what we're gonna do is run a pre-trained model so that we don't have to train it from scratch. We just want to get something to work, so let's go ahead and just open up that model.
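Putting the seven steps together, the terminal side of the setup looks roughly like this. The clone URL placeholder and script name come from the tutorial repo described in this video, so double-check them against its README:

```shell
# 1. Install DeepMind's PySC2 environment (wraps Blizzard's API in Python)
sudo pip3 install pysc2

# 2. Clone the sample code (use the tutorial repo's URL from the video description)
git clone <tutorial-repo-url>

# 3. Download the minigame maps and copy them into StarCraft II's Maps folder

# 4-5. Install TensorFlow and OpenAI Baselines
pip3 install tensorflow
pip3 install baselines

# 6. To train from scratch:
python3 train_mineral_shards.py
# 7. Or run the repo's pre-trained-model script instead (check the README for its name)
```

Remember that StarCraft II itself has to be installed through the Blizzard client first, since PySC2 launches the locally installed game.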
Okay, so IntelliJ is opening right now, and once it's open I can go ahead and open my project. I'm going to import a project. Where is my project? My project is in Downloads, pysc2, so I'll go to Downloads, pysc2-examples, alright, and then I'll open it from existing sources, finish. You can install plugins to support Python, which I'll do right now; clearly I'm doing all the steps as if I've never done any of this before. Then I'm going to restart it to initialize the Python plugin, okay. Alright, so now I've got the code imported into IntelliJ, and it's detected a Python framework, and I'm gonna say OK to this. And there we have it, and now the client is running in the background.
It's executed StarCraft II: it's detected that my system has StarCraft II installed, and then it's run this script to run a pre-trained model inside of the StarCraft II environment. It's able to access the StarCraft II game because DeepMind's PySC2 repository, under the hood, is using Blizzard's API, but it's a local API, so it's not like it's connecting somewhere remotely; it's connecting to the game that's right on your desktop or your laptop. In terms of the code, the model and the environment are built for us. In this code it's got the deep convolutional network right here, as you can see; the parameters are here, the number of hidden layers are here, and it's wrapped in that OpenAI environment with the step function. So it's given these parameters, and it's combined both the convolutional network and the Q-network together to then train this AI, and it saves the pre-trained model as a pickle file. Once it's trained, we can access that pickle file to run the pre-trained model in the DeepMind environment, in the StarCraft II environment.
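The pickle part is plain Python. Here's a minimal sketch of the save/load pattern; the file name and parameter structure are made up for illustration, and the actual repo's saved format is its own:

```python
import pickle

def save_model(params, path):
    # Serialize the trained parameters to disk as a pickle file.
    with open(path, "wb") as f:
        pickle.dump(params, f)

def load_model(path):
    # Restore the parameters later, e.g. to run a pre-trained agent
    # without retraining from scratch.
    with open(path, "rb") as f:
        return pickle.load(f)
```

A word of caution that applies to any pickle-based workflow: only load pickle files from sources you trust, since unpickling can execute arbitrary code.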
And if we look at this code, it's actually quite a lot of code; I can make a different video to talk about how all of it works, but right now I just wanted to help you install and configure this script so that you can run it yourself. Don't be afraid to run it; it's actually pretty easy, and all in all, including downloading StarCraft II, downloading the code, and installing all your dependencies, it'll take probably an hour and a half to go from zero to running your own RL algorithms in this game. Okay, so I hope that helped. Please subscribe for more programming videos, and for now, I'm gonna play some StarCraft II, so thanks for watching.
