WINNER:
Yes! Beat you again, sucker!
[WINNER LAUGHS]
SIRAJ:
One day I'm going to make a bot that beats you in any game!
WINNER:
Keep telling yourself that, Siraj.
SIRAJ:
Hello world, it's Siraj,
and let's make an amazing video game bot in just ten lines of code
that can play a huge variety of games.
Video games have been around since the '50s,
when Josef Kates publicly demoed Tic-Tac-Toe
at the Canadian National Exposition.
That bot used simple scripted actions
that ran the same way every time
regardless of whatever move the player made.
His demo got people hyped, though,
because no one had ever seen a computer play a game before
and they were lining up off the block to check it out.
The game bots that were invented afterwards for games like Nim
and Spacewar were similar,
but along came Polly. I-I mean, Pong.
The Pong bot's paddle had to make decisions based on the human player's
actions, and that made it feel more realistic.
Pong marked the beginning of using heuristics to create game bots.
Heuristics are educated guesses and
pretty much every single video game bot since Pong's has used them.
A bot will map out a possible set of decisions
as a tree of possibilities,
then use one of many techniques to pick the best one.
But as cool as that sounds, it's still always boiled down to a bunch of
if-then statements.
If Pac-Man moves this way, then the blue ghost should move this way.
If Master Chief sees a Grunt, then it should move in circles like my Facebook news feed.
If Captain Falcon is being annoying AF,
then your team bots should help you pwn him.
Squad goals.
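To make the "tree of possibilities" idea concrete, here is a minimal sketch of a bot that searches every branch of a tiny game and picks a move that leaves its opponent stuck. It uses a simplified Nim as the example, with assumed rules (take 1 to 3 stones, whoever takes the last stone wins). The names `can_win` and `best_move` are made up for illustration, and this is plain exhaustive game-tree search, not the method any particular game bot mentioned here actually used.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # assumed rule: a player may take 1, 2, or 3 stones per turn


@lru_cache(maxsize=None)
def can_win(stones):
    """Return True if the player about to move can force a win."""
    # Taking the last stone wins, so facing an empty pile means you already lost.
    if stones == 0:
        return False
    # A position is winning if some move leaves the opponent in a losing one.
    return any(not can_win(stones - m) for m in MOVES if m <= stones)


def best_move(stones):
    """Pick a move for a pile of at least one stone."""
    for m in MOVES:
        if m <= stones and not can_win(stones - m):
            return m  # this move leaves the opponent with no winning reply
    return 1  # every move loses against perfect play, so just take one stone
```

For a real video game the tree is far too large to search completely, which is where the educated guesses (heuristics) come in to score positions without exploring every branch.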
But yeah, video game bots have pretty much always sucked because
there are only so many edge cases that a programmer can predict
like, if the human in Fallout 3 has a pistol AND isn't moving
AND there are no enemies nearby,
run into each other.
[sigh]
We need to think about this problem differently.
When you or I start playing a game,
we don't know anything about its environment beforehand.
The hallmark of intelligence is our ability to generalize,
but can we make artificial intelligence that can generalize
to solve any task?
A team of researchers at DeepMind recently got close
by creating one bot that could beat almost any Atari game
knowing literally nothing about the game beforehand.
No game-specific hard-coded rules at all.
It was just fed the raw pixels of the game
and its controls.
Using those two things, it learned how to beat almost any Atari game it was given.
It did this using a technology called "deep learning."
If you take a deep neural network, and feed it lots of data and compute,
it can learn to do a whole lot of incredible things.
The field of deep learning right now is where physics was in the early 1900s.
The state of the art in a huge number of subfields
like vision and speech
is being broken almost every other day.
It's a very exciting time right now.
The Marie Curies and Albert Einsteins of computer science
are all alive right now, and newcomers are coming in every day.
DeepMind is awesome, and they keep a good chunk of their code
private, since Google uses it to outperform its competitors.
But then Elon Musk came along and was all like,
ELON MUSK:
I think it's important that if we have this incredible power of AI
that it not be concentrated in the hands of a few.
SIRAJ:
And so he cofounded a nonprofit called OpenAI
whose goal is to democratize AI so anyone can use it.
And just today they released something called Universe.
Universe is a platform that lets you build a bot
and test it out in thousands of different environments
from games as simple as Space Invaders,
to Grand Theft Auto,
to protein-folding simulations that could cure cancer.
You can create a bot, and the better you make it,
the more games it'll learn to become amazing at.
You can compete with other bot developers to see whose bot beats the most games,
and Universe has other environments, too, for web interface tasks like
managing emails and booking flights.
If you create a bot that's able to defeat any environment,
you're not only the dopest coder of all time,
you just solved intelligence.
We could then use your bot to solve literally everything
from global warming to poverty to all known diseases.
So with that, let's create our first simple bot
in just ten lines of Python code.
In our first two lines of code, we'll import gym and universe.
gym is OpenAI's original codebase
that Universe builds on and extends
to include way more environments and features.
Those are the only two dependencies we'll need.
Now, we can select our environment.
We'll define a variable called "env" to hold our environment,
and use gym's make() method, passing in the ID of the environment we want.
There's so many to choose from, it's hard to pick, but
let's go ahead and pick the popular Flash game
Coaster Racer.
Universe lets us run as many environments at the same time as we want,
but for now, let's just use one.
Our next step is to initialize our environment
with the reset() method.
It'll return a list of what we call "observations"
for every environment we've initialized.
An observation is an environment-specific object
that represents what the agent observes,
like pixel data of what it sees and the state of the game.
Initially, we'll just have an empty set of observations
since the game hasn't started yet.
Now that we've initialized our environment, let's go ahead and create a while statement
so our agent will just keep running indefinitely.
We're just going to have our bot do one simple thing.
It's going to hit the up arrow.
[REPEATED BUTTON PRESSING SOUNDS]
This is formatted by first specifying the type of event,
then the key, then True,
which means "press it,"
and we'll do this for each environment's observation.
We'll call this an "action" and store it in our action variable.
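As a rough sketch of the action just described: each environment gets its own list of events, and each event is a tuple of event type, key name, and a pressed flag. The `[None]` observation list below is an assumption standing in for what reset() would hand back for a single environment before the game has started.

```python
# Assumption: one environment, whose observation is still None
# because the game has not started yet.
observation_n = [None]

# One list of events per environment. Each event is a tuple of
# (event type, key name, pressed?): True means key down, False means key up.
action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
```

With more environments, the list comprehension simply produces one "press up" action per observation.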
Now we'll call our environment's step method
to move forward one time step
and use the action as a parameter.
This is our implementation of reinforcement learning.
Our bot will take an action, in our case pushing the up arrow,
then it'll observe the result, and may or may not receive a reward
if that action was beneficial to its goal,
which in our case is increasing the game score.
OpenAI uses a custom image recognition module here
to read the game score in order to return a reward.
This module is included in the environment, so we don't need to worry about it.
If it does receive a reward, we can update our bot to do similar actions in the future
so it gets better over time through trial and error.
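The ten-line bot never actually updates itself, so here is a minimal sketch of what "do more of whatever got rewarded" could look like. It is a simple epsilon-greedy learner over three key presses, with a made-up reward function (`toy_reward`) standing in for the game. The key names other than the up arrow, the reward rule, and every function name here are assumptions for illustration, not part of Universe.

```python
import random

ACTIONS = ['ArrowLeft', 'ArrowRight', 'ArrowUp']  # hypothetical key names


def toy_reward(action):
    """Stand-in for the game: only accelerating earns points (an assumption)."""
    return 1.0 if action == 'ArrowUp' else 0.0


def train(steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # running average reward per action
    count = {a: 0 for a in ACTIONS}    # how often each action was tried
    for _ in range(steps):
        untried = [a for a in ACTIONS if count[a] == 0]
        if untried:
            action = untried[0]                   # try everything once first
        elif rng.random() < epsilon:
            action = rng.choice(ACTIONS)          # explore: occasional random action
        else:
            action = max(ACTIONS, key=value.get)  # exploit: best action so far
        reward = toy_reward(action)
        count[action] += 1
        # Incremental mean: nudge the estimate toward the reward just observed.
        value[action] += (reward - value[action]) / count[action]
    return value, count


value, count = train()
best = max(ACTIONS, key=value.get)
```

The trial-and-error part is the occasional random action; the "get better over time" part is the running average that steers later choices toward whatever paid off.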
So the step method returns four variables:
an observation of the environment,
a reward,
a yes or no value if the game is done,
and some info like performance timings and latencies for debugging,
and it'll do this for all the environments you're running your bot in
simultaneously.
Lastly, we'll render the environment so it's visible to us.
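Putting the steps above together, the loop can be sketched as follows. So that it runs without Docker or a Universe install, it uses `StubEnv`, a hypothetical stand-in that only mimics the reset/step/render calls described in the video; its scoring is made up. The `max_steps` argument is also an addition so the example terminates, whereas the bot in the video loops forever.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the reset/step/render calls."""

    def __init__(self, n=1, episode_length=5):
        self.n = n                          # number of parallel environments
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [None] * self.n              # nothing observed until the game starts

    def step(self, action_n):
        self.t += 1
        observation_n = [{'frame': self.t} for _ in range(self.n)]
        # Made-up scoring: 1 point per time step in which the up arrow is held.
        reward_n = [1.0 if ('KeyEvent', 'ArrowUp', True) in actions else 0.0
                    for actions in action_n]
        done_n = [self.t >= self.episode_length] * self.n
        info = {'n': [{} for _ in range(self.n)]}
        return observation_n, reward_n, done_n, info

    def render(self):
        pass  # a real environment would draw the game window here


def run_bot(env, max_steps=None):
    """Hold the up arrow on every step and return the total reward collected."""
    total = 0.0
    observation_n = env.reset()
    step = 0
    while max_steps is None or step < max_steps:
        action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
        observation_n, reward_n, done_n, info = env.step(action_n)
        total += sum(reward_n)
        env.render()
        step += 1
    return total


total = run_bot(StubEnv(), max_steps=5)

# With Universe installed, the stub would be swapped for the real thing.
# The environment ID and configure() call below are recalled from the
# Universe README rather than taken from this video, so treat them as assumptions:
#   import gym
#   import universe
#   env = gym.make('flashgames.CoasterRacer-v0')
#   env.configure(remotes=1)
#   run_bot(env)
```

Keeping the loop in a function that takes `env` as a parameter means the same bot code can be pointed at the stub for quick checks and at a real environment for the demo.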
Let's demo this baby.
I'll run the code in terminal, and it'll connect to our
VNC server in our local Docker container,
running a Flash-enabled Chrome browser.
The pre-scripted mouse will click through the necessary screens
to get the game started.
Then our bot will start programmatically controlling the game remotely.
Yeah, our bot really sucks, but how dope is this?
We can do this for as many games as we like
and to make it better, we can try different strategies like
random search, or hill climbing,
or just replicate what DeepMind did.
They fed the observations that their bot received
into a neural network
that updated its connections to get better if it received a reward.
OpenAI already has a starter bot that uses
deep reinforcement learning
via TensorFlow that I'll put a link to in the description.
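Of the strategies mentioned above, hill climbing is the easiest to sketch: keep a set of parameters, nudge them randomly, and keep the nudge only if the score goes up. In the sketch below, `toy_score` is a made-up stand-in for "play one episode and return the game score", and all names and default values are assumptions for illustration.

```python
import random


def toy_score(params):
    """Stand-in for playing one episode and returning the score (an assumption).
    The best possible score is 0.0, reached at params == [0.5, -0.25]."""
    target = [0.5, -0.25]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def hill_climb(score_fn, n_params=2, iterations=500, noise=0.1, seed=0):
    rng = random.Random(seed)
    best = [0.0] * n_params
    best_score = score_fn(best)
    for _ in range(iterations):
        # Perturb the current best parameters with a little Gaussian noise.
        candidate = [p + rng.gauss(0.0, noise) for p in best]
        score = score_fn(candidate)
        if score > best_score:  # keep the change only if it helped
            best, best_score = candidate, score
    return best, best_score


start_score = toy_score([0.0, 0.0])
best, best_score = hill_climb(toy_score)
```

Random search is the same loop without the memory: sample fresh parameters each time instead of perturbing the current best. In a real game, each call to the scoring function would mean playing through an episode, which is why these simple strategies get slow quickly.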
And so, to break it down, OpenAI's Universe
is a platform that lets you train
and test bots for thousands of games
and other environments.
Reinforcement learning is the process of using trial and error,
similar to how we learn, to improve a bot.
And if you create one bot that can succeed in any environment it's given,
you've just solved intelligence.
The coding challenge for this video is to create a bot
for just Coaster Racer
that is better than this video's demo code.
Post your GitHub link in the comments
and I'll give a shoutout to the winner in my video
one week from today,
and I'll do a one-on-one Google Hangout with them just to say hi and
talk about whatever.
For now, I've got to make a laundry folding robot,
so thanks for watching!
