Fry: Artificial intelligence
is slowly appearing
in every aspect
of our modern lives.
It’s in our smart phones,
our central heating,
on our sideboards
and in our cars.
But what about artificial
general intelligence?
That is the real quest.
The aim to build an agent -
an algorithm -
that can learn to solve any problem
from scratch without being taught how.
Welcome to DeepMind: The Podcast.
I’m Hannah Fry,
I’m a mathematician who has worked
with algorithms for almost a decade.
In this series of podcasts,
we’re following the fast moving
story of artificial intelligence.
For the past 12 months
we’ve been tracking the latest work
of scientists, researchers and engineers
at DeepMind in London.
We’re looking at how they’re
approaching the science of AI
and some of the tricky decisions
the whole field is wrestling
with at the moment.
So whether you want to know more
about where technology is headed,
or want to be inspired
on your own AI journey,
then you’ve come
to the right place.
Now in the last episode we were talking
about
how pitting artificial intelligence
against world-class players
in the game of Chess,
and the game of Go,
is about much more than just
showing off what a computer can do.
Human players can learn
from how the AI plays
and improve their own play as a result.
And there’s also a bigger picture -
the world of games
provides the perfect mini universe
to try out everything we want
our artificial intelligence to do.
But intelligence is much more than
just championing raw logic;
intelligence requires other skills,
like the ability to collaborate.
I want to introduce research scientist
Max Jaderburg.
Max and his colleagues are trying
to work out how to train agents
to work together as a team.
02:11
Jaderburg: So imagine
it’s a few decades in the future -
we have all these AI systems out
in the world doing different things
but they maybe have never
seen each other before.
There’s thousands of these things,
hundreds of thousands,
each have their own objectives
but somehow they have to cooperate
and compete in a sensible way
and in a very ad hoc way, in a way that
they’ve never seen each other before.
Fry: Humans are really good at this
when we want to be, anyway.
Even when we haven’t encountered
another person before,
we still know how to understand
their intentions
and how to interact with them.
Our agents of the future need to be able
to do the same thing with each other.
Jaderburg: We already have things
like Google Home
and these sort of
smart devices out there.
We probably have more and more of those
and you can imagine them
having to interact
and work with each other
and one device may not have
ever seen another device before,
but they still somehow have to interact
and get things done for you.
Fry: Are we talking about like
your Google Home
and your dishwasher here -
this kind of stuff, or -
Jaderburg: Yeah, potentially.
You know your dishwasher
might want to actually go
on its cleaning cycle
but Google Home wants it to, you know,
clean all the dishes, and so [laughs]
what’s best for you as a person?
I don’t know.
Fry: And who gets to decide?
Jaderburg: Who gets to decide?
Fry: Who rules supreme - your dishwasher
or your Google assistant?
Jaderburg: Yeah. I don’t know.
Fry: There’s an important
distinction here.
If you’ve got a smart light bulb
that you can program to come on
at six o’clock in the evening,
that is an algorithm.
If you’ve got one
that can learn your preferences,
that can understand when you like
the lights to be dimmed,
what kind of mood lighting you like
when you’re reading, that is AI.
But as we switch away from building
things that do rigid, pre-decided tasks,
we are asking our technology
to read the situation and react
to what’s going on around it.
And in the long-term,
that’s going to require collaboration.
So in the spirit of trying things out
in a toy universe,
the team at DeepMind
have been trying to find inspiration
in another kind of game.
One taken straight
from the school playground.
Fry: This is Capture The Flag -
you know the deal -
the first team to steal
the flag of their opponent
and bring it back
to their own base wins.
Get tagged by the opposition
and you’re out of the game.
Oh, come on! Don’t cry.
Max dropped whole populations
of AI agents
into a digital version of the game.
Jaderburg: This is an onscreen version,
you just see
your first person point of view,
so you have to sort of look around
and move through this 3D world
from your own first person perspective,
but interact with these other agents,
which see their own
first person perspective.
So here there’s no centralized entity
or being that can see everything.
Fry: No army commander.
Jaderburg: No army commander.
Every player acts independently.
They only see their own observation
and the way that we train
these things we actually train
whole populations of teammates -
let’s say 30 agents in parallel
and they are all playing with
and against each other.
Fry: Rather than just creating
a single agent,
Max and his team build an entire
classroom of them - 30 in total.
And for each round of the game,
he randomly selects
a few of the agents from the class
to play together on the team.
By doing this thousands
and thousands of times,
each agent will learn
from their own experience,
but because they are playing
with each other too -
with their classmates as it were -
they have to learn to interact
with someone who’s different
from themselves.
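The population set-up described here can be sketched in a few lines of Python. This is an illustrative toy, not DeepMind's actual code: a pool of 30 agents, a random selection of players drawn for each round, and every agent accumulating experience from its own games.

```python
import random

POPULATION_SIZE = 30   # the "classroom" of agents
TEAM_SIZE = 2          # players per team (illustrative)

# Each agent just counts its games here; in the real system this would be
# a learning update to the agent's neural network.
population = [{"id": i, "games_played": 0} for i in range(POPULATION_SIZE)]

def play_round(population):
    """Draw two random teams from the pool and let everyone 'learn'."""
    players = random.sample(population, TEAM_SIZE * 2)
    team_a, team_b = players[:TEAM_SIZE], players[TEAM_SIZE:]
    for agent in team_a + team_b:
        agent["games_played"] += 1  # stand-in for learning from experience
    return team_a, team_b

for _ in range(1000):
    play_round(population)

total = sum(a["games_played"] for a in population)
print(total)  # 1000 rounds x 4 players = 4000 individual experiences
```

Because teammates are re-drawn every round, each agent repeatedly plays alongside classmates it has not specialised to, which is what forces it to cope with partners different from itself.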
Jaderburg: The problem is when we start
it’s actually just very random.
Fry: Yeah.
Jaderburg: They’re just bouncing
about the place.
Without a clue
[Hannah laughs]
and then one of them
will discover something
and will start actually let’s say
taking control of the flag
and actually scoring points,
and at that point there’s evolutionary
pressure on this population.
Fry: And here’s the clever bit -
Max and his team
aren’t just letting the agents
in the classroom play on and on forever,
they’re also using something
called a genetic algorithm.
A way to make sure the whole culture
of the population of agents evolves.
Jaderburg: So actually
some of the weaker ones
will be removed
from this population.
Fry: So it’s almost like you’re making
that population of 30 have children.
Jaderburg: Yeah, absolutely.
Fry: You’re sort of
breeding them together.
Jaderburg: Yeah.
Fry: The original classroom
of agents breeds together
and has kids of its own.
And as you go down the generations,
the strongest traits survive.
Jaderburg: But unlike human children,
when an agent has children
in this set-up,
they inherit everything,
they inherit the knowledge
that’s been gained from their parent.
Fry: But you’re mixing up
the characteristics
as you go from
one generation to the next.
Jaderburg: Yeah, so this agent
has to learn to play
a 5 minute game
of Capture The Flag - which is really -
you play 5 minutes,
you do thousands of actions,
and you just get a win or a loss -
a single signal at the end of the game.
Somehow we have to learn
what to do with that,
and so to help bridge that problem,
we have this idea of internal rewards,
based on events in the game
such as picking up a flag,
or dropping a flag,
or your teammate tagging an opponent,
or an opponent tagging you -
all these sorts of things -
and we allow the agents to individually
evolve their own internal rewards:
the reward they have assigned
to each one of these events.
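The two ingredients just described - culling weaker agents so their slots go to children of stronger ones, and letting each agent carry its own evolvable internal rewards for in-game events - can be combined in a toy genetic loop. Every name, number and mutation rule below is an illustrative assumption, not the actual Capture The Flag implementation:

```python
import random

# Per-event internal rewards: each agent assigns its own value to each event.
EVENTS = ["pickup_flag", "drop_flag",
          "teammate_tagged_opponent", "opponent_tagged_me"]

def new_agent():
    return {"fitness": 0.0,
            "internal_rewards": {e: random.uniform(-1, 1) for e in EVENTS}}

def evolve(population, cull_fraction=0.2, mutation_scale=0.1):
    """Replace the weakest agents with mutated children of the strongest.
    A child inherits everything from its parent, including the reward
    weights, with a little noise mixed in."""
    population.sort(key=lambda a: a["fitness"], reverse=True)
    n = int(len(population) * cull_fraction)
    for i in range(len(population) - n, len(population)):
        parent = random.choice(population[:n])   # pick a strong parent
        population[i] = {
            "fitness": 0.0,
            "internal_rewards": {e: w + random.gauss(0, mutation_scale)
                                 for e, w in parent["internal_rewards"].items()},
        }
    return population

population = [new_agent() for _ in range(30)]
for agent in population:
    agent["fitness"] = random.random()  # stand-in for a measured win rate
evolve(population)
print(len(population))  # the population stays at 30 across generations
```

Over many generations, agents whose internal rewards happen to value useful events (say, flag pickups) win more, survive the cull, and pass those preferences on - which is why some agents end up caring about the flag and others about tagging.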
Fry: So some agents are going
to care a great deal
about grabbing hold of the flag -
Jaderburg: Yeah, exactly -
Fry: and other agents are going to care
a lot about their teammate tagging someone.
Jaderburg: Yeah.
Fry: This kind of evolutionary group
training means
that the agents can assume
different roles,
producing better results
when stealing the flag.
And with a bit of practice,
after a few thousand rounds say,
teams of agents become really
rather good at this game.
Jaderburg: They absolutely smashed it.
And the great thing
about training agents
in this manner is that they’re robust -
yes, they can play themselves,
but they can also play other agents
that have been trained
in completely different regimes,
and they can play these in-game bots,
which are sort of these hard coded bots
that ship with the game,
but most interestingly,
they can also play with people -
so you can drop people
into these games
and have you know
an AI teammate or AI opponents.
Fry: What was it actually like
to play with an agent then - do they -
does it feel like they are guessing
what you are going to do
as well as doing their own thing?
Jaderburg: it feels less like
they are guessing what you are doing
and more like they completely ignore you
and they are very ruthless.
Humans pay a lot of attention
to other humans -
even in game scenarios
like humans will fixate
on the other players of the game,
but these agents have been
trained completely unbiased
without these sort of human biases -
your opponent will run right past you
and not even try and tag you
because they’re so fixated on actually
getting the flag as quickly as possible
because that’s what’s going to maximize
their number of flag captures
and win them the game.
Things that really annoying
human players would do.
Fry: There’s a kind of magic
going on here.
Initially researchers
are working on these agents
trying to see a way
through the muddle.
Then there is that breakthrough moment
when the agent gets it.
When they start to behave
like you think they should.
Fry: Let me tease you
with Koray Kavukcuoglu,
director of research at DeepMind.
Kavukcuoglu: I remember training agents
in the early days -
the first time actually those agents
started behaving like okay,
it’s an environment,
it’s trying to navigate,
it’s trying to avoid certain obstacles
and what not -
the first time it starts doing
that it’s actually-
it is nice, it’s like it’s quite fun
to see that
because you know
that it makes a decision for itself.
I think knowing that you have created
an algorithm that can take decisions -
I think that aspect is quite enjoyable.
That is very satisfying.
10:00
Fry: It’s worth remembering that these
games aren’t just a trivial pursuit for
DeepMind, they’ve invested in this
rigorous training for a reason.
They want to see how an AI develops
these kinds of skills for itself.
Jaderburg: We spent a lot of time
in this
Capture The Flag work looking into the
the neural networks of these agents
to try and understand
what they care about
and how they represent the game world.
And what was really cool
is that we found that the agents
actually have a really, really rich
representation of this game world
without being told anything
about the game world itself.
You know, they just look
at the pixels of the screen,
yet somehow they have clustered
their
internal activations into things like: oh,
I’m in my home base,
I’m in my opponent base,
I’ve got the flag
and I can see my teammates ahead of me.
I’m looking at the opponent flag carrier
while my teammates are holding my flag.
And you can even find individual neurons
which just activate if for example
your teammates are holding the flag.
Fry: You can totally understand
how the agent is seeing the game
as you go through.
Jaderburg: I’m not sure about
totally understand
but we’re really getting an idea of
what is being represented strongly
and what isn’t being
represented strongly.
Fry: Max’s agents are using something
called neural networks.
It’s a type of machine
learning algorithm that is loosely based
on a simplified version
of the human brain.
Layers on layers of artificial neurons
are connected together in a vast network
and fire information
between themselves.
By looking inside an agent’s
electronic brain,
Max can work out which micro level
connections are responsible
for what macro level behavior.
And this can be hugely beneficial
as AI becomes more integrated
in our everyday lives.
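To make "layers of artificial neurons firing information between themselves" concrete, here is a toy two-layer network in plain Python, with the kind of single-neuron inspection described above. The sizes and random weights are arbitrary stand-ins, nothing like the real agents' networks:

```python
import random

random.seed(0)

def dot(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

# 4 inputs -> 8 hidden neurons -> 2 outputs, with random weights.
W1 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
W2 = [[random.gauss(0, 1) for _ in range(8)] for _ in range(2)]

def forward(x):
    """Information flows layer to layer: inputs -> hidden -> outputs."""
    hidden = [max(0.0, dot(row, x)) for row in W1]  # ReLU activations
    output = [dot(row, hidden) for row in W2]
    return hidden, output

observation = [random.gauss(0, 1) for _ in range(4)]  # stand-in for pixels
hidden, output = forward(observation)

# Interpretability in miniature: read off one neuron's activation and ask
# which inputs make it fire - analogous to finding a neuron that activates
# only when a teammate is holding the flag.
print(len(hidden), len(output))  # 8 hidden activations, 2 outputs
```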
Jaderburg: The hope is that well
into the future
we can start actually having agents
which can go out into the real world
that can interact with humans,
with other agents -
Fry: Without fighting -
Jaderburg: Without fighting -
being sensible, yeah.
[Hannah chuckles]
Not squabbling too much.
Fry: Unlike humans.
Jaderburg: Yeah, exactly.
Fry: Games without frontiers,
teamwork without tears.
But there is a big leap
between board games
or simple games
like Capture The Flag
and the big bad world with all
of its complexity and messiness.
You’ll remember David Silver -
the man who brought us AlphaGo -
the agent that defeated
the world champion
at the ancient board game of Go.
Well, he’s also involved
in pushing DeepMind’s AI
into ever-more perplexing environments.
Silver: In the context of games,
I think there is a further challenge
which is many people in the community
are moving towards,
which is to take
the most challenging computer game -
in this case it’s the game of StarCraft
- and many people in the AI community
are viewing this
as the next Grand Challenge -
now how can we actually devise agents
which can play
in this very rich environment
which has challenges
which are not only different
but many times vaster
than Go in other ways.
Fry: This is DeepMind: The Podcast -
an introduction to AI -
one of the most fascinating fields
in science today.
Have you ever seen footage of those vast
e-sports tournaments,
where an entire arena of dedicated fans
excitedly watches on in support
of highly skilled players
sat on stage in their gaming chairs,
armed only with a keyboard,
a mouse and a computer screen?
Well chances are,
they are playing something like
StarCraft II, created
by the American video game
developer Blizzard Entertainment.
It is a monumentally
tricky tactical game
where you play as one of three races -
the enigmatically named Zerg,
Protoss or Terrans.
Each player has to mine resources,
build an economy
and acquire increasingly
sophisticated technology,
all the time trying to defeat
your alien opponents in a futuristic
rather bleak looking landscape.
Your field of view of the simulated game
is limited by a moving camera
that you have to operate
and so there’s no way
to see everything at once,
often you can’t see
your opponent at all.
And it is played by tens
of thousands of people -
sometimes for hefty cash prizes.
And the human players
are staggeringly fast.
The best in the world can manage up
to 800 clicks in a minute.
Feeling inadequate?
Vinyals: Definitely super cool
that I can work on one thing
that has been certainly
a passion of mine in my teenage days.
Fry: Meet Oriol Vinyals,
a research scientist at DeepMind.
He is an ex-pro
StarCraft player and co-leads
the StarCraft effort at DeepMind.
Vinyals: As you develop a new algorithm,
or a new idea, when you test it,
you actually see it play better
the game you like,
so that’s very rewarding
and very visual right,
that you try something new
and you really see
oh my god, it really understands
how this unit works.
Fry: StarCraft is a serious business -
so serious in fact
that it has now been professionalized,
and for Oriol that proves it is a game
that pushes human intelligence.
Vinyals: Humans found it interesting,
so that means it’s an interesting game
that challenges intelligence
and creativity
in ways that we like
that we spend many hours playing.
Fry: So how good is the AI
at the moment then?
How well can it play
StarCraft?
Vinyals: It’s better than any
AI anyone has ever built
and it obviously
has learned from experience
not from someone knowing the game
and encoding some set of rules.
This is I mean one of the most
complicated games we’ve ever tackled -
it’s challenging kind
of our understanding
and our algorithms quite a bit.
Fry: The DeepMind team started to see
how good their work in progress really
was by inviting two of the world’s best
StarCraft II players
to take on their own algorithm.
So let me introduce
DeepMind’s AlphaStar -
the first artificial intelligence to
ever take on top professional players.
It plays the full game of StarCraft II
by using a deep neural network
trained directly from raw game data,
by supervised learning
and reinforcement learning.
Your commentators are Dan Stemkoski -
aka Artosis,
and Kevin van der Kooi -
aka Rotterdam.
Van der Kooi: Well, first of all,
it’s really awesome to be here
together with you Dan,
we’re both I think incredibly excited
to see how this evening unfolds.
Stemkoski: I mean this is just so
exciting that
DeepMind is doing all this -
Fry: Taking on AlphaStar in this
benchmarking match is German champion
Dario Wünsch, better known as TLO.
He’s normally a Zerg player,
but he’s playing as Protoss
for this match.
Kevin and Dan are excited.
Maybe even a tad over excited.
Van der Kooi: I’m so incredibly excited.
Stemkoski: Oh my god -
this is like the most exciting
I have personally ever been
for an event.
Van der Kooi: I can’t wait
to break down some -
Stemkoski: So this is AlphaStar -
this is an AI that we don’t know
how good it is yet,
but already we have some
interesting things happening.
Fry: Now I’m not entirely conversant
with the StarCraft playbook lingo here,
so I’ll just say
that AlphaStar’s stalkers
are laying down some sharp moves.
Stemkoski: It feels to me like so far
these attacks have been very well
planned by AlphaStar.
Van der Kooi: And ah, relentless -
the last two attacks -
Fry: And in a matter of minutes,
it’s all over.
Stemkoski: Well that is it! [laughs]
The GG - the ‘good game’ - is called
here from TLO,
and the first game from AlphaStar
against a pro gamer goes to AlphaStar.
Fry: David Silver was there, ringside.
Silver: We have a team that has been
working on this
and ramping up our development over
the last few months and this represents
ah you know a milestone
where we actually for the first time
we saw an AI that was actually able
to defeat a professional player.
Fry: Shall we have a quick word
with our defeated challenger - TLO?
TLO: When I was practicing most
of the humans
I played against played
very standard StarCraft.
Once again,
I assumed after the first match
I’d probably have a good idea how to play
against this agent. I did not.
Fry: Next up, the main event.
AlphaStar versus Poland’s finest -
Grzegorz Komincz -
better known as MaNa -
one of the world’s
strongest professional
StarCraft players.
Stemkoski: MaNa -
I need to hear what you’re thinking here
- cause that looks scary.
MaNa: Yeah, AlphaStar like
he’s not scared about
ah the ramp, so if I would be
playing against a human player
right there, nobody’s
going up that ramp.
Fry: I should point out,
for those of you who play
StarCraft, that these matches
are taking place under professional
match conditions
on a competitive ladder map
and without any game restrictions.
This version of AlphaStar
could see the whole of the game map
at any one time,
but otherwise played
in a comparable way to humans.
Silver: Our goal is not
just to defeat these players.
Our goal is to do it in the right way.
Vinyals: Two seconds, guys!
Fry: And the result - AlphaStar: 5;
MaNa: nil.
Fry: I should tell you that MaNa played
a later version of the algorithm
in the end and won,
so all in all, 5 - 1.
Now to understand how an AI
could learn to play
StarCraft, Oriol Vinyals
put me to the test.
A match to the end
- mathematician versus machine.
I’ve got quite a funky looking mouse
in front of me and a normal keyboard,
and on the screen
there is a very mean looking alien…
Vinyals: Yeah, Protoss.
Fry: I mean sort of sort of
like an elephant meets -
um well he’s got fists.
I wouldn’t want to meet him
on a dark night.
Vinyals: No.
Fry: Is he my friend or not?
Vinyals: He is you. You’re going to be
the commander of this particular race -
Fry: I quickly found out
there is a lot to take in.
StarCraft is perhaps not for beginners.
You have your worker bees
collecting resources for you.
Are these - are these all -
they’re almost like ant creatures
running out and grabbing crystals.
Vinyals: Right, exactly.
Fry: And you need to try and work out
how your actions
will affect your game in the future.
This is not easy for humans
to learn let alone agents
who have absolutely no context,
no object recognition,
and definitely no former
StarCraft champion to hold their hand.
Vinyals: Hah! Look this is the enemy.
Fry: Oh no!
Vinyals: Um it’s going to be pleasant -
it just came to kind of find you
and now see what you’re doing
which is absolutely nothing so far
so far we have done nothing at all -
Fry: Part of the challenge of StarCraft
is that there isn’t an ideal
strategy that wins every time.
It’s a bit like rock, paper,
scissors in that way.
The winning tactic will depend
on how your opponent plays.
But remember you only have
a very narrow field of vision
- outside of where
your camera is pointing,
your opponent could be up to anything.
Vinyals: Because you don’t see
the other player, you must decide
when am I going to see it -
do I already know what’s going on
and should I not go
and scout what it’s doing
but maybe if I do that,
he knows that I know and so on
and so forth,
so this kind of imperfect
information aspect of StarCraft
is extremely interesting as a player,
and it’s going to be testing our agents
to levels that we haven’t seen
in any other game.
And then of course there are
details happening in the game
that you must remember for a long time.
Fry: Advice I should have listened
to more carefully perhaps -
Vinyals: Ah - we’re being attacked
and we’re probably going to die.
Fry: Oh no! Oh no! [laughs]
Vinyals: Um, so that’s okay.
Fry: I didn’t last very long.
Vinyals: So I think this is
the discovery phase,
right, where now
basically you would lose,
you get the reward of minus one,
and you start again.
Fry: If I was an algorithm,
I wouldn’t be upset by losing,
I would just reset and go again.
Each time armed with
a little more knowledge.
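That "reset and go again" loop, with nothing but a plus or minus one at the end of each episode, is reinforcement learning in miniature. Here is a sketch on a made-up guessing game; the game, the epsilon-greedy rule and the learning rate are all illustrative choices:

```python
import random

random.seed(1)

SECRET = 2                     # the hidden 'winning move' (0-3)
values = [0.0, 0.0, 0.0, 0.0]  # the agent's estimate of each move's worth

def play_episode(epsilon=0.2, lr=0.1):
    """One whole game: act, receive only a final win/loss, learn, reset."""
    if random.random() < epsilon:
        guess = random.randrange(4)                      # explore
    else:
        guess = max(range(4), key=lambda g: values[g])   # exploit
    reward = 1.0 if guess == SECRET else -1.0            # win or loss only
    values[guess] += lr * (reward - values[guess])       # a little knowledge
    return reward

for _ in range(2000):
    play_episode()

best = max(range(4), key=lambda g: values[g])
print(best)  # the agent settles on the winning move: 2
```

Losing isn't upsetting here: a minus one simply nudges the value of that move down, the episode resets, and the next attempt starts armed with slightly better estimates.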
But to even be able to play
StarCraft in the first place,
to even be able to operate the controls,
the AI had to master
quite a few transferable skills.
Vinyals: You’ve noticed when you were
playing
that there were some movements
that were resembling what it was like
to maybe navigate the web
or like operate
um your laptop,
namely click, drag and click,
drag and drop, like select rectangles
and moving the mouse,
and maybe combining mouse
with keyboard and so on,
and we tried exactly the same agent,
the same architecture absolutely
everything the same the way,
the same code almost, and we changed,
we changed the environment
instead of saying now here
is the
StarCraft, please play to win,
we said here is paint -
Microsoft Paint as an environment -
interact with it
and I’ll reward you if what you paint
looks like a face,
and it actually worked.
So I think it’s just learning
these basic skills of point
and click interfaces
that apply in so many places.
Fry: The same agent that plays
StarCraft can draw real faces
in Microsoft Paint.
Vinyals: Right, and here the point
to be clarified
is that it’s not the same agent
that was trained to play
StarCraft, it’s the same algorithm -
the algorithm that can train to play
StarCraft can also train to do Paint.
Fry: Put that same algorithm to work
drawing celebrities in Paint,
and it can capture all the main
traits of the face.
Clicking and dragging the mouse
to recreate shape
and tone and hairstyle,
much like a street artist would.
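The "same algorithm, different environment" idea can be sketched like this: one generic training routine, and two unrelated toy environments plugged into it. Both environments and the random-search "algorithm" are hypothetical stand-ins, far simpler than AlphaStar's training:

```python
import random

random.seed(0)

# Two unrelated toy 'environments' with the same interface. The algorithm
# never changes; only the environment plugged into it does.
class GuessHighEnv:
    """Stand-in for a game: reward for picking a high value."""
    def step(self, action):
        return 1.0 if action > 0.5 else -1.0

class DrawNearEnv:
    """Stand-in for Paint: reward for 'drawing' close to a target."""
    def step(self, action):
        return 1.0 if abs(action - 0.3) < 0.2 else -1.0

def train(env, trials=200):
    """One generic random-search 'algorithm', reused unchanged."""
    best_action, best_reward = None, float("-inf")
    for _ in range(trials):
        candidate = random.random()
        reward = env.step(candidate)
        if reward > best_reward:
            best_action, best_reward = candidate, reward
    return best_action

game_policy = train(GuessHighEnv())
paint_policy = train(DrawNearEnv())
print(round(game_policy, 2), round(paint_policy, 2))
```

This mirrors Oriol's clarification: not one trained agent doing both tasks, but one training procedure that, pointed at a different environment and reward, produces a different competent agent.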
Vinyals: It’s the same technique,
but if you will,
it’s kind of a brain that is blank
and this brain can learn to do this
and that and that
and then, kind of by
acting in the environment repeatedly
and getting reward, the brain’s weights
get shaped to do this task
or that task or that task -
we are not yet at the point where
the same brain does both like we do
but obviously that’s one of the things
that we would be very interested
in tackling next as well.
Fry: Because that’s stepping towards
artificial general intelligence I guess.
Vinyals: Exactly.
And that’s what we do every day.
Fry: That is the ultimate goal
and it’s a topic of conversation
that’s never far away
whoever in this building
you find yourself talking to
because the point of getting AI
to play games like StarCraft
or Go is to enhance our understanding
of what intelligence actually is.
Here’s Raia
Hadsell from the Deep Learning team.
Hadsell: we write programs,
we run those programs,
those experiments where we might train
an agent to play a game for instance
or to solve a puzzle
in a simulated world
and then we look at the results of that.
It really is trying to understand this
puzzle of learning and representation,
memory, control in terms of actions
that a robot would take.
There’s so many complex parts
to this big puzzle
of what is an intelligent being,
what is an intelligent agent.
Fry: But if you ask people what they
think the future of AI looks like,
it tends to be wrapped up in something
a bit more physical,
something that comes complete
with moving arms and everything.
Silver: I think one natural challenge
for AI
which many people are centering upon
would be to actually have an impact
on the real world
in the guise of robotics
to actually see a robot
which is able to move,
to grip, to manipulate,
to even have locomotion
in anything approaching
not even what a human does -
maybe even an animal.
I think this would represent
a major stride forwards.
Fry: More on that next time.
If you would like to find out
more about the themes in this episode,
or explore the world
of AI research beyond
DeepMind, you’ll find plenty
of useful links
in the show notes for each episode -
and if there are stories or resources
that you think other listeners
would find helpful, then let us know.
You can message us on Twitter or email
the team at podcast@deepmind.com.
You can also use the address to send us
your questions
or feedback on the series.
But for now, let’s nip out
for a bit of air.
