TOMASO POGGIO:
I'm Tomaso Poggio.
I am the Director of the
Center for Brains, Minds,
and Machines, which is a
center between MIT and Harvard,
located in BCS in Building 46.
And I have the pleasure
of hosting Demis.
I don't need to
say much about him.
If you look on Wikipedia
or Financial Times,
there's a very good
caricature of Demis.
And you can find him everywhere.
He was a chess child prodigy.
He studied computer
science in Cambridge.
He started a couple
of successful computer games
companies.
Then he became a neuroscientist,
got a PhD at UCL in London,
and then I was lucky enough that
I can put on my CV that he was
a post-doc of mine
for a brief period--
between 2009 and 2010, I think.
And then we saw each
other a couple of times.
Once, he came to speak at one
of the symposia for the MIT
150th birthday.
This was 2011.
And we had one session of that
symposium, which was called
"Brains, Minds, and Machines."
One section was titled "The
Marketplace for Intelligence."
And you spoke about DeepMind
that you had just started.
And so DeepMind is an
amazing achievement.
Demis managed to put together
a company, sell it to Google.
The company is also
a great research lab,
I would say the best
one in AI these days,
with high-impact papers
in Nature and so on
and achievements
like AlphaGo winning
against what is
arguably the best
player in the world, Lee Sedol.
I was in Seoul for the
last game, the fifth game,
and it was exciting
and historic.
And it's great to have Demis
here kind of telling us
about what went on and what
was the background of it.
Demis.
[APPLAUSE]
DEMIS HASSABIS: Thanks,
Tommy, for that very generous
introduction.
Thank you all for coming.
It's great being back at MIT.
I always love coming back here
and seeing and catching up
with old friends.
So today, I'm going to
split my talk into two.
The first half of it is going--
I'm going to give you a kind of
whirlwind overview of how we're
approaching AI
development at DeepMind
and the kind of philosophy
behind our approaches.
And then the second half of the
talk will be all about AlphaGo
and the sort of
culmination of our work
there and what we're going
to do with it going forwards.
So DeepMind-- first of all,
it was founded in 2010,
and we joined forces with
Google in an early part of 2014,
so we've been there for
just over two years now.
One of the ways we
think about DeepMind,
and one of the ways
I've described it,
is as a kind of
Apollo program for AI,
Apollo program effort for AI.
Currently, we have more
than 200 research scientists
and engineers, so it's
a pretty large team now,
and we're growing all the time.
So obviously, there's
a lot of work going on,
and I'm only going to be able to
touch on a small fraction of it
today.
So apart from experimenting
on AI, which is obviously
the main purpose of
DeepMind, at least half
of my job and half
of my time is spent
on thinking about
how to organize
the endeavor of science.
And what we try
to do at DeepMind
is try to create an optimal
environment for research
to flourish in.
And the way--
I mean, that would be
a whole talk in itself.
But just to sort of give
you a one-line summary, what
we try to do is fuse the best
from Silicon Valley startup
culture with the
best from academia.
And so, you know,
we've tried to combine
the kind of blue-sky
thinking that you
get in interdisciplinary
research,
you get in the best
academic places,
with the focus and
energy and resources
and pace of a top startup.
And I think this fusion
has worked really well.
So our mission, as
some of you have heard
me state, the way I
kind of articulate
that is in two steps.
So step one, try and
fundamentally solve
intelligence.
And then if we
were to do that, I
think step two kind
of follows naturally--
try and use that technology
to solve everything else.
Certainly, that's
why I've always
been obsessed with working
on AI since I can remember,
because I truly
believe that it's
one of the most important things
that mankind could be working
on and will end up being one of
the most powerful technologies
we ever invent.
So more prosaically, what we're
trying to do at DeepMind-- what
we're interested in
doing-- is trying
to build what we call
general-purpose learning
algorithms.
So the algorithms we create
and develop at DeepMind--
you know, we're only
interested in algorithms
that can learn automatically
for themselves, from raw inputs
and raw experience, and
they're not handcrafted
or preprogrammed in any way.
The second important point
is this idea of generality,
so the idea that a single set of
algorithms, or a single system,
can operate out of the box
across a wide range of tasks.
In fact, this sort of connects
with our operational definition
of intelligence.
I know that's kind
of a big debate,
and there isn't really
a kind of consensus
around what intelligence is.
But operationally, we regard it
as the ability to perform well
across a wide range of tasks.
So we really emphasize this
flexibility and generality.
So we call this type of
AI "artificial general
intelligence"
internally at DeepMind.
And the hallmark
of this kind of AI
is that it's
flexible and adaptive
and possibly, you
could argue, inventive.
I'm going to come back
to that at the end,
once we've covered AlphaGo.
And the key thing
about it is that it's
built from the ground up
to deal with the unexpected
and to flexibly deal with
things that it's potentially
never seen before.
So by contrast, obviously AI is
a huge buzzword at the moment
and is hugely popular, both
in academia and industry,
but still a lot of what we
find around us, or that's
labeled AI, is this kind of
what I would call narrow AI.
And that's really
software that's
been handcrafted for
a particular purpose,
and it's special-cased
for that purpose.
And often the problem with
those kinds of systems
is that they're hugely brittle.
As soon as the users interact
with those systems in ways
that the teams of
programmers didn't expect,
then obviously they just
catastrophically fail.
Probably still the most famous
example of that kind of system
is Deep Blue.
And obviously, that was a hugely
impressive engineering feat
back in the late 90s when it
beat Garry Kasparov at chess.
But Deep Blue, you know, it's
arguable whether it really
exhibited intelligence,
in the sense
that it wasn't able to
do anything else at all,
not even play strictly simpler
games like tic-tac-toe.
It would have to be
preprogrammed again
from scratch with
expert knowledge.
So the way we think
about AI and intelligence
is actually through the prism
of reinforcement learning.
And most of you will
be probably familiar
with reinforcement
learning, but I'm just
going to cover it quickly
here in this cartoon diagram,
for those of you who
don't know what it is.
So you start off with
an agent or an avatar.
It finds itself in some kind of
environment trying to achieve
a goal in that environment.
That environment
can be, obviously,
the real world, in which case
the agent would be a robot.
Or it could be a
virtual environment,
which is what we mostly
use, in which case
it's a kind of
avatar of some sort.
Now, the agent only interacts
with the environment
in two ways.
Firstly, it gets observations
through its sensory apparatus
and reward signals.
And we mostly use
vision, but we are
looking to use other
modalities pretty soon.
And the job of the agent
system is kind of twofold.
Firstly, it's got
to try and build
as accurate a statistical model
as it can of the environment
out there based on these
noisy, incomplete observations
that it's getting in real time.
And once it's built
the best model it can,
then it has to decide
what action to take
from the set of actions
that are available to it
at that moment in time to
best get it incrementally
towards its goal.
So, reinforcement
learning, that's
basically the essence of
reinforcement learning.
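That observation-action loop can be sketched in a few lines of Python. `ToyEnv` and the random action choice here are hypothetical stand-ins for illustration, not anything from DeepMind's stack.

```python
import random

class ToyEnv:
    """A hypothetical 5-step environment: reward 1 for action 1, else 0."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return 0                          # initial observation
    def step(self, action):
        self.t += 1
        reward = 1 if action == 1 else 0
        done = self.t >= 5
        return self.t, reward, done       # observation, reward, done flag

env = ToyEnv()
obs, done, total = env.reset(), False, 0
while not done:
    action = random.choice([0, 1])        # a real agent would consult its model here
    obs, reward, done = env.step(action)  # the only channel back from the environment
    total += reward
print(total)
```

A real agent replaces the random choice with a policy built from its statistical model of the environment; the interface stays exactly this narrow.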
And this diagram is very
simple but, of course,
this hides huge complexities
and difficulties and challenges
that would need to be
solved to fully solve
what's in this diagram.
But we know that if we
could solve all the issues
and challenges behind
this framework,
then that would be enough
for general intelligence,
human level general
intelligence.
And we know that because
many animal systems,
including humans, use
reinforcement learning as part
of their learning apparatus.
In fact, the dopamine
neurons in the brain
implement a form of TD learning.
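The TD learning mentioned here can be written down in its simplest tabular TD(0) form. This is a generic sketch, not the dopamine model itself; the prediction error `delta` below is the quantity the dopamine neurons are thought to signal.

```python
# Tabular TD(0): nudge a value estimate toward the bootstrapped target.
gamma, alpha = 0.9, 0.1          # discount factor, learning rate
V = {"s": 0.0, "s_next": 0.5}    # toy value table

def td_update(V, s, r, s_next):
    delta = r + gamma * V[s_next] - V[s]   # prediction error
    V[s] += alpha * delta
    return delta

delta = td_update(V, "s", 1.0, "s_next")
print(round(V["s"], 3))   # V("s") has moved toward r + gamma * V("s_next")
```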
So the second thing
that we kind of
committed to
philosophically in terms
of our approach [INAUDIBLE]
at the beginning
was this idea of
grounded cognition.
And this is the notion
that a true thinking
machine has to be grounded in
a rich sensorimotor reality.
But that doesn't mean it
needs to be a physical robot.
As long as you're
strict with the inputs,
you can use virtual
worlds and treat them--
these avatars and these agents
in these virtual worlds--
like virtual robots in the
sense that the only access
they have to the game state is
via their sensory apparatus.
So there's no cheating in terms
of accessing the internal game
code or game state
underlying the game.
We think, if you treat
games in that way,
then they can be the perfect
platform for developing
and testing AI algorithms.
And that's for many reasons.
Firstly, you can create
unlimited training data.
There's no testing bias.
And I think one of
the challenges of AI
is actually creating
the right benchmarks.
And very often, building
the benchmarks turns out
to be an afterthought
for an AI lab.
And we think actually crafting
the right benchmarks is just
as difficult as, maybe
even more difficult than,
coming up with
the algorithms.
And games, of course, have
been built for other purposes--
to entertain and
challenge human players--
and they've been built
by games designers,
so they weren't built
for testing AI programs.
So, in that sense,
they're really
independent in terms
of a testing/training
ground for our AI ideas.
Obviously you can run millions
of agents in parallel,
and we do that on
the Google Cloud.
And most games have scores,
so it is a convenient way
to incrementally measure
your progress and improvement
of your AI algorithms.
And I think that's very
important when you're
setting off on a very ambitious
goal and mission like we have,
which may be multi-decades.
It's important to have
good incremental measures
that you're going in
the right direction.
So this kind of
commitment then leads
to this idea of
end-to-end learning agents
and this notion of
starting with raw pixels
and going all the way to
deciding on an action.
At DeepMind, we're interested in
that entire stack of problems,
from perception to action.
And I think, over
the last five years
that DeepMind's been
going, we have pioneered
this use of games
for AI research.
And I see many other
research organizations now,
and industrial groups, starting
to use games themselves
for their own AI development.
So I guess the first
big breakthrough
that we had at
DeepMind was really
starting this new field of
deep reinforcement learning.
And this is the idea of
combining deep learning
with reinforcement learning.
And this allows
reinforcement learning
to really work at scale and
tackle challenging problems.
Until we came up with this
idea of deep reinforcement
learning--
RL, of course, as a
field, has been going
for more than thirty years.
But generally
speaking, up till then,
it had only been
applied to toy problems,
little grid-world problems.
And nothing really
challenging or impressive
had been done with
RL research,
so we wanted to take
that further and apply it
to a really challenging domain.
So initially we picked the
Atari 2600 platform,
which is really the first iconic
games platform from the '80s.
And, conveniently, there's
a nice open-source emulator
which we took and improved.
And then there are hundreds of
different classic Atari games
available on this emulator.
I'm just going to run you one
video in a second showing you
how the agent performs in
these Atari environments.
But before I do, just to sort
of confirm with you what you're
going to see, the agents
here only get the raw pixels
as inputs.
So the Atari screens are
200 by 150 pixels in size.
There's about 30,000
pixels per frame.
And the goal here is simply
to maximize the score.
Everything else is
learned from scratch.
So the system is
not told anything
about the rules, or the
controls, or even the fact
that pixels next to each other
in the video stream
are correlated in time.
It has to find all that
structure for itself.
And then there's this notion
again of generality-- one
system able to play all the
different Atari games out
of the box.
So we call this
system DQN, and we
think it really is a kind
of general Atari player.
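Underneath, the learning signal in this family of agents is the one-step Q-learning target. A minimal tabular sketch of that update follows; DQN's contribution was to replace the table with a deep convolutional network over pixels, plus stabilizers such as experience replay.

```python
gamma, alpha = 0.99, 0.1
Q = {}   # maps (state, action) -> value; DQN swaps this table for a conv net

def q_update(s, a, r, s_next, actions):
    """One-step Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    target = r + gamma * best_next                       # bootstrapped target
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

# Hypothetical transition: action "fire" in state "s0" earned 1 point.
q_update("s0", "fire", 1.0, "s1", ["fire", "noop"])
print(Q[("s0", "fire")])
```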
So this is a little medley of
the same system out of the box,
the same [INAUDIBLE] is playing
all these very different games,
very different rule sets,
very different objectives,
very different visuals out of
the box with the same settings
and the same architecture.
And it performs better than top
human players on more than half
of the Atari games.
And since our
"Nature" paper, we've
now increased that to about
95% of the Atari games.
And here's the boxing where
it's the red boxer here,
and it does a bit of
sparring with the inbuilt AI
and then eventually corners
it and just racks up
an infinite number of points.
So if you want to know
more about that work,
you can see our "Nature"
paper from last year.
And the actual code is
freely available as well,
linked from the "Nature"
site, so you can play around
with the DQN
algorithm yourselves.
So two planks of
our philosophy are
grounded cognition and
reinforcement learning.
A third sort of pillar, if
you like, of our approach
is the use of
systems neuroscience.
And as a neuroscientist
myself, you know,
I think this is going to
play a very important part
of understanding
what intelligence is
and then trying to
recreate that artificially.
But when I talk
about neuroscience,
I really want to stress I'm
talking about systems neuroscience.
And what we mean
by that is really
the algorithms, the
representations,
and the architectures
the brain uses
rather than the actual low-level
synaptic details of how
the neural substrate works.
So we're really talking
about this high level,
this computational
level, if you like,
of how the brain functions.
Now, I haven't got time to
really go into all the areas
that we're sort of using
neuroscience inspiration for,
but suffice it to say, some
of the key areas that we're
working on--
memory, attention, concepts,
planning, navigation,
imagination--
all these areas that
we're pushing hard on now,
going beyond the
work we did for Atari.
And actually, the area of
the brain that I studied
for my PhD, the hippocampus--
which is the center part
of the brain here in pink--
is actually implicated in
many of these capabilities.
So it seems like,
perhaps the notion
of creating an artificial
hippocampus of some sort which
mimics the functionality
of the hippocampus,
might be a good plan.
So I haven't got time
to go through all
of these different areas of
the work we're doing here,
but I'll just touch on a couple
of the most interesting ones.
So one big push that
we have at the moment
is adding memory
to neural networks.
And what we really want to
do is add very large amounts
of controllable memory.
So what we've done is
created this system,
which we are dubbing the
Neural Turing Machine.
And what it effectively is is
you take a classical computer,
you train a recurrent
neural network on it
from input-output examples, and
that recurrent neural network
you can think of as like
the CPU, effectively.
And what we give this recurrent
neural network is a huge memory
store, a kind of
KNN memory store,
that it can learn to
access and control.
And this whole system is
differentiable from end to end.
So the recurrent neural
network can learn what
to do through gradient descent.
And really, that is then all
the components of a Von Neumann
machine that you need,
except here it's all neural
and it's all been learned.
So that's why we call it
the Neural Turing Machine
because it has all the aspects
you need for a true Turing
machine.
So here's a little
cartoon diagram
of what the Neural Turing Machine does.
And you can think of this
input tape, and then the CPU,
which is this recurrent neural
network that actually has LSTMs
as part of it, and
then it's trying
to produce the right output.
And then it has this huge
memory store to the side
that it can learn to read and
write elements to, vectors to.
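The "learn to read" step can be made concrete as content-based attention over the memory matrix: the controller emits a key vector, similarity against each memory row is softmaxed into weights, and the read is the weighted sum of rows. Because every step is smooth, gradients can flow through it. This is an illustrative sketch; the actual Neural Turing Machine adds location-based shifts, sharpening, and write heads.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1e-8
    nb = math.sqrt(sum(x * x for x in b)) or 1e-8
    return dot / (na * nb)

def content_read(memory, key, beta=5.0):
    """Attention-weighted read: every operation is differentiable."""
    w = softmax([beta * cosine(row, key) for row in memory])
    return [sum(wi * row[j] for wi, row in zip(w, memory))
            for j in range(len(memory[0]))]

memory = [[1.0, 0.0], [0.0, 1.0]]
r = content_read(memory, [1.0, 0.0])
print(r)   # close to the first row, which best matches the key
```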
Now, with this
kind of system, we
can start moving towards
symbolic reasoning using
these kinds of
neural systems, which
is really one of the big holy
grails of what we want to do.
And, of course, there are
many unsolved classic
problems in AI.
One of the problems we apply
this Neural Turing Machine to
has been inspired
by the Shrdlu class
of problems, which are these
block worlds from the '70s
and '80s.
And the idea here
is to manipulate
the blocks in some way
and answer questions
about the scene.
Like put the red pyramid
on the green cube.
Or what's next to
the blue square?
And both manipulate this
world, and also answer
questions about it.
Now, we're not ready yet to--
Neural Turing Machines can't
scale to the full complexity
of the full Shrdlu problem.
But we have cut it
down to a 2D version,
a blocks world version,
where we can solve
some quite interesting things.
So we call this
Mini-Shrdlu, and it
has aspects of Tower of Hanoi
and other problems in it.
And the idea here is that
you've got this little blocks
world that you're
looking side on
and all these different
colored blocks,
and you're given the
start configuration
here on the left-hand
side and the goal
configuration you want to reach.
And what the system can do is
lift one block from one column
and put it down on the
top of another column.
That's the only moves
you're allowed to do.
And it gets trained through
seeing many starting examples
and end examples and doing trial
and error with reinforcement
learning and improving
itself over time.
And then, once it's
done its training,
we then test it on new start
positions and goal positions
that it's never seen before.
And it has to try and
solve these problems
in an optimal number of moves.
So I'm just going to run
this little video which
will show you it going from that
start position on the left
to end up at the goal position.
I think this one's
about twelve moves.
It's actually a
pretty hard task to do
in an optimum number of moves.
It's really hard even
for humans to do this.
And so now it's solving pretty
interesting logic puzzles.
Also, what we've been
using Neural Turing
Machines to do recently
is solve graph problems.
Which, as you all know, are
a general class of problems.
And we'll be publishing
something pretty impressive,
I think, in the later part
of this year on this topic
to add to our arXiv paper that
we already published last year.
Now, we're also
experimenting with language
as well.
And we've incorporated a
cut-down version of language
into these Shrdlu tasks.
And here, the Neural
Turing Machine
is reading a set
of constraints that
are given to it in
code that you can see
at the bottom of the screen.
So here, each of the
blocks are numbered,
and there are some
constraints that you
want to satisfy with
the goal configuration.
So, in this case, block three
should be down from block five,
four up from two, one up from
four, and six down from three.
And so it reads this in,
character by character,
remembers these
instructions, and then
starts executing the actions.
And then it solves
the puzzle, and this
is the end position
that satisfies
all those constraints.
Another thing
we're moving to now
is, there are still challenges
to overcome in Atari,
but we're also starting to
move towards 3D environments.
So we've repurposed
the Quake III engine
and added modifications to it.
We call it Labyrinth.
And we're starting to tackle
all kinds of navigation problems
and interesting 3D vision
problems within this kind
of labyrinth-like environment.
So I'll just roll the
video of this agent
finding its way through
the 3D environment,
picking up these green
apples which are rewarding,
and then trying to find
its way to the exit point.
And again, all of this behavior
is learned just through--
the only inputs are
the pixel inputs,
and it has to learn
how to control itself
in this 3-D environment and
find its way around and build
maps of the world.
So here, for an agent
like that, we're
starting to integrate some
of these different things
together-- deep reinforcement
learning with memory
and 3D vision perception.
So as we take this
forward, we're thinking
one of our goals
over this next year
is to create a rat-level
AI-- an AI agent
that's capable of doing all
the things a rat can do.
And, you know, rats
are pretty smart,
so it could do quite
a lot of things.
So we're looking at
the rat literature,
actually, for
experimental ideas,
experimental tests that we
can test our AI agent on.
So now I want to switch
to AlphaGo, which is also
part of this big push
we're making to go
beyond the Atari work.
So one of the reasons
we took on AlphaGo
is, we wanted to see how
well these neural network
approaches could be meshed
with planning approaches.
And Go is really the perfect
game to test that out with.
So this is the game of Go for
those of you who don't play.
This is what a board looks like.
It's a 19 by 19 grid, and there
are two sides-- black and white--
taking turns.
And you can place your stone--
your piece, which
is called a stone--
anywhere on an empty
vertex on the board.
Now, Go has
a long and storied tradition
in Asia.
It's more than 3,000 years old.
Confucius wrote about
it 2,000 years ago.
And he actually talked about
Go being one of the four arts
you need to master
to be a true scholar.
So it's really regarded
in Asia up there
with poetry and
calligraphy and art forms.
There are 40 million
active players today
and more than
2,000 professionals
who start going to Go school
before they're teenagers,
from around the age of
eight, nine, or ten.
They go to special Go schools
instead of normal schools.
And although the rules of Go
are incredibly simple-- in fact,
I'm going to teach you how
to play Go in two slides
in a minute--
they actually lead to
profound complexity.
One way of quickly
illustrating that
is that there are more than 10
to the power 170 possible board
configurations.
That's more than there
are atoms in the universe
by a large margin.
So the two rules are--
rule one, the capture rule.
Stones are captured
when they have
no free vertices around
them, and these free vertices
are called liberties.
So let's take a
look at our position
from an early part of
a Go game, and let's
zoom into the bottom
right of the board
to just illustrate
this first rule.
So here, you can
see this white stone
that's surrounded by the
three black stones only has
one remaining free vertex,
one remaining free liberty.
So if black was to play
there, it would totally
surround that white stone,
and that white stone
would be captured and
removed from the board.
And actually, big
groups of stones
can be captured in this
way, not just one at a time.
Whole large groups can be
captured if you surround
all of their empty vertices.
So that's the first rule.
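The capture rule can be made concrete with a small flood fill: a group's liberties are the empty points adjacent to any of its stones, and the group is captured when that count hits zero. This is an illustrative sketch, not AlphaGo's board code.

```python
def liberties(board, x, y):
    """Count liberties of the group containing the stone at (x, y).

    board: dict mapping (x, y) -> 'B' or 'W'; absent points are empty.
    """
    color = board[(x, y)]
    group, libs, stack = set(), set(), [(x, y)]
    while stack:
        p = stack.pop()
        if p in group:
            continue
        group.add(p)
        px, py = p
        for n in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
            if not (0 <= n[0] < 19 and 0 <= n[1] < 19):
                continue                   # off the board
            if n not in board:
                libs.add(n)                # empty neighbour = liberty
            elif board[n] == color:
                stack.append(n)            # same-colour stone joins the group
    return len(libs)

# A white stone surrounded on three sides by black: one liberty left,
# so black playing at (1, 2) would capture it.
board = {(1, 1): 'W', (0, 1): 'B', (2, 1): 'B', (1, 0): 'B'}
print(liberties(board, 1, 1))
```

Because the flood fill collects the whole connected group before counting, the same function handles the large-group captures mentioned above.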
The second rule is
called the ko rule.
And that states that a repeated
board position is not allowed.
So let's imagine we're
in this position now
and it's white to play.
Now, white could
capture that black stone
by playing here and taking
that black stone off the board.
So now it's black's move, and
you might be wondering, well,
can't black just capture
back by replacing that stone
and taking white?
So what happens if
black was to play this?
And this is not allowed because
if black was to play back there
and remove the white
stone, now you'll
see that this
position we're in now
is identical to the
position we started with.
So that's not allowed.
So that black move
is not allowed.
Black would have to
play somewhere else
first to break this symmetry
and then can go back
and recapture that stone.
And that's it.
That's the rules of Go.
And the idea of Go
is that you obviously
want to take your opponent's
pieces by surrounding it.
But, actually, the main
thing you are trying to do
is wall off parts of empty
territory on the board.
And then at the end of the
game, when both players pass
because they don't think they can
improve their positions
any further, you count up the
amount of territory you've got
and you add the
prisoners that you've
taken from your opponent.
And the person with the
most points wins the game.
So the rules of Go are
simple, but it's pretty much
the most profound
and elegant game
I think that mankind
has ever devised.
And I say that as
a chess player.
You know, I think Go
is really the pinnacle
of perfect information games.
It's definitely the
most complex game
that humans have spent
a significant amount of time
mastering and still play at a very
high professional level today.
And because of this
huge complexity of Go,
it's been an outstanding
grand challenge for AI
for more than twenty
years, especially
since the Deep Blue match.
And the other interesting
thing for us is that--
and I'm going to come back
to this more in a minute--
that if you ask top
Go players, they'll
tell you that they rely on their
intuition a lot to play Go.
So Go really requires both
intuition and calculation
to play well.
And we thought that
mastering it, therefore,
would involve combining
pattern recognition
techniques with planning.
So why is Go hard for
computers to play?
Well, the huge complexity means
that brute force search is not
tractable.
And really, that breaks down
into two main challenges.
Firstly, the search
space is really huge.
There's a branching
factor of more than 200
in an average position in Go.
And the second point,
which is probably
an even bigger
problem, is that it
was thought to be impossible
to write an evaluation
function to tell the computer
system who is winning
in a mid-game position.
And without that
evaluation function,
it's very difficult to
do efficient search.
So I'm just going to unpack
these by comparing Go to chess,
and you'll see the difference.
So in chess, in an
average position,
there are about
20 possible moves.
So the branching
factor in chess is 20.
In Go, by contrast, as I just
mentioned, it's more like 200.
So there's an order of
magnitude larger branching
factor.
Plus, Go games tend to
last two to three times
longer than chess games.
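Those two numbers compound, because a rough game-tree size is the branching factor raised to the game length. Plugging in the figures above, with assumed rough game lengths of 80 plies for chess and 200 for Go:

```python
import math

# log10 of branching_factor ** game_length, to keep the numbers printable
chess = 80 * math.log10(20)     # ~ 20^80
go = 200 * math.log10(200)      # ~ 200^200
print(round(chess), round(go))  # chess ~10^104, Go ~10^460
```

Note this game-tree estimate is a different quantity from the 10^170 legal board configurations mentioned earlier; both make the same point that brute force is hopeless.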
The evaluation function--
why is this so difficult for Go?
Well, we still
believe, actually,
that it's impossible to
handcraft a set of rules
to tell the system
who's winning.
So you can't really create
an expert system for Go,
for evaluating a Go position.
And the reasons are, there's no
concept of materiality in Go.
In chess, as a
first approximation,
you can just count up the value
of the pieces on each side
and that will tell you
roughly who's winning.
You can't do that in Go because,
obviously, all the pieces
are the same.
Secondly, Go is a
constructive game,
so the board starts
completely empty
and you build up the
position move by move.
So if you're going to try and
evaluate a position halfway
through or at the
beginning of the game,
it's very difficult
because it involves
a huge amount of
prediction about what
might happen in the future.
If you contrast that
with chess, which
is a kind of destructive
game, all the pieces
start on the board
and, actually, the game
gets simplified as you
move towards the endgame.
The other issue
with Go is that it's
very susceptible to
local changes, very
small local changes.
So even moving one piece around
out of this mass of pieces
can actually completely change
the evaluation of the position.
So Go is really a game
about intuition actually
rather than calculation.
And because the
possibility space is so huge,
I think it's kind of
at the limit of what
humans can actually cope with
and deal with and master.
And, you know, I've talked to
a lot of top Go players now
and when you ask them, after
they play a brilliant move, why
they played it,
quite often they'll just
tell you that it felt right,
and they'll use those words.
If you ask a chess
grandmaster why
they played a
particular move, they'll
usually be able to
tell you exactly
the reasons behind that move.
You know, I played this move
because I was expecting this,
and if that happens, then
I'm going to do this.
And they'll be able to give
you a very explicit plan of why
that move was good.
And you can see
that Go definitely
has a sort of
history and tradition
of being intuitive
rather than calculating
because it has notions of things
like the idea of a divine move.
And actually, there are
some famous games in history
that get names, and within those
games, there are famous moves.
And those moves are
sometimes named as well.
And if you talk to
a top Go player,
they dream about one day, at
one point in their career,
playing one of
these divine moves,
a move so profound it's almost
as if it was divinely inspired.
And you can look that up online.
They have some really
interesting stories
from the Edo period in Japan of
these incredible games played
in front of the shogun and
these divine moves being played,
ghost moves.
So how did we decide to tackle
this intuitive aspect of Go?
Well, we turned to
deep neural networks.
And, in fact, what we did is, we
used two deep neural networks.
So I'm just going to take you
through the training pipeline
here.
We started off with
human expert data:
we downloaded
about 100,000 games
from internet Go servers
of strong amateurs
playing each other.
And we, first of all, trained
through supervised learning
what we called a policy network.
And this deep neural network,
what it was trained to do
was to mimic the human players.
So we gave it a position
from one of those games.
And, obviously, we know what
the human player played.
And we trained this
network to predict and play
the move the human
player played.
And after a whole
bunch of training,
we could get
reasonably accurate.
We can get to about
60% accuracy in terms
of predicting the move
that the human would play.
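That training step is a standard 361-way classification problem: the network outputs a distribution over board points and is trained with cross-entropy against the move the human actually played. A toy sketch with stand-in random logits (the real policy network was a deep convolutional net):

```python
import math, random

random.seed(0)
BOARD = 19 * 19   # 361 possible moves

def softmax(logits):
    m = max(logits)
    es = [math.exp(x - m) for x in logits]
    s = sum(es)
    return [e / s for e in es]

# Hypothetical stand-in for "network output given a board position".
logits = [random.gauss(0, 1) for _ in range(BOARD)]
probs = softmax(logits)

human_move = 72                        # index of the move the human played
loss = -math.log(probs[human_move])    # cross-entropy: training pushes this down
print(loss > 0)
```

Minimizing this loss over the 100,000 downloaded games is what yields the roughly 60% move-prediction accuracy quoted above.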
But, obviously, we
don't want to just mimic
how human players
play, especially
not just amateur players.
We want to get better
than the human players.
So this is where reinforcement
learning comes in.
We then iterate this policy
network through self-play,
many millions of games
against itself,
incrementally improving
the weights in that network
to slowly increase its win rate.
So after millions of
games of self-play,
this new policy network
has about an 80% win
rate against the
original supervised
learned policy network.
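The self-play improvement described here is a policy-gradient update: moves played in games the network went on to win get reinforced, moves from lost games get discouraged. A minimal REINFORCE-style sketch over a two-move toy policy (names and constants are illustrative):

```python
import math

theta = [0.0, 0.0]   # logits for two possible moves
alpha = 0.5          # learning rate

def probs(theta):
    m = max(theta)
    es = [math.exp(t - m) for t in theta]
    s = sum(es)
    return [e / s for e in es]

def reinforce(theta, move, outcome):
    """Shift probability toward `move` if outcome=+1 (win), away if -1 (loss)."""
    p = probs(theta)
    for a in range(len(theta)):
        grad = (1.0 if a == move else 0.0) - p[a]   # d log pi(move) / d theta_a
        theta[a] += alpha * outcome * grad

reinforce(theta, move=0, outcome=+1)   # we won a game in which we played move 0
print(probs(theta)[0] > 0.5)           # move 0 is now more likely
```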
Then we freeze
that network and we
play that network against
itself 30 million times.
And that generates
our new Go data set.
And we take a position from
each of those 30 million games.
And, obviously, we
have the position,
and we also know the
outcome of the game.
We know who finally
won, black or white.
And then, with
that much data, we
were finally able to crack
the holy grail of creating
an evaluation function.
So we created this second neural
network, the value network,
which is a learned
evaluation function.
So it learned to take
in board positions
and try and accurately predict
who is winning and by how much.
So after all of this training,
which takes a lot
of compute power,
we finally end up
with two neural networks.
The policy network
takes a board position
[INAUDIBLE] as an input.
And the output is a
probability distribution
over the likelihood of each
of the moves in that position.
So the green bars
here, and the height
of the green bars
on the green board,
represent the kind of
probability mass associated
with each of the moves
possible from that position.
And then, the second network is,
we get this value network here
in pink.
And, again, you take the
board position as an input.
But, here, the
output of the network
is just a single real
number between 0 and 1.
And that indicates whether
white or black is winning
and by how much.
So if it was 0, that means white
would be completely winning.
And 1, black would
be totally winning.
And 0.5, the position
would be about equal.
So we take those forwards--
but the neural networks
are not enough on their own.
We also need something
to do the planning.
And for that, we turn to
Monte Carlo tree search
to stitch this all
together, and it
uses the neural networks to
make the search more efficient.
So I'm just going to show you
how the search works here.
So imagine that we're in
the middle of pondering what
to do in a particular position,
and imagine that position
is at the root node of this tree
represented by the little mini
Go board here.
And perhaps we've done a
few minutes or a few seconds
of planning already,
so we've already
looked at a few different moves
represented by the other leaf
nodes here.
And what you do is, you've got
two important numbers here--
Q is really the current action value of the move, the estimate of how good the move is. And P is the prior probability of the move from the policy network, in terms of how likely it is a human would play that move.
And let's imagine we're following the most promising path we've found so far-- the bold arrows here coming down-- and we end up at a node, at a position that we haven't looked at so far.
So what happens here
is, we expand the tree.
And we do that by first
calling the policy network
to find out which moves are
most probable in this position.
So instead of having to look
at 200 possible moves, all
the different possible
moves in this position,
we just look at the
top three or four
that the policy network
tells us are most likely.
And so that expands
the tree there.
And then once we've
expanded the tree,
we evaluate the desirability
of that path in two ways.
One is that we call
the value network,
and that gives us an instant
estimate of the desirability
of that position.
And we also do a second evaluation routine using Monte Carlo rollouts, so we roll out maybe a few thousand games to the end of the game, and then we back up those statistics to this node.
And what we've found is that by combining these two evaluation strategies, we can get a really accurate evaluation of how desirable that position is.
And then, of course, one of the parameters we experiment with is the mixing ratio between what the rollouts are telling us and what the value network is telling us.
And as we improved AlphaGo,
we trusted the value networks
more and more.
So I think now, the lambda
parameter's about 0.8
in favor of trusting
the value network.
And when we started on
this around last summer,
it was about 0.5.
So then, once you have that, you back the Q value up the tree. And then once you've run out of your allocated time, you basically pick the move that has the highest Q value associated with it.
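The pieces just described can be sketched in a few lines: a selection score that trades off the current action value Q against the policy prior P (a PUCT-style rule), and a leaf evaluation that mixes the value network with rollout outcomes. The constant `c_puct`, the dict layout, and the toy visit counts below are illustrative assumptions, not the real implementation.

```python
import math

def puct_score(q, p, n_parent, n_child, c_puct=5.0):
    """Selection score, AlphaGo-style (a sketch): Q is the current action
    value, P the policy network's prior, and the exploration bonus decays
    as the child gets visited more."""
    u = c_puct * p * math.sqrt(n_parent) / (1 + n_child)
    return q + u

def select_move(children):
    """Pick the child maximizing Q + U. children: dicts with 'q', 'p', 'n'."""
    n_parent = sum(ch["n"] for ch in children)
    return max(range(len(children)),
               key=lambda i: puct_score(children[i]["q"], children[i]["p"],
                                        n_parent, children[i]["n"]))

def leaf_value(v_net, z_rollout, lam=0.8):
    """Mix the value-network estimate with the rollout outcome; lam weights
    the value network (about 0.8, per the talk)."""
    return lam * v_net + (1 - lam) * z_rollout

# A barely-visited move with a strong prior can outscore a well-explored one:
children = [
    {"q": 0.52, "p": 0.40, "n": 100},   # heavily explored
    {"q": 0.48, "p": 0.35, "n": 2},     # promising prior, barely tried
]
print(select_move(children))
print(leaf_value(0.7, 1.0))
```

The exploration term is what lets the search occasionally probe a low-prior move; the mixing in `leaf_value` corresponds to the lambda parameter discussed above.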
So if we think about what these
neural networks are doing then
for us in terms
of the search, you
could think of it in this way.
Imagine that this is the search
tree from the current position.
It's totally intractable.
It's really huge.
What we do is, we call
the policy network
to really cut down the
width of that search,
to narrow that down.
And the value network really
cuts the depth of the search.
So instead of having to search all the way to the end of the game and collect millions of statistics like that to be even reasonably accurate, we can truncate the search at any point we like and call the value network.
So once we built
the AlphaGo system,
it was time to evaluate how
strong it was and test it out.
So the first thing
we did was play it
against the commercially
best available Go programs
out there.
The two best ones are
Crazy Stone and Zen.
They've won all the recent
computer Go competitions
of the last few years.
And they've reached about strong amateur level. So in Go, you start off in this thing called "kyu," K-Y-U, and your number counts down as you improve as an amateur.
And then, as you get
better as a strong amateur,
you get a dan rating
which goes from one dan
to about six or seven dan.
And then, finally, you
can become professional,
and then the dan ratings
start again from one to nine.
So really, these programs
were about the strength
of strong amateurs,
a strong club player.
And AlphaGo did incredibly
well against them.
So in the 495 matches we tried, it won all but one. And it could maintain a 75% win rate against these other programs even when they were given a four-move head start, which is huge in Go.
It's called a
four-stone handicap.
And this graph here that I'm showing you is just the single-machine version of AlphaGo, and the distributed version was even stronger.
And these rankings are quite
subjective, these Go rankings.
So we actually created a
numerical ranking, an Elo
ranking, that's on the y-axis
on the left-hand side which
is based on chess Elo ratings
and is purely statistical
in terms of the win rates
of the different programs.
And what we found is that
a gap of about 200 Elo
points, or 250 Elo points,
translates to about an 80% win
rate.
And AlphaGo was more than a
thousand Elo points better
than the other best programs.
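The relation between an Elo gap and a win rate is the standard logistic Elo formula from chess; as a quick sanity check on the numbers quoted (the base-10, scale-400 constants are the usual chess convention, not anything AlphaGo-specific):

```python
def elo_win_prob(diff):
    """Expected score for the stronger player given an Elo gap,
    using the standard logistic Elo formula (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

print(round(elo_win_prob(240), 2))    # roughly the ~80% win rate quoted
print(round(elo_win_prob(1000), 3))   # a 1,000-point gap
```

A gap in the low 200s does indeed come out near an 80% expected win rate, and a 1,000-point gap corresponds to winning almost every game.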
And so, this was
back in October,
so this is not the
most recent version.
And we beat all of
these other programs,
so it was time to test ourselves
against some of the world's
top human players.
So what we did back in October was challenge this lovely guy called Fan Hui, who's now based in France but was born and grew up in China.
He's the current reigning
three-time European champion.
He's a two dan professional.
He started playing go at
seven and turned professional
in China at age 16.
It's very difficult to
turn professional in China,
so he was a top, top player
before moving to France,
and now he coaches the
national French team.
And we played the match in October, and this is what happened.
FAN HUI: I think after first
game maybe it don't like fight,
it like play slow.
So it's why begin
second game, I fight.
It do mistake sometimes.
This gives me confidence.
I think maybe I'm right.
It's why for another game,
I fight all the time.
Now it's complicated,
now it's complicated.
But I lose all my games.
DEMIS HASSABIS: So AlphaGo won five-nil, much to our surprise, and became the first program to ever beat a professional at Go.
And if you ask AI experts,
even the top programmers
of these other programs,
even sort of a year before,
they were predicting
this moment would be
at least another decade away.
So it's about a decade
earlier than the top experts
in the field expected,
and certainly
a decade earlier
than the Go world
thought it was going to happen.
AUDIENCE: Was this the
distributed version
or is it single version?
DEMIS HASSABIS: This was
the distributed version.
And this story ends well, though-- he looks distraught here, but we ended up hiring him as a consultant on our team after this. And he joined our side of the program afterwards.
And one interesting point about this is that he then came into the office for about a week every month. He was part of our making sure we weren't overfitting in our self-play, by carrying on pitting our wits against him.
And he felt that his
play had improved
by playing against AlphaGo.
And actually he went from
ranked about 600 in the world
at that time in October, to in
January, February, like three
or four months later, being
ranked 300 in the world.
And he'll tell you that it really opened his mind. He said, it freed my mind from the constraints of the 3,000 years of tradition to think in a different way about the game.
So it's very interesting.
So again if you want to read
the technical details of this,
this is another Nature
paper, front cover,
that was a couple of months ago.
And I think it's caused a
really big storm in the AI
world and the Go world.
So then it was
time to take a kind
of ultimate challenge, which
was just a few weeks ago now.
We challenged Lee Sedol, who is an absolute legend at the game.
I call him like the
Roger Federer of Go.
And he's been indisputably the
best player of the past decade,
and he's won 18 world titles.
And he's also famed
for his creative style
and creative brilliance, so he was the perfect player for us to pit our wits against.
And we played him in early
March for a million dollar
first prize in Korea.
Now, just before I go to the results, I want to make a side note on compute power here, which I always get asked about.
So we use roughly the same
compute power for this match
as we did for the Fan Hui match.
So there's around about
50-60 GPUs worth of compute.
And you might ask, well, why don't we just use more? Well, actually, the strength of the program asymptotes quite quickly with more compute power. And one of the reasons is that it's actually quite hard to parallelize MCTS algorithms.
They work much better,
more efficiently,
if you do them sequentially.
And if you batch them across
lots and lots of GPUs,
you don't actually get that much
more effectiveness out of it.
And one measure of that is that the distributed version, surprisingly to many of you probably, only wins about 75% of the time against the single-machine version.
So we played the match, and as many of you will have seen, we actually won 4-1.
And it was pretty
outstanding to us,
because even the day
before the match,
they interviewed
Lee Sedol, and he
was saying he was confident
he was going to win five nil.
And the whole Go
world thought there
was no chance we could win.
Obviously, they were looking at the Fan Hui matches and trying to estimate-- maybe we'd improved 10-20% since then, and that would stand no chance against Lee Sedol.
But actually in the five months
that we had between the two
matches, the new
version of AlphaGo
could beat the old version
of AlphaGo 99.9% of the time.
So it was pretty astoundingly stronger.
And actually it was an amazing experience out there, and I'll talk about the cultural significance in a second. But one very nice thing is that the president of the Korean Go Association, in the middle there, awarded us and AlphaGo an honorary 9-dan certificate for its creative play. So it was really beautiful-- we have that framed up on the wall.
And I just want to touch
on those themes actually
about the creativity
and intuition.
And that's one of the reasons
I explained to you how
to play Go, because I want
to just try and explain
to you some of the significance
of what AlphaGo did.
Now, chess is really my main game, but I play Go well enough to be able to appreciate what's going on.
Now probably the best
move that AlphaGo
played in the whole
five game series,
and maybe the Go world will
decide to name this move,
is move 37 in game 2.
And this is the position--
AlphaGo was black, and
AlphaGo decided to play here.
It's called a shoulder hit
move, this move here in red.
And it's funny-- I'm going to
try to explain to you why this
is so amazing, this
move, by telling you
a little bit about Go.
So there's two key lines in Go,
the third and the fourth line
of the board, that's the
critical lines in Go.
So here's the third line. Now, if you play a stone on the third line, what you're really telling your opponent is, I'm interested in taking territory on the side of the board. That's what a third-line move means. A fourth-line move, by contrast-- here's the fourth line-- means, I'm trying to take influence and power into the center of the board.
OK, so you're going
to try and influence
the center of the board,
and radiate that influence
across the board.
So that's towards the center.
And the beauty of Go, and I think one of the reasons why it ended up evolving to a 19-by-19 board, is that playing on the third and fourth lines-- going for territory or influence-- is considered to be perfectly balanced.
Right?
So the territory that you get
for playing the third line
is about equal to
what the opponent gets
by playing on the fourth
line, and getting power
and influence into the center.
The idea is that that influence
that you get and power you get,
you store up for
later and eventually
that will give you territory
somewhere else on the board.
So that's the classic 3,000
years of history of Go,
and yet AlphaGo played
on the fifth line,
to take influence toward
the center of the board.
And so this is kind
of astounding--
goes against 3,000
years of history of Go.
And just to show you how
astounding that was to the Go
fraternity, I just
want to show you
a clip from the commentary,
the live commentary.
And so we had live commentary on lots of channels. There were actually 14 live channels in China, it was on all the national TV stations in Korea, and we also had an English-language channel via YouTube.
And we had this fantastic commentator called Michael Redmond, who is the only Westerner, and the only English-speaking person, ever to get to 9-dan.
And look at his reaction
to this move 37.
So just to show you-- it turned out that, about 50 moves after this move, that move influenced the fight over in the bottom left corner. You can't calculate that, because there are too many possibilities. That was the influence and power of that move.
So this is Michael
Redmond seeing this move.
MICHAEL REDMOND: The Google
team was talking about
is this evaluation--
the value of--
DEMIS HASSABIS: He doesn't
even know where it is.
CHRIS GARLOCK: That's
a very surprising move.
MICHAEL REDMOND: I
thought it was a mistake.
CHRIS GARLOCK: Well, I
thought it was a click miss.
MICHAEL REDMOND: If we were
online Go, we'd call it clicko.
CHRIS GARLOCK: Yeah,
it's a very strange move.
Something like this would
be a more normal move.
DEMIS HASSABIS: So I think he means a mis-click as opposed to a click miss.
So he was thinking that our
operator, the person actually
playing the moves for
AlphaGo, [INAUDIBLE]
Wang is the lead programmer,
had actually entered
the move wrong into the
machine, because it's
that surprising of a move.
And this is what Lee
Sedol thought of it,
he disappeared to the
bathroom for 15 minutes.
So that's his empty seat there.
No one knew what
happened to him.
So he just disappeared
for 15 minutes.
So maybe it will be called the
face washing move or something,
because they're usually named
after something that happened.
And actually later when we
investigated the statistics
behind this, we found
that the policy network
gave the prior probability of this move as less than one in 10,000.
So AlphaGo overcame--
so it's not just
repeating what it's seeing
in these professional games,
because it would never have
thought to play this move.
Later on, some of the
9-dan pros commented,
it's not a human move,
no human would ever
have played this move.
So it's just really kind of
an original move, if you like.
And one thing that we think
is going on here is that--
AUDIENCE: Do you have
data that [INAUDIBLE]??
DEMIS HASSABIS:
No, we can't yet.
We need to build more
visualization tools
to actually do that, we're
building that at the moment.
It's pretty hard for us to explain why it's done that.
So here, in terms of
these surprising moves,
I think this shows some
kind of originality.
And what it might mean is-- when I talked to Michael Redmond about this, he said that AlphaGo has this very light touch.
It doesn't commit
to territory early,
and what we think is going
on is that AlphaGo really
likes influence in the
center of the board.
And it likes it so much, and it's so good at ultimately making the influence pay later on in the game, that it actually thinks fifth-line influence is good enough.
So this may cause a whole
rethink in the game of Go
as to what's an
acceptable trade.
Then, I must say, the other
really spectacular move
was played by Lee
Sedol in game 4.
So we won the first three
games, and then Lee Sedol
came back strongly, because
he's an incredible games player.
I've met many of the
best games players
in the world, Garry
Kasparov and others,
but I put Lee Sedol at
the top of all the games
players I've met in terms of his
creativity and fighting spirit.
And he won game 4 by playing this incredible move, move 78.
I haven't got time to go
into why this is so special,
but basically when we look
to the data on this as well,
we found that AlphaGo thought
the probability of this move
was also less than
one in 10,000.
So it was totally unexpected for AlphaGo, and that meant all the pondering and search it had done up to the move prior to this ended up having to be thrown away.
So it basically
had to start again
as soon as this move
happened, and for some reason this caused some misevaluation in the value net, which we're still investigating.
So the cultural impact of this match was huge. We had 280 million viewers-- that's more than the Super Bowl.
And 60 million viewers just
in China for the first game.
We were being stopped
in the streets in Korea,
and it was pretty crazy.
And there were 35,000 press articles-- it was literally front page of all the newspapers in Korea every day.
And the thing I liked most, actually, was that it popularized Go in the West-- there was a worldwide shortage of Go boards in the weeks after, and still now, I think.
If you're trying to
order a Go board,
you might have trouble
because of this game, which
is fantastic to see.
The press coverage
was just insane.
These are pictures of the press room-- just a scrum of live TV cameras, 50 of them in the back.
It was on all the
national TV stations,
jumbo screens in the shopping
districts, it was pretty crazy.
It was amazing to see.
I think for Korea it was
the perfect match up of--
they love technology, they
love AI, and they love Go,
so for them it was
the perfect storm.
And one interesting thing I want to show is the rate of progress of AlphaGo.
So we started this project only
about just over 18 months ago,
and the progress has been
relentless from the beginning.
We found that with these techniques, which can improve themselves, you can create more data, then train new versions, and those can create better, higher-quality data. That virtuous cycle has delivered about a one-rank improvement per month, which is pretty astounding.
And the interesting thing
is we haven't really
seen any asymptote yet, so
we're quite anxious to see
how far this can go.
And what is optimal play, or near-optimal play, in Go? How much further is there to go?
And actually, I think, most
of the Go professionals
are really interested in
this question as well.
And I'm pretty sure that,
just like with Fan Hui, when
we ultimately release AlphaGo
in some way to the public,
I think it will improve
the standard of Go,
and bring in whole new ideas.
So after the heat of battle
I had a great dinner catch up
with Lee Sedol, who's also
an amazing and lovely guy,
and we talked about the match.
And he told me that it was one of the greatest experiences of his life, and that just the five games he played had totally rejuvenated his passion for Go, and the ideas and creativity about what could be done.
AUDIENCE: How many games
a day does the machine
play against itself?
DEMIS HASSABIS: It's playing
a few thousand a day,
depends on how many
machines we use.
AUDIENCE: So [INAUDIBLE] is
when human experience is at?
DEMIS HASSABIS: Potentially.
I mean these pros play
several thousand games a year,
probably about 1,000-2,000
when they're training,
so it's quite a lot.
Plus they read a lot about
all the ancient games.
AUDIENCE: Do you think
the strong culture
in Go has forced human play
into a corner instead of--
DEMIS HASSABIS:
I don't think so,
because there are three
different schools of Go,
the Japanese, the
Koreans, and the Chinese.
And they're very competitive
against each other.
And they approach
the game differently,
and I think that that creative
tension has forced them out
of local maxima, I would say.
So just to compare Deep Blue with AlphaGo, just to be clear again about the differences. So Deep Blue-- again, not to take away from the immense achievement it was for its time, absolutely incredible-- used handcrafted chess knowledge.
By contrast, AlphaGo has
no handcrafted knowledge,
all the knowledge it has it's
learned from expert games
and through self-play.
Deep Blue did full-width search-- pretty much looked at all the alternatives-- and that's why it needed to crunch 200 million positions per second.
By contrast, AlphaGo uses
these two neural networks
to guide the search in a
highly selective manner.
And that means we
only need to look
at 100,000 positions
per second to deliver
this kind of performance.
So I just want to finish
by a couple of words
on intuition and creativity.
And this may be a little
bit controversial,
so I don't want to--
I'm not saying this is the full truth of the matter, or that it even fully encompasses everything to do with intuition and creativity, but I think these are interesting thoughts.
So we have to sort of define a little bit what we mean by intuition. At least the way I think about it for Go, it's this implicit knowledge that humans have acquired through experience of playing Go, but that's not consciously accessible or expressible.
Certainly not to communicate
to someone else, but not even
to themselves.
But we know this knowledge is there, and we know it's of a very high quality, because we can test the knowledge and verify it behaviorally-- obviously, through the moves that the player plays.
Secondly, what is creativity?
And I'm sure
everyone in this room
has their own pet definition.
But I think, again, it
definitely encompasses
the ability to synthesize
the knowledge you have
and use that knowledge that
you've accumulated to produce
novel or original ideas.
And certainly, at least within the albeit constrained domain of Go, I think AlphaGo has pretty clearly demonstrated these two abilities.
And while playing games is a lot of fun, and I believe it's the most efficient way to go about AI development, obviously that's not the end goal for us.
We want to apply the
technologies that we've
built here as part of
AlphaGo, that we believe
are pretty general purpose,
extend them, use components
of them, and apply them to have
impacts on big challenging,
real world problems.
And we're looking at
all sorts of areas
at the moment, like
health care, robotics,
and personal assistance.
So I just want to thank
the amazing AlphaGo
team who did all
this incredible work,
really incredible engineering
and research efforts.
And also again I
just want to stress
all this work I've
shown you today
is really probably less than a tenth of the work that we're doing at DeepMind, and if you're
interested in seeing
all of our publications,
they all are on our
website, and there's
about 70-80 publications there
now of all of our latest work.
And of course, I must mention,
if you want to get involved,
we are hiring both research
scientists and software
engineers.
Thanks for listening.
[APPLAUSE]
DEMIS HASSABIS: Yeah?
TOMASO POGGIO: You have--
DEMIS HASSABIS: Yeah, go for it.
Do you want to use this?
TOMASO POGGIO: For a second.
Thank you.
So let's have a
couple of questions.
Anybody?
Yeah?
OK.
Let me--
AUDIENCE: If groups of people
play together could they
beat AlphaGo?
DEMIS HASSABIS: I
think the question was,
can groups of players
together beat AlphaGo?
Maybe.
So that's something that
we might play in the future
actually, is a group of top
professionals versus AlphaGo.
And it'd be quite
interesting to see,
because it's known that
some of these top players
are really good at opening
or middle game or end game,
and you could
switch between them.
And I'm sure they'd be
a lot stronger together.
So maybe we'll do
that towards the end
of the year or next year.
Yes, behind you.
Yeah.
AUDIENCE: You mentioned
earlier using visualization
to better understand
why AlphaGo--
DEMIS HASSABIS: Yeah.
AUDIENCE: [INAUDIBLE]
Can you talk about that?
DEMIS HASSABIS: Yeah--
AUDIENCE: Can you
repeat the question?
DEMIS HASSABIS:
Yes, so the question
was using visualizations
to understand
better how AlphaGo works.
So we think this is a huge issue
with the whole deep learning
field actually, how can
we better understand
these black boxes that are
doing these amazing things,
but quite opaquely.
And I think what we need is
a whole new suite of analysis
tools and statistical tools and
visualization tools to do that.
And again, I look to my neuroscience background for inspiration. For those of you used to fMRI or that kind of analysis, I think we need the equivalent of SPM for a virtual brain.
So we actually have a project
called virtual brain analytics,
which is around building
these kinds of tools
so that we can better
understand what representations
these networks are building.
So hopefully in
the next year or so
we'll have something much
more to say about that.
Yeah?
AUDIENCE: So you
mentioned that Deep Blue
used sort of human
crafted moves, which
sort of helped them.
And then AlphaGo
didn't have that,
but it still learned from moves
and experiences of the game.
DEMIS HASSABIS: Yeah.
AUDIENCE: Is there any sort of
hope for completely reinforced
learning--
DEMIS HASSABIS: Yeah.
AUDIENCE: In Go or
even in other agents.
What is the--
DEMIS HASSABIS: Yeah, so it's a
really good question, actually.
The question is, can we do away
with the supervised learning
part, and just go all the way
from literally random, using
reinforcement
learning, up to expert.
We plan to do this
experiment actually.
So we think it will be fine,
but it will take a lot longer
to train, obviously,
without bootstrapping
with the human expert play.
So until now, we've
been just concentrating
on trying to build the strongest
program we can in the fastest
time.
So we haven't had time
to experiment with that,
but there are a number of
experiments like that that we
want to go back to and try.
I will say that a very smart master's student from Imperial College London did do this for chess from scratch, and they got to International Master standard.
So it seems like this is
definitely sort of possible.
And actually we've hired him now-- Matthew Lai, he's called-- and he may end up looking at this as well.
So maybe someone
from near the back.
Yeah?
AUDIENCE: So [INAUDIBLE]
DEMIS HASSABIS: Sorry?
AUDIENCE: [INAUDIBLE]
DEMIS HASSABIS: Yes.
AUDIENCE: That
algorithm [INAUDIBLE]..
DEMIS HASSABIS:
Yes, potentially.
So we're thinking about adding
learning into that part too.
And also maybe there
are ways of doing
away with some of that
[INAUDIBLE] search too,
there are other ways
of doing that search,
more like imagination
based planning.
So we're thinking
about that as well.
Maybe back there, yeah?
AUDIENCE: [INAUDIBLE]
DEMIS HASSABIS: So I think the
question, if I understand it
correctly, is that if agents
play games well, is that AI?
Is that what you're
asking, or is that--
AUDIENCE: Yes.
Can AI [INAUDIBLE]
DEMIS HASSABIS: Well, I mean, obviously, that's our thesis-- that this will work.
But I think you have to be
careful how you build the AI.
There are many ways you
could build AI for games that
would not be generalizable.
So I think that's generally been the history for commercial games-- I've also helped make lots of commercial games which have AI in them. And usually the built-in AI is a special case, usually finite state machines or something specific to the game.
And it utilizes all kinds
of game state information
that if you were
just using perception
you wouldn't have access to.
So I think you
have to be careful
that you use games
in the right way,
and you treat the agent
really as a virtual robot
with all of that that
entails, in terms
of what it has access to.
And I think as long as
you're careful with that,
then it's fine.
And one way we enforce this is we have a whole separate evaluation team of amazing programmers, most of them ex-games programmers, who build the environments and the APIs to the environments and so on.
And they're entirely separate
from the algorithm development
teams.
And the only way the AIs
can interface with the games
is by these very thin APIs.
So we know there's no way, even if a researcher were to be lax with this, that the agents can access things they're not supposed to.
Pick from the left.
So we'll just go
around, any questions?
Yeah, here.
AUDIENCE: Why does
AlphaGo improve?
Is it [INAUDIBLE],,
self-training,
or do you tweak it?
DEMIS HASSABIS: Well, we're doing both, actually. So there's self-training, in the sense that self-play produces high-quality data and it tweaks itself through this deep reinforcement learning, and we're also actively doing tons of research on new architectures, parameters, and other things.
So it's all of the above.
So we really threw
everything at it.
