
Okay, Michael, so if you go all the way back to the very first slide, when we started this conversation, the slide said something about generalization.
But one of the words that was
written on there was scalability.
So one of the things that we've been
talking about, sometimes explicitly,
sometimes implicitly, is this notion
of getting scale to actually happen.
So, because I think it's important to think about scale in terms of more than just abstraction, I want to take a moment to talk about an algorithmic approach to getting scaling to work.
That's different from just doing
abstraction of the sort that
we've done before.
Although it's going to turn out that
all the things that we've done with our
abstraction will actually work in this
new algorithmic view of scalability.
That seem interesting?
>> Yeah that would be really helpful.
>> Cool, let's do that. All right, so here's a particular algorithm that I want to introduce.
It's called Monte Carlo Tree Search, and there are four words there, in two different parts. To begin, I just want to concentrate on the tree search part, and

then tell you what I mean
by the Monte Carlo part.
So, when I use this picture on the
right, the little tree, all I'm trying
to get you to think about with this
tree, is what we've done in the past.
Say in an AI course, where you've
got nodes representing states.
There are actions that you might take
that would get you to other states.
And then more actions to get other
states and so on and so forth.
But this particular tree has a nice little form that's kind of hidden by the edges and the nodes, and I want to harp on that for a little bit, because I think it will help you understand the algorithm that I'm trying to get across on this data structure.
So the algorithm is written on the left,
and
it's a big loop loop
loop loop loop loop loop.
And it works like this, there are four
steps, the first step is selection.
And I'll define what all these things mean in a moment, but I just want to go through, at a high level, what each one is.
So the first one is selection.
And selection is basically
the way in which you're going to
decide what actions to take
from a particular state.
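As a rough sketch, that selection step might look like picking the action with the highest current value estimate at a state. This is just an illustration: the `q_values` table and the names here are assumptions, not details from the slide.

```python
def select_action(state, q_values):
    """Pick the action with the highest current estimate at this
    state; q_values maps (state, action) -> estimated value."""
    candidates = [(q, a) for (s, a), q in q_values.items() if s == state]
    best_q, best_action = max(candidates)
    return best_action

# Hypothetical estimates at state "s0":
q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
print(select_action("s0", q))  # prints: right
```

The interesting part, of course, is where those estimates come from, which is what the remaining steps provide.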

What's going to happen is you're going to take a bunch of actions, and eventually you're going to get to some point in this tree of possible states and actions you might see, where you don't know enough to know how to make a selection.
Once you're at a place where you don't
exactly know how you should make
a selection, the idea is that you're
going to expand the tree there.
And then do simulation to figure out
what it is you ought to be doing
to make selections.
And the way that's going to happen,
is you're going to estimate
from that expanded set of nodes the true
value of taking actions in those states.
And then, in a very Bellman-equation kind of way, you're going to back up what you learned through your simulation.
And then that will allow you
to select what to do next.
And you'll just keep doing that over and
over and over again making selections
where you know what to do.
Expanding, simulating the world for
a little while, so
that you learn what to do, and then make
those decisions, and so on and so forth.
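That select-expand-simulate-backup loop can be sketched in Python. This is a minimal, hedged version: the UCB-style selection rule, the `Node` fields, and the `estimate` function are assumptions for illustration, not details from the slide.

```python
import math

class Node:
    """One state in the search tree, with the statistics
    that selection needs."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}       # action -> Node
        self.visits = 0
        self.total_value = 0.0

def ucb_score(node, c=1.4):
    """Selection rule (an assumption here): favor high average
    value, but keep trying rarely-visited children."""
    if node.visits == 0:
        return float("inf")
    parent_visits = node.parent.visits if node.parent else 1
    return (node.total_value / node.visits
            + c * math.sqrt(math.log(parent_visits + 1) / node.visits))

def mcts(root, actions, step, estimate, iterations=300):
    """The big loop: selection, expansion, simulation, backup."""
    for _ in range(iterations):
        # 1. Selection: walk down while we have statistics to go on.
        node = root
        while node.children:
            node = max(node.children.values(), key=ucb_score)
        # 2. Expansion: once we reach a node we've visited before
        # but haven't grown, add its children.
        if node.visits > 0:
            for a in actions(node.state):
                node.children[a] = Node(step(node.state, a), parent=node)
            if node.children:
                node = next(iter(node.children.values()))
        # 3. Simulation: estimate the value of this state.
        value = estimate(node.state)
        # 4. Backup: fold the estimate into every node on the path.
        while node is not None:
            node.visits += 1
            node.total_value += value
            node = node.parent
```

After enough iterations, the action leading to the most-visited child of the root is the one you'd select.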
So does that make sense at a very high level?

See what I'm trying to accomplish here?
>> Yeah I think so.
>> Okay.
>> It reminds me of a lot of other
kinds of tree search that
I've seen in AI classes.
>> Like what?
>> A* is a kind of tree search, game tree search is a kind of tree search.
They have similar pictures where
you repeatedly expand nodes and
get estimates of values.
>> Right, and in fact I like the game trees, or the game search one.
This is not game search, because we're living in an MDP, and it's just us, and so we're not playing against an opponent.
But there's something very nice about that particular one, which is the way it often works in that world.
You have a particular way of
figuring out what action to take or
you sort of expand and do a search
among a bunch of possibilities.
And then eventually you run out of time, and so you have to decide what the value is of whatever leaf nodes you've gotten to.
And the way you do that is to use something like an evaluation function, which tells you how good we think this node is.
And that's just what you've got to work with.
And you use that and
you back up the values, that helps you
to make a decision about what to do.
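That pattern, searching as deep as time allows and then substituting an evaluation function's guess at the frontier and backing values up, might look like this single-agent sketch. The chain-world functions in the example are made up for illustration.

```python
def depth_limited_value(state, depth, actions, step, reward, evaluate):
    """Search down to a fixed depth; where the depth budget runs
    out, fall back on the evaluation function's guess for the leaf."""
    acts = actions(state)
    if depth == 0 or not acts:
        return evaluate(state)
    # Back up: best immediate reward plus the value of the successor.
    return max(
        reward(state, a) + depth_limited_value(step(state, a), depth - 1,
                                               actions, step, reward, evaluate)
        for a in acts)

# A made-up chain world: states are integers, action a in {0, 1}
# moves from s to s + a and pays off a.
value = depth_limited_value(
    0, 3,
    actions=lambda s: [0, 1],
    step=lambda s, a: s + a,
    reward=lambda s, a: a,
    evaluate=lambda s: 0.0)
print(value)  # prints: 3.0
```

In a game you'd alternate max and min levels for the opponent; here, with no opponent, every level is a max.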

And then you make a decision,
then your opponent makes a decision,
you end up in a state, and
you do it all over again.
Basically, you search out as far as you have time to.
When you run out of time, you use some
estimate that you got from somewhere of
how good the node is, and
then you back that up.
And that's basically what
we're going to be doing here,
except we have another wrinkle.
And the wrinkle here is where
the Monte Carlo part comes in.
So the tree search is very standard.
The Monte Carlo part is,
well, we've got randomness,
we've got stochasticity
in our transition model.
And so we're going to have to
come up with some way to do this
estimation that takes that into account.
And that's where the simulation
is going to come from.
All right, so let's break that up into its little parts.
Actually that's a lot of words,
but I think walking through this
picture might help a little bit.
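The Monte Carlo piece can also be sketched: with a stochastic transition model, the simulation step averages over many random rollouts rather than computing one exact value. All of the function names here are assumptions for illustration.

```python
import random

def rollout(state, policy, sample_step, reward, is_terminal, max_steps=50):
    """One simulated trajectory through the stochastic model,
    summing up the reward it encounters along the way."""
    total = 0.0
    for _ in range(max_steps):
        if is_terminal(state):
            break
        action = policy(state)
        next_state = sample_step(state, action)   # random transition
        total += reward(state, action, next_state)
        state = next_state
    return total

def monte_carlo_estimate(state, n_rollouts=100, **kwargs):
    """Average many rollouts to estimate the state's value."""
    return sum(rollout(state, **kwargs) for _ in range(n_rollouts)) / n_rollouts
```

Because each rollout samples the randomness in the transitions, the average converges toward the true expected value, which is exactly the estimate the backup step then propagates.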
