Hey Charles, this is when we get to
talk about reinforcement learning.
>> Hi Michael.
This is when I get to
hear about reinforcement learning.
>> Wow.
I'm glad we're on the same page.
So-
>> Are we on the same page?
Is this all of reinforcement learning or
is it just the reinforcement
learning basics?
>> We're going to start with the basics.
>> Oh, okay.
I can't wait to hear what that is.
>> So the first concept to try
to understand when you're doing
reinforcement learning is that a lot of
it takes place as a conversation between
an agent and an environment.
>> Okay, so like right now, you're
the agent and I'm the environment.
>> Actually I think I'm
going to have you be the agent.
>> Okay.
>> And we'll just imagine some kind of,
I don't know,
like a video game environment.
>> That sounds reasonable.
By the way did you
notice I've lost weight?
>> [LAUGH] Oh, good job,
how did you do that?
>> Well, I got drawn as a stick figure.
>> [LAUGH] That's fair.
So here we are, the agent and
the environment, and the conversation
basically captures what goes back and
forth between the agent and
the environment.
So, the environment is going
to reveal itself to you,
to the agent, in the form of states, s.
You then get to have some influence
on the environment by taking actions,
a, and then you also receive back,
before the next state,
some kind of reward, r, for the most
recent state-action combination.
>> Okay.
Fair enough.
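That back-and-forth between agent and environment can be sketched as a simple loop. This is a hypothetical illustration, not code from the lecture; the tiny `GridEnvironment` corridor and the `random_agent` are made up just to show the shape of the conversation: environment hands out a state, agent hands back an action, environment responds with a reward and the next state.

```python
import random

random.seed(0)  # for reproducibility of this sketch

class GridEnvironment:
    """A made-up corridor: states 0..4; reaching state 4 pays reward +1."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (move left) or +1 (move right), clipped to the corridor
        self.state = max(0, min(4, self.state + action))
        reward = 1 if self.state == 4 else 0
        done = self.state == 4
        return self.state, reward, done

def random_agent(state):
    # The agent sees only the state; it knows nothing about the
    # environment's internals -- that's the point of the setup.
    return random.choice([-1, 1])

env = GridEnvironment()
state, total_reward, done = env.state, 0, False
while not done:
    action = random_agent(state)             # agent -> environment: action a
    state, reward, done = env.step(action)   # environment -> agent: s', r
    total_reward += reward
```

Note the agent never touches `env.state` directly: everything it learns about the environment has to come through the states and rewards the environment chooses to reveal.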
>> So these are the same kinds of
elements that we have in an MDP.
But the important thing is that instead
of just being given the MDP as some kind
of a graph structure that we then
get to compute on, the computation's
really happening inside the head of
the agent, and the information about
the environment is only available
through the course of this interaction.
So does that make some sense?
>> It does make some sense but I guess
how is that any different from the MDP?
>> Well, it's the same story as how
a policy interacts with an MDP.
Right, where the environment is playing
the role of the MDP,
and the agent is playing
the role of the policy, pi.
>> Mm-hm.
>> But now, again, the computational
aspect of the system is different:
the agent doesn't know the environment,
it's not living inside the agent's head.
Instead the agent is just experiencing
the environment by interacting with it.
It can then, you know, if it so
chose, build
some kind of a model of the environment
in its head and then think about that.
But what's in the agent's head and
what's in the environment are two
different things in this setup.
>> Okay, fair enough.
I get that.
>> So maybe I can make
this a little bit clearer.
So let's actually put
you in this environment.
What do you say?
>> Okay, metaphorically?
>> No, let's just do it.
>> Sure.
