All right. Hey, everyone. Welcome back.
Um, so let's continue our discussion today of reinforcement learning and MDPs.
And specifically, what I hope you learn
today is how to apply reinforcement learning
even to continuous-state or infinite-state MDPs.
Um, so we'll talk about discretization, model-based RL,
we'll talk about models and simulators, and fitted value iteration is the main algorithm
I want to lead up to for today.
Um, just to recap.
Because we're gonna build on what we had learned, uh,
in the last two lectures,
I wanna make sure that you have the notation fresh in your mind.
Um, an MDP was states,
actions, transition probabilities, a discount factor, and a reward. That was an example.
Um, V Pi was the value function for a policy Pi, which is the expected payoff
if you execute that policy starting from
a state S, and V star was the optimal value function.
And last time, we figured out that if you know what is V star,
then uh, Pi star,
the optimal policy or the optimal action for a given state,
can be computed as the argmax of that, right?
Um, and one thing that we'll come back to later is
an equivalent way of writing that formula,
which is that this is the expectation, with respect to S prime drawn from
P_sa, of V star of S prime, right?
So far we have been working with discrete-state MDPs, like the 11-state MDP,
so this is a sum over all the states S prime.
But when we go to continuous-state MDPs,
the generalization of this, or what this becomes,
is the expected value, with respect to
S prime drawn from the state transition probabilities
P_sa, indexed by the current state and the current action, of the value that you attain in the future,
so V star of S prime.
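Written out, the two equivalent forms of that formula look roughly like this. This is a sketch in the lecture's notation, with the discrete-state sum first and then the expectation form that carries over to continuous states:

```latex
\pi^*(s) \;=\; \arg\max_{a} \sum_{s'} P_{sa}(s')\, V^*(s')
         \;=\; \arg\max_{a} \; \mathbb{E}_{s' \sim P_{sa}}\!\left[ V^*(s') \right]
```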
Okay? Um, and then we saw the value iteration algorithm.
We talked about value iteration and policy iteration,
but today we'll build on value iteration.
The value iteration algorithm uses Bellman's equations,
and it says, take the left-hand side
and set it to the right-hand side, right?
Oh, I'm sorry,
it's missing a max there.
Right? Um, if V was equal to V star,
then the left-hand side and the right-hand side would be equal to each other.
But what value iteration does is initialize V of S as
0 and repeatedly carry out this update until V converges to V star.
And after that, you can then
compute Pi star, that is, for every state, find
the optimal action A.
Okay? So um, because we're gonna build on this notation and this set of ideas today,
I just want to make sure all this makes sense, right?
Any questions about that before we move on?
No? Okay. Cool. All right.
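As a refresher, here is a minimal sketch of tabular value iteration as recapped above. The two-state MDP, the state and action names, and the function names are made up for illustration and are not the 11-state example from the earlier lectures:

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P[s][a] is a dict {next_state: probability}; R[s] is the reward in state s.
    """
    V = {s: 0.0 for s in states}  # initialize V(s) := 0
    for _ in range(max_iters):
        new_V = {}
        for s in states:
            # Bellman backup: R(s) + gamma * max_a sum_{s'} P_sa(s') V(s')
            best = max(
                sum(p * V[s2] for s2, p in P[s][a].items()) for a in actions
            )
            new_V[s] = R[s] + gamma * best
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    return V


def greedy_policy(states, actions, P, V):
    """For every state, pick the action maximizing the expected next value."""
    return {
        s: max(actions, key=lambda a: sum(p * V[s2] for s2, p in P[s][a].items()))
        for s in states
    }


# Hypothetical toy MDP: being in state "B" pays reward 1, state "A" pays 0.
states = ["A", "B"]
actions = ["stay", "go"]
R = {"A": 0.0, "B": 1.0}
P = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 1.0}},
    "B": {"stay": {"B": 1.0}, "go": {"A": 1.0}},
}
V = value_iteration(states, actions, P, R)
policy = greedy_policy(states, actions, P, V)
print(V, policy)
```

Everything later in the lecture keeps this same update but replaces the table `V` with a fitted function.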
Okay. So everything we've done so far was built
on the MDP having a finite set of states.
Right? So with the 11 state MDP,
S was a discrete set of states.
Um, last time on Monday,
I think somebody asked, "Well,
how do you deal with continuous states?"
So we'll, we'll, we'll work on that today.
But, uh, let's say you want to build a, um,
[NOISE] uh, let me draw a car.
Right? Let's say you want to build a car,
you know, maybe a self-driving car.
Right? Um, ah, the state space of a car is, um, let's see.
I'm gonna- well, instead of taking the- my artistic side view of the car,
if you take a top-down view of a car.
Right? So this is from the satellite imagery, you know,
top-down view of a car where we have two views of the car heading this way.
Um, how do you model the state of a car, right?
Well, a common way to model the state of a car that's driving around on planet Earth,
is that you need to know the position.
Right? Um, and so that can be represented as x,
y, two numbers representing,
you know, longitude and latitude or something.
Right? Um, you probably want to know the
orientation of the car, maybe measured relative to North,
you know, what's the orientation of [NOISE] the car?
Um, and then it turns out if you're driving at very low speeds, this is fine.
But if you're driving at anything other than very low speeds, then
[NOISE] we'll often include in the state space
also the velocities and the angular velocity.
So x dot is the velocity in the x direction,
so x dot is dx/dt.
Right? And y dot is the
velocity in the y direction, and Theta dot is the angular velocity,
the rate at which your car is turning.
Okay? And it's sort of
up to you how you want to model the car.
Is it important to model the current angle of the steering wheel?
Is it important to model how worn down your front
left tire is as opposed to how worn down your right tire is?
So depending on the application you are building,
it's up to you to decide what is the
state space you want to use to model this car.
And if you are building a car to
race on a racetrack,
maybe it is important to model what is the temperature of the [NOISE] engine and how,
you know, worn down each of your four tires is separately.
But for a lot of normal driving, this would be
a sufficient level of detail to model the state space.
Okay? Um, so this is a
six-dimensional state-space representation.
Oh, and for those that work in robotics,
that one, with just the position and orientation, would be called the kinematic model of the car,
and that one would be the dynamics model of the car,
right, if you want to model the velocities as well.
Um, or let's see actually, all right.
How about a helicopter?
Right? I don't know.
I hope this is a helicopter. All right.
So how
do you model the state space of a helicopter?
A helicopter flies around in 3D rather than drives around in 2D.
And so a common way to model the state space of a helicopter would be to
model it as
[NOISE]
having a position x, y, z.
Um, and then also, the 3D orientation of a helicopter is usually modeled with
three numbers, which we sometimes call the roll, pitch, and yaw.
Right? So if
you're in an airplane, roll is whether you are rolling to the left or right,
pitch is whether you are pitching up or down,
and yaw is, you know, are you facing North,
South, East or West, right?
So that's one way to turn the three-dimensional orientation
of an object like an airplane or a helicopter into three numbers.
The details aren't important,
and if you actually work on a helicopter, you would figure this out.
But for today's purposes, to represent the
orientation of a three-dimensional object flying around,
this is conventionally represented with three numbers,
such as roll, pitch and yaw, Phi, Theta, and Psi.
Um, and then [NOISE] x dot, y dot,
z dot, Phi dot,
Theta dot, Psi dot,
which are the linear velocities and the angular velocities.
Okay? Um, [NOISE] maybe just one last example.
So it turns out in
reinforcement learning,
maybe in the early history of reinforcement learning,
one of the problems that a lot of people just happened to work on,
and that therefore
you'd see in a lot of reinforcement learning textbooks,
is something called the inverted pendulum problem.
What that is, is a little toy,
which is a little cart
that's on wheels, that's on a track,
and you have a little pole that is attached to this cart, and there's a
free [NOISE] swivel there, right?
And so this pole just flops over, or
this pole just swings freely, and there's no motor
at this little hinge there.
[NOISE] And so the inverted pendulum problem is, see, I've prepared this.
Right? [LAUGHTER] No,
I've always prepared this. If you have
a free pole, and if this is your cart moving left and right,
the inverted pendulum problem is, you know,
if you see it swivel,
can you kind of balance that, right?
[LAUGHTER] Um, and so
one of the common textbook examples of
reinforcement learning is,
can you choose actions over time to move
this left and right so as to keep the pole oriented upward?
Right? And so for a problem like this,
if you have a linear rail, just a one-dimensional, you know,
like a railway track that this cart is on,
the state space would be x, which is the
position of the cart,
[NOISE] Theta, which is the
orientation of the pole, as well as x dot and Theta dot.
Right? So this would be a
four-dimensional state space for the inverted pendulum,
if it's running left and right on
a one-dimensional railway track, right?
Um, all right.
Cool. So
for all of these problems,
if you want to build, you know,
a self-driving car and have it do something, or
build an autonomous helicopter [NOISE] and have it either hover stably or
fly a trajectory, or keep the pole upright in the inverted pendulum,
these are examples of robotics problems where you would
model the state space as a continuous state space.
So what I wanna do today is focus on problems where the state space [NOISE] is R^n,
so the n-dimensional set of real numbers.
And in these examples,
I guess n would be 4 or 6 or 12.
Right? Oh, and again,
for the mathematicians in this class, technically,
angles are not real numbers because
they only go to 360, and then they wrap around to 0.
But I think for the purposes of today,
that's not important.
So we just treat this as R^n.
Oh, yeah. Okay. [NOISE].
[NOISE] So [NOISE] all right.
Um, so
[NOISE] the most straightforward way
to work with a continuous state space is discretization, where,
you know, you might have in this example a two-dimensional state space,
maybe x and Theta for the inverted pendulum.
And then you just lay down a set of grid cells, right?
And discretize it back to a discrete-state problem.
And so, you know,
you can give the states a set of names, one, two, three, four,
whatever, and anywhere within that little square you just
pretend that your MDP, or the robot, is in state number 1.
So this takes a continuous-state problem and turns it back into a discrete-state problem.
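To make the grid idea concrete, here is a minimal sketch of mapping a continuous state to a cell number. The ranges for x and Theta and the choice of 10 bins per dimension are assumptions for illustration, not values from the lecture:

```python
def discretize(state, lows, highs, bins_per_dim):
    """Map a continuous state (a sequence of floats) to one integer cell index."""
    index = 0
    for x, lo, hi, k in zip(state, lows, highs, bins_per_dim):
        frac = (x - lo) / (hi - lo)
        # Clamp so that out-of-range values fall into the edge cells.
        cell = min(max(int(frac * k), 0), k - 1)
        index = index * k + cell
    return index


# Hypothetical ranges: cart position x in [-2.4, 2.4], pole angle in [-0.5, 0.5].
lows, highs, bins = (-2.4, -0.5), (2.4, 0.5), (10, 10)
print(discretize((0.0, 0.0), lows, highs, bins))
print(discretize((2.4, 0.5), lows, highs, bins))
```

Every state inside the same little square gets the same index, so the tabular algorithms from the earlier lectures apply unchanged.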
Um, this is such a simple, straightforward way to do it,
and it is actually reasonable to do for small problems.
Um, and if you have
a relatively small, low-dimensional-state MDP like the inverted pendulum problem,
you know, four-dimensional, it's actually perfectly
fine to discretize the state space and solve it this way.
Let me describe some disadvantages of discretization first,
and then we'll chat a little bit about when you should just
use discretization, because even though it's not the best algorithm,
it works fine for smaller problems.
But for bigger problems,
we'll have to go to more sophisticated algorithms like fitted value iteration, okay?
But, um, so what are the problems with discretization, right?
Well first,
[NOISE] this is a very,
you know, kind of a naive representation
for V star
and Pi star, right?
Which is, you know,
remember the very first problem we talked about,
predicting housing prices?
Um, imagine if x was the size of a house,
and the vertical axis was the price of a house,
and you have a dataset that looked like this, [NOISE] all right?
The discretization equivalent of trying
to fit a function to this data would be to look at the input feature and,
um, you know, let's discretize it into five values.
And for each of these little buckets, in each of these five intervals,
let's fit a constant function,
right, or something like that, right?
So this staircase would be how,
you know, discretization would represent the price of a house as a function of the size.
Um, and the analogy is that what
we're doing in reinforcement learning is we want to approximate the value function.
And if you were to discretize it, then
on the x-axis is maybe the state.
And now I'm down to a one-dimensional state, right?
Because that's what I can plot.
And you are saying that, well,
let's approximate the value function, you know,
as a staircase function,
as a function of the state, right?
And, you know, this is not terrible.
If you have a lot of data and very few input features,
you can get away with this. This will work, okay?
But
it doesn't seem to allow you to fit a smoother function, right?
Um, so that's one downside.
It's just not a very good representation.
Um, and the second downside is the
somewhat fancifully named curse of dimensionality.
Richard Bellman gave it this name, and it is a cool-sounding name.
But what it means is that if
the state space is in R^n, and you discretize,
you know, each dimension into k values,
then you get k^n discrete states, right?
So if we discretize
each dimension, position and orientation and so on, into 10 values, which is quite small,
then you end up with, you know, 10^n states, which
grows exponentially in the dimension of the state space n. So
discretization works fine if you have relatively low-dimensional problems,
like two dimensions, no problem,
four dimensions, maybe it's okay.
But for very high-dimensional state spaces,
this is not a good representation, right?
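The arithmetic is easy to check. This short sketch, with a hypothetical helper name, prints the number of grid cells for k = 10 values per dimension at the state dimensions mentioned in the lecture:

```python
def num_discrete_states(k, n):
    """Number of grid cells when each of n dimensions is split into k values."""
    return k ** n


for n in (2, 4, 6, 12):
    print(f"n = {n:2d}: {num_discrete_states(10, n):,} discrete states")
```

Even at only 10 values per dimension, the 12-dimensional helicopter state would need a trillion table entries.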
And, um, it turns out, to take a slight aside
from continuous state spaces, the curse of dimensionality also
applies to very large discrete-state MDPs.
So for example, one of the places people have
applied reinforcement learning is in factory optimization, right?
So if you have a factory with 100 machines,
and every machine in the factory is doing something slightly different,
and each machine can be in k different states,
then the total number of states of your factory is
k to the power of 100, right?
So the curse of dimensionality also
applies to very large discrete state spaces like this.
Um, and it turns out that for this type of discrete state space,
fitted value iteration can be a much better algorithm as well.
We'll get to fitted value iteration in a little bit, okay?
So, um, let's see.
Now, despite all this criticism of discretization,
if you have a small state space, it's a simple method
to try to apply, you know.
If you have a very small state space,
go ahead and discretize it if you want
something quick to try, just to get something working.
So let me share with you some guidelines.
This is how I do it, I guess, right?
If you have a, you know,
two-dimensional state space or
three-dimensional state space, it's no problem, just discretize.
Usually, for a lot of problems,
it's just fine.
Um, if you have maybe a 4 to 6 dimensional state space,
um, you know, I would think about it,
and it will still often work.
So for the inverted pendulum, which is a four-dimensional state space,
it works just fine.
Um, I've had some friends work on
trying to drive an old bicycle, right?
Which you can model as a six-dimensional state space,
and, you know, discretization kind of
works, it works if you put some work into it.
One of the tricks you want to use as you approach
the 4 to 6 dimensional state space range is to
choose your discretization more carefully.
So for example, say the state variable S2 is really important.
If you think the actions you need to take, or the value or
the performance, is really sensitive to the state variable S2 and less to the state variable S1,
then in this range people end up designing
unequal discretizations, where you might discretize S2 much more finely than S1, right?
And the reason you do that is that
the number of discrete states is now blowing up exponentially,
something to the power of 4, something to the power of 6.
And these tricks allow you to just reduce
a little bit the number of discrete states you end up having to model.
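Here is one way an unequal discretization like that might look, as a sketch only. The bin edges are invented for illustration: S1 gets 2 coarse bins, while S2 gets 8 bins that are narrower near zero:

```python
from bisect import bisect_right


def cell_index(value, edges):
    """Index of the bin containing value; edges are sorted interior boundaries."""
    return bisect_right(edges, value)


# Hypothetical edges: coarse for S1, fine (and denser near 0) for S2.
s1_edges = [0.0]                                      # 2 bins
s2_edges = [-0.2, -0.1, -0.05, 0.0, 0.05, 0.1, 0.2]   # 8 bins


def discretize_unequal(s1, s2):
    """Combine the per-dimension bin indices into a single state number."""
    n_s2_bins = len(s2_edges) + 1
    return cell_index(s1, s1_edges) * n_s2_bins + cell_index(s2, s2_edges)


print(discretize_unequal(-1.0, 0.03))
total_cells = (len(s1_edges) + 1) * (len(s2_edges) + 1)
print(total_cells)  # 2 * 8 cells, versus 8 * 8 for a uniform fine grid
```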
Um, I think, you know,
if you have a 7 to 8 dimensional problem,
that's pushing it.
That's when I would kind of be nervous, and,
you know, be increasingly inclined to not use discretization.
I personally rarely use discretization for problems that are eight-dimensional.
And then when your problems are even higher-dimensional than this,
you know, like 9, 10,
and higher, then I would very seriously consider
an algorithm that does not discretize.
It's very rare to use discretization for problems this high-dimensional.
Even seven to eight is quite rare.
I've seen it done on rare occasions,
but these things get worse exponentially, right,
with the number of dimensions.
So maybe that's a set of guidelines for when to use
discretization and when to seriously consider doing something else.
All right. So, um,
in the alternative approach that you'll see today,
what you will be able to do is to approximate V star
directly without resorting to discretization, okay?
And there'll be an analogy that we'll make later,
you know, alluding to this plot again.
Right, so this is an analogy between linear regression, where you're trying to approximate
y as a function of x, and value iteration,
where you're trying to learn or approximate V as a function of s. Right, that's V star.
Which is that in linear regression,
you say let's approximate y as a linear function of x, right,
or if you don't want to use the raw features x,
what you can do is
use, you know,
theta transpose phi of x,
where phi of x is the features of x, right?
So this is what linear regression does, where if x is
the raw features of your house, then maybe phi of x is equal to,
you know, x_1, x_2,
x_1 squared, x_1 x_2, and so on, right?
So that's how you can use
linear regression to approximate the price of a house,
either as a function of the raw features or as a function of some,
you know, slightly more sophisticated, slightly more complex set of features of the house.
And what
you'll see in
fitted value iteration is a model where we will approximate V star of s as
a linear function of features of the state, theta transpose phi of s.
Okay? So that's the algorithm we'll build up to.
And, uh,
yeah, we're going to try to use linear regression with a lot
of modifications to approximate the value function.
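In code, a value function that is linear in features of the state is just a dot product. This is a sketch for a two-dimensional state, and the particular quadratic feature map `phi` is an assumption chosen to mirror the housing example, not the lecture's choice:

```python
def phi(s):
    """Hypothetical feature map for a 2-D state s = (s1, s2)."""
    s1, s2 = s
    return [1.0, s1, s2, s1 * s1, s1 * s2, s2 * s2]


def approx_value(theta, s):
    """V(s) is approximated by theta^T phi(s)."""
    return sum(t * f for t, f in zip(theta, phi(s)))


theta = [0.0, 1.0, 2.0, 0.0, 0.0, 0.0]  # made-up parameters
print(approx_value(theta, (3.0, 4.0)))
```

The learning problem then becomes choosing `theta`, rather than filling in one table entry per discrete state.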
Okay? And again, in reinforcement learning, in value iteration,
your goal is to find a good approximation to
the value function, because once you have that you can then use,
you know, the equation we had earlier to
compute the optimal action for every state, right?
So we'll just focus on computing the value function.
Now, in order to derive the fitted value iteration algorithm,
it turns out that
fitted value iteration
works best with a model, or simulator, of the MDP.
So let me describe what that means and
how you get the model, and then we'll talk about how you can
actually implement the fitted value iteration algorithm
and have it work on these types of problems.
Okay?
All right.
So, um, what a model or
a simulator of your robot is, is just a function that takes as input
a state, takes as input an action, and outputs
the next state S prime drawn from the state transition probabilities.
Okay? Um, and
the way the model is built is that the state is just a real-valued vector.
Okay? Oh, and, um,
I think for simplicity,
for now let's assume that the action space is discrete.
Um, it turns out that for a lot of MDPs,
the state space can be very high dimensional,
and the action space is much lower-dimensional than the state space.
Uh, so for example for a car, you know,
S is six-dimensional,
but the space of actions is just two-dimensional, right?
The steering and braking.
It turns out for a helicopter, you know, the state space is 12-dimensional.
And I wouldn't expect
you to know how a helicopter flies, but it turns out you have
four-dimensional actions in a helicopter.
The way you fly a helicopter is you have two control sticks,
so your left hand and your right hand
each have two dimensions of control.
And for the inverted pendulum, I guess,
the state space is 4D and the action space is just 1D, right?
You move left or right.
So you actually see in a lot of
reinforcement learning problems that it's quite common
for the state space to be much higher-dimensional than the action space.
And so, um, let's say for now
that we do not want to discretize the state space because it's too high dimensional.
But just for the sake of simplicity let's say
we discretize the action space for now, right?
Which is, which is usually much easier to do.
But I think as we develop fitted value iteration as well,
you'll get hints of
when maybe you don't need to discretize your action space either,
but let's just say we have a
discrete action space for now.
Okay?
So, all right,
so how do you get a model, right?
Um, one way to
build a model is to use a physics simulator.
So, um, you know, in the case of an inverted pendulum, right,
the action is the acceleration you apply to the cart,
either positive or negative, to accelerate it to the left or to the right.
And
the state space is four-dimensional, right, and it turns out that
if you sort of flip open a physics textbook on Newtonian mechanics,
if you know the mass of the cart and the mass of the pole,
and the length of the pole,
it turns out you can derive equations for how the state changes over time, right?
So it turns out S dot is equal to,
you know, don't worry about this.
Think of the math as decoration rather than something you need to learn, where,
you know, L is the length of the pole,
M is the mass of one of these things, actually don't worry about it.
M is the pole mass,
a is the force exerted, and so on.
Um, and a conventional physics textbook will
kind of let you derive these equations. Or,
rather than trying to derive this yourself
using Newtonian mechanics, or finding the help of a physicist friend,
there are also a lot of
open source physics simulators and software packages
where you can download an open source simulator, plug in the dimensions
and mass and so on of your system, and then they'll spit out a simulator like this
that tells you how the state evolves from one time step to another time step.
All right, so in this example the simulator would say that
S prime is equal to S plus,
you know, Delta t
times S dot,
where Delta t could be, let's say, 0.1 seconds, right?
So if you want to simulate this at 10 hertz,
so that's 10 updates per second, so that
the time difference between the current state and
the next state is one-tenth of a second,
then you write a simulator like this.
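A minimal sketch of that update rule follows. The `s_dot` function is supplied by the caller, and the toy dynamics here (a cart whose acceleration equals the action) are a stand-in for illustration, not the inverted pendulum equations from the board:

```python
def simulate_step(s, a, s_dot, dt=0.1):
    """One simulator step: s' = s + dt * s_dot(s, a)."""
    derivs = s_dot(s, a)
    return [x + dt * dx for x, dx in zip(s, derivs)]


def cart_only_s_dot(s, a):
    """Toy stand-in dynamics: state is (x, x_dot); acceleration equals the action."""
    x, x_dot = s
    return [x_dot, a]


# Simulate one second at 10 Hz with a constant action.
s = [0.0, 0.0]
for _ in range(10):
    s = simulate_step(s, 1.0, cart_only_s_dot)
print(s)
```

Swapping in the real equations of motion, or a call into a physics engine, only changes `s_dot`; the stepping logic stays the same.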
Okay? Um, and really,
the most common way to do this is not to actually derive the
physics update equations; the most common way to do this is to just
download one of the open source physics engines, right?
So, um, this will work okay for
problems like the inverted pendulum.
Um, I once used a physics engine to build
a simulator for a four-legged robot and managed to use
reinforcement learning to get a four-legged robot to walk around, right?
So it works.
Um,
the second way to get a model is to learn it from data.
All right, and I personally end up using this much more often.
So, um, here's what I mean.
Let's say you want to build a
controller for an autonomous helicopter, right?
So really, this is a case study,
and what I'm describing is real,
like this will actually work, right?
Let's say you have
a helicopter and you want to build an autonomous controller for it.
What you can do is
start your helicopter off in some state S0, right?
So with GPS, accelerometers, and a magnetic compass,
you can just measure the position and orientation of
the helicopter, and then have a human pilot
fly the helicopter around.
So the human pilot, [NOISE] you know,
using control sticks, will move the helicopter.
They'll command the helicopter with some action A0,
and then a tenth of a second later,
[NOISE] the helicopter will get to
some slightly different position and orientation, S1.
And then the human pilot, you know,
will just keep on moving the control sticks,
and so you record down what action they take, A1.
And based on that, the helicopter [NOISE] will get to some new state S2,
and then they will [NOISE] take some action A2,
[NOISE] and get to some state S3,
[NOISE] and so on.
And [NOISE] let me just write this as S_T, right?
So in other words, what you do is, uh,
take the helicopter out to the field and hire a human pilot to fly this thing
for a while and record the position of the helicopter 10 times a second,
and also record all the actions that human pilot was taking on the control stick.
Okay. Um, and then do this not just one time,
but do this m times.
So let me use superscript 1
[NOISE]
to denote the first trajectory.
And you do this a second time, [NOISE] right?
And so on, and
maybe do this m times, right?
So this is just a lot of math that's saying, fly the helicopter around,
you know, m times, right?
And then record everything that happened.
And now, um, your goal is to apply [NOISE]
supervised learning, [NOISE] right?
To estimate S_t plus
1 as a function of S_t [NOISE] and A_t, right?
So the job of the model, the job of the simulator, is to
take as input the current state and the current action,
[NOISE] and tell you where the helicopter is gonna go,
you know, like 0.1 seconds later.
And so, given all this data,
what you can do is apply a supervised learning algorithm to predict
what is the next state S prime as a function of the current state and action, right?
And on the notation, [NOISE]
when I drew that box for the simulator above,
I was using S prime to denote S_t plus 1, and
S and a to denote S_t and A_t, right?
So that's the mapping between the notations.
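Turning the recorded flights into a supervised learning dataset might look like the sketch below. The data layout (each trajectory as a pair of a state list and an action list) is an assumption made for the example:

```python
def make_training_pairs(trajectories):
    """Build supervised pairs from recorded flights.

    trajectories: list of (states, actions), where states has length T + 1
    and actions has length T. Returns inputs (s_t, a_t) and targets s_{t+1}.
    """
    inputs, targets = [], []
    for states, actions in trajectories:
        for t, a in enumerate(actions):
            inputs.append((states[t], a))
            targets.append(states[t + 1])
    return inputs, targets


# Two tiny made-up trajectories with 1-D states and 1-D actions.
flights = [
    ([[0.0], [0.1], [0.3]], [[1.0], [2.0]]),
    ([[1.0], [0.9]], [[-1.0]]),
]
inputs, targets = make_training_pairs(flights)
print(len(inputs), targets)
```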
Um, and so [NOISE] if you use the linear regression version
[NOISE] of this idea,
you will say,
[NOISE] let's approximate S_t plus 1 as a linear function of the previous state
plus another linear function of the previous action, so A times s_t plus B times a_t.
Um, and it turns out this actually works okay for helicopters flying at slow speeds.
This is actually not a terrible model
if your helicopter is moving slowly
and not flying upside down.
If your helicopter is flying in a relatively level way at kind of a slow speed,
this model is not too bad.
[NOISE] Um, if you're flying a helicopter in highly dynamic situations, flying very fast,
making a very fast aggressive turn,
then this is not a great model, but this is actually okay for slow speeds, right?
Um, and so I
guess A here will be
an n by n matrix, because
the state space is n-dimensional,
so A is a square matrix, and B
will usually be a tall skinny matrix, I guess,
where the dimension of B is
the dimension of the state space by the dimension of the action space.
Okay? [NOISE] And so,
in order to fit the parameters A and B,
[NOISE] you would minimize, with respect to the parameters A and
B, this sum of squared errors. [NOISE]
So you
wanna approximate S_t plus 1 as a function of that,
and so, you know, it's
pretty natural to fit the parameters of this linear model in a way
that minimizes the squared difference
between the left-hand side and the right-hand side. Wait, did I screw up?
Yes.
Go ahead.
[inaudible].
Uh, say that again.
[inaudible].
Oh, sure. So, what's the difference between flying a helicopter m times versus flying a helicopter
once for a very, very long time?
In this example,
it makes no difference.
Yeah. This is fine either way.
For
the purposes of this class, it doesn't matter.
For practical purposes, you fly the helicopter m times because
it turns out the fuel burns down slowly,
and so the weight of the helicopter changes slowly, and you wanna
average over how much fuel you have, or wind conditions,
so this is what is actually done.
But for the purposes of understanding this algorithm,
flying a single time for a long time,
you know, works just fine as well.
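For reference, the least-squares objective described just before the question can be written as below. This is a sketch using the trajectory notation from earlier, assuming m trajectories indexed by the superscript (i), each with states s_0 through s_T:

```latex
\min_{A,\,B} \; \sum_{i=1}^{m} \sum_{t=0}^{T-1}
  \left\| \, s_{t+1}^{(i)} - \left( A\, s_t^{(i)} + B\, a_t^{(i)} \right) \right\|_2^2
```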
Okay? Um, so this is the linear regression version of this,
and we'll actually talk about
some other methods later,
called LQR and LQG,
where you'll see this linear regression version of a model as well,
just a linear model of the dynamics, right?
We'll
come back to linear models of dynamics later, next week.
But it turns out that
if you want to use a nonlinear model,
you can also plug in,
right, Phi of S, you know,
and maybe Phi prime of a as well,
if you want to have a non-linear model.
Um, and this will work even better, depending on your choice of features.
Okay? Now, um, [NOISE] finally,
having run this little linear regression thing,
and it's not quite linear regression because A and B are matrices,
but you can minimize this objective,
and this turns out to be equivalent
to running linear regression n times.
Um, so if S has 12 dimensions,
this turns out to be equivalent to running linear regression
12 times, to predict the first state variable,
the second state variable, the third state variable, and so on, right?
That's what this is equivalent to.
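The "n separate regressions" view can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it solves the normal equations with a small hand-written Gauss-Jordan routine, assumes the data make that system non-singular, and checks itself on synthetic data generated from made-up matrices `A_true` and `B_true`:

```python
import random


def solve_linear_system(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_model(inputs, targets):
    """Fit s_{t+1} ~ A s_t + B a_t, one least-squares regression per state coordinate.

    inputs: list of (s_t, a_t) pairs of float lists; targets: list of s_{t+1}.
    """
    X = [list(s) + list(a) for s, a in inputs]   # each row is [s_t, a_t]
    n, d = len(targets[0]), len(X[0])
    # Normal equations (X^T X) w = X^T y, solved once per output coordinate.
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    W = []
    for k in range(n):
        Xty = [sum(row[i] * y[k] for row, y in zip(X, targets)) for i in range(d)]
        W.append(solve_linear_system(XtX, Xty))
    A = [w[:n] for w in W]   # coefficients multiplying the state
    B = [w[n:] for w in W]   # coefficients multiplying the action
    return A, B


# Synthetic check with made-up dynamics (2-D state, 1-D action).
rng = random.Random(0)
A_true = [[1.0, 0.1], [0.0, 1.0]]
B_true = [[0.0], [0.1]]
inputs, targets = [], []
for _ in range(50):
    s = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    a = [rng.uniform(-1, 1)]
    nxt = [sum(A_true[k][j] * s[j] for j in range(2)) + B_true[k][0] * a[0]
           for k in range(2)]
    inputs.append((s, a))
    targets.append(nxt)
A_fit, B_fit = fit_linear_model(inputs, targets)
print(A_fit, B_fit)
```

In practice a numerical library's least-squares routine would replace the hand-written solver; it is written out here only to keep the sketch dependency-free.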
But having done this,
you now have a choice of two possible models.
One model would be to just [NOISE] say my model will set S_t plus 1
as A s_t [NOISE] plus B a_t.
Another version [NOISE]
would be to set S_t plus 1 equals A s_t plus B a_t plus Epsilon_t,
where Epsilon_t is distributed,
[NOISE] maybe,
from a Gaussian density.
Okay? Um, and so this first model would be a deterministic model,
and this second model would be a stochastic model, right?
And, um, if you use a stochastic model,
then that's saying
that [NOISE]
when you're running your simulator,
when you're running the model,
every time you generate S_t plus 1,
you would be sampling this Epsilon vector from a Gaussian
and adding it to the prediction of your linear model.
And if you use a stochastic model,
what that means is that, you know,
if you simulate your helicopter flying around,
your simulator will generate random noise that adds and subtracts
a little bit to the state of the
helicopter, as if there were little wind gusts
blowing the helicopter around, okay?
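Both choices fit in one simulator function, sketched below. Independent zero-mean Gaussian noise with the same standard deviation on every state coordinate is an assumption for the example; the lecture only says the noise is Gaussian:

```python
import random


def model_step(s, a, A, B, noise_std=0.0, rng=random):
    """Next state under s' = A s + B a, plus Gaussian noise if noise_std > 0."""
    nxt = []
    for row_a, row_b in zip(A, B):
        mean = (sum(c * x for c, x in zip(row_a, s))
                + sum(c * u for c, u in zip(row_b, a)))
        # Assumption: independent noise per coordinate with a shared std.
        eps = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        nxt.append(mean + eps)
    return nxt


A = [[1.0, 0.1], [0.0, 1.0]]   # made-up matrices for illustration
B = [[0.0], [0.1]]
print(model_step([0.0, 1.0], [2.0], A, B))                    # deterministic
print(model_step([0.0, 1.0], [2.0], A, B, noise_std=0.05,
                 rng=random.Random(0)))                       # stochastic
```

Setting `noise_std=0.0` recovers the deterministic model, so the same function covers both cases.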
Oh, and so the approach we're taking here
is called model-based reinforcement
learning, where you build a model of your robot,
then train the reinforcement learning algorithm in the simulator,
and then take the policy you learned in simulation and
apply it back on your real robot, right?
So this approach we're taking is called model-based RL.
Um, there is an alternative called model-free RL,
which is you just run your reinforcement learning algorithm on the robot directly,
and let the algorithm bash the robot around and so on, and let it learn.
I think that in terms of robotics applications, uh, um,
I think model-based RL has been taking off faster.
A lot of the most promising approaches are
model-based RL because if you have a physical robot, you know,
you just can't afford to have
a reinforcement learning algorithm bash your robot around for too long,
or how many helicopters do you want to crash before your learning algorithm figures it out?
Um, model-free RL works fine if you
want to play video games, or if you're trying to get a computer
to play chess, or Othello, or Go, right?
Because, um, you have a perfect simulator
for the video game, which is the video game itself,
and so your RL algorithm can, I don't know,
blow up hundreds of millions of times in a video game,
and that's fine. Uh, so for playing video games or for playing,
um, you know, traditional games,
model-free approaches can work fine,
but a lot of the, uh,
successful applications of reinforcement learning to robots have been model-based.
Although again, the field is evolving quickly, so there's
very interesting work at the intersection of model-based and model-free
that gets more complicated, right?
But I want to say, if you want to use something tried and true, you know,
for robotics problems, seriously consider using model-based RL,
because you can then fly a helicopter in simulation,
let it crash a million times, right?
And no one's hurt, there's no physical damage anywhere in the world.
It was just a simulation. Okay.
And, uh, um, and- oh,
and just one last tip.
One thing we learned,
um, uh, building these,
uh, reinforcement learning algorithms for a lot of robots is that,
um, you know, having run this model,
you might ask, well,
how do I choose the distribution for this noise, right?
Uh, there- how, how,
how do you model the distribution for the noise?
Um, one thing you could do is estimate it from data.
But as a practical matter, so let's see.
It turns out if you use a deterministic simulator, uh,
a lot of reinforcement learning algorithms will learn a very brittle policy, uh,
that works in your simulator but doesn't actually
work when you put it onto your real robot, right?
And so if you- if you actually look on YouTube or Twitter, um,
in the last year or two,
there have been a lot of cool-looking videos.
There are people using reinforcement learning to
control various weirdly-configured robots, like a
snake robot or some five-legged thing or some- whatever.
it's just a cool random,
I- I- this is- I- I- I'm not good at drawing this but, you know,
if you build a five-legged robot,
I don't even know what has five legs, right?
How do you control that?
It turns out that if you have a deterministic simulator,
um, using these methods,
it's not that hard to generate
a cool-looking video of your reinforcement learning algorithm,
supposedly controlling a five-legged thing or some crazy,
you know, a worm with, uh,
two legs or something, these crazy robots that you can build in a simulator.
But it turns out that, um,
even though it's, well, not easy,
even though you can generate those types of videos in a deterministic simulator,
um, if you use a deterministic model of a robot, uh,
and you ever actually try to build a physical robot,
and you take that policy from your physics simulator to the real robot, uh,
the odds of it working on the real robot are quite low, right?
Because the problem with simulators is
that your simulator is never 100% accurate, right?
You know, it's always just a little bit off.
And one of the lessons we learned,
uh, that I hope you learn too, uh,
applying RL to a lot of robots is that if you want
your model-based RL to work not just in simulation and generate a cool video,
but you want it to actually work on a physical robot,
like a physical helicopter that you own,
it is really important to add some noise to your simulator.
Because if the policy you learn is,
um, robust to a slightly stochastic simulator,
then the odds of it generalizing,
um, uh, you know, to the, to the real world,
to the physical real world is much higher
than if you had a completely deterministic simulator.
So I think whenever I'm building a robot, right?
I- I- I pretty much- actually,
you know, I don't think I- oh,
with one exception- okay,
I [inaudible] will talk about that next week,
but with one, with one very narrow exception,
I pretty much never use deterministic simulators, uh, when,
when working on robotic control problems, unless- uh, uh,
assuming, assuming I want it to work in the real world as well, right?
Um, and, uh, and again,
you know, tips and tricks.
Uh, so, uh, the most important thing is to add some noise.
And then, as for the exact distribution of the noise,
yeah, go ahead and try to pick something realistic,
but the exact distribution of noise actually matters less,
I want to say, than just remembering to add some noise.
Okay.
By the way,
you guys probably don't know this,
but my PhD thesis, uh,
was, um, using reinforcement learning to fly helicopters.
So, I don't know,
you're hearing this from someone who has crashed a bunch of
model helicopters [LAUGHTER]
and has lived through the pain and the joys of seeing this stuff work or not work.
[LAUGHTER]
All right. So now that you have built a model,
built a simulator, uh, for your helicopter,
for your four-legged robot or for your car, um,
how do you, um,
how do you approximate the value function, right?
So, um, in order to apply, um,
fitted value iteration, the first step is to choose features
of the state s. Right.
And then, um, we approximate v of s. You know,
we approximate v-star using a function v of s,
which is going to be Theta transpose Phi of s. Um,
and so, I don't know.
And so, uh, you know,
in the case of the
inverted pendulum, right?
Then for Phi of s,
maybe you have x, x-dot,
maybe you have x squared, or x times x-dot, or x
times the pole orientation, and so on.
So take your state s,
and think up some nonlinear features
that you think might be useful for representing the value.
Um, and remember that what the value is,
the value of a state is your expected payoff from that state,
expected sum of discounted rewards.
So the value function captures,
if your robot starts off in that state,
you know, how well is it gonna do if it starts here?
So when you're designing features, pick a bunch of features that you think help convey,
um, how well your robot is doing. Does that make sense?
And so, uh, maybe for the inverted pendulum, for example,
if the pole is way over to the right,
then maybe the pole will fall over, giving a reward of minus 1 when the pole falls over.
Right? Uh, but so, sorry.
I'm overloading the notation a bit.
Theta is both the angle of the pole as well as the parameters.
But if the pole is falling way over, that looks pretty bad,
unless, um, x-dot is very large and positive, right?
And so maybe there's an interaction between the pole angle and x-dot.
So you might say, "Well, let me have a new feature,
which is the angle of the pole multiplied by the velocity."
Right? Because then-, uh,
because it seems like these two variables kind of depend on each other.
Um, so just as when you are trying to predict the price of a house,
you would ask, "Well, what are the most useful features for predicting the price of a house?"
Uh, you would do something similar
for fitted value iteration.
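To make the feature idea concrete, here is a minimal sketch of a hand-designed feature map for the inverted pendulum. The state ordering (x, x-dot, pole angle, pole angular velocity) and the particular features are assumptions chosen to mirror the examples mentioned above, not a prescribed set.

```python
import numpy as np

def phi(s):
    """Hypothetical feature map for the inverted pendulum.

    s = (x, x_dot, angle, angle_dot); the ordering is an assumption.
    """
    x, x_dot, angle, angle_dot = s
    return np.array([
        1.0,            # intercept term
        x, x_dot, angle, angle_dot,
        x ** 2,         # nonlinear feature: x squared
        x * x_dot,      # interaction: position times velocity
        x * angle,      # interaction: position times pole orientation
        angle * x_dot,  # interaction: pole angle times cart velocity
    ])

def value(theta_params, s):
    """Linear value approximation V(s) = theta^T phi(s)."""
    return float(theta_params @ phi(s))
```

The parameter vector is called `theta_params` here only to keep it apart from the pole angle, which is the notation clash mentioned above.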
And one nice thing about model-based reinforcement learning
is that once you have built a model,
you'll see in a little bit that you can collect
an essentially infinite amount of data from your model.
Right? And so with a lot of data,
you can usually afford to choose a larger number of features,
because you can generate a ton of data with which to fit this linear function.
And so, you know, you- you're,
you're usually not super constrained in terms of, uh,
needing to be really careful not to choose
too many features because of fear of overfitting.
You could get so much data from your simulator that, you know,
you can usually make up quite a lot of features,
uh, and then some of the features end up not being useful, it's okay.
Because you can get enough data for running
your simulator for the algorithm to still fit a pretty good set of parameters Theta,
even if you have a lot of features.
Because you can have a lot- you can generate a lot of data to fit this function.
Okay. So, um, let's talk through the fitted value iteration algorithm.
Let's see. All right. You know what?
This is a long algorithm.
Let me just use a fresh board for this.
All right. So, uh,
let me just write down the original value iteration algorithm for discrete states.
Uh, so what we had previously was we would update V of s to be R of s
plus Gamma times the max over a of the sum over S prime of P_sa of S prime times V of S prime, right?
So this is what we had, um, last Monday.
And, uh, I said at the start of today's lecture that you can also write this as
R of s plus Gamma times the max over a of the expectation,
with S prime drawn from P_sa, of V of S prime.
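Written out in symbols, the two equivalent forms of the update, together with the sample-average approximation that a simulator makes possible, might look like the following. This is a sketch of the notation rather than a copy of the board; k and the samples s'_j are the same k samples per state and action used in fitted value iteration.

```latex
\begin{align*}
V(s) &:= R(s) + \gamma \max_a \sum_{s'} P_{sa}(s')\, V(s') \\
     &\;= R(s) + \gamma \max_a \mathbb{E}_{s' \sim P_{sa}}\big[ V(s') \big] \\
     &\approx R(s) + \gamma \max_a \frac{1}{k} \sum_{j=1}^{k} V(s'_j),
     \qquad s'_1, \dots, s'_k \sim P_{sa}
\end{align*}
```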
Okay. So let's take that and generalize it to fitted value iteration.
All right. Um, so first,
let's choose a set of m states
randomly, and let's initialize the parameters Theta to equal 0, okay?
Um, and what we're going to do is,
so, let's see.
In linear regression, you learn a mapping from x to y,
and you have a discrete set of examples for x,
and you fit a function mapping from x to y.
So what we're going to do here is
learn the mapping from s to V of s.
And we are going to take a discrete set of examples for s,
and try to figure out what V of s is for them,
and then fit a straight line, you know,
to try to model this relationship, right.
So, so just as you have a finite set of examples,
a finite set of houses that you see a certain set of
values of x in your training set for predicting housing prices.
We're gonna see, you know,
a certain set of states,
and then use that finite set of examples to use linear regression to fit v of s. Right?
So that's what this initial sample is meant to do.
And so, um, this is the outermost loop of value iteration- of fitted value iteration.
And then for i equals 1 [NOISE] through m.
[NOISE]
Let's see, [NOISE] uh.
All right. So, um,
what we're going to do is
go over each of these m states, right,
and for each one of those states and each one of the actions,
we're going to take a sample of k things in order to estimate that expected value.
Right. And so this expectation
is over S prime drawn from this state transition distribution,
which says, you know, from this state,
if you take this action, where do you get to next.
And so, uh, these two loops, this for i
equals 1 through m and for each action a,
this is just looping over every state and every action,
and taking k samples
of where you get to if you take action a in a certain state s.
Right. And so, um,
by taking those k examples and computing this average, q of a,
right, is your estimate of that expectation.
Okay. So all we've done so far is,
uh, take k samples, you know,
from this distribution that S prime is drawn from, and average V of S prime. Oh,
actually, I'm sorry.
If I move R of s inside,
then that's q of a.
Okay, that makes sense?
Sorry.
Let me just rewrite this to move R of s inside.
Fix this up a little bit.
If you write this as max over a
of R of s plus Gamma times the expectation of V of S prime,
so we move the max and the expectation outside,
then this is q of a.
Okay?
Um, next,
let's set y i equals max over
a of q of a [NOISE].
And so by taking the max over a of q of a,
um, that's what y i is:
your estimate of the right-hand side of value iteration.
Now, in the original value iteration algorithm,
um, I'm just using VI to abbreviate value iteration.
In the original algorithm,
what we did was we set V of S i to be equal to y i, right?
In the original value iteration algorithm,
we would compute the right-hand side, this purple thing,
and then set V of s equal to that, right,
just set the left-hand side equal to the right-hand side.
But in, um, fitted value iteration, you know,
V of s is now approximated by a linear function.
So you can't just go into a linear function,
and set the value of the points individually.
So what we're going to do instead is in fitted Vi,
we're going to use linear regression to make V of Si as close as possible to yi.
But V of Si is now represented as a linear function of the state.
So a linear function of the features of state.
So V of Si is Theta transpose Phi of Si,
and you want that to be close to yi.
And so the final step is run
linear regression to choose
the parameters Theta that minimizes the squared error, okay? [NOISE]
Does that make sense?
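Putting the steps together, a minimal sketch of fitted value iteration with a linear value function might look like this. The callables `simulate`, `reward` and `phi`, the finite action list, and the default hyperparameters are all assumptions for illustration; only the loop structure follows the algorithm as described.

```python
import numpy as np

def fitted_value_iteration(states, actions, simulate, reward, phi,
                           gamma=0.99, k=10, n_iters=50, rng=None):
    """Sketch of fitted value iteration with V(s) = theta^T phi(s).

    states:   the m sampled states s_1..s_m
    actions:  a finite list of actions (an assumption of this sketch)
    simulate: simulate(s, a, rng) -> sampled next state s' ~ P_sa
    reward:   reward(s) -> R(s)
    phi:      phi(s) -> 1-D feature vector
    """
    rng = np.random.default_rng(0) if rng is None else rng
    Phi = np.array([phi(s) for s in states])   # (m, p) design matrix
    theta = np.zeros(Phi.shape[1])             # initialize theta := 0
    for _ in range(n_iters):                   # outer "repeat" loop
        y = np.empty(len(states))
        for i, s in enumerate(states):         # for i = 1..m
            q = []
            for a in actions:                  # for each action a
                # k sampled next states, valued with the previous theta
                v_next = [theta @ phi(simulate(s, a, rng)) for _ in range(k)]
                q.append(reward(s) + gamma * np.mean(v_next))
            y[i] = max(q)                      # y_i = max_a q(a)
        # linear regression: make theta^T phi(s_i) close to y_i
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta
```

Note that the targets y are computed with the parameters from the previous pass, and theta is only replaced once the regression has been run over all m states.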
Okay, um, oh, yes. Let me just make my curly braces match.
Yeah. Okay, okay.
So that's fitted. Uh, go ahead, question?
[inaudible].
Oh, this one? Oh, this one?
Oh, no, the m is used differently.
Uh, so when we were learning a model, m was
just how many times you fly the helicopter in order to build a model.
And the number of times you fly the helicopter in order to build
a model of the helicopter dynamics
has nothing to do with this m,
which is the number of states you use in order to,
sort of, anchor the regression.
So the way to think about this is
you want to learn a mapping from states to V of S.
And so, uh, for this sample,
these m states, we're gonna choose m states on the x axis, right?
So that m is the number of points you choose on the x axis.
And then in each, uh, iteration
of value iteration, we're gonna go through this procedure.
So you have S1 up to Sm.
Right. And then for each of these,
you're going to compute some value y i using this procedure.
And then you fit a straight line to the sample of y i's.
[inaudible].
Uh, think of the way you build a model and the way you
apply fitted value iteration as two completely separate operations.
So, um, you can have one team of ten engineers flying a helicopter
around 1,000 times, build a model,
run the linear regression and then they have a model and
then they could publish the model on the Internet and
a totally different team could download their model and
do this and the second team does not need to talk to the first team at all,
other than downloading the model off the Internet.
There is a question.
[inaudible]
Oh, yes. Good question.
You mean the sampling, sampling k times, right?
Yeah. That's a great question, yes.
That was one of my next points, which is that the reason you sample from this distribution
is that you should do
this if you are using a stochastic simulator, right?
Actually, I just wanted to ask you guys, what should you do?
How can you simplify this algorithm if you use
a deterministic simulator instead of a stochastic simulator?
Well, let's see. So if you use a deterministic simulator, then, you know,
given a certain state and
a certain action, it will always map to the exact same S prime, right?
So how can you simplify the algorithm?
[inaudible] action instead of drawing k times,
you only need to draw once.
Yeah, cool. Great. Yes. So if you're using a deterministic simulator,
you can set k equals 1 and sample only once, because this distribution
always returns the same value.
So all of these k samples would be exactly
the same, so you might as well just do this once rather than k times.
Make sense? Okay cool. Yeah.
[inaudible]
Oh, this one?
[inaudible]
Oh, no. This is, um,
this is actually a square bracket.
Um, the thing is,
we're trying to approximate this expectation, and the way you
approximate a mean is you sample k times and take the average, right?
So what we've done here is, in order to approximate this expectation,
we're gonna draw k samples and then sum over
them and divide by k. So you average over the k samples.
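As a small sketch of that averaging step, assuming hypothetical callables `simulate`, `reward` and `value` supplied by the caller: with a stochastic simulator the k draws differ and the average estimates the expectation, while with a deterministic simulator every draw is identical, so k = 1 gives the same answer.

```python
def estimate_q(s, a, simulate, reward, value, gamma, k, rng):
    """Sample-average estimate of q(a) = E[R(s) + gamma * V(s')], s' ~ P_sa.

    simulate(s, a, rng), reward(s) and value(s) are assumed callables.
    """
    total = 0.0
    for _ in range(k):
        s_next = simulate(s, a, rng)          # one sampled next state
        total += reward(s) + gamma * value(s_next)
    return total / k                          # sum over k samples, divide by k
```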
All right, cool. Got some more question?
What's the little [inaudible] how many states
you'll get from K sample and [inaudible]
Let's see. So how do you choose M and how do you test for overfitting? So,
you know, once you have a model,
one of the nice things about model-based RL is, let's say that Phi of S,
right, let's say that Phi of S has 50 features.
So let's say you chose 50 features to approximate
the value function of your inverted pendulum system.
Then you know that you're going to be fitting linear regression,
right, in this 50-dimensional feature space.
I mean this step here,
this is really linear regression, right?
And so you can ask,
if you want to run linear regression with 50 parameters,
how many examples do you need to fit linear regression?
And I will say, you know, if M was maybe 500,
right, maybe you'd be okay.
You have 500 examples to fit 50 parameters.
But if, computationally,
it doesn't run too slowly
to set M equals 1,000 or even 5,000,
then there's no harm in letting M be bigger.
So usually you might as well set M to be as big as you feel
like, subject to the program not taking too long to run, because,
you know,
in supervised learning,
if you're fitting a model to housing prices,
um, you need to go out and, you know,
collect data off Craigslist or what's
on Zillow or Trulia or Redfin or whatever about prices of houses.
And so data is expensive to collect in the real world.
But once you have a model,
you can set M equals 5,000 or 10,000 or
100,000 and just- and then your algorithm will run more slowly.
But as long as your algorithm doesn't run too slowly,
there is no harm to setting M to be bigger. Makes sense?
Um, all right cool.
So, um, so I know that there's a lot
going on to this algorithm but this is fitted value iteration.
And if you do this, uh, this,
you can get reasonable behavior on a lot of
robots by choosing a set of features and learning
the value function to approximate the value of the- really
approximate the expected payoff of a robot starting off in different states.
Okay. Um, now just a few details
to wrap up, again, some practical aspects of how you do this.
After you've learned all these parameters,
this- you've now learned- go ahead, yeah.
[inaudible]
Oh, I see. Yes, thank you.
Um, yes.
So in this, um,
expression, where do you get V of S prime j from?
Yes. So you would get this from Theta transpose Phi of S prime j,
using the parameters of Theta from the last iteration of fitted value iteration.
Ju- just as in value iteration,
this is the values from the last iteration that you use to update a new iteration.
So then you use the last value of Theta to update the new one. Yeah, thank you.
Cool. Oh, and, um,
one- one other thing you could do which is, um,
I talked about the linear regression version of this algorithm which is, you know,
this whole- this whole exercise is about generating a sample of S
and Y so you can apply linear regression to
predict the value of Y from the values of S, right?
But there's nothing in this algorithm that says you have to use linear regression.
In order to- now that you've generated this dataset,
that's this box that I have here,
this- this is linear regression,
right, but you don't have to use linear regression.
In modern deep reinforcement learning, um,
well, one of the ways to go from reinforcement learning
to deep reinforcement learning is to just use a neural network for this step instead.
Then you call that deep reinforcement learning.
But, hey, it's legit, you know.
[LAUGHTER]
Um, uh, but, but,
you can also use locally weighted linear regression
or whatever regression algorithm you want in order
to estimate y as a function of the state s. Yeah,
and actually if you use a neural network,
it relieves the need to choose features Phi as well,
you can feed in the raw features.
You know, your angle, your orientation,
and use a neural network
to learn that mapping in a supervised learning way.
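To illustrate swapping out the regression step, here is a minimal sketch of locally weighted linear regression used as the predictor for y given a state. The Gaussian weighting with bandwidth `tau` and the function name `lwr_predict` are assumptions for this example; X would hold the sampled states (or their features) and y the targets from the max over q of a.

```python
import numpy as np

def lwr_predict(query, X, y, tau=0.5):
    """Locally weighted linear regression prediction at one query point.

    X: (m, p) inputs, y: (m,) targets, query: (p,) point to predict at.
    tau is a hypothetical bandwidth controlling how local the fit is.
    """
    # Gaussian weights: training points near the query count more.
    w = np.exp(-np.sum((X - query) ** 2, axis=1) / (2.0 * tau ** 2))
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # add intercept column
    sw = np.sqrt(w)
    # Weighted least squares via ordinary least squares on sqrt-weighted rows.
    theta, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return float(np.concatenate(([1.0], query)) @ theta)
```

Unlike the plain linear fit, this refits a small weighted regression for every query state, so it trades extra computation for a more flexible value function.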
Okay, um, all right.
So one last, ah, important,
I guess practical implementational detail, which is, um,
fitted VI, right, uh, gives you an
approximation to V star.
And this, um, implicitly defines Pi star.
Right, because the definition of Pi star is that
Pi star of s is the arg max over a of the expectation, with S prime drawn from P_sa, of V star of S prime.
So, um, when you're running a robot,
you know, you need to execute the policy Pi, right,
given the state you're gonna pick an action.
And having computed V star,
it only implicitly defines the optimal policy Pi star.
All right, um, and
so ah if you're running a robo- if you're running a robot in real time,
then you know actually if you fly a helicopter,
you might have to choose control actions at 10 hertz
meaning 10 times a second you're given the state, you have to choose an action.
Uh, uh, if you're building a self-driving car,
again a 10 hertz controller wo- would be pretty reasonable.
I guess choose a new action and maybe 10 times a second would be pretty reasonable.
Um, but how do you compute this expectation and this maximization 10 times per second?
So, um, in what we use for fitted value iteration,
we used right, a sample,
uh, of- we use k examples to approximate the expectation.
Right, but if you're running this, um,
in real time on a helicopter, you know,
probably you don't want to, uh,
at least I know for my robotics implementations I have been reluctant to use
a random number generator right in the inner loop of how we control a helicopter.
Right, it might work, but
I think, you know,
if you want to compute
this arg max, you need to approximate this expectation, and
do you really want to be running a random number generator on a helicopter?
And if you're really unlucky and the random number
generator generates an unlucky value,
will your helicopter do something, you know, bad and crash?
Again, just emotionally, I don't feel very good if,
uh, your self-driving car has a random number generator
in the loop of how it's choosing to drive.
Right, um, so just as a practical matter, ah,
ah, ah, there are a couple of tricks that people often use.
Which is, um, the simulator
is often of this form, right: S_t+1 equals f of S_t, a_t plus Epsilon_t.
Okay, so most simulators have this form,
next state is equal to some function of the, uh,
previous state and action plus some noise.
And so one thing that is often done is, um,
for your deployment,
you know, for
the actual policy you implement on the robot,
um, set Epsilon_t equals 0 and set k equals 1.
Right and so, um,
so, so, so this,
this is a reasonable way to make this policy run on a helicopter,
which is during training you do want to add noise to
the simulator because it causes a policy you learn to be much more robust.
So little errors in the simulator,
your simulator is always a little bit off.
You know maybe it didn't quite simulate wind gusts
or when you turn the helicopter does it bank exactly the right amount.
Simulator is always, you know,
it's in practice is always a little bit off.
Um, so it's important to have noise in the simulator in model based RL.
But when you're deploying this on a physical robot, um,
one thing you could do that'll be very reasonable is just get rid of
the noise and set k equals 1, and so what you would do is,
um, let's see.
Um, whenever you're in the state s,
pick the action a according to
arg max over a of V of f of s, a.
Right so, uh, this f is this f from here.
So this is the simulator
with the noise removed.
Okay and so what you would do is actually,
and, and, you know, computers are now
fast enough you can- you could do this 10 times a second.
Right, if you want to control a helicopter or a self-driving car at 10 hertz,
you could actually easily do this, you know, at,
at, at 10 times a second,
which is your car or your helicopter is in some physical state in the world.
So you know what is S and so you can
quickly for every possible action a that you could take,
use a simulator to simulate where your helicopter will go,
um, if you were to take that action.
So go ahead and run your simulator,
you know, once for each possible action you could take.
Right, computers are actually fast enough to do this in real time.
Um, and then for each of the possible next states you could get to,
compute V applied to that.
Uh, so this is really, right, S prime, um,
drawn from P_sa,
but with a deterministic simulator, right.
Right, so every 10th of a second you could
in your simulator try out every single possible action,
use your simulator to figure out where you would go under each and every single possible action,
and apply your value function to see of all of
these possible actions, which one gets my helicopter,
you know, in the next one-tenth of a second to the state that
looks best according to the value functions you've learned from fitted value iteration.
Okay, um, and it turns out if you do this then you can,
this is how you actually implement something that runs in real time.
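A minimal sketch of that deployment-time action choice, assuming a finite list of candidate actions, a noise-free simulator `f(s, a)` and a learned `value` function passed in as callables (all hypothetical names):

```python
import numpy as np

def greedy_action(s, actions, f, value):
    """Pick arg max over a of V(f(s, a)) using the noise-free simulator f.

    actions: finite list of candidate actions (discretized action space).
    f(s, a): deterministic next-state prediction, i.e. the simulator with
             the noise set to zero. value(s): learned approximation to V.
    """
    # Run the simulator once per action and score the predicted next state.
    scores = [value(f(s, a)) for a in actions]
    return actions[int(np.argmax(scores))]
```

No random numbers are drawn here, so the same state always yields the same action, which is the point of setting the noise to zero and k to 1 at deployment.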
And, oh, I'll just mention, you know,
the idea of training with a stochastic simulator and then just setting the noise to zero
is one of those things that's not very rigorously justified, but in practice
this works well. Yes, question?
[inaudible].
Oh yes. Ah, so, so, um,
for the purposes of this, you can assume you have a discretized action space,
ah, and it turns out that for a self-driving car it's
actually okay to discretize the action space.
Uh, for a helicopter,
we tend not to discretize the action space but, um,
it turns out if f is a continuous function,
then you can use other methods as well.
Right, this is about optimizing over the, I
didn't mean to talk about this, and sorry, this is getting a little bit deeper.
But, uh, even if a was a continuous thing, uh,
you can actually use real-time optimization algorithms, uh,
to very quickly try to optimize
this function of a.
Uh, there's a literature on something called model predictive control,
where you can actually do these optimizations in
real time and use them to fly a helicopter. Just one last question.
So once you pick an action, you'll transition to the next state.
So how do you know where you are?
Do you make an observation or do you use your [inaudible]?
Wait, oh, uh, say, what's the question again?
So once you, like when you have the helicopter,
once you pick an action you'll transition to the next state, so do you make
an observation to figure out where you are or do you use the [inaudible].
Oh, use an observation, yeah, yes.
So you take an action and then
your helicopter will do something, there will be some wind,
your model may be off, and so a 10th of a second later you would then
take another, you know, GPS reading, accelerometer reading,
magnetic compass reading, and use
the helicopter sensors to tell you where you actually are.
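The closed loop described in that answer can be sketched as follows. The callables `read_state`, `choose_action` and `send_action` are hypothetical stand-ins for the sensor fusion, the greedy policy and the actuator interface; a real controller would also pace the loop (for example at 10 hertz), which this sketch leaves out.

```python
def run_controller(read_state, choose_action, send_action, n_steps):
    """Observe, act, then observe again, for n_steps control cycles.

    The state used at each step comes from read_state(), a fresh
    measurement, not from the model's own prediction of the previous step.
    """
    history = []
    for _ in range(n_steps):
        s = read_state()        # e.g. GPS, accelerometer, compass estimate
        a = choose_action(s)    # e.g. the greedy arg max over V(f(s, a))
        send_action(a)          # the real system moves, wind and all
        history.append((s, a))
    return history
```

Because each cycle starts from a measurement, model error and disturbances do not accumulate from one step to the next.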
Now, cool. Okay, cool.
All right, I hope, uh, yeah,
hope- hopefully this was helpful.
I feel like, you know, I think it's fascinating that the
excitement about self-driving cars and flying helicopters and all that,
it boils down to equations like these.
I, I think that's kinda cool.
[LAUGHTER] Okay, that's great.
Thanks, I'll see you guys next week.
