Hi everyone, I'm Dorsa, uh,
and this week I'll be teaching
the state-based models. The plan is for me to spend the next couple of weeks
teaching the state-based models: search, MDPs,
and games. After that, Percy
will come back and talk about
some of the later topics.
So a few announcements.
Uh, so homework 3 is out.
So just make sure to look at that.
And then the grades for homework 1 will be coming out soon.
So just yeah, be aware of that.
All right. So, so let's talk about state-based models,
let's talk about search.
So just to start,
I was thinking maybe we can start with this question.
Uh, if you can,
let me reset this.
So basically, okay, let me tell you what the question is and then think about it,
and then after that I will get this working.
So, so the question is you have a farmer and the farmer has a cabbage,
a goat, and a wolf,
and it's on one side of the river.
Everything is on one side of the river.
So you have this river.
We have a farmer.
We have the farmer with a cabbage,
with a goat, and with a wolf, okay.
And the farmer wants to go to the other side of the river and take everything
with him. But the thing is, the farmer has
a boat, and that boat can only fit two things.
So the farmer can be in it with
one of these other things, okay?
So the question is, how many crossings does
the farmer need to take everything to the other side of the river?
And there are a bunch of constraints,
the constraint is if you leave
the cabbage and goat together the goat is going to eat the cabbage.
So you can't really do that.
If you leave wolf with the goat,
the wolf is going to eat the goat, you can't really do that.
How many crossings should you take to take everything to the other side?
Think about it, talk to your neighbors, I'll get this working.
Everyone clear on the question? Okay.
So the link doesn't work because,
uh, we can't connect to Internet,
but all right so.
Okay. So how many people think it is four?
Four crossings. Five, five crossings.
Six, six.
Some people think six.
Seven? More people.
No solution? No solution.
Okay. So the point is actually not like what the answer is,
we'll come back to this question and try to solve it,
but I think the important point to
think about right now is how you went about solving it.
So what were you thinking, and what was the process
you went through when you were trying to solve this problem?
That is the commonality that search problems have,
and we want to think about these types of problems, where it's
more challenging to answer the question
than, let's say, reflex-based types of questions.
So that's just a motivating example that we'll come back to later.
And here's an XKCD on this.
So basically one potential solution is the farmer takes the goat,
goes to the other side, comes back,
takes the cabbage, goes to the other side and just
leaves the wolf because why would he need a wolf,
why would a farmer need a wolf.
So [LAUGHTER] if you answered four,
you probably were thinking about this.
[LAUGHTER] And I guess it has like
an interesting point in it because sometimes maybe you should change the problem.
Your model is completely wrong.
Maybe, maybe sometimes you should rethink and go back to your model and try to fix that.
But anyways. So we'll come back to this question.
So all right.
So this was our guideline for the class,
and, and we have already talked about the reflex-based model.
So we have talked about machine learning and how that can get applied,
and now we want to start talking about state-based models.
This week, we're going to talk about search problems,
next week, MDPs, and then the week after we're going to talk about games.
If you remember, the guideline
that we had for the class was
that we think about three different paradigms,
all right,
we talked about this already.
So modeling, inference, and learning.
So for, for reflex-based models we talked about this already, right?
So what would the model be,
well, it can be a linear predictor or it can be a neural network.
So, so that was a model.
And then we talked about inference but in the case
of reflex-based models it was really simple,
it was just function evaluation.
You had, you had your neural network and you would just
go about evaluating it and that was inference.
And we also spent some time talking about learning.
So how would we use like let's say gradient descent
to try to fit the parameters of the model, okay.
So it's a similar thing with search-based models.
We want to talk about these three different paradigms that we have in the class,
and the plan is to talk about modeling and
inference today, and then on Wednesday we'll talk about learning.
We have the same sort of format next week too.
So we're going to talk about modeling and inference on Mondays,
and Wednesdays are going to be about learning.
So that's just to give you an idea of what the plan is.
All right. So, so what are search problems?
Let's start with a few motivating examples.
So, so one potential example one can think of is, is route finding.
So you might have a map and you want to go from point A to point B on the map,
and you have an objective.
So you want to maybe find the shortest path or the fastest path or most scenic path.
That is your objective and the things you can do is you can take a bunch of actions.
So you can do things like go straight,
turn left, turn right,
and then the answer for the search problem is going to be a sequence of actions.
If, if you want to go from A to B with the shortest path,
the answer that one would give is maybe turn right
first and then turn left and then right again or any,
any of these sequences.
Okay so, so this is just a canonical example of what a search problem is.
There are a few other examples.
So for example you can think of robot, robot motion planning.
So if you have a robot that wants to go from point A to point B,
then it might want to have different objectives for doing that.
So again the question might be what is the fastest way of doing it
or what is the most energy efficient way of getting the robot to do that or,
or what is the safest way of doing it.
Another question that we are interested in is what is the most expressive or
legible way for the robot to do it,
so people can understand what the robot really wants.
So you might have again various types of objectives you can formalize that,
and then the actions that,
that you can take in the case of
the robot motion planning is the robot is going to have different joints,
and each one of the joints can translate and can rotate.
So translation and rotation are the type of actions that you can take.
So, so in this case I have a robot with seven,
seven joints and then I need to tell what each one of
those joints should do in terms of translation and rotation.
That's your robot?
This is my robot, yes.
[LAUGHTER] It's a fetch robot.
[LAUGHTER] All right.
So, so let's look at another example.
So games are a fun example.
So you might
think about something like the Rubik's cube
or this 15-puzzle, and again, what do you wanna do as a search problem?
Well, you wanna end up in a configuration that's desirable, right?
So you wanna end up
in this type of configuration on the Rubik's cube
or the 15-puzzle.
So that is the goal, that's the objective.
And then the action is you can move pieces around here.
So the sequence of actions might be how you're moving these pieces
around to get that particular configuration of the 15-puzzle, okay?
So again another example of what a search problem is.
Machine translation is
an interesting one. It's not necessarily
the most natural thing you might think about when you think about search problems,
but you can actually think about it as a search problem too.
So imagine you have a phrase in
a different language and you want to translate it to English.
So what is the objective here?
Well you can think of the objective as going to fluent English and preserving meaning.
So, so that is the objective that one would have in machine translation.
And then the type of actions that you're taking is you're appending words.
So you start with "the" and then you're
appending "blue" to it and you're appending "house" to it.
So as you're appending these different
words, those are the actions that you're taking.
So in some sense you can take any complex sequential task,
and the sequence of actions that gets you
to your objective is going to be the answer,
and you can pose it as a search problem, okay?
All right. So, so what is different
between let's say reflex-based models and, and search problems?
So if you remember, in reflex-based models the idea was you'd have
an input x, and we wanted to find this f, for example a classifier,
that would output something
like this y, which is a label, plus 1 or minus 1.
So the common thing
in these reflex-based models was that we were outputting
this one label, this one action, in this case minus 1 or plus 1.
Again in search problems,
the idea is I'm given an input,
I'm given a state,
and then given that I have that state,
what I wanna output is a sequence of actions.
So I do want to think about what happens if I take this action like
how is that going to affect the future of my actions.
Okay. So, so the key idea in search problems is
you need to consider future consequences of,
of the actions you take at the current state. Yes.
Is this like not equivalent to like just outputting
one thing and then like rerunning the function,
on like the updated state?
So if you rerun it.
So, so the question is, yeah, is it not the same as like I'm rerunning it,
I output a thing and then I rerun it again.
And you could do that, but that would be
similar to a greedy algorithm, where
like let's say I want to get to the door and I want to find,
find the fastest way and right now if I just look at like
my current state maybe I think the fastest way of getting there is going this way.
But if I actually think about a horizon and I think about how
this action is going to affect my future
I might come up with a different sequence of actions.
Okay? All right.
Okay. So and, and you've already seen this paradigm so let's start
talking about modeling and inference during this class.
So this is the, the plan for today.
So we're going to talk about three different algorithms for,
for doing inference for search problems.
So, so we're going to talk about tree search which is
the most naive thing one could do to solve some of these search problems,
but that's the simplest thing we can start with.
And then after that we want to look at improvements on that: dynamic programming
and uniform cost search.
So, um, is the difference between a search-based problem and a reflex-based problem
the very fact that in a reflex-based problem,
the output that you give does not influence the future state,
and in search it does?
Yeah, that's true. Yeah, so
the output that you get in a search problem
is an action that actually influences your future.
Yeah, that's a good way of actually thinking about it.
Yes. All right.
So, so let's talk about tree search.
So let's go back to our favorite example.
Um, okay so we have the farmer,
cabbage, goat, and wolf.
So let's think about all possible actions that one can take,
when we have this farmer,
cabbage, goat, and wolf.
Okay. So, so a bunch of things we can do is a farmer
can go to the other side of the river with the boat alone.
So, uh, this triangle here just means like going to the other side of the, uh, the river.
The farmer can take the cabbage.
So C is for cabbage G is for,
ah, goat, W is for wolf.
So another possible action is the farmer takes a cabbage or the farmer
takes the goat or the farmer takes a wolf and goes to the other side of the river.
We also have a bunch of other actions.
The farmer can come back.
The farmer can come back with the cabbage,
come back with the goat, come back with the wolf.
So I'm basically enumerating all possible actions
that one could ever do.
And sure, some of these might not be possible in
particular states, but I'm just creating this library of actions, things that are possible.
Okay. So then when we think about
this as a search problem,
we could create a search tree,
which basically starts from an initial state of where things are, and
then we can think about where we could go from that initial state.
So the search tree is more of
a "what if" tree,
which allows you to think about what the possible options are that you can take.
So conceptually, what it looks like is you're starting with your initial state,
where everything is on one side of the river.
So those two blue lines are the river.
Um, and you can take a bunch of actions,
right like one possible action is you can take the cabbage
and go to the other side of the river and you end up in that state.
And that state is not a good state.
I am making that red. Well, why is that.
Because the wolf is going to eat the goat. That's not that great.
Okay. And every action,
let's say every crossing, has a cost of one.
So that 1 that you see on the edge is the cost of that action.
Okay. So that didn't really work that well.
What else can I do? Well, I can,
I can do another action.
I can, I can- from the initial state,
I can take the goat and go to the other side of the river,
that ends up in this configuration.
From there the farmer could come back,
take the cabbage, go to the other side,
end up in this configuration,
the farmer can come back.
That's again, not a great state because
cabbage and goat are left on the other side of the river,
goat is going to eat the cabbage.
That's not great. What else can I do?
Well, the farmer can come back with the goat.
And then once the farmer comes back with the goat,
the farmer leaves the goat, takes the wolf, goes to the other side,
comes back gets the goat again.
And then boom, you're done.
Okay. So- so how many steps does this take?
Well, one, two, three,
four, five, six, and seven.
So for the ones who answered seven, that was the right answer.
And that is kind of the idea of getting to this end state. Yes.
So do we specifically
not include the option of going back to the previous state, even
though that's a valid next step, just because we know that there's something-
So you could have this giant tree where you
go to different states, but we can actually have
a counter that tells you if you have visited a state, and
if you have visited that state, maybe you don't want to go there again,
because you have already explored all the possible actions from there.
You're not done with this tree though, right?
Like I've, I've found that this good state here,
but maybe there's a better way of, like getting there.
I don't know yet. I haven't explored everything.
So, so what I can do is,
I can actually explore all these other things that, that one could do.
And I'm not gonna go over them.
But there is another solution,
and it turns out that other solution also takes seven steps.
So it's not necessarily a better solution,
but you've got to go over all of that because there could be another solution later on
that is better than the seven steps.
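As a rough illustration of the walk through the tree above, here is a minimal sketch that explores every safe sequence of crossings and keeps the cheapest one. The state encoding, the helper names (`is_safe`, `successors`, `fewest_crossings`), and the per-path `visited` set are assumptions made for this sketch, not code from the lecture; the `visited` set plays the role of the check that avoids going back to a state already seen.

```python
# Hypothetical encoding (not from the lecture): a state is a tuple
# (farmer, cabbage, goat, wolf), where 0 = starting bank and 1 = far bank.
ITEMS = ("cabbage", "goat", "wolf")
START = (0, 0, 0, 0)
GOAL = (1, 1, 1, 1)


def is_safe(state):
    """A state is unsafe if the goat is left with the cabbage or the wolf."""
    farmer, cabbage, goat, wolf = state
    if cabbage == goat and goat != farmer:
        return False  # the goat eats the cabbage
    if goat == wolf and goat != farmer:
        return False  # the wolf eats the goat
    return True


def successors(state):
    """Return (action, new_state) pairs for every safe crossing; each costs 1."""
    farmer = state[0]
    other = 1 - farmer
    result = []
    for idx in (None, 1, 2, 3):  # None means the farmer crosses alone
        if idx is not None and state[idx] != farmer:
            continue  # that item is on the other bank
        new_state = list(state)
        new_state[0] = other
        if idx is not None:
            new_state[idx] = other
        new_state = tuple(new_state)
        if is_safe(new_state):
            action = "alone" if idx is None else ITEMS[idx - 1]
            result.append((action, new_state))
    return result


def fewest_crossings():
    """Explore the whole tree, never revisiting a state on the current path."""
    best = {"cost": float("inf"), "history": None}

    def recurse(state, history, visited):
        if state == GOAL:
            if len(history) < best["cost"]:
                best["cost"] = len(history)
                best["history"] = history
            return
        for action, new_state in successors(state):
            if new_state not in visited:
                recurse(new_state, history + [action], visited | {new_state})

    recurse(START, [], {START})
    return best["cost"], best["history"]


cost, history = fewest_crossings()
print(cost, history)
# 7 ['goat', 'alone', 'cabbage', 'goat', 'wolf', 'alone', 'goat']
```

Under this encoding there are exactly two ways to reach the goal without repeating a state, and both take seven crossings, which matches the count above.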
Okay. All right. Yes.
Are these slides up?
They are, they should be.
Okay.
Slides are up. Okay. Um, all right.
So this is what the search tree looks like. Yeah.
I'm just asking [inaudible]
Oh, that's a very good point.
Thank you. [LAUGHTER] So for SCPD students, I'll try to repeat the questions.
I always forget this.
The question was whether the slides
are up. They should be up.
So okay. All right.
So, uh, going back to our search problem.
So we can try to formalize this search problem.
So let's actually think about it more formally.
So what are the things that we need to keep track of?
So we have a start state.
So let's define s_start to be the start state.
In addition to that,
we can define this function called Actions,
which returns all possible actions from a state.
So Actions is a function of state.
If I'm in a state, it tells me what actions I can take from there.
I can define this cost function.
So this cost function
takes a state and an action and tells me what the cost of that is. In this example,
the cost of crossing the river was just one, but
you can imagine having different cost values.
We can have a successor function that takes a state and an action
and tells us where we end up.
So if I'm in state S and I take action A, where would I end up?
That's the successor function.
And then we're going to define an IsEnd function,
which checks if you're in an end state, where
we don't have any other possible actions that we can take. Yes.
So these are the [inaudible] I got a call?
You can think of it, yeah, as a finite state machine type
of way of looking at it.
Yeah. So we use a similar type of formalism
for MDPs and games too.
So it's a good idea to get all these formalisms right:
start state, transitions, costs,
those sort of things. Okay. Yes.
What's the [inaudible] like [inaudible].
Ah, say it again so.
A cost [inaudible] like.
Cost?
Position and action, and action already concerns the state.
So the action,
okay, so the action depends on the state.
So you start from the start state, where you haven't taken any actions, right,
and then from that start state you can think about all possible actions, like right up there.
So you're at that start state,
and there you can think about all possible actions you can take,
and those actions depend on
the current state, but they don't depend on the future state, right?
So based on the current state,
where everything is on one side of the river,
I can think about all possible actions I can take and where I end up.
And then after that, the next action depends on that.
Yeah, that's it. So it's a sequential thing. Okay. Yes.
If you have all the information on the actions and the costs beforehand,
how is this conceptually different from, like, a min-cost flow convex optimization?
Okay. So how is it
different from a convex optimization type of problem?
So we have an objective here, and you can think
about what that objective is, and based on what that objective is,
we can have different methods for solving it, right?
So you can basically formulate this as an optimization problem, where you
look for the solution to
a search problem as an optimization problem too. That's
a perfectly fine way of doing it.
And we're going to talk about various types of
methods for solving this problem today.
Okay. All right.
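The ingredients listed in the formalization (a start state, the actions available in a state, a cost, a successor, and an end test) can be written down as a small interface. This is only a sketch under assumptions: the class name `SearchProblem`, the method names, the toy `CountUp` problem, and the `path_cost` helper are all hypothetical and chosen for illustration.

```python
from abc import ABC, abstractmethod


class SearchProblem(ABC):
    """Hypothetical interface mirroring the five ingredients of a search problem."""

    @abstractmethod
    def start_state(self):
        """Return the start state."""

    @abstractmethod
    def actions(self, state):
        """Return the actions that can be taken from `state`."""

    @abstractmethod
    def cost(self, state, action):
        """Return the cost of taking `action` in `state`."""

    @abstractmethod
    def succ(self, state, action):
        """Return the state reached by taking `action` in `state`."""

    @abstractmethod
    def is_end(self, state):
        """Return True if `state` is an end state."""


class CountUp(SearchProblem):
    """Toy instance (an assumption for this sketch): count from 0 up to `target`."""

    def __init__(self, target):
        self.target = target

    def start_state(self):
        return 0

    def actions(self, state):
        return ["inc"] if state < self.target else []

    def cost(self, state, action):
        return 1

    def succ(self, state, action):
        return state + 1

    def is_end(self, state):
        return state == self.target


def path_cost(problem, action_sequence):
    """Follow a sequence of actions from the start state and add up the costs."""
    state = problem.start_state()
    total = 0
    for action in action_sequence:
        if action not in problem.actions(state):
            raise ValueError("action not available in this state")
        total += problem.cost(state, action)
        state = problem.succ(state, action)
    return total, state


toy = CountUp(3)
total, final_state = path_cost(toy, ["inc", "inc", "inc"])
print(total, toy.is_end(final_state))  # 3 True
```

Keeping the problem definition behind an interface like this is one way to keep modeling separate from the algorithms that do inference on it.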
So let's look at another example.
So this is a transportation problem.
Now I'll just move this.
So, okay.
So basically, we have street blocks from 1 through N. So 1,
2, 3, 4, and so on.
So these are street blocks, and N is here.
And what we wanna do is travel
from block 1 to block N.
And we have two possible actions.
So at any state,
let's say I'm in state S. At any state,
I can either walk,
and if I walk I end up in S plus 1.
So if I'm in 3,
I'm going to end up in 4.
And walking takes one minute.
Or I can take this magic tram.
And this magic tram, takes any state S to 2 times S.
So if I'm in 3, then I am going to end up in 6 by taking the magic tram.
And the magic tram always takes two minutes, doesn't matter from where to where.
So, so if I'm in 2, I will end up in 4,
if I'm in 5 I can end up in 10 by taking the tram.
Okay. So, so I have two possible actions in any of these states.
And what I want to do is, I want to go from 1 to N, and
I want to do that in the shortest time possible.
Okay. So with the least amount of cost. That's the problem. Makes sense?
Okay. All right.
So, so this is kind of like,
what the search problem is.
So what we wanna do is first off, you want to just formalize it.
Uh, and I'm gonna do that here.
I'm not gonna do live solutions because I'm not Percy,
and I did that once and it was a disaster.
So [LAUGHTER] we are going to,
uh, yeah I taped these in 2018.
Uh, but, uh, basically,
we're going to go over it together.
So, so let's just do that.
Um, so we're going to define the search problem, this tram problem.
So we're gonna define a class for transportation problems.
So we're going to separate our search problems from
our algorithms because remember modeling is separate from inference.
So let's just have a constructor for this transportation problem.
It takes N, because we have N blocks.
Okay. So N is the number of blocks.
Okay. All right.
So then we still have a start state.
We're starting from 1, so block 1.
And then we need to define IsEnd.
So IsEnd basically checks if you've reached N or not,
because we have to get to the Nth block.
Okay. All right. So what else do we need?
So we have a successor function.
We also have a cost function.
I'm gonna put both of them together,
because, because that is just easier.
So for the successor and cost function,
I'm saying let's just give it a state S. And then given a
state, it's going to return triples of action, new state, cost.
So I give it a state, let's say the initial state,
and it returns all possible actions,
with the new states I can end up in and how much each costs.
Okay. So what are my options?
Well, if I'm in state S, I can walk to S plus 1, and that costs 1.
If I'm in state S,
I can take the tram and end up in 2S,
and that costs 2.
Okay. So that's how I'm creating my triples.
And I need to check that I don't pass the Nth block.
Remember, we have N blocks, so we don't want to pass the Nth block.
Okay. So that's just to make sure that we don't pass it,
so we are still at or below the Nth block.
And this is what my successor and cost function will return, the triples.
Okay. So let's just return that.
Okay. So that is my transportation problem.
Let's make sure it does the thing the way we want it.
So let's say we have 10 blocks,
and now I wanna print the output of my successor and cost function.
Let's say I'm returning successor and cost for 3. What should I get?
So from 3, I can have two actions, right.
I can either walk or I can take the tram.
If I walk, uh, it costs 1.
If I take the tram, it costs 2.
I'll end up in 4 or 6.
Let's just try, I don't know, 9.
If I'm in state 9, I can only do one thing, I can walk, right?
Because remember, the number of blocks is 10, and I can't go beyond that.
So- all right.
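The model just described, together with the two checks on states 3 and 9, could look roughly like the following. This is a minimal sketch rather than the code shown in class: the method names `startState`, `isEnd`, and `succAndCost` and the action labels `"walk"` and `"tram"` are assumptions.

```python
class TransportationProblem:
    """Tram problem: travel from block 1 to block N by walking or taking the tram."""

    def __init__(self, N):
        self.N = N  # number of blocks

    def startState(self):
        return 1  # we start at block 1

    def isEnd(self, state):
        return state == self.N  # done once we reach block N

    def succAndCost(self, state):
        """Return a list of (action, newState, cost) triples available from `state`."""
        result = []
        if state + 1 <= self.N:  # walking must not pass block N
            result.append(("walk", state + 1, 1))
        if state * 2 <= self.N:  # the tram must not pass block N
            result.append(("tram", state * 2, 2))
        return result


problem = TransportationProblem(N=10)
print(problem.succAndCost(3))  # [('walk', 4, 1), ('tram', 6, 2)]
print(problem.succAndCost(9))  # [('walk', 10, 1)]
```

Returning successors and costs together in one call is a convenience; the separate cost and successor functions from the formal definition can both be read off the triples.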
Um, okay. So that was,
um, [NOISE] yeah, let's go back here.
So that was just defining,
uh, the search problem, [NOISE] okay?
And, and I haven't told you guys like how to solve it, right?
This is- we are just doing the modeling right now.
So we just modeled this problem. We just coded it up.
Modeling it means saying what
the actions are, what the successor function is,
what the cost function is,
defining an IsEnd function,
and saying what the initial state is, okay?
So, so now I think we are ready to think about the algorithms in terms of,
like, going and solving these types of search problems, okay?
So the simplest algorithm we want to talk about is, is backtracking search.
So the idea of backtracking search is- maybe I can draw a tree here,
is you're starting from
an initial state and then you have a bunch of possible actions.
And then you end up in some state and you have a bunch of other possible actions.
[NOISE] Let's say you have two actions possible.
And this can become- [NOISE] this exponentially blows up so I'm going to stop soon.
[LAUGHTER] All right.
So, so we create this tree and this tree has some branching factor.
That's the number of actions you have at,
at every, at every state.
And then it also has some depth.
[NOISE] So that is how many levels you go down.
[NOISE] So let me just define that with D, okay?
And now there are solutions down in these nodes, right?
So, so we wanna figure out what those solutions are.
And backtracking search just does the simplest thing possible.
What it does is, it starts from
this initial state and it's going to go all the way down here.
And if it doesn't find a solution,
it's gonna go back here and then try again and try again.
And it's gonna go over all of
the tree because there might be a better solution down here too.
So it needs to actually go over all of the tree, okay?
So I'm gonna have a table of algorithms because we're gonna talk about a few of them here.
Algorithms, [NOISE] what sort of costs they allow,
in terms of time,
how bad they are, in terms of space, how bad they are.
So if you've taken an algorithms course,
like, some of these are probably familiar.
So, er, all right.
So we talked about backtracking search, [NOISE] backtracking search.
That is basically this algorithm that goes through pretty much everything,
and it allows any type of cost.
So I can have [NOISE] any cost, right?
I can have pretty much any cost I want on
these edges because I'm going over all of the tree.
It doesn't matter what these costs are, okay?
So how bad is this in terms of time?
In terms of time,
I'm going over the full tree.
By going over the full tree,
this is going to have this exponential blowup, where I'm looking at order of b to the D,
where b is, again, my branching factor and D is the depth of the tree, okay?
So in terms of time,
this is not a good algorithm.
In terms of time, I have to go over everything in the tree,
and that's the size of my tree, okay?
And in terms of space,
what I mean is,
I need to figure out
what sequence of actions I needed to take to get to some solution.
So let's say that my solution is down here.
If my solution is down here,
I need to store a bunch of things to know how I got here,
and the things I need to store are the parents of
this node, and there are at most D of them. So in terms of space,
this algorithm takes order of D, okay?
Because those are the things that I need to
store in my memory to be able to recover
the solution when I get there. Yes. [NOISE] Question.
Because we need to look at everything,
shouldn't the space be big O of b to the D as well?
Because until you get to that,
you need to have the space to hold everything, right?
You can prove that, but [NOISE] no. So actually,
we'll talk about breadth-first search later,
which does require you to have more space.
So the reason you can forget the rest is that the only history I
need to keep track of is this particular branch, right?
I don't need to
keep track of
the history of all these other nodes.
I can throw those out.
But for something else like breadth-first search, which we'll talk about in a few slides,
you actually need to keep track of
the history of everything else.
So let me get back to that in a few slides.
But for this one, basically the idea is,
um, yeah, like, I wanna know how I got there.
To, to know how I got there,
I just need to know the parents. Yes.
[inaudible] like the minimum cost to reach a point or is it to find whether,
like, you can or cannot reach a certain point in your search.
So it depends on what your objective is.
Like, it really depends on what the search problem is asking.
So in the case of that farmer-goat example,
the search problem is asking
you to move everything to the other side of the river.
So you have that criterion.
And you wanna find the minimum-cost way to do it,
so you also have that other criterion.
So it really depends on what the search problem is asking.
And some of these nodes might be solutions.
Some of them might not be solutions.
So, so it really depends, okay? All right.
So, so let's just look at these on the slide.
So the memory is order of D. It's actually small. It's nice.
In terms of time, this is not a great algorithm, right?
Because even if your branching factor is 2,
if the depth of the tree is 50,
then this is gonna blow up, like, immediately.
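To put a number on that blowup, here is a small back-of-the-envelope calculation for branching factor 2 and depth 50. The helper name `tree_leaves` and the rate of one billion nodes per second are assumptions picked only for illustration.

```python
def tree_leaves(b, d):
    """Number of leaves in a full tree with branching factor b and depth d."""
    return b ** d


leaves = tree_leaves(2, 50)
print(leaves)  # 1125899906842624

# Assumed rate for illustration only: one billion nodes visited per second.
nodes_per_second = 1_000_000_000
days = leaves / nodes_per_second / 86_400  # 86,400 seconds in a day
print(round(days, 1))  # 13.0
```

So even with a branching factor of only 2, visiting just the leaves at depth 50 would take about two weeks at that assumed rate, while the memory needed to remember one branch is only about 50 entries.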
So a lot of these tree search algorithms that we're gonna talk about,
like, they have the same problem.
So, so they pretty much have the same time complexity.
We're going to just look at very minimal improvements of them.
And then after that, we'll talk about, uh,
dynamic programming and uniform cost search,
which are polynomial algorithms that are much better than these, okay? All right.
So let's actually- let's go back to the tram example and let's try
to write up what backtracking search does.
So- all right. So we defined our model.
Our model is the search problem,
this particular transportation search problem.
It could be anything else.
And now we're going to have
this main section where we're going to put
our algorithms.
And we're gonna write them as generally as possible,
so we can apply them to other types of search problems, okay?
So let's define backtracking search.
It takes a search problem.
It can take the transportation problem, okay? All right.
So basically, in backtracking search,
what we're doing is recursing on every state, given the history of
getting there and the total cost it took us
to get there, okay?
So at a state,
having some history and some accumulated cost so far,
we are going to recurse on
that state and look at its children, okay?
So we're going to explore the rest of the subtree
from that particular state, okay?
All right. So how do we do that?
[NOISE] Well, we gotta check whether we're in an end state.
If we're in an end state,
we can update the best solution so far, okay?
So let's put that down as a to-do.
So there are a bunch of things that we need to do.
We need to figure out if we're in an end state.
If we are, well,
we gotta update our best solution.
If we're not in an end state,
then we're going to recurse on the children, okay? All right.
So we can do that later.
And then in general, this recurse function,
we're going to call it on the start state.
So let's actually do that too.
So what backtracking search does is it calls this recurse function
on the initial state with an empty history, right?
We don't have any history yet,
and the cost is 0 so far because we haven't really gone anywhere.
So, so we start with a start state.
We call recurse on it, okay?
[NOISE] And how do we recurse on children?
Well, we have defined this,
this successor and cost function.
So by calling that successor and cost function on state,
then we can get action,
new state, and cost.
So, so we get this triple of action,
new state, and cost, okay?
And then we can basically recurse on the new state.
I'm not putting the histories in this code right now.
So we need to keep track of the history too,
but let's just not worry about the history.
Oh, I guess I'm putting it in this one.
[LAUGHTER]
In the later ones I will not put them in.
But basically the history is keeping track of how you got there.
And the total cost is going to be [NOISE]
what you've got so far plus the cost of
this new state-action pair, okay?
Okay. So we need to keep track of the best solution so far.
So I'm just going to define a dictionary here just to
make sure that we keep track of it, for Python scoping reasons.
Okay. And then the place
we're going to update our best solution so far is that TODO that is left, right?
So, so if you're in an end state,
then we can actually update the best solution so far, okay?
And what do we want in our best solution?
Well, we wanna know what the cost is.
So, so we can start with cost of infinity.
And anything below infinity is better.
[NOISE] And then we're going to start with a history of empty,
but we're going to fill up that history too, okay?
So that's the initialization of best solution so far.
Then, we're going to update that, right?
If you're in an end-state,
if the total cost that we have right now is smaller than the best solution so far,
then we're going to update that best solution.
And, and you're going to update its history with whatever its history is, okay?
All right. And, and that's it,
that's backtracking search, okay?
So let's just make sure it does the thing.
So maybe- so to do that,
[NOISE] we are going to- actually, no,
we gotta return the best solution so far.
Mm-hmm.
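A minimal sketch of the backtracking search just described, not the exact code typed in the lecture. The class and method names (TransportationProblem, start_state, is_end, succ_and_cost) are hypothetical, and the costs (walking one block costs 1, the tram from s to 2s costs 2) are an assumption about the transportation problem defined earlier.

```python
import math

class TransportationProblem:
    """Assumed problem: blocks 1..n; walk s -> s+1 costs 1, tram s -> 2s costs 2."""
    def __init__(self, n):
        self.n = n

    def start_state(self):
        return 1

    def is_end(self, state):
        return state == self.n

    def succ_and_cost(self, state):
        # Returns a list of (action, new_state, cost) triples.
        result = []
        if state + 1 <= self.n:
            result.append(("walk", state + 1, 1))
        if state * 2 <= self.n:
            result.append(("tram", state * 2, 2))
        return result

def backtracking_search(problem):
    # A dictionary (rather than plain variables) so the nested function can update it.
    best = {"cost": math.inf, "history": None}

    def recurse(state, history, total_cost):
        if problem.is_end(state):
            # In an end state: update the best solution so far if this one is cheaper.
            if total_cost < best["cost"]:
                best["cost"] = total_cost
                best["history"] = history
            return
        # Not in an end state: recurse on every child.
        for action, new_state, cost in problem.succ_and_cost(state):
            recurse(new_state, history + [(action, new_state, cost)], total_cost + cost)

    recurse(problem.start_state(), [], 0)
    return best["cost"], best["history"]

print(backtracking_search(TransportationProblem(10)))
```

The search visits every path from the start state to an end state, which is why the running time grows exponentially with depth.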
All right. So now we have defined a transportation problem.
Now, what I want to do is,
I want to call backtracking search on the transportation problem, okay?
So that all sounds good.
I need to write a print function also to- to be able to print things.
So I'm gonna just write a generic print function that we can
call on any of these types of problems.
So let's- let's define a print solution function that just like,
prints things the way we want them.
So we get the solution,
and we're gonna just unpack that cost and
history and just print the cost and history nicely.
Okay. All right.
So I can- I can use this print solution for pretty much all the other algorithms,
we'll talk about today too.
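A small sketch of such a generic print function, assuming a solution is a pair of total cost and history, where the history is a list of steps. The helper format_solution and the example tuple are hypothetical, added only so the formatting can be checked without capturing output.

```python
def format_solution(solution):
    """Build the text for a (total_cost, history) pair, one line per history step."""
    total_cost, history = solution
    lines = ["totalCost: {}".format(total_cost)]
    lines.extend(str(step) for step in history)
    return "\n".join(lines)

def print_solution(solution):
    """Print the cost and history of a solution returned by any of the search algorithms."""
    print(format_solution(solution))

# Hypothetical solution: walk to block 2, then take the tram to block 4.
print_solution((3, [("walk", 2, 1), ("tram", 4, 2)]))
```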
Okay. And we're gonna talk about how we get there- to the history.
So now I have my print function,
I have my backtracking search algorithm,
I've defined my transportation problem.
I can just call it on this transportation problem with 10 blocks.
So as you guys can see here,
so the total cost is 6.
So what this means is for going from city 1 to city,
city 10, then this is the best solution.
I- I gotta walk walk, walk,
walk, and then after that ta- take the tram.
Because like I end up in 5,
and then after that it's actually
worth taking the tram and paying the cost of 2.
Um, let's try it out for 20.
What do you think is the answer for 20?
So [LAUGHTER] similar to before, walk, walk,
walk until we get to 5,
then we take the tram, then we take the tram again.
The cost is 8. And then if,
if it is 100,
it's a little bit more interesting if you have 100.
So you are walking and then you're taking
the tram and you get to 24, and you walk
that one step to get to 25, which is
a good state because then you can just multiply that by 2.
So you walk for that one step and take the tram again, okay.
So what if I want to try out a much larger number of blocks?
So is this gonna work?
No, because, because remember,
the time was order of b to the D. That wasn't that great.
So let's try that.
Well, we hit the maximum recursion depth, but we can fix that.
So [LAUGHTER] let's try fixing that.
[LAUGHTER] So you can, you can set your recursion limit to be whatever.
So you can try that. Is this gonna work?
[LAUGHTER] Now, it's just gonna take a long time, right.
So, so it's not going to give you an answer
[LAUGHTER] And it's gonna just take a long time.
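A short sketch of the recursion-limit fix mentioned here, using Python's sys.setrecursionlimit. The function count_down is a hypothetical stand-in for a search whose recursion depth grows with the problem size, and the numbers are arbitrary. Raising the limit only removes the depth error; it does nothing about the exponential running time.

```python
import sys

def count_down(n):
    """Recurse n levels deep, like a search that walks n blocks one at a time."""
    return 0 if n == 0 else 1 + count_down(n - 1)

print(sys.getrecursionlimit())  # the interpreter's current ceiling on recursion depth
sys.setrecursionlimit(20000)    # raise the ceiling
print(count_down(5000))         # deeper than CPython's usual default limit allows
```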
So all right.
[LAUGHTER] Actually, how do I view? Okay.
Let's go back here. All right.
So that was backtracking search, right?
So all it was doing was just going over all of this tree and it was taking
exponential time as you saw and we just tried it
out on that transportation problem that we defined.
So we just defined a search problem, we used this
really simple search algorithm to find solutions for that,
and- and then that's what we have so far.
So, so now what we want to do is,
we want to- we want to come up with
a few better improvements of this backtracking search.
Again, don't get your hopes up,
it's not that big of an improvement.
But, but we can do some- something better.
So, so the first improvement you want to make
is by using this algorithm called depth-first search,
as some of you might have heard of it.
DFS or depth-first search, okay?
So the restriction that DFS puts in
is, is that your cost has to be 0.
So your cost has to be, let me leave that.
Um, let me actually draw a line between them.
So you don't get.
Okay, so, so we are talking about DFS now,
and the restriction is the cost has to be 0.
So, so what DFS does,
is it basically does exactly the same thing as backtracking search,
but once it finds a solution down here then it is done.
It basically doesn't like explore the rest of the tree.
And the reason it can do that is the cost of all these edges is 0.
So if the cost of all these edges are 0,
then if I find a solution I found a solution.
I don't need to like find this better solution.
Because, because that, that is good enough like anything that I find also has a cost of 0,
so I might as well just return the solution.
Like, an example of that is if you have Rubik- Rubik's cube uh,
like if you find a solution then you have found a solution, right?
There are a million different ways of like getting to a solution,
but like you just want one.
And then if you find one,
then you're happy, you're done.
Okay. So as you can see, this is a very,
very slight improvement to backtracking search.
Um, what happens is in terms of,
in terms of space it's still the same thing.
So it's order of D. So in terms of space nothing has changed.
It's pretty good, it's order of D. In terms of time,
in practice it is better, right?
Because in practice if I find a solution,
I can just be done,
don't worry about the rest of the tree.
But, but in, in general,
if you want to talk about it in theory, then
the worst case scenario is just trying out the whole tree,
so in the worst case
it's order of b to the D. So,
so nothing has really changed in terms of- in terms of exponential blow up. Yes.
I've been thinking of how you draw that tree,
it seems that you imply that the sub problems do not overlap, right?
Because you're kind of [inaudible] but in fact the sub-problem could overlap.
So, say, with the tram problem, you can get to
the same place through different histories, but the rest is the same.
Yeah, so you can- so,
so the question is yeah, do sub-problems overlap here or they don't.
So you could actually have it in a setting where sub-problems do overlap,
but you could actually add this,
this extra like constraint that says if I visited the state,
then don't add it to the tree.
So, so you have that option or you have the option of, like, going down the tree to some,
like, particular depth and not trying out everything.
In the setting that we have here, yeah,
like we're basically trying out all possible.
Like, I'm talking about the most uh, like,
general form where you're going over
all the states and all possible actions that could come out of it, okay?
All right. So that was DFS.
Okay. So the idea of DFS, again, is you're doing
backtracking search and then you're just stopping when you find
a solution because- because cost is 0, okay?
So in terms of space, order of D,
in terms of time,
it's still order of b to the D, okay?
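A minimal sketch of DFS as described: the same recursion as backtracking search, except that it returns as soon as it reaches an end state. The function signature and the toy tree are hypothetical, and every edge is assumed to cost 0, so no costs are tracked.

```python
def depth_first_search(start, is_end, successors):
    """Return the first action history that reaches an end state, or None."""
    def recurse(state, history):
        if is_end(state):
            return history  # all edges cost 0, so the first solution is as good as any
        for action, new_state in successors(state):
            found = recurse(new_state, history + [action])
            if found is not None:
                return found  # stop here; the rest of the tree is never explored
        return None

    return recurse(start, [])

# Hypothetical toy tree: each state maps to its (action, child) pairs.
TREE = {
    "root": [("left", "a"), ("right", "b")],
    "a": [("left", "a1"), ("right", "a2")],
    "b": [("left", "goal"), ("right", "b2")],
}

solution = depth_first_search("root", lambda s: s == "goal", lambda s: TREE.get(s, []))
print(solution)
```

Only the current path is kept in memory, which is where the order of D space comes from; in the worst case the goal sits in the last branch tried and the whole tree is still visited.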
All right. So that was DFS.
We have another algorithm called breadth-first search BFS.
And this is useful when cost is some constant but it doesn't need to be 0,
it's just some, some, some positive constant.
So what that means is all these edges have the same cost
and that cost is just C. So I have the same cost pretty much everywhere, okay?
So the idea of breadth-first search,
is we can- we can go layer by layer.
Like, like we're not going to try out the depth.
Instead what we can do is,
we can go layer by layer,
try out this layer and see if we find a solution here.
Remember the tree doesn't need to go all the way down here.
The tree could end here or like at any of these and any of these nodes.
Like, like I can have like a tree that looks maybe like this.
I have a solution here.
Like this tree doesn't need to be like this nicely formed.
Like I can have a tree that looks like this, okay?
So if I have a tree that looks like this,
with breadth-first search, I'm gonna try out this layer.
See if this guy is a solution.
If it's not, I'm gonna try this guy, see
if this is the solution.
If not I'm gonna try here, here,
and then when I find a solution when I get here, I'm done, right?
Because like if I find a solution here,
I know it took 2C to get here.
Like two of these C values.
And if there is any other solution anywhere else in this sub-tree or in this sub-tree,
those solutions are going to be worse than this.
Because they are gonna just like take like,
they- they're going to have a higher cost, okay?
So because the cost is constant throughout.
Okay. So then it's,
it's useful if your solutions are somewhere like
high up in this tree and then you can find it.
So in terms of time,
I get some improvements here because I can call this depth,
this shorter depth, small d.
And in terms of time,
it's still exponential but it's order of
b to the small d. And this is actually a huge improvement,
because if you think about it,
the tree gets exponentially larger as you go down.
So these, like, lower levels have a lot of things that you need to explore.
If we have like branching factor of 10,
the next layer has 100 things in it, right?
So- so going down these layers is actually pretty bad.
So, so the fact that with bre- breadth-first search I can improve the timing and,
and limit it to a particular depth, that's pretty good.
Still exponential, but pretty good. Yes.
[inaudible] negative cost at that point,
you can also assume this is best solution.
Yeah, you can assume that this is the best solution. Yeah, exactly.
So you are assuming that there are no negative cost.
So at this point, I know this is the best solution, I'm done.
Like I call it and and I don't like explore anything else.
The problem with breadth-first search is um, there's a question there, sorry.
Are you also assuming all the costs are the same?
Yeah, we're assuming all the costs are the same.
Because maybe you like all the costs are 1,
if- if I don't assume that,
if all of these costs are 100 and then like there might be like some,
some other like um.
[inaudible].
Yeah, you need to explore the rest if they're not the same basically. That's what I mean.
All right. So, so the the problem with BFS is,
in terms of memory we are losing.
In terms of memory,
you need to actually keep track of the history of all these other,
like all the nodes that you have explored so far.
So uh, in terms of memory,
this is going to be order of b to the small d,
kind of similar to the time.
And, and the reason is,
I have explored this guy.
And then after exploring this guy,
I need to still have like a history of where it's going to go,
because next time around when I try out this layer,
I need to know everything about this parent.
And I,- like when I- when I explore here and this is not a solution,
I need to store everything about this,
because maybe I don't find a solution in this,
in this level and I need to come down.
And when I come down, I need to know everything about these nodes.
So I need to actually store pretty much like
everything about the tree until I find my solution.
And then that's where you lose like in breadth-first search.
In terms of space, it's not going to be that great.
So in terms of space, it's now order of b to the small d. It's a lot worse than what we've had.
In terms of time, it is, it is better.
It's still exponential, but it is better, okay? All right.
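A minimal sketch of breadth-first search with a queue, assuming every edge has the same positive cost so that the shallowest solution is also the cheapest. The function signature and the toy tree are hypothetical. The queue holds a whole layer of states together with their histories, which is where the exponential memory goes.

```python
from collections import deque

def breadth_first_search(start, is_end, successors):
    """Return the action history of a shallowest end state, or None."""
    frontier = deque([(start, [])])  # states waiting to be explored, layer by layer
    while frontier:
        state, history = frontier.popleft()
        if is_end(state):
            return history  # with constant edge cost, nothing deeper can be cheaper
        for action, new_state in successors(state):
            frontier.append((new_state, history + [action]))
    return None

# Hypothetical uneven tree: a deep goal on the left, a shallower goal on the right.
TREE = {
    "root": [("left", "a"), ("right", "b")],
    "a": [("left", "a1")],
    "a1": [("left", "deep_goal")],
    "b": [("left", "shallow_goal")],
}

solution = breadth_first_search("root", lambda s: s.endswith("goal"), lambda s: TREE.get(s, []))
print(solution)
```

A depth-first search that tries the left branch first would reach the deep goal instead; going layer by layer is what finds the goal at depth 2.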
Okay, so now um,
let's talk about one more algorithm and then afterward we,
we jump to dynamic programming.
There is a question back there.
One thing though, the small d can be the same as the big D, right?
It can. Yeah. So, it is exponential. I agree.
Small d can be the same as big D. But in practice,
if small d is not the same as big D,
we are- we are winning a lot because, because, yeah,
these lower layers are so bad that,
that people actually like to call it- call the fact that we,
we are order of b to the small d rather than big D. Yes?
Is there a reason why we would use the worst-case scenario for the time of DFS?
Uh, so DFS needs to go all the way down to these lower, lower levels.
But BFS can stop at every level because it's doing level by level.
That can be the worst case scenario [inaudible].
Yeah. So the reason is- yeah, so like you were saying, okay,
so in DFS we were also saving some time, right?
Like, why aren't we calling that out?
And then the reason is with DFS you still need to get to these like lower layers,
and that is the, like,
that is the place that you're losing on time.
So, so the fact that you're still, like,
losing on time- sure, you haven't explored these other ones,
but you have already gotten to these lower levels,
like, so far, um, that's pretty bad.
So, so that is why we are calling it order of b to the D in the worst case.
Okay. All right.
So this, this last algorithm I wanna to talk about is,
is an idea that tries- it's a cool idea.
It actually tries to combine the benefits of BFS and DFS.
And, and this is called,
uh, DFS Iterative Deepening.
So what this algorithm does is it basically goes level by level,
same as BFS, because then that way i- if you find a solution,
you're done, everything is great, right?
Uh, but what, what it does is for every level,
it runs a full DFS.
And, and it feels- it's like it's gonna take a long time.
But, but it's actually good because, again,
if you find your solution, like,
early on, it doesn't matter that you have ran like a million DFSs so far.
So, um, so it's kinda like an analogy of it is,
is imagine that you have a dog,
and that dog is DFS,
and it's on a leash, and you have like a short leash.
And when it is on that leash,
it's going to do a DFS and try out and search all the space,
and it doesn't find anything.
So it comes back, and then you're going to extend the leash a little bit,
and it's gonna do everything, and, like,
search everything, and do a DFS.
Comes back, doesn't find anything you extend the leash again.
So, so that's the idea.
Like extending the leash is this idea of extending your, your levels, okay?
So, uh, so how does,
how does DFS iterative deepening do? Yes?
Um, if what we're looking for in following the tree is even worse [inaudible]
Uh, say that again, say that.
So if, if what we're looking for in following the tree,
is that gonna be worse than-
Yes, exactly. Yes, that's, that's okay. That's a good point.
So the point is, uh,
the, the point that, um,
I mentioned is, if your solution is,
like, here, you are screwed.
It's worse than BFS or DFS, right?
You're doing all these DFSs through like a bigger, like,
higher-level BFS and you're- and,
and it's, it's a terrible situation.
But again, in practice, like,
we are hoping the solutions are not gonna end up like down this tree.
But yeah, if the solutions are down the tree,
then you're not, like, winning anything by, by using DFS.
What exactly, like what problems do you think DFS iterative deepening would be, like, useful?
In general, if you- okay. So the question is, yeah,
so what problems do we think DFS iterative deepening is useful?
Uh, in general, if like,
there are problems that I think BFS is going to be useful,
usually, DFS iterative deepening is useful.
The reason I would think that is, like,
there is some structure about the problem that I
would think I would find my solution earlier.
So if I, if I have some reasons or some,
some reasons about the problem,
about the structure of the problem,
and I think solutions are low depth,
I should use some of these algorithms.
And in DFS with iterative deepening in terms of space,
it helps too, so might as well use that. All right.
So, so in terms of space,
it's going to be order of small d. So in terms of
space order of small d. And then in terms of time,
you'd get the same benefits of,
uh, it gets the same benefits of, uh, BFS.
So, so that's, that's nice.
And then again, like, because it has this BFS-like outer loop,
it has the same sort of constraint on the cost.
The cost has got to be a, uh,
constant, right?
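A minimal sketch of DFS with iterative deepening, where the "leash" is the depth limit passed to a depth-limited DFS. The function names, the max_depth cap, and the toy tree are hypothetical, and every edge is assumed to have the same positive cost.

```python
def depth_limited_search(state, is_end, successors, limit, history):
    """DFS that refuses to go more than `limit` edges below `state`."""
    if is_end(state):
        return history
    if limit == 0:
        return None  # end of the leash: come back without a solution
    for action, new_state in successors(state):
        found = depth_limited_search(new_state, is_end, successors, limit - 1, history + [action])
        if found is not None:
            return found
    return None

def iterative_deepening_search(start, is_end, successors, max_depth):
    """Run a full depth-limited DFS for limit 0, 1, 2, ... and stop at the first solution."""
    for limit in range(max_depth + 1):
        found = depth_limited_search(start, is_end, successors, limit, [])
        if found is not None:
            return found
    return None

# Hypothetical uneven tree: a deep goal on the left, a shallower goal on the right.
TREE = {
    "root": [("left", "a"), ("right", "b")],
    "a": [("left", "a1")],
    "a1": [("left", "deep_goal")],
    "b": [("left", "shallow_goal")],
}

solution = iterative_deepening_search(
    "root", lambda s: s.endswith("goal"), lambda s: TREE.get(s, []), max_depth=5
)
print(solution)
```

Each pass stores only the current path, so the space stays at order of small d, while the shallow goal is still found before the deep one, as with BFS.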
So that is our table.
And again, in looking at this table in terms of time,
you're just not doing well, right?
Like you have this exponential time algorithms here.
And, um, we cou- could avoid
the exponential space with using something like DFS iterative deepening.
But still, this time thing is- it's just not that great, okay?
And what we wanna do now is we wanna talk about search algorithms
that bring down this exponential time to polynomial time somehow.
And then there is no magic,
we'll talk about how.
[LAUGHTER] And dynamic programming is,
is the first algorithm, okay? Yes?
Can you give us the idea of why it's b to the d in time and order of d in space?
Uh, yeah. So it- so,
so the way iterative deepening works is,
it sets the lev- or say level is one.
So if level is one,
I'm gonna do a full DFS, okay?
Because I'm doing a full DFS in terms of space,
uh, I- it's the same as DFS in terms of space.
I just- it's just the same as the length where we find a solution.
Let's say the length where I find the solution is small d. So now,
I say level is two, my new level is two,
I'm gonna do a full DFS, okay?
[NOISE] So when I do a full DFS,
then in terms of space,
I need to- I need to just remember my parents,
so that's why it's order of d in terms of space.
And in terms of time, it's,
it's order of b to the small d because if I find my solution here,
I'm done, I don't need to,
like, explore anything else.
And, and that is exponential but exponential in,
in this smaller depth as opposed to the longer depth similar to,
similar to BFS. Yes?
I'm sorry. I still don't understand why, let's say, like,
the small d is the same as the big D, right? And-
That's a- okay. So that's a very good question. So you- I think I know it.
So you're asking small d,
if small d was the same as big D. If I had my solutions down here,
why am I, like, differentiating here between a small d and big D, right?
Is that what you're asking or am I-
I'm just gonna ask if it's, like,
the depth is quite large, like,
small d is large,
and why is it, like,
why isn't it also a function of d?
As in why wouldn't it be, like,
d times b to the d?
Um, oh, I see what you're saying.
So, so you're saying, okay,
like, when I'm doing,
when I'm performing DFS iterative deepening,
then I'm doing DF- DFSs.
So sure, it's order of b to the d for each of them,
but then I'm doing d of them.
And if d is really large, I should put that here.
Sure, I, I do agree that is the right time.
But again, I'm- like, in, in,
in the, in the case of this exponential,
this is so bad that that we are just dropping that,
like, we don't even worry about that,
the extra d that comes in.
But it is true, you need to have that extra d,
like, in, in general if you want to talk about it.
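A short worked bound for that extra factor, as a sketch under two assumptions that are not stated in the lecture: the branching factor b is at least 2, and the pass with depth limit k visits on the order of b to the k nodes.

```latex
\sum_{k=0}^{d} b^{k} \;=\; \frac{b^{d+1}-1}{b-1} \;\le\; 2\,b^{d} \quad (b \ge 2),
\qquad\text{and in any case}\qquad
\sum_{k=0}^{d} b^{k} \;\le\; (d+1)\,b^{d}.
```

So d times b to the d is a safe upper bound on the total work, and the geometric sum shows that the last pass dominates, which is why the total is still quoted as order of b to the small d.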
Kind of wanna move on to dynamic programming, but last question there.
First of all, I'm after that,
presumably though you're saving the work that you've done during the prior iterations,
so you're not really computing anything larger than O of b to the capital D, correct?
Yeah, that's right. The worst-case scenario is O of b to the
capital D. All right.
So let's move to dynamic programming.
Okay. So, so what does dynamic programming do?
So maybe I can- I'll,
I'll still use this because I might need to use this thing later.
Okay. So I'm gonna erase my parameters up on here.
Okay. So the idea of dynamic programming,
we have already seen this in the first lecture,
is I have a state s,
and I wanna end up in some end state.
But to do that, I can take an action that takes me to S-prime, right?
I can, I can end up in s-prime by cost of s and a. I can take an action that,
that ends up in s-prime.
And then from there, I can do a bunch of things.
I don't know what. But I'll end up in some end state, okay?
And, and what I'm interested in actually computing is for
this state s is to find what is future cost of s, okay?
And this part of it,
is future cost of
S prime and I don't know what it is but I can just leave it as future cost of S prime.
So if I wanna find what future cost of S is,
maybe I should make this a little bit to the right one cycle.
I'm gonna write cost of s,
a for this edge.
I'm gonna erase this.
What I'm interested in finding is future cost of my state S. So what is that equal to?
Well, that's going to be equal to this cost of s, a.
Right? Like, at state S, I'm going to take action a.
So it's going to be cost of s,
a plus future cost of S-prime.
Again, I don't know what that is but that's future Dorsa's problem.
So this is future cost of S prime.
And then you might ask well what is a?
Where does a come from?
How do I know what a is?
I don't know. I'm gonna pick an a that minimizes this sum.
I'm gonna put this around it.
Okay? So future cost of S is just going to be equal to minimum of cost of s,
a, plus future costs of S-prime over all possible actions.
And it's going to be 0 if you are in an end state,
if IsEnd of S is true.
Okay? So if I already know I'm in an end state,
then there is no future cost.
That's going to be equal to 0.
Otherwise, future cost is just going to be,
cost of going from S to the next state and then future cost computed from there.
Okay? So that is just how one would go about
formalizing this problem as
a dynamic programming problem, okay?
And then how do I find what S prime is?
Well, I wrote this successor and cost function [NOISE] in my code.
Remember like we know how to find the successor given
that we are in state S and we are taking action a.
So S prime is just calling that successor function over s and a.
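Written out, the recurrence on the board looks like the following. The names Actions(s) for the set of actions available in s and Succ(s, a) for the successor state are notational assumptions; IsEnd, Cost and FutureCost are the quantities named in the lecture.

```latex
\text{FutureCost}(s) =
\begin{cases}
0 & \text{if } \text{IsEnd}(s), \\[4pt]
\displaystyle \min_{a \in \text{Actions}(s)} \big[\, \text{Cost}(s, a) + \text{FutureCost}(\text{Succ}(s, a)) \,\big] & \text{otherwise.}
\end{cases}
```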
All right. So let's go back to some route finding example.
So, so this is slightly different route finding example.
So let's say that we want to find the minimum cost path
from going from city 1 to some city n in the future,
moving forward, we can always just move forward and it
costs c_ij to go from city i to city j.
Okay? So this this is my new search problem.
Okay? So, so this is kind of what the tree would look like.
So, so if I wanna draw the search tree for this,
I can start from city one,
I can end up in a city two or three or four.
Then if I'm in city two,
I can end up in three or four.
If I'm in three, I can end up in four like this is how it will look like.
Ah, I can have a much larger version of it.
If I'm talking about going to city seven,
then I have this type of tree.
And by just like looking at this tree,
you see all these sub-trees just being repeated like throughout.
If you just look at five like future cost of five,
it's gonna be the same thing.
Right? It's just gonna be the same thing throughout.
And if I use like something like tree search that we have talked about,
then I have to like go and explore like
this whole tree and then it's gonna be really time-consuming.
So, so the key insight here is future cost,
this value of future cost,
only depends on state.
Okay? So it only depends on where I am right now.
And because of that maybe I can just store that the first time that I
compute future cost of five and then, like, in the future,
I just look that up and, and,
and I don't, like, recompute future cost of five.
Okay? So, so the observation here is,
future cost only depends on current city.
So, so my state in this case is current city and,
and that state is enough for me to compute future cost.
Okay? All right.
So, so if you, if you think about what we have talked about so far,
like we have thought about like these these search problems where the state we think
of it as the past sequence of actions and
the history of actions you have taken and all that.
But right now for this problem,
like state is just current city and that's enough.
Okay? So and and because of that,
you are getting all these exponential savings in time and space because again,
I can compute future cost of five there and collapse that whole tree into
this graph and just go about solving
my search problem on this graph as opposed to that that whole tree.
Right. So, so that's that's where you get the savings from, from dynamic programming.
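A small sketch of that saving on the forward-only route-finding problem with seven cities. The cost function is a hypothetical stand-in for c_ij, and the counters exist only to show how many times each version does the work of computing a future cost.

```python
from functools import lru_cache

N = 7  # cities 1..N, only forward moves allowed

def cost(i, j):
    """Hypothetical stand-in for c_ij; any cost for a forward move i -> j would do."""
    return (j - i) ** 2

calls = {"tree": 0, "graph": 0}

def future_cost_tree(city):
    """Plain recursion: re-explores the same subtree every time a city is reached."""
    calls["tree"] += 1
    if city == N:
        return 0
    return min(cost(city, nxt) + future_cost_tree(nxt) for nxt in range(city + 1, N + 1))

@lru_cache(maxsize=None)
def future_cost_graph(city):
    """Memoized recursion: the body runs once per city, later calls reuse the stored value."""
    calls["graph"] += 1
    if city == N:
        return 0
    return min(cost(city, nxt) + future_cost_graph(nxt) for nxt in range(city + 1, N + 1))

print(future_cost_tree(1), calls["tree"])
print(future_cost_graph(1), calls["graph"])
```

Both versions return the same future cost for city 1; the plain recursion does the work once per node of the search tree, while the memoized one does it once per city.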
Um, and I just wanna emphasize that again of,
let me actually do this.
So, so the key idea here is,
like I was saying there is no magic happening here.
The key idea here is is how to figure out what your state is.
It's actually important to think about what your state is.
In this case we are, we're assuming a state is a summary of
all past actions that we've taken, sufficient for us to choose the optimal future.
Okay? So, so that's like a mouthful but ah,
basically what that means is,
the only reason dynamic programming works
for this particular example we just saw
is that the state, the way we define it, is enough for us to plan for the future.
Like, I might have a different problem where
I define the state in a way that it's not enough for me to plan for the future.
But if I wanna use dynamic programming,
then I gotta be smart about choosing my state because,
because that is the thing that,
that decides for the future.
So, so for example for this problem,
like I might visit city one, then three, then four, and then
six, and for solving this particular search problem,
I just need to know that I'm in city six.
That is enough. Okay? But like maybe I have some other problem that requires knowing one,
three, four, and six and and because of that maybe I need to know the full tree.
Okay? So so this is where the saving comes from like
figuring out what the state is and and defining that.
Right? All right.
So so we will come back to this notion of state
again and think about the state a little bit more carefully.
But maybe before that maybe we can just implement
dynamic programming real quick. All right.
So let's go back to our tram problem.
I'm back to the tram problem and let's implement dynamic programming.
Okay. So how do we do this?
We're basically just writing that like
math over there into code. That, that's all you're doing.
So, so we're going to define this future cost.
If you're in an end state,
we're going to return 0.
If you're not in an end state we're just going to
add up cost plus future cost of S prime.
How do we get S-prime?
Well, we're gonna call this successor and cost function.
So we can get action, new state, and cost.
And then you're gonna take the minimum of them over, over all possible actions.
So minimum of cost plus future cost of new state.
That is literally what we have on the board.
Okay? All right.
And we're returning the result.
So that is future cost.
What should dynamic programming return?
It should, it should return the future cost of the initial state. Right? The start state.
And you will return the history if you want.
In this case, I'm not returning [LAUGHTER] the history.
Okay. So how do I get savings? Well, I gotta put a cache.
Right? That's the only way I'm gonna get savings.
So um, that is where I put the cache.
And if the state is already in the cache,
I'll just return what's in my cache.
Otherwise I compute it and store it. Any question there?
[inaudible].
What's that?
Are we getting future costs?
How are we getting? Uh, say that again. Sorry, I didn't hear.
So future cost takes some states,
but what actually- is there like- uh,
do we actually have, like, a function in the menu to calculate
future costs or is that like [inaudible].
So future cost is going to be,
uh- yeah, so, so we have this function, right?
Future cost over state.
But you're going to call future cost- so, so,
so future cost over state is going to be equal to cost of state and actions,
in this function I'm saying all possible actions,
try that out, plus future costs of S prime.
And S prime comes from the successor and, and, and cost function, uh,
successor and cost function. All right.
So- and then, yeah- and so,
so we do the caching,
the proper caching type of way of doing this too.
And now we have dynamic programming.
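A minimal sketch of this dynamic programming code on the tram problem, not the exact code typed in the lecture. The class and method names are hypothetical, and the costs (walk costs 1, tram costs 2) are an assumption about the transportation problem defined earlier. As in the lecture, the history is not reconstructed, so an empty list is returned in its place.

```python
class TransportationProblem:
    """Assumed problem: blocks 1..n; walk s -> s+1 costs 1, tram s -> 2s costs 2."""
    def __init__(self, n):
        self.n = n

    def start_state(self):
        return 1

    def is_end(self, state):
        return state == self.n

    def succ_and_cost(self, state):
        # Returns a list of (action, new_state, cost) triples.
        result = []
        if state + 1 <= self.n:
            result.append(("walk", state + 1, 1))
        if state * 2 <= self.n:
            result.append(("tram", state * 2, 2))
        return result

def dynamic_programming(problem):
    cache = {}  # state -> future cost of that state

    def future_cost(state):
        if problem.is_end(state):
            return 0
        if state in cache:
            return cache[state]  # already computed: reuse it
        # Minimum over all actions of cost(s, a) + future cost of the successor.
        result = min(
            cost + future_cost(new_state)
            for action, new_state, cost in problem.succ_and_cost(state)
        )
        cache[state] = result
        return result

    # The history is left empty here; only the cost is computed.
    return future_cost(problem.start_state()), []

print(dynamic_programming(TransportationProblem(10)))
```

The recursion still goes as deep as the number of blocks, so very large problems can hit Python's recursion limit even though the amount of work is now small.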
So we can basically call this over,
uh, our tram problem.
So I'm gonna, I'm gonna move forward.
Okay. So let's do print solution,
dynamic programming over our problem.
Uh, you can, again, play around with this.
The only way I'm checking this is if it gives me
the same solution as backtracking search because I knew how that works, right?
So let's just call it on ten.
And, yeah, it gave me the same, the same answer.
So I can play around with this, okay? All right.
So, uh-huh, let's go back.
Okay. So one assumption that we have here,
to just point out, is we are assuming that this graph is going to be acyclic.
So, so that's, that's an assumption that we need to make
when we are solving this dynamic programming problem.
And, and the reason is,
[NOISE] well, we need to compute this future cost, right?
For me to compute future cost of S,
I need to have already thought about future cost of S prime.
So there is, kind of, this natural ordering that exists between my state.
So if I think about an example where there are cycles,
then, then I don't have that ordering, right?
If I want to compute, let's say,
I want to go from A to D here,
and on B, C. So if I want to compute future cost of B,
I don't really know if I should have computed future costs of A before
or C before or what order should I have gone to compute,
like, future costs of B?
So, so you actually need to have some way of
ordering your states in order to compute these future costs and,
and apply dynamic programming.
So that's why, like, we can't really have cycles,
like, when we, when we think about this algorithm.
But we are going to talk about, uh,
uniform cost search which actually allows us to have cycles,
like, in a few slides. Yes.
So what is the run time of the dynamic programming?
So the run time of this is actually polynomial in the number of states.
So order of n.
O of n?
Yeah O of n, where n is the number of states.
Yeah. Okay. All right.
So- all right.
So let's talk about the idea of states a little bit
more because I think this is, this is actually interesting.
All right. So, so let's just reiterate. What is a state?
State is a summary of all past actions
sufficient to choose future actions optimally, okay?
So, so everyone happy with what state is?
So now, what we want to do is,
we want to figure out how we should define our state space.
Because, again, this is an important problem, right?
Like, how we we're defining state space is
the thing that gets the dynamic programming working.
So, so we got to, we got to think about how to do that.
So, so let's go back to this example,
and let's just change that a little bit.
So, so this is the same example of,
I'm going from city one to city n,
I can only move forward,
and it cost C_i_j to go from any city i to city j,
and I'm going to add a constraint.
And the constraint is,
I can't visit three odd cities in a row, okay?
So what that means is,
um, [NOISE] maybe I'm in state one.
And then, I went to state three,
or city one, I went to city three.
And then after that,
can I go to city seven or- no,
based on this constraint that I've added,
I, I, like, can't do that, right?
So I want to define a state space that allows me to keep track of these things,
so I can solve this new search problem with this new constraint.
So, so how should I, how should I do that?
[NOISE] So in, in the previous problem,
when we didn't have the constraint,
our state was just a current city.
Like previously, we just cared about the current city.
And the reason we cared about the current city is like,
is like we are solving the search problem, like, we end up in a city.
I need to know where I should go from there.
So I should, I should have my current city in general, right?
So, so for the previous problem without the constraint,
current city was enough.
But, but now current city is not enough, right?
I actually need to know, like,
something about my past, okay? Yes.
[inaudible] have a count of how many that's odd states.
Yeah. That's actually a very good point. [NOISE] Yeah.
And so, so one suggestion is,
have a count of how many odd states.
But maybe the first thing that would come to our mind is something simpler.
So maybe we say, well,
the state is- I'll write previous city first just to be similar to the slide-
the state is previous city and current city.
Okay? So this is one possible option for, for my state, right?
Because, because if I have this,
if I have this guy as my state,
and then that is enough, right?
Like if I- my current city is three,
I know my previous city was one.
I know I shouldn't go to seven,
like that's enough for me to make,
like, future decisions, okay?
But there is a problem with this.
Well, what is the problem?
So I have n cities, right?
So, so current city can take n possible values,
and previous city can also take n possible values.
So if I think about the size of my state space,
it is n squared.
If I decide to choose this state, okay?
I'm going to have n squared states.
And remember, we are doing this dynamic programming thing, like, we need to actually,
like, write down, like,
all the- like, how to get from all those states.
That's gonna be big. But there is an improvement to this.
And that's an improvement that you suggested, which is,
I don't actually need to have this whole giant previous city which has n options.
I can just have a counter to just know whether the previous city was odd or not.
Like, that's enough, right?
Like if I- I don't care if it was one or three or whatever.
Like, I just care to know if previous city was odd or not.
So, so another option for- I'll write it here.
Another option for my state is to know if previous was odd or not, okay?
And then I need to know my current city again, right?
Current city we need that because,
like, we need to know how to get from there.
And then this brings down my state space,
like, how does it bring down my state space?
Because, well, what's the size of my state space?
The current city can take n possible values.
Whether my previous city was odd or not, that's two options, right?
Like, so I just brought down my state space from something that was n squared to 2n,
and, and that's a good improvement.
So in general, when you're picking these state spaces,
you should pick the minimal, like,
sufficient thing for you to make decisions.
So it's got to be a summary of all the previous actions and
previous things that you need to make future decisions,
but pick the minimum one because you're storing these things,
and it, it actually matters to pick the smallest one.
So, so here is an example of, like, exactly that.
So, so my state is now this tuple of whether
the previous city was odd or not, and my current city.
So if I start at city 1,
well, like, I don't have a previous city,
and I'm at city one, I could go to city three,
and I end up in odd and three.
I could try to go to city seven, well,
that's not possible because then I would have visited three odd cities in a row, and,
and I end up here,
and there are, like,
the rest of the tree, you can have any other examples. Yeah.
[inaudible].
So, so the way I'm counting this is, how my- so,
so my state is a tuple of two things, right?
If the previous city is odd or even,
I have two options here.
It's either odd or even, that's two.
And then my current city. And I have n possible options for my current city.
It could be city one, city two, city three,
so that's n. So I have n options here.
I have two options here.
That's why I'm saying my whole state space is two times n, okay?
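The state described above can be sketched in code. This is a minimal illustration, not the course's implementation: the function name `min_cost_no_three_odd` and the caller-supplied `cost(i, j)` function are assumptions made for the example. It memoizes the future cost over the state (previous city was odd, current city), so at most 2n states are ever stored.

```python
from functools import lru_cache


def min_cost_no_three_odd(n, cost):
    """Minimum cost from city 1 to city n, moving forward only,
    without ever visiting three odd cities in a row.
    cost(i, j) is a hypothetical edge-cost function supplied by the caller."""

    @lru_cache(maxsize=None)
    def future_cost(prev_odd, city):
        # State = (was the previous city odd?, current city): at most 2 * n states.
        if city == n:
            return 0
        best = float("inf")
        for nxt in range(city + 1, n + 1):
            # Previous and current city are both odd, so an odd next city is forbidden.
            if prev_odd and city % 2 == 1 and nxt % 2 == 1:
                continue
            best = min(best, cost(city, nxt) + future_cost(city % 2 == 1, nxt))
        return best

    # City 1 has no previous city, so the flag starts as False.
    return future_cost(False, 1)


# Illustrative costs (an assumption): jumping exactly two cities costs 1, anything else costs 10.
print(min_cost_no_three_odd(5, lambda i, j: 1 if j - i == 2 else 10))
```

With these made-up costs the cheap route 1, 3, 5 is ruled out by the constraint, so the search has to fall back to a more expensive path.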
All right. Okay. So let's try out this example.
Let's not put it in.
Uh, just talk to your neighbors about this,
and then maybe, if you have ideas just let me know in a minute.
So- okay. So what is the difference here?
So we're traveling from city one to city n,
and then the constraint is changed.
Now, we want to visit at least three odd cities.
So that's what we wanna do.
And then the question is, what is the minimal state?
Talk to your neighbors. [NOISE]
All right. Any ideas?
Any ideas? [BACKGROUND] What is a possible state?
Like it- don't worry about the minimal even, like for now.
Like what do I need to keep track of?
Number of odd cities.
Number of, number of odd cities?
Yeah.
Okay. So- and is that it?
Do I need to just know the number of odds cities?
Um, or number of odd is about your, uh, [OVERLAPPING]
So number- so, so what I meant is I also need to have current city, right? So, okay.
So one possible option for this new example,
I'm gonna write that here,
is I want to visit at least three odd cities,
I also need my- to know my current city,
for any of these types- like,
not any of these types of problems,
for these particular problems that I've defined here,
I need to know where I am.
So I need to know what my current city is.
So- so that is, like,
that is a given, I need to have that, okay?
So I want to see at least three odd cities.
So one possible option is to just have a counter and keep
counting number of odd cities, okay?
So this could be one potential state, okay? Yes?
Do the cities have to be different or it could be one, three, one?
So, um, okay, so the question is do the cities need to be different?
The way we are defining the problem is we are moving forward.
If I'm in one, like, I can only move forward.
I can't like stay at one or I can't, like, go back.
So- so we're always moving forward.
But when we talk about the- the state space,
we are talking about the more general, like, setting.
Like, some- some of that 2N might not even be possible,
but- but that's the way we are counting, okay?
All right. So- so this is one option,
but I can actually do better than this. Yes?
[inaudible] you need at least three odd cities,
and then you need at least two odd cities,
then you need at least one odd city and then you're-
And then you're done. Right. So- so a suggestion there is we can- we can have,
like, you can- you can start, like,
saying you need at least three odd cities,
then you need at least two odd cities,
then you need at least one- one odd city and then you're done.
And one way of formalizing that, that's exactly right, right?
I don't care if I have four odd cities now,
or five odd cities, like,
as long as I have like above three,
that's- that's good enough, right?
One odd city, two odd city, three odd city,
above that is just three plus,
like- like that's enough for me, okay?
So if I keep the full count,
then the state space is going to be N options for the current city,
times the number of odd cities,
which is around N over 2, so it's going to be N squared over 2.
But if I use this- this new suggestion,
where I don't keep track of four, five, six,
seven, I just keep track of one,
two, and three plus,
then my state space ends up becoming 3 times N,
and I- I can formally write that as S is equal to minimum of number of odd cities,
and three, and then current city,
you need the current city.
And with this state space,
then the size is equal to 3N, okay?
So I just, again, brought it down from about N squared over 2 to 3N,
and that's- that's a nice improvement. Yes?
Do you not also need an option for zero odd cities specific to [inaudible]
Zero. We're starting from city one,
so we're already counting that in, but yeah, like,
if you have zero odd cities,
that is a good point too.
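The capped counter can be sketched the same way. This is only an illustration under assumptions: the function name `min_cost_at_least_three_odd` and the `cost(i, j)` function are hypothetical. The state is (min(number of odd cities visited, 3), current city), and city 1 is counted as the first odd city, as in the answer above.

```python
from functools import lru_cache


def min_cost_at_least_three_odd(n, cost):
    """Minimum cost from city 1 to city n, moving forward only,
    visiting at least three odd cities along the way.
    cost(i, j) is a hypothetical edge-cost function supplied by the caller."""

    @lru_cache(maxsize=None)
    def future_cost(odd_seen, city):
        # odd_seen = min(number of odd cities visited so far, 3)
        if city == n:
            # Reaching the end only counts if the requirement is already met.
            return 0 if odd_seen >= 3 else float("inf")
        best = float("inf")
        for nxt in range(city + 1, n + 1):
            capped = min(odd_seen + nxt % 2, 3)  # never count past 3
            best = min(best, cost(city, nxt) + future_cost(capped, nxt))
        return best

    # We start at city 1, which is odd, so one odd city is already counted.
    return future_cost(1, 1)


# Illustrative costs (an assumption): every move costs 1.
print(min_cost_at_least_three_odd(5, lambda i, j: 1))
```

With unit costs and five cities, jumping straight from 1 to 5 would be cheapest but sees only two odd cities, so the sketch has to route through city 3 as well.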
All right. So I've gotta move.
Okay, so, um, that was that.
This is what it looks like.
Like you can think of your state space like this again as a tuple of I visited one,
two, three, and- and then the cities.
I have another example here,
you can think about this later and yeah,
like, work, work it at home.
But, uh, basically the question is, again,
you're going from city one to N,
and you want to visit more odd cities than even cities.
What would be the minimal state space?
But we can talk about it offline.
So the summary so far,
is- is that state is going to be a summary of
past actions sufficient to choose future actions optimally.
And then dynamic programming,
it's not doing any magic, right,
it's using this notion of state to bring down
this exponential time algorithm to a polynomial time algorithm,
and then, with the trick of using memoization,
and with a trick of choosing the right state, okay?
And we have talked about dynamic programming and how it only works for acyclic graphs, it doesn't work when there are cycles.
And now, we want to spend a little bit of time talking about uniform cost search, uh,
and how that can help with the- with the cycles.
So if you guys have seen Dijkstra's algorithm,
this is very similar to Dijkstra's, like, yeah.
So- so it's basically Dijkstra's. But- all right.
So let's- let's actually talk about this.
So- so the observation here is that when we- when we think about
the cost of getting from start state to some s prime,
well, that is going to be equal to the cost of going from s
to s prime plus the past cost of s, okay.
And then in dynamic programming,
we made sure that we have this ordering and these things are computed in order,
so we're not worried about, like,
visiting the state, like, multiple times.
But- but in- in uniform cost search,
we might visit a state multiple times,
and if you have cycles, we don't know what order to go.
But the order we can go is we can actually compute a past cost- a suggested past cost,
and- and basically, go over the states based on increasing past cost, okay?
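Written out, the observation is the usual shortest-path relation. The symbols PastCost and Cost are notation chosen here for illustration, and the minimum over predecessors s of s' is the standard way to turn the per-edge statement into the optimal past cost:

```latex
\text{PastCost}(s') \;=\; \min_{s \,:\, s \to s'} \bigl[\, \text{PastCost}(s) + \text{Cost}(s, s') \,\bigr],
\qquad \text{PastCost}(s_{\text{start}}) = 0
```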
So, um, let me actually- yeah,
so- so uniform cost search,
what it does is it enumerates states in an order of increasing past cost.
So- and- and in this case,
we need to actually make an assumption here,
we need to assume that the- the cost is going to be non-negative.
So- so I'm making this assumption for uniform cost search.
So here is an example of uniform cost search running- oh, we don't have internet,
I just- yeah, there is a video of uniform cost search running in action.
If I have time, I'll connect to internet and get it working.
But- so- so let's talk about the high level idea of uniform cost search.
So in uniform cost search,
we have three sets that we need to keep track of.
One is explored set,
which is the states for which we have found the optimal path.
These are the states that we are sure, like, how to get to,
we have computed the best path possible to get there,
we are, like, done with them, okay?
Then we have another set called a frontier,
where this frontier are the states that we have seen,
we have computed like a cost of getting there,
like we know, somehow,
how to get there and what would be the cost,
but we're just not sure about it, like, like,
we're not sure if that was the best way of getting there, okay?
So- so the frontier,
you can think of it as a known unknown.
I know they exist, but, like,
I actually, I'm not sure what's the optimal way of getting there.
And then finally, we have this unexplored set of states.
And these unexplored states,
I haven't even seen them yet,
I- I don't even know how to get there,
and you can think of it as more of an unknown unknown.
So- so that's, like,
how you would think about these three.
So let's actually work out an example for uniform cost search.
I'm actually going to do this one.
So- so I'm just gonna show how uniform cost search runs on this example.
So I said we are going to keep track of three sets: unexplored,
frontier, and then explored.
Explored. Okay? All right.
So everything ends up in unexplored at the beginning, A, B, C, and D.
And what I wanna do is I wanna go from A to D,
that- that's what I wanna do, okay?
So I wanna find the minimum cost path to get from A to D,
given that I have this graph, okay?
So what I'm gonna do is I'm gonna take my initial state,
that's A. I am going to put A on my frontier,
and it costs zero to get to A because I'm just starting at A, okay?
So that's on my frontier, then in the next step,
what I'm gonna do is I'm going to pop off the thing
with the lowest cost from my frontier.
There's one thing on my frontier,
I'm just gonna pop off that one thing off my frontier,
I'm gonna put that to explored,
the cost of getting to A is 0.
And then, what I'm going to do is after popping it off from my frontier is,
I'm gonna see how I can get from A to any other state.
So from A, I can get to B,
that's one option, and with the cost of 1.
So from A, I can go to B with a cost of 1.
Where else can I go? I can go to C with a cost of 100.
Okay? So what I just did is I moved B from unexplored to frontier,
and I know how to get there from A,
and I moved C to the frontier,
and I know how to get there from A too.
Okay? So now it's the next round,
I'm looking at my frontier,
A is not on my frontier anymore, it's in explored.
And I'm going to pop off the thing with the best cost off my frontier.
Well, what is that? That's B.
So I'm going to move B to my explored.
The way- the best way to get to B,
I already know that, right?
That's from A to B. Everything is good.
Okay? So now that I've popped off B from my frontier,
I'm gonna look at B and see what states I can get to from B.
From B, I can go to A,
but A is already in explored, like,
I already know the best way to get to A,
so- so there is no reason to do that.
From B, I can get to C,
and if I want to get to C,
then I can actually get to C with the cost of
1 plus whatever the cost of B already is, which is 1, so that's 2.
So what I'm gonna do is I'm going to erase the 100,
because there is a better way of getting there,
and that's from B, okay?
And then, from B, I can get to D. So I'm gonna move D from unexplored to frontier.
I can get to it from B.
And then, how do I get to it from B?
There's a cost of 101, right?
Because that's 100 plus the cost of getting to B, which is 1, okay?
All right. So I'm- I'm done exploring everything I can do from B.
Going back to my frontier again.
So A and B are not on my frontier anymore.
I just have C and D on my frontier.
I'm gonna pop off the thing with the best cost,
that is C. I'm gonna move that to explored with a cost of two,
and the best way to get there is from B, okay?
So we're done with C. And then,
we're gonna see where we can go from C. From C, I can go to A.
Well, that's done, that's already on
the explored- in- in the explored set, I'm not gonna touch that.
Similar thing with B, already in the explored,
don't need to worry about that.
From C, I can get to D, right?
And if I want to get to D from C,
well, what would be the cost of that?
It would be 2 plus 1.
So I can update this and have 3.
And I can update the way to get to D from here.
And then, we're done,
we go to frontier.
The only thing that's left on the frontier is- is D. I'm going to just pop that off,
and then I'm going to add that to explored.
And that is 3. And that's what I have in my explored.
So the way to get from A to D is- is by taking this route, and it costs 3.
So A, B, C, and D. Okay?
Is that- is that clear? All right.
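The walkthrough above can be reproduced with a short sketch. This is not the course's code: the function name `uniform_cost_search` is an assumption, and the edge costs are inferred from the spoken example (A-B 1, A-C 100, B-C 1, B-D 100, C-D 1, treated as undirected). Instead of erasing and rewriting a frontier entry as on the board, this version pushes the cheaper entry and skips the stale one when it is popped; non-negative costs are assumed, as stated earlier.

```python
import heapq


def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a minimum cost path, assuming non-negative edge costs."""
    frontier = [(0, start, [start])]  # priority queue of (past cost, state, path)
    explored = {}                     # state -> optimal past cost
    while frontier:
        past_cost, state, path = heapq.heappop(frontier)
        if state in explored:
            continue  # stale entry: a cheaper route to this state was already popped
        explored[state] = past_cost   # popped with the lowest cost, so this is optimal
        if state == goal:
            return past_cost, path
        for nxt, edge_cost in graph[state].items():
            if nxt not in explored:
                heapq.heappush(frontier, (past_cost + edge_cost, nxt, path + [nxt]))
    return float("inf"), None


# Graph inferred from the lecture example (an assumption about the board drawing).
graph = {
    "A": {"B": 1, "C": 100},
    "B": {"A": 1, "C": 1, "D": 100},
    "C": {"A": 100, "B": 1, "D": 1},
    "D": {"B": 100, "C": 1},
}
print(uniform_cost_search(graph, "A", "D"))  # (3, ['A', 'B', 'C', 'D'])
```

The pop order is A (0), B (1), C (2), D (3), matching the order in which states moved to the explored set on the board.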
Okay. So there are two slides left and they're probably gonna kick us out soon,
so I'll do this next time.
So- so yeah, of the two slides left, one is
just going to go over the pseudo-code.
So take a look at that, the code is online.
And there's a small theorem that says,
this is actually doing the right thing.
I'll talk about that next time.
