The following
content is provided
under a Creative
Commons license.
Your support will help MIT
OpenCourseWare continue
to offer high quality
educational resources for free.
To make a donation or
view additional materials
from hundreds of MIT courses,
visit MIT OpenCourseWare
at ocw.mit.edu.
PROFESSOR: Today we're going
to solve three problems,
a problem called
Parenthesization,
a problem called Edit Distance,
which is used in practice
a lot, for things like
comparing two strings of DNA,
and a problem called
Knapsack, just
about how to pack your bags.
And we're going to get a
couple of general ideas,
one is about how to deal with
string problems in general
with dynamic programming.
The first two, and our previous lecture, are all about strings, in a certain sense, or sequences, and we're going to introduce a new concept, kind of like polynomial time, but only kind of, sort of: pseudo-polynomial time.
Remember, dynamic programming
in five easy steps.
You define what your subproblems are and count how many there are. To solve a subproblem, you guess some part of the solution, where there aren't too many different possibilities for that guess. You count them; it had better be polynomial.
Then you, using that guess--
this is sort of optional,
but I think it's a useful
way to think about things.
You write a recurrence relating the solution to the subproblem you want to solve in terms of smaller subproblems, something that you already know how to solve, but they've got to be within this list.
And when you do
that, you're going
to get a min or a max
of a bunch of options,
those correspond
to your guesses.
And you get some running time, in order to compute that recurrence, ignoring the recursion; that's the time per subproblem.
Then, to make a dynamic
program, you either just
make that a recursive algorithm
and memoize everything,
or you write the bottom
up version of the DP.
They do exactly the same
computations, more or less,
and you need to check that
this recurrence is acyclic,
that you never end up
depending on yourself,
otherwise these will
be infinite algorithms
or incorrect algorithms.
Either way is bad.
For the bottom up, you really like to explicitly know a topological order on the subproblems, and that's usually pretty easy, but you've got to make sure that it's acyclic.
And then, to compute the
running time of the algorithm,
you just take the number
of subproblems from part 1
and you multiply it
by the time it takes
per subproblem, ignoring
recursion, in part 3.
That gives you
your running time.
I've written this formula by now three times or more; remember it. We use it all the time.
And then you need to double
check that you can actually
solve the original
problem you cared about,
either it was one
of your subproblems
or a combination of them.
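These five steps can be seen in code. Here is a minimal Python sketch of the two implementation styles from step 4, memoized recursion versus bottom-up, using Fibonacci as a stand-in subproblem structure (my example, not one from today's lecture):

```python
from functools import lru_cache

# Memoized recursion: write the recurrence, cache every subproblem.
@lru_cache(maxsize=None)
def fib(n):
    if n <= 1:                        # base case
        return n
    return fib(n - 1) + fib(n - 2)    # recurrence on smaller subproblems

# Bottom-up: the same computations, in an explicit topological order.
def fib_bottom_up(n):
    dp = [0, 1]
    for i in range(2, n + 1):         # increasing i is a topological order
        dp.append(dp[i - 1] + dp[i - 2])
    return dp[n]
```

Either way, the running time is the number of subproblems times the time per subproblem, ignoring recursion, exactly the formula above.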
So that's what we're going
to do three times today.
One of the hardest parts
in dynamic programming
is step 1, defining
your subproblems.
Usually if you do that right,
it becomes-- with some practice,
step 2 is pretty easy.
Step 1 is really where most
of the insight comes in,
and step 3 is usually trivial,
once you know 1 and 2.
Once you realize 1 and 2 will
work, the recurrence is clear.
So I want to give you some
general tips for step 1,
how to choose
subproblems, and we're
going to start with problems
that involve strings
or sequences as input,
where the input to the problem is a string or sequence.
Last class we saw
text justification,
where the input was
a sequence of words,
and we saw Blackjack, where the
input was a sequence of cards.
Both of these are examples, and if you look at them, in both cases we used suffixes of the input, call it x, as our subproblems. If x was our sequence, we did all the suffixes x[i:], for i equals zero up to the length of the thing. So there are about n, or n plus 1, such subproblems.
This is good.
Not very many of them,
and usually if you're
plucking things off the
beginning of the string
or of the sequence, then
you'll be left with the suffix.
If you are always plucking from the beginning, you always have suffixes, and you'll stay in this class, and that's good, because you always want a recurrence that relates a subproblem to the same kinds of subproblems that you know.
Sometimes it doesn't work.
Sometimes prefixes
are more convenient.
These are usually
pretty much identical,
but if you're plucking
off from the end instead
of the beginning, you'll end
up with prefixes, not suffixes.
Both of these have
linear size, so they're
good news, quite efficient.
Another possibility
when that doesn't work,
we're going to see an
example of that today,
is you do all substrings.
So I don't mean subsequences; they have to be consecutive substrings, x[i:j], and now for all i and j.
How many of these are there?
For a string of length n?
N squared.
So this one is n squared subproblems; the others are linear, theta n.
So obviously you prefer to use these subproblems, because there are fewer of them, but if sometimes they don't work, then use this one; it's still polynomial, still pretty good.
This will get you
through most DP's.
It's pretty simple,
but very useful.
Let me define the next
problem we consider.
For each of them we're going
to go through the five steps.
So the first problem for
today is parenthesization.
You're given an
associative expression,
and you want to evaluate
it in some order.
So for my associative expression, I'm going to think of matrix multiplication, and I probably want to start numbering at zero. So let's say you have n matrices, A0 through A(n-1), and you want to compute their product.
So you remember matrix
multiplication is not
commutative, I can't
reorder these things.
All I can do is,
if I want to do it
by sequence of pairwise
multiplications,
is I get to choose where
the parentheses are, and do
whatever I want for
the parentheses,
because it's associative.
It doesn't matter where they go.
Now it turns out if you
use straightforward matrix
multiplication, really
any algorithm for matrix
multiplication, it matters
how you parenthesize.
Some will be
cheaper than others,
and we can use
dynamic programming
to find out which is best.
So let me draw a simple example.
Suppose I have a column
vector times a row
vector times a column vector.
And there are two ways
to compute this product.
One is like this, and
the other is like this.
If I compute the
product this way,
it's every row
times every column,
and then every row
times every column,
and every row
times every column.
This subresult is a square
matrix, so if these are-- say
everything here is n, and
this will be an n by n matrix.
Then we multiply it by a vector, and this computation, if you do it well, will take theta n squared time, because I need to compute n squared values here, and then it's n squared more to do this final multiplication.
Versus if I do it this way,
I take all the rows here,
multiply them on all the columns
here, it's a single number,
and then I multiply
by this column.
This will take linear time.
So this is better
parenthesization than this one.
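To make the arithmetic concrete: assuming the standard algorithm, where multiplying an x-by-y matrix by a y-by-z matrix costs x*y*z scalar multiplications, here is a quick sketch of the two parenthesizations of (column)(row)(column):

```python
def mult_cost(x, y, z):
    # scalar multiplications for an (x by y) times (y by z) product,
    # using the standard algorithm
    return x * y * z

n = 1000
# ((A B) C): (n x 1)(1 x n) gives an n x n matrix, then times (n x 1)
left_first = mult_cost(n, 1, n) + mult_cost(n, n, 1)    # 2 * n^2
# (A (B C)): (1 x n)(n x 1) gives a 1 x 1 scalar, then (n x 1)(1 x 1)
right_first = mult_cost(1, n, 1) + mult_cost(n, 1, 1)   # 2 * n
```

So one order costs quadratic in n and the other linear, which is exactly the gap described above.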
Now, I won't even define it in general: for an x by y matrix times a y by z matrix, you can think about the running time of that multiplication.
Whatever the running time
is, dynamic programming
can solve this problem,
as long as it only
depends on the dimensions
of the matrices
that you're multiplying.
So for this problem,
there's going
to be the issue of which
subproblems we use.
Now we have a
sequence of matrices
here, so we naturally think
of these as subproblems,
but before we get to the subproblems, let me ask you: what do you think you should guess?
Let's just say from the
outset, if I give you
this entire sequence,
what feature
of the solution of
the optimal solution
would you like to guess?
Can't know the whole solution,
because there's exponentially
many ways to parenthesize.
What's one piece
of it that you'd
like to guess that
will make progress?
Any idea?
It's not so easy.
AUDIENCE: Well, wouldn't
you need the last operation?
PROFESSOR: What's
the last operation
we're going to do, exactly.
You might call it the
outermost multiplication
or the last multiplication.
So that's going to look like: we somehow multiply A0 through A(k-1), and then we somehow multiply Ak through A(n-1), and this is the last multiplication.
So now we have two subproblems.
Somehow we want to multiply
this, somehow-- I mean,
there's got to be some
last thing you do.
I don't know what it
is, so just guess it.
Try all possibilities for k,
it's got to be one of them,
take the best.
If somehow we know the optimal way to do A0 through A(k-1) and the optimal way to do Ak through A(n-1), then we're golden.
Now, this looks like a prefix,
this looks like a suffix.
So do you think we can just
combine subproblems, suffixes
and prefixes?
How many people think yes?
A few?
How many people
think no, OK, why?
AUDIENCE: So, for example if
you split, if you were to split,
like [INAUDIBLE]?
PROFESSOR: Yeah.
The very next thing
we're going to do
is recurse on this subproblem,
recurse on this subproblem.
When we recurse here, we're going to split it into A0 to A(k'-1) and Ak' to A(k-1). We're going to consider all possible partitions, and this thing, from Ak' to A(k-1), is not a prefix or a suffix.
What is it?
A substring.
There's only one thing left.
I claim these are usually
enough, and in this case
substrings will be enough.
But this is how
you can figure out
that, ah, I'm not staying
within the family prefixes,
I'm not staying within
the family suffixes.
In general, you never use
both of these together.
If you're going to need both,
you probably need substrings.
So if just suffixes work, fine.
If just prefixes work,
fine, but otherwise
you're probably going
to need substrings.
That's just a rule
of thumb, of course.
Cool.
So, the part 1 subproblem is going to be the optimal parenthesization of Ai through A(j-1). That's part of the problem here; we want to do A0 through A(n-1).
So in general, let's just
take some substring in here
and say, well what's the
best way to multiply that,
and that's the
sorts of subproblems
we're getting if
we use this guess.
And if you start with a
substring and you do this,
you will still remain
within a substring,
so actually I have to
revise this slightly.
Now, to solve this subproblem, which is what we need to do in the guessing step, we start from Ai, we go to some guessed place, A(k-1), and then from Ak up to A(j-1).
This is the i
colon j subproblem.
So we guess some point in the
middle, some choice for k.
The number of possible choices for this guess, so we have to try all of them, is like order j minus i plus 1. I put order in case I'm off by 1 or something. But in particular this is order n.
And that's all we'll need.
So that's the guess.
Now we go to step 3,
which is the recurrence.
And this-- we're going to
do this over and over again.
Hopefully by the end,
it's really obvious
how to do this recurrence.
Let me just fix my notation,
we're going to use dp,
I believe.
For whatever reason, in my
notes I often write dp of ij.
This is supposed
to be the solution
to the subproblem i colon j.
I want to write it recursively,
in terms of smaller
subproblems, and I want
to minimize the cost,
so I'm going to write a min over all the options.
And for each choice of k, so
there's going to be a for loop,
I'm going to use Python
notation here with iterators.
So k is going to be in a range; I think range(i, j) is correct. I'm going to double-check there are no off-by-ones here. My notes say range(i+1, j); I think that's probably right.
Once I choose where
k is, where I'm
going to split my
multiplication,
I do the cost for i up to k, that's the left multiplication, plus the cost for k up to j; those are the two recursive multiplications. Then I also have to do this outermost one. How much does that cost? Well, it's the cost of the product A[i:k] times the product A[k:j].
So I'm assuming I can
compute this cost,
not even going to try to
write down a general formula,
you could do it, it's
not hard, it's like xyz.
For a standard matrix
multiplication algorithm.
But whatever algorithm
you're using,
assuming you could figure out
the dimensions of this matrix,
it doesn't matter
how it's computed,
the dimensions will
always be the same.
You compute the dimensions of the matrix that will result from that product: it's always going to be the first dimension here by the last dimension there. And that's constant time, you know that.
And then if you can figure out
the cost of a multiplication
in constant time, just
knowing the dimensions
of these matrices, then
you could plug this
in to this dynamic
program, and you
will get the optimal solution.
This is magically considering
all possible parenthesizations
of these matrices, but magically
it does it in polynomial time.
Because of the time per subproblem here: we're spending constant time for each iteration of this for loop, because it's constant time just computing the cost. The recursive calls are free, so it's dominated by the length of the for loop, which we already said was order n, so it's order n time per subproblem, ignoring recursion.
And so when we put this together, the total time is going to be the number of subproblems, which I did not write down. The number of subproblems in step 1 is n squared; that's what we said over here, for substrings.
So running time is number
of subproblems, which is n
squared, times linear for each,
and so it's order n cubed,
it's actually theta n cubed.
So polynomial time, much better than trying all possible parenthesizations; there are about 4 to the n parenthesizations, and that's a lot.
Topological order here is
a little more interesting,
if you think about that.
I can tell you, for suffixes, topological order is almost always right to left, decreasing i. And for prefixes, it's almost always left to right, increasing i.
For substrings, what
do you think it is?
Or for this situation
in particular?
In what order should I
evaluate these subproblems?
AUDIENCE: [INAUDIBLE].
PROFESSOR: This is
the running time
to determine the best way
to multiply-- that's right.
So yeah, it's worth
checking, because we also
have to do the multiplication.
But if you imagine
this n, the number
of matrices you're
multiplying is probably
much smaller than their sizes.
In that situation,
this will be tiny,
whereas the time to actually
do the multiplication, that's
what's being computed by the DP,
hopefully that's much larger,
otherwise you're kind of
wasting your time doing the DP.
But hey, at least you
could tell somebody
that you did it optimally.
But it gets into a fun issue of cost of planning versus execution; we're not really going to worry about that here.
So, in what order should I
evaluate this recurrence,
in order to-- I want, when
I'm evaluating DP of ij,
I've already done DP
of ik and DP of kj,
and this is what you need
for bottom up execution.
Yeah.
AUDIENCE: Small to large.
PROFESSOR: Small
to large, exactly.
We want to do increasing
substring size.
That's actually
what we're always
doing for all of those
subproblems over there.
When I say all suffixes, you go right to left. Well, that's because the rightmost suffix is nothing, and then you build up larger and larger strings; same thing here.
Exercise, try to draw
the DAG for this picture.
It's a little harder,
but if you-- I
mean you could basically
imagine-- I'll do it for you.
Here is, let's say--
well, at the top
there's everything, the
longest substring, that
would be from zero to
n, that's everything.
Then you're going to have, actually, just two different ways to have a slightly smaller substring.
At the bottom you have
a bunch of substrings,
which are the length zero
ones, and in between,
like in the middle
here, you're going
to have a much larger number.
And all these edges
are pointed up,
so you can compute
all the length
zero ones without any
dependencies and then just
increasing in length.
It's a little hard to see,
but in each case-- Yeah,
ah, interesting.
This is a little harder to
formulate as a regular shortest
paths problem, because if you
look at one of these nodes,
it depends on two
different values,
and you have to take
the sum of both of them.
And then you also add
the cost of that split.
Cool.
So this is the subproblem
DAG, you could draw it,
but this DP is not
shortest paths in that DAG.
So perhaps dynamic programming
is not just shortest paths
in a DAG, that's a
new realization for me
as of right now.
OK.
Some other things
I forgot to do-- I
didn't specify the base case.
The base case for
that recurrence
is when your string is of
length 0 or even of length 1,
because when it's length
1, there's only one matrix,
there's no multiplication to
do, and so the cost is zero.
So you have something like dp
of i, i plus 1 equals zero.
That's the base case.
And then step 5,
step 5 is what's
the overall problem
I want to solve,
and that's just dp from 0 to
n, that's the whole string.
Any questions about that DP?
I didn't write down a memoized recursive algorithm; you all know how to do that.
Just do this for loop
and put this inside,
that would be the
bottom up one, or just
write this with
memoization, that
would be the
recursive algorithm.
It's totally easy once
you have this recurrence.
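Here is one way that memoized recursive algorithm could look in Python. This is a sketch, assuming the standard x*y*z multiplication cost, with matrix A_i's dimensions given as dims[i] by dims[i+1]:

```python
from functools import lru_cache

def matrix_chain_cost(dims):
    # n matrices need n + 1 numbers: matrix A_i is dims[i] by dims[i+1]
    n = len(dims) - 1

    @lru_cache(maxsize=None)
    def dp(i, j):
        # cheapest way to multiply A_i ... A_{j-1}
        if j - i <= 1:                     # base case: zero or one matrix
            return 0
        return min(
            dp(i, k) + dp(k, j)            # the two recursive products
            + dims[i] * dims[k] * dims[j]  # outermost multiplication
            for k in range(i + 1, j)       # guess the last split
        )

    return dp(0, n)                        # original problem: A_0 ... A_{n-1}
```

For the column-row-column example, with dimensions 5x1, 1x5, 5x1, this returns 10 (the linear-cost parenthesization) rather than 50.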
All right, good.
How many people is this
completely clear to?
OK.
How many people does
it kind of make sense?
And how many people it
doesn't make sense at all?
OK, good.
Hopefully we're
going to shift more
towards the first category.
It's a little magical, how this guessing works out, but I think the only way to really get it is to see more examples and write code to do it; the latter is your problem set, examples are what we'll do here.
So next problem
we're going to solve.
Dynamic programming is
one of these things that's
really easy once you get it,
but it takes a little while
to get there.
So edit distance, we're going
to make things a little harder.
Now we're going to be given two strings, x and y, instead of just one, and I want to know the cheapest way to transform x into y. I'm going to define what transform means.
We're going to allow
character edits.
We want to transform this
string x into string y,
so what character
edits are we allowed?
Very simple: we're allowed to insert a character anywhere in the string, we're allowed to delete a character anywhere in the string, and we're allowed to replace a character anywhere in the string, replacing c with c prime.
Now, you could do a replacement by deleting c and inserting c prime; that's one way to do it, but I'm
going to imagine that in
general someone tells me
how much each of these
operations costs,
and that cost may depend on
the character you're inserting.
So deleting a character and then
inserting a different character
will cost one thing.
It will cost the sum of
those two cost values.
Replacing a character
with another character
might be cheaper.
It depends.
Someone gives me a little table,
saying for this character,
for letter a, it costs this
much to insert, for letter
b it costs this much to
insert, this much to delete,
and there's a little
matrix for, if I
want to convert an a into a b
it costs this much to replace.
Imagine, if you
will, you're trying
to do a spelling correction,
someone's typing on a keyboard,
and you have some model
of, oh, well if I hit a,
I might have meant to
hit an s, because s
is right next to an a,
and that's an easy mistake
to make if you're not
touch typing, because it's
on the same finger, or maybe
you're shifted over by one.
So you can come up
with some cost models,
someone could do a lot
of work and research
and whatnot and see
what are typical typos,
replacing one
letter for another,
and then associate some
cost for each character,
for each pair characters,
what's the likelihood
that that was the mistake?
I call that the cost,
that's the unlikeliness.
And then you want to
minimize the sum of costs,
and so you want to find what
was the least set of errors that
would end up with this
word instead of this word.
You do that on all
words of your dictionary
and then you'll
find the one that
was most likely what
you meant to type.
And insertions
and deletions are,
I didn't hit the
key hard enough,
or I hit it twice, or
accidentally hit a key
because it was right next
to another one, or whatever.
OK, so this is used for
spelling correction.
It's used for comparing
DNA sequences,
and DNA sequences, if you
have one strand of DNA,
there's a lot of mutation--
some mutations are
more likely than others.
For example, a C to G mutation is more common than a C to A mutation, and so you give the common replacement a low cost and the rare one a high cost, to represent that this one is more likely than that one.
And then edit distance will give you a measure of how similar two DNA strings are evolutionarily.
And you also get extra
characters randomly inserted
and deleted in mutation.
So, it's a simplified model
of what happens in mutation,
but still it's used a lot.
So all these are encompassed
by edit distance.
Another problem encompassed
by edit distance
is the longest common
subsequence problem.
And I have a fun example,
which I spent some hours,
way back when, coming up with.
I can't spell it, though.
It's such a weird word.
Hieroglyphology
is an English word
and Michelangelo is
another English word,
if you allow proper
nouns, unlike Scrabble.
So, think of these as strings.
This is x, this is y.
What is the longest
common subsequence?
So not substring,
I get to choose--
I can drop any set of letters
from x, drop any set of letters
from y, and I want them
to, in the end, be equal.
It's a puzzle for you.
While you're thinking
about it, you
can model this as an
edit distance problem,
you just define the cost of
an insert or a delete to be 1,
and the cost of a
replace to be 0.
So this is a c to c
prime replacement.
It's going to be 0
if c equals c prime,
and I guess infinity otherwise.
You just don't consider
it in that situation.
Can anyone find the longest
common subsequence here?
It's an English word, that's a hint.
So if you do this
you're, basically
trying to minimize number
of insertions and deletions.
Insertions into x correspond to deletions from y, and deletions from x are just deletions from x. So this is the minimum number of deletions in both strings, so that you end up with a common subsequence.
Because replacement says,
I don't pay anything
if the characters match exactly,
otherwise I pay everything.
I'd never want to do this,
so if there's a mismatch
I have to delete it.
And so this model is the same thing as longest common subsequence.
I want to solve this
more general problem,
it's actually easier to solve
the more general problem,
but in particular,
you can use it
to solve this tricky problem.
Any answers?
Yeah.
Hello.
Very good.
Hello is the longest
common subsequence.
You can imagine
how I found that.
Searching for all English words that have "hello" as a subsequence.
That can also be done
in polynomial time.
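That search only needs the longest-common-subsequence length, which the suffix-pair DP described in this lecture computes directly. Here is my sketch; it is equivalent to edit distance with insert/delete cost 1 and replace cost 0 on a match, forbidden otherwise:

```python
def lcs_length(x, y):
    # dp[i][j] = length of the longest common subsequence
    # of the suffixes x[i:] and y[j:]
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if x[i] == y[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]   # keep the matched character
            else:
                # drop a character from one string or the other
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp[0][0]
```

On the board example, lcs_length("hieroglyphology", "michelangelo") is 5, the length of "hello".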
So how are we going to do this?
Well, I'd like to somehow
use subproblems for strings,
suffixes, prefixes,
or substrings.
But now I have two strings,
that's kind of annoying.
But don't worry, we can do
sort of dynamic programming
simultaneously over x and y.
What we're going to do is look at suffixes of x and suffixes of y, and to make our subproblems we combine them by taking all pairs, a cross product. We need to think about both of them simultaneously.
So the subproblem is going to be to solve the edit distance problem on two strings: a suffix of x and a possibly different suffix of y, for all possible choices of i and j.
And so the number
of subproblems is?
AUDIENCE: N squared.
PROFESSOR: N squared, yes.
If x is of length n and y is of length n, there are n choices for this and n choices for that, and we have to do all of them as pairs, so there are n squared pairs.
In general, if they
have different lengths,
it's going to be the length
of x times length of y.
It's quadratic.
Good.
So, next we need to
guess something, step 2.
This is maybe not so
obvious, let's see.
You have here's x,
starting at position i.
You have y starting
at position j.
Somehow I need to
convert x into y,
I think it's probably
better if I line these up,
even though in some sense
they're not lined up,
that's OK.
I want to convert x into y.
What should I look at here?
Well, I should look at
the very first characters,
because we're
looking at suffixes.
We want to cut off first
characters somehow.
How could it-- what are the
possible ways to convert, or to
deal with the first
character of x?
What are the possible
things I could do?
Given that, ultimately, I
want the first character
of x to become the
first character of y.
AUDIENCE: Delete [INAUDIBLE].
PROFESSOR: You could
delete this character
and then insert this one, yes.
Other things?
There's a few possibilities.
If you look at it right,
there are three possibilities.
And three possibilities are
insert, delete, or replace.
So let's figure out
how that's the case.
I could replace this
character with that character,
so that's one choice.
That will make progress.
Once I do that, I can cross
off those first characters
and deal with the rest
of the substrings.
Let's think about
insert and delete.
If I wanted to
insert, presumably, I
need this character
at some point.
So in order to make
this character,
if it's not going to come
from replacing this one,
it's got to be from inserting
that character right there.
Once I do that, I can cross out
that newly inserted character
in this one, and then I have
all of the string x from i
onward still, but then I've
removed one character from y,
so that's progress.
The other possibility
is deletion,
so maybe I delete
this character,
and then maybe I insert
it in the next step,
but it could be this
character matches that one,
or maybe I have to delete
several characters before I
get to one that
matches, something.
But I don't know that,
so that's hard to guess,
because that would be
more time to guess.
But I could say, well, this
character might get deleted.
If it gets deleted, that's
it, it gets deleted.
And then somehow the rest
of the x, from i plus 1 on,
has to match with
all of y, from j on.
But those are the
three possibilities,
and in some sense capture
all possibilities.
So it could be we
replace xi with yj,
and so that has some
cost, which we're given.
It could be that we insert
yj at the beginning,
or it could be
that we delete xi.
You can see that's
definitely spanning
all the possible
operations we can do,
and if you think
about it long enough,
you will be
convinced this really
covers every possible
thing you can do.
If you think about
the optimal solution,
it's got to do something to
make this first character.
Either it does it by replacement
or it does it by an insertion.
But if it inserts
it later on, it's
got to get this out
of the way somehow,
and that's the deletion case.
If it inserts it at the
beginning, that's the insertion
case, if it just
does a replacement,
that's the replace case.
Those are all possibilities
for the optimal solution.
Then you can write a recurrence, which is just a min of those things, those three options.
So I'm going to write DP(i, j), but now (i, j) is not a substring; it's a suffix of x and a suffix of y, so it corresponds to this subproblem.
If I want to solve
that subproblem,
it's going to be the
min of three options.
We've got the
replace case, so it's
going to be some cost of
the replace, from xi to yj.
So that's a quantity
which we're given.
Plus the cost of the rest.
So after we do this
replacement, we can cross off
both those characters, and
so we look at i plus 1 on
for x, and j plus
1 onwards for y.
So that's option 1.
Then comma for the min.
Option 2 is we have
the cost of insert yj.
So that's also
something we're given.
Then we add on what we
have to do afterwards,
which is we've just
gotten rid of yj,
so x still has the
entire string from i on,
and y has a smaller string.
Comma.
The last option is basically the same: the cost of the delete, deleting x[i], and then we add on DP(i+1, j), because here we did not advance y, but we advanced x.
It's crucial that we
always advance at least one
of the strings, because that
means we're making progress,
and indeed, if you want to jump
to step 4, which is topological
ordering-- sorry, I
reused my symbols here,
some different symbols.
Head back to step 4 of
DP, topological order.
Well, these are
suffixes, and so I
know with suffixes I like
to go from the smaller
suffixes, which is the
end, to the beginning.
And, indeed, because
we're always increasing,
we're always looking at later
substrings, later suffixes,
for one or the other.
It's enough to just
do-- come over here.
To just do that for
both of the strings,
it doesn't really
matter the order.
So you can do: for i equals the length of x down to zero, for j equals the length of y down to zero, and that will work.
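Written out, that bottom-up loop is only a few lines. Here is a sketch in Python, with the per-character cost tables represented as functions, which is an assumption about the interface; any constant-time lookup works:

```python
def edit_distance(x, y, cost_replace, cost_insert, cost_delete):
    # dp[i][j] = cheapest way to transform the suffix x[i:] into y[j:]
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):        # base: y exhausted, delete rest of x
        dp[i][n] = dp[i + 1][n] + cost_delete(x[i])
    for j in range(n - 1, -1, -1):        # base: x exhausted, insert rest of y
        dp[m][j] = dp[m][j + 1] + cost_insert(y[j])
    for i in range(m - 1, -1, -1):        # smaller suffixes first
        for j in range(n - 1, -1, -1):
            dp[i][j] = min(
                cost_replace(x[i], y[j]) + dp[i + 1][j + 1],  # replace
                cost_insert(y[j]) + dp[i][j + 1],             # insert y[j]
                cost_delete(x[i]) + dp[i + 1][j],             # delete x[i]
            )
    return dp[0][0]
```

With unit insert/delete costs, and replace cost 0 on a match and 1 on a mismatch, this computes the familiar Levenshtein distance.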
Now this is another
dynamic programming
you can think of as just
shortest paths in the DAG.
The DAG is most easily seen as
a two-dimensional matrix, where
the i index is between zero and
length of x, and the j index
is between zero and
length of y, and each
of the cells in this matrix
is a node in the DAG.
That's one of our
subproblems, dp of ij.
And it depends on these
three adjacent cells.
The edges are like this.
If you look at it, we have to check (i+1, j+1), that's this guy; we have to check (i, j+1), that's this guy; and we have to check (i+1, j), that's this guy.
And so, as long as we
compute the matrix this way,
what I've done here is
row by row, bottom up.
You could do it
anti-diagonals, you
could do it column by column
backwards, all of those
will work because we're making
progress towards the origin.
And so if you look up edit distance, most descriptions think about it in the matrix form, but I think it's easier to think of it in this recursive form; pick your poison.
But this is, again,
shortest paths in a DAG.
The original problem
we care about
is dp of zero zero,
the upper left corner.
So to be clear in the
DAG, what you write here
is like the cost of,
the weight of that edge
is the cost of, I
believe, a deletion.
Deletion, oh sorry,
it's an insertion.
Inserting that character,
this one's a cost of deletion,
this is a cost to replace, so
you just put those edge weights
in, and then just do a
shortest paths in the DAG,
I think, from this
corner to this corner.
And that will give you this, or
you could just do this for loop
and do that in the
for loop, same thing.
OK.
What's the running time?
Well, the number of subproblems here is the length of x times the length of y; the running time per subproblem is?
I'm assuming that I know
these costs in constant time,
so what's the overall running
time of that, evaluating that?
Constant.
And so the overall running time is the number of subproblems times a constant, which equals the length of x times the length of y.
This is the best known
algorithm for edit distance,
no one knows how
to do any better.
It's a big open problem
whether you can.
You can improve the
space a little bit,
because we really
only need to store
the last row or the
last column, depending
on the order you're
evaluating things.
To even get down to linear
space, as far as we know,
we need quadratic time.
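The row-at-a-time space saving mentioned above can be sketched like this (my code; same recurrence and cost-function interface as before, keeping only the current row and the one below it):

```python
def edit_distance_two_rows(x, y, cost_replace, cost_insert, cost_delete):
    # Theta(len(y)) space instead of Theta(len(x) * len(y));
    # 'below' holds row i + 1 of the full DP table
    m, n = len(x), len(y)
    below = [0] * (n + 1)
    for j in range(n - 1, -1, -1):             # base row: x exhausted
        below[j] = below[j + 1] + cost_insert(y[j])
    for i in range(m - 1, -1, -1):
        cur = [0] * (n + 1)
        cur[n] = below[n] + cost_delete(x[i])  # y exhausted
        for j in range(n - 1, -1, -1):
            cur[j] = min(
                cost_replace(x[i], y[j]) + below[j + 1],  # replace
                cost_insert(y[j]) + cur[j + 1],           # insert y[j]
                cost_delete(x[i]) + below[j],             # delete x[i]
            )
        below = cur
    return below[0]
```

The time is still quadratic; only the space drops to linear, which is the point being made above.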
One more problem, are you ready?
This one's going to blow
your minds hopefully.
Because we're going to diverge
from strings and sequences,
kind of.
So far everything we've
looked at involves one or two
strings or sequences,
except for [INAUDIBLE].
That involved a graph, that
was a little more exciting.
But we'd already seen that,
so it wasn't that exciting.
OK, our last problem
for today is knapsack.
It's a practical problem.
You're going camping.
You're going backpacking,
I should say,
and you can only
afford to take whatever
you can fit on your back.
You have some limit
to capacity, let's say
one giant backpack
is all you can carry.
Let's imagine it's the size
of the backpack that matters,
not the weight, but
you could reformulate
this in terms of weight.
And you've got a lot of
stuff you want to bring.
Ideally you bring
everything you own,
that would be kind
of nice, convenient,
but it'd be kind of heavy.
So you're limited, you're
not able to do that.
So you have a list of items and
each of them has a size, si,
and has a value, vi, to you,
how much you care about it, how
much you need it on this trip.
OK, each item has two things,
and the sizes are integers.
This is going to be important.
It won't work without
that assumption.
And we have a
knapsack, backpack,
whatever, I guess it's the
British, but I don't know,
I get confused.
Growing up in Canada, I use
both, so it's very confusing.
Knapsack of total size, S.
And what you'd like to do is
choose a subset of the items.
If you're lucky, the sum
of the si's fit within s,
then you bring everything.
But if you're not lucky,
that's not possible,
you want to choose a subset of
the items whose total size is
less than or equal
to s, in order
to maximize the
sum of the values.
So you want to maximize
the sum of values
for a subset of items, of total
size less than or equal to S.
You can imagine size as
weights instead of size, not
a big deal, or you could
have sizes and weights.
All of these things generalize.
But we're going to need that
the sizes/weights are integers.
And so the items have to
fit, because you can't cheat,
you can't have more
things than what fit,
but then you want to
maximize the value.
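Just to pin down the problem statement, here's the brute force-- try every subset, keep the best one that fits. The sizes and values in the test are made up for illustration. This takes 2 to the n time, which is the thing we'd like to beat.

```python
from itertools import combinations

# Brute force for knapsack: try all 2^n subsets, keep the most valuable
# one whose total size fits within the capacity S.
def knapsack_brute_force(sizes, values, S):
    n = len(sizes)
    best = 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(sizes[i] for i in subset) <= S:
                best = max(best, sum(values[i] for i in subset))
    return best
```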
How do we do this with
dynamic programming?
With difficulty.
I don't have a ton
of time, so I think
I'm going to tell
you-- well, let's see.
Start with guessing.
This is the easy
part to this problem.
We should also be thinking about
subproblems at the same time.
Even though I said we're
leaving sequences, in fact,
we have a sequence here, we
have a sequence of items.
We don't actually care about
the order of the items, but hey,
they're in an order.
If they weren't, we could
put them in an order,
in an arbitrary order.
We're going to use
that order, and we're
going to look at
suffixes of the items,
items i colon.
That's helpful, because
now it says, oh, well, we
should be plucking off
items from the beginning.
Starting with the
i-th item, what
should I decide
about the i-th item,
relative to the
optimal solution?
What should I guess?
AUDIENCE: Is i included or not?
PROFESSOR: Is i included
or not, exactly.
Is item i in the subset or not.
Two choices, easy.
Of course, those
are the choices.
If I do that for everybody,
then I know the entire subset.
Somehow I need to
be able to write-- and this
is what's actually impossible
if I choose this
as my subproblem--
I want to write DP of i,
somehow, in terms of, I guess,
DP of i plus 1.
And we'd like to do max, and
either we don't put it in,
in which case that's our value,
or we put it in, in which case
we get an additional
v i in value.
OK, but we consume
in size, and there's
no way to remember that
we've consumed the size here.
We just called DP of i plus 1.
In this case, it has
everything, all this.
In this case, we
lose si of S, but we
can't represent that here.
That's bad, this would be
an incorrect algorithm.
I would always choose
to put everything in,
because it's not keeping
track of the size bound.
There's no capital S in
this formula, that's wrong.
So, to fix that, I'm
going to write that again,
but a subproblem is going
to have more information,
it's going to have
an index i, and it's
going to have
remaining capacity.
I'm going to call it capital
X, which is some integer at most
S. We're assuming that the
sizes are all integers,
so this is valid.
The number of subproblems
is equal to n,
the number of items, did
I say there are n items?
Now there are n items, times
capital S, really S plus 1,
because I have to
go down to zero.
But n times S,
different subproblems.
Now for each of them I
can write a recurrence,
and that is DP of
i comma s, is going
to be the max of
DP of i plus 1 comma s.
This is the case where we
don't include the items,
so S stays the same.
Actually I should write x
here, because it's not actually
our original value of s.
x is the general situation.
The other possibility
is we include item i,
and then we get DP of i plus 1.
We still consume item i.
We now have x minus si
as our new capacity,
what remains after
we add in this item.
And then we add on
vi, because that's
the value we gain from
putting that item in.
That's it, that's the
DP, pretty simple.
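As a sketch, here's that recurrence as a memoized recursion-- dp(i, x) is the best value achievable from the suffix of items i colon with remaining capacity x, with integer sizes assumed, as stated.

```python
from functools import lru_cache

# Memoized knapsack DP: dp(i, x) = max value achievable from items
# i, i+1, ..., n-1 with remaining integer capacity x.
# There are n * (S + 1) subproblems, constant time each.
def knapsack(sizes, values, S):
    n = len(sizes)

    @lru_cache(maxsize=None)
    def dp(i, x):
        if i == n:                      # no items left to decide about
            return 0
        best = dp(i + 1, x)             # guess: leave item i out
        if sizes[i] <= x:               # guess: put item i in, if it fits
            best = max(best, dp(i + 1, x - sizes[i]) + values[i])
        return best

    return dp(0, S)
```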
Let me say a little bit about
the running time of this thing.
Again, you check there's a
topological order and all
that, it's in the notes.
The total running time,
we spend constant time
to evaluate this formula,
so it's super easy.
The number of subproblems
is the bottleneck.
So it's n times s.
Is this polynomial time?
You might guess from the outline
of today that the answer is no.
This is not polynomial time.
What does polynomial time mean?
It's polynomial in n, where
n is the size of the input.
What's the size
of the input here?
Well, we're given n items, each
with a size, each with a value.
If you think of the
sizes and values
as being single word
items, then the size is n.
If you think of them as
being ginormous values,
at most, the size
of this input is
going to be something
like n times log s,
because if you write
it out in binary
you would need log s bits
to write down those numbers.
But it is not n times s.
This would be the binary
encoding of the input,
but the running time is this.
Now s is exponential in
log s, so this is, at best,
an exponential time algorithm.
But it's really not
that bad if s is small,
and so we call it
pseudopolynomial time.
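To make that concrete, here's a hypothetical instance-- the numbers are made up-- where the binary-encoded input is a few hundred bits, but the DP table has billions of entries.

```python
# Illustration (made-up numbers): the input encodes S in binary, so the
# input size grows like log2(S), but the DP has n * (S + 1) subproblems.
n = 10
S = 10**9

input_bits = n * S.bit_length()   # roughly n * log2(S): the encoded input size
table_entries = n * (S + 1)       # subproblems the DP must fill in

print(input_bits)      # a few hundred bits of input
print(table_entries)   # billions of table entries: exponential in the input size
```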
What does pseudopolynomial mean?
It just means that
you're polynomial
in n, the input size,
which might be this,
and in the numbers
that are in your input.
Numbers here means
integers, basically,
otherwise it's not
really well defined.
So in this case we have
a bunch of integers,
but in particular we have s.
And so there's S and the si's.
This is definitely
polynomial in n and s.
It is the product
of n and S. So you
think of this as
pseudoquadratic time, I guess?
Because it's quadratic, but
one of the things is pseudo,
meaning it is one of the
numbers in the input.
So if the number is written
in k bits, I can write
down a number that's
of size 2 to the k.
So it's kind of in between
polynomial and exponential,
you might say.
Polynomial good,
exponential bad,
pseudopolynomial,
it's all right.
That's the lesson.
And for knapsack, this
is the best we can do,
as we'll talk about later.
Pseudopolynomial is really
the best you could hope for.
So, sometimes that's
as good as you can do
and dynamic programming
lets you do it.
