[MUSIC]
And we now move to the next topic that,
once again,
seemingly has nothing to do with
the sequence comparison.
We are now on a sightseeing tour in
Manhattan.
And our goal is to walk from the
intersection shown in blue
to the intersection shown in red.
And to visit as many attractions
as possible.
The only restriction is that we can either
move south or east.
What should be our strategy to visit
the maximum number of attractions on this
trip?
We can model Manhattan as a grid, and show
edges in this grid where there are
attractions as edges of rate one, and then
our goal is to find a path in
this graph that has maximum number of
attractions. Sometimes computer scientists
call this a longest path in this graph from the
blue node to the red node.
We can actually try to solve this problem
in an arbitrary
grid with this arbitrary number of
attractions, like this grid shown here.
There are many different ways to travel in
this grid.
For example, we can
travel like this, and in this case, we'll
visit 18 attractions.
The length of the longest path is 18.
Is it an optimal path?
No, because we can travel differently, and
in this case, visit 20 attractions.
What should be our strategy, for the
optimum exploration of Manhattan.
We need to solve the Manhattan
tourist problem.
The input is a weighted rectangular grid,
and the
output is a longest path from the source
vertex (node),
the vertex shown by blue, to the sink
vertex (node)
that is shown by red in this grid.
So what should we do?
The simplest thing that comes to mind is
to explore a greedy strategy.
For example, being in the very beginning,
we can either move east or move south.
If you move east, we will visit three
attractions immediately.
If we move south, we will visit only one.
It makes sense to move east.
Let's do it.
Then, afterwards, once again, we can move
either east or south.
We make choice based on the more, maximum
number of attractions.
And we continue like this, and finally,
we arrive to the source, visiting 23
attractions.
And you have already
guessed that this is a very simple
strategy.
But not an optimal one.
We need now to find the optimal strategy.
Another thing we need to keep in mind
is that Manhattan is not a perfect
rectangular grid.
Broadway cuts across.
And we can model this grid as an arbitrary
graph
where edges can go from whatever vertex (node)
to whatever node.
And how do we travel in this grid?
And to travel in this grid, we need to
solve (find) the longest
path in a directed graph problem,
where the input is an edge-weighted
directed graph with the source
and sink nodes, and the output
is simply a longest path from source to sink
in this graph.
Now you may be surprised by now
why I'm talking about alignment game and
Manhattan.
Do you see a connection between
the longest path problem and the
alignment game?
And it may be not obvious that there is a
connection but let's try to figure out
what is it.
For every column of the alignment
matrix, let's code it
with an arrow, as I showed in the slide.
And then,
after we design this arrow, let's see how
this arrow would translate
into the new grid that I have constructed
and presented in this slide.
The first arrow is diagonal.
Let's move diagonally in our grid.
The next arrow is also diagonal, let's
move diagonally again.
The next arrow is horizontal.
We continue further horizontally.
Diagonal, diagonal, vertical, diagonal,
vertical, diagonal,
and actually using alignment matrix, we
were able to travel in the Manhattan grid.
Right?
Now, let's ask a reverse question.
If we have a path in Manhattan grid,
would we be able to construct the
alignment matrix?
Let's try.
This is an arbitrary path in the alignment grid.
Let's combine all the arrows in one place.
And as soon as we've done it, of course
we can construct the alignment matrix as I
showed here.
Therefore, alignments are nothing but
passes in the grid.
And therefore, to play the alignment game,
to construct longest common subsequences,
the only thing
we need to do is to travel
optimal ways in the graph.
And the often question for sequence
comparison problem
in biology amounts to build an
appropriate Manhattan.
We'll do it a lot in this lecture.
In the case of the longest common subsequence,
what would be this Manhattan?
In this case, diagonal red edges
correspond to
matching symbol and have a score of one.
If these corresponding symbols for these
diagonal
edges match to each other.
And highest scoring alignment
in this case is simply the longest path in a
properly built Manhattan.
And to learn how we travel in whatever
Manhattan,
we need once again to talk about the
problem that seemingly
has nothing to do with biology.
And it is the change problem.
