Let's analyze the convergence of the power method. We can pick any version of the power method we want, because we know they all eventually point in the same direction. Therefore, if we can prove that one of them ends up pointing in the right direction, and establish how fast that happens, then that automatically translates to how fast the other versions start pointing in that direction as well. So it is convenient to pick the version where we divide by the eigenvalue lambda_0, as if we knew it up front.
So how did this method work? We were given an initial vector v^(0), some direction, and then we kept hitting that vector with the matrix A and dividing by the eigenvalue. In analyzing this, what we notice is that after the kth iteration, the vector we have computed is the original vector hit by the matrix A k times, but scaled back k times by dividing by lambda_0; in other words, v^(k) = A^k v^(0) / lambda_0^k.
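As a sanity check, here is a minimal NumPy sketch of this version of the power method; the matrix, the starting vector, and the assumption that we know lambda_0 = 4 up front are all made up for illustration. It verifies the closed form v^(k) = A^k v^(0) / lambda_0^k.

```python
import numpy as np

def scaled_power_method(A, v0, lambda0, k):
    """Power method where each iterate is divided by the (known) dominant
    eigenvalue lambda0, matching the version analyzed here."""
    v = v0.copy()
    for _ in range(k):
        v = A @ v / lambda0      # hit with A, scale back by lambda_0
    return v

A = np.array([[4.0, 1.0],
              [0.0, 2.0]])       # upper triangular: eigenvalues 4 and 2
v0 = np.array([1.0, 1.0])
k = 10

vk = scaled_power_method(A, v0, lambda0=4.0, k=k)
# Closed form after k iterations: v^(k) = A^k v^(0) / lambda_0^k
print(np.allclose(vk, np.linalg.matrix_power(A, k) @ v0 / 4.0**k))  # True
```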
It becomes convenient to take our original vector v^(0) and view it in the basis formed by the eigenvectors of the matrix A: we write it instead as X times some vector y, where the columns of X are the eigenvectors x_0, x_1, ..., x_{n-1}. What does that mean? It means our vector v^(0) is replaced by X times y. Now, if you think back to how we developed the power method, what really happens is that A X = X Lambda, and hence A to the kth power times X is the same as X times Lambda to the kth power, where Lambda is the diagonal matrix of eigenvalues.
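Spelled out as a chain of equalities:

```latex
\[
v^{(k)} \;=\; \frac{A^k v^{(0)}}{\lambda_0^k}
        \;=\; \frac{A^k X y}{\lambda_0^k}
        \;=\; \frac{X \Lambda^k y}{\lambda_0^k}
        \;=\; X \left( \frac{1}{\lambda_0} \Lambda \right)^{\!k} y .
\]
```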
Now, this can alternatively be written as a linear combination of the eigenvectors, the columns of X, or we can write it in matrix form, where we think of the factor 1 over lambda_0 to the kth power as incorporated into the diagonal matrix Lambda^k, so that the diagonal entries become (lambda_j / lambda_0)^k. Each of these different ways of looking at it tells us a little bit. What this particular formulation tells us is that all of the terms except the first eventually become arbitrarily small, because lambda_0 is the eigenvalue largest in magnitude, so |lambda_j / lambda_0| < 1 for j >= 1. That means we expect v^(k) to eventually point in the direction of psi_0 times x_0, as the display below spells out.
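Written out componentwise, with psi_0, psi_1, ..., psi_{n-1} the entries of y and the eigenvalues ordered so that |lambda_0| > |lambda_1| >= ... >= |lambda_{n-1}|:

```latex
\[
v^{(k)} \;=\; \psi_0 x_0
        \;+\; \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right)^{\!k} x_1
        \;+\; \cdots
        \;+\; \psi_{n-1} \left( \frac{\lambda_{n-1}}{\lambda_0} \right)^{\!k} x_{n-1},
\]
```

where every term after the first goes to zero as k grows.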
In other words, in the direction of x_0, except that the vector we compute will be scaled by psi_0, the first coefficient of our vector y. This now allows us to see how fast we actually converge. We can take our vector v^(k), which we know will eventually point in that direction, and subtract off psi_0 times x_0. That removes the first term in the sum, and if you think about it carefully, it is the same as taking the diagonal matrix with entries (lambda_j / lambda_0)^k and replacing the leading 1 by a 0.
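Here is a quick numerical check of that observation, on a made-up 3-by-3 upper-triangular example: subtracting psi_0 x_0 from v^(k) gives exactly X times the diagonal matrix with its leading entry zeroed out, times y.

```python
import numpy as np

A = np.array([[5.0, 1.0, 1.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 1.0]])            # eigenvalues 5, 3, 1
lambdas, X = np.linalg.eig(A)
order = np.argsort(-np.abs(lambdas))       # |lambda_0| > |lambda_1| >= ...
lambdas, X = lambdas[order], X[:, order]

v0 = np.array([1.0, 1.0, 1.0])
y = np.linalg.solve(X, v0)                 # coefficients psi_j in v^(0) = X y
k = 8
vk = np.linalg.matrix_power(A, k) @ v0 / lambdas[0]**k

D = np.diag((lambdas / lambdas[0])**k)     # diag(1, (l1/l0)^k, (l2/l0)^k)
D[0, 0] = 0.0                              # replace the leading 1 by a 0
print(np.allclose(vk - y[0] * X[:, 0], X @ D @ y))  # True
```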
Now we can look at what happens to the length of this difference; we would like the norm of the difference to go to 0. Let's pick a norm: we'll take the 2-norm. The 2-norm of v^(k) - psi_0 x_0 is then equal to the 2-norm of X times this modified diagonal matrix times y, and notice that we can bound that by the product of the 2-norms of the factors: it is less than or equal to the 2-norm of X times the 2-norm of the diagonal matrix times the 2-norm of the vector y. But the 2-norm of a diagonal matrix is equal to the absolute value of the largest entry on the diagonal, which here is (lambda_1 / lambda_0)^k, since the eigenvalues are ordered by magnitude. So we get

||v^(k) - psi_0 x_0||_2 <= ||X||_2 |lambda_1 / lambda_0|^k ||y||_2,

which shrinks like |lambda_1 / lambda_0|^k: the power method converges linearly, with the rate given by the ratio of the two largest eigenvalues in magnitude.
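To see this rate in action, here is a small sketch (again with a made-up upper-triangular example) that tracks the error ||v^(k) - psi_0 x_0||_2; each step should shrink the error by roughly a factor of |lambda_1 / lambda_0|.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])            # eigenvalues 4, 2, 1
lambdas, X = np.linalg.eig(A)
order = np.argsort(-np.abs(lambdas))
lambdas, X = lambdas[order], X[:, order]

v = np.array([1.0, 1.0, 1.0])
y = np.linalg.solve(X, v)                  # v^(0) = X y
target = y[0] * X[:, 0]                    # psi_0 x_0

errs = []
for _ in range(10):
    v = A @ v / lambdas[0]                 # one scaled power-method step
    errs.append(np.linalg.norm(v - target))

# Successive error ratios approach |lambda_1 / lambda_0| = 2/4 = 0.5
print([round(e2 / e1, 3) for e1, e2 in zip(errs, errs[1:])])
```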
