So, this lecture actually is a bit of a digression,
and it is supposed to cover some of the basics
that we need for various sections of the course.
So, it is very important that you understand some concepts from linear algebra, specifically eigenvalues and eigenvectors.
Today we will do principal component analysis, and the reason I do it is that there is a very neat relation between PCA and autoencoders.
An autoencoder is something that we will cover in the course; it is a part of any deep neural network course.
And singular value decomposition is something that we will be using when we learn word vectors.
Word vectors are again something very important. I could do the non-SVD version of it, where I just talk about what word2vec is, but that would probably not give you the same interpretation as if you start from SVD and then reach word vectors, right?
So, that is why I am covering these basics.
So, how many of you know eigenvalues and eigenvectors? A very embarrassing question: how many of you absolutely hate eigenvalues and eigenvectors? So, let us see if we can change that today, ok, I mean change it in the positive direction, right? Ok.
So, what happens when a matrix hits a vector?
So, a lot of people that I talk to actually think that eigenvectors are the villains of linear algebra; they are very hard to understand and so on. But today I am going to make a case that they are not the villains, they are actually the superheroes of linear algebra. So, that is what the lecture is about, ok. So, what happens when a matrix hits a vector?
Transforms it.
Transforms it, right; so, actually what happens is that it strays from its path. So, this is the original vector x, ok, and now once I multiply it by A, that means, if I do the transformation Ax, then I get a new vector. And two things happen, right: one is the direction changes, which is obvious, and in many cases the scale also changes; that means the vector might get elongated, its magnitude would increase, or it would decrease, right?
So, if you really think about it, matrices are the real villains of linear algebra, right? We just saw it: this vector was minding its own business, going along its own direction; a matrix comes and hits it and completely changes its world, right? I mean, it just throws it off its path, speeds it up or slows it down or whatever. So, they are the bad guys. Now, for every villain, what do you have? A superhero, right? So, what is the superhero corresponding to this villain? What does a superhero do? No, that is a very linear algebraic answer; I am talking about comic books. He stands up to the villain, right, ok.
And that is exactly what eigenvectors do, right: they refuse to change their path. They tell the matrix, ok, you can hit me as many times as you want; you could probably slow me down a bit or push me ahead or something, but I am not going to stray off from my path, right? So, that is what eigenvectors do.
So here is a matrix, which is a villain, and here is an eigenvector, which is our hero, and now when this matrix hits this eigenvector, it refuses to stray from its path, right? It says, I will move forward, I will move back, whatever, but I will not change my direction, ok. I will just stay honest to what I am, and these vectors are called the eigenvectors.
More formally, you can write it as Ax = lambda x, right; so, that means the direction remains the same, only the scale changes: it will either get slowed down or it will get boosted up, right? So, the magnitude would change, but the direction remains the same, ok.
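As a quick illustration, here is a minimal NumPy sketch (the matrix A and the vectors are made up for this example) showing that an eigenvector only gets rescaled while an ordinary vector gets knocked off its path:

```python
import numpy as np

# Made-up symmetric matrix for illustration; its eigenvectors are
# [1, 1] (eigenvalue 3) and [1, -1] (eigenvalue 1).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

v = np.array([1.0, 1.0])   # an eigenvector of A
u = np.array([1.0, 0.0])   # not an eigenvector

print(A @ v)   # [3. 3.] -> same direction, just scaled: Av = 3v
print(A @ u)   # [2. 1.] -> direction changed: not a multiple of u
```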
Now, what is so special about eigenvectors? Why is it that they are always in the limelight? Any course that you do invariably touches eigenvectors or eigenvalues at some point, right, be it machine learning, image processing, speech, whatever you do, you will always have eigenvectors and eigenvalues. Why is it so? Well, it turns out that several properties of matrices can actually be explained by looking at their eigenvalues, right? So, if I look at a matrix, I would probably not be able to comment much on it, but if you tell me something about the eigenvalues, I can say a lot of things about it.
There is an entire field on this: spectral graph theory, which looks at the properties of Laplacian matrices and comments on the properties of the graph and so on, right? That is just an example which we do not care about, but in this course there are a few things that we do care about with respect to eigenvalues and eigenvectors, and that is what I am going to focus on, right? So, that is what this lecture is going to be about. And I will take 2 specific cases which are very important for us to understand certain concepts later on. So, I will start with the first one.
And I will start with a very simple example to motivate this problem, which will eventually lead to a result that will help us understand a very important concept in deep neural network training: exploding and vanishing gradients. We will not touch that concept today, but we will use these ideas when we are looking at it later on. Ok.
So, let us take this example of 2 restaurants. There is a Chinese restaurant and a Mexican restaurant, and on day 0, k1 students eat in the Chinese restaurant and k2 students eat in the Mexican restaurant, ok. So, this is my situation on day 0: k1 for Chinese and k2 for Mexican, ok. Now, what happens, as is obvious, is that people get bored or they want to try out different things. So, on each subsequent day, what happens is that a fraction 1 - p of the students who ate Chinese today will opt for Mexican on the next day, and a fraction 1 - q of the students who ate Mexican today are going to opt for Chinese.
So, you get this situation, right? I started with k1, k2. What I am saying is, on day 1, that is, the next day, only a fraction p of the k1 students will remain with Chinese, and a fraction (1 - q) k2 would be transferred from Mexican to Chinese, ok. And similarly, only a fraction q of the k2 students would again stick to the Mexican food, and a fraction (1 - p) k1 would shift from Chinese to Mexican. Is this setup clear, ok? Can you write this as a matrix operation? It would be a matrix multiplied by a vector, right; can you tell me the vector?
k1, k2. Yes, k1, k2, and the matrix is this one: the first column is p, 1 - p and the second column is 1 - q, q; ok, this is what it is. And I am saying that this happens on each subsequent day; every day this keeps happening. So, on day 0 I started with, say, 180 students in total; on day 1 it changes to something; again on day 2 it will change to something, by the same fractions. Ok.
Now, let me call this matrix M, and this is of course v0, right, by definition, as we decided. Now, what would happen on day 2? What would v2 be? M applied to v1, right, which would be M^2 applied to v0; I am just substituting the value of v1, which is M v0. In general, on the nth day, what would happen? M^n applied to v0, ok. So, you see that the number of customers in the 2 restaurants is given by this series: you had v0, then M v0, then M^2 v0, and so on, up to M^n v0, ok. You see how the number of customers is changing.
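Just to make this concrete, here is a minimal NumPy sketch of the update; p, q, k1 and k2 are made-up numbers (100 and 80 students, a total of 180), not values from the slides:

```python
import numpy as np

# Made-up fractions and day-0 counts for illustration.
p, q = 0.8, 0.7                    # fractions sticking with Chinese / Mexican
M = np.array([[p,     1 - q],      # transition matrix; columns sum to 1
              [1 - p, q    ]])
v = np.array([100.0, 80.0])        # v0: k1 = 100 Chinese, k2 = 80 Mexican

for day in range(1, 4):            # v_n = M v_(n-1) = M^n v0
    v = M @ v
    print("day", day, v)
# day 1 [104. 76.], day 2 [106. 74.], day 3 [107. 73.]
```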
Now, this is how I represent it as a state transition diagram, right? So, I had certain numbers on day 0, and they change: with probability p they will stay back, with probability 1 - p they will move to the other restaurant, and so on, right?
And now, though this is a very toyish example, can you relate it to many things in real life, or many decision-making situations? So, if you are playing a game, for example, say Atari games or something, you are in a certain state, and based on some action that you take, you will move to a different state, and so on, right? These things happen in various real-world applications: there is a certain state, for example, even in stock market prediction, you are at a certain value of the stock and it might change to a different value, right, and these values you could just label as high, low or neutral, so that I am not going into the actual numbers.
Today the stock value is high; there is a possibility that it will transition to low, and so on, right? So, these kinds of state transition diagrams occur in various real-world examples.
Now, this is a problem for the two restaurant owners, right; why is it a problem? They do not know how much food to make, because every day the number of customers is changing, right. But is the number of customers actually changing forever? Will the system eventually reach a steady state? Is it obvious that it will reach a steady state, or maybe it will not? The way I described it, I do not see why it should reach a steady state, right: you have some people here, they go there, come back, go there, and so on.
The only thing which I have assumed is that the transition matrix, the matrix M, is constant across all the time steps, right? So, every day it is the same probabilities by which things change, right? So, what is your guess, if I were to ask you to take a guess, ok? Let us see how many of you think, and there is no correct answer here at this point. So, just tell me, how many of you think it will reach a steady state? How many of you think it will keep changing? And why does the sum of hands never add up to the full class, ok? So, fine; it turns out that it will reach a steady state, right, and let us see how.
So, we will define some things; some of these are just definitions, and some of them have accompanying proofs, which I am not going to do here; the proofs have been linked from the slides. So, you can take a look at them if you are interested, right?
So, suppose there is an n x n matrix A which has eigenvalues lambda_1, lambda_2, up to lambda_n. Now, what this definition is saying is, assume that there is one eigenvalue which is greater; there is no assumption, actually: the eigenvalue which is greater in magnitude than all the other eigenvalues is called the dominant eigenvalue. And when I am looking at a dominant eigenvalue, I am only concerned with the magnitude, not the sign, ok. So, it could be that one eigenvalue is minus 10 and all the other eigenvalues are 1, 2, 3, 4, 5; the dominant eigenvalue would then be minus 10, right, and I just take its magnitude. Is that clear, the definition of a dominant eigenvalue? Ok.
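For instance, a small sketch (with a hypothetical diagonal matrix, so its eigenvalues are just the diagonal entries):

```python
import numpy as np

# Hypothetical matrix whose eigenvalues are -10, 1, 2, 3, 4, 5.
A = np.diag([-10.0, 1.0, 2.0, 3.0, 4.0, 5.0])

eigvals = np.linalg.eigvals(A)
dominant = eigvals[np.argmax(np.abs(eigvals))]  # largest in magnitude
print(dominant)   # -10.0: dominant despite the negative sign
```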
Now, how many of you know what a stochastic matrix is? A matrix M is called a stochastic matrix if all the entries are non-negative and the sum of the elements in each column is equal to 1. Now, this definition is slightly understated: there is a row stochastic matrix, a column stochastic matrix, and also a doubly stochastic matrix, right? What I am talking about here is a column stochastic matrix, like our matrix. Have you seen such a stochastic matrix any time in your life, say in the last 5 minutes? The M matrix, right? So, the M matrix is a stochastic matrix because the sum of each column was 1, right: you had the columns p, 1 - p and 1 - q, q. Or was it the sum of the rows that was 1? No, it is the columns, ok, fine.
So, this is a stochastic matrix; that is just a definition.
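A quick check on our restaurant matrix (same made-up p and q as before):

```python
import numpy as np

p, q = 0.8, 0.7
M = np.array([[p,     1 - q],
              [1 - p, q    ]])

print(np.all(M >= 0))     # True: all entries are non-negative
print(M.sum(axis=0))      # [1. 1.]: each column sums to 1,
                          # so M is column stochastic
```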
Now, I combine these two definitions, dominant eigenvalue and stochastic matrix, and give you a theorem, right? The dominant eigenvalue of a stochastic matrix is equal to 1, ok. So, to prove this, what do I have to prove? I need to prove two things: one, that 1 is an eigenvalue of any stochastic matrix, and second, that all the other eigenvalues are no larger than 1 in magnitude. That is exactly what this proof does; you can take a look at it. And just to give you a heads-up: last year I used to say, please see the proof, go back and look at the proof, and people never looked at the proofs.
So, I used to ask it in the quiz, where I could be sure that people were not going to answer it right? So, please, when I say go back and look at the proof, do that, ok. And lastly, if A is an n x n square matrix and you have the series v0, A v0, A^2 v0, up to A^n v0, then this series will converge to the dominant eigenvector of A. What does the statement mean?
Let us not get into the proof, right; what does it actually mean, ok? Let us start with very basic stuff: what is this series, actually? What is each element in this series? It is a vector; everyone gets that, every element in the series is a vector. Now, what do I mean when I say that a series of vectors converges to the dominant eigenvector; what does convergence mean? If I keep finding the next element of this series, and I keep doing this as long as I can, I will reach a point n, right, where the nth element in the series will just be a multiple of the dominant eigenvector. Is that clear? Everyone gets that, ok.
So, what do you mean if you take a series of numbers and I say that the series converges to 0? If you keep finding the next element in the series, you will hit a point n where the nth element of the series will be 0, ok. So, I will just leave it at that for now.
Now, so: stochastic matrices, dominant eigenvalues, the connection between the two, and the convergence theorem for a series of vectors, which is A v0, A^2 v0, and so on, ok.
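Here is a sketch that ties these together on our made-up restaurant matrix M: its dominant eigenvalue comes out as 1, and repeatedly applying M to v0 lines the iterate up with the dominant eigenvector (this repeated-multiplication idea is essentially power iteration):

```python
import numpy as np

p, q = 0.8, 0.7
M = np.array([[p,     1 - q],
              [1 - p, q    ]])

# The theorem: the dominant eigenvalue of a stochastic matrix is 1.
w, V = np.linalg.eig(M)
print(np.abs(w))                        # contains 1.0 and 0.5 -> dominant is 1

# The convergence theorem: v0, M v0, M^2 v0, ... approaches a
# multiple of the dominant eigenvector e_d.
v = np.array([100.0, 80.0])             # v0
for _ in range(100):
    v = M @ v
e_d = V[:, np.argmax(np.abs(w))]
print(v / np.linalg.norm(v))            # ~ [0.832 0.555]
print(e_d)                              # same direction (up to sign/scale)
```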
Now, let e_d be the dominant eigenvector of M, where M is a, what kind of matrix? In our case it is a stochastic matrix. So, what will the corresponding dominant eigenvalue be?
1.
1, ok. So, given the previous definitions and theorems, what can you say about the sequence? It converges to a, what, of e_d?
A multiple of e_d, right? So, there exists an n such that the nth element of the series is going to be equal to some multiple of the dominant eigenvector: v_n = k e_d. No, no; k is just some multiple, it is not related to the eigenvalue yet; just wait for the next statement, then you will see the difference, this is not the eigenvalue yet, ok.
Now, my question is, what happens from here onwards; what would be the next element in the series? How many of you say some k' into e_d? What is the other option; I do not have the other option; what is the other option?
k into e_d.
k into e_d; how many of you say k into e_d? A large number, ok. So, you see, now just notice, the eigenvalue will come up, right? At step n + 1 you would have M v_n, which is M (k e_d) = k (M e_d) = k lambda_d e_d, and for a stochastic matrix this lambda_d is actually 1, so v_(n+1) = k e_d. So, the theorem says the series converges to some multiple of e_d, and now, if it is a stochastic matrix, what will happen after that time step? It will just remain the same vector.
So, what would happen to the number of customers in the two restaurants? It will remain the same, right; you get that, ok, fine. Now, this was all for what kind of matrices? Stochastic matrices; square stochastic matrices, ok.
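And once the counts reach that steady state, hitting them with M again changes nothing, which is exactly the k lambda_d e_d = k e_d step above (same made-up numbers as before):

```python
import numpy as np

p, q = 0.8, 0.7
M = np.array([[p,     1 - q],
              [1 - p, q    ]])

v = np.array([100.0, 80.0])
for _ in range(200):                 # run well past convergence
    v = M @ v
print(v)                             # ~ [108. 72.]: the steady state
print(M @ v)                         # ~ [108. 72.]: M v = 1 * v, unchanged
```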
But we generally care about any square matrix. In fact, we should care about any matrix and not discriminate, but any square matrix will do for now. So, for a square matrix, let p be the time step at which this series approaches a multiple of the dominant eigenvector. Remember, the theorem was for any square matrix, not just for stochastic square matrices; we just used the fact that for a stochastic square matrix the dominant eigenvalue is 1, which leads to that neat result that the number of customers just becomes constant, right, ok. But for any square matrix, I could write it like this: there exists some step p at which the pth element of the series would just be a multiple of the dominant eigenvector, v_p = k e_d, ok.
Now, what would happen at step p + 1? You would get A v_p = k lambda_d e_d; is this fine, ok? What about step p + 2? k lambda_d^2 e_d; and in general, at p + k or p + n, you get k lambda_d^n e_d; everyone gets this, ok. So now, can you tell me, what does knowing this dominant eigenvalue tell us about this series? When will it actually stabilize?
When lambda_d is equal to 1; that is the case we already saw. If the dominant eigenvalue is greater than 1, what would happen?
The series will explode; the series will explode, and if it is less than 1, what would happen? The series will vanish, ok. So, this is an important result that we will use when we are discussing exploding and vanishing gradients. We will see that in the case of recurrent neural networks, for example, you end up with something of this sort, and then I will make some comments on that, right? That is why we will be using this; it will come probably 6, 7 or maybe more lectures down the line, ok, but we will be using it at that point. So, the main result from here is that if the dominant eigenvalue lambda_d is greater than 1, then the series will explode; if it is less than 1, it will vanish; and if it is equal to 1, it will stabilize, ok. Is that fine, ok? So, that is one result, one important property of eigenvalues and eigenvectors, that we will be needing at a later point in the course.
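A final sketch of this trichotomy, using made-up diagonal matrices whose dominant eigenvalues are 1.1, 0.9 and 1.0:

```python
import numpy as np

v0 = np.array([1.0, 1.0])
for lam in (1.1, 0.9, 1.0):          # dominant eigenvalue of each test matrix
    M = np.diag([lam, 0.5])
    v = v0.copy()
    for _ in range(100):             # compute M^100 v0
        v = M @ v
    print(lam, np.linalg.norm(v))
# 1.1 -> ~13781 (explodes), 0.9 -> ~0.00003 (vanishes), 1.0 -> 1.0 (stabilizes)
```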
