Okay, so I'll remind you what a Markov chain is, since it's been a week. A Markov chain, in a certain sense, is kind of a memoryless thing: you imagine a particle bouncing around from state to state, and it doesn't remember how it got there. It's memoryless, not in quite the same sense as the exponential, but in the sense that it doesn't remember or care how it got to the state it's in, right? The only information you need is what the current state is.
All right, so it's this conditional independence thing: given the present, that is, what state it's in now, the future and the past are conditionally independent. That's the conditional independence property.
And it helps to really
just think about pictures.
So I'm gonna draw just several
simple pictures of Markov chains.
And we can kinda just stare at
the pictures for a while, and
try to get some intuition for
how these chains behave.
And then we'll talk about
a few new concepts.
And then the most important thing
is stationary distributions,
going through some of those properties.
But there are a few other concepts that we
can see pretty easily just by looking at
some pictures.
Okay, so I'm just gonna draw
some pictures of chains.
Here is a pretty nice one.
So I'll just call the states 1, 2, 3.
And there are many examples I
could draw that would be nice.
And I'll explain what I mean
by nice when we get there.
But first, let's draw a few pictures.
So suppose you can go from 1 to 2,
you're just wandering around
from state to state, right?
Now that's a Markov chain.
You can picture it this way like we're
doing last time, just draw states and
arrows.
And I could put different probabilities
on these arrows if I want.
But just for simplicity, let's assume that from any state, you look at all of the possible arrows and follow one of them uniformly at random.
We could put different
probabilities if we want.
But let's say from 1, we can stay at 1 or go to 2; from 2, you have to go to 3; from 3, you can go back to 2 or back to 1, for example, or even just stay at 3.
These loops are optional, just for illustration.
It would be a different chain
if I got rid of this, but for
what I'm trying to say right now,
it doesn't have to be this particular one.
Okay, so in a certain sense,
this would be a pretty nice one, and
I'll say what that means.
So that's chain number 1.
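If you want to play with this first chain concretely, here's a minimal sketch in Python. The transition probabilities are my guess at the picture (loops at 1 and 3 kept, each available arrow followed with equal probability); everything here is just illustrative.

```python
import random

# A guess at "chain 1" from the picture: from each state, every available
# arrow is followed with equal probability.
Q = {
    1: {1: 1/2, 2: 1/2},          # from 1: stay at 1 or go to 2
    2: {3: 1.0},                  # from 2: must go to 3
    3: {1: 1/3, 2: 1/3, 3: 1/3},  # from 3: go to 1, go to 2, or stay at 3
}

# Each row is a conditional PMF, so it must sum to 1.
for row in Q.values():
    assert abs(sum(row.values()) - 1.0) < 1e-12

def step(state, rng):
    """Take one step of the chain: pick the next state from row Q[state]."""
    states = list(Q[state])
    weights = [Q[state][s] for s in states]
    return rng.choices(states, weights=weights)[0]

# Simulate the particle bouncing around for a few steps.
rng = random.Random(0)
state, path = 1, [1]
for _ in range(10):
    state = step(state, rng)
    path.append(state)
print(path)
```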
And then, let's do kind of a nastier
chain, in a certain sense.
So we could have something that
looks kind of similar, 1, 2, 3.
And maybe it has these back and
forth things, and
maybe it has some loops like that.
But there are also states 4,
5, 6 down here.
Maybe it looks similar to the first one, like that. But even without having studied much about Markov chains or a lot of technical stuff, you can see that there's something kind of annoying about this chain, right? You can never get from states 1, 2, or 3 down to 4, 5, 6. And from down here, you could never get back up.
Okay, so there's something kind
of annoying about that chain.
A couple more quick examples, what if
we had a chain that looked like this?
Usually we number the states 1 to M, but
there's no particular reason
you have to do it that way.
Sometimes it's just more
convenient to start at 0.
And they don't actually have to
be labeled by integers either.
What matters for what we're talking about now is not so much the name of the state. Anyway, suppose we can go from state 0 to 1, from 1 to 2, from 2 to 3, and you can always go back and forth one step.
Except I didn't draw it the way I wanted: I don't want an arrow from 0 back to 1, and I don't want an arrow from 3 back to 2.
So from state 1,
you can either go to 0 or 2.
From 2, you can go to 1 or 3.
At 0, though, just stays at 0,
and at 3, stays at 3.
You can see that there's also something
kind of annoying about this chain, right?
Because if it ever lands in state 0, then it's just gonna stay there forever. If it ever lands in state 3, it's gonna stay there forever.
Okay, and one more simple picture
of a chain just to stare at for
a while would be,
what if the chain looked like this?
So from state 1,
it has to go to state 2, from state 2,
it has to go to state 3,
then it has to go back to state 1.
Well, this is a very simple chain
to see what's gonna happen,
it's just gonna go around and around and
around and around like a clock.
So in a sense,
we don't need to do any math, I mean,
we already know perfectly
how this is gonna behave.
But it's kind of annoying that
it's just going around and
around and around like that,
deterministically.
So basically, we want to find ways
to rule things like that out.
So these three all have kind of
annoying properties that we'll mention.
This is an example of a nice one.
There are many nice ones.
Okay, so we need a couple of definitions. The first: a chain is called irreducible. I think connected would have been a better term than irreducible, but that's standard terminology.
Irreducible means you can get from anywhere to anywhere: it's possible, meaning with positive probability, to get from any state to any other state. Not necessarily in one step, but in some finite number of steps, from anywhere to anywhere.
Okay, so, just looking at these pictures: clearly the first one is irreducible, cuz you can get from anywhere to anywhere, and the cycle is irreducible too. The other two are not. In the 0-1-2-3 chain, you can get from state 1 to state 0, but you can't get from state 0 to state 1; once you're in state 0, it's a trap. So I'll just write irr for irreducible and red for reducible, just for right now: the first chain and the cycle are irreducible, and the other two are reducible.
So, that you can see just by
looking at the pictures, right?
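You can also check irreducibility mechanically with a reachability search over the arrows. Here's a sketch, where the four adjacency lists are my encodings of the four pictures (the second chain's arrows weren't fully specified, so I've mirrored the first chain's structure in both halves, as drawn).

```python
from collections import deque

def reachable(adj, start):
    """All states reachable from `start` in some finite number of steps."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_irreducible(adj):
    """Irreducible: every state can reach every other state."""
    states = set(adj)
    return all(reachable(adj, s) == states for s in states)

chain1 = {1: [1, 2], 2: [3], 3: [1, 2, 3]}      # the "nice" one
chain2 = {1: [1, 2], 2: [3], 3: [1, 2, 3],      # two disconnected pieces
          4: [4, 5], 5: [6], 6: [4, 5, 6]}
chain3 = {0: [0], 1: [0, 2], 2: [1, 3], 3: [3]}  # absorbing endpoints
chain4 = {1: [2], 2: [3], 3: [1]}                # the deterministic cycle

for name, chain in [("chain1", chain1), ("chain2", chain2),
                    ("chain3", chain3), ("chain4", chain4)]:
    print(name, "irreducible" if is_irreducible(chain) else "reducible")
```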
Now, reducible chains are kind of annoying
just as you can see intuitively like this.
On the other hand,
it's not that big a deal.
Because if you happen to
have a reducible chain,
you can always split it up into
irreducible components, and then just
study those irreducible components and
try to put them back together.
So this reducible chain,
I can think of that as kind of two
separate Markov chains, right?
I could just study this thing up here and this one down here, which I happened to draw with the same structure, but it didn't have to be the same, I just drew that as an example. I could study this chain on 1, 2, 3, and this one on 4, 5, 6, as separate things, right?
That works because if you start out up here, then you're just up here forever, and states 4, 5, and 6 are completely irrelevant.
Okay, so for the most part, we only
need to look at irreducible chains.
Okay, just another couple of quick definitions. Irreducibility is a property of the whole chain: you can get from anywhere to anywhere, from any state to any state. Now we're talking about one particular state within the chain, and it's called recurrent if, when the chain starts in that state, it's guaranteed to come back. So: starting there, the chain has probability one of returning to that state.
It's kind of like certain tourist agencies for certain cities: if you visit our city, you're always going to keep coming back, right? That's recurrence. It keeps recurring over and over again with probability one. That is, there can't even be a 0.001 chance that you will not come back to that state.
And otherwise, it's called transient.
So transient is just
the opposite of recurrent.
Okay, so that's pretty intuitive terminology. Irreducible is kind of a strange word, but recurrent and transient are pretty natural terminology. So, just looking at this first picture, intuitively, all of the states are recurrent.
Because if you start out at state 2, it's always gonna go to state 3, and maybe it'll wander over to state 1 for a while, maybe it'll stay at state 1 for a million years, but eventually it's gonna come back to state 2. That's probability one.
It's kind of like, you know Murphy's Law?
Anything that can go wrong, will go wrong.
I see this as kind of the probabilistic generalization of Murphy's Law: anything that happens with positive probability will happen eventually, right? That's true in the finite case, and we're assuming a finite number of states here.
Different things can happen with an infinite number of states. But with a finite number of states, even if a return is extremely unlikely at any given step, it still happens. If you start out here in this example, it's not difficult to get back, but you can imagine a larger example where, starting from state 2, it's extremely difficult to get back, right? As long as it's possible, though, eventually it's gonna happen. So actually, in the irreducible case with a finite number of states, all the states are going to be recurrent anyway. Just wait long enough and it's going to happen.
In this example here, all of the states are recurrent, because even though we can't get from state 1 to state 4, the definition only asks what happens if we start out at a certain state. So if we start out at state 4, the chain is just gonna be down here forever, so it will visit 4 over and over again. And by the way, I said the chain has probability one of returning to that state once. But if that's true with probability one, then I can also say that, with probability one, it will return infinitely many times, right?
It's going to return once with probability one. But once it's back there, you could ask: well, what if the probability of the second return somehow decreased, and it got less and less likely each time? Then it would not be a Markov chain, right? Say you start at state 4, wander around, and get back to state 4. Since it's Markov, you no longer care about that whole past history, right? It's the same problem again, so with probability one it's gonna come back again, and then again. So if it's recurrent, it's gonna come back infinitely often.
On the other hand, if it's transient,
then it might come back again and again,
for a while but eventually it will stop
and it will never go back again, okay?
So let's modify this one slightly; I don't wanna draw a whole new picture, but it's an important modification. Let's add an extra edge, say from state 3 to state 6, okay?
So suppose now that we have that edge. It's still not irreducible, because you still can't get from state 4 back to state 1, for example.
But now states one, two, and
three have become transient.
Because just imagine what this would look like: if you start out at state 1, 2, or 3, it may wander around up here for years and years.
Eventually, though, it's gonna cross this new edge, and then there's no turning back, right? The rest of its infinite life is gonna be spent in 4, 5, and 6. It will never go back; it can't.
So if I added this edge here,
these three states would be transient,
these three would be recurrent, right?
So that should make sense intuitively,
thinking about how it'd behave.
Let's think about this one here, states 1 and 2: are they recurrent or transient? Suppose it starts out at state 2. Maybe it wanders around between 1 and 2 for a while, but eventually it's gonna hit state 0 or state 3.
Once it's in state 0 or state 3, it's trapped forever. That's called an absorbing state: 0 and 3 are both absorbing. It just gets trapped there forever, like a black hole. Therefore states 1 and 2 are transient, and states 0 and 3 are recurrent, because if you start in state 0, you're in state 0 forever, so it's clearly recurrent.
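To make "probability one of returning" concrete, here's a rough simulation of this 0-1-2-3 chain estimating the chance of ever returning to the start. For state 1 the exact answer works out to 1/4 (half the time you're absorbed at 0 immediately, and from 2 you come back only half the time), so it's well below 1: transient. The absorbing state 0 trivially returns with probability 1.

```python
import random

# The 0-1-2-3 chain from the picture: 0 and 3 are absorbing,
# 1 and 2 move one step left or right with probability 1/2 each.
Q = {0: {0: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}, 3: {3: 1.0}}

def ever_returns(start, rng, max_steps=10_000):
    """Run the chain from `start` and report whether it ever comes back."""
    state = start
    for _ in range(max_steps):
        states = list(Q[state])
        state = rng.choices(states, [Q[state][s] for s in states])[0]
        if state == start:
            return True
        if Q[state].get(state) == 1.0:  # absorbed somewhere else: no return
            return False
    return False  # cutoff; with absorbing endpoints this barely matters

rng = random.Random(42)
trials = 20_000
est = sum(ever_returns(1, rng) for _ in range(trials)) / trials
print(f"estimated P(return to 1) = {est:.3f}  (exact: 0.25, so transient)")
p0 = sum(ever_returns(0, rng) for _ in range(1_000)) / 1_000
print(f"estimated P(return to 0) = {p0}  (recurrent)")
```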
By the way, this example is really just the gambler's ruin, drawn as a Markov chain. Of course, I could have had more intermediate states here. It's the gambler's ruin problem visualized as a Markov chain: the state says how much money gambler A has at a certain time. It wanders around, and eventually it ends with either gambler A bankrupt, and then he stays bankrupt forever in that problem, or gambler A has all the money, gambler B is bankrupt, and it stays that way forever. That's just a visualization of the gambler's ruin.
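Since the chain is the gambler's ruin in disguise, we can also just simulate the game. For fair bets, the standard gambler's-ruin answer is that starting with i dollars out of N total, gambler A goes bankrupt with probability 1 − i/N; here's a sketch with i = 1 and N = 3, matching the picture's equal-probability arrows.

```python
import random

def ruin_game(start, total, rng):
    """Fair gambler's ruin: gambler A's fortune moves up or down by 1
    each round until it hits 0 (A broke) or `total` (B broke)."""
    fortune = start
    while 0 < fortune < total:
        fortune += rng.choice([-1, 1])
    return fortune  # either 0 or `total`

rng = random.Random(1)
trials = 20_000
broke = sum(ruin_game(1, 3, rng) == 0 for _ in range(trials)) / trials
print(f"P(A goes bankrupt | start=1 of 3) = {broke:.3f}  (theory: 2/3)")
```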
Okay, so in this one, everything is recurrent. But it still has this kind of periodicity, so this one will be called a periodic chain. For example, say it starts at state 3, and index time: at time 1 it's at state 1, at time 2 it's at state 2, at time 3 it's back at state 3, and so on. Notice that if I index it that way, at times that are multiples of 3, it's always in state 3. So it's completely predictable what it's gonna do, for this example. So we wanna exclude things like that.
Okay, so those are just some
examples to keep in mind, pictures.
The nicest one is this one
where it's irreducible and
all the states are recurrent.
For the modified one, if we also added some way to get from the bottom states back up to the top, then it would become a nice one as well: it would be irreducible, and everything would be recurrent. But if we can only go one way, that's kind of a bad one, okay? That one is reducible.
Alright.
Now we can talk about
Stationary distributions.
We talked very briefly
about them last time, but
I'll remind you of the definition.
So, s is a probability row vector; just think of it as a PMF written out as a row.
And it's called a stationary
distribution for
the particular chain that we're studying.
So chain with transition matrix Q,
just the notation we were using last time.
That's just the big matrix of
the probabilities of going from one state
to another one.
So the definition of stationarity is that sQ = s. From last time: if you pick a random starting state distributed according to s, where s is not necessarily stationary yet, and then multiply by Q, that gives the probability distribution over states one step later. We pick a random state according to s, let the chain evolve one step, and sQ is the distribution after one step; if we want two steps, we compute sQ squared, and for three steps sQ cubed, and so on. So powers of Q are telling us the higher-order transitions, like the probability of going from state i to state j in some certain number of steps; we talked about that last time. But if sQ just collapses back down to s, that's why it's called stationary: that's saying that if the chain starts out with the stationary distribution, then it's gonna have the stationary distribution at the next step, and the next step, and so on, forever.
Okay, so for those of you who've done eigenvalues and eigenvectors, you can interpret this as an eigenvalue-eigenvector equation: s is a left eigenvector of Q with eigenvalue 1. If there's a very large number of states, this may be a computationally intensive problem; all the matrix calculations, even with a computer, may be too intensive, depending on the size of the problem. But anyway, in principle, all you have to do to solve for a stationary distribution is to solve a system of linear equations, right? It's written in matrix form, but in principle you could write everything out as a linear system and just do the usual thing: eliminate variables one by one, Gaussian elimination or whatever, and solve the system.
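As a sketch of that linear-system approach in plain Python: replace one redundant equation of sQ = s by the normalization "entries sum to 1" and run Gaussian elimination. The matrix below is my encoding of chain 1 from the pictures.

```python
def stationary(Q):
    """Solve sQ = s with sum(s) = 1 by Gaussian elimination.
    sQ = s means, for each j: sum_i s_i * (Q[i][j] - (i == j)) = 0."""
    m = len(Q)
    # Equation j (over unknowns s_0..s_{m-1}), plus right-hand side b.
    A = [[Q[i][j] - (1.0 if i == j else 0.0) for i in range(m)]
         for j in range(m)]
    b = [0.0] * m
    A[m - 1] = [1.0] * m   # replace one redundant equation by sum(s) = 1
    b[m - 1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    s = [0.0] * m
    for r in range(m - 1, -1, -1):
        s[r] = (b[r] - sum(A[r][c] * s[c] for c in range(r + 1, m))) / A[r][r]
    return s

# Chain 1 from the pictures (my encoding): states 1, 2, 3.
Q = [[1/2, 1/2, 0.0],
     [0.0, 0.0, 1.0],
     [1/3, 1/3, 1/3]]
s = stationary(Q)
print([round(x, 4) for x in s])  # ~ (2/7, 2/7, 3/7)
```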
Well, of course, there's a question which I asked at the very end last time but didn't answer in detail: can we solve this, and is the solution unique? And so on. Okay, so I'm gonna state these as theorems. To prove them would involve a lot more linear algebra than I want to get into; some of this stuff follows from theorems in linear algebra that should in principle be taught in every linear algebra course, but in practice do not get taught. Proving these would mean a pretty large detour into linear algebra, but at least we'll state the theorems and also what they mean.
Okay, so we want the chain to be irreducible, to avoid these annoying reducible chains. So, for any irreducible Markov chain, and we're always assuming finitely many states, I won't necessarily always say that explicitly, but I'll just remind you here: finitely many states, cuz different things can happen with infinitely many states. Then we wanna know some basic facts about stationary distributions. First of all, a stationary distribution exists. I'll call it s for stationary, though it doesn't matter what you call it. Stationary distributions exist; that's always true.
So you can solve this system, and in particular, if you got a solution that had some positive elements and some negative elements, that would be kinda bad, right? But if everything is positive, or everything is negative, then you can just renormalize it so the entries are non-negative and add up to one, and it turns out you can always do that.
Secondly, it's unique: there exists a unique stationary distribution, even if there are a trillion states. Thirdly, there's actually a formula, in a sense, that's kind of interesting and gives more intuition for what the stationary distribution means: si = 1/ri, where ri is the average time it takes to return to state i, starting at state i. So ri is an expected value. Start the chain at state i and let it run; since the chain is irreducible, in the finite case that also means it's recurrent, so eventually it's going to come back to state i. How long does that take? The average, by definition, is ri, and that's the reciprocal of the stationary probability. So ri is the average return time: how many steps does it take to return to i if the chain starts at i.
So I don't think it's obvious
that these should be equal.
I think it makes sense intuitively that
these should be inversely related, right?
Because one way to think of a stationary
distribution intuitively is
that it's gonna be the long run fraction
of time of being in a certain state.
So si, to interpret it intuitively: run the chain for a long, long time and ask what fraction of the time it was inhabiting state i. That's gonna converge to si, under some mild conditions. So si is the long-run fraction of time at state i. If the chain is at state i one tenth of the time in the long run, this is saying that if you start at state i, then on average it'll take ten steps to get back to state i. So it's kinda neat that there's this very, very simple relationship between these things.
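Here's a rough simulation checking si = 1/ri on chain 1 (my encoding of the picture). Solving sQ = s for that chain gives s = (2/7, 2/7, 3/7), so the average return time to state 3 should be 7/3.

```python
import random

# Chain 1 again (my encoding of the picture).
Q = {1: {1: 1/2, 2: 1/2}, 2: {3: 1.0}, 3: {1: 1/3, 2: 1/3, 3: 1/3}}

def return_time(start, rng):
    """Number of steps until the chain, started at `start`, first returns."""
    state, steps = start, 0
    while True:
        states = list(Q[state])
        state = rng.choices(states, [Q[state][s] for s in states])[0]
        steps += 1
        if state == start:
            return steps

rng = random.Random(7)
trials = 50_000
r3 = sum(return_time(3, rng) for _ in range(trials)) / trials
print(f"average return time to state 3: {r3:.3f}  (theory: 1/s_3 = 7/3)")
```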
Then the last thing is convergence. How do we know what I just said about the long run, that s is the long-run fraction of time in a certain state? The stationary distribution is also called the equilibrium or the steady state; all these terms from physics or econ are saying it's something to do with long-run behavior. To state this, what we need to rule out is just the periodic case. The first three things are true even in the periodic case, where we only assumed irreducibility. But now we also assume there's no periodicity problem, and there are different ways to get rid of periodicity. One simple way is just to require that Q to the m is strictly positive, every entry positive, for some m. That also implies irreducibility. Just remember, Q to the m gives the m-step transition probabilities: the probability of going from somewhere to somewhere in exactly m steps.
Okay?
If you write out the transition matrix for
this example,
it's a three by three matrix
that you can write down easily.
Then start taking powers of that matrix. What you'll see is that 0s show up in some places; you take another power and the 0s move, oscillating back and forth.
But you can never find one power
of the transition matrix here
where all the entries are positive.
Okay?
But if it is true that we can find one
value of m, such that we don't have any
0s in this matrix, that will rule
out any kind of periodicity problem.
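Here's a small sketch of that check: keep taking powers of Q and look for one with no zeros. For the 1→2→3→1 cycle, every power is a permutation matrix, so you never find one; for the nice chain 1 from before (my encoding), Q squared is already strictly positive.

```python
def matmul(A, B):
    """Plain-Python multiply of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def positive_power(Q, max_m=20):
    """Smallest m <= max_m with every entry of Q^m strictly positive,
    or None if there is no such power."""
    P = Q
    for m in range(1, max_m + 1):
        if all(x > 0 for row in P for x in row):
            return m
        P = matmul(P, Q)
    return None

cycle = [[0, 1, 0],          # 1 -> 2 -> 3 -> 1, deterministically
         [0, 0, 1],
         [1, 0, 0]]
chain1 = [[1/2, 1/2, 0.0],   # the "nice" chain from before
          [0.0, 0.0, 1.0],
          [1/3, 1/3, 1/3]]

print(positive_power(chain1))  # 2: Q^2 already has no zeros
print(positive_power(cycle))   # None: the zeros oscillate forever
```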
And in that case, we can say that we do have convergence. I'll state it as: the probability that Xn equals i converges to si as n goes to infinity, where Xn is the state of the Markov chain at time n. This holds no matter what the starting state is. We could start out deterministically, say always at state 1, or we could choose a random initial state according to some distribution, which may or may not be the stationary distribution. You just have to give it some initial condition, but no matter what the initial condition is, in the long run, as n goes to infinity, the distribution, that's just the PMF at time n, converges to the stationary distribution.
In matrix terms, another way to say that would be that t times Q to the n converges, where t is just any probability vector, not necessarily the stationary one. Think of t this way: initially, we choose a random state where the probabilities are given by t. If we wanna be deterministic, then just make t have a 1 in one entry and 0s everywhere else. We showed last time that if t is our starting probability vector, then each time you multiply by Q, that's just going one step forward in time for the probabilities. So t times Q to the n says: go n steps forward in time. And that's gonna converge to the stationary distribution: t Q to the n converges to s as n goes to infinity. And t could be any probability vector, right? That's just saying: start up the chain however you want, then let a long time elapse, and it converges to the stationary distribution.
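Here's what that looks like numerically, using chain 1 (my encoding of the picture, whose stationary distribution works out to (2/7, 2/7, 3/7)): start deterministically at state 1 and repeatedly multiply the distribution by Q.

```python
def step_dist(t, Q):
    """One step forward in time for the distribution: t -> tQ."""
    m = len(Q)
    return [sum(t[i] * Q[i][j] for i in range(m)) for j in range(m)]

Q = [[1/2, 1/2, 0.0],   # chain 1 again (my encoding of the picture)
     [0.0, 0.0, 1.0],
     [1/3, 1/3, 1/3]]

t = [1.0, 0.0, 0.0]     # start deterministically at state 1
for n in range(100):    # t Q^100
    t = step_dist(t, Q)

print([round(x, 6) for x in t])  # ~ (2/7, 2/7, 3/7), whatever the start
```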
Okay.
So this theorem tells us the stationary distribution is a very important thing to look at, right? It's capturing the long-run behavior; it's also 1 over the expected time to return to a certain state; it exists; it's unique. So very, very nice.
The only difficulty with this theorem,
really,
is that it doesn't give us much
of a clue for how to compute it.
You could say, well, we can use the return-time formula to compute it, but then we have to find ri, which is also a difficult problem.
Okay?
So this is a great result,
a great theorem,
but how do we actually compute the
stationary distribution assuming we don't
wanna spend the rest of our life
doing matrix algebra on sQ = s?
Okay?
So there's one especially nice class of Markov chains where the computations are really, really nice; that is, we can get the stationary distribution quickly. Not necessarily easily, it may still be hard, but it's the good kind of hard: you think about it and you figure it out. It's not tedious, so I like that more.
So that type of chain
is called reversible.
So I'll tell you what that word means,
and then I'll show you how do we find
the stationary distributions in
the case of a reversible Markov chain.
Reversible chains, just in general,
are much nicer and
easier to work with than
non-reversible chains.
Reversible, here's the definition. A Markov chain with transition matrix Q, where the individual elements of Q are written qij, that's row i, column j, is called reversible if there is a probability vector s such that, and here's the key equation of reversibility, si qij = sj qji. It's kind of a cute-looking equation, and very useful, if this holds for all states i and j.
Okay?
So look from the left-hand side to the right-hand side: we just interchanged i and j, and this became that.
So for any particular Markov chain, it may or may not be true that you can find such an s, and this doesn't tell us how to find s either. But if you assume, for the moment, that this equation holds, then s is stationary; that's why I called it s. So I'll just say: if the chain is reversible with respect to s, that is, if this equation holds for that particular s, then that s is stationary.
So, if you are lucky enough to have this equation, then you've essentially bypassed solving matrix equations, Gaussian elimination, and all that stuff. Because once this holds, we know we have the stationary distribution.
We're going to prove that in a minute.
Of course, you could say: how do we come up with s in the first place? At least for solving the matrix system, there's a well-known method, the kind you learn in linear algebra class, just solving a linear system, whereas here we have to cleverly come up with s. So it may take more thought, but you get nicer results and more insight. And, for example, a lot of times you can prove something in general: you can have Markov chains with m states where you're not specifying m, and in some cases you can still solve this thing even without explicitly knowing the size of the state space. Things like that.
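Here's a quick sketch of checking the key equation for a candidate s. The two-state matrix below is made up for illustration; any two-state chain is reversible with respect to its stationary distribution, whereas chain 1 from earlier is not, since s1 q12 = 1/7 but s2 q21 = 0.

```python
def is_reversible(s, Q, tol=1e-12):
    """Check the key equation s_i q_ij = s_j q_ji for all pairs i, j."""
    m = len(Q)
    return all(abs(s[i] * Q[i][j] - s[j] * Q[j][i]) < tol
               for i in range(m) for j in range(m))

# A made-up two-state chain; its stationary distribution is (2/3, 1/3),
# and indeed (2/3)(0.1) = (1/3)(0.2), so it is reversible.
Q2 = [[0.9, 0.1],
      [0.2, 0.8]]
s2 = [2/3, 1/3]
print(is_reversible(s2, Q2))   # True

# Chain 1 from earlier is NOT reversible with respect to its stationary
# distribution (2/7, 2/7, 3/7): s_1 q_12 = 1/7 but s_2 q_21 = 0.
Q1 = [[1/2, 1/2, 0.0],
      [0.0, 0.0, 1.0],
      [1/3, 1/3, 1/3]]
s1 = [2/7, 2/7, 3/7]
print(is_reversible(s1, Q1))   # False
```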
Okay, so let me tell just
a little bit about the intuition.
Why this is called reversible.
Then we'll show that this
implies stationarity.
Then we'll do an example, okay?
So for the intuition: reversible is also called time reversible; the reversibility refers to time. What it says is, suppose you start up the chain with distribution s, and imagine recording a video of one of these chains, videotaping the particle as it wanders around from state to state. Reversibility says that if you then took that tape, played it backwards in time, and showed it to someone else, they would not know whether it was going forwards or backwards in time. It looks the same. So this is saying that if you run time forwards or backwards, it looks the same. And you can check, basically using the definition of conditional probability, that that's what this equation corresponds to.
This is also pretty important in physics, right? In physics they like to talk about whether the laws of physics are time reversible, things like that; it's very important in thermodynamics and so on.
But anyway,
that's what this means intuitively.
And, yeah, I should also mention that
in physics this is also called detailed
balance, which you don't need to know.
But in case you ever come across
that term, that's just a synonym.
Detailed balance is reversibility, but
I like the term reversibility more.
All right, so let's show that if this is true, then, in fact, s is stationary. It's a quick proof, and it's good practice with this concept. So suppose that si qij = sj qji for all i and j, so we've found this s that makes the chain reversible. We want to show that s is in fact stationary, that is, that s times Q equals s. That's the definition of a stationary distribution.
All we have to do is sum up both sides. Take this equation, which holds for all states i and j, and sum both sides over all i. I'm just writing sum over i, but the indexing means we're summing over all the states of the chain. So the sum over i of si qij equals the sum over i of sj qji, because the terms are equal, and so the sums are equal. But now this is actually very easy, because sj doesn't depend on i, so we can take it out of the sum. The right-hand side is just sj times the sum over i of qji. And if you think about what that means, that's the probability of going from state j to state i, summed over all states i. That sum is just 1, cuz the chain has to go somewhere; remember, each row of Q has to add up to 1. It has to go from j to somewhere. So the right-hand side is just sj.
Okay, now think about what this equation means in terms of matrices. We're taking s and writing it out as a row, s1, s2, and so on up to sm, and we're multiplying by Q. How do we actually do that? We just take the dot product of s with a column of Q: this times this, plus this times this, and so on, that's matrix multiplication. And what we just computed, the sum over i of si qij equals sj, is exactly the jth entry of sQ. So sQ = s, just by the definition of matrix multiplication, okay?
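The computation we just did can be written out in one line, same notation as above:

```latex
(sQ)_j = \sum_i s_i q_{ij}
       = \sum_i s_j q_{ji}   % reversibility: s_i q_{ij} = s_j q_{ji}
       = s_j \sum_i q_{ji}
       = s_j \cdot 1
       = s_j \quad \text{for every } j,
\qquad \text{so } sQ = s.
```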
So, reversibility implies stationarity.
So, okay, I mean,
that's all well and good,
but are there any interesting examples
where we can actually do this?
And there are,
there are a couple in the handout.
I'll talk about the one
that's the most important one
and the most interesting one, I think.
We'll talk about it now.
So an example of a reversible Markov chain
would be a random walk on
an undirected network.
This is a very, very general example; you can think of lots of problems that you could encode into this format of a random walk on an undirected network.
So far you've had a couple of homework problems about networks, or graphs, right? And we've just been looking at undirected ones, that is, the edges don't have arrows: they're two-way streets, not one-way streets. But for emphasis, let's say undirected here.
Any Markov chain at all,
you can think of that as a random
walk on some directed network.
Like any of these, each state, think
of that as a node, and we have nodes,
and we have directed edges,
which are arrows.
And if you want to be more
general than these examples,
we can put different probabilities,
right, on different arrows, okay.
So that's too general.
Undirected means that we have
a picture that looks like,
I'm just gonna make up an example.
So I'm just drawing some nodes and
I'm gonna stick some edges,
let's just do a simple one.
Let's say only four nodes.
So I'm not drawing arrows now,
I'm just drawing undirected edges.
Let's say one and two are connected, and
three is connected to everything else,
and let's see if I want any more edges. Any example we want, we'll be able to do very, very quickly; just for illustrative purposes, I'm drawing this one. So I think this is a good one.
All right, so that's our network. The random walk just says: imagine that you start out at one of these states, and then, wherever you are, you look at all of your available edges and you pick one randomly, with equal probabilities. That's pretty natural; you've been given a lot of different problems that you could express in this format, right? So, for example, from state 3 there are three edges, and the outgoing edges are the same as the incoming edges, cuz it's undirected. From state 3, you can go to 1, 2, or 4, with probability one-third each. From state 4, you have to go back to state 3; from state 1, you can go to 2 or 3; and so on. It just randomly wanders around, and that's why it's called a random walk on this network. Okay, so that's a Markov chain, right?
You can write down the transition
matrix if you want, but
we wanna find the stationary distribution.
Clearly it's irreducible,
it'd be kind of annoying if we had node 5
over here that's not
connected to anything else.
Then that would not be
irreducible anymore.
But let's assume that
the graph is connected,
that is you can get from
anywhere to anywhere, okay.
And let di be the degree of i. The degree just means the number of edges attached to that node. So for this example, d1 = 2, I'm just counting: 1, 2. And d2 = 2, d3 = 3, and d4 = 1, okay? Those are just called the degrees; just count the number of choices of edge at each node.
Okay, now notice that the following equation is true: di qij = dj qji, where Q is the transition matrix. I'm claiming this is true for all i and j; let's check why.
By the way, I drew this
without loops; that is, I
don't want edges like from 1 to itself.
You can generalize and
extend this to handle loops if you want,
but right now I'm assuming
there are no loops, that is, no
edge from a node to itself, okay.
So to show that this equation is true,
well, if i equals j,
then it just says di qii = di qii,
which is always true, so there's nothing to check.
So we can assume i is not equal to j.
Okay so
I'm just gonna prove this equation.
So let's assume that i is not equal to j,
then let's just think about
what does qij look like?
First of all qij and qji are either
both 0, Or both non-zero.
Right, that's where this key
undirected property comes into play.
If we had an arrow here that
said you can go from 3 to 2, but
you can't go from 2 back to 3, what I
wrote would not be true anymore, right?
And then you'd have to worry about the case where
one side of the equation is zero and
the other side is non-zero,
which causes some problems.
But in this case we assume that
all of these are two way streets.
So that means in particular,
that means that it's either 0 equals 0, or
something non-zero equals
something non-zero.
Of course 0 equals 0 is always true, so
we only need to look at the case
where both sides are non 0, right?
Now, if there is an edge, ij,
So this notation just means if i is
connected to j by an edge like 1,
3 is an edge, then let's just
write down a formula for qij.
qij =, and let's just think about it.
What does qij mean?
It means that we're at state i and
when we're at state i we look at all
available roads and we pick one and
we're assuming for now that it's
just randomly pick one, right?
So the probability of each particular
one is one over the degree, right?
Cuz over here, there are two choices,
so it's one-half, one-half.
So this is just 1/di.
Okay, so
in the case where both sides are non-zero,
this just becomes di times 1/di = dj 1/dj,
that's true, right?
That just says 1 equals 1.
So because qij is 1/di,
in the non-zero case,
you plug that in here, then you see
that this is always true for all states,
which is what we're trying to show.
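To sanity-check the identity di qij = dj qji numerically, here's a quick sketch on the same 4-node example: build q with qij = 1/di along each edge and 0 otherwise, as in the lecture (the dict encoding is my own):

```python
# The lecture's 4-node example graph, as an adjacency dict.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
deg = {i: len(nbrs) for i, nbrs in adj.items()}  # d1=2, d2=2, d3=3, d4=1

# Transition matrix: qij = 1/di if i and j share an edge, else 0.
q = {i: {j: (1 / deg[i] if j in adj[i] else 0.0) for j in adj} for i in adj}

# Check di * qij = dj * qji for every pair of states.
for i in adj:
    for j in adj:
        assert abs(deg[i] * q[i][j] - deg[j] * q[j][i]) < 1e-12
print("d_i q_ij = d_j q_ji holds for all i, j")
```

Both sides are either 0 = 0 (no edge) or 1 = 1 after plugging in qij = 1/di, which is exactly the argument above.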
Okay, so notice, just with these
few lines right here,
we derived a result that did not require
us to write down any matrices, or
do any Gaussian elimination, or
row and column operations, or whatever.
And, it's extremely general because
I drew this example with four states,
but there could be four billion states.
This is still completely valid, right?
Of course we had some assumptions
the most important one being that this is
undirected.
We could generalize this to the case
where we put weights on the edges and
it's a fairly straightforward extension.
But the thing that would make things much,
much more complicated would be if
the weight or probability from here to here
were different from the weight
going back the other way, okay?
But you can see this is extremely general.
Let's write down the stationary
distribution now.
This is exactly the reversibility
equation, right?
Except that the di are positive
integers rather than probabilities, right?
So we just need to normalize.
So to do that, all we have to do is
multiply both sides by a constant.
This will still be valid if we multiply
both sides by a constant right?
So we just need to chose a constant so
that we actually get a probability vector.
So that just says that,
in general, let's say we have M nodes,
which are labeled 1 through M,
and the degrees are di,
that is, the degree of node i.
Then, just by this equation,
it follows that if we take, here's s,
we just let si be
proportional to the degree, right?
So si is di divided by
the sum over j of dj,
where this j is a dummy variable
summing over all the degrees.
So in other words, take the vector
of all the degrees and
divide by the sum of all the degrees,
just to make it sum to one.
Then this is gonna be stationary.
So we've found the stationary
distribution, and it's a pretty intuitive
result: it says the stationary
probabilities are proportional to the degrees.
Which kinda makes sense if you're
imagining this in the long-run,
it's gonna be spending more time in
the states of higher degree, right?
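The result can be checked numerically on the same 4-node example: si = di / (sum of degrees), and s should satisfy the stationarity condition sQ = s (again, a sketch with my own dict encoding):

```python
# The lecture's 4-node example graph, as an adjacency dict.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
deg = {i: len(nbrs) for i, nbrs in adj.items()}
total = sum(deg.values())  # 2 + 2 + 3 + 1 = 8

# Stationary distribution: proportional to the degrees.
s = {i: deg[i] / total for i in adj}  # (1/4, 1/4, 3/8, 1/8)

# Verify stationarity: (sQ)_j = sum_i s_i q_ij should equal s_j.
for j in adj:
    sQ_j = sum(s[i] * (1 / deg[i] if j in adj[i] else 0.0) for i in adj)
    assert abs(sQ_j - s[j]) < 1e-12
print(s)  # → {1: 0.25, 2: 0.25, 3: 0.375, 4: 0.125}
```

Node 3 has the highest degree and, matching the intuition above, gets the largest stationary probability, 3/8.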
Okay, so that's all for today.
See you on Wednesday then.
