Prof: We've dealt so far
with the case of certainty,
and we've done almost as much
as we could in certainty,
and I now want to move to the
case of uncertainty,
which is really where things
get much more interesting and
things can go wrong.
 
So I'm going to cover this.
 
So we're ready to start.
 
So what we've considered so far
is the case of certainty.
So with uncertainty things get
much more interesting,
and I want to remind you of a
few of the basics of
mathematical statistics that I'm
sure you know.
So you know we deal with random
variables which have uncertain
outcomes, but with well-defined
probabilities.
So another step that we're not
going to take in this course is
to say people just have no idea
what the chances are something's
going to happen.
 
Shiller thinks we live in a
world like that where who knows
what the future's going to be
like and people,
they hear a story and then
everybody gets wildly
optimistic,
and then they hear some
terrible story and then
everybody gets wildly
pessimistic,
and that kind of mood swing can
affect the whole economy.
 
I'm not going to deal with that.
 
It's hard to quantify and I'm
not exactly sure it's as
important as he thinks it is.
 
So we're going to deal with the
case where many things can
happen,
but you know what the chances
are that they could happen,
and still lots of things can go
wrong in that case.
 
So there are a couple of words
that I want you to know,
which we went over last time,
and I'll just do an example.
We always deal with states of
the world, states of nature.
That was Leibniz's idea.
 
So let's take the simplest case
where with probability 1 half
you could get 1,
and with probability 1 half you
could get minus 1.
 
So that's a random variable.
 
It might be how your investment
does.
Half the time you're going to
make a dollar.
Half the time,
you're going to lose a dollar.
So this is X,
so we define the expectation of
X,
which I write as X bar,
as the probability of the up
state happening,
so let's just call that 1 half
times 1,
+ 1 half times minus 1 which
equals 0.
Then I define the variance of X
to be, what's the expectation of
the squared difference from the
expectation?
So how uncertain it is.
 
You're sort of on average
expecting to get 0,
so how uncertain it is
is measured by how far from 0 you
are, but we're going to square
it.
So it's 1 half times (1 - X
bar) squared + 1 half times (minus
1 - X bar) squared = 1 half
times 1 + 1 half times 1 which
equals 1.
 
So the variance is 1.
 
And then I'll write the
standard deviation of X equals
the square root of the variance
of X, which equals the square
root of 1 which is also 1.
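
The three definitions just given can be checked with a short Python sketch. The variable names are illustrative only, and exact fractions are used so that no rounding enters the arithmetic.

```python
from fractions import Fraction
from math import sqrt

# Outcomes of X and their probabilities, as stated in the lecture.
outcomes = [Fraction(1), Fraction(-1)]
probs = [Fraction(1, 2), Fraction(1, 2)]

# Expectation: probability-weighted average of the outcomes.
x_bar = sum(p * x for p, x in zip(probs, outcomes))

# Variance: expected squared difference from the expectation.
variance = sum(p * (x - x_bar) ** 2 for p, x in zip(probs, outcomes))

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(x_bar, variance, std_dev)  # 0 1 1.0
```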
 
So very often we're going to
use the expectation of X,
that's going to be how good the
thing is,
and the standard deviation is
going to be how uncertain it is,
and people aren't going to
like--soon we're going to
introduce the idea that people
don't like uncertainty and this
is the measure of what they do
like.
It pays off on average a big
number, say, this one doesn't
but it could,
and the measure of uncertainty
is the standard deviation.
 
I choose that rather than the
variance for a reason you'll
see.
 
It makes all the graphs
prettier, but also if you double
X you'll double the expectation,
obviously, because you just
double everything inside here.
 
The variance,
though, you're going to end up
squaring the two.
 
If you double X you'll double
all these outcomes and the mean,
so you'll end up multiplying
the variance by 4,
whereas you'll multiply the
standard deviation by 2.
So re-scaling just re-scales
these two numbers and has a
funny effect on that number.
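
A small Python sketch of the re-scaling point, using the same 50/50 variable. The helper names `mean` and `variance` are chosen here for illustration and are not from the lecture.

```python
from math import sqrt

def mean(outcomes, probs):
    # Probability-weighted average of the outcomes.
    return sum(p * x for x, p in zip(outcomes, probs))

def variance(outcomes, probs):
    # Expected squared difference from the mean.
    m = mean(outcomes, probs)
    return sum(p * (x - m) ** 2 for x, p in zip(outcomes, probs))

probs = [0.5, 0.5]
x = [1.0, -1.0]
two_x = [2 * v for v in x]  # double every outcome of X

var_x, var_2x = variance(x, probs), variance(two_x, probs)
sd_x, sd_2x = sqrt(var_x), sqrt(var_2x)

# The variance is multiplied by 4, the standard deviation by 2.
print(var_x, var_2x, sd_x, sd_2x)
```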
 
So that's the reason why we use
these two.
Now, you could take another
example, by the way,
which is .9 times 3
[correction: .9 times 1 third];
let's call this Y,
and .1 times minus something.
How about let's call this 1
third and this minus 3.
Now, what's the expectation of
Y?
The expectation of Y equals .3,
right,
equals--just write it out,
it's .9 times 1 third + .1 times
minus 3 which equals .3 - .3
which equals 0,
so the expectation of this
random variable is the same as
the expectation of that random
variable.
And now the variance of this,
of Y,
is .9 times (1 third - 0)
squared + .1 times (minus 3 - 0)
squared,
which equals .9 times 1 ninth,
right,
+ .1 times 9 which equals .1 + .9
which equals 1,
which is the same as the other
one.
 
So here we've got another
random variable which looks
quite different from this,
so clearly standard deviation
and expectation don't
characterize things.
This looks quite different from
that one, has the same standard
deviation and the same
expectation.
So we're going to come back to
what the difference is between
these two variables in a second.
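
As a quick check that the two variables really do share their first two moments, here is a Python sketch. Representing each distribution as a list of (outcome, probability) pairs is a choice made for this example only.

```python
from fractions import Fraction as F

def mean(dist):
    # dist is a list of (outcome, probability) pairs.
    return sum(p * x for x, p in dist)

def variance(dist):
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist)

# The two random variables from the lecture.
X = [(F(1), F(1, 2)), (F(-1), F(1, 2))]
Y = [(F(1, 3), F(9, 10)), (F(-3), F(1, 10))]

# Same expectation and same variance, although the outcomes and
# probabilities are quite different.
print(mean(X), variance(X))  # 0 1
print(mean(Y), variance(Y))  # 0 1
```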
 
So there's another thing I want
to introduce which is the
covariance of X and Y.
 
So we could look at the
outcomes of these variables.
Where am I going to write this?
 
I'll write it over here.
 
We could look at the outcome of
these variables in a picture
like this, and so here we have X
and here we have Y.
So X could turn out to be 1
when Y is 1 third,
and X could turn out to be 1
when Y is minus 3.
So here's an outcome,
and here's an outcome,
and X could be minus 1,
and we could get 1 third or
minus 3.
 
So there are four outcomes
looked at here.
So if you looked at X alone
it's got a 50/50 chance you're
here or here.
 
If you look at Y alone it's a
90 percent chance up there and a
10 percent chance down there.
 
So those are called the
marginal distributions,
but the joint distribution we
would have to add a number.
So if you looked at X alone,
by the way,
you would say X alone you would
say here's 0,
here's 1, here's minus 1,
so you could have this or this
with probability 1 half and 1
half and Y you could have--
so we'll draw it this way.
 
With Y you could have 1 third
or minus 3 and here the
probability is going to be .9
and .1.
This is 0.
 
Those are the pictures that we
started with.
So you know where X could end
up and where Y could end up,
well, you don't know where they
jointly could end up.
So if they end up on the long
diagonal that means when X is
high Y tends to be high and vice
versa, and if you end up down
here X is low and Y is low.
 
So to the extent that the
probability is on the long
diagonal they're correlated
together.
To the extent that the
probability is on the off
diagonal they're negatively
correlated.
So anyway, to get a sense of
that,
the covariance is going to be
the probability of (1,
1 third) times (1 - X bar)
times (1 third - Y bar) + the
probability of--
I'll just go around the circle
of (minus 1,
1 third) times (minus 1 - X
bar) times (1 third - Y bar) + the
probability of (minus 1 and 1
third),
sorry what did I just do?
I did minus 1 and 1 third.
 
I've already done that,
so I'm down here.
So (minus 1 and minus 3) times
(minus 1 - X bar) times (minus 3
- X bar [correction:
Y bar]) + the probability of the
ordered pair--
Student: Should that
minus be the X bar or Y bar?
 
Prof: Thank you.
 
And probability,
what's the point I haven't done
yet, (1, minus 3) times (1 - X
bar) times (minus 3 - Y bar).
So why does that covariance
pick up the idea of correlation?
Well, to the extent that the
probabilities are high here and
over there on the long diagonal
this term is going to get a lot
of weight,
and what is the other term,
(minus 1,
minus 3), and this term is
going to get a lot of weight.
 
So to the extent that you're on
the long diagonal this term and
this term are going to get a lot
of weight,
but you see those terms this is
going to be positive because
it's 1 - 0 and 1 third - 0,
so that's a positive term.
And this is negative,
minus 1 - 0,
minus 3 - 0,
so a negative times a negative
is also positive.
 
To the extent that you're down
here and up there you're going
to get big positive numbers in
the covariance.
To the extent you're on the off
diagonal you'll get big
probabilities here,
but they all multiply negative
terms.
 
This is a minus and this is a
minus, because one of the terms is
above the mean and the other one
is below the mean.
That's what it means to be in
the off diagonal.
So covariance is giving you a
sense of whether things are
moving together or moving the
opposite way.
So those are the basic things
you have to know.
And I guess another couple
things are,
the covariance is linear in X,
right,
because if you double X every
time you see the X variable over
here it's always an X outcome
minus an X bar,
an X outcome minus an X bar,
an X outcome minus an X bar,
an X outcome minus an X bar,
so if you double X you're going
to double every term.
 
So it's linear in X and in Y,
and so one last thing to keep
in mind is that the variance of
X is just the covariance of X
with itself.
 
Obviously if you just plug in X
equal to Y you just get the
formula for covariance
[correction: for variance],
and similarly because they're
linear the covariance of X + Y--
so the variance of X + Y,
one more formula,
of X + Y by linearity--first of
all that's the covariance of X + Y
with itself,
and therefore by linearity now,
I'm just going to do linear
stuff,
that's equal to the covariance
of X with X + the covariance of Y
with Y + 2 times the covariance of
X with Y.
Since it's linear I just do the
linear parts,
right?
 
Covariance of X + Y with X + Y is
covariance of X + Y with X +
covariance of X + Y with Y,
then I repeat the linearity
thing and I get down to that.
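
Written out step by step, the linearity argument looks like the following. This is a sketch in LaTeX assuming the amsmath package; the third line also uses the symmetry Cov(Y, X) = Cov(X, Y), which the spoken derivation leaves implicit.

```latex
\begin{align*}
\operatorname{Var}(X+Y)
  &= \operatorname{Cov}(X+Y,\; X+Y) \\
  &= \operatorname{Cov}(X+Y,\; X) + \operatorname{Cov}(X+Y,\; Y) \\
  &= \operatorname{Cov}(X,X) + \operatorname{Cov}(Y,X)
     + \operatorname{Cov}(X,Y) + \operatorname{Cov}(Y,Y) \\
  &= \operatorname{Var}(X) + \operatorname{Var}(Y)
     + 2\,\operatorname{Cov}(X,Y).
\end{align*}
```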
 
So those are basically the key
formulas to know.
So now I'm going to make three
little observations that come
out of all of this that are
quite fascinating,
so quite elementary.
 
Are there any questions about
this, these numbers?
Yes?
 
Student: I don't
understand why you gave the
probability of (negative 1,
negative 3) weight when
negative 3 has a much more
probability of being hit on that
1 third.
 
Prof: Why did we give?
 
Say that again.
 
Student: Why did you
underline the probability of
negative 1, negative 3.
 
Prof: Probability of
negative 1, negative 3.
That's this outcome here.
 
We underlined it not because it
was very likely,
but because this term is going
to be positive.
This is positive and this is
positive.
So the whole point is the joint
distribution is not specified,
not determined by the
distributions of X alone and Y
alone.
 
So even if I know the
probability of what X could do,
and I know the
probabilities of what Y could do
that doesn't tell me anything
about what numbers I should put
on these four outcomes.
 
For example,
I could have at one extreme
when X is high Y is high--it
can't be exactly that because
the probabilities are different.
 
These numbers and those numbers
don't determine these four
numbers.
 
So there are many different
numbers I could put in these
four squares which would give me
in total this probability
outcome for X and in total this
probability outcome for Y.
So an easy way to see that is
if I made them.
So what are the observations I
want to make?
For instance,
I could say if X turns out to
be 1 half then I'll always
assume Y turns out to be 1 half,
and then with the other 40
percent of the time Y might turn
out to be--
when Y's high X might have to
turn out--
so here are some ways I could
do this.
 
I could put 50 percent here,
.5 here right?
Then 40 percent of the time
this is going to turn out--so I
have a .5 here,
then what could I do with the
rest of this?
 
This plus this has to add up to
50 percent.
So 50 percent I could have X
turn out to be here.
So when X is 1 I could have Y
always turn out to be 1 [correction: 1 third],
so that means I must have a
probability here,
a probability 0 here because
here's X 50 percent.
So this plus this X is going to
turn out to be 1, 50 percent of
the time.
 
Now, how much of the time is Y
going to turn out to be down
here a .1?
 
So suppose I put these
probabilities,
.4?
 
Now, so you see that X is--50
percent of the time X is 1,
and 50 percent of the time X is
minus 1.
Now, how many of the times is Y
1 third, .5 + .4,
so 90 percent of the time,
and then 10 percent of the time
Y is minus 3.
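
The assignment just described (.5, 0, .4, .1 on the four dots) can be written as a table and checked in Python. This is a sketch: the dictionary layout and the helper names `marginal` and `covariance` are assumptions of the example, and the covariance function is the four-term formula from earlier applied to these numbers.

```python
from fractions import Fraction as F

# Joint probabilities keyed by (x, y): the dependent assignment above.
joint = {
    (F(1), F(1, 3)): F(1, 2),
    (F(1), F(-3)): F(0),
    (F(-1), F(1, 3)): F(2, 5),
    (F(-1), F(-3)): F(1, 10),
}

def marginal(joint, axis):
    # Sum the joint probabilities over the other variable.
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out

def covariance(joint):
    # Sum over outcomes of prob * (x - x_bar) * (y - y_bar).
    x_bar = sum(p * x for (x, _), p in joint.items())
    y_bar = sum(p * y for (_, y), p in joint.items())
    return sum(p * (x - x_bar) * (y - y_bar) for (x, y), p in joint.items())

print(marginal(joint, 0))  # X alone: 1/2 on 1 and 1/2 on -1
print(marginal(joint, 1))  # Y alone: 9/10 on 1/3 and 1/10 on -3
print(covariance(joint))   # 1/3, positive: weight sits on the long diagonal
```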
 
So here's one way of putting
probabilities on the dots that
produces this outcome,
but I could have chosen another
way of doing it,
the way that you probably had
in mind where I assume they're
totally independent.
That is, knowing the outcome of
X in this way of doing it,
if I know that X turned out to
be 1, Y has to turn out to be a
third.
 
So they're very dependent.
 
X is somehow causing Y or
determining Y.
X has a lot of information
about Y.
Suppose I make them independent?
 
I say what happens here has
nothing to do with what happens
over there.
 
Then I write the probabilities,
instead of these,
I'd write it .45.
 
I'd take 1 half times .9 is
.45, and then the chance that
you go down for X,
which is .5 and up for Y which
is also .45 here,
then I'd go .05 here and .05
there.
 
So here, knowing that Y has a
good outcome tells you nothing
about what X is going to do.
 
It's still equally likely X was
good or bad.
Knowing that Y had a bad
outcome, X is still equally
likely to be good or bad.
 
And similarly knowing the
outcome of X tells you nothing
about the outcome of Y.
 
This is 9 times this and this
is 9 times that.
So the yellow is independence,
which is probability ((X equals
x),
and (Y equals y)),
equals the product,
Probability (X = x) times
probability (Y = y).
 
So that's the case in
independence.
So in the case of independence,
knowing something about one
variable tells you nothing about
what happened to the other
variable,
but you could do other joint
things.
 
So knowing each of them
separately doesn't tell you how
they're jointly distributed,
and the covariance is an effort
to see whether they're sort of
correlated together or whether
they're negatively
correlated.
So independence,
by the way, independence
implies covariance equals 0.
 
That's obvious because what's
happening in the X variable's
got nothing to do with what's
happening in the Y variable.
So since it's linear in X you
can hold Y fixed,
and the X is just the same and
you're going to get something
that adds up to 0.
 
So for any fixed value of Y
this number will just give you
the expectation of X minus X bar,
which won't depend on Y and
it's going to be 0 in every
case.
So therefore if they're
independent their covariance has
to be 0.
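
A minimal Python check of the independent case: build the joint table as the product of the two marginal distributions and compute the covariance. The table layout and function name are illustrative choices, not part of the lecture.

```python
from fractions import Fraction as F
from itertools import product

# Marginal distributions of X and Y from the lecture.
p_x = {F(1): F(1, 2), F(-1): F(1, 2)}
p_y = {F(1, 3): F(9, 10), F(-3): F(1, 10)}

# Independence: each joint probability is the product of the marginals.
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

def covariance(joint):
    x_bar = sum(p * x for (x, _), p in joint.items())
    y_bar = sum(p * y for (_, y), p in joint.items())
    return sum(p * (x - x_bar) * (y - y_bar) for (x, y), p in joint.items())

print(joint[(F(1), F(1, 3))])  # 9/20, the .45 in the lecture
print(joint[(F(1), F(-3))])    # 1/20, the .05 in the lecture
print(covariance(joint))       # 0
```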
 
So, independence means X and Y
tell you nothing about each other.
That means the covariance is 0.
 
They could be positively
distributed like up here or
negatively distributed,
either way you want to do it.
Does that make sense?
 
You asked me about this.
 
Student: Yes.
 
Prof: So what are the
key simple observations here
that are going to inform a lot
of our behavior under
uncertainty?
 
Well, it's going to turn out
that expectation is good and
standard deviation is bad.
 
So if we take this variable
that we just found,
X and Y were both here,
X and Y were both there.
All right, they each had
standard deviation 1 and
expectation 0,
so this is the standard
deviation.
 
So X is here,
and by the way so is Y,
same thing.
 
Well, suppose I put half my
money into X and I put half my
money into Y,
and if I put half my money in
each let's say I get half the
payoff of each.
I make half a bet and get half
the outcome.
What happens to my expectation?
 
Well, the expectation of that
obviously equals 1 half X bar + 1
half Y bar which also equals 0.
 
So it's staying the same.
 
The expectation hasn't moved,
but what's the variance of 1
half X + 1 half Y?
 
Well, by that formula it's the
covariance--so I'm just going to
do this formula.
 
I'm going to put a 1 half here and
1 half here.
So it's the same thing.
 
So it's the covariance of 1
half X with 1 half X + the
covariance of 1 half Y with 1 half Y
+ 2 times the covariance of 1 half X
with 1 half Y.
But the covariance of 1 half X
with 1 half X is just,
okay, what is that?
 
It's the variance of 1 half X,
but we already saw from our
definition of variance over
here,
remember, if you double X
you're going to multiply the
variance by 4 because you're
squaring things.
So this is going to turn out to
be 1 quarter times the variance
of X.
 
And this, which is 1 half Y and
1 half Y, is going to be 1
quarter times the variance of Y.
 
And if the two are independent
the covariance will be 0.
So in this example,
these two variables,
if I take the orange
distribution where they're
independent I can do an X
outcome and have this standard
deviation and this expectation,
0 expectation and that standard
deviation,
I can do the Y thing,
get the same standard deviation
or I can put half my money in
each.
 
It seems like a total waste of
time to put half my money in
each.
 
After all, they give me the
same standard deviation,
but no, it isn't.
 
If they're independent you're
shockingly, drastically reducing
your standard deviation.
 
Because if they're independent
the covariance is 0 and so this
plus this, since the variance of X =
the variance of Y, is just half
the variance of X = half the
variance of Y.
 
So that's shocking.
 
So the standard deviation,
therefore, the square root of
that is 1 over the square root of 2.
 
So by putting half your money
in each you've now produced this
when they're independent.
 
So this is the standard
deviation of 1 half X + 1 half Y,
(X, Y) independent.
 
You move from this point to
that point.
You reduced your standard
deviation without affecting your
expectation.
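
The half-and-half bet can be checked directly in Python by listing the four outcomes of 1 half X + 1 half Y under the independence assumption. This is a sketch with illustrative names, not a general portfolio routine.

```python
from fractions import Fraction as F
from itertools import product
from math import sqrt

p_x = {F(1): F(1, 2), F(-1): F(1, 2)}
p_y = {F(1, 3): F(9, 10), F(-3): F(1, 10)}

# Half the money in each, assuming X and Y are independent:
# the payoff is (x + y) / 2 with probability P(X = x) * P(Y = y).
portfolio = [((x + y) / 2, p_x[x] * p_y[y]) for x, y in product(p_x, p_y)]

mean = sum(p * v for v, p in portfolio)
var = sum(p * (v - mean) ** 2 for v, p in portfolio)
sd = sqrt(var)

print(mean, var)  # 0 1/2
print(sd)         # about 0.707, i.e. 1 over the square root of 2
```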
 
So the first lesson that we're
going to see applied,
this is all mathematics so
mathematicians understood this,
of course, a long time ago,
but to realize this has an
application to economics wasn't
so obvious,
although Shakespeare knew it.
 
It's diversification.
 
So don't put all your,
you know, spread your
investments out into different
waters.
Shakespeare,
you know, Antonio had a
different ship on each ocean,
so instead of putting all the
ships on the same ocean he put
them on different oceans which
he assumed were independent.
 
So he had the same expected
outcome assuming the paths were
just as quick to wherever he was
selling the stuff,
the same expected outcome and
that each of the waters were
equally dangerous,
but he drastically reduced his
variance.
 
And because there were a lot of
oceans and a lot of ships this
number went down further and
further.
So the key is to look for
independent risks.
So that's one lesson in
mathematics that has a big
application in economics.
 
What's a second thing?
 
Well, the second thing is that
if you add a bunch of risks
together, so I'm going to say
this loosely.
If you add a bunch of risks
together, so by the way,
what's the generalization of
this before I say this?
If you had N independent risks
with identical means and
variances, means let's call them
all X bar and variances,
sigma squared.
 
Let's say they all have
expectation E and variance sigma
squared,
each of them has that,
then what happens to the--
so each of them has standard
deviations,
so they're all identical.
Like X and Y have the
expectation 0 and the same
standard deviation 1.
 
Suppose I had 20 of those and I
put 1 twentieth of my money into
each of them?
 
What would happen to my
expectation?
1 over N dollars in each one
implies my expectation is equal
to what?
Each of them had expectation E.
 
I now split my money among all
of them, all with the same
expectation.
 
That also has to have
expectation E.
All right, just like this thing
putting half my money in Y and
half my money in X,
wherever the X went.
Y was over here.
 
X is there.
 
Half my money in X and half my
money in Y, is going to give me
the same expectation.
 
If I had 12 projects like that
that were independent I'd still
have the same expectation,
but my standard deviation,
what's going to happen to my
standard deviation?
Well, the variance is going to
be--so what's going to happen to
the standard deviation?
 
Student: It would go
down.
Prof: By what factor?
 
Yeah, what's going to happen to
the variance?
Student: 1 over...
 
Prof: Put 1 over N
dollars in each of N identical
but independent investments,
what will my variance be?
Student: [inaudible]
Prof: The variance is
going to equal 1 over N times
sigma squared.
 
Why is that?
 
Because each one will have 1
over N dollars in it,
so its variance is going to be
1 over N squared times sigma
squared, but there are N of
them.
So it's going to be N
times 1 over N squared,
so it's just 1 over N,
so implies the standard
deviation--
so I'll call it standard
deviation,
is 1 over the square root of N
times sigma.
 
So it's just this
generalization.
We've got 1 over the square
root of 2, so if I did N of them
instead of 2 of them I'd have 1
over the square root of N.
So those turn out to be very
useful formulas which are going
to come up over and over again.
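
The 1 over N and 1 over root N rates can be tabulated with a few lines of Python. The function name `equal_weight_variance` is made up for this sketch, which assumes N independent investments, each with variance sigma squared, and 1 over N dollars in each.

```python
from math import sqrt

def equal_weight_variance(sigma_sq, n):
    # One position of 1/n dollars has variance (1/n)^2 * sigma^2.
    per_position = (1 / n) ** 2 * sigma_sq
    # With independence the covariances are 0, so the n variances add up.
    return n * per_position

for n in (1, 2, 10, 20):
    var = equal_weight_variance(1.0, n)
    # Variance falls like 1/n, standard deviation like 1/sqrt(n).
    print(n, var, sqrt(var))
```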
 
And let's just say it again so
you get this straight.
If I have two independent
random variables,
and I split my money evenly
between them,
and they have the same
expectation,
it doesn't have to be 0,
it could be a positive number,
if I split my money between
them I haven't changed my
expectation because each dollar,
however I split it,
I'm putting it into something
with the same expectation.
But because they're independent
you get a lot of off diagonal
things happening.
 
The off diagonal things,
remember, are canceling.
One investment is turning out
well, X is--sorry that's on the
diagonal.
 
The off diagonal elements are
good in a way because if one
investment's turning out well,
sorry, turning out badly the
other one's turning out well.
 
So here investment Y is turning
out badly, but X is turning out
well.
 
So to the extent you're off the
diagonal you're canceling some
of your bad outcomes because
one's good and the other's bad.
So that way you leave the
expectation the same,
but you reduce the variance.
 
In fact it would be even better
if you could put everything on
the off diagonal,
but to the extent you get at
least some stuff on the off
diagonal you're reducing the
risk.
 
And how fast do you reduce it
when they're independent?
You reduce it dividing it
equally because the variance is
a squared thing,
half your money in one and half
in the other means the variance
of the first is 1 quarter and
the variance of the second is 1
quarter,
but now there are two of them
so the total variance is 1 half
of what it was before.
 
If you have 10 of them each one
is 1 tenth the money so it's got
1 one-hundredth of the variance,
but there are 10 of them so
it's 10 one-hundredths,
1 over N of the variance.
If you take the standard
deviation it's 1 over the square
root of N.
 
So that's the rate at which you
can reduce your uncertainty and
your risk.
 
You'll see this gets much more
concrete next lecture.
So this is just stuff that most
of you know.
So one more thing,
if you add a bunch of
independent random variables
together,
so I'm going to speak very
loosely now,
you get a normally
distributed random variable
with the corresponding
expectation and standard
deviation.
So what am I saying?
 
I don't want to speak too
precisely about this because if
you've seen this before and seen
a proof you know everything
about it,
if you haven't it's just too
many subtleties to absorb.
 
But the normally distributed
random variable's the bell curve
that looks like that.
 
It looks like this.
 
So there's the bell curve with
expectation 0.
So it's this bell curve.
 
Now, what's special about it,
it has a particular formula
which has got an exponential to
a minus X squared thing.
Anyway, it's got a particular
formula to it which if you know
you know, if you don't it's
written down.
We're never going to use the
exact formula,
but it looks like that.
 
So these are the outcomes X and
this is the probability,
probability of outcome,
or frequency of outcome.
So the bigger X is,
and this is the mean--equals
0--I've assumed the mean is 0.
 
If you take a really big X it's
very unlikely to happen,
and a really small X it's very
unlikely to happen,
and X's nearer the mean are
pretty likely to happen.
So anyway, it's amazing that if
you add this random variable to
itself a bunch of times it can
only produce 1 and minus 1,
right?
 
This one produces totally
different outcomes,
1 third and minus 3,
they're disjoint outcomes,
but if you add this together
you can get 25 1s and 10 minus
1s,
so that gives you 15.
Over here you could have--25
will never get me there,
so sorry, that was a bad
example.
If I had 30 things I could get
18 1s and 12 minus 1s,
that'll give me 6,
you could have gotten 6 over
here,
but with 30 outcomes you could
get,
you know, all 30 of them could
have turned out to be 1,
and that would have gotten you
pretty close to the same
outcome.
So just because these outcomes
are separate,
once you're adding them up
you're starting to produce
numbers different from 1 and
minus 1,
and these added up--if you take
the right combination of 1 third
and minus 3--
you can start reproducing
things.
 
Like to get a 1 here you could
produce three tops and then
you're producing a 1.
 
So anyway, the shocking thing
is if you add a bunch of these
random variables that are
independent of each other you
get something normally
distributed that looks like that
because this random variable had
exactly the same mean and
standard deviation.
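
One way to see this numerically, without any proof, is to add up many independent copies of each variable and ask how much probability lies within one standard deviation of the mean. For a normal distribution that share is roughly 68 percent. The Python sketch below computes the share exactly from the binomial distribution; the function name and the choice of 3600 copies are assumptions made for the example.

```python
from fractions import Fraction as F
from math import exp, lgamma, log

def prob_within_one_sd(hi, lo, p_hi, n):
    """P(|S| <= sqrt(n)), where S is the sum of n independent draws that
    equal `hi` with probability p_hi and `lo` otherwise. Each draw is
    assumed to have mean 0 and variance 1, so sqrt(n) is one standard
    deviation of S."""
    total = 0.0
    for k in range(n + 1):               # k = number of `hi` draws
        s = k * hi + (n - k) * lo        # exact value of the sum
        if s * s <= n:                   # same test as |s| <= sqrt(n)
            # Binomial probability of exactly k `hi` draws, in logs
            # to avoid overflow in the binomial coefficient.
            log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                       + k * log(p_hi) + (n - k) * log(1 - p_hi))
            total += exp(log_pmf)
    return total

n = 3600
share_x = prob_within_one_sd(F(1), F(-1), 0.5, n)     # copies of X
share_y = prob_within_one_sd(F(1, 3), F(-3), 0.9, n)  # copies of Y

# Compare both with roughly 0.68 for a normal distribution.
print(share_x, share_y)
```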
 
You add the same number of
these you're going to get
outcomes that are almost
identically distributed.
So in the limit this random
variable, enough of these added
together looks exactly the same
as these added together.
That's the second surprising
mathematical fact.
And the third thing that we're
going to use is that the normal
distribution is characterized by
the mean and standard deviation,
that's all it takes to write
the formula of this down,
and these numbers,
these are called thin tailed.
These probabilities go to 0
very fast, so you shouldn't
expect many outlying dramatic
things to happen.
And in the world they do
happen, and so we're going to
see that much of classical
economics is built on normally
distributed things and so you
can't see--
you shouldn't expect any
gigantic outliers to ever
happen.
 
And it seems natural to build
it on that kind of assumption
because if you add things that
are independent you get normal
distributions all the time.
 
And things seem independent so
why shouldn't you get normal
distributions,
and yet we must not get it
because we have so many
outliers.
So that's the basic background
of mathematics.
Are there any questions about
any of that?
I'm just assuming you know all
that and now we're going to move
to economics.
 
I think that's all the
background you need.
I want to do one more thing,
which is maybe background,
but it's used in economics all
the time, and it's called the law
of iterated expectations.
 
So if I told you that these
variables were correlated like
these up here,
like the orange things,
if I told you what X turned out
to be that would tell you a lot
about what Y was going to be.
 
So for example,
if I told you that X
was--sorry, the white ones are
the correlated ones.
If I tell you that X has turned
out to be 1, that tells you that
Y has to be a good outcome of 1
third, because if X is 1 this
never happens.
 
So the only thing that can
happen if X is 1 is that Y turns
out to be 1 third,
so knowing X is going to
completely change your mind
about the expectation of Y.
So conditional expectation,
I should have said this before,
conditional expectation simply
means re-computing expectation
using updated probabilities from
your information.
Now, you've probably done this
in high school,
so I'm just going to assume you
know how to do this.
So in this case if I tell you
something like X has turned out
to be 1 that tells you that only
these two outcomes are possible.
So that means that the only two
outcomes in the white case have
happened with probability of .5
and 0,
but if I tell you X has come
out to 1 the conditional
probabilities have to add up to
1.
So you just scale things up.
 
So you know that Y had to have
been the good outcome up here.
If I tell you that the bad
outcome for Y has happened then
you have probabilities of .1--so
this 0 makes things too easy.
Suppose I tell you the good
outcome of Y has happened.
What are the chances now that X
has gotten the good outcome in
the white probability case?
 
If I tell you that Y turned out
to be 1 third in the white
probability case what's the
probability that X turned out to
be 1, conditional on that?
 
Student: 5 ninths.
 
Prof: 5 ninths,
so that's it,
because the probabilities are
now--you're reduced to .4 and
.5 so 5 ninths of the time.
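
The "scale things up" step can be sketched in Python on the dependent table (.5, 0, .4, .1). The helper `conditional_on_y` is a hypothetical name for this example: it keeps the outcomes consistent with the information and rescales them to add up to 1.

```python
from fractions import Fraction as F

# Joint probabilities keyed by (x, y) for the dependent case.
joint = {
    (F(1), F(1, 3)): F(1, 2),
    (F(1), F(-3)): F(0),
    (F(-1), F(1, 3)): F(2, 5),
    (F(-1), F(-3)): F(1, 10),
}

def conditional_on_y(joint, y_value):
    # Keep only the outcomes where Y took the observed value...
    kept = {x: p for (x, y), p in joint.items() if y == y_value}
    # ...and rescale so the remaining probabilities add up to 1.
    total = sum(kept.values())
    return {x: p / total for x, p in kept.items()}

cond = conditional_on_y(joint, F(1, 3))
print(cond[F(1)])   # 5/9
print(cond[F(-1)])  # 4/9
```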
 
So that's an idea which I
assume you all can--it's very
intuitive, and it's way too long
to explain, and I'm sure you
know how to do that.
 
So anyway, the conditional
expectation, blah,
so the iterated expectation is
simply this.
It's an obvious idea,
but it's going to be incredibly
useful to us.
 
It says if you ask me what are
the chances that the Yankees are
going to win the World Series
against the Dodgers--
let's suppose that's who's
going to play--
the Yankees are going to beat
the Dodgers,
what's the probability that's
going to happen?
What do you expect the chances
are?
If you then ask me my opinion
after the first game,
well, obviously if the Yankees
win the first game my opinion's
going to go up,
so I'm going to have a
different opinion.
 
If the Dodgers win the first
game my opinion is going to go
down, so I'll have a different
opinion.
But you can ask now another
question, what's your expected
opinion going to be?
 
So the law of iterated
expectations is,
the expectation of X has to
equal the expected expectation
of X given some information.
 
So here is what I think.
 
The Yankees are 70 percent
likely to win.
If I say after the first game
[clarification:
if the Yankees win]
I'll think it's 80 percent,
and after the first game if the
Dodgers win I'll think it's gone
down to 65 percent,
it had better be that the
average of my opinions after the
information is the same as the
number I started with.
 
That's just common sense and
I'm not going to bother to prove
that.
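
In symbols, the law can be sketched as follows, writing I for the information that arrives (here, who wins the first game). The symbol q is introduced only for this illustration: it stands for the probability of a Yankee win in the first game, which the example above does not state.

```latex
\[
\mathrm{E}[X] \;=\; \mathrm{E}\bigl[\,\mathrm{E}[X \mid I]\,\bigr]
\;=\; \sum_{i} \Pr(I = i)\,\mathrm{E}[X \mid I = i]
\]
% With the opinions quoted above (70 percent before the game,
% 80 percent after a Yankee win, 65 percent after a Dodger win),
% consistency requires
\[
0.70 \;=\; q \cdot 0.80 \;+\; (1 - q) \cdot 0.65 .
\]
```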
 
So that's incredibly important.
 
It's not only the expectation
of X,
but as you learn stuff you can
anticipate your opinion's going
to change,
but your average opinion has to
always stay the same as the expectation of X was.
 
So that's the last of the
background.
And now I want to do a simple
application of this.
So in fact, to that very
question, suppose that you're
playing a World Series.
 
The Yankees are playing the
Dodgers and let's suppose that
the Yankees have a 60 percent
chance of winning any game.
I'll just do it here.
 
The Yankees have a 60 percent
chance of winning any game.
What's the chance the Yankees
win a 3 game World Series?
How do you figure that out?
 
Well, a naive way,
a simple way of figuring that
out is to say,
well, what could happen?
Life can mean a Yankee win,
let's call that an up,
or a Yankee loss,
let's call that a down,
and this could happen with
probability .6 or .4.
The Yankees could win again,
so that's probability .6.
We have two Yankee wins,
or the Yankees could lose the
second game so that's
probability .4.
The Yankees could lose or could
win.
That's .6 and this is .4,
and we've only played 2 games.
The Yankees could win a
third--well,
you don't need to play this
game because they've already won
a three game series,
but if you did it wouldn't
matter, .4,
or we could go up or down.
The Yankees after winning and
losing could then win
probability .6,
or could lose,
or after losing and winning
they could win again or they
could lose.
 
After losing and winning they
could lose, so this is
probability .4 and this is .6,
and then finally we have this
and we have this.
 
So this is .6 and .4.
 
So this is what the tree looks
like.
You could imagine 8 possible
paths each of length 3 where you
give the whole sequence of wins
and losses.
So to compute the probability
that the Yankees win you look at
all the--so in this case the
Yankees win.
They would have already won
here, but if you play it out it
doesn't matter.
 
They're going to win here and
here.
They've got two wins and one
loss.
Here they've got one win,
two wins and one loss.
They win.
 
Here they've got loss, win, win.
 
They win the World Series.
 
Here they lose, win, lose.
 
They lose the World Series.
 
Here's lose,
win--it's win,
lose, lose, they also lose the
World Series.
Here it's lose,
lose, they've lost the World
Series, loss.
 
So these are the possible
outcomes.
So you could compute the
probability of every path,
there are 8 of them,
and then multiply that
probability by the outcome and
you'll get the chance that the
Yankees will win the World
Series,
right?
 
That's clear to everybody?
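
The path-by-path calculation just described can be sketched in a few lines of Python. This is only an illustration: the names `P_WIN`, `path_probability` and `series_win_probability_brute_force` are made up here, and the only inputs taken from the lecture are the .6 chance of a Yankee win and the 3 game series.

```python
from itertools import product

P_WIN = 0.6  # chance the Yankees win any single game (from the lecture)
GAMES = 3    # length of the series (from the lecture)


def path_probability(path, p=P_WIN):
    """Probability of one full sequence of wins ('W') and losses ('L')."""
    prob = 1.0
    for game in path:
        prob *= p if game == "W" else 1 - p
    return prob


def series_win_probability_brute_force(games=GAMES, p=P_WIN):
    """Add up probability times outcome over every one of the 2**games paths."""
    total = 0.0
    for path in product("WL", repeat=games):
        # Outcome is 1 if the Yankees win the majority of games, else 0.
        outcome = 1 if path.count("W") > games / 2 else 0
        total += path_probability(path, p) * outcome
    return total


print(round(series_win_probability_brute_force(), 3))  # 0.648
```

The loop visits all 8 paths, which is exactly the exponential work the recombining tree avoids.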
 
But there's a much faster way
of doing it and putting it on a
computer, and that's using the
law of the iterated expectation.
So first of all--so this is
called a tree.
So we're going to use trees all
the time.
So tree, I don't want to
formally define it.
It's just you start with
something and stuff can happen.
Stuff happens every period,
and so you just write down all
the things that can happen.
 
And then you write down all the
things that can happen after
that and the thing unfolds like
a tree.
That's formal enough to
describe a tree and here we've
got it.
 
But you notice that in the tree
the number of things happening
grows exponentially.
 
It's horrible to have to
compute something growing
exponentially,
but they're often recombining
trees.
 
Oh, so if I ask,
by the way, in this tree
whatever the opinion is here,
which turns out to be .68
something,
yeah, I should have asked you
to guess,
.68 something [correction: .648].
If you write down the opinion
that opinion has to be the
average of the opinion here and
the opinion here.
So if I take the opinion here
times .6 plus the opinion here
times .4 that's also going to
equal .68 [correction: .648].
And that's what's going to be
the key to computing the thing
much faster rather than going
through every branch which is
such a pain because there are an
exponentially growing number of
paths,
very bad to have to compute by
hand.
 
But we notice that we can look
at a recombining tree.
These two nodes are essentially
the same.
What difference does it make if
the Yankees win one and lose
one, or lose one and win one?
 
In both cases they're at the
same spot.
They're even in the World
Series.
And since we assume the
probability of winning any game
is the same,
.6 and .4, independent of
what's happened before--
you might think you're learning
something about,
"Oh, their starter pitched
here and he didn't last the
whole game,"
and stuff like that.
 
So I'm not allowing for any of
that.
I'm just saying it's a (.6,
.4) chance for the Yankees to
win no matter what happens.
 
So all you care about at any
point from then on is who's won
how many games.
 
So these nodes are basically
identical,
and these nodes are identical,
because it all ended up with
the Dodgers ahead 2 to 1,
and here the Yankees were ahead
2 to 1,
and here the Yankees were ahead
3 to 0,
and 0 to 3.
So the recombining tree which
has all the same information is
just this, this,
this, this tree.
So this tree only has 1, 2,
3,4,
5, has far--it's 1 node,
2 nodes, 3 nodes and 4 nodes as
time goes by growing linearly
instead of growing 1,
to 2, to 4, to 8 which is
growing exponentially.
So I could have a very long
World Series and write it as a
finite tree and just .6 and .4
here at every stage.
So how am I going to solve this
now?
Well, over here I know the
Yankees ended up winning all 3
games.
 
Here they won 2,
here they won 1,
here they won none.
 
So those are the outcomes.
 
So instead of trying to figure
out path by path,
through this exponential
number of paths, what the chances
of each path are,
which is hard to compute here,
it's .6 times .6 times .4,
a complicated calculation,
I'm now going to do something
simple.
I'm going to say,
what would I think if the
Yankees had already won 2 games?
 
Well, I know that they would
win.
That's a 1.
 
The series is already over.
 
What would I think after the
Dodgers won the first two games?
I'd know it was over.
 
What would I think--so how did
I get that?
It's .6 times 1 plus .4 times 1.
 
That's 1. For the Dodgers, .6 times
0 plus .4 times 0, that's 0,
so that's my opinion if the
Dodgers win 2 games.
Here's my opinion if the
Yankees won 2 games.
What would my opinion be if
they split?
Well, if they split what would
my opinion be if I started here?
So after game 2 they've each
won 1 game.
I don't know who won the first
one, but it was 1 to 1 after 2
games.
 
Now what would I think?
 
Student: .6 times 1 plus .4
times 0.
Prof: Exactly,
so it's .6.
It's .6 times 1 plus .4 times 0.
 
So the odds,
I would think,
the Yankees would win the World
Series here with 1 game left
knowing that they win 60 percent
of the time it's .6.
But now what do I think if the
Yankees win the first game?
What's my opinion?
 
Student: .6 times 1 plus .4
times .6.
Prof: So it's .6 times
1,
so it's .6, plus .4 times .6,
so that's .24,
so that's .84 here,
and what's my opinion after the
Yankees lose the first game and
the Dodgers win?
What do I think is going to
happen?
What will my opinion be here?
 
It's .6 times having an opinion
of .6, so it's .36, plus .4 times
knowing that it's all over, .4
times 0.
So it's equal to .36.
 
So I've now figured out--not
only am I solving this thing
much faster than I could over
there, but I'm finding
interesting numbers on the way.
 
I'm now figuring out what would
I think after the Yankees won
the first game?
 
Well, now I think it's 84
percent.
What would I think after the
Dodgers won the first game?
I'd think it was only a 36
percent chance of the Yankees
winning.
 
So now what's my opinion at the
very beginning?
It's .6 times .84 (it's my
chance of having this opinion
plus my chance of having that
opinion) plus .4 times .36.
Oh no, .504 plus (maybe) .144, what is
that?
Student: .648.
 
Prof: .648. 6 times 84
looks like 504 and 4 times 36
looks like 144,
so it looks like .648 and
that's what you said.
 
So that's it.
 
I've solved it now.
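
As a rough sketch of the backward calculation just done on the board, the recombining tree can be stored layer by layer, indexed by how many games the Yankees have won so far. The function name `opinions_by_round` and the list layout are assumptions made for this illustration, not something from the lecture or its spreadsheet.

```python
P_WIN = 0.6  # chance the Yankees win any single game (from the lecture)


def opinions_by_round(games=3, p=P_WIN):
    """Opinions on the recombining tree.

    layers[t][k] is the chance the Yankees win the series after t games
    have been played and the Yankees have won k of them.
    """
    # Final layer: the outcome is 1 if the Yankees won the majority, else 0.
    layer = [1.0 if wins > games / 2 else 0.0 for wins in range(games + 1)]
    layers = [layer]
    # Work backwards: today's opinion is the average of tomorrow's opinions.
    for t in range(games - 1, -1, -1):
        layer = [p * layer[k + 1] + (1 - p) * layer[k] for k in range(t + 1)]
        layers.append(layer)
    layers.reverse()
    return layers


layers = opinions_by_round()
for t, layer in enumerate(layers):
    print(t, [round(x, 3) for x in layer])
print(round(layers[0][0], 3))  # 0.648
```

Each layer has only one more node than the one before it, so the work grows with the square of the number of games rather than doubling every game.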
 
So that's the method of
iterated expectation and we're
going to turn this into quite an
interesting theory in a second,
but I want to now put that on a
computer to show you just how
completely obvious this is,
I mean, not obvious,
fast this is.
 
So you could solve for any
number of--a series of any
length you could instantly
solve.
Now, we're going to price bonds
that way too.
So class--so what did I do?
 
I--this is a spreadsheet you
had.
I simply had the probabilities
of the Yankees winning which was
.6, which I could change.
 
Student: Can you lower
the screen?
Prof: Oh.
 
Student: Thank you.
 
 
 
Prof: So this is the
simplest thing to do,
but now suppose that--so we
said the Yankees can win every
game with probability .6.
 
So then what did I do?
 
I went down to here.
 
I gave myself some room.
 
I didn't do a very long series.
 
So now what does each of these
things say?
Each of these nodes,
like that one,
says, if I can read it,
it says--so this is my opinion
of winning the World Series.
 
It says my opinion here is
going to be the chance I go up.
That's the probability,
that's A 2,
that's .6, the chance I go up
times what my opinion would be
over here,
plus the chance that I go down,
which is here,
the chance I go to here which
is 1 minus that number .6 that's
frozen up there,
times whatever I thought would
be my opinion here.
So you see that's the same--I
just write that once.
I wrote that once here,
that thing about the
probability, my opinion there is
the probability of going up.
That's $A$2, dollar A dollar 2,
that's .6,
it's frozen,
times what my opinion would be
in the square over 1 and up 1,
plus 1 minus dollar A dollar 2
times my opinion over 1 and down
1.
So I just copied that as many
times as I wanted to down the
column and then I copied it
again across all the rows.
So all of these entries are
identical, they're all just
copies of each other.
 
So it just says iterate your
opinion from what you know it
was forward.
 
Now, how do I take a 3 game
World Series?
Well, we're starting here.
 
This'll be game 1,
game 2, game 3,
so all I have to do now is put
1s everywhere here like 1 enter,
and now I'll copy this,
ctrl, copy,
and go all the way down here.
 
So that's it.
 
So we've got all the numbers.
 
So why is that?
 
Because my opinion
here--remember the numbers we
got?
 
The series goes 1 game,
2 games, 3 games,
so if you end up above the
middle that means the Yankees
won the majority of games.
 
Your payoff is 1.
 
Your probability of the Yankees
winning is 1.
So now what's your opinion
going to be?
If you've won 2 games then the
Yankees have to have won.
What if the Yankees win the
first game?
Remember the numbers we got 1,
and .6, and 0,
so here's the .84.
 
It's the average of 1 and .6.
 
Here's the .36 which was the
average of .6 and 0.
And then we come down to the
middle which is .648.
So what do I do if I want to
play a 7 game World Series?
I have to get rid of this,
and if it's a 7 game World
Series I would just--
now I want to restore what I
had before,
so I'm going to copy all this,
ctrl,
copy, ctrl.
So I'm back to where I was
before.
So you see what I'm doing here?
 
The game hasn't started.
 
This is the first game,
second game,
third game, fourth game,
fifth game, sixth game,
seventh game.
 
Every square is just saying my
opinion is my average of what my
opinion will be next time.
 
If I want to make it a 7 game
World Series I just plug in 1s
here.
 
There must be some faster way
of doing this,
but I plug in 1s here.
 
So ctrl, copy and here are all
the 1s down to above the thing,
ctrl V, and now I've solved my
opinion backwards and I've got
the chances of the Yankees
winning a 7 game World Series
are 71 percent.
 
So the longer the World Series
goes the better the chances are
the Yankees win if they're
better in each individual game,
and you can do it instantly.
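
The same backward step works for a series of any odd length. The sketch below is one way to write it, keeping only the current layer of the tree; `series_win_probability` is a name chosen for this example, and it assumes, as in the lecture, that every game is played out and each is won independently with the same probability.

```python
def series_win_probability(games, p=0.6):
    """Chance the Yankees win the majority of `games` games, by backward induction."""
    # Final layer, indexed by number of Yankee wins: 1 for a majority, else 0.
    layer = [1.0 if wins > games / 2 else 0.0 for wins in range(games + 1)]
    # Each step back replaces the layer by the one just before it in time.
    for t in range(games - 1, -1, -1):
        layer = [p * layer[k + 1] + (1 - p) * layer[k] for k in range(t + 1)]
    return layer[0]


for games in (1, 3, 5, 7):
    print(games, round(series_win_probability(games), 4))
```

With 7 games this gives about .71, the number on the spreadsheet, and the printed values rise with the length of the series.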
 
So are there any questions
about that?
So that is a trick we're going
to use over and over again to
price bonds.
 
You do it by backward induction
because of the law of iterated
expectations.
 
Your opinion today of what's
going to happen way in the
future when you get a lot of
information has to be the
average opinion you're going to
have after you get some
information,
but before you know what the
final outcome is.
 
And so realizing that,
you just take the pieces of
information one by one and work
backwards from the end and you
can solve things instantly which
would take in the brute force
way an exponentially growing
length of time to do if you did
them path by path.
 
I now want to turn to an
application of this to one
subject, which is,
let's just not do the World
Series.
 
Let's do a more interesting
problem.
I hope I have time to finish
this story.
So the more interesting problem
is this.
Let's suppose our uncertainty's
of a different kind.
Instead of not knowing the
outcome of the World Series
let's say we don't know how
impatient we are.
So remember the most important
idea so far that we've seen,
because we haven't done
uncertainty yet,
the most important idea we've
seen so far is impatience.
That's the reason why you get
an interest rate and the
interest rate is the key to
finding out the value of
everything.
 
So Irving Fisher put tremendous
weight on impatience.
And now that we're talking
about uncertainty the natural
thing to make uncertain is how
impatient you're going to be.
So we want to talk a little bit
more about impatience.
So impatience by Irving Fisher
is the discount.
So in fact I want to talk about
this in sort of realistic terms.
Do we really believe that
people just discount the future,
1 year they discount by delta,
2 years discount by delta
squared,
3 years by delta cubed,
4 years by delta to the fourth.
 
Is it really true that every
year people think of as delta
less important as the year
before?
I mean, the argument for this
is you might not live beyond a
certain--
you know, poor imagination,
so imagination,
poor imagination,
we've said this before,
poor imagination and mortality
are the two arguments for
discounting.
But let me tell a story that
seems to contradict that.
Suppose someone asks you to
clean your room and they give
you a choice of doing it--I can
give my son for example.
Say I--"Clean your room
Constantin,"
and so if I say do it today or
do it tomorrow that makes a huge
difference to him,
I mean just a huge difference
doing it today from doing it
tomorrow.
He'll think doing it today is
just impossible,
doing it tomorrow I can almost
force him into agreeing to that.
So clearly there's a big
discount between today and
tomorrow, but what about between
a year from now and a year and a
day from now?
 
Do you think Constantin will
think there's any difference in
that?
 
The answer is no.
 
If I say, "Constantin,
do you agree to clean it 365
days from now or 366 days from
now," to him there's hardly
any difference,
there's hardly any tradeoff.
One is hardly more valuable
than the other,
of course, they're both pretty
unimportant, but the ratio of
the two doesn't even seem
important to him.
So that's called hyperbolic
discounting.
If you do any experiment with
people or with animals,
you make a bird do something
and if he does more stuff he
gets the things faster,
he'll do a lot of stuff to get
it in the next minute as opposed
to in 2 minutes,
but the difference between what
he'll do in 10 minutes versus 11
minutes is very small.
 
So hyperbolic discounting is
discounting much less than
exponential discounting.
 
So this has a tremendous
importance for the environment.
If you thought that people
exponentially discounted like
they thought each year was only
95 percent--
if the interest rate's 5
percent it sounds like the
discounting is .95,
so if next year's only 95
percent as important as this
year,
and the year after that is only
95 percent as important as the
first year,
and the third year is only 95
percent as important as the
second year,
.95 to the hundredth,
in 100 years, is an incredibly small
number.
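
To see how small, here is a quick check of the exponential discount factor. The function name `exponential_discount` is made up for this sketch; the .95 and the 100 years are the numbers used in the lecture.

```python
def exponential_discount(years, delta=0.95):
    """Weight put on a dollar `years` from now when each year counts delta as much as the last."""
    return delta ** years


for years in (1, 10, 50, 100):
    print(years, exponential_discount(years))
```

At 100 years the weight comes out a little under .006, so a dollar of benefit a century away counts for less than a cent today.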
 
So there's no point in doing
something today and investing a
lot of resources in order to clean
up the environment and help
people 100 years from now,
because by discounting it this
much nobody could,
you know, what's the difference
because the future's so
unimportant.
You shouldn't be investing
resources now to do something
that's going to have such a
small effect later.
So in all the reports on the
environment a crucial half of
the report is devoted to what
the discount rate should be.
So, but they never thought of
doing the most obvious thing
which is to ask what would
happen if the discounting was
uncertain.
 
All of these are certain
discount rates.
So what if you made the
discounting uncertain what would
you imagine doing?
 
So suppose you discount today
at 100 percent,
and maybe next period you're
going to discount at 200
percent,
this is the interest rate,
and here it might go down to 50
percent.
It could go up to 400 percent
or it could go down to 100
percent again,
or it could go down to 25
percent, you know,
this kind of discounting I have
in mind.
 
You don't know--so delta = 1
over (1 + r), and this is r,
r_0,
r_up,
r_down.
 
So maybe the discount is
uncertain and it goes like that.
So it's a geometric random walk.
 
I keep multiplying or dividing
by 2.
I multiply or divide by 2.
 
I multiply or divide by 2.
 
That seems to make for a lot of
discounting.
These numbers are going up very
fast.
The higher the r,
the less you care about the
future.
 
So the question is if you ask
for a dollar sometime in the
future, what will people be
willing to pay for it?
So you know today that you
think the future is only half as
important as the present.
 
Let's say these all have
probability 1 half.
And tomorrow it might be that
you think the future is only 2
thirds,
the next year's only 2 thirds
as important as that current
year,
or you might think the future's
only 1 third as important as
this year.
 
So you see how this is working?
 
Two years from now you might
think the future's only 1 fifth,
the third year's only 1 fifth
as important as the second year.
Here you might think the third
year is half as important as the
second year.
 
Here you might think it's 4
fifths as important as the third
[correction: second]
year.
So you don't know what it's
going to be,
and if anything this process
seems to give you a bias towards
getting really high numbers,
high discounts,
meaning the future doesn't
matter.
So, but nobody bothered to
stop--so this is the most famous
interest rate process in
finance.
This is called the Ho-Lee
interest rate model where you
think today's interest rate
might be 4 percent.
Maybe it'll be 10 percent
higher next year or 10 percent
lower and it'll keep going up
and down like that,
and that's the uncertainty
about the interest rate.
So if we think interest rates
are so important,
and patience is so important,
and we want to add uncertainty,
the first place to do it is to
the interest rate,
and the Ho-Lee model in finance
does that.
Nobody bothered to compute this
out more than 30 years.
Compute what out?
 
Suppose you get 1 dollar for
sure in year 1.
How much would you pay for 1
dollar in year 1?
Well, your discount is 100
percent.
You'd pay 1 half a dollar.
 
How much would you pay for 1
dollar in year 2?
Well, you know how much more a
dollar now is worth than 1 year
from now,
but you don't know 2 years from
now so you have to work by
backward induction.
Here 1 dollar for sure is worth
1 dollar.
What would I pay for it here?
 
I'd pay 1 third of a dollar.
 
What would I pay for it here?
 
Well, the discount is 2 thirds.
 
I'd pay 2 thirds of a dollar.
 
So what would I pay for it back
here?
I'd pay 1 half times 1 third plus 1
half times 2 thirds, discounted
by 100 percent.
 
So that's 1 third plus 1 sixth, which
is 1 half, times 1 half,
which is 1 quarter,
I guess.
So I'd pay 1 quarter.
 
So for any time I could figure
out D(t) = amount I would pay,
I'm going to be done in one
minute,
amount I would pay today for 1
dollar for sure at time t.
And that number,
obviously, is going to go down
as t goes up,
and we know how to compute it
by backward induction.
 
You just put the 1s further and
further out and then you go
backwards by backward induction.
 
But just like for the World
Series I could do that for any T
however big I want to,
and on a computer,
and the spreadsheet which I
wrote for you,
you could do this instantly.
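
A minimal sketch of that calculation, assuming the tree described above: the interest rate starts at 100 percent and each period is multiplied or divided by 2 with probability 1 half. The names `short_rate` and `price_of_dollar` are hypothetical, and this is not the course spreadsheet; exact fractions are used so the small cases can be checked against the board.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # probability of going up or down (from the lecture)


def short_rate(t, ups, r0=Fraction(1)):
    """Interest rate after t periods with `ups` up-moves: r0 doubled per up, halved per down."""
    return r0 * Fraction(2) ** (2 * ups - t)


def price_of_dollar(T, r0=Fraction(1)):
    """D(T): amount paid today for 1 dollar for sure at time T, by backward induction."""
    # At time T the dollar is worth 1 dollar at every node.
    values = [Fraction(1)] * (T + 1)
    for t in range(T - 1, -1, -1):
        # Average next period's two values, then discount at this node's rate.
        values = [
            (HALF * values[ups + 1] + HALF * values[ups]) / (1 + short_rate(t, ups, r0))
            for ups in range(t + 1)
        ]
    return values[0]


for T in range(0, 6):
    print(T, price_of_dollar(T))
```

For T = 1 and T = 2 this reproduces the 1 half and the 1 quarter worked out above; larger T only means putting the 1s further out.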
 
And nobody bothered to do this
for T bigger than 30 because
bonds basically don't last for
more than 30 years,
so what's the point in doing it
for T bigger than 30?
So 100 years--there are
virtually no financial
instruments that are 100 years
long because they didn't bother to
do this.
 
Suppose you did it for every T
up to 1,000 years?
Well, you could do it on a
computer very easily.
You could even prove a theorem
of what it's like.
So in the problem set I'm going
to ask you to do a few of these,
and what you're going to find
is that people are hyperbolic--
that you get--you discount a
lot.
It's pretty close to 100
percent for the first few
periods,
but after that you're going to
be--anyway,
you're going to find out what
the numbers turn out to be when
you do it on a computer.
So we're going to start with
random interest rates next
period, the most important
variable in the economy.
 
 
