Okay, so let's get started.
Happy Halloween.
Halloween is one of my favorite holidays.
Symbolic of 110 in various ways.
In particular, I think we learn
a lot of tricks in this class.
You get a lot of treats too.
So it's the perfect Stat 110 holiday.
Speaking of tricks,
a student from last year emailed me and
told me that there's an open request
at the Bureau of Study Council for
someone to quote,
teach me all of Blitzstein's tricks.
So that's probably gonna
be difficult to fulfill,
so that's an open request with the BSC,
but
anyway, a lot of tricks and
a lot of treats.
So today's gonna be a very,
very scary lecture as befits the holiday.
In particular, the first half,
we're gonna do the beta distribution.
And to give you some indication of how
scary, terrifying the beta distribution is
I'll tell you a true story which is
the first time I taught Stat 110,
students were just so clearly
terrified of the beta distribution.
I didn't really know why,
actually it should not be scary, okay?
So I'll try to show you how it's not
that scary, but for some reason,
they were extremely scared
of the beta distribution.
Then on the final, I decided to be nice.
I didn't put anything involving
the beta distribution, cuz they were so
scared of it.
But then that really backfired on me,
because I think the students
were seeing the beta
distribution like a ghost.
And they were just seeing
betas lurking everywhere.
And when I was grading them with the TAs,
there were just beta distributions
popping up everywhere on the exam.
Even the questions that had nothing to
do with the beta, they were so scared.
Is there a beta lurking
here behind the corner?
And it was just beta everywhere.
So I think it would have been nicer
if I'd just put one easy beta
question at the beginning, then you kind
of get that fear out of your system.
So anyway, let me tell you
what the beta distribution is.
It's not scary but for some reason,
it has been in the past but
it shouldn't be, okay?
Well, what is it?
It's a generalization of
the uniform distribution.
So far, the only distribution
we know by name that is both
continuous and
bounded is the uniform, right?
The uniform,
let's say uniform from zero to one.
Well, by definition it
goes from zero to one.
And the PDF is just completely flat, okay?
What if you want to generalize that and
other continuous distributions like
the normal goes from minus
infinity to infinity?
Exponential goes from zero to infinity.
What if we want something that's still
bounded between zero and one, but
is not just flat?
Well, by far, the most widely used
distribution that meets those criteria is
called the beta distribution.
And it's not one distribution
because it has parameters.
So, it's a whole family of distributions,
which we call beta a, b.
So it has two parameters,
a and b, where a and
b are any positive real numbers, Okay?
And it takes values between 0 and
1, as I said, and
let's write down the PDF.
The PDF is c times x to the a - 1,
times 1 - x to the b - 1, for
x between 0 and 1, and 0 otherwise.
c is a constant, the normalizing constant, okay?
So one question we'll have to deal with
is, can we actually integrate this thing,
dx from 0 to 1, so we can figure out what
c has to be to make it integrate to 1?
And that's a famous integral in math.
That's called, if you integrate this,
that's called the beta function.
And it has a long history in math
aside from its applications in
probability and statistics.
So we will talk about that integral at
some point, but there's actually a lot
that we can get from studying this without
even knowing what the constant is yet.
So, now here are just some facts and
properties.
We've been emphasizing stories, right?
Why write this down unless
we have a story for it?
The beta distribution actually has many
stories to it, and we're not gonna try to
do all of them today, but we'll see
more and more stories for the beta.
But the most basic reason why betas are
interesting is just that it's a flexible
family of continuous distributions on (0, 1).
And by flexible,
I mean that as you vary a and b,
you can make this take different shapes,
right?
So just for a couple quick examples,
if we let a = b = 1,
then this whole thing goes away.
We just have a constant, so
that's just the uniform distribution.
So it is a generalization of the uniform.
So the PDF would just look like that.
But we could let a = 2 and
b = 1, for example,
they don't have to be integers either.
It can be any positive real numbers, okay?
We could go, a = 2,
b = 1, this term is gone.
Then we just have x, so we will just have
something that increases linearly up to 1.
This is 1 here, right?
It would just increase linearly.
And the slope is whatever makes
the area of the triangle equal 1,
so figure out the slope from that.
More interesting ones,
if a = one-half and b = one-half,
you get something that looks like
a U shape, kind of like this.
Here's 1.
And notice that if it's one-half,
that's x to the negative one-half.
So as x approaches 0, it's gonna blow up.
So this is going up to infinity over here.
And as x goes to 1,
this term is gonna blow up.
So it's asymptotic like that, so
you can get something like that.
And if you let a = b = 2,
then you'll get something that is just
a nice looking upside down U shape.
These are just a few examples.
You can plug in other values, but
that's why I say that it's flexible.
That there isn't one shape that it takes,
and that makes it a useful modeling tool.
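Just as an illustration (mine, not from the lecture), these shapes are easy to check numerically if you borrow the standard normalizing constant Γ(a + b)/(Γ(a)Γ(b)), which we haven't derived yet:

```python
from math import gamma

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), using the standard normalizing constant."""
    const = gamma(a + b) / (gamma(a) * gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

# a = b = 1: completely flat, the Uniform(0, 1) density
print(beta_pdf(0.2, 1, 1), beta_pdf(0.9, 1, 1))  # both 1.0

# a = 2, b = 1: increases linearly; the slope makes the triangle's area 1
print(beta_pdf(0.5, 2, 1))  # 1.0, i.e. the line 2x evaluated at x = 0.5

# a = b = 0.5: U shape, blowing up near 0 and near 1
print(beta_pdf(0.01, 0.5, 0.5) > beta_pdf(0.5, 0.5, 0.5))  # True

# a = b = 2: nice upside-down U, peaked at 1/2
print(beta_pdf(0.5, 2, 2))  # 1.5
```

Varying a and b this way is exactly the flexibility being described: one formula, many shapes.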
So the main use of it is that it is often
used as a probability for probabilities.
That is it's often used as a prior,
For a parameter,
that takes values between 0 and 1.
So this is going back to
some homework stuff you had.
And we talked a little bit in class
about the Blitzstein approach
where we treat the parameters
as random variables.
So if we have a parameter
that's a probability,
we know it's between 0 and 1, and
we wanna give it a prior distribution.
The beta distribution is, by far,
the most widely used choice in practice,
because it has a lot of nice properties.
In particular, it's what's called
the conjugate prior to the binomial.
If you take Stat 111, you'll be
seeing a lot of conjugate priors.
Obviously, I'm not assuming you
know the term conjugate prior yet.
I'll explain what this means.
So this is foreshadowing Stat 111
in a sense, but it's also just a Stat
110 calculation that you should be able
to do once you know what conjugate prior
means, which I'll explain in a bit.
Okay, and
then it also has various connections with
other distributions that we'll see later.
So it's a very well
connected distribution.
It has lots of nice properties and
relationships with other distributions and
it makes it a useful tool.
Okay, so,
let me explain what this conjugate
prior thing means cuz I just wrote that.
But obviously you don't
know what that is,
unless you happen to
have seen it before.
What we're gonna do is generalize
Laplace's rule of succession.
So remember that was the problem with,
what's the probability that the sun rises,
kind of thing.
Where we assume complete ignorance on
the probability of the sun rising.
Well that's what Laplace did, obviously it
doesn't have to be about the sun rising,
we're actually not that ignorant.
You have some problem where
we're completely ignorant, and
we modeled that probability
using a uniform prior, okay?
So right now we want to
generalize that and say
what happens if instead of a uniform
prior, we have a Beta(a, b) prior?
And we don't even know what
the normalizing constant is yet.
But we don't actually need that for
this calculation, as you'll see.
Okay, so here's the conjugate prior thing.
And I'll explain the word conjugate
as this develops, for the binomial.
So here's the basic problem
which will look familiar, but
we're generalizing, okay.
So the problem is we
observe a binomial given p:
X given p is Binomial(n, p).
Okay, that's our old friend the binomial.
This notation means that if we pretend
that p is known,
it's just our friend the binomial.
But p itself Is unknown and
we're giving it a distribution
to reflect the uncertainty of p.
So in practice, the most widely used
choice is some beta distribution.
So we're gonna let p be Beta(a, b).
Okay, so that's called the prior.
That is our reflection of
our uncertainty about p.
That is we don't know
the true probability, so
we're just treating it
as a random variable.
X is the data,
X is what we get to observe.
Before we observe X, we have a prior on p,
that's our prior uncertainty.
So this is not based on the data, okay.
Then after we observe x then we
wanna update our probabilities,
right, using Bayes' rule.
So that's why this is
called Bayesian statistics,
cuz we're updating our beliefs
based on the evidence.
So we want to find
the posterior distribution.
That is the distribution p given X.
Now we get to know X,
and what's our update?
Originally we have a Beta (a,b), and then
what happens after we're allowed to use X?
So, how do we do this?
Well Bayes' rule, so this is one of these
hybrid forms of Bayes' rule, right.
So just to make the notation clearer,
I'm going to write this as f(p|X = k).
You can also write this with
just the capital X,
under the interpretation that it's just
shorthand for something like this.
That is, we get to treat X as known.
Okay, so this means you know,
this is a posterior PDF.
Right, p is continuous, so it has a PDF.
So this is what the PDF
of p given that we got
to know now that X is in fact equal to k.
Alright, so that is what that means.
So by Bayes' rule,
that's P(X = k | p) times f(p),
that's the prior,
divided by the probability P(X = k).
And in the denominator,
there's a key distinction between
the numerator and denominator.
Obviously you can't cancel these out:
the numerator is a function of p,
but the denominator is not.
The denominator says, integrate over all p,
so it's just going to be
a constant with respect to p.
It will depend on k, but it won't depend
on p, because we've integrated out the p.
So the denominator does not depend on p.
All right,
now let's just see what we have here.
Probability of x equals k given P.
That's easy, that's just n choose k,
p to the k, 1- p to the n- k.
Now f of p, that's just the beta density.
So that's a constant whatever
that constant c is over there.
We don't know what c is,
but it wont matter.
And p to the a - 1, times 1 - p to the b - 1,
divided by the probability that X = k.
Now, seemingly, to compute
the probability that X = k, we would have to
use the law of total probability.
That is, we integrate and condition on p.
And we can do that, but
there's a much faster way to do this,
which is just to look at the structure
of what we have, up to proportionality.
Okay, so I'm just gonna write
the proportionality symbol,
that is I'm gonna ignore constants.
And by constant I mean something
that does not depend on p, okay.
So let's ignore constants.
So actually we can ignore the n choose k,
cuz there's no p there, we can ignore c,
cuz there's no p there, and we can ignore
this thing, because there's no p there.
So, all that's left is this p to
the power, 1 minus p to a power.
So we have p to the a + k - 1,
just adding the exponents,
times 1 - p to the b + n - k - 1, and
that immediately tells us the answer.
So we didn't have to do any
complicated calculations at all
just group the ps together,
1-ps together ignore the constants.
And then that immediately
tells us that p given x,
that's a beta distribution, right.
It looks like a beta, there's some
constant in front but the constant is just
whatever constant is needed to
make this integrate to one, right.
So its still of the form
of a beta distribution.
So this is going to be
a beta with parameters.
k got added, but if I write it this way,
I'm just going to write
Beta(a + X, b + n - X).
This is what the notion of
conjugate prior means.
The definition of conjugate:
it's not just one prior,
it's a family of priors, and
that family of priors is called conjugate.
If we start with a member of the family,
so
we're looking at not just one beta
distribution, it's the family of all
beta distributions where you can
change the parameters around.
We start with a beta distribution
for the prior,
compute the posterior, and it's still beta,
where these are updated parameters.
That's called the conjugate property,
so this is the conjugate prior for
the binomial.
And this has a pretty intuitive
interpretation, that if we think
of X as the number of successes
and n - X as the number of failures.
The way you can remember this result and
understand it intuitively
is to pretend that we had some prior data,
an earlier experiment
where we had a successes and b failures,
and now we're adding on X
new successes and
n - X new failures, okay?
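Here's a quick numerical check of the conjugacy (a sketch of mine, not from the lecture), comparing prior times likelihood against the Beta(a + k, b + n - k) density; if the posterior really is that beta, their ratio should be the same constant at every p:

```python
from math import comb, gamma

def beta_pdf(p, a, b):
    """Beta(a, b) density, with the standard normalizing constant."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * p ** (a - 1) * (1 - p) ** (b - 1)

def prior_times_likelihood(p, a, b, n, k):
    """Beta(a, b) prior at p, times the Binomial(n, p) likelihood of X = k."""
    return beta_pdf(p, a, b) * comb(n, k) * p ** k * (1 - p) ** (n - k)

a, b, n, k = 2, 3, 10, 7  # example numbers, chosen arbitrarily
# Posterior ∝ prior * likelihood, so dividing by the Beta(a + k, b + n - k)
# density should give the same constant at every value of p.
ratios = [prior_times_likelihood(p, a, b, n, k) / beta_pdf(p, a + k, b + n - k)
          for p in (0.2, 0.5, 0.8)]
print(max(ratios) - min(ratios) < 1e-12)  # True: same beta family, updated parameters
```

The constant ratio is exactly the "ignore constants" step of the derivation: everything that doesn't involve p drops out.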
So it stays in the family.
That's computationally very, very, nice.
But that doesn't mean it's true in any
sense, and there are philosophical debates
about whether you should use conjugate
priors, whether that's the correct way
to reflect your uncertainty,
things like that.
But there's no denying that it's
convenient cuz you start with a beta you
still have a beta.
You don't need to work out a whole
new theory of some new distribution.
So it's very convenient in this sense,
very, very flexible.
So this is a generalization of
the Laplace rule of succession.
That was the case where we let this be
Beta(1, 1), which is uniform, okay,
but the same thing works.
So next time we'll derive the mean and
the variance of the beta,
but you can already see that that's
going to be kind of nice too.
Because, just stare at this density,
I won't write it out now,
we'll do this next time.
But just to have some intuition for what's
going to happen, look at this density.
Suppose we wanted to compute
the expected value of x.
Then we're just going to integrate
this thing with an x in front, right?
Just definition of expected value.
So we could combine this x with
the x to the a - 1; we'd have x to the a.
It looks like a beta again.
So once we know the normalizing constant,
we can immediately write down the answer.
What if you wanted the second moment,
which you would use to get the variance?
Then by LOTUS you would be putting an x
squared in front to get the second moment.
x squared times x to the a - 1 is x to the a + 1.
It still looks like a beta.
So once we know the normalizing constant,
we can immediately write down the answer.
So it has very,
very nice properties in that sense.
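As a preview (a sketch of mine, not the lecture's derivation, which comes next time): the mean will turn out to be a/(a + b), and you can already confirm that numerically using LOTUS and the standard normalizing constant:

```python
from math import gamma

def beta_pdf(x, a, b):
    """Beta(a, b) density with the standard normalizing constant."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)

def moment(power, a, b, steps=100_000):
    """E[X^power] for X ~ Beta(a, b), by LOTUS with midpoint-rule integration."""
    h = 1.0 / steps
    return sum(((i + 0.5) * h) ** power * beta_pdf((i + 0.5) * h, a, b) * h
               for i in range(steps))

a, b = 3, 5  # example parameters
print(round(moment(1, a, b), 4))  # 0.375, which matches a / (a + b)
variance = moment(2, a, b) - moment(1, a, b) ** 2
print(round(variance, 4))  # 0.026
```

The point of the lecture's observation is that you never actually need numerical integration here: x times the density is another beta kernel, so the answer pops out once the normalizing constant is known.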
All right, so we're not gonna
do the full normalizing constant today.
But I wanted to do one important special
case which is the case of how do you get
the normalizing constant
in the integer case, okay?
So a and
b don't have to be integers, but
let's assume that we
have integers up there.
So another way to phrase that question is,
here's what we have to do,
we have to integrate.
From 0 to 1, of, let's say,
I'll write it in a way that
looks a little more reminiscent
of a binomial, but
it's the same basic thing:
x to the k, times 1 - x to the n - k, dx.
I wanna do that integral.
Now we have integers,
nonnegative integers up here, and
k is an integer between 0 and n.
We wanna do this.
Well, okay, so if I first gave
you this integral you might say,
well use the binomial theorem,
expand this out.
Multiply this whole thing out, integrate
every term, and then try to simplify and
it will be very tedious, okay?
So let me phrase this
question a different way.
Find the integral of x to the k, (1-
x) to the n- k, without using calculus.
So my favorite calculus problems are the
ones where you don't have to use calculus.
And so, that sounds impossible.
That's actually one of
Bayes' most brilliant ideas.
Bayes' rule, itself, as we saw, you just
write down a definition of conditional
probability it's just like
one line you got Bayes' Rule.
But Bayes also had an argument,
around 1760-something,
where he did this integral
without using calculus.
So I wanted to show you his argument
cuz it's very beautiful but
it also gives us the normalizing
constant for these integer cases, okay?
So what does it mean to do
it without using calculus?
Well, we're going to use a story.
And that story is called Bayes' billiards.
Okay, so this becomes a little bit
nicer if I put an n choose k here.
I'm just multiplying by a constant.
So we could always adjust for that later,
but this looks a little bit nicer.
All right, so here is what Bayes did.
He said, consider that we
have n + 1 billiard balls,
that's like if you're playing pool.
This is exactly Bayes' argument in the 1760s.
He said take n + 1 billiard balls and
then you can do one of two things.
Originally they're all white and
they all look the same and then we paint
one of them Pink.
So now at this point we have n white
balls and one pink ball, okay?
That's something we could do,
and then throw down the balls,
throw them on (0,1) independently.
That is, we're just positioning them,
we have the number line from 0 to 1.
Here's 0, here's 1.
And just throwing all the balls down,
you get some arrangement here,
some white balls here.
And the pink ball
has to land somewhere;
maybe it's there, this is just an example.
So we have iid uniforms.
Okay so that's the first story,
Here's an alternative.
So here we painted the ball first.
We painted one ball pink,
threw them all down.
But another thing we could do is throw all
the balls down at the number line, and
then paint a random one pink.
So first throw them all down.
Then paint one pink.
Okay now if I just
showed you this picture.
And didn't tell you where that picture
came from you would have no idea whether
it came from this story or
this story, right?
They're completely equivalent procedures;
why should it matter whether you painted
it at the beginning or the end?
It doesn't matter.
The final configuration is gonna have
the same distribution, all right.
So the key to the argument
is that these two
ways of generating a picture that looks
like this are completely equivalent.
That tells us the value of that
integral, because now let's just
let x equal the number of balls
to the left of the pink one.
Which I denote like that, okay, that's x.
So in this picture, x equals three, for
example, just count how
many are to the left.
That's going to be some integer
between zero and n, all right?
Now, according to this first story it
doesn't matter what order
we throw the balls in.
So we may as well throw the pink one
first, then condition on where that is.
And so we want an expression for
the probability that x = k.
That is, there are exactly k white
balls to the left of the pink ball.
So by the law of total probability,
we should just integrate.
The probability that X equals k given p,
times f of p, dp, where p is
the position of the pink ball,
p for pink and p for probability.
And so that's something we
can write down easily because
f of p is just 1, because we're
assuming it's uniform between 0 and 1.
The probability that X equals k given p,
that's just a binomial, because we define
success for the white balls: success
is defined as being to the left of the
pink ball, failure is being to the right.
So we just want to count
the number of successes.
That's simply a binomial:
n choose k, p to the k,
1 minus p to the n minus k, dp.
Notice that's the exact
same integral we had.
We changed the dummy variable, x to p.
On the other hand,
we're running out of space here.
But, on the other hand we can
just write down the answer,
because from this second story,
this is story one, this is story two.
Story two I first threw the balls down,
paint a random one pink equally likely.
So it's equally likely that any number
of white balls is to the left
of the pink one.
That is, all the integers from zero to n
are equally likely, so
it must be one over n plus one.
So that integral is just
one over n plus one.
And you can write that down just by
looking at the picture without actually
doing integration by parts.
Or expanding things out term
by term all that stuff.
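Here's a sketch (mine, not from the lecture) that checks the 1/(n + 1) answer both ways: grinding out the integral numerically, and simulating story two by throwing n + 1 uniforms down and counting how many land to the left of the "pink" one:

```python
import random
from math import comb

def integral(n, k, steps=100_000):
    """Midpoint-rule value of the integral from 0 to 1 of C(n,k) p^k (1-p)^(n-k) dp."""
    h = 1.0 / steps
    return sum(comb(n, k) * ((i + 0.5) * h) ** k * (1 - (i + 0.5) * h) ** (n - k) * h
               for i in range(steps))

n = 6
print([round(integral(n, k), 4) for k in range(n + 1)])  # every entry is 1/7 ~ 0.1429

def simulate(n, trials=100_000):
    """Story two: throw n + 1 balls down uniformly; X = # balls left of the pink one."""
    random.seed(0)  # fixed seed so the run is reproducible
    counts = [0] * (n + 1)
    for _ in range(trials):
        positions = [random.random() for _ in range(n + 1)]
        pink, whites = positions[0], positions[1:]
        counts[sum(w < pink for w in whites)] += 1
    return [c / trials for c in counts]

print(simulate(n))  # each frequency close to 1/7
```

The simulation is the heart of Bayes' argument: the position of X is uniform over {0, 1, ..., n} by symmetry, so no integration by parts is ever needed.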
So that argument gives us the normalizing
constant for some cases of the beta.
Next time we'll do the general normalizing
constant and see some of the connections
to other distributions.
We have one other very scary thing for
today which is finance.
We have a special guest today Steven
[INAUDIBLE] who teaches STAT 123,
just to briefly introduce him.
123 has gotten better reviews from people
who've talked to me about it than almost
any other course I can think of.
A lot of students have told me that when
they went and did finance internships,
they felt way better prepared than their
fellow interns because of having had 110
and 123.
Just to tell you very briefly a little
bit about Steven's background,
which is kind of unique.
He got his PhD from
the stat department here,
so that's a proud claim to fame for
our department.
He then taught at Imperial College,
London.
He has many years of experience
working on Wall Street.
And now he's Managing Director at
the Harvard Management Company,
which is responsible for managing a large
portion of the Harvard Endowment.
I think it's definitely
true that at Harvard,
you will not find anyone else with
the same combination of
practical experience and
academic training.
Combining actually having
been on Wall Street for
years with having
a PhD in statistics.
You won't find that in
any other Harvard course.
I don't actually think you'll find that in
any other course in the entire country.
So we're lucky to have that course,
STAT 123.
Which is not offered this spring,
but it will be offered next year.
And so, Steven didn't wanna miss out
on your group, so, take it away.
Let's all welcome him, and
let me give you the microphone.
>> [APPLAUSE]
>> All right, well, thank you to Joe.
And thank you all for giving me 20,
25 minutes of your time.
It's always a pleasure
to talk to Stat 110.
I'm here for
a couple of reasons really, firstly
STAT 123 that Joe mentioned is
a reasonably new course here at Harvard.
I'll be giving it for
the third time in 2012-13.
So I wanted to give you a little
bit of flavor of what it's about.
But secondly, STAT 123 has one
prerequisite, which is STAT 110.
So you all here are my eligible
receivers so to speak, or
eligible to take the course, and I wanted
to give you a little bit of its flavor.
I apologize to the seniors in the class,
if there are any, who won't be
able to take the course, but
I hope the next 20 minutes might be
intellectually appealing even for them.
There are gonna be a couple of brain
teasers that might make you think a little
bit about probability and
its relationship with the financial world.
Probability is, to borrow a phrase,
the soul of finance.
It's the theory that underpins
most of quantitative finance and
hopefully I will show a little
bit about that today.
So for those of you who enjoy this
course and want to see what I think
is an intellectually
appealing use of probability,
one which is also very real-world oriented,
that is what STAT 123 is about.
Just one further word on prerequisites.
STAT 110 is the only prerequisite.
You don't need to know anything
about finance to take STAT 123.
The whole of [INAUDIBLE]
finance was essentially built
by people who knew nothing about finance,
and still don't really.
It was built by probabilists,
mathematicians, statisticians, and physicists.
A lot of physicists moved to finance
when the Superconducting Super Collider
got cancelled in 1993;
they all went to Wall Street and
built quantitative finance.
So you don't really need to
know anything about finance.
Obviously if you're interested in a
financial career it may be appealing, but
many students flourish in STAT 123 without
any interest in a financial career.
So I've had people who wanted to go to
Juilliard or play professional soccer,
but were just interested in
a new way of thinking,
a new way of reasoning about uncertainty,
which uses probability to a great degree.
If you have a visceral dislike
of finance, that's totally fine,
but it's probably not worth taking
STAT 123 in that case.
I do get a couple of comments that STAT
123, that finance, is all about making money.
Well, of course,
that's what finance is about
[LAUGH] it's about making money and
the financial world.
So if you have a visceral
dislike of that, which I
totally empathize with, perhaps don't take it.
So let me tell you a little
bit about the course.
Let's write this down.
Applied Quantitative Finance
on Wall Street.
I'm thinking about dropping
"Wall Street" for 2012-2013;
there are many pejorative
words in this title.
A natural subtitle
to this course is
Applied Financial Derivatives.
This course is very
much about derivatives.
So two natural questions might be:
what is a financial derivative, and
why is this a probability course,
a stat course?
Hopefully we can answer them together.
So a financial derivative has
nothing to do with calculus.
Or at least its definition has
nothing to do with calculus.
There's some calculus involved in
thinking about and valuing derivatives.
A financial derivative,
the term derivative means derives from.
So a financial derivative is a contract,
a bet, an agreement between two people,
whose payout at
some maturity date is a function of,
or derives from,
the value of some other random variable.
So for instance,
if I enter into a contract with Professor
Blitzstein where he pays me $1 if
the total snowfall in Boston is
above 80 inches this winter, okay?
That is a weather derivative, right?
The $1 is a function of
some other random variable,
in this case the number of
inches of snow in Boston.
That's simply what a derivative is.
In a financial derivative,
the underlying random variable
that the payoff is a function of
is usually a financial asset.
It's the price of a financial asset.
And typically written s.
So in finance,
random variables are s rather than x.
Okay, cuz s for stock.
So if you think s sub t is just the price
of, say, Google stock in one year,
that is a random variable, okay?
We don't know what it is, it'll take
potential values in a year's time, okay?
And a financial derivative is
simply something that pays off
a function of that.
Okay?
For instance, g of S might be
the indicator function of S above 500.
So if Google is above $500 in a year's time,
you pay $1, or
you get $1, from your contract, okay?
You might ask, well,
who cares about contracts like this?
We'll see particular forms of g
a little later.
Well, g of s, right?
A function of a random variable
is another random variable, okay?
So we're gonna get caught up in,
or use random variables,
and it turns out, and
this is one of the key results in finance,
it's called the fundamental theorem of
finance, because it's so fundamental,
is about the price that you would pay for
a derivative contract.
So even just think about
this weather contract.
How much would you pay for that?
Well, it'd be related to the probability
of getting more than 80 inches of snow,
right?
It'd be related to the distribution
of that random variable.
And it turns out, what you'd actually
pay for the contract is very,
very closely related to
the expected value of g of s, okay?
All right, in fact,
there's a very powerful
result that says if I choose
the random variable correctly and
choose the distribution correctly,
it is exactly an expected value.
So expectation is gonna
run throughout Stat 123, but
you know how to do this
if you know the PDF of S.
This is just what I like to call POTUS,
I mean LOTUS, right?
I always get those confused, but
it's just the integral of g(s) times f(s), ds,
okay?
All right, so LOTUS is gonna play a major
part in quantitative finance, okay?
Okay, that's not all there is to finance,
but that's pretty much it.
When I arrived on Wall Street, well,
I actually TF'd STAT 110 here many years ago.
I loved it.
There were about 25 people in the class then,
so it was easier to TF.
It was a great class.
I arrived on Wall Street.
And the first day I was on Wall Street,
I was asked to calculate
the expected value of a function.
It was a function of a bivariate normal,
and they thought it was
an incredibly hard problem.
It's not trivial, but I thought,
I can do that, because I knew how
to calculate expected values.
I did that my first morning.
And they thought, wow, this guy's really
good, he knows all about finance.
But all I knew was bivariate LOTUS.
And that, that stood me in good stead for
the rest of my career.
So, all right, here are a couple of,
well, I won't call these brain teasers,
cuz these examples
are also interwoven with Stat 123.
So I hope if you take it,
this lecture will echo back and
you'll think, yeah,
I remember that teaser.
If you have any questions about
the course, before I get on to these,
please feel free to e-mail me.
And two of your TFs in this class, Bo and
Jessie, have either taken the class or
TF-ed it or both.
So feel free to ask them about it.
Obviously, I've chosen them
cuz they enjoyed it a lot.
Okay, so here are a couple of brain teasers.
First one comes from foreign exchange.
So foreign exchange is a fairly
straightforward concept.
So how much is a Euro worth in dollars,
right?
So one euro is worth about $1.40 now.
That moves up and down,
it's sort of a stochastic process.
And one pound, which is close to my
heart, is worth about $1.60.
These are things that people care about.
Obviously, if you're doing traveling or
if you're doing trade.
And one thing you might care about
is where these things are gonna
be in a year's time, all right?
How can I predict what the value
of, say, a euro will be in a year's time?
Okay, it's obviously of
interest in the economy.
You might think if you could
actually do that prediction,
if that prediction was actually possible,
then the person who could do that would be
phenomenally rich, infinitely
wealthy, and would have retired, right?
So you might think it is
probably impossible to predict.
But anyway, so people try to, right?
And you can do it in
lots of different ways.
Build time series models,
you can think about economies.
But one thing you can do is just build
a little probabilistic model, and
that's what we're gonna do here.
A very straightforward one,
which takes two states of the world;
it's called a binomial model.
And so let's start off.
This is foreign exchange,
known as FX in the business.
We're gonna take a very simple world.
We're gonna think about
the value of 1 euro, okay?
Okay, let's assume that
right now it's worth $1, so
the price of 1 Euro is $1,
just to make the math easy.
So currently, there's $1.
Let's write that down here.
Per euro.
Okay, the actual number
is about 1.40, as I said.
And I'm gonna ask:
where do we think that is going?
Where do we expect this
to be in a year's time?
Okay, and we're gonna build a very
simple probabilistic model.
We're gonna say there
are two possible outcomes.
The sample space in a year's
time has two states.
One where the euro becomes worth more.
Okay, so let's just say it's gone up,
it's now worth $1.25, so the euro's worth more.
And one where it's worth less.
Let's say $0.80.
Okay, that's my, that's a random variable.
It's got two states of the world.
I'm actually gonna assign
probabilities to it.
Let's make it easy,
just say half and half.
All right, okay, there's a pretty
straightforward model, don't you think?
Might go up, might go down.
Okay, all right,
now we've gotta ask the question:
where do we expect it to be in a year's time?
Well, we can do expected value,
expectation very easily.
So the expected value of
a Euro in a year's time,
which I'll just write
expected value of Euro.
Right, so half times $1.25 plus
a half times $0.80, okay?
So that's easy.
That's $1.025.
So this is where I expect
the Euro to be in a year's time.
It will have appreciated on
average to $1.025, all right?
Okay, so
that's pretty straightforward expectation.
Okay, so that's what we expect for
the Euro, okay?
So, well,
we know what $1.00 must be worth, right?
$1.00 must be worth 1 over that, right?
Which, if you do that, is 0.9756.
Right?
Okay, so
that's straightforward also, right?
If a pound is worth $2, $1 is worth 50p,
pence, okay, that's true.
Right, so if a euro is worth $1.025,
$1 is worth about 97.56 euro cents.
They're called euro cents also.
All right, so
that's a foreign exchange prediction,
that seems pretty straightforward,
so we're done.
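The little calculation above can be sketched in a few lines (a minimal sketch with my own variable names; the numbers are the lecture's):

```python
# Two-state model for the dollar value of 1 euro in a year's time.
p_up, p_down = 0.5, 0.5          # probabilities of the two states
euro_up, euro_down = 1.25, 0.80  # dollars per euro in each state

# Expected dollar value of 1 euro in a year's time
exp_euro = p_up * euro_up + p_down * euro_down   # 1.025

# Naive guess for the euro value of $1: take the reciprocal
naive_dollar = 1 / exp_euro                      # about 0.9756

print(exp_euro, round(naive_dollar, 4))  # 1.025 0.9756
```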
But before we move on,
let me just restate this model, all right?
Well, pink chalk would be
good at this stage, I think.
I haven't had pink chalk before,
let's see if we can use this pink chalk.
Let's just restate the model.
Well, $1 per euro,
that's the same as one euro per dollar.
So let's just restate this model in terms
of the value of a dollar at each state.
So this is 1 euro per dollar, okay?
Up here, well it's $1.25 per euro,
well that's 80 euro cents per dollar.
Okay, and then this one is 1 euro and
25 euro cents per dollar.
Okay, same model, right?
Haven't done anything tricky.
All right, now what do we expect?
Let's just calculate the expected
value of a dollar under this model.
Well, the probabilities are still
a half, so just calculate that,
that's easy.
Hang on a sec, I've predicted
the dollar to be worth more now.
$1 is now expected to be worth 1 euro
and 2 and a half euro cents.
And so, you just walk into your
boss's office and say, I've got this
good model that predicts that the euro
is gonna appreciate to $1.025.
But hang on a sec,
the model also says
the dollar will appreciate to 1 euro
and 2 and a half euro cents.
So if you said that, of course, that
the dollar will appreciate and
the euro will appreciate,
you'd guess the response is: you're fired.
>> [LAUGH]
>> Now, how do we solve this?
All right, that's the first one.
Simplest probabilistic model you could
ever have, right, a binomial model.
And we've already come across
our first brain teaser.
And what I actually find more
intellectually compelling than the
teaser itself, which is very simple,
is that solving this, actually
understanding what's going on here,
is quite non-trivial.
And we attack it two or
three ways in Stat 123.
So, that's the first one.
Okay, so second one is TARP.
So this is, well,
I said this course is applicable.
Hopefully, this just gives some flavor of
how this course applies to what
really goes on in the financial world and
whether you're interested in it or
not, it certainly is an important
part of the US and the world.
So TARP, back in 2008, for the seniors of
you who arrived here in September 2008,
October 2008,
you know that was not a happy time for
the financial system or banks, or
the Harvard endowment for that matter.
So obviously,
a lot of things were going on in 2008.
And there's a great Econ course
called The Financial Crisis,
which I recommend to those
of you interested in that.
So what I'm gonna do is
I'm gonna focus in on one
particular part of the financial crisis
that was going on in October 2008.
Now this was called TARP.
TARP initially stood for
Troubled Asset Relief Program.
In the midst of all
this financial crisis,
there was a transaction where
the US government, i.e., taxpayers,
received from these private
institutions warrants, okay?
Okay, now warrants have nothing to do
with being arrested or the legal system.
A warrant is a type of financial
derivative, okay, called a call option.
So this is the equivalent
of what's called a call option.
Okay, and what a call option was or
is, is the following.
So let me just write down what
the government actually had.
So the US government.
Paid.
I've rounded the numbers,
but $450 million.
To Goldman Sachs, GS.
And in return, okay,
the US government had the right,
the right, not the obligation,
it's the choice.
It's the option,
to buy Goldman Sachs shares.
At $125 each.
In 10 years' time.
Okay, simple as that.
Okay, they paid, and they had
the right to buy 10 million shares.
So they had the right to buy them for
$125 each, okay?
At the time, so this was in October 2008,
Goldman Sachs' shares were
trading at $95.
Okay, all right, so you might ask,
well, what was the government
doing paying all that money to
buy Goldman Sachs shares for
$125 when they could have just gone
out and bought them for $95, right?
Okay, well, let's just think about what
the government has in 10 years' time, so.
Okay, if Goldman Sachs' shares,
in ten years time, are at $150.
What's the value of one of these warrants,
one of these options to the US?
$25, right?
Okay, cuz I can buy something that's
worth $150 by paying $125 for it, so.
Okay, if Goldman Sachs is trading at $100.
What's the option worth?
0, right?
Okay, so
we can begin to see this function,
this g function, taking shape for
this warrant,
which is the same as a call option,
just terminology.
All right, this g function is
g(S_T) = max(S_T - K, 0),
where S_T is the share price at maturity T,
here T is ten years,
and K, in this case, is $125.
K is called the strike price, okay?
And below K you get 0, all right?
So here is a particular form of g, okay?
The payout looks like this.
So this is g(s).
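The payoff drawn on the board can be written as a tiny function (a sketch; `call_payoff` is my own name, the numbers are the lecture's):

```python
# Payoff at maturity of a warrant/call option with strike k.
def call_payoff(s, k=125.0):
    """Value of the right to buy at strike k when the share trades at s."""
    return max(s - k, 0.0)

print(call_payoff(150))  # 25.0: buy at 125 something worth 150
print(call_payoff(100))  # 0.0: the option expires worthless
```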
Okay, so if I know the distribution,
the PDF of S in 10 years time,
I know the price of this warrant,
all right?
So the price of the warrant,
my board management is not very good,
is just a LOTUS, okay?
It's E[g(S_T)], the integral of
g(s) times the PDF f(s) ds.
So two things to point out.
One is the government
decided that the cost of
one of these options,
if you work it out, is $45 per option:
$450 million for 10 million options.
Where did that come from?
Is the US government in
the business of calculating
probability distributions,
probability density functions?
Right, so that's one question
which we'll talk about in 123.
The other thing just to note
is the solution to this.
It's just a LOTUS, right?
If you think about it, it's actually not
that tricky, cuz you can just replace the
limits of integration and
get rid of your max, right?
You integrate (s - K) times f(s)
from K to infinity.
Okay, so it's a fairly
straightforward integral.
The solution of that integral under
particular form for the probability
density function is a very famous formula,
the Black-Scholes formula, okay?
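As a sketch of "the price is just a LOTUS": assume, purely for illustration (the lecture deliberately doesn't fix a distribution or parameters), that ln(S_T) is normal with mean m and standard deviation v. Then the LOTUS integral from K to infinity can be done numerically and compared against the lognormal closed form, which has the same shape as the Black-Scholes formula:

```python
# LOTUS pricing sketch under an assumed lognormal S_T (m, v are made up;
# K is the lecture's strike).  Not the actual government calculation.
from math import log, exp, sqrt, pi, erf

m, v, K = log(95.0), 0.5, 125.0   # assumed parameters for ln(S_T)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def pdf(s):
    """Lognormal density of S_T."""
    return exp(-(log(s) - m) ** 2 / (2 * v * v)) / (s * v * sqrt(2 * pi))

# LOTUS: E[max(S_T - K, 0)] = integral from K to infinity of (s - K) f(s) ds
# (dropping the max by starting the integral at K).  Midpoint rule.
n, hi = 200_000, 6000.0
h = (hi - K) / n
lotus = h * sum((s - K) * pdf(s) for s in (K + (i + 0.5) * h for i in range(n)))

# Closed form for a lognormal payoff, the Black-Scholes shape:
d1 = (m - log(K) + v * v) / v
d2 = d1 - v
closed = exp(m + v * v / 2) * Phi(d1) - K * Phi(d2)

print(round(lotus, 4), round(closed, 4))  # the two agree closely
```

With the risk-neutral choice of m and v, the closed form is exactly the (undiscounted) Black-Scholes call price, so the famous formula really is this one LOTUS integral evaluated for a particular f.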
So I like to think that you
can get a Nobel Prize in
economics just for knowing LOTUS.
Because that's what
the Black-Scholes formula is, okay?
So thank you for your time,
appreciate you coming here.
[APPLAUSE] And I hope to see you,
I hope to see some of you next year.
Thank you.
