So there are a couple more important distributions.
That's the main topic for today.
Then we'll start Markov chains next time.
All the remaining distributions are what I would call offshoots of the normal. That is, their importance is inherited from the fact that the normal is important: they're all defined in terms of normal distributions, and their stories are tied to the stories of the normal. Because basically we just do different things with normal random variables.
And the normal is so important, we saw the central limit theorem last time, which is one reason the normal is important, but there are a lot of other reasons too. And so if you do things with the normal, there's a good chance that the result is also important, okay?
So the first one is called
the chi-squared distribution.
That has one parameter, so it's chi, the Greek letter chi, which looks like a fancy x, squared: chi-squared of n. There's one parameter, n, which is called the degrees of freedom, and in words it's just "chi-squared n".
It's an extremely famous
distribution in statistics.
Some of you have probably
heard of chi-squared tests or
maybe even used chi-squared tests,
things like that.
So we're not gonna derive
chi-squared tests in this class.
But you would see that in a lot of other stat courses, and other non-stat courses too.
The chi-squared test has become one of the most famous statistical methods in psychology and sociology and all over the place; it's used everywhere. I mean, you can get a computer to do a chi-squared test for you. And in some sense they're really overused, for reasons that are not important to get into in this class, but to really understand what it's doing, you have to understand the chi-squared distribution.
So we're treating it here as a distribution in probability, but it's used all over the place in statistics.
So for the definition, rather than defining it by writing down the PDF, which would tell you nothing about where it comes from, I wanna define it in terms of how it's related to the normal distribution, okay?
So let's say V = Z1 squared plus Z2 squared plus blah, blah, blah, plus Zn squared, where the Zj's are iid standard normal.
That's the definition of how we
get a chi-squared distribution.
So then, by definition,
just to write that down, by definition,
this is chi-squared, and
we would just write V is chi-squared of n.
So sometimes the n gets subscripted,
sometimes it's written like that,
it's the same thing.
So all it is is the sum of squares of n iid standard normals, okay?
And this comes up a lot in statistics
because a lot of statistical methods
somehow involve adding up
squares of things, right?
And if those things happen to be iid standard normal, then you're gonna get chi-squared.
So it comes up everywhere.
All right, well,
that's all as far as the definition.
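If you want to see the definition in action numerically, here's a minimal simulation sketch; it assumes Python with numpy and scipy, which aren't part of the course, and it's just an illustration of the definition, not anything we need formally.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 5                      # degrees of freedom
    reps = 100_000             # number of simulated V's

    Z = rng.standard_normal((reps, n))   # iid standard normals
    V = (Z ** 2).sum(axis=1)             # V = Z1^2 + ... + Zn^2

    # chi-squared(n) has mean n and variance 2n
    print(V.mean(), V.var())                           # roughly 5 and 10
    print(stats.kstest(V, "chi2", args=(n,)).pvalue)   # large p-value: good fit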
But I wanted to show you,
how does it connect with other
distributions we've done.
So I'll just call this
fact chi-squared of 1,
chi-squared of 1 means we just take
one standard normal and square it.
Chi squared of 1 is the same
thing as gamma of 1/2, 1/2.
That's not something that you can easily see in your head, but it's a calculation that you could all do at this point. It's good practice, so I'll let you do this. It's just a change of variables thing.
The reason that I'm not gonna go through the algebra of this is because you had a homework problem where we took a standard normal, raised it to the fourth power, and you found the PDF, right? Now this is like an easier version of that problem. It's the same idea. The only thing you have to be careful about is that because it's squared, you have to deal with the fact that the function y = x squared is decreasing and then increasing, right? So you can't just plug into the change of variables formula. As long as you're careful about that fact though, it's just an easier version of that homework problem.
You can just do the calculation,
get the PDF.
Same as on the homework, so
if you understood that homework problem,
then this is easy to check.
And you just get the PDF.
And you'll see,
that's the same as the Gamma 1/2, 1/2.
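For reference, here's a sketch of that change-of-variables calculation, using the Gamma(a, lambda) PDF in the same rate parameterization we've been using:

    \[
    P(V \le v) = P(-\sqrt{v} \le Z \le \sqrt{v}) = 2\Phi(\sqrt{v}) - 1, \qquad v > 0,
    \]
    \[
    f_V(v) = \frac{d}{dv}\big(2\Phi(\sqrt{v}) - 1\big) = 2\varphi(\sqrt{v}) \cdot \frac{1}{2\sqrt{v}}
    = \frac{1}{\sqrt{2\pi}}\, v^{-1/2} e^{-v/2},
    \]
    \[
    \text{and the Gamma}(1/2,\,1/2)\text{ PDF is } \frac{(1/2)^{1/2}}{\Gamma(1/2)}\, v^{-1/2} e^{-v/2}
    = \frac{1}{\sqrt{2\pi}}\, v^{-1/2} e^{-v/2}, \quad \text{since } \Gamma(1/2) = \sqrt{\pi}.
    \]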
So that's a useful fact about
the chi-squared because we
already know a lot of nice
things about the Gamma.
We know how it relates to the beta.
We know how to add up Gammas.
And we know how to get moments for
the Gamma, things like that.
We can borrow all the stuff we know
about the Gamma distribution and
just use that for free without
having to re-derive everything.
So that's very helpful
in studying chi-square.
So that's chi-squared of 1, assuming this fact, which you can just check, like on the homework problem. Now if you want chi-squared of n, that just says we're adding up n iid Gamma(1/2, 1/2)'s, right?
And remember, we also talked about the fact that if you have a Gamma(a, lambda) and a Gamma(b, lambda) and they're independent and you add them, you get Gamma(a + b, lambda), right? So if you add n of them, and the lambda is 1/2, that's the same for all of them, same scale. We're just adding up the 1/2's, so it immediately follows from this that chi-squared of n is the same thing as Gamma(n/2, 1/2).
So this is really actually
not a new distribution.
It's a new name for something we
already knew, which is a gamma, okay?
So in a sense I don't really need to list
it separately on the table of famous
distributions, things like that,
it's a special case of the Gamma.
But it's used so much in statistics
because of adding up sums of squares,
it gets its own name, and
it gets its own entry in the table, okay?
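If you want to see this correspondence numerically, here's a small sketch. Note that scipy parameterizes the Gamma by shape and scale, so rate 1/2 corresponds to scale 2; that translation, and the use of scipy at all, is my own illustration rather than anything from the course.

    import numpy as np
    from scipy import stats

    n = 7
    x = np.linspace(0.1, 30, 200)

    chi2_pdf  = stats.chi2.pdf(x, df=n)
    gamma_pdf = stats.gamma.pdf(x, a=n / 2, scale=2)   # Gamma(n/2, rate 1/2)

    print(np.max(np.abs(chi2_pdf - gamma_pdf)))        # essentially 0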
So that is the chi-squared distribution; that's our penultimate famous univariate distribution. Univariate, as in it's not multivariate.
We're gonna also do
the multivariate normal today.
This is just foreshadowing that: we talked about the multinomial distribution, right? Which was the generalization of the binomial when you have more than two categories. So the multinomial is the most important multivariate discrete distribution, all right? It's discrete, it's a generalization of the binomial.
Multivariate normal which we're gonna
do later today is the most important
multivariate continuous distribution.
So the question is, what's the natural way to extend the normal distribution into higher dimensions, where rather than just a random variable you have a random vector, okay?
And so we'll talk about that later.
But first, we just need one
more univariate distribution
which is the famous
student t distribution.
A lot of students have wondered why it's called Student, and the reason is that it was discovered and first used by a guy called Gosset. Well, the fact that his name is Gosset doesn't tell you why it's called Student. The reason: this was around 1908 that he first introduced it, in a very influential paper.
Gosset was a statistician who was working
as a master brewer for
Guinness, that beer company.
And he was an excellent,
excellent statistician working for
Guinness a little over 100 years ago.
And when he was publishing his papers, he used the pseudonym Student. A lot of people think that was because he didn't want Guinness to know, but actually Guinness was very supportive of him, as far as I know.
But Guinness did not want the other beer
companies to know they had a statistician
working for them.
That was like their secret weapon.
So he published his papers using the name Student; it was just a pseudonym he came up with. The letter t is just because it became standard to use the letter t for a certain statistic. Those of you who have seen t statistics or t tests before: it just got called t for some reason, and that caught on.
Okay, so what he introduced was the t test, which is based on the t distribution, which we're about to do. That was just over 100 years ago, but since it caught on, it's been extremely widely used in statistics, and it still is to this day.
So I just wanna say a little about this: we're not gonna go through a lot of the statistical side of how you use t tests, one-sample t tests, two-sample t tests, and things like that. Some of you have seen that, but I'm not assuming that you have. But the mathematical basis behind those things is the t distribution.
So we're gonna talk about
the t distribution, okay.
So again, you could write down the PDF of this thing, but it's just this complicated-looking thing and it would give you no idea of where it comes from. So again we're gonna define it by relating it back to the normal distribution, okay.
So we're going to say: let T, and we call it T because it's gonna be a t distribution. Suppose we have a random variable T that's of the form Z over the square root of V over n, where Z is again standard normal, and V is chi-squared of n, and they're independent. So Z is independent of V, all right? So if we have a random variable of this form, we say that it follows a t distribution.
And we'd write that as T,
distributed as t sub n.
Lowercase t is the name of the distribution; we don't necessarily always write the "Student". We just say t distribution with n degrees of freedom.
Degrees of freedom is just a mysterious-sounding name at first for what the parameter is.
It's kind of a deep question like,
what do degrees of freedom really mean,
when people talk about degrees of freedom.
But for now you can just think
about it as the parameter
which goes back to the chi-square.
It's just,
how many normals squared did we add up?
We call that the degrees of freedom,
that's the parameter.
On top it's just one standard normal.
But here we have a chi square of n.
So that's where the parameter comes in.
All right, so that's the t distribution.
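As a quick sanity check on the definition, here's a sketch that builds T = Z / sqrt(V/n) from scratch and compares it to scipy's t distribution; the scipy usage is my own illustration, not part of the course.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 4
    reps = 100_000

    Z = rng.standard_normal(reps)                            # standard normal
    V = stats.chi2.rvs(df=n, size=reps, random_state=rng)    # chi-squared(n), independent of Z
    T = Z / np.sqrt(V / n)

    print(stats.kstest(T, "t", args=(n,)).pvalue)   # large p-value: matches t with n degrees of freedom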
Let's just mention a few properties that we can see. We could get the PDF of this thing by doing an ugly Jacobian type of calculation, but there's no need for that for our purposes here.
I mean, I'm not saying you would
never need the PDF of this thing.
But for the things that make it
interesting for our purposes
we can just think of it in terms of this
representation, in terms of normals.
Okay, so first of all, let's just stare at
this and see, what can we say about it.
First of all, it's a symmetric distribution. That is, if we multiply by minus 1 it has the same distribution, so it's symmetric about zero. And you can see that just by multiplying by minus 1: we know that Z is symmetric, right? Normal(0, 1) is symmetric about 0. We put a minus up here, that doesn't change the distribution on top, and the bottom is just some independent thing that hasn't changed.
So it's symmetric. Let's look at the case n equals 1. If n = 1, then we just have the square root of a squared normal, so that's the absolute value of a normal.
So what we have in that case is a ratio
of independent standard normals
with an absolute value sign on the bottom.
But the absolute value sign on the bottom
does not affect the distribution.
Again because of symmetry.
So, if n = 1, that's really just the ratio of two independent standard normals, and that's the distribution we saw before: the Cauchy.
So this is a generalization of the Cauchy: the t_1 is the same as the Cauchy. Remember we did that Cauchy interview problem, where we derived the PDF of the Cauchy, if you think back to that. So this generalizes that.
And in particular, the mean doesn't exist.
We were talking about
the evil Cauchy before.
Cauchy does not have a finite expectation.
So the t distribution with one degree of freedom does not have a finite mean; the mean doesn't exist.
But on the other hand, if n is at least two, then by symmetry the expected value will equal zero.
Symmetry is one way to see it, but
let's also do just a quick calculation for
why the mean is zero.
So you had another homework problem pointing out that in general you can't say that E of a ratio is E of this over E of that. Right, but it is true that we can write this as E(Z) times E of 1 over the square root of V over n. That's valid because this piece and this piece are independent. We assumed that they're independent, right? They're independent, therefore they're uncorrelated, so we can do that. And we get zero.
The reason this would break down in the case of n = 1 is that this term wouldn't exist, and you can't say that something that doesn't exist times 0 is 0, right? So that's the problem, okay? But as long as this part exists, which it will if n is at least 2, then we get 0. You can also see by symmetry that the mean had to be 0, okay?
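Written out, that calculation is just:

    \[
    E(T) = E\!\left(Z \cdot \frac{1}{\sqrt{V/n}}\right)
    = E(Z)\, E\!\left(\frac{1}{\sqrt{V/n}}\right)
    = 0 \cdot E\!\left(\frac{1}{\sqrt{V/n}}\right) = 0,
    \]

which is valid as long as E(1/sqrt(V/n)) is finite, and it is for n at least 2.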
And so it doesn't have an MGF, and it doesn't have all of its moments. Like the t_1 doesn't have a first moment, the t_2 won't have a second moment, the t_3 won't have a third moment; it goes like that, so there's a limited number of moments you can take.
And actually, I did this for the first moment, but if you wanna get a higher moment, if it exists, all you have to do is raise this T to a power, right? And then just put the power here and the power here. We get all that from things we know, because we know all the moments of the normal, and then this one here, you raise that to a power. It looks ugly the way I wrote it, but if you just imagine raising this thing to a power, there's some constant you can take out, and aside from that it's just a power of a Gamma distribution, right? And for powers of Gamma distributions it's easy to get the mean using LOTUS, so that means we can find the moments that exist. It's not true that all of its moments exist, but the ones that do, we can find in an easy way using this.
The odd moments are gonna be 0, if they exist, by symmetry: if you put Z cubed here, then that's gonna be 0. But for the even ones, we talked before about the even moments of the normal, and then this.
Actually, here's another way to get them. This is just a reminder of the even moments of the normal: E of Z squared is 1, because that's the variance if Z is standard normal; E of Z to the fourth is 3; E of Z to the sixth is 3 times 5; and it goes up like that, where you're multiplying: this is 1, 1 times 3, 1 times 3 times 5. It's a skip factorial. We proved that before using MGFs, right?
But here's another way to look at this problem. We used MGFs before; another way would be to relate it to the Gamma, by relating it to chi-squared. So if we have the expected value of Z to the 2n, that is, we want the 2n-th moment of a standard normal where n is a positive integer, well, we could just think of that as E of Z squared to the n. That doesn't take much algebra.
But the point of writing it
this way is now we recognize,
Z squared,
that's just chi-squared of 1, right?
But chi-squared of 1 is the same
thing as Gamma of 1/2, 1/2.
So all we really need is the nth
moment of a Gamma of 1/2, 1/2,
then you can just use LOTUS.
So we did a calculation before using LOTUS to get the moments of the Gamma; for practice, it doesn't matter if you remember or memorized that, but you should be able to do the LOTUS, and with the Gamma function it's really easy to get moments, right? For the Gamma distribution it's easy to get the moments, because you just have x to a power times this Gamma thing, but then it still looks like a Gamma, right? So as long as you have the pattern recognition of the Gamma integral and you write down LOTUS, you can get this.
Okay, so if you do it this way, you're gonna get something in terms of a Gamma function. But you can show, using properties of the Gamma function, that it is in fact equivalent to just multiplying odd numbers, okay?
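Here's that calculation written out, using the Gamma moment formula from LOTUS, E(W^m) = Gamma(a + m) / (Gamma(a) lambda^m) for W ~ Gamma(a, lambda):

    \[
    E(Z^{2n}) = E\big((Z^2)^n\big) = E(W^n), \qquad W \sim \mathrm{Gamma}(1/2,\, 1/2),
    \]
    \[
    E(W^n) = \frac{\Gamma(n + 1/2)}{\Gamma(1/2)\,(1/2)^n} = \frac{2^n\, \Gamma(n + 1/2)}{\sqrt{\pi}}
    = 1 \cdot 3 \cdot 5 \cdots (2n - 1),
    \]

which recovers the skip factorial: E(Z squared) = 1, E(Z to the fourth) = 3, E(Z to the sixth) = 15, and so on.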
So, all right, that's how we can get
the moments of the T distribution
up to the point where they stop existing.
Okay, and other properties.
Well, one reason the t is famous is that it looks approximately like the normal distribution, but it's heavier tailed, and the easiest way to see that is just to try it out: plot what the density looks like, or generate some values. Heavier tails meaning that more extreme values are relatively more likely than they would be for the normal. So it has heavier tails than the normal.
You can measure that in terms of kurtosis, for example, which is on the homework you've just turned in, or just by looking at it. That's especially true if n is small; for instance, the Cauchy has very heavy tails: the Cauchy density was a constant times 1 over 1 + x squared, so as you let x go to infinity, the Cauchy is decaying like 1 over x squared, right?
Try comparing that with the normal: for the normal you have e to the minus x squared over 2, and the normal is decaying much, much, much faster than the Cauchy.
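To see the heavier tails concretely, here's a small sketch comparing the two-sided tail probability P(|X| > 3) for a few t distributions and for the standard normal; the scipy calls are my own illustration.

    from scipy import stats

    for df in (1, 3, 10, 30):
        # two-sided tail probability P(|T| > 3) for a t with df degrees of freedom
        print(df, 2 * stats.t.sf(3, df))

    # compare with the standard normal
    print("normal", 2 * stats.norm.sf(3))

The tail probabilities shrink toward the normal's as the degrees of freedom grow, and they are largest for the Cauchy case df = 1.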
But on the other hand, to relate it more to the normal: if n is large, let's say 30 or 40 or 50 or larger, then it really just looks like a normal. So for n large, t_n looks very much like a standard normal.
And when I say looks very much like,
I'm using that kind of intuitively.
But if you want a mathematical statement
of it, let's just take the limit.
So the math statement would be that the distribution, either the CDF or the PDF, either way, will converge to the distribution of a standard normal if you let n go to infinity.
And to see that, well, you could do a big calculation, but it's not necessary; the best way to see it is to use the law of large numbers.
So let's actually check
that fact over here.
Okay, so now, to make it clear: I didn't indicate the dependence on n here. To make that clear, because we're taking a limit, let's consider a sequence of them. So let's let Tn = Z / square root of (Vn / n), where, let's say, we have an infinite sequence of iid standard normals. Okay, so we have all these iid standard normals, and what do we do with them? Well, that's supposed to be a chi-squared of n, so Vn is the sum of the squares of the first n of them, okay?
So I'm just generating a T distribution
with n degrees of freedom.
We may as well just use the same Z for all of them, right? Z is also standard normal, and it's independent of the Zj's.
So this is a way to construct
a t-distribution with n degrees of freedom
just by the definition over there.
I took a normal and divided by this chi-squared type thing. Okay, but I chose to define it this way because I'm just looking at what happens with the distribution. I'm asking, what happens to the distribution of this as n goes to infinity? We can choose whatever T random variable we want, as long as it has the correct distribution, right?
Okay, so let's choose it this way.
Using the same Z each time.
That makes it easier, right?
Same Z.
Then what happens?
Let's look at this thing Vn/n; what do you think happens to that as you let n go to infinity? It goes to a constant, but which constant? Well, if I took a sample mean of these, yeah, you can think of this as the sample mean of these squared normals. If we standardized it, then its distribution would go to Normal(0, 1) after standardizing, but I just want something simpler than that. What does it converge to pointwise? It converges to one. Why would you say one? From this? Using what theorem?
>> [INAUDIBLE]
>> Law of large numbers.
The CLT would let us get the distribution of this, but I just wanna say what happens to this pointwise, that is, if we evaluate this at each point in the sample space. So this goes to one with probability one.
By the law of large numbers, because we're taking these n iid squared normals and we average them, right? And the law of large numbers says that the sample average converges to the true theoretical average. The true theoretical average is just the expected value of one of these, right? They're iid. So the law of large numbers says this will converge to the theoretical value E of Z1 squared, which is just one; that's the variance of Z1.
So this thing goes to 1 with probability 1. There is a square root there, but because this is just a pointwise statement, if this converges to 1 and we take square roots, it still converges to 1: the square root of 1 is 1, with probability 1.
Okay?
So now we're letting n go to infinity, but the denominator is just going to 1, so this sequence Tn is converging to Z with probability 1. That's telling us that these random variables converge, because I constructed them this way.
I could have constructed the t distribution some other way; I didn't have to use the same Z and the same Z1 squared and keep recycling things this way. We could have chosen some other t random variables, and then this pointwise statement might not be true anymore, but the distributions would still be behaving this way.
So what that says is that the t
distribution with n degrees of freedom
converges.
Now I'm talking about the distribution,
not the random variable.
Converges to a standard
normal distribution.
So if n is very large, then you're just thinking of this denominator as being essentially 1, by the law of large numbers, and so then what matters is the normal on top, okay? So if you have a large number of degrees of freedom, it just looks like the normal distribution. But with a small number of degrees of freedom then, you know, it looks sort of like the normal, but it'll have much heavier tails, okay.
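Here's a sketch of that law-of-large-numbers picture: Vn/n hugs 1 as n grows, and the t quantiles approach the standard normal quantiles. This is just an illustrative simulation in Python, not anything we need formally.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    for n in (5, 50, 500, 5000):
        Vn = (rng.standard_normal(n) ** 2).sum()
        print(n, Vn / n)                      # approaches 1 as n grows

    # the 97.5% quantile of t_n approaches the normal quantile (about 1.96)
    for n in (5, 50, 500):
        print(n, stats.t.ppf(0.975, n))
    print("normal", stats.norm.ppf(0.975))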
All right, so that's really all you need to know as far as famous univariate named distributions, okay?
So the last topic for today, and our last famous distribution, is the multivariate normal. What we want to do is extend some of the nice stuff about the univariate normal. Right, the normal has a lot of nice properties, very, very useful. But what if we have a random vector, not a random variable?
Okay, so there's one case where it would be really easy to construct a multivariate distribution, which is if we have independent random variables, right? If we have, let's say, iid random variables, we stick them together into a vector and they have a joint distribution. But that's the simple case, where, if it's continuous, the joint PDF is the product of the marginal PDFs, and so on. The more interesting case, from a multivariate perspective, is when there's some correlation, okay?
And it turns out that for
the normal there's a very,
very nice standard way to
extend it to higher dimensions.
And there are different equivalent ways to define this. So we want the multivariate normal, which I sometimes abbreviate to MVN. Well, we have to start by defining it, and there are different ways to define it, but I think the nicest way for our purposes is to define it in terms of linear combinations.
That is, for every distribution we're talking about today, the way we're thinking of it is by reducing it back to stuff we already know about the normal distribution, right? Chi-squared we defined in terms of the normal. T was defined in terms of the normal and chi-squared, but chi-squared in turn is defined in terms of the normal. And we're going to define the multivariate normal in terms of just the normal, right? Everything is based on the normal distribution today, okay?
So a random vector just means that we have a bunch of random variables that we string together into a vector. So let's say we have a random vector, say X1, X2, up to Xk, which maybe we'll just call the vector X, okay? And the definition is that this is multivariate normal if the following is true.
If I want to reduce this back down to one dimension, all right? Reduce it back down to the case we're familiar with, which is the one-dimensional case. A natural way to do that is just to take a combination of the components. So the condition is that every linear combination is normal. A linear combination, if you haven't had linear algebra, just means adding up these variables with any constants in front. So a linear combination looks like, let's say, t1 X1 + t2 X2 + ... + tk Xk; that's just called a linear combination, if you haven't seen that before. So I took a combination of the random variables, where t1 through tk are just arbitrary constants, any constants at all, okay? And the requirement is that this is always normal, for every linear combination. So if you can find even one choice of the t's for which this fails to be normal, then it is not multivariate normal. But if no matter how you choose these constants t, this always has a normal distribution, then we say this is multivariate normal.
Okay, so see how this reduced
it back down to one dimension?
Now this is just a random variable.
Okay, so
we've reduced it back to familiar things.
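Just to record the definition compactly (this is only a restatement of what's on the board):

    \[
    \mathbf{X} = (X_1, \ldots, X_k) \text{ is multivariate normal}
    \iff
    t_1 X_1 + \cdots + t_k X_k \text{ is normal for every choice of constants } t_1, \ldots, t_k.
    \]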
So for a couple of quick examples: let's do a quick example that is multivariate normal, and a quick example that is not multivariate normal. So suppose that we start with, let's say, Z and W iid standard normal. Then of course it's gonna be true that (Z, W) is multivariate normal, because they're just independent. But it's more interesting if there's some correlation between them. So for example, what if we did Z + 2W? I'm just making up some numbers here; you can put whatever constants you want, I just wanna do a concrete example. And let's say 3Z + 5W; I just made up some constants here. The claim is that (Z + 2W, 3Z + 5W) is multivariate normal. So this is a vector now: I took these iid standard normals and I just added them up with some constants I made up. Of course I could put a, b, c, d here and it would still be true, but I wanted it a little more concrete.
Okay, so why is that true? Well, to verify it, all we need to do is check that if we take any constant times this plus any constant times this, it's normal. So if I take, let's call the constants s and t: s(Z + 2W) + t(3Z + 5W). I just took two arbitrary constants in front, s and t, any constants, and I rewrite this in terms of Z and W. That is something times Z plus something times W; well, that's just (s + 3t)Z + (2s + 5t)W.
And that's normal.
Because we already knew, using MGFs and previous stuff we did, that if we add up independent normals, it's normal, right? So this is one normal plus an independent normal; it's the sum of two independent normals. So we know that that's normal.
So that's just an example, but that's kind
of an important general class of examples.
That we start with IID normals,
and then just put together a vector
of different linear combinations.
That will always be multivariate normal.
All right, so that's an example; let's do a non-example. I don't know if non-example is the right word; I wanna go as far away from that example as possible. I'm not gonna say counterexample, because it's not a counterexample to anything; it's just an example that's not multivariate normal. Well, it's sort of a counterexample, but I'll call it a non-example.
A non-example would be: let Z be standard normal, and let S be a random sign. That is, S is plus or minus 1 with equal probabilities, and it's independent of Z. Okay, so then consider Z and SZ. S times Z is related very closely to one of the strategic practice problems from before, where you can check for yourself that multiplying by a random sign, just through the symmetry of the normal, still gives a standard normal.
So marginally, these are standard normal. Marginally meaning you look at Z on its own, look at S times Z on its own: each is just standard normal. But the pair (Z, SZ) is not multivariate normal. So on their own they're normal, but put them together and it's not.
The reason is that we can just look at Z + SZ. If you want to test whether this is multivariate normal, according to the definition you need to check what happens if you take something times this plus something times this. Well, the easiest case of that would be to simply add them: Z + SZ. There's no possible way that could be normal, because if you look at what happens, half the time S is -1 and you get 0, and the other half of the time you get some continuous thing. So this is actually a mixture of discrete and continuous.
You're never gonna find a normal distribution that equals zero with probability one half, all right?
That is not a property of
the normal distribution.
So that is not multivariate normal.
And you can see from this that if, in our definition, we had decided to just say that multivariate normal means you string together some normals, then we'd be including this example. But this is a pretty nasty example, right? So we're avoiding things like that by defining it this way.
So that's much better.
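Here's a quick simulation sketch of that non-example: marginally both coordinates look standard normal, but their sum is 0 about half the time, which no normal distribution can do. The code is just my illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    reps = 100_000

    Z = rng.standard_normal(reps)
    S = rng.choice([-1, 1], size=reps)   # random sign, independent of Z

    SZ = S * Z
    print(Z.mean(), Z.std(), SZ.mean(), SZ.std())   # both roughly N(0,1) marginally
    print(np.mean(Z + SZ == 0))                     # about 0.5: Z + SZ equals 0 half the time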
Okay, so we haven't done the PDF of the multivariate normal, and we don't really need it for this course. I mean, you can do some Jacobian stuff and work it out, but the MGF is much more useful for our purposes, and much simpler. So let's derive the MGF. This is a joint MGF, the MGF of the joint distribution of X.
So let X be multivariate normal. Okay, now since this is multivariate, when you define the MGF you do it with a constant times each component. That is, in the univariate case we'd just do e to the tX. In the multivariate case we have a linear combination: e to the, I'm gonna write it this way, t prime X, where this just means take the vector t and do a dot product with the vector X. In other words, to write that out more longhand, this is just the definition of the MGF, where we need a constant for each component: it's e to the t1 X1 + blah, blah, blah + tk Xk, right?
Right, okay, so that looks like it might be a complicated thing. We're assuming we have a multivariate normal, and I'm asking, what does the MGF look like? Well, it would be this, but what can we say about it? It looks like it may be very complicated, until we think back to the definition we just did. The definition said that if this is multivariate normal, then the thing up in the exponent is normal, right? So all we really have is the expected value of e to a certain normal random variable.
But that we can get from the MGF of the
normal in the one-dimensional case, right?
So we've reduced it back down to one dimension. And in particular, if you look up the MGF of the normal, let me just write it here for reference's sake. If, let's say, X is N(mu, sigma squared), I'll just remind you of the MGF, which we've done before. The MGF of X, the way I remember it, is e to the power of the mean of what's up here, which is t mu (I just took the mean of this), plus 1/2 the variance; that's just what the MGF works out to, and the variance of this is t squared sigma squared. The MGF of a standard normal would be the case where mu equals 0 and sigma equals 1, but that's the MGF of any univariate normal.
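In symbols, just restating that, for X distributed N(mu, sigma squared):

    \[
    M_X(t) = E\big(e^{tX}\big) = \exp\!\Big(t\mu + \tfrac{1}{2}\, t^2 \sigma^2\Big),
    \]

and in particular, taking t = 1, E(e^Y) = exp(E(Y) + (1/2) Var(Y)) for any normal random variable Y.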
But if we know that fact, which we did earlier, then we know right away what this is: by definition, it's just e to a normal, and that's a special case of the MGF. In particular, if you have e to a normal, you can just let t equal 1 if you want, right? You can let t be anything.
Okay, so to sum that up, we already know without doing any more calculations that this is gonna be e to the expected value of what's up here, which is, let's say, t1 mu1 + blah, blah, blah + tk muk, where mu j is defined by letting E of Xj equal mu j. That's the mean: mu j is the mean of the jth component, just viewed on its own, all right? And then plus half the variance, so that's e to this whole big thing: the mean plus one half the variance of this thing. I'm just gonna write it as the variance of that thing.
Let me just write it as exp; that means e to the power of this thing. Okay, so it's just the mean of what's up here plus 1/2 the variance, and to get this variance thing, if they're independent then we can just add up the variances, but it might be that there are some covariances between them, and then we'll just expand this out the way we usually compute variances.
So that's the MGF, and for multivariate distributions it's true, just like for univariate, that the MGF determines the distribution.
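Written out, the MGF we just derived is as below; the matrix form at the end is a compact restatement of the same thing, with mu the mean vector and Sigma the covariance matrix of X.

    \[
    M_{\mathbf{X}}(\mathbf{t}) = E\big(e^{\mathbf{t}'\mathbf{X}}\big)
    = \exp\!\Big(\sum_{j=1}^{k} t_j \mu_j + \tfrac{1}{2}\,\mathrm{Var}\Big(\sum_{j=1}^{k} t_j X_j\Big)\Big)
    = \exp\!\Big(\mathbf{t}'\boldsymbol{\mu} + \tfrac{1}{2}\,\mathbf{t}'\Sigma\,\mathbf{t}\Big).
    \]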
Okay, so one more property of the multivariate normal; the last fact about it is a very useful fact about independence and correlation. That's the fact that, within an MVN, that is, if you have this multivariate normal vector and you look inside that vector, it's true that uncorrelated implies independent.
We saw before that, in general, independent implies uncorrelated, but uncorrelated does not imply independent. But a very important special case where the converse is true is when you're working within a multivariate normal: there, uncorrelated implies independent.
In other words, just to write this out: if we let X be our vector, but let's say we split it up into X1 and X2, where X is MVN, so we have a multivariate normal and we split it up into two subvectors, then the statement is that if every component of X1 is uncorrelated with every component of X2, then they're independent. So take any component of X1 and look at the covariance between that and any component of X2; if those are all zero, then X1 and X2 are independent, that is, X1 is independent of X2.
That statement would not be true if we allowed weird examples like this one, because Z and S times Z are clearly not independent, but they are uncorrelated: if you compute the covariance of this and this, you get zero. So these are normal and they're uncorrelated, and that doesn't imply they're independent; but within a multivariate normal, it is true.
So for a quick example of this, suppose we look at X + Y and X - Y, where X, Y are iid N(0, 1), okay, and consider the sum and the difference. So X + Y and X - Y, taken together as a vector, that's multivariate normal; we'd probably call it bivariate normal in this case. When it's two-dimensional, it's often called bivariate normal. And it's multivariate normal for the same reason we just saw.
That is, if you take any constant times this plus any constant times this, it's normal. So therefore it's multivariate normal. And they are uncorrelated.
Because if we take the covariance of X + Y and X - Y and just expand it out, we get the covariance of X with itself, which is the variance of X; then there's a plus covariance(X, Y) from one cross term and a minus covariance(X, Y) from the other, so those cancel; and then there's a minus covariance of Y with itself, which is minus the variance of Y. So this cancels because they both have variance one, or in general, if they both had the same variance, the variances cancel. We get zero.
So they're uncorrelated: the sum and difference of iid standard normals are uncorrelated. In general, uncorrelated would not tell us independent, but here, because they're multivariate normal, we now know that X + Y and X - Y are independent.
That's another very, very special fact about the normal. It's very difficult to prove, but it turns out to be true that if we had used iid anything other than a normal distribution here, and if it were true that this is independent of that, then actually it would have to be a normal distribution. So this is a very special property of the normal distribution, and the proof of the independence here is just from this fact about the multivariate normal.
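Here's a small simulation sketch of that last example, just for intuition: the sample correlation of X + Y and X - Y is near zero, and since the pair is bivariate normal, that's enough for independence. One crude check of independence is that knowing X + Y is large tells you nothing about X - Y.

    import numpy as np

    rng = np.random.default_rng(4)
    reps = 100_000

    X = rng.standard_normal(reps)
    Y = rng.standard_normal(reps)
    U, W = X + Y, X - Y

    print(np.corrcoef(U, W)[0, 1])          # roughly 0: uncorrelated

    # crude independence check: the distribution of W doesn't change
    # when we condition on U being large
    big = U > 1.5
    print(W.mean(), W[big].mean())           # both roughly 0
    print(W.std(), W[big].std())             # both roughly sqrt(2)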
Okay.
So to prove this fact, you could just do a calculation with MGFs, which we don't have time for and which you don't need to worry about, although you could do it for practice with the MGF if you want. But anyway, knowing this result is very, very useful, because it's often easier to compute the covariance than to directly show they're independent.
