So, the main topic for the next couple
lectures is continuous distributions.
We've learned about the binomial,
and the poisson, and
the hypergeometric, and so on, and
at this point we've covered all of
the famous discrete distributions
that we need in this course.
And now is a good time to start talking
about the continuous distributions.
I like to do discrete before continuous,
because conceptually it's
simpler to think about discrete.
But that doesn't mean that
continuous is harder, necessarily;
discrete is just kind of
conceptually easier in a sense.
But, on the other hand, we have all
these nasty sums that come up, and so
we learn some ways to sometimes
avoid the sums using stories, and so
on, but sometimes you just have
a sum you can't deal with.
In the continuous case, we'll be doing
integrals instead of sums, and even though
this sounds counterintuitive, in general,
it's easier to do an integral than a sum.
Although the same thing could come up,
we could be faced with integrals
we don't know how to do.
So again, we're gonna try to look for
kind of more clever,
and more conceptual ways to avoid having
to do lots and lots of integration.
But anyway, we'll come to that later.
But, a lot of the ideas
are completely analogous.
So, at this point, I'm assuming you
have a pretty good understanding of
what a PMF is, what a discrete
distribution is and what it really means,
and the expected value of a discrete
distribution, and now we're just gonna
move into the continuous case.
So, just for
having a big picture on this,
it helps to just kind of
contrast the two things.
So I'm gonna make kind of a dictionary
of the discrete world and the continuous world.
So we can put discrete world over here and
continuous world over here.
So we have a random variable
that we're looking at, and
usually we've been calling our random
variable x in the discrete case,
and usually we'll call it
x in the continuous case.
So, so far it's completely analogous.
We got discrete, continuous.
Now in the discrete case, as you're very
familiar with by now, we have a PMF,
which you can just think of as P(X = x),
viewed as a function of little x.
So if it takes positive integer values,
then I would need to specify this for
all positive integers x.
In the continuous case, P(X = x) = 0.
So in that case we have a PDF instead,
which usually we would
write as f(x), but
you can call it whatever you want.
I'll call it f sub x (x) just to
emphasize that this is the PDF of x.
So I'm gonna tell you what a PDF is,
but I'm just telling you now,
that it's analogous to a PMF.
The reason we need this
is that P(X = x) = 0.
So continuous, it means we're thinking of
random variables that could take on any
real value, or
maybe any real number in some interval.
So say we had the interval
from zero to one,
and X is allowed to take on any real
value between zero and one.
Well, we could make up examples
where this is not true, but in the continuous
case there are uncountably many
real numbers between zero and one, and
any specific number, like pi
over four, has probability zero.
So if we just try to write down a PMF
we would just say it's zero, and
that would be useless.
So that's why we need
something else instead.
So, I'll tell you what a PDF is,
but that's the analogy.
Just to continue this a little more,
then I'll start telling you
more about what PDFs are.
We have a CDF.
That's this function F(x) = P(X ≤ x),
and sometimes we'll subscript the x just
because maybe if we add another random
variable y, we could write F sub y for
its CDF, okay?
Well, in the continuous case we
have a CDF, exactly the same thing.
So that's one advantage, and
we've seen in the discrete case,
usually it's easier to deal
with the PMF than with the CDF.
The CDF in the discrete case is a lot like
a step function with all these jumps.
It's not so easy to deal with,
and this is much more direct.
But one virtue of CDFs is
a CDF is completely general.
So every random variable has a CDF, and so
we don't need to separate out the theory.
Now, let's talk about PDFs.
So, now this is a PMF.
So the PDF is the most common way to
specify a continuous distribution.
PDF stands for
probability density function,
not portable document format,
probability density function.
Okay, so the keyword here is density.
The common mistake with PDFs is to
think that they're probabilities.
It's not a probability,
it's a probability density.
So you can think of density,
just in an intuitive sense as like,
think of probability as mass.
Remember the pebbles, with
the total mass equal to one?
But in the continuous case we can't think
of pebbles anymore; it's more like
we just have this kind of mass of mud
that we're smearing around the space.
So I think of discrete as pebbles,
continuous as mud.
The total mass of the mud is one,
and density makes you think
of mass per volume, mass per area,
mass per length, things like that.
Okay, so it's probability per something,
but not probability.
So this is a definition:
a random variable, X,
has PDF f(x) if in order to find
probabilities for X we can achieve
that by integrating the PDF.
So the probability that
X is between a and b,
that is, X is in some interval [a, b],
must be given by the integral
from a to b of f(x)dx,
for all a and b.
So, f(x) is not a probability, it's
what you integrate to get probability.
Integrated density,
then you get a probability.
So that's the definition, and let's see
how this relates to CDF and other things.
Notice, by the way, that if we let a = b,
then we're integrating from a to a,
of f(x)dx.
So that's the area under the curve
from a to a, which is zero,
cuz you haven't actually specified
any interval, so that's zero.
Which agrees with what I said there, the
probability of any specific point is zero.
We need an interval of non-zero length.
Okay, so that's called a PDF,
and to be valid, remember, for
a PMF, I said that a PMF is valid
if its values are non-negative and
they sum up to one, right?
So by analogy, for
a PDF we want them to be non-negative,
and rather than summing to one,
it should integrate to one.
Okay, so to be valid,
f(x) is greater than or equal to 0, and
the integral of f(x) for
minus infinity to infinity should equal 1.
Otherwise, we have not
specified a valid PDF.
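Those two requirements can be checked numerically. Here's a minimal sketch, not from the lecture: a midpoint-rule check that a candidate density is non-negative and integrates to approximately 1. The densities below (2x and 4x on [0, 1]) are hypothetical examples.

```python
# Check the two validity conditions for a PDF: f >= 0, and total area 1.
def is_valid_pdf(f, lo, hi, n=100_000):
    """Midpoint-rule check that f >= 0 on a grid and integrates to ~1."""
    dx = (hi - lo) / n
    ys = [f(lo + (i + 0.5) * dx) for i in range(n)]
    nonneg = all(y >= 0 for y in ys)
    area = sum(ys) * dx               # numerical integral of the density
    return nonneg and abs(area - 1) < 1e-3

ok = is_valid_pdf(lambda x: 2 * x, 0.0, 1.0)    # integrates to 1: valid
bad = is_valid_pdf(lambda x: 4 * x, 0.0, 1.0)   # integrates to 2: invalid
```

A density like 4x on [0, 1] fails only the normalization condition, which is exactly the kind of error this check catches.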
So it might look something like,
just to draw a picture,
an example, maybe a PDF.
A famous example would be the bell curve
type of thing that we'll get to later.
But anyway, for
the purpose of the picture,
I don't really care exactly what
the definition of this function is.
But I'm just drawing some curve
from minus infinity to infinity.
Now it might be that it's
0 on the negative side and
only positive to the right or whatever,
but it's some continuous looking curve.
And the total area, if I shaded
the whole area under this curve,
I would get 1, right?
And the density is larger at points
where X is more likely to be.
I drew a symmetric one, but
it doesn't have to be symmetric,
it could be some nasty looking curve.
As long as it's non-negative and
the area under the curve is 1,
those are the requirements.
So let's interpret a little more
what the density really means,
cuz I said it's not a probability.
If we take f(x), let's say, at some
point x0, what is that really like?
If we take some point x0 here and
we say the density is this number.
What does that mean?
It's possible that this
number is greater than 1, for
example, because you can have a function
that sometimes is greater than 1, but
the integral could still be 1, right?
So we can't say that's a probability,
but what we can say is: this is a density.
So if you think of it as like
probability per unit of length,
then if we multiply by some small
number epsilon,
f(x0) times epsilon is approximately
the probability that X
will fall in an interval
of length epsilon around x0.
Let's call this interval, say,
(x0 − epsilon/2, x0 + epsilon/2).
So all I did was take x0.
The probability of the random variable
exactly equaling x0 is 0, okay?
But take epsilon very small.
So the probability is 0 of it equaling
x0 exactly, but we take some tiny
little interval around x0;
I just wrote down
an interval of length epsilon.
Then the probability is approximately the
density times the length of that interval.
But by multiplying by this epsilon here,
we're kind of converting it back into
a probability scale instead
of a density scale.
So this is kind of a good
intuitive way to think of a density.
But I haven't yet
shown you why that's equivalent to this
mathematical thing that I wrote here.
But to see why this is true,
just by staring at this,
If we wanna find the probability,
then what do we do?
We integrate the pdf from here to here,
right?
So imagine this integral where you're
integrating from here to here, okay?
And then let's think about
what would that integral be.
Well, I didn't just say epsilon was small,
I said epsilon is very small.
I could have said very, very small.
Now if epsilon is very,
very, very small, what that
means is that in that tiny little
interval, f is not gonna change very much.
So over that tiny miniscule interval,
we can treat this function as
being approximately a constant.
And it's easy to integrate a constant,
the integral of a constant is just the
constant times the length of the interval.
And that's all we did: we're treating
the function as approximately this constant
on that interval, times
the length of the interval.
So, that's why this follows from this.
And this is more useful for deriving
things, but this gives you some more
intuition on what's the difference between
a probability density and a probability.
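This density-times-length approximation can be checked numerically. A sketch, with the density f(x) = 3x² on [0, 1] as a hypothetical example (its CDF is x³, which gives the exact interval probability to compare against):

```python
# Check that P(x0 - eps/2 <= X <= x0 + eps/2) ≈ f(x0) * eps for small eps.
f = lambda x: 3 * x ** 2       # hypothetical PDF on [0, 1]
F = lambda x: x ** 3           # its CDF (antiderivative of f)

x0, eps = 0.3, 1e-4
exact = F(x0 + eps / 2) - F(x0 - eps / 2)   # exact interval probability
approx = f(x0) * eps                        # density times interval length
rel_err = abs(exact - approx) / exact       # tiny when eps is tiny
```

The relative error shrinks as epsilon shrinks, which is the "f is approximately constant on a tiny interval" argument in numerical form.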
Okay, so, let's see how is
this thing related to the CDF?
So if X has PDF little f,
let's find the CDF.
Well, by definition, the CDF is
the probability that X is less than or
equal to little x. And by definition,
I said a PDF is the thing
that you integrate
to get probability, right?
So if I wanna know what's
the probability that x is in any region,
all I do is integrate
the PDF over that region.
So here, we'd simply integrate
from minus infinity to x of,
I could call it f(x)dx, but it's a little
bit clearer to change the letter,
so f(t)dt.
t is just a dummy variable here.
I just didn't want it
to clash with this x.
That is, for any particular number x,
we're gonna take this curve.
Let's say x is here.
If this x is this x we're looking at,
then we're saying just look at the area
under the curve up to this point.
That would give us the CDF at that point,
all right?
Because we wanna know the probability
of everything to the left, and
probability is given just by
taking area under this curve.
So it's just the area under the curve
from minus infinity up to x.
That's all we're doing.
So that shows how to get
from a PDF to a CDF, okay?
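That PDF-to-CDF step is just a left-hand integral, which we can sketch numerically. The density 3t² on [0, 1] is a hypothetical example; its true CDF is x³, used here only to check the numerical answer.

```python
# F(x) = integral of the pdf from the left edge of its support up to x.
def pdf(t):
    return 3 * t ** 2            # hypothetical density on [0, 1]

def cdf(x, n=100_000):
    """Midpoint-rule integral of the pdf from 0 up to x."""
    dt = x / n
    return sum(pdf((i + 0.5) * dt) for i in range(n)) * dt

err = abs(cdf(0.7) - 0.7 ** 3)   # compare against the exact CDF x^3
```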
Well, what about the other way around,
if we have a CDF, how do we get the PDF?
So go the other way around,
if x has CDF, capital F, and of course,
we're assuming it's a continuous
random variable, not a discrete one.
So in the continuous case, by the way,
the terminology, it could be slightly
confusing because when we say we
have a continuous distribution,
it means capital F should be continuous.
But we don't just want
it to be continuous,
we want it to be differentiable.
So "continuous" refers not so
much to F being a continuous function.
It refers to the fact that x can
take on a whole continuum of values,
rather than just discrete values, okay?
So if X has CDF F, and
X is a continuous random variable,
and we want to get
from the CDF to the PDF:
f(x) = what? Let's think about that.
This is the relationship between a CDF and
a PDF, okay?
But now I wanna say, if we know this
integral, how can we extract out this?
Well, the answer is just take
the derivative, right, f(x) = F'(x).
And why is that true?
Your favorite theorem of calculus, that's
the fundamental theorem of calculus, FTC.
Actually, we're gonna need both parts of
the fundamental theorem of calculus, so
it's nice that actually that
it is pretty fundamental.
At least the way I learned it, part one of
the fundamental theorem of calculus said
if you have an integral that looks like
this, up to some variable upper limit,
if you take the derivative of that,
then you just get this function.
So that's the first part of
the fundamental theorem of calculus.
The second part of the fundamental
theorem of calculus says that if you
wanna do a definite integral,
you find anti-derivative and
then evaluate it at the two end points.
So okay, this is just saying
the derivative of the CDF is the PDF,
in the continuous case.
So it's a very straightforward
relationship between them.
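Going the other way is just differentiation, which can be sketched with a finite difference. The CDF F(x) = x³ on [0, 1] is a hypothetical example, whose true PDF is 3x².

```python
# The PDF is the derivative of the CDF (FTC, part one): f(x) = F'(x).
def F(x):
    return x ** 3                 # hypothetical CDF on [0, 1]

def pdf_from_cdf(x, h=1e-6):
    return (F(x + h) - F(x - h)) / (2 * h)   # central-difference F'(x)

err = abs(pdf_from_cdf(0.5) - 3 * 0.5 ** 2)  # true PDF is 3x^2
```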
And if we wanted to know, this also kind
of confirms something we did earlier.
Let's say we wanna know the probability
that x is between a and b.
And in the discrete case it's crucial
whether less than or equal, and so on.
In the continuous case, it makes no
difference if you write strict or
not strict here.
So according to the definition of a PDF,
if we wanna get the probability
that X is in that interval,
all we do is integrate
the PDF from there to there.
But another way to think about this would
be, remember your fundamental theorem of
calculus, and
the notation matches up pretty well too.
Because, like in AP calculus, usually if
you have a function little f,
you'd call its anti-derivative capital F,
which is exactly what we're doing here.
If we wanna do this integral,
we take some anti-derivative.
Well, we already have one, that's the CDF.
And then we evaluate here, evaluate there.
So that's just F(b)- F(a).
So that's also true by
fundamental theorem of calculus.
And that's similar to a result
that we had earlier for CDFs, so
it's consistent with earlier stuff, okay?
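The F(b) − F(a) result can also be sanity-checked by simulation. Everything here is a hypothetical example: the CDF F(x) = x³ on [0, 1], sampled by the inverse transform U^(1/3), a technique that comes later in the course.

```python
import random

# Check P(a <= X <= b) = F(b) - F(a) by Monte Carlo simulation.
random.seed(0)
a, b = 0.2, 0.9
n = 200_000
# If U is Uniform(0,1), then U**(1/3) has CDF x^3 (inverse transform).
hits = sum(a <= random.random() ** (1 / 3) <= b for _ in range(n))
mc = hits / n                    # Monte Carlo estimate of the probability
exact = b ** 3 - a ** 3          # F(b) - F(a)
err = abs(mc - exact)
```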
So we'll do some examples
in a little while.
But right now this is just the general
framework, and making the analogy, okay?
So we have a CDF, and I'll just add here,
just to have it in this dictionary too,
that the PDF is
the derivative of the CDF. Question?
>> [INAUDIBLE]
>> Yeah,
and the question is, in this framework,
is big F always differentiable?
Yeah, we have to assume
that it's differentiable.
I mean, there are functions
that are continuous, but
not differentiable everywhere.
But in that case it would just
be a more complicated thing.
And when we say continuous random variable
in this course it means we have a CDF
which has a derivative.
Because if we don't have a PDF then we're
not dealing with continuous distributions
and things can be much nastier.
So yeah we're assuming that
this derivative exists.
Okay. So
in general, if I ask you to
find the distribution of whatever,
in the discrete case,
you can either give the PMF or the CDF.
Those are equally valid ways
to describe a distribution.
In a continuous case, you can give the PDF
or the CDF, those are equally valid ways.
Okay, so let's continue this list.
In the discrete case,
we have the expected value, right?
And remember the expected value,
we just take the sum
of the values times the probability
of the values, okay?
So in the continuous case,
this would just be 0 because all of
these probabilities are 0, so that's not useful.
But by analogy, instead of a sum,
we'll do an integral.
So the definition of the expected
value in the continuous
case is that we integrate x times the PDF.
So it's completely analogous.
In general we're gonna integrate
from minus infinity to infinity.
And sometimes we'll deal with random
variables where the only possible
values are say between 0 and 1.
And in that case we're just integrating
0 outside of that interval.
So then we would restrict it to
the region where it's non-zero.
But in general that's the definition,
okay?
So that's completely analogous.
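The continuous expected value E(X) = ∫ x f(x) dx can be sketched as a numerical integral. The density f(x) = 2x on [0, 1] is a hypothetical example; its mean works out to 2/3.

```python
# E(X) = integral of x * f(x) dx, via a midpoint-rule sum on [0, 1].
n = 200_000
dx = 1.0 / n
mean = sum(((i + 0.5) * dx) * (2 * (i + 0.5) * dx) for i in range(n)) * dx
err = abs(mean - 2 / 3)   # exact value: ∫ x * 2x dx from 0 to 1 = 2/3
```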
Let's do one more concept that applies in
both the discrete and continuous cases.
And that's the notion of variance, so
we've been talking about expected values.
But that's just giving a one number
summary of the average, right?
But it is not telling us anything about
the spread of the distribution, right?
How spread out is it?
So for that, we need the idea of variance,
and the definition of the variance.
So intuitively,
variance is just supposed to be a measure
of how spread out the distribution is.
That is, on average,
how far is x from its mean?
So we might start by trying to do
the expected value of x minus the expected
value of x.
Here's the mean.
This is the difference between x and
its mean.
But if we just did this,
we would always get 0.
Because by linearity,
that's E(X) − E(E(X)).
But E(E(X)) is just E(X),
because E(X) is just a constant.
So this would be useless,
because it would just be zero.
Okay, so then I guess the most
obvious thing to me to do to
fix that problem is to
put absolute value signs.
Because then we're making
it non-negative, and
then it won't be 0 anymore,
except if X is a constant.
But absolute values
are annoying to deal with.
For example, the absolute value function,
it's this V shaped thing right?
It has a sharp corner,
it's not differentiable.
It is difficult to work with.
So the standard way to deal with this is
instead of absolute values, to square it.
One reason, as I said, is
that the absolute value
is just annoying because
it's not differentiable.
Kind of a deeper reason, though, is
the square: anytime you see squares,
it should start reminding you
of the Pythagorean theorem, right?
It means that there's a lot of geometry,
there's a lot of beautiful geometry that
goes on with squares, and sums of squares,
and right triangles, and
Euclidean distance, and things like that.
And you lose that geometry if
you're using absolute value, and
there are other reasons as well.
But anyway this is the standard
definition of variance.
So this is on average, how far is x from
its mean, except that we're squaring it.
One annoying thing about squaring it
though, is that we changed the units.
So if x is like a measurement that
let's say it's measured in miles.
We are measuring some distance
in miles and we square it,
we've got miles squared, okay?
And so that's no longer in the same
units as what we started with.
So because of that, something more
interpretable is the standard deviation,
Which is a familiar term.
Standard deviation is defined as just
the square root of the variance.
So this seems, at first, like a kind
of convoluted thing to be doing.
First we square everything then we
take the average then we square
root it back again.
The reason is that the variances
has really nice properties, but
on the other hand we changed the units,
so we just change it back at the end.
So that's the definition
of standard deviation.
In general, variance is a lot nicer to
work with than standard deviation as far
as doing the math.
But then at the end of the day when you
want to have something interpretable.
It's easier to think about what
the standard deviation means,
because you're back on the original units.
Okay, and let's just write one.
One nice thing about this letter E
notation, this is a really good notation.
E for expectation.
Because I could just write
down this one thing and
I didn't assume here that X is
continuous or discrete or anything.
This is just a general definition and
I didn't need to write
a separate definition for
the discrete or for the continuous case.
So this is a unified definition.
Let's just write the other
way to compute variance.
This one's the usual definition, but
the other way to write it, which I'm about
to show you, is usually easier for
computing it.
Not always; sometimes this one's easier.
So, another way to express variance.
So we want the variance of X.
Let's just expand this thing out.
I'm just gonna multiply it out, right.
So that's X² − 2X·E(X) + (E(X))²,
just squaring this thing.
And let's use linearity:
the first term gives E(X²), minus.
Now for the middle term, the 2 is
a constant and constants can come out.
The E(X) is also a constant, right?
X is a random variable,
E(X) is just a number.
The 2E(X) is just
a number that comes out.
So that's 2E(X), and
then what's left inside is still an E(X).
So we have another E(X) there, and
then plus: this thing, (E(X))², is also a constant.
So taking its expected value does nothing
because it's just a constant already.
So that's + (E(X))², and so
this whole thing just becomes
E(X²) − 2(E(X))² + (E(X))² = E(X²) − (E(X))².
And it sounds like what I just said was 0,
but the parentheses are different.
Here we square it first
then take the average.
Here we take the average then square it.
We take that difference, okay?
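The identity Var(X) = E(X²) − (E(X))² is easy to verify on a small example. Here's a quick check on a fair six-sided die, a simple discrete example chosen for illustration, not from the lecture:

```python
# Verify Var(X) = E(X^2) - (E(X))^2 against the defining formula.
vals = [1, 2, 3, 4, 5, 6]
p = 1 / 6
EX = sum(v * p for v in vals)                    # E(X) = 3.5
EX2 = sum(v ** 2 * p for v in vals)              # E(X^2) = 91/6
var_identity = EX2 - EX ** 2                     # E(X^2) - (E(X))^2
var_def = sum((v - EX) ** 2 * p for v in vals)   # E[(X - E(X))^2]
```

Both routes give 35/12, and note that E(X²) = 91/6 is indeed bigger than (E(X))² = 12.25, as the discussion below says it must be.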
So that's usually easier.
And so that answers the age old question,
if you had, this question came up for
me I think in seventh grade
science class where I had to,
do a bunch of experiments and
then I got a bunch of numbers.
And for some reason I was squaring
the numbers and I wanted the average.
I didn't know whether I should
square first and then average or
average first and then square.
And I think I computed both ways and
I got a slightly different answer.
Which one is correct?
Well, this doesn't say
which one is correct, but
it says that E(X²) will always
be bigger than or equal to (E(X))².
And equality holds only in
the case when X is a constant.
So if X is a constant, then the variance
is 0 cuz X just equals its mean obviously.
If X is not a constant, then what's
gonna happen is that
you're averaging some numbers,
(X − E(X))², that may sometimes be 0.
But it's certainly sometimes positive,
and you can't average positive things and
get a negative number.
You can't average positive things and
get 0.
So it would be strictly positive.
Which means this is
strictly greater than this.
Except in the case of a constant.
So, okay, that's the variance.
And as far as notation,
it's standard to write
EX² to mean E(X²), squaring first.
That's just standard notation.
So, if you see EX², you should
always interpret that as squaring first,
and then taking the E.
That's just a convention,
a pretty standard convention.
Writing it with parentheses, E(X²),
is a little clearer,
to avoid any possible ambiguity.
But, it's very common to
see it written this way.
So interpret it as squaring first.
Okay so that's variance, and over here
we can continue our little dictionary,
Var(X) = E(X²) − (E(X))²,
the other way,
and then in the continuous case,
same thing again.
And the one difficulty with this is,
we've been talking
about how to compute E(X), but
how do we actually compute E(X²)?
That's the question that
we need to address.
How do you actually compute that thing?
So we'll talk about that a little later.
But first, we should see at least one
example of a continuous distribution.
The simplest one to start
with is called the uniform.
As far as what you'll
need before the midterm,
there are only two continuous
distributions that you need to know
by name before the midterm,
and then we'll do more later.
One is the uniform,
the other is the normal.
Uniform is the simplest
continuous distribution, so
we'll start with that one right now.
Normal distribution we'll talk about
mostly next week, and the normal
distribution is the most famous and important
distribution in all of statistics.
And the reasons why it's so
important will kind of gradually
emerge over the semester.
Let's start with the uniform.
So here's the uniform distribution
on some interval on (a,b).
So we have some interval from a to b.
I'll say here's a, here's b.
We wanna pick a "random"
point in this interval.
I'll put random in quotes.
How do we do that,
the question is what does random mean?
Intuitively, "random" is
too vague, because that
just means we have some
random variable, okay?
What if we said completely random.
Like what's the most
random that it could be?
Again that's a little bit vague but
let's just kinda explore that intuition a
little bit and then write down a formula.
If it's completely random, you might
say the probability of any two
points is the same. But for all the real
numbers between here and here,
every individual number has probability 0.
So it's not so interesting to say
all the probabilities are the same.
So pick some random point, say,
there, x. But
the probability of getting that
exact value right there is 0.
Okay, so that means we still
haven't said what it means for
it to be completely random.
So the intuition
now is: suppose we break
this interval into two halves,
where this is the midpoint, say.
Intuitively, if it's completely
random, it should be that this half is
equally likely as this half.
Cuz if it were not, then it seems like
the random variable would kind of
prefer to be
more to the right than to the left.
And somehow we want a concept
where it doesn't
care where it is, right?
So in other words, we could
say that, for
the uniform, probability
is proportional to length.
That's a reasonable definition.
That is, if we take two
intervals of the same length,
they should have the same probability.
If one interval is twice as long,
it seems reasonable that that
one should be twice as likely.
So we're just gonna write down
a continuous distribution
where probability is
proportional to length.
And so to specify this, we can
either write down the pdf and derive
the cdf, or we can try to figure out what
the cdf should be and derive the pdf.
Let's start with the pdf here,
because we're trying to practice pdfs.
So here's the pdf: it's a constant,
f(x) = c if x is between a and
b, and it's 0 otherwise,
because I want probability to
be 0 outside of this interval.
Inside that interval I want the density
to be constant, because if the density
were higher at one point than another,
that doesn't seem very uniform.
So well of course, we could ask, what's c?
Well it has to be that
the integral of the pdf is 1.
And I could start out by integrating
from minus infinity to infinity, but
of course we only need to integrate from
a to b, cuz it's zero outside of there.
If we integrate this we have to get 1,
therefore c = 1/(b − a).
So it's just one over
the length of the interval.
It has to be this way otherwise
this would not be a valid pdf.
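The normalization argument can be sketched directly in code: the constant must be 1/(b − a) for the area to be 1. The endpoints a = 2, b = 5 are arbitrary illustrative numbers.

```python
# The Uniform(a, b) PDF: constant 1/(b - a) on [a, b], 0 outside.
def uniform_pdf(x, a, b):
    return 1 / (b - a) if a <= x <= b else 0.0

a, b = 2.0, 5.0
# Midpoint-rule check that the density integrates to 1 over [a, b].
n = 100_000
dx = (b - a) / n
area = sum(uniform_pdf(a + (i + 0.5) * dx, a, b) for i in range(n)) * dx
```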
Now suppose we want the cdf.
So to get the cdf, we just have to
integrate this thing from
minus infinity up to x.
So how do we do that?
Again, we don't really have to go
all the way from minus infinity
we can just start it at a.
f of t dt,
then we have to consider some cases.
Well, first of all this
is 0 if x is less than a.
Well, this expression here that I wrote
down is kind of already assuming that x is
greater than a.
So assume x is greater than a.
If x is less than a, then the probability
is 0 so the cdf has to be 0.
And we also know that it's 1 if well,
let me just write this separately.
So here's the cdf,
if x is less than a it's 0.
If x is greater than b it's 1, because
we know for sure that X is at most b.
Now, the interesting case is
what happens in the middle.
To get the thing in the middle, all we
have to do is integrate a constant:
I plug in f(t) = c here,
and that's gonna be c times (x − a).
Just integrate the constant;
it's a very easy integral.
So that's just gonna be
(x − a)/(b − a),
if x is between a and b.
And notice this makes sense, because if
we let x equal a here, it reduces
down to 0.
And if we let x equal
b, it reduces to 1.
So this is a continuous function.
So it's saying, intuitively,
this is a linear function of x:
as you increase x,
the probability is increasing linearly.
Which make sense, cuz you're
accumulating more and more stuff.
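The three cases of the CDF we just derived can be transcribed directly:

```python
# Uniform(a, b) CDF: 0 below a, (x - a)/(b - a) on [a, b], 1 above b.
def uniform_cdf(x, a, b):
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Spot checks on the interval [2, 5] (illustrative numbers): below the
# interval, at the midpoint, and above the interval.
checks = (uniform_cdf(1, 2, 5), uniform_cdf(3.5, 2, 5), uniform_cdf(9, 2, 5))
```

The midpoint gives 0.5, matching the linear-accumulation picture.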
So let's get the expected value of x.
Expected value of x, again,
it's just gonna be an easy integral.
Because we just have to
integrate from a to b of x times
the pdf.
So I just wrote down x times the pdf.
So integrating x is easy;
it's just gonna be x²/2.
So this is x²/(2(b − a)),
and we evaluate this
as x goes from a to b.
Let's factor out the
1/(2(b − a)),
and then it's b² − a².
But b² − a²
is (b − a)(b + a),
so we can cancel the b − a,
and we just get (a + b)/2.
Just doing that easy integral.
Well, that's a very intuitive answer.
That's just the mid point.
It says the average is in the middle, and
it would really be weird if that didn't
happen, cuz this is supposed to be uniform.
Okay, so that was just a check.
Now we have a bit of a quandary, though,
for how to deal with the variance.
So let's try to get the variance.
If we want the variance,
then that means we need E(X²),
because we know this part;
we don't know this part.
How do we deal with the X squared?
Well, E of X squared
equals?
So if we think carefully about
this how do we get E of X squared?
Well, X squared is a random variable.
Let's call that thing Y.
So let's let Y equal X squared.
If we take a function of a random
variable it's a random variable.
So Y equals X squared.
So that's E of Y.
And how do we get E of Y?
Well to get E of Y then
we need to know the pdf,
assuming X is continuous right now.
To get E of Y then we need
to know the pdf of Y and
then we integrate Y times the pdf of Y,
it'd be Y.
So the question is do
we need the pdf of Y?
But that sounds kind of annoying
because we don't know the pdf of Y.
Now, we can get the pdf of Y, and later in
the course we will talk about how to
get the pdf of Y, but right now that
seems like a pretty annoying problem.
So let's kind of do this
more carelessly instead.
Let's just say, well, it's too
much hassle to get the pdf of Y.
So instead I'm just gonna say
I'm gonna reason by analogy.
And I'm looking at this
formula right now for E of X.
But I don't want E of X.
I want E of X squared.
So I'm just gonna change
that to an X squared.
All right, I want X squared, not X, so
I'm just gonna put down X squared there.
And then I'll go f of x dx.
That's the pdf of X, that's what I know.
And I'm too lazy to find the pdf of Y,
so I'll just change X to X squared.
Well, that doesn't sound very legitimate.
What I just did is called the Law
of the Unconscious Statistician,
which has a nice acronym:
LOTUS.
It's called that because it
seems like something you might do
if you're kind of half asleep and
not thinking very hard:
you just want to find this thing,
so you just kind of replace
x by x squared.
So to state it in general
in the continuous case:
we want the expected
value of a function of X,
where X is a random variable
whose PDF we know.
So, the principled approach would be:
find the distribution of g(X) and
then work with that.
The lazy approach would be:
still use the distribution of X,
but that sounds kind of too good to be true.
So the lazy approach here would be, well,
I'm gonna take g of X, and I'm
gonna change big X to little x.
And then I'm still gonna
insist on using the density of X and
not convert anything.
Well, this turns out to be true.
So I'll put a box around it.
We can talk sometime next week
about the proof, why this is true.
But this turns out to be true.
And thus, even though it sounds too
good to be true, it actually is true.
So that's called LOTUS.
This is the continuous version.
Let me
write both versions.
The continuous LOTUS is
that thing I just wrote;
same equation,
you can copy that there.
And let me just write the discrete case.
Again, we want the expected value
of some function g of X,
so all I'm gonna do is take this,
the definition
of the expected value,
and change x to g of x.
So this is gonna be the sum of g of
x times the PMF of X.
It says we don't need to convert and
get a distribution for g of x.
We just do that.
This is also valid.
We'll talk more about why later.
But it's useful to know that right now.
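Since the equations on the board don't appear in the transcript, here are the two versions of LOTUS written out (f is the PDF of X in the continuous case, and the sum in the discrete case is over the support of X):

```latex
% Continuous LOTUS: integrate g against the PDF of X
E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx

% Discrete LOTUS: sum g against the PMF of X
E[g(X)] = \sum_x g(x)\, P(X = x)
```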
So coming back to this
problem about the uniform,
if we want the variance of the uniform so
let's let, just for simplicity let's
let u be uniform between 0 and 1.
And suppose we want the variance.
So we know the expected value of u
is one-half, just the midpoint.
And if we want E of u squared.
According to LOTUS, we don't need
to first find the PDF of u squared.
We can just directly write down
the integral from 0 to 1 of u squared
times the PDF, f sub u of u, du.
But this PDF is actually equal to
a constant, and that constant
is 1 in this case.
So this is just equal to,
this part is just one.
So it's the integral of u squared du,
which is u cubed over 3, evaluated
from 0 to 1, which is one-third.
So therefore the variance
of u equals E of u squared
minus the square of E of u.
And that's one-third minus
one-quarter, which equals one-twelfth.
So the variance of a uniform
zero one is one-twelfth, and
that was a very easy calculation
because we were able to use LOTUS here,
which we haven't proven yet but
we will talk more about that later.
I'm showing you how to use it right now,
then we'll justify it more.
So that thing that's too good
to be true actually works.
So that's LOTUS.
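Not from the lecture, but here's a minimal Python sketch of the calculation we just did: it estimates E of U squared by Monte Carlo, applying g(u) = u squared directly to uniform draws in the spirit of LOTUS, and the results should land near the exact values 1/3 and 1/12.

```python
import random

random.seed(0)
n = 200_000

# Uniform(0, 1) draws; LOTUS says we can apply g(u) = u**2
# directly, with no need for the distribution of U**2 itself.
samples = [random.random() for _ in range(n)]

e_u = sum(samples) / n                  # exact value: 1/2
e_u2 = sum(u * u for u in samples) / n  # exact value: 1/3
var_u = e_u2 - e_u ** 2                 # exact value: 1/12
```

With 200,000 draws the Monte Carlo error is tiny, so the estimates should agree with the exact answers to a couple of decimal places.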
One more thing about
the uniform distribution.
It seems like the uniform is the simplest
continuous distribution that
you could possibly imagine.
Because the PDF is just a constant
on some interval.
And one other point about this is we
have to have a bounded interval here.
We cannot define a uniform
distribution on the entire real line.
Sometimes it's a bit
annoying that there isn't one.
But on the whole real line there
would be no way to normalize it;
there'd be no way to find a constant
that makes it integrate to one.
So it sounds like this is
an extremely simple distribution.
And it is,
it's just constant PDF on some interval.
Extremely easy.
So start with the uniform zero one,
it seems very simple, but
actually uniform zero one
has the property that if
you give me one uniform random variable
and you're interested in some other
distribution, there is a way to
convert it and simulate that.
That is, from the uniform
zero one you can simulate or
generate from any distribution,
no matter how complicated it is.
At least in principle.
As a matter of computation that may be
easy or hard, but in principle from
the uniform you can get anything, so
I call that universality of the uniform.
Universality of the uniform means
that given a uniform you can
create any distribution that you want.
So that's kind of theoretically nice
in that it kind of unifies concepts:
it says this thing that seems very,
very simple, just one uniform,
can actually be used to generate
something that's as complicated
as you want.
That's kind of cool, but it's also
useful in practice: most computer
programs can generate random numbers
between zero and one, actually
pseudo-random, but they may not know
how to generate whatever complicated
distribution you're interested in.
And this, in many cases, gives you
a way to convert from the random
uniforms to whatever you want
to simulate.
So I want to show why that's true.
So the statement is that
we're gonna start with U uniform
between 0 and 1, and let F be
a CDF that we're interested in.
So usually we've been talking about,
here's the random variable,
and then finding its CDF.
Here we're going the other way
in the sense that we assume
that we have some CDF
that's of interest to us.
But we do not yet have access to
a random variable that has that CDF.
So let F be a CDF and
it's possible to generalize this further.
But to make this something that we can
do fairly quickly let's assume that F is
strictly increasing, so
we don't have to deal with flat regions.
And let's also assume that F
is continuous as a function.
Just so that we don't have to
think about jumps right now,
although you can generalize this.
Now the theorem says: define X
to be F inverse of U.
The inverse function exists in this
case because I took something that's
continuous and strictly increasing,
so it will have an inverse.
So we take the inverse and we plug in U.
Then the statement is that X
is distributed according to F.
That is the CDF of X is F.
So what this says is we have
this CDF we're interested in.
We take the inverse CDF, plug in
the uniform, and then we've constructed
a random draw from that distribution
we're interested in, capital F.
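As a concrete sketch of this recipe (the Exponential example is my own illustration, not from the lecture): the Exponential(1) CDF is F(x) = 1 - e^(-x), which is continuous and strictly increasing on the positive reals, so F inverse of u is -log(1 - u), and plugging in uniforms should produce Exponential(1) draws.

```python
import math
import random

random.seed(0)

def f_inverse(u):
    # Inverse of the Exponential(1) CDF F(x) = 1 - exp(-x).
    return -math.log(1.0 - u)

# Universality of the uniform: X = F^{-1}(U) has CDF F.
n = 100_000
draws = [f_inverse(random.random()) for _ in range(n)]

# Sanity checks: for Exponential(1), the mean is 1, and the
# fraction of draws at most 1 should be near F(1) = 1 - 1/e.
mean = sum(draws) / n
frac_below_1 = sum(d <= 1.0 for d in draws) / n
```

The same shape works for any CDF you can invert, which is exactly the "in principle" claim: computationally it's only as easy as computing F inverse.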
So let's prove this very quickly.
And the proof doesn't require
anything fancy at all.
It doesn't require anything, except for
understanding what a CDF is.
So another reason I like to talk about
this is it's just good practice with
really understanding what a CDF is.
Cuz the better you understand CDFs, then
the easier it is to see why this is true.
So to prove this, all we need to
do is compute the CDF of X.
This notation means that X has the CDF F,
that is, X follows this distribution.
So we want the probability that X is
less than or equal to x, which is
the probability that F inverse of U
is less than or equal to x, because
by definition X is F inverse of U;
I'm just plugging in what X is.
Now let's apply capital F to both sides.
So I'm just putting F here and F here.
And that's equivalent, because I made
these nice assumptions about F,
to the event that U is less than
or equal to F of x.
You know, if we didn't have
an increasing function, then if
I multiplied both sides by minus one
the inequality would flip,
things like that.
But since we have an increasing
function, it's preserved.
And since it's invertible,
this is really the same event,
just written in a different way.
Now we're basically done, because
what's the probability that U is
less than or equal to F of x?
I'll just draw a simple little picture.
U is uniform from 0 to 1, and F of x,
remember, that's a probability,
so that's just some number between 0
and 1; let's say it's there, F of x.
Now I said that probability is
proportional to length for a uniform and
in this case that proportionality
constant is just 1 because the length
of the whole interval is 1.
So for uniforms 0 and 1, the probability
of an interval is its length.
So we want to know, what's the probability
that u is between here and here.
That's just the length of
the interval, which is F of x.
And that's the end.
That's the end of the lecture.
So have a good weekend.
Thanks.
