The following content is
provided under a Creative
Commons license.
Your support will help MIT
OpenCourseWare continue to
offer high-quality educational
resources for free.
To make a donation or view
additional materials from
hundreds of MIT courses, visit
MIT OpenCourseWare at
ocw.mit.edu.
PROFESSOR: We're going to finish
today our discussion of
limit theorems.
I'm going to remind you what the
central limit theorem is,
which we introduced
briefly last time.
We're going to discuss what
exactly it says and its
implications.
And then we're going to apply it to a couple of examples,
mostly on the binomial
distribution.
OK, so the situation is that
we are dealing with a large
number of independent,
identically
distributed random variables.
And we want to look at the sum
of them and say something
about the distribution
of the sum.
We might want to say that
the sum is distributed
approximately as a normal random
variable, although,
formally, this is
not quite right.
As n goes to infinity, the
distribution of the sum
becomes very spread out, and
it doesn't converge to a
limiting distribution.
In order to get an interesting
limit, we need first to take
the sum and standardize it.
By standardizing it, what we
mean is to subtract the mean
and then divide by the
standard deviation.
Now, the mean is, of course, n
times the expected value of
each one of the X's.
And the standard deviation
is the
square root of the variance.
The variance is n times sigma squared, where sigma squared is the variance of the X's -- so sigma times the square root of n is the standard deviation.
And after we do this, we obtain a random variable that has 0 mean -- it's centered -- and its variance is equal to 1.
And so the variance stays the
same, no matter how large n is
going to be.
So the distribution of Zn keeps
changing with n, but it
cannot change too much.
It stays in place.
The mean is 0, and the width
remains also roughly the same
because the variance is 1.
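To make the standardization concrete, here is a small sketch (my own illustration, not from the lecture), using rolls of a fair die as a stand-in for the X's: no matter what n is, the standardized sum Zn has mean close to 0 and variance close to 1.

```python
import math
import random

random.seed(0)

def standardized_sum_samples(n, num_samples=20000):
    """Draw samples of Zn = (Sn - n*mu) / (sigma * sqrt(n)), where the
    X's are uniform on {1, ..., 6} (an arbitrary choice for illustration)."""
    mu = 3.5                        # mean of one X
    sigma = math.sqrt(35 / 12)      # standard deviation of one X
    samples = []
    for _ in range(num_samples):
        s_n = sum(random.randint(1, 6) for _ in range(n))
        samples.append((s_n - n * mu) / (sigma * math.sqrt(n)))
    return samples

zs = standardized_sum_samples(n=30)
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs) - mean ** 2
# After standardizing, the sample mean is near 0 and the sample variance
# near 1, regardless of how large n is.
print(round(mean, 2), round(var, 2))
```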
The surprising thing is that, as n grows, the distribution of Zn kind of settles into a certain asymptotic shape.
And that's the shape
of a standard
normal random variable.
So standard normal means
that it has 0
mean and unit variance.
More precisely, what the central
limit theorem tells us
is a relation between the
cumulative distribution
function of Zn and its relation
to the cumulative
distribution function of
the standard normal.
So for any given number, c,
the probability that Zn is
less than or equal to c, in the
limit, becomes the same as
the probability that the
standard normal becomes less
than or equal to c.
And of course, this is useful
because these probabilities
are available from the normal
tables, whereas the
distribution of Zn might be a
very complicated expression if
you were to calculate
it exactly.
So some comments about the
central limit theorem.
First thing is that it's quite
amazing that it's universal.
It doesn't matter what the
distribution of the X's is.
It can be any distribution
whatsoever, as long as it has
finite mean and finite
variance.
And when you go and do your
approximations using the
central limit theorem, the only
thing that you need to
know about the distribution
of the X's are the
mean and the variance.
You need those in order
to standardize Sn.
I mean -- to subtract the mean
and divide by the standard
deviation --
you need to know the mean
and the variance.
But these are the only things
that you need to know in order
to apply it.
In addition, it's
a very accurate
computational shortcut.
So the distribution of these Zn's, in principle, you can calculate by convolving the distribution of the X's with itself many, many times.
But this is tedious, and if you
try to do it analytically,
it might be a very complicated
expression.
Whereas by just appealing to the
standard normal table for
the standard normal random
variable, things are done in a
very quick way.
So it's a nice computational
shortcut if you don't want to
get an exact answer to a
probability problem.
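As an illustration of the two routes (my own sketch, again using a fair die as a hypothetical distribution for the X's): the tedious exact convolution next to the one-line normal shortcut.

```python
import math

def convolve(pmf1, pmf2):
    """Exact PMF of the sum of two independent discrete random variables."""
    out = {}
    for x, px in pmf1.items():
        for y, py in pmf2.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def phi(c):
    """Standard normal CDF, playing the role of the normal table."""
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

# Hypothetical choice for the distribution of the X's: a fair die.
pmf_x = {k: 1 / 6 for k in range(1, 7)}
n = 20
pmf_sn = pmf_x
for _ in range(n - 1):
    pmf_sn = convolve(pmf_sn, pmf_x)          # the tedious exact route

mu, var = 3.5, 35 / 12                        # mean and variance of one X
exact = sum(p for k, p in pmf_sn.items() if k <= 75)
approx = phi((75 - n * mu) / math.sqrt(n * var))   # the quick CLT route
print(round(exact, 3), round(approx, 3))
```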
Now, at a more philosophical
level, it justifies why we are
really interested in normal
random variables.
Whenever you have a phenomenon
which is noisy, and the noise that you observe is created by adding up lots of little pieces of randomness that are independent of each other, the
overall effect that you're
going to observe can be
described by a normal
random variable.
So in a classic example that
goes 100 years back or so,
suppose that you have a fluid,
and inside that fluid, there's
a little particle of dust
or whatever that's
suspended in there.
That little particle gets
hit by molecules
completely at random --
and so what you're going to see
is that particle kind of
moving randomly inside
that liquid.
Now that random motion, if you
ask, after one second, how
much is my particle displaced,
let's say, in the x-axis along
the x direction.
That displacement is very, very
well modeled by a normal
random variable.
And the reason is that the
position of that particle is
decided by the cumulative effect
of lots of random hits
by molecules that hit
that particle.
So that's a sort of celebrated
physical model that goes under
the name of Brownian motion.
And it's the same model that
some people use to describe
the movement in the
financial markets.
The argument might go that the
movement of prices has to do
with lots of little decisions
and lots of little events by
many, many different
actors that are
involved in the market.
So the distribution of stock
prices might be well described
by normal random variables.
At least that's what people
wanted to believe until
somewhat recently.
Now, the evidence is that,
actually, these distributions
are a little more heavy-tailed
in the sense that extreme
events are a little more likely to occur than what normal random variables would seem to indicate.
But as a first model, again,
it could be a plausible
argument to have, at least as
a starting model, one that
involves normal random
variables.
So this is the philosophical
side of things.
On the more accurate,
mathematical side, it's
important to appreciate exactly what kind of statement the central limit theorem is.
It's a statement about the
convergence of the CDF of
these standardized random
variables to
the CDF of a normal.
So it's a statement about
convergence of CDFs.
It's not a statement about
convergence of PMFs, or
convergence of PDFs.
Now, if one makes additional
mathematical assumptions,
there are variations of the
central limit theorem that
talk about PDFs and PMFs.
But in general, that's not
necessarily the case.
And I'm going to illustrate
this with--
I have a plot here which
is not in your slides.
But just to make the point,
consider two different
discrete distributions.
This discrete distribution
takes values 1, 4, 7.
This discrete distribution can
take values 1, 2, 4, 6, and 7.
So this one has sort of a
periodicity of 3, this one,
the range of values is a little
more interesting.
The numbers in these two
distributions are cooked up so
that they have the same mean
and the same variance.
Now, what I'm going to do is
to take eight independent
copies of the random variable
and plot the PMF of the sum of
eight random variables.
Now, if I plot the PMF of the
sum of 8 of these, I get the
plot, which corresponds to these
bullets in this diagram.
If I take 8 random variables,
according to this
distribution, and add them up
and compute their PMF, the PMF
I get is the one denoted
here by the X's.
The two PMFs look really
different, at least, when you
eyeball them.
On the other hand, suppose you were to plot their CDFs and compare them with the normal CDF, which is this continuous curve. The CDF, of course, goes up in steps because we're looking at discrete random variables. But it's very close to the normal CDF.
And if, instead of n equal to 8, we were to take 16, then the agreement would be even better.
So in terms of CDFs, when we add
8 or 16 of these, we get
very close to the normal CDF.
We would get essentially the
same picture if I were to take
8 or 16 of these.
So the CDFs sit, essentially, on
top of each other, although
the two PMFs look
quite different.
So this is to appreciate that,
formally speaking, we only
have a statement about
CDFs, not about PMFs.
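We can reproduce the spirit of that plot in code. The weights below are my own choices, cooked up so that both PMFs have mean 4 and variance 6; they are not necessarily the ones on the lecture slide. After convolving each with itself 8 times, the PMFs of the sums look very different, but the CDFs nearly sit on top of each other.

```python
def convolve(p, q):
    """Exact PMF of the sum of two independent discrete random variables."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

def pmf_of_sum(pmf, n):
    out = pmf
    for _ in range(n - 1):
        out = convolve(out, pmf)
    return out

def cdf(pmf, k):
    return sum(p for v, p in pmf.items() if v <= k)

# Two PMFs cooked up to share mean 4 and variance 6 (my own illustration).
pmf_a = {1: 1/3, 4: 1/3, 7: 1/3}
pmf_b = {1: 1/4, 2: 3/16, 4: 1/8, 6: 3/16, 7: 1/4}

sum_a = pmf_of_sum(pmf_a, 8)
sum_b = pmf_of_sum(pmf_b, 8)

points = range(8, 57)
max_pmf_gap = max(abs(sum_a.get(k, 0.0) - sum_b.get(k, 0.0)) for k in points)
max_cdf_gap = max(abs(cdf(sum_a, k) - cdf(sum_b, k)) for k in points)

# The PMFs differ a lot (one lives on a sparse grid of values), while the
# CDFs almost coincide.
print(round(max_pmf_gap, 3), round(max_cdf_gap, 3))
```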
Now in practice, how do you use
the central limit theorem?
Well, it tells us that we can
calculate probabilities by
treating Zn as if it
were a standard
normal random variable.
Now Zn is a linear
function of Sn.
Conversely, Sn is a linear
function of Zn.
Linear functions of normals
are normal.
So if I pretend that Zn is
normal, it's essentially the
same as if we pretend
that Sn is normal.
And so we can calculate
probabilities that have to do
with Sn as if Sn were normal.
Now, the central limit theorem
does not tell us that Sn is
approximately normal.
The formal statement is about
Zn, but, practically speaking,
when you use the result,
you can just
pretend that Sn is normal.
Finally, it's a limit theorem,
so it tells us about what
happens when n goes
to infinity.
If we are to use it in practice,
of course, n is not
going to be infinity.
Maybe n is equal to 15.
Can we use a limit theorem when
n is a small number, as
small as 15?
Well, it turns out that it's
a very good approximation.
Even for quite small values
of n, it gives us
very accurate answers.
So n of the order of 15 or 20 gives us very good results in practice.
There are no good theorems
that will give us hard
guarantees because the quality
of the approximation does
depend on the details of the
distribution of the X's.
If the X's have a distribution
that, from the outset, looks a
little bit like the normal, then
for small values of n,
you are going to see,
essentially, a normal
distribution for the sum.
If the distribution of the X's
is very different from the
normal, it's going to take a
larger value of n for the
central limit theorem
to take effect.
So let's illustrate this with
a few representative plots.
So here, we're starting with a
discrete uniform distribution
that goes from 1 to 8.
Let's add 2 of these random
variables, 2 random variables
with this PMF, and find
the PMF of the sum.
This is a convolution of 2
discrete uniforms, and I
believe you have seen this
exercise before.
When you convolve this with
itself, you get a triangle.
So this is the PMF for the sum
of two discrete uniforms.
Now let's continue.
Let's convolve this
with itself.
This is going to give us the PMF of a sum of 4 discrete uniforms.
And we get this, which starts
looking like a normal.
If we go to n equal to 32, then
it looks, essentially,
exactly like a normal.
And it's an excellent
approximation.
So this is the PMF of the sum
of 32 discrete random
variables with this uniform
distribution.
If we start with a PMF which
is not symmetric--
this one is symmetric
around the mean.
But if we start with a PMF which is non-symmetric -- so this here is a truncated geometric PMF -- then things do not work out as nicely when I add 8 of these.
That is, if I convolve this with itself 8 times, I get this PMF, which maybe resembles the normal one a little bit.
But you can really tell that it's different from the normal if you focus on the details here and there.
Here it sort of rises sharply.
Here it tails off
a bit slower.
So there's an asymmetry here
that's present, and which is a
consequence of the
asymmetry of the
distribution we started with.
If we go to 16, it looks a
little better, but still you
can see the asymmetry between
this tail and that tail.
If you get to 32 there's still a
little bit of asymmetry, but
at least now it starts looking
like a normal distribution.
So the moral from these plots is that the value of n you need before you get a really good approximation may vary a little bit.
But for values of n in the range
20 to 30 or so, usually
you expect to get a pretty
good approximation.
At least that's what the visual
inspection of these
graphs tells us.
So now that we know that we have
a good approximation in
our hands, let's use it.
Let's use it by revisiting an
example from last time.
This is the polling problem.
We're interested in the fraction f of the population that has a certain habit. And we try to find what f is.
And the way we do it is by
polling people at random and
recording the answers that they
give, whether they have
the habit or not.
So for each person, we get a Bernoulli random variable.
With probability f, a person is
going to respond 1, or yes,
so this is with probability f.
And with the remaining
probability 1-f, the person
responds no.
We record this number, which
is how many people answered
yes, divided by the total
number of people.
That's the fraction of the
population that we asked.
This is the fraction inside our
sample that answered yes.
And as we discussed last time,
you might start with some
specs for the poll.
And the specs have
two parameters--
the accuracy that you want and
the confidence that you want
to have that you did really
obtain the desired accuracy.
So the spec here is that we want probability 95% that our estimate is within 1 percentage point of the true answer.
So the event of interest
is this.
That's the event that the result of the poll is at distance bigger than 1 percentage point from the true answer.
And we're interested in
calculating or approximating
this particular probability.
So we want to do it using the
central limit theorem.
And one way of arranging the
mechanics of this calculation
is to take the event of interest
and massage it by
subtracting and dividing things
from both sides of this
inequality so that you bring into the picture the
standardized random variable,
the Zn, and then apply the
central limit theorem.
So the event of interest, let
me write it in full, Mn is
this quantity, so I'm putting it
here, minus f, which is the
same as nf divided by n.
So this is the same
as that event.
We're going to calculate the
probability of this.
This is not exactly in the form
in which we apply the
central limit theorem.
To apply the central limit
theorem, we need, down here,
to have sigma square root n.
So how can I put sigma
square root n here?
I can divide both sides of
this inequality by sigma.
And then I can take a factor of
square root n from here and
send it to the other side.
So this event is the
same as that event.
This will happen if and only
if that will happen.
So calculating the probability
of this event here is the same
as calculating the probability that this event happens.
And now we are in business
because the random variable
that we got in here is Zn, or
the absolute value of Zn, and
we're talking about the
probability that Zn, absolute
value of Zn, is bigger than
a certain number.
Since Zn is to be approximated
by a standard normal random
variable, our approximation is
going to be, instead of asking
for Zn being bigger than this
number, we will ask for Z,
absolute value of Z, being
bigger than this number.
So this is the probability that
we want to calculate.
And now Z is a standard normal
random variable.
There's a small difficulty,
the one that we also
encountered last time.
And the difficulty is that the
standard deviation, sigma, of
the Xi's is not known.
Sigma squared, in this example, is f times (1-f), and the only thing that we know about sigma is that it's going to be a number less than or equal to 1/2.
OK, so we're going to have to
use an inequality here.
We're going to use a
conservative value of sigma,
the value of sigma equal to 1/2
and use that instead of
the exact value of sigma.
And this gives us an inequality
going this way.
Let's just make sure why the
inequality goes this way.
We got, on our axis,
two numbers.
One number is 0.01 square
root n divided by sigma.
And the other number is
0.02 square root of n.
And my claim is that the numbers
are related to each
other in this particular way.
Why is this?
Sigma is less than 1/2, so 1/sigma is bigger than 2. And since 1/sigma is bigger than 2, this means that this number sits to the right of that number.
So here we have the probability
that Z is bigger
than this number.
The probability of falling out
there is less than the
probability of falling
in this interval.
So that's what that last
inequality is saying--
this probability is smaller
than that probability.
This is the probability that
we're interested in, but since
we don't know sigma, we take the
conservative value, and we
use an upper bound in terms
of the probability of this
interval here.
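Here is a small numerical check of that conservative step (a sketch; the sample size n = 2500 is an arbitrary choice for illustration): since f(1-f) is at most 1/4 for every f, replacing sigma by 1/2 shrinks the threshold and can only increase the tail probability, so the value computed with sigma = 1/2 is an upper bound.

```python
import math

def phi(c):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

def two_sided_tail(t):
    """P(|Z| >= t) for a standard normal Z."""
    return 2 * (1 - phi(t))

n = 2500            # hypothetical sample size for illustration
accuracy = 0.01
# sigma = sqrt(f(1-f)) <= 1/2 for every f, so the true threshold
# accuracy*sqrt(n)/sigma is at least 2*accuracy*sqrt(n), and the true tail
# probability is at most the conservative one computed with sigma = 1/2.
conservative = two_sided_tail(2 * accuracy * math.sqrt(n))
for f in [0.1, 0.3, 0.5, 0.7, 0.9]:
    sigma = math.sqrt(f * (1 - f))
    true_tail = two_sided_tail(accuracy * math.sqrt(n) / sigma)
    assert true_tail <= conservative + 1e-12
print(round(conservative, 4))
```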
And now we are in business.
We can start using our normal
tables to calculate
probabilities of interest.
So for example, let's say that we take n to be 10,000.
How is the calculation
going to go?
We want to calculate the probability that the absolute value of Z is bigger than 0.02 times the square root of 10,000, that is, 0.02 times 100, which is the probability that the absolute value of Z is larger than or equal to 2.
And here let's do
some mechanics,
just to stay in shape.
The probability that you're
larger than or equal to 2 in
absolute value, since the normal
is symmetric around the
mean, this is going to be twice
the probability that Z
is larger than or equal to 2.
Can we use the cumulative
distribution function of Z to
calculate this?
Well, almost. The cumulative distribution function gives us probabilities of being less than something, not bigger than something.
So we need one more step and
write this as 1 minus the
probability that Z is less
than or equal to 2.
And this probability, now,
you can read off
from the normal tables.
And the normal tables will
tell you that this
probability is 0.9772.
And you do get an answer.
And the answer is 0.0456.
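The same calculation can be done with the error function instead of a printed table (a sketch; the table's rounded entry 0.9772 gives 0.0456, while the more precise erf value comes out as 0.0455).

```python
import math

def phi(c):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

n = 10000
threshold = 0.02 * math.sqrt(n)        # 0.02 * 100 = 2
# P(|Z| >= 2) = 2 * P(Z >= 2) = 2 * (1 - P(Z <= 2))
prob_error = 2 * (1 - phi(threshold))
print(round(prob_error, 4))
```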
OK, so we tried 10,000.
And we find that our probability of error is 4.5%, so we're
doing better than the
spec that we had.
So this tells us that maybe
we have some leeway.
Maybe we can use a smaller sample size and still stay within our specs.
Let's try to find how much
we can push the envelope.
How much smaller
can we take n?
To answer that question, we
need to do this kind of
calculation, essentially,
going backwards.
We're going to fix this number to be 0.05 and work backwards. (And let's double-check the previous arithmetic: n was 10,000, its square root is 100, and 0.02 times 100 gives us 2, as we had.)
So we'll have to do this
calculation now backwards,
figure out if this is 0.05,
what kind of number we're
going to need here and then
here, and from this we will be
able to tell what value
of n do we need.
OK, so we want to find n such
that the probability that Z is
bigger than 0.02 square
root n is 0.05.
OK, so Z is a standard normal
random variable.
And we want the probability
that we are
outside this range.
We want the probability of
those two tails together.
Those two tails together
should have
probability of 0.05.
This means that this tail,
by itself, should have
probability 0.025.
And this means that this
probability should be 0.975.
Now, if this probability
is to be 0.975, what
should that number be?
You go to the normal tables,
and you find which is the
entry that corresponds
to that number.
I actually brought a normal
table with me.
And 0.975 is down here.
And it tells you that
to the number that
corresponds to it is 1.96.
So this tells us that
this number
should be equal to 1.96.
And now, from here, you
do the calculations.
And you find that n is 9604.
So with a sample of 10,000, we
got probability of error 4.5%.
With a slightly smaller sample
size of 9,600, we can get the
probability of a mistake
to be 0.05, which
was exactly our spec.
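Working backwards in code (a sketch): invert the normal CDF by bisection, which is just reading the normal table in reverse, recover the familiar 1.96, then solve for n.

```python
import math

def phi(c):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

def phi_inverse(p):
    """Invert the standard normal CDF by bisection (a normal table in reverse)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The two tails together should have probability 0.05, so each tail has
# probability 0.025, so we need the z for which P(Z <= z) = 0.975.
z = phi_inverse(0.975)
n = math.ceil((z / 0.02) ** 2)
print(round(z, 2), n)
```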
So these are essentially the two
ways that you're going to
be using the central
limit theorem.
Either you're given n and
you try to calculate
probabilities.
Or you're given the
probabilities, and you want to
work backwards to
find n itself.
So in this example, the random
variable that we dealt with
was, of course, a binomial
random variable.
The Xi's were Bernoulli,
so the sum of
the Xi's were binomial.
So the central limit theorem
certainly applies to the
binomial distribution.
To be more precise, of course,
it applies to the standardized
version of the binomial
random variable.
So here's what we did,
essentially, in
the previous example.
We fixed the number p, which is
the probability of success
in our experiments.
p corresponds to f in the
previous example.
Let every Xi be a Bernoulli random variable, and our standing assumption is that these random variables are independent.
When we add them, we get a
random variable that has a
binomial distribution.
We know the mean and the
variance of the binomial, so
we take Sn, we subtract the
mean, which is this, divide by
the standard deviation.
The central limit theorem tells us that the cumulative distribution function of this random variable converges to the cumulative distribution function of a standard normal random variable in the limit.
So let's do one more example
of a calculation.
Let's choose some specific numbers to work with: take n to be 36 and p to be 1/2.
So in this example, first thing
to do is to find the
expected value of Sn,
which is n times p.
It's 18.
Then we need to write down
the standard deviation.
The variance of Sn is the
sum of the variances.
It's np times (1-p).
And in this particular example,
p times (1-p) is 1/4,
n is 36, so this is 9.
And that tells us that the standard deviation of Sn is equal to 3.
So what we're going to do is to take the event of interest, which is Sn less than or equal to 21, and rewrite it in a way that involves the standardized random variable.
So to do that, we need to subtract the mean, which is 18. So we write this as Sn minus 18 should be less than or equal to 21 minus 18. This is the same event. And then we divide by the standard deviation, which is 3, and we end up with a much nicer number out here, which is 1.
So the event of interest, that
Sn is less than 21, is the
same as the event that a
standard normal random
variable is less than
or equal to 1.
And once more, you can look this up in the normal tables. And you find that the answer that you get is 0.8413.
Now it's interesting to compare
this answer that we
got through the central limit
theorem with the exact answer.
The exact answer involves the
exact binomial distribution.
What we have here is the binomial probability that Sn is equal to k. Sn being equal to k is given by this formula. And we add over all values of k going from 0 up to 21 -- we write two lines of code to calculate this sum -- and we get the exact answer, which is 0.8785.
So there's pretty good agreement between the two, although you wouldn't necessarily call it excellent agreement.
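Those two lines of code might look like this (a sketch using Python's math.comb, with n = 36 and p = 1/2 as in this example):

```python
import math

# Exact P(Sn <= 21) for Sn binomial with n = 36, p = 1/2.
n, p = 36, 0.5
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(22))
print(round(exact, 4))
```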
Can we do a little
better than that?
OK.
It turns out that we can.
And here's the idea.
So our random variable
Sn has a mean of 18.
It has a binomial
distribution.
It's described by a PMF that has
a shape roughly like this
and which keeps going on.
Using the central limit
theorem is basically
pretending that Sn is
normal with the
right mean and variance.
So pretending that Zn has
0 mean unit variance, we
approximate it with Z, that
has 0 mean unit variance.
If you were to pretend that
Sn is normal, you would
approximate it with a normal
that has the correct mean and
correct variance.
So it would still be
centered at 18.
And it would have the same
variance as the binomial PMF.
So using the central limit
theorem essentially means that
we keep the mean and the
variance what they are but we
pretend that our distribution
is normal.
We want to calculate the
probability that Sn is less
than or equal to 21.
I pretend that my random
variable is normal, so I draw
a line here and I calculate
the area under the normal
curve going up to 21.
That's essentially
what we did.
Now, a smart person comes
around and says, Sn is a
discrete random variable.
So the event that Sn is less
than or equal to 21 is the
same as Sn being strictly less
than 22 because nothing in
between can happen.
So I'm going to use the
central limit theorem
approximation by pretending
again that Sn is normal and
finding the probability of this
event while pretending
that Sn is normal.
So what this person would do
would be to draw a line here,
at 22, and calculate the area
under the normal curve
all the way to 22.
Who is right?
Which one is better?
Well neither, but we can do
better than both if we sort of
split the difference.
So another way of writing the
same event for Sn is to write
it as Sn being less than 21.5.
In terms of the discrete random
variable Sn, all three
of these are exactly
the same event.
But when you do the continuous
approximation, they give you
different probabilities.
It's a matter of whether you
integrate the area under the
normal curve up to here, up to
the midway point, or up to 22.
It turns out that integrating
up to the midpoint is what
gives us the better
numerical results.
So we take here 21 and 1/2,
and we integrate the area
under the normal curve
up to here.
So let's do this calculation
and see what we get.
What would we change here?
Instead of 21, we would
now write 21 and 1/2.
That 18 stays what it is. But this 21 becomes 21 and 1/2. And so this one becomes 1 plus 0.5/3, which is 1.17.
So we now look up into the
normal tables and ask for the
probability that Z is
less than 1.17.
So this here gets approximated
by the probability that the
standard normal is
less than 1.17.
And the normal tables will
tell us this is 0.879.
Going back to the previous
slide, what we got this time
with this improved approximation
is 0.879.
This is a really good
approximation
of the correct number.
This is what we got
using the 21.
This is what we get using
the 21 and 1/2.
And it's an approximation that's
sort of right on-- a
very good one.
The moral from this numerical example is that doing this 1/2 correction does give us better approximations.
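Putting the three numbers side by side in code (a sketch, with n = 36 and p = 1/2 as in this example): the exact binomial answer, the plain approximation up to 21, and the corrected approximation up to the midpoint 21.5.

```python
import math

def phi(c):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

n, p = 36, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))       # 18 and 3

exact = sum(math.comb(n, k) * 0.5**n for k in range(22))   # P(Sn <= 21)
plain = phi((21 - mu) / sd)        # integrate the normal curve up to 21
corrected = phi((21.5 - mu) / sd)  # integrate up to the midpoint 21.5

# The 1/2 correction lands much closer to the exact binomial answer.
print(round(exact, 4), round(plain, 4), round(corrected, 4))
```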
In fact, we can use this 1/2
idea to even calculate
individual probabilities.
So suppose you want to approximate the probability that Sn is equal to 19.
If you were to pretend that Sn
is normal and calculate this
probability, the probability
that the normal random
variable is equal to 19 is 0.
So you don't get an interesting
answer.
You get a more interesting
answer by writing this event,
19 as being the same as the
event of falling between 18
and 1/2 and 19 and 1/2 and using
the normal approximation
to calculate this probability.
In terms of our previous
picture, this corresponds to
the following.
We are interested in the
probability that
Sn is equal to 19.
So we're interested in the
height of this bar.
We're going to consider the area
under the normal curve
going from here to here,
and use this area as an
approximation for the height
of that particular bar.
So what we're basically doing is, we take the probability under the normal curve that's assigned over a continuum of values and attribute it to the different discrete values.
Whatever is above the midpoint
gets attributed to 19.
Whatever is below that
midpoint gets
attributed to 18.
So this green area is our
approximation of the value of
the PMF at 19.
So similarly, if you wanted to
approximate the value of the
PMF at this point, you would
take this interval and
integrate the area
under the normal
curve over that interval.
It turns out that this gives a
very good approximation of the
PMF of the binomial.
And actually, this was the
context in which the central
limit theorem was proved in
the first place, when this
business started.
So this business goes back
a few hundred years.
And the central limit theorem was first proved by considering the PMF of a binomial random variable when p is equal to 1/2.
People did the algebra, and they
found out that the exact
expression for the PMF is quite
well approximated by the expression that you would get from a normal distribution.
Then the proof was extended to
binomials for more general
values of p.
So here we talk about this as
a refinement of the general
central limit theorem, but,
historically, that refinement
was where the whole business
got started
in the first place.
All right, so let's go through
the mechanics of approximating
the probability that
Sn is equal to 19--
exactly 19.
As we said, we're going to write
this event as an event
that covers an interval of unit
length from 18 and 1/2 to
19 and 1/2.
This is the event of interest.
First step is to massage the
event of interest so that it
involves our Zn random
variable.
So subtract 18 from all sides.
Divide by the standard deviation
of 3 from all sides.
That's the equivalent
representation of the event.
This is our standardized
random variable Zn.
These are just these numbers.
And to do an approximation, we
want to find the probability
of this event, but Zn is
approximately normal, so we
plug in here the Z, which
is the standard normal.
So we want to find the
probability that the standard
normal falls inside
this interval.
You find these using CDFs
because this is the
probability that you're
less than this but
not less than that.
So it's a difference between two
cumulative probabilities.
Then, you look up your
normal tables.
You find two numbers for these
quantities, and, finally, you
get a numerical answer for an
individual entry of the PMF of
the binomial.
This is a pretty good
approximation, it turns out.
If you were to do the
calculations using the exact
formula, you would
get something
which is pretty close--
an error in the third digit--
this is pretty good.
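In code, the midpoint calculation for the probability that Sn equals 19 looks like this (a sketch, again with n = 36, p = 1/2):

```python
import math

def phi(c):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

mu, sd = 18, 3      # mean and standard deviation of Sn for n = 36, p = 1/2

# Approximate P(Sn = 19) by the normal area between 18.5 and 19.5 ...
approx = phi((19.5 - mu) / sd) - phi((18.5 - mu) / sd)
# ... and compare with the exact binomial probability.
exact = math.comb(36, 19) * 0.5**36
print(round(approx, 4), round(exact, 4))
```

The two values agree to about the third digit, matching the claim above.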
So I guess what we did here
with our discussion of the
binomial slightly contradicts
what I said before--
that the central limit theorem
is a statement about
cumulative distribution
functions.
In general, it doesn't tell you
what to do to approximate
PMFs themselves.
And that's indeed the
case in general.
On the other hand, for the
special case of a binomial
distribution, the central limit
theorem approximation,
with this 1/2 correction, is a
very good approximation even
for the individual PMF.
All right, so we spent quite
a bit of time on mechanics.
So let's spend the last few
minutes today thinking a bit
and look at a small puzzle.
So the puzzle is
the following.
Consider a Poisson process that runs over a unit interval, where the arrival rate is equal to 1.
So this is the unit interval.
And let X be the number
of arrivals.
And this is Poisson,
with mean 1.
Now, let me take this interval
and divide it
into n little pieces.
So each piece has length 1/n.
And let Xi be the number of arrivals during the i-th little interval.
OK, what do we know about
the random variables Xi?
They are themselves Poisson. Each is the number of arrivals during a small interval.
We also know that when n is
big, so the length of the
interval is small, these Xi's
are approximately Bernoulli,
with mean 1/n.
I guess it doesn't matter whether we model them as Bernoulli or not.
What matters is that the
Xi's are independent.
Why are they independent?
Because, in a Poisson process, arrivals over disjoint intervals are independent of each other.
So the Xi's are independent.
And they also have the
same distribution.
And we have that X, the total number of arrivals, is the sum of the Xi's.
So the central limit theorem
tells us that, approximately,
the sum of independent,
identically distributed random
variables, when we have lots
of these random variables,
behaves like a normal
random variable.
So by using this decomposition
of X into a sum of i.i.d
random variables, and by using
values of n that are bigger
and bigger, by taking the limit,
it should follow that X
has a normal distribution.
On the other hand, we know
that X has a Poisson
distribution.
So something must be wrong
in this argument here.
Can we really use the
central limit
theorem in this situation?
So what do we need for the
central limit theorem?
We need to have independent,
identically
distributed random variables.
We have it here.
We want them to have a finite
mean and finite variance.
We also have it here, means
variances are finite.
What is another assumption that
was never made explicit,
but essentially was there?
Or in other words, what is the
flaw in this argument that
uses the central limit
theorem here?
Any thoughts?
So in the central limit theorem,
we said, consider--
fix a probability distribution,
and let the Xi's
be distributed according to that
probability distribution,
and add a larger and larger number of Xi's.
But the underlying, unstated
assumption is that we fix the
distribution of the Xi's.
As we let n increase,
the statistics of
each Xi do not change.
Whereas here, I'm playing
a trick on you.
As I'm taking more and more
random variables, I'm actually
changing what those random
variables are.
When I take a larger n, the Xi's
are random variables with
a different mean and
different variance.
So I'm adding more of these, but
at the same time, in this
example, I'm changing
their distributions.
That's something that doesn't
fit the setting of the central
limit theorem.
In the central limit theorem,
you first fix the distribution
of the X's.
You keep it fixed, and then you
consider adding more and
more according to that
particular fixed distribution.
So that's the catch.
That's why the central limit
theorem does not
apply to this situation.
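A quick numerical check of this (my own illustration): each Xi is Bernoulli with parameter 1/n, so the distribution of the summands changes with n, and the sum stays Poisson-like. For instance, the probability that the sum is 0 tends to e^-1, the Poisson(1) value, no matter how large n gets.

```python
import math

# P(sum = 0) = P(no success in any of the n slots) = (1 - 1/n)^n,
# which tends to e^-1 rather than to anything normal-shaped.
for n in [10, 100, 1000, 10000]:
    p_zero = (1 - 1 / n) ** n
    print(n, round(p_zero, 4))
print("Poisson limit:", round(math.exp(-1), 4))
```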
And we're lucky that it
doesn't apply because,
otherwise, we would have a huge
contradiction destroying
probability theory.
OK, but that still leaves us with a little bit of a dilemma.
Suppose that, here, essentially
we're adding
independent Bernoulli
random variables.
So the issue is that the central
limit theorem has to
do with asymptotics as
n goes to infinity.
And if we consider a binomial,
and somebody gives us specific
numbers about the parameters of
that binomial, it might not
necessarily be obvious what kind of approximation to use.
In particular, we do have two
different approximations for
the binomial.
If we fix p, then the binomial is the sum of Bernoulli's that come from a fixed distribution, and we consider more and more of these.
When we add them, the central
limit theorem tells us that we
get the normal distribution.
There's another sort of limit,
which has the flavor of this
example, in which we still deal
with a binomial, sum of n
Bernoulli's.
We let that sum, the
number of the
Bernoulli's go to infinity.
But each Bernoulli has a
probability of success that
goes to 0, and we do this in a
way so that np, the expected
number of successes,
stays finite.
This is the situation that we
dealt with when we first
defined our Poisson process.
We have a very, very large number of time slots, but during each time slot,
there's a tiny probability of
obtaining an arrival.
Under that setting, in discrete
time, we have a
binomial distribution, or
Bernoulli process, but when we
take the limit, we obtain the
Poisson process and the
Poisson approximation.
So these are two equally valid
approximations of the binomial.
But they're valid in different
asymptotic regimes.
In one regime, we fixed p,
let n go to infinity.
In the other regime, we let
both n and p change
simultaneously.
Now, in real life, you're
never dealing with the
limiting situations.
You're dealing with
actual numbers.
So if somebody tells you that
the numbers are like this,
then you should probably say
that this is the situation
that fits the Poisson
description--
large number of slots with
each slot having a tiny
probability of success.
On the other hand, if p is something like 0.1, and n is 500, then you expect the distribution for the number of successes to have a mean of 50 and a fair amount of spread around there.
It turns out that the normal
approximation would be better
in this context.
As a rule of thumb, if n times p
is bigger than 10 or 20, you
can start using the normal
approximation.
If n times p is a small number,
then you prefer to use
the Poisson approximation.
But there's no hard theorems
or rules about
how to go about this.
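Still, we can compare the two approximations numerically (a sketch; the parameter choices are mine, picked to represent the two regimes, and the comparison is made at a single PMF value near the mean).

```python
import math

def binom_pmf(n, p, k):
    """Exact binomial PMF."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    """Poisson PMF with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def normal_pmf_approx(mu, sd, k):
    """Normal approximation to a PMF value, with the 1/2 correction."""
    phi = lambda c: 0.5 * (1 + math.erf(c / math.sqrt(2)))
    return phi((k + 0.5 - mu) / sd) - phi((k - 0.5 - mu) / sd)

def approx_errors(n, p, k):
    """Absolute errors of the Poisson and normal approximations at k."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    exact = binom_pmf(n, p, k)
    return (abs(poisson_pmf(n * p, k) - exact),
            abs(normal_pmf_approx(mu, sd, k) - exact))

# Small np: the Poisson approximation wins.
poi_small, norm_small = approx_errors(n=100, p=0.01, k=1)    # np = 1
# Larger np: the normal approximation wins.
poi_large, norm_large = approx_errors(n=500, p=0.1, k=50)    # np = 50
print(poi_small < norm_small, norm_large < poi_large)
```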
OK, so from next time, we're going to switch gears again.
And we're going to put together
everything we learned
in this class to start solving
inference problems.
