So, we were discussing convergence of characteristic functions. One theorem, one result we said was this, and we proved it yesterday: if X_n converges to X in distribution, then C_{X_n}(t) converges to C_X(t) for all t. Right, this is the theorem we proved yesterday. So, convergence in distribution necessarily implies the convergence of characteristic functions. This we proved using Skorokhod's representation and the dominated convergence theorem; we did this proof.
Now we were talking about the opposite, the converse. Is it true that if the sequence of characteristic functions has a certain limit, then you have convergence in distribution? This is the result we were talking about. So, it turns out, and I will state it properly: let X_n be a sequence of random variables with characteristic functions C_{X_n}(t), and let X be a random variable with characteristic function C_X(t). If C_{X_n}(t) converges to C_X(t) for all t, then X_n converges to X in distribution.
So, what this theorem says, I think I did not state very precisely at the end of the last class; this is the correct statement. If you have a sequence of characteristic functions which converges to the characteristic function of another random variable, then you have convergence in distribution. The catch is that you may have a sequence of characteristic functions converging to some function of t which is not a valid characteristic function; it may happen that way. Now what do I mean by not a valid characteristic function? So, a characteristic function has to have three defining properties. One is that its absolute value is less than or equal to 1. Then you must have uniform continuity, and you must have non-negative definiteness; every characteristic function is a non-negative definite kernel. If even one of these three properties is not satisfied by the limit function, it will not be a valid characteristic function, and there is no question of convergence in distribution.
However, if the limit is a valid characteristic function, namely if you verify that the limit C_X satisfies these three fundamental properties, then you have convergence in distribution. Try constructing an example of a sequence of characteristic functions that converges to something which is not a characteristic function; it is fairly easy. So, that is that.
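One standard example of this (my illustration; the lecture only asks you to construct one): take X_n to be Gaussian with mean 0 and variance n. Then

```latex
% Characteristic function of X_n ~ N(0, n):
C_{X_n}(t) = e^{-n t^2/2},
\qquad
\lim_{n\to\infty} C_{X_n}(t) =
\begin{cases}
  1, & t = 0,\\
  0, & t \neq 0.
\end{cases}
% The pointwise limit is discontinuous at t = 0, so it is not a
% valid characteristic function; correspondingly, X_n does not
% converge in distribution (the distribution flattens out).
```

The limit fails continuity at t = 0, which, as the continuity theorem below makes precise, is exactly the diagnostic for non-convergence.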
So, now, actually there is a more refined theorem called the continuity theorem, which says that you do not even have to check all three properties of the limiting function. Here what we are saying is: if a sequence of characteristic functions goes to some function, you would have to verify the three basic properties of a characteristic function to decide whether or not it is an eligible characteristic function. It turns out that you do not have to work so hard; you do not have to verify all three conditions. In fact, it turns out it is enough to verify continuity of the limit function at t equal to 0. So, if your limit function is continuous at t equal to 0, you are guaranteed that you have convergence in distribution. It is a somewhat involved theorem, a fairly sophisticated theorem; the proof uses tools from harmonic analysis, complex analysis and so on.
I will tell you the fundamental reason; maybe I will state it and then explain. So, this is the continuity theorem for convergence of characteristic functions. It says: let X_n be a sequence of random variables with characteristic functions C_{X_n}(t); this is the sequence of characteristic functions.
And suppose the limit as n tends to infinity of C_{X_n}(t) exists for all t. Then exactly one of the following is true. So, let us call this limit some C(t); this limit exists for all t, and I am calling this function C(t). Of course, I am not saying that this C(t) is a valid characteristic function; I am going to say more about it. The first possibility is that C(t) is discontinuous at t equal to 0, and in this case X_n does not converge in distribution. The other possibility is that C(t) is continuous at t equal to 0, and in this case C(t) is a valid characteristic function of some random variable X, and X_n necessarily converges in distribution to X.
So, this theorem says that for this limit C(t), there are only two possibilities. You check C(t) for continuity at t equal to 0. If it so happens that C(t) is discontinuous at t equal to 0, then C(t) cannot possibly be a characteristic function, because characteristic functions are continuous; in fact, they are uniformly continuous. So, C(t) cannot be a characteristic function, and in this case you can conclude that X_n does not converge in distribution to anything, which means X_n does not converge.
The other possibility is that you have continuity at t equal to 0; that is all you need. Then the limit automatically becomes a characteristic function of some random variable, and you have convergence in distribution. That is because, when the limit is a characteristic function, you have convergence in distribution. So, why do you think this is true? Intuitively, without going into the mathematics of the proof, why do you think this is true? I am saying that for C(t) to be a valid characteristic function, it needs to satisfy the three basic properties; but I am saying now that it is enough to check for continuity at t equal to 0, not even at other values of t. Plain continuity at one point is enough: not uniform continuity, not non-negative definiteness, and so on.
So, it could happen, you might think, that C(t) is continuous at t equal to 0 but does not have the other properties of a characteristic function. But this theorem is saying that is not possible: if you have continuity at t equal to 0, you are guaranteed that it is a valid characteristic function. Do you have any guesses on how this is happening? Yes, that is the reason: this C(t) is not some arbitrary function of t; it is obtained as a limit of a sequence of characteristic functions.
So, these functions already have a lot of structure to them; they have all the properties of characteristic functions. So, whatever limit the sequence has, I know it is not the limit of some arbitrary sequence of functions; it already has structure to it. It turns out that if you just verify continuity at t equal to 0, all the other structure falls into place: uniform continuity and all the rest work out automatically, because C(t) is not any old function of t, it is obtained as a limit of a sequence of characteristic functions, which already have a lot of structure. So, this fact is very useful. You take the limit of some sequence of characteristic functions, you verify only continuity at t equal to 0, and then immediately you get convergence in distribution. This theorem will be very instrumental in proving the central limit theorem. Any questions on this? So, I forgot one result that is outstanding still; I should have done this last class, the previous class.
There is one result that I forgot to do, which I had just mentioned there. This is about convergence in the r-th mean; it is something I should have done in the previous class. Theorem: X_n converges to X in the r-th mean implies X_n converges to X in the s-th mean, if r is bigger than s, bigger than or equal to 1. So, this result basically says that if you have, say, convergence in mean square (r equal to 2), then you necessarily have convergence in the mean. It says that convergence for a bigger value r implies convergence for a smaller value s. So, this is something I forgot to say at the beginning.
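The proof can be sketched in one line; this is my reconstruction, using the moment inequality (usually attributed to Lyapunov) that comes up in the digression below:

```latex
% For r > s >= 1, Lyapunov's inequality applied to X_n - X gives
\mathbb{E}\big[\,|X_n - X|^s\big]^{1/s}
  \;\le\;
\mathbb{E}\big[\,|X_n - X|^r\big]^{1/r}.
% So if X_n -> X in the r-th mean, i.e. E|X_n - X|^r -> 0,
% then E|X_n - X|^s -> 0 as well, i.e. X_n -> X in the s-th mean.
```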
So, the proof of this actually follows in a straightforward way from an inequality called Lyapunov's inequality. It follows from Lyapunov's inequality, and that brings me to a digression about certain geometric inequalities, which I have not done so far. So, I will briefly, for 5 minutes, digress into three inequalities; I will just mention them. I will not have class time to prove them, but I will probably refer you to, or probably upload, some notes on the proofs. So, let me put these inequalities down; there are three of them. One is called Hölder's inequality, one is called Minkowski's inequality, and then there is Lyapunov's inequality.
So, this is the digression from the discussion of convergence. First I want to mention Hölder's inequality. This says: if p and q are greater than 1 and 1/p plus 1/q equals 1, then the expectation of |XY| is less than or equal to the expectation of |X|^p raised to the power 1/p, times the expectation of |Y|^q raised to the power 1/q. So, this is Hölder's inequality. This generalizes the Cauchy-Schwarz inequality, which is the case p equal to q equal to 2.
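As a quick numerical sanity check (my illustration, not from the lecture), you can verify Hölder's inequality on random samples, with the empirical mean standing in for the expectation:

```python
import random

random.seed(0)

# Draw samples of two (dependent) random variables X and Y.
n = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [x + random.gauss(0.0, 0.5) for x in xs]  # Y correlated with X

def emp_mean(vals):
    """Empirical mean, standing in for the expectation."""
    return sum(vals) / len(vals)

p, q = 3.0, 1.5  # conjugate exponents: 1/p + 1/q = 1
assert abs(1 / p + 1 / q - 1.0) < 1e-12

lhs = emp_mean([abs(x * y) for x, y in zip(xs, ys)])
rhs = (emp_mean([abs(x) ** p for x in xs]) ** (1 / p)
       * emp_mean([abs(y) ** q for y in ys]) ** (1 / q))

# Hölder: E|XY| <= E[|X|^p]^(1/p) * E[|Y|^q]^(1/q)
print(lhs, rhs)
assert lhs <= rhs
```

Note that the inequality holds for any probability measure, including the empirical one, so the assertion holds for every sample.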
So, the proof is quite short, but again you have to do it in a particular way, otherwise you will never get it; you have to know how to do it, and then it comes in three steps. I will point you to some reference on this proof. So, that is Hölder's inequality, and then there is something called Minkowski's inequality. All these inequalities are highly geometric in nature, as I will explain shortly; in fact, this one is a generalization of the triangle inequality, so it is geometric. Minkowski's inequality says: if p is greater than or equal to 1, then the expectation of |X + Y|^p raised to 1/p is less than or equal to the expectation of |X|^p raised to 1/p, plus the expectation of |Y|^p raised to 1/p.
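Again a numerical sanity check (my illustration): the empirical p-th moment "norm" satisfies the Minkowski bound:

```python
import random

random.seed(1)

n = 100_000
xs = [random.gauss(1.0, 2.0) for _ in range(n)]
ys = [random.expovariate(1.0) for _ in range(n)]

def lp_norm(vals, p):
    """Empirical L^p 'norm': (E|V|^p)^(1/p) with E the sample mean."""
    return (sum(abs(v) ** p for v in vals) / len(vals)) ** (1.0 / p)

p = 2.5
lhs = lp_norm([x + y for x, y in zip(xs, ys)], p)
rhs = lp_norm(xs, p) + lp_norm(ys, p)

# Minkowski: (E|X+Y|^p)^(1/p) <= (E|X|^p)^(1/p) + (E|Y|^p)^(1/p)
print(lhs, rhs)
assert lhs <= rhs
```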
Finally, Lyapunov's inequality says that if r is greater than s, which is greater than or equal to 1, then the expectation of |X|^s raised to 1/s is less than or equal to the expectation of |X|^r raised to 1/r. Does it make sense, or is it the other way? This should imply the earlier result about convergence in the r-th mean, so I think this direction is correct. So, a brief note: this Minkowski inequality, if you look at it, basically tells you that if you take the p-th moment and raise it to the power 1/p, that quantity satisfies the triangle inequality; that is essentially what it says.
So, this Minkowski inequality essentially asserts that this object, the expectation of |X|^p raised to the power 1/p, acts like a norm. It can actually be shown, using Minkowski's inequality, that this object acts like a norm on the space of L^p random variables, the random variables for which the p-th moment is finite. We mentioned earlier that the standard deviation behaves like a norm, and that the expectation of XY behaves like an inner product, on a specialized space of random variables. This is more generally true: on L^p, the space of random variables with finite p-th moment, this object behaves like a norm, and it satisfies the triangle inequality. And Hölder's inequality, which you can also easily prove for vectors in R^n, shows that this is a generalization of Cauchy-Schwarz in some sense, and it gives you a relationship between two dual norms: the p-th norm and the q-th norm are dual norms when 1/p + 1/q = 1 is satisfied, and that is exactly how this inequality works. So, again, it is saying that for random variables which have both the p-th and q-th moments finite, the dual norms satisfy this relationship. So, that was just a digression; you may encounter these inequalities, they are useful to know, and Minkowski's inequality in particular gives you that this object is a norm on L^p.
So, that completes what I had to say generally about convergence. What remains, then, is just the limit theorems, two or three major theorems, and then we will be done. In particular, we will do the weak law of large numbers, the strong law of large numbers and the central limit theorem. These three are very fundamental results.
So, we will do the laws of large numbers and the central limit theorem. Here we will do the weak law and the strong law; these are both laws of large numbers. I have said laws of large numbers in the plural because it is not one theorem; it is like a family of theorems. Similarly, I should say central limit theorems, because the central limit theorem is not one theorem either; it refers to a family of theorems. We will only do some very specific versions, for the iid case. These two families of theorems are really at the core of the theory; they are very important theorems. In fact, it is no exaggeration to say that the laws of large numbers are really the backbone of probability theory.
So, if you had to choose one theorem, or one family of theorems, that makes the theory worthwhile, it is the laws of large numbers. That is because the laws of large numbers essentially give you an interpretation of the expected value, the expectation of X, as an average value. So far, we have said nothing about the expectation of X beyond its definition as the integral of X with respect to P; it is just a number. At the back of your mind you might have the notion that it is some average value or something, but we have not said anything about that; I have kept saying that, so far, it is just a number for you. It is the law of large numbers which gives the expected value of a random variable its operational meaning as an average.
So, if the law of large numbers were not true, nobody would study probability theory, because the whole frequentist interpretation of probability, that if you toss a coin a million times a fraction p of the tosses come up heads, is essentially because of the law of large numbers. So, the laws of large numbers are, in fact, the backbone of probability theory; without them nobody would study the theory, the entire theory would be useless. And the central limit theorems, in some sense, establish the importance of the Gaussian.
So, the central limit theorem, the family of central limit theorems, basically says that in the space of finite-variance random variables, the Gaussian is like an attractor. What it says is: if you add a large number of finite-variance random variables, no matter what the distribution, the sum will look roughly like a Gaussian in distribution. So, in some sense, the CLT establishes the Gaussian as, in some sense, the king, the most preeminent distribution in the finite-variance world. So, that is the high-level picture of this. I will not leave the ideas vague like that, so I will now put down the results more properly. Any questions at a very high level? Any questions?
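A quick simulation of that high-level picture (my illustration, not from the lecture): standardized sums of iid uniform random variables already look roughly Gaussian for moderate n:

```python
import random
import math

random.seed(2)

# X_i ~ Uniform(0, 1): mean 1/2, variance 1/12 (finite variance).
mu, sigma2 = 0.5, 1.0 / 12.0
n = 50           # number of summands per sum
trials = 20_000  # number of independent sums

# Standardize: Z = (S_n - n*mu) / sqrt(n*sigma2); CLT says Z ≈ N(0, 1).
zs = []
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    zs.append((s - n * mu) / math.sqrt(n * sigma2))

mean_z = sum(zs) / trials
var_z = sum(z * z for z in zs) / trials - mean_z ** 2
# Fraction within one standard deviation; N(0,1) gives about 0.683.
frac_1sd = sum(1 for z in zs if abs(z) <= 1.0) / trials

print(mean_z, var_z, frac_1sd)
assert abs(mean_z) < 0.05
assert abs(var_z - 1.0) < 0.05
assert abs(frac_1sd - 0.683) < 0.02
```

The uniform distribution looks nothing like a Gaussian, yet the standardized sum matches the N(0, 1) statistics closely.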
So, let us first deal with the weak law of large numbers. Let X_1, X_2, ... be independent, identically distributed random variables whose expected value, the expectation of X, is finite. So, we are saying that you have iid random variables for which the expected value exists and is finite.
Then, letting S_n = X_1 + X_2 + ... + X_n, we have that S_n/n converges to the expectation of X in probability. So, you have iid random variables with finite expected value, the expectation of X, and you are looking at the sample average: the sum over i from 1 to n of X_i, divided by n. You have this iid realization of the random variable, and you are taking the sample average; you are adding the first n terms and dividing by n, just taking the statistical, or sample, average of the random variables X_i. And the weak law of large numbers says that this sample average converges in probability to the expectation of X. It is in this sense that the law of large numbers says the sample average converges to the statistical average; that is why the expectation of X has the interpretation as the statistical average, the average value of the random variable.
So, this weak law of large numbers is fairly old. In fact, the first published proof of it goes back to Jacob Bernoulli, and it was published posthumously, after he died, in the year 1713. So, it is almost exactly 300 years old; it is not a modern result. Most of what we have done is modern probability theory, with all these measures and so on; this is not. If you want a more explicit statement, it says: the limit as n tends to infinity of the probability that |S_n/n − E[X]| ≥ ε equals 0, for all ε greater than 0. So, as n becomes large, the probability of the sample average being different from the expected value goes to 0. This means that with probability close to 1, if you just look at S_n/n, it takes a value close to the expectation of X. And you can choose any ε you want, 10 to the power minus 6 say, and you will find a large enough n beyond which this probability will be very close to 0. So, this is the weak law of large numbers.
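A simulation sketch of this statement (my illustration): estimate P(|S_n/n − E[X]| ≥ ε) for increasing n and watch it fall:

```python
import random

random.seed(3)

# X_i ~ Exponential(1), so E[X] = 1 (finite mean).
mean_x = 1.0
eps = 0.1
trials = 5_000

def excursion_prob(n):
    """Monte Carlo estimate of P(|S_n/n - E[X]| >= eps)."""
    count = 0
    for _ in range(trials):
        s = sum(random.expovariate(1.0) for _ in range(n))
        if abs(s / n - mean_x) >= eps:
            count += 1
    return count / trials

probs = [excursion_prob(n) for n in (10, 100, 1000)]
print(probs)

# The excursion probability shrinks as n grows (weak law).
assert probs[0] > probs[2]
assert probs[2] < 0.01
```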
Now maybe I should state the strong law also, because we will prove both. The strong law, on the other hand, is a modern result. The strong law of large numbers says that, under exactly these same conditions, the convergence is almost sure. That is the only difference between the weak law and the strong law: here, instead of "in probability" you get "almost surely", with no further assumptions. So, the strong law subsumes the weak law, because almost sure convergence implies convergence in probability. One might ask why you even have to bother about the weak law; it is for historical reasons. People figured the weak law out 300 years ago, whereas the strong law needed all this measure-theoretic machinery: a special case of it was proved around 1909 or so, and then the general strong law. The strong law as we know it today is about 100 years old, while the weak law is about 300 years old. So, for 200 years people did not know that a stronger result was possible, and that is because they did not understand the measure-theoretic tools properly; that came only in the early 1900s. Strong law of large numbers: if X_i, for i greater than or equal to 1, are iid random variables with finite mean, so the expectation exists and is finite, then S_n/n converges to the expectation of X almost surely, that is, with probability 1.
So, we are saying that this sequence of random variables converges to the expectation of X. You should look at the limit not as a constant but as a random variable which always takes the value E[X]; obviously, I have a sequence of random variables converging to a random variable, but that random variable happens to be a constant here; that is the correct interpretation. And this convergence is almost sure. If you want to write it a little better: the probability of the set of ω for which S_n(ω)/n converges to the expectation of X is equal to 1. This is a very different statement from the weak law; in fact, as we know, it is the stronger statement.
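A path-wise illustration of that statement (mine, not from the lecture): fix one ω, that is, one long simulated sequence, and watch the running average S_n(ω)/n settle near E[X]:

```python
import random

random.seed(4)

# One sample path of iid Uniform(0, 2) variables; E[X] = 1.
N = 200_000
running_avg = []
s = 0.0
for n in range(1, N + 1):
    s += random.uniform(0.0, 2.0)
    running_avg.append(s / n)

# Along this single path, the tail of the running average
# stays close to E[X] = 1 (almost-sure convergence).
tail = running_avg[100_000:]
print(min(tail), max(tail))
assert all(abs(a - 1.0) < 0.02 for a in tail)
```

The strong law says that the set of ω producing a path which fails this kind of convergence has probability 0.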
So, if you look at the strong law, it is a statement about sample paths, about the sample ω. The weak law is not a statement about sample paths; the weak law is only saying that S_n/n is close to the expectation of X for large n, that is all. But the strong law is saying that the sequence S_n(ω)/n, as a function of ω, converges to the expectation of X for almost all values of ω, with probability 1 essentially. So, let me help you interpret this statement a little bit; it is very important to understand it.
So, remember that all these random variables live on a probability space (Ω, F, P). The moment nature fixes a particular ω, all these random variables realize as real numbers, and then, obviously, S_n(ω)/n also realizes as a sequence of real numbers. The sequence of real numbers S_n(ω)/n is, first of all, just some sequence of real numbers for every ω; change ω, and the sequence will take some other values.
So, what the strong law of large numbers is saying is this: imagine you have three buckets. You pick an ω and you check whether the sequence converges to the expectation of X. If the sequence does not converge at all, put that ω in the first bucket. If it converges, but to some value other than the expectation of X, put it in the second bucket. If it converges to the expectation of X, put it in the third bucket. We are effectively partitioning the sample space.
So, in some sense, you have the set of ω for which the sequence does not converge, the set of ω for which the sequence converges but not to the expectation of X, and the set of ω for which the sequence converges to the expectation of X. What the strong law of large numbers is saying is that the first two sets have probability 0, and all the probability is on the third. So, the moment ω realizes, with probability 1 you get a sequence whose average converges to the expectation of X, and the probability of the sequence either not converging at all, or converging to some value other than the expectation of X, is 0.
So, that is the statement; that is what it means. There is a real difference between these two, you see. And we in fact know an equivalent characterization of almost sure convergence, which is also worth stating here. What is it? We did a theorem about the equivalence of almost sure convergence to the probability of excursions beyond a level ε going to 0; that is what this is equivalent to.
Equivalently, you will have to help me out here, I think: the limit as m tends to infinity of the probability of the union over n greater than or equal to m of the sets of ω for which |S_n/n − E[X]| ≥ ε equals 0, for every ε greater than 0. Another way of writing it is: the limit as m tends to infinity of the probability that the sup over n greater than or equal to m of |S_n/n − E[X]| is at least ε equals 0. Whether you write it with the union or with the sup, it is the same thing. So, this statement is equivalent to the strong law statement, because we proved exactly this equivalence for almost sure convergence.
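A rough Monte Carlo sketch of this sup criterion (my illustration): estimate P(sup over n ≥ m of |S_n/n − E[X]| ≥ ε) for a few values of m and see it decrease:

```python
import random

random.seed(5)

# X_i ~ Uniform(0, 2), E[X] = 1.
eps = 0.05
N = 2_000           # simulate each path up to time N
paths = 1_000       # number of independent sample paths
ms = (50, 200, 800) # check the sup-excursion event from these times on

counts = {m: 0 for m in ms}
for _ in range(paths):
    s = 0.0
    avgs = []
    for n in range(1, N + 1):
        s += random.uniform(0.0, 2.0)
        avgs.append(s / n)
    for m in ms:
        # Excursion event: some n >= m has |S_n/n - 1| >= eps.
        if any(abs(a - 1.0) >= eps for a in avgs[m - 1:]):
            counts[m] += 1

probs = [counts[m] / paths for m in ms]
print(probs)
# P(sup_{n>=m} |S_n/n - E X| >= eps) decreases in m.
assert probs[0] >= probs[1] >= probs[2]
```

Note the excursion event for a larger m is contained in the one for a smaller m on every path, so the estimated probabilities are monotone by construction; the strong law is the statement that they go to 0.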
So, again, to emphasize: these two say exactly the same thing, whether you write the sup or you write the union. This says that if you fix a large m and look at the difference between the sample average and the expectation of X, then the difference exceeding ε is an excursion, and the probability of having even one excursion at any time from m onward goes to 0. Convergence in probability, on the other hand, only says that the probability of an excursion at time n goes to 0. So, this is stronger; you have seen this already.
So, are there any questions on these two? Did you understand the difference between them? The strong law implies the weak law; in some sense, the weak law is completely subsumed. If you prove the strong law, you get the weak law. But on the other hand, the weak law is much easier, which is why it was done 300 years ago, and the strong law is much harder; it was done only in modern times. It is a measure-theoretic statement: we are looking at the measure of all ω where this convergence happens. So, I think I will stop here; it is a good place to stop. Next class we will prove these results, and then I will have one class for the central limit theorem. I think that is about right.