This is lecture two, so
we will continue with
Geometry of High Dimensions,
that's what we are going to
do for the first two weeks.
I'm going to recap in one
slide before I start, right?
This is what we did
already last time.
And if you have questions
on this, you can ask me.
Otherwise, we'll start doing
new stuff after the slide.
So I didn't give
you many examples.
But I just told you that in
a lot of situations, data points
are just vectors in Rd,
d dimensional real space, right?
Each component is a real number,
positive or negative.
And viewing them as
vectors is useful.
It's not just a bookkeeping
device to keep a list
of things, right?
There's some components.
Again, that's not something that
we will delve into in examples,
but that's something
I told you last time.
Our main focus
is to see that for
a large number of dimensions,
which is the case for
a lot of application data,
say if you had documents.
We saw the example of documents
being put down as vectors
that have as many components as
the words in the language and
also web pages, right?
With that being described as
the main piece, and then there
are as many components as there
are web pages in the billions.
So d is very large,
things behave differently,
that was our starting point.
And we wanted to see some
important differences, so
one of them was that.
I told you, and
we'll prove that today, that
the volume of this hypersphere.
I keep interchanging
these terms but
they mean the same thing.
The unit ball, the hypersphere,
all mean the same thing.
Is the set of x in Rd such that
the length of x is less than or
equal to 1.
And in the slides,
vectors will be bold faced.
Right, here I can't do that,
but.
And I'll try to do that.
Keep the convention that vectors
are bold faced and scalars
are not bold faced, right,
I mean, but not on the board.
Okay, we'll see that.
Also, this other fact,
we saw this already, so since
this is the recap I'm just
mentioning it.
Most of the volume of any d
dimensional object is near
the surface.
Not just for spheres but for
everything; we proved that already.
And there's no big deal in
the proof except just a sort of
calculus thing that you
divide an object into
tiny infinitesimal cubes and
integrate it, right?
And if you scale a cube
down by a 1 - epsilon factor,
then you scale the side
down by that factor,
it's still a cube,
that's worth checking.
And furthermore the volume,
because it's d dimensional,
just gets scaled by
(1 - epsilon) to the d.
So that's a lot of serious
scaling in some sense.
Because d is large,
and this is the dth power, right?
So if you go to 0.9,
epsilon is 0.1, this is 0.9,
you get 0.9 to the d
which is very small.
Exponentially small, okay?
That's why the volume is
mostly near the surface.
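To put numbers on that, here is a minimal sketch of the shrinking argument. The function name `inner_fraction` is just an illustrative choice, not something from the slides.

```python
def inner_fraction(eps: float, d: int) -> float:
    """Fraction of a d-dimensional body's volume that survives
    shrinking every length by the factor (1 - eps)."""
    return (1.0 - eps) ** d


if __name__ == "__main__":
    # With eps = 0.1, watch the inner copy's share of the volume vanish.
    for d in (2, 3, 10, 100, 1000):
        inside = inner_fraction(0.1, d)
        print(f"d={d:5d}  inside={inside:.3e}  near surface={1 - inside:.6f}")
```

Whatever is not inside the shrunken copy lies within epsilon (in the scaling sense) of the surface, which is the share printed in the last column.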
So we saw these two things.
Now I'm gonna start proving
some of the other statements
that we didn't prove.
So I'll prove today first, we'll
prove a bunch of things that
the volume of the d dimensional
unit ball, we'll find it,
we'll actually more than
prove that it goes to zero,
we'll actually find it.
So it is this integral, right?
I mean, if I'm writing
down the integral.
So that's a little bit of an odd
notation, because it's d of x1,
x2 and so on, because there
are d dimensions, d of xd.
Bear with me on that. So, of course,
x1 goes from -1 to plus 1.
And x2, once you've set x1,
x2 squared can be at most 1 - x1
squared, because we
are inside the unit sphere.
So these are the limits, this
is the integral and yesterday,
I said try integrating it and
I hope you did, right?
I mean, I don't know
how many of you did but
you get sin to various powers,
right?
Sin and cosine to d powers or
more if you did it.
But if you did,
you spent that sort of in vain,
because we are going to do
something better. That was just
to sort of give you a feel for
why a multi-dimensional integral
is a bit of a hassle, right?
So instead,
we'll go to polar coordinates.
Standard thing that we
do in three dimensions.
And I'm going to draw a little
picture on the board.
Now in polar coordinates,
I'm going to write this down.
It seems completely obvious,
except I'm going to try to
impress upon you that this
is not completely
automatic in all cases.
So you have to be a little
more careful, but
let's draw that picture, right?
In two dimensions or
three dimensions,
2D is all we can draw.
The picture is this, right?
I guess, I said what is d omega?
D omega is
an infinitesimal surface
element, And r is the radius.
Polar coordinates, right?
So r is the radius.
So I'm integrating
from r = 0 to 1
because the radius of
the whole ball is 1.
So at the particular r,
and this is the r.
Okay, so we want to take
this infinitesimal area,
surface area.
Surface area really is a d-
1 dimensional volume, right?
If this was 2D,
if we're really doing it in 2D,
it's the length of the curve.
If we're doing it in 3D,
it's the area of the surface.
If you're doing it in 4D,
it's the three dimensional
volume of this, and so on.
So if you're doing it in 10D,
this is a nine
dimensional object.
Right, so
this is that scaled by r, right,
just scaled by r,
r is less than 1 in this case.
Therefore scaling by r, a d - 1
dimensional object scales volume
by r to the d - 1, right?
So that's the area of that,
okay?
And I'm multiplying by dr to get
this infinitesimal volume, okay?
So this is times dr, okay?
Is there anything I have to be
a little careful about in doing
that or
is that completely automatic?
Is it sort of true that,
I mean, it's true but, okay, so
let me ask you this question.
Instead, suppose I had
a square in two dimensions and
I did the same thing, right?
This is a cube, this is r
equals 1 and this is radial.
So here is another, this is r,
this is r equals 1.
This is d omega again.
And is it still d omega
times r to the d minus 1?
I'm sitting in d-space.
So first question,
is the infinitesimal
element r to the d
minus 1 times d omega?
It is, actually.
That's not a problem.
Is it true that,
this is equal to r to
the d minus 1 d omega dr?
Okay, I won't give
you the answer, but
assume that that is true,
do an integral.
You get the wrong area for
the square even in 2D.
So you know it's wrong.
Okay, these are the same
picture more or less.
That is wrong and
this is correct.
Why is that?
[INAUDIBLE]
>> Right, but now it's, okay.
>> [INAUDIBLE]
>> Okay, so
yeah, what's the reason
this is true?
The dr here is orthogonal to the
surface, not so there, right?
So in two and three dimensions,
we are familiar with this, but
in d dimensions,
the volume of this would be
equal to this times that,
provided they're orthogonal.
You must take only the height,
right?
You must take the height
just as you do in two and
three dimensions.
So d-dimensional volume of
an infinitesimal volume
is infinitesimal surface
area times height, but
the perpendicular height.
It cannot be an angular height.
So, as I said, you get the wrong
answer if you do this and
integrate even in 2D.
You won't get the right answer.
So I'll come back to this point
once more because in 2 and 3D,
we are familiar with this,
right?
But in higher dimensions,
it may look all right, but
it's not all right.
Here they are perpendicular.
The radial direction is,
in fact,
orthogonal to the tangential
direction, right?
So this is d - 1 dimensional.
Okay, maybe in 3D, this is a
two-dimensional surface, right?
There's only one normal,
a one-dimensional normal
space, right?
And that's just orthogonal,
radial direction.
So that's why that's correct for
the sphere, but
not necessarily always.
Okay, good.
So far, for the ball, it's true.
These don't interact, so
I can move the integrals around.
So I get r to the d- 1 dr,
d omega comes out.
This integrates to 1 / d,
of course.
And d omega integrates to A(d),
where A(d) is
the surface area of the
d dimensional unit ball.
So d-dimensional unit ball,
the surface is d- 1 dimensional,
okay?
So all we have done is related
volume to surface area, okay?
So what we have done, maybe it's
worth writing on the board,
is that the volume = 1
/ d times surface area.
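Written out, the board computation is roughly the following. This is a sketch in the lecture's notation, with V(d) the volume and A(d) the surface area of the unit ball; the symbol S^{d-1} for the unit sphere's surface is my own shorthand.

```latex
V(d) = \int_{S^{d-1}} \int_{r=0}^{1} r^{d-1} \, dr \, d\Omega
     = \left( \int_{0}^{1} r^{d-1} \, dr \right)
       \left( \int_{S^{d-1}} d\Omega \right)
     = \frac{1}{d} \, A(d).
```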
And for the cube,
that's not true.
So this argument fails,
that's not true, okay?
But, thinking about one thing,
this argument does give
something, some relationship
between these two.
What does it give?
So does it give you anything
about volume and surface area?
Sorry, less than, correct, yeah.
It does tell you that volume
is at most surface area
divided by d, and
that's true for the cube.
I mean, so you know that for
the square, if the side is 2,
the perimeter is 8,
the area is 2 squared is 4,
dimension is 2, they're equal.
But if you go to three
dimensions, that's not true.
So you'll see that it's
less than or equal.
But it's always less than or
equal to.
Why is that?
Well, less than or equal to is
correct, but why is that true?
Really?
The height cannot be more
than the actual dr, right?
The height is
a projection of that.
So this is dr height, and
then the height is that.
Dr is the hypotenuse
of the triangle, so
the height is at most that.
So it's true that volume is at
most 1 / d times surface area.
Okay, so
for the sphere it's exactly equal.
But I'm now still left with
the problem of computing
the surface area.
And that's what I'm going to do,
okay?
Again, if you did this
multiple integration,
you'll see that that's
much more complicated.
We'll do it with
a very nice trick.
Okay, so this is a relation
between the sphere and
Gaussians, so
there are lots of relationships.
We are going to see once more
the Gaussian comes in very
naturally.
So I presume you all know what
the Gaussian density is, right?
In 1D, it's just e to the -x
squared over 2 if its standard
deviation is 1.
I've dropped the 2 here
because I don't care about
the standard deviation.
And I've omitted
the normalizing constant.
So we're just going to consider
this integral, where the limits
are plus to minus infinity, or
minus infinity to plus infinity.
So the limits are simple.
So the integrand can be
factored, so you get this.
And this is root pi.
So if you've
downloaded the book,
there's a proof that
this integral is root pi.
I guess I won't do that,
just the definite integral.
For instance,
if you want to compute
the normalizing constants for
the Gaussian, that's
the integral you want, right?
So you get that I(d) is this.
So in the appendix of our book,
we have a proof of that,
you can find it there.
I said I'd do it on the board,
but maybe not, okay?
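For reference, the factoring step reads as follows (a sketch using the lecture's I(d), with no 2 in the exponent and no normalizing constant):

```latex
I(d) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
         e^{-(x_1^2 + x_2^2 + \cdots + x_d^2)} \, dx_1 \, dx_2 \cdots dx_d
     = \left( \int_{-\infty}^{\infty} e^{-x^2} \, dx \right)^{d}
     = \left( \sqrt{\pi} \right)^{d}
     = \pi^{d/2}.
```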
Now I'm going to write this I(d)
now in polar coordinates, okay?
Okay, so, again, it's worth
going to that picture.
So now I'm trying to
integrate e to the- length of
x squared, okay?
So that quantity here is
e to the -r squared, and
you still have r to the d- 1.
The value of the function
is e to the -r squared.
The area is r to
the d- 1 d omega dr.
Again, we remove the d omega,
and
then I have to do this integral,
okay?
So this integral you need to
do if you want to compute
the d- 1 moment of the Gaussian,
right?
So that's what that is doing,
and you do that.
So I'm gonna go over not all the
details, but I'll tell you some
of the substitutions
just to get this right.
So it's a gamma function,
basically, so
you put r squared = t.
You substitute r squared = t and
you'll get that,
provided I didn't
make a mistake.
So you get this integral.
This is what's called the gamma
function, this gamma of d / 2.
I guess I did not say that here,
gamma is related, as you know,
to the factorial function,
right?
Gamma(n + 1) is n factorial.
But gamma is defined for
any real value.
So it's defined whether or
not you have integers.
For integers, it's a factorial,
but it's defined always
by that mechanism, right?
So gamma of something here,
d / 2, is the integral of
e to the -t times t to that thing - 1.
It's a moment of the exponential
probability density, right?
e to the -t is an exponential
probability density,
it's a moment of that, okay?
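Spelled out, the substitution goes roughly like this (a sketch; A(d) is the surface area that comes out of the d omega integral):

```latex
I(d) = A(d) \int_{0}^{\infty} e^{-r^2} \, r^{d-1} \, dr
     \;\overset{t = r^2}{=}\;
       A(d) \int_{0}^{\infty} e^{-t} \, t^{\frac{d-1}{2}} \, \frac{dt}{2\sqrt{t}}
     = \frac{A(d)}{2} \int_{0}^{\infty} e^{-t} \, t^{\frac{d}{2} - 1} \, dt
     = \frac{A(d)}{2} \, \Gamma\!\left( \frac{d}{2} \right).
```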
So that's a way to compute I(d),
but we know that I(d) is that
from the previous slides,
so we equate these two.
That gives me A(d), and
this gives me V(d), okay?
So that's the volume and
surface area of the sphere.
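Equating pi to the d / 2 with A(d) times gamma(d / 2) over 2 gives A(d) = 2 pi^(d/2) / gamma(d/2), and V(d) = A(d) / d. A minimal sketch to sanity-check those against the 2D and 3D values you know; the function names are illustrative.

```python
import math


def surface_area(d: int) -> float:
    """A(d) = 2 * pi^(d/2) / Gamma(d/2): surface area of the unit ball in R^d."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def volume(d: int) -> float:
    """V(d) = A(d) / d: volume of the unit ball in R^d."""
    return surface_area(d) / d


if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(f"d={d}  A(d)={surface_area(d):.6f}  V(d)={volume(d):.6f}")
```

In 2D this should reproduce circumference 2 pi and area pi, and in 3D surface area 4 pi and volume four-thirds pi.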
So, as you see,
this is much simpler to do,
once we connect
with the Gaussian,
than perhaps if we did it by
just multiple integration.
Now, the thing to observe
from this is gamma of d,
I didn't say that on the slide.
But gamma(d / 2), Okay,
is less than or equal to c
to the d d to the d / 2.
It's like the factorial
function.
So n factorial is n to the n
/ e to the n approximately.
There's a root n, but,
I think root n is upstairs.
This is Stirling's
approximation,
so that, Is going to lead us,
okay, before I go on,
I did draw the picture here.
That's just the same picture
as the one I had drawn, but
I stressed the point that
the 90 degrees is important.
So the dr and the
surface area are perpendicular,
I already said that.
But we want to be careful that
that's true before we claim
that that is the volume.
Okay, now, so
that proof is finished and
I'm going to have four to five
observations about the proof
before I go on to
the next theorem.
The first worry or
question is the following.
d dimensional unit
sphere is just,
if I take a d + 1 dimensional
sphere, cut it in the middle,
I get a d dimensional sphere.
So a d dimensional sphere is
a subset of the d + 1 dimensional
sphere.
It's just an equatorial
plane if you want, right?
So that would seem to say that
the volumes should be
in that order, right?
We know that's not true
even in three dimensions.
In three dimensions, unit ball
has volume four-thirds, right?
Or maybe it's true in three
dimensions, I might be wrong.
Four-third pi r cubed,
four-third pi, right,
for the volume, and
the area of a circle of
radius one is pi r squared.
So the volume is,
in fact, greater than.
So, okay, I was wrong there.
So I think you have to
go to six dimensions.
So V(6) is not greater than or
equal to V(5), okay?
But 2 and 3, it's true, okay?
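If you want to see where the volumes turn around, here is a small sketch that tabulates V(d) using the closed form pi^(d/2) / gamma(d/2 + 1), which is the same thing as 2 pi^(d/2) / (d gamma(d/2)). The names are illustrative.

```python
import math


def volume(d: int) -> float:
    """Volume of the unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


# Tabulate the first dozen dimensions and find where the volume is largest.
volumes = {d: volume(d) for d in range(1, 13)}
peak = max(volumes, key=volumes.get)

if __name__ == "__main__":
    for d, v in volumes.items():
        print(f"V({d}) = {v:.4f}")
    print("largest at d =", peak)
```

The table goes up through five dimensions and then starts coming down, so V(6) is the first one smaller than its predecessor.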
Okay, so we know that it's
not true, ultimately.
Because, as I said, the lemma
implies asymptotically,
V(d) is c to the d divided
by d to the d / 2.
This is a gamma function, okay?
That was said in the last slide.
So this obviously goes to 0.
Not only that,
if I take V(d + 1) / V(d).
Okay, you have to be a little
more careful because it's not
such a nice function that
the limit exists so simply,
but pretend the limit exists.
This limit is what,
from that formula here?
Suppose you believe that,
then what's the limit as d goes
to infinity of V(d + 1) / V(d)?
It's c / root d roughly,
so it goes to 0, right,
this goes to 0.
So obviously, that's false.
Okay, asymptotically,
V(d + 1) is less than V(d),
even though in two and three
dimensions, that's not the case.
Okay, so why are these two
not contradictory statements?
So that's worth thinking
about for a minute, right?
So again, the d dimensional
sphere is a subset of the d + 1
dimensional, yet
its volume seems to be more,
asymptotically, when
d is large enough.
So is that contradictory or not?
It's not contradictory
because it's true.
Anybody wants to think of a way,
a reason that it's
not contradictory?
>> [INAUDIBLE]
>> Yeah, so
because it's comparing apples
and oranges, exactly, right?
So comparing these two is
comparing apples and oranges.
Because, I mean, in two and
three dimensions,
this is square feet,
this is cubic feet.
So you don't say that five
square feet is less than or
equal to six cubic feet, okay?
So they're different units.
They are different quantities.
So just a sanity check.
Now, a little harder question.
So in this integration,
I used the Gaussian.
I pulled it out of the hat,
right?
So going back two slides, I just
pulled this I(d) out of the hat,
right, and then related it.
So what I did was I
computed I(d) two ways.
One way, polar coordinates,
led me to a formula
involving the surface area.
The other way led me to a
closed-form formula, and
I related the two
to get the surface area.
So I pulled it out of the hat.
Now, is there something else
you can do besides Gaussian?
So Gaussian is e to the -x
squared, length of x squared.
This is Euclidean length, so
if you will, the 2 norm, right?
So could I have used
the 1 norm instead?
So I could try to
do the integral,
I could try to do I prime(d),
Is equal to the integral of
e to the -1 norm of x dx.
And try to relate that
to the surface area.
Can I do that?
What might go wrong if
you try to do that?
You continue with the-
>> [INAUDIBLE]
>> Yeah, okay,
it's not continuous,
but we're integrating,
not differentiating, so
that may not be a problem.
So two things happen
which are nice for
us when we use Gaussians.
And it's probably worth noting
these properties of the Gaussian.
Maybe I should have said that.
So here's one thing
that happened, right?
We were able to split it
up into just a product of
one dimensional integrals,
which we know how to do.
That's fact number one.
What's the other nice
thing that happened?
>> [INAUDIBLE]
>> Constant, radially symmetric,
so we got this.
So we also got this,
where it's a function of just r.
That means it's radially
symmetric, right?
At the same r, this function and
that function has
the same value.
So the Gaussian is the only
thing that has both these
properties, where it's
separable, you can split it up,
and it's radially symmetric.
Right, so I cannot use that.
I cannot use even e to
the minus length of x,
even the Euclidean length of x.
That doesn't split, right?
It's not the product of
one dimensional integrals.
So I needed both symmetry
that's only dependent on r,
plus the splitting.
So a Gaussian is used in many
contexts because of these
properties, right?
It's the only thing that
has all these properties.
I forgot, yeah, I was
reminded that I should repeat
the questions from the audience
because the recording
doesn't pick them up.
So I have to repeat them
next time [INAUDIBLE].
So we saw that, we saw that.
Radial symmetry is crucial,
as well as splitting,
which I didn't write down,
I should do that.
And then this is a question
I already went through.
Is the surface area of any
d dimensional object d
times its volume?
Picture proof of
this same argument,
that is a picture proof.
As for spheres,
works for anything.
So the picture proof
seems to work.
Is this picture correct, right?
We went through
that already.
And we did get, though, this
quantity, that's nice to know.
This is.
You should check all
these things for
small dimensional spheres, and
so on, just as a sanity check.
So now we'll prove that
the volume is near the equator.
So we're gonna prove
several of these things.
So this will be by integration.
There's no Gaussian trick here,
but it will be by integration.
We will come back to
using Gaussians for
more things a little later.
So we want to prove most
of the volume, say 99% of
the volume of the d sphere
is in the equatorial region.
And the equatorial region is
the absolute value of x1.
So above and below
the equatorial plane, right,
is less than or equal to delta.
So first intuitive
question we're gonna ask
is what delta should be and then
we'll try to prove it, right?
So here is a nice thing, this
is something that you all know,
but this is very
central to this course.
So I want to remark on this,
right?
That first question is the same
as the question that I can
rephrase in terms of
probabilities, right?
I want to know
the 99th percentile
of the absolute value of x1 when
x is picked uniformly at random.
We'll use this phrase a lot, so
I'm gonna abbreviate it u.a.r.,
uniformly at random from
the hypersphere, okay?
So again, this is extremely
intuitively simple, right, but
just think of that.
So volumes and
probabilities are related.
After all, probability
is just an integral of
probability density, and
volume is also an integral.
So you expect a relation, but
this is what we're asking.
Okay, so more intuition
of what delta should be.
So here's more intuition, right?
So the length of x squared
is this, by Pythagoras,
right, sum of
the coordinates squared.
We know that the expected value
of the length of x squared is
at most 1.
In fact, this length of x is
at most 1 in this sphere.
Not only is that at most 1,
we actually know that it's
approximately 1, right?
So how close is it to 1?
Do you know the answer?
What's the expected
value of x squared
from what we've already seen?
What can you put
there that's true?
1- 1 / d, right?
Because we saw that
most of the volume
is near the surface.
It's not completely proved here,
you have to sit and work it out.
But that probability was
falling exponentially, so
you should be able to
integrate to get 1- 1 / d.
So in fact, expected value
of x squared is 1- c / d for
a particular constant c.
You can work that out, okay?
By symmetry, the expected
values of the xi squared must all be
equal, so each of them is
at most 1 / d.
So that suggests that delta,
which is absolute
value of x1 without the square
could be order 1 / root d.
But we want that to hold
with high probability.
That's only expectation, right?
So these were expectations.
It doesn't imply high
probability automatically.
That tells you an intuition.
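Here is a quick Monte Carlo sketch of those expectations, if you want to check the intuition numerically. The sampling recipe (a Gaussian direction times a radius distributed as U^(1/d)) is a standard way to draw u.a.r. from the ball and is my choice, not something from the slides. For comparison, integrating r squared against the density d r^(d-1) gives exactly d / (d + 2) for the expected squared length, so each coordinate's second moment is 1 / (d + 2).

```python
import math
import random


def sample_ball(d, rng):
    """Draw one point uniformly at random from the unit ball in R^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]      # spherically symmetric direction
    norm = math.sqrt(sum(x * x for x in g))
    radius = (1.0 - rng.random()) ** (1.0 / d)       # P(radius <= t) = t^d
    return [radius * x / norm for x in g]


def second_moments(d, n, seed=0):
    """Monte Carlo estimates of E|x|^2 and E[x1^2] from n samples."""
    rng = random.Random(seed)
    total_len2 = 0.0
    total_x1sq = 0.0
    for _ in range(n):
        x = sample_ball(d, rng)
        total_len2 += sum(c * c for c in x)
        total_x1sq += x[0] * x[0]
    return total_len2 / n, total_x1sq / n


if __name__ == "__main__":
    d = 50
    len2, x1sq = second_moments(d, 2000)
    print(f"E|x|^2 ~ {len2:.4f}   (exact {d / (d + 2):.4f})")
    print(f"E[x1^2] ~ {x1sq:.4f}  (exact {1 / (d + 2):.4f})")
```

These are only expectations; the sketch says nothing yet about how heavy the tails are.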
Okay, so sorry,
let's go through this.
For a general random variable,
let's consider x1 absolute value
to be a general random variable.
Its second moment is
at most 1 / d, we saw that.
That's this top one, okay?
Does that automatically imply
that with high probability (high
probability means exponentially
small failure probability)
it's 1 / root d,
order 1 / root d?
That's not true, right?
General random variables
can have terrible tails.
So if you had a polynomial
density, the tail is so
long that this won't be true.
But for all the geometric
quantities, we are going to see
the tail will not be bad,
it'll be exponential.
So we already saw two
exponential bounds.
One is for the volume
away from the surface,
where it's exponentially small.
Another one was the equator.
So things work with
very high probability.
So in fact, even though this
is not rigorously correct,
it'll be true.
By the way, one quick aside as
a recap, I'd like to prove and
will prove this,
that with high probability
it's order 1 / root d.
We do know the expected value
of x1 squared is at most 1 / d.
Can you at least say this?
The expected value of
the absolute value
is at most 1 / root d.
Do we know this to be implied
from what we know already?
Anybody remember an inequality
that might give you this?
Cauchy-Schwarz, Jensen's
inequality it's called, but
proved by Cauchy-Schwarz.
So for any random variable x,
the expected value of
the absolute value
of x is less than or
equal to the expected value
of x squared, square root.
And you prove it that way.
This is Jensen's inequality,
right?
You prove it by taking square
roots or just by convexity of
the square function, or any
number of ways, but that's true.
So we do know that.
But I wanna tell you that,
in fact,
even though it's a fallacious
argument to say expectation
implies that with high probability
it's of that order,
it's true for
geometric quantities, right?
So they are fairly concentrated.
So we'll do a proper
proof by integration.
So I should draw a picture on
the board, so I'll erase this.
So I'm going to find the volume
of the following set.
What is the set?
A is the set of x so that x1
is greater than or equal to
c / root d - 1.
For all of it, both top and bottom,
in that case, I made a mistake.
There's a factor of 2 missing,
sorry.
So what I have here is
only that top part.
And it should be d - 1 there,
excuse me.
So it should be c / root d - 1.
So in the picture,
I'll indicate all of this.
So this is c / root
d - 1, and this is - c / root d - 1.
I'm integrating this.
I'm taking slabs and
integrating.
So this is a slab.
And this is x1.
This is dx1.
And the volume of this slab,
what is it?
So this is radius 1.
So a little geometry here,
this is radius 1.
This is just basically planar
because all dimensions
are symmetric.
And this is x1, so this is,
Square root of 1- x1 squared.
Now, I'm going to take the d-
1 dimensional volume of this,
okay?
This is a sphere of radius,
so this flat thing.
I guess I should have
drawn this picture.
It's a sphere in d- 1 dimensions
of radius root 1- x1 squared.
Right, and therefore,
that's the volume, right, so,
>> [INAUDIBLE]
>> Yeah,
the question is whether it
shouldn't be multiplied by 2.
The answer is yes.
I made a mistake and
didn't put the 2 there
because this is only computing
the top thing, right?
So I will correct this slide.
Yeah, you have to
have a 2 there, good.
So the d - 1 dimensional
sphere of radius 1 has this
volume, V of d - 1, but
now the radius is square root of
1 - x1 squared, so
you get that extra factor,
the radius to the d - 1, and
times dx1, okay,
so is this correct?
Am I being careful about
the point I made, right?
Yes, dx 1 is
perpendicular to this.
Right so that's fine.
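In symbols, the slab integral for the top part looks roughly like this (a sketch; vol(A) is my shorthand for the volume of the set A, and the bottom part doubles it by symmetry):

```latex
\mathrm{vol}(A)
  = \int_{c/\sqrt{d-1}}^{1}
      \left( 1 - x_1^2 \right)^{\frac{d-1}{2}} \, V(d-1) \, dx_1 ,
\qquad
\mathrm{vol}(\text{top and bottom}) = 2 \, \mathrm{vol}(A).
```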
Okay, so
here's a nice standard trick,
I mean I'm gonna tell you
some of these tricks.
They're just
integration tricks but
then we use them quite often.
So for Gaussian
integrals these are tails, so,
when you want to compute the
tail probability of a Gaussian,
right, the probability that x
is many standard deviations away,
this is what you have to do,
this is the integral.
So, you throw in a factor of
x over alpha;
x is greater than alpha, and this
is alpha to infinity by the way.
x is greater than alpha so
I can throw that in.
And this is nice because
this is integrable, right.
x times e to the -x squared is,
up to a constant, the derivative
of e to the -x squared.
So you can integrate that, so
that's how you often compute the
tail probability of a Gaussian.
So that's a very good estimate
if you go more than a few
standard deviations away.
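A small sketch of that trick for the unnormalized integrand e to the -x squared: throwing in x over alpha turns the tail into something you can integrate exactly, e^(-alpha^2) / (2 alpha), and it sits above the true tail. The exact tail is computed here through the standard library's `math.erfc`; the function names are illustrative.

```python
import math


def gaussian_tail(alpha: float) -> float:
    """Exact integral of exp(-x^2) from alpha to infinity, via erfc."""
    return 0.5 * math.sqrt(math.pi) * math.erfc(alpha)


def tail_upper_bound(alpha: float) -> float:
    """Integral of (x / alpha) * exp(-x^2) from alpha to infinity,
    which works out to exp(-alpha^2) / (2 * alpha)."""
    return math.exp(-alpha * alpha) / (2.0 * alpha)


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0, 3.0):
        exact = gaussian_tail(alpha)
        bound = tail_upper_bound(alpha)
        print(f"alpha={alpha}  exact={exact:.3e}  bound={bound:.3e}  ratio={exact / bound:.3f}")
```

The ratio column creeps toward 1 as alpha grows, which is the sense in which the estimate is good far out in the tail.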
Okay, so
I have to integrate that.
Sorry, I have to integrate
that and I throw in this extra
factor, this x over
alpha factor here right?
And I want to claim that it
needs only to be checked,
but basically, Vd- 1 comes out.
That's c, you can check that
there's no mistake there.
Yeah, was there a question?
No.
I think I made no mistakes, but
it's worth checking
to make sure.
In the book these
are written down,
most steps are written down so
that you can check.
Okay and
now I want the denominator, so
I wanted the ratio of
the volume of A to V of d.
Now we've got volume of
A in terms of V of d -1.
And my object now is to get V
of d in terms of V of d- 1.
Roughly, V
of d is V of d - 1 over square
root of d, because
the volume has d to the d
over 2 in the denominator,
so you lose the square
root of d each time.
So that's what, roughly,
is true, but we will prove that.
So here's a proof: the volume of
the hypersphere, V of d, is at
least the volume of the cylinder
of this height contained in there.
So it's the volume of
this kind of cylinder.
So I'm going to take that volume
as a lower bound on the volume
of the whole thing.
Again, there are a few
more steps in the notes.
If you don't agree with
something, you can ask me.
So which is, why is that?
Okay, let's see if
you agree with that.
So, this is supposed to be
the volume of the cylinder
whose height is 2
divided by root d- 1.
So, it's 1 divided by root d- 1
away from the equatorial plane.
So, why do you get this factor?
Now where do you get that,
right?
Roughly, what is this
factor coming from?
From the fact that this thing
has radius square root of
1 - x1 squared, and
then this height is
2 over root d - 1.
So I think that's
also done correctly.
Worth checking, but so
at the end of it, you get
a theorem which says that for
any constant c greater than 1,
c is anything greater than
1, and more than
three dimensions,
you have at most this fraction of
the volume of the sphere
with absolute value of x1
greater than this amount.
So, again I expect the absolute
value of x1 to be order 1
over root d.
I put down 1 over root d - 1 here,
about the same when d is large.
So the probability of this being
greater than that falls off
exponentially, as we see.
Okay, all of these things will
be exponential; the tails will
be very well behaved.
Okay, again please do
check all the details.
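One way to check the details numerically is the sketch below. It integrates the slabs with a plain midpoint rule and compares the fraction of volume with |x1| beyond c / root (d - 1) against the bound (2 / c) e^(-c^2 / 2). That bound is how I recall the statement from the book, so treat it as an assumption rather than a quotation; the function names are illustrative.

```python
import math


def ball_volume(d: int) -> float:
    """Volume of the unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def fraction_beyond(d: int, c: float, steps: int = 20000) -> float:
    """Fraction of the unit ball in R^d with |x1| > c / sqrt(d - 1),
    by midpoint-rule integration over slabs of thickness dx1."""
    t = c / math.sqrt(d - 1)
    if t >= 1.0:
        return 0.0                      # the threshold is outside the ball
    h = (1.0 - t) / steps
    integral = 0.0
    for i in range(steps):
        x1 = t + (i + 0.5) * h
        integral += (1.0 - x1 * x1) ** ((d - 1) / 2) * h
    # Factor 2 accounts for both the top and the bottom cap.
    return 2.0 * ball_volume(d - 1) * integral / ball_volume(d)


def stated_bound(c: float) -> float:
    """(2 / c) * exp(-c^2 / 2); assumed form of the theorem's bound."""
    return (2.0 / c) * math.exp(-c * c / 2.0)


if __name__ == "__main__":
    for d in (3, 10, 100):
        for c in (1.0, 2.0, 3.0):
            print(f"d={d:3d} c={c}  fraction={fraction_beyond(d, c):.4e}  bound={stated_bound(c):.4e}")
```

With c = 0 the fraction should come out as 1, which is a quick check that the factor of 2 and the V(d - 1) / V(d) ratio are in the right place.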
So, I guess the question is,
why did I recompute V of d in
step 6, that's because I
wanted the denominator and
the numerator to be in
terms of V of d- 1, so
that's not a very
interesting question.
Now this result implies
a bunch of things.
So the equator being heavy
is another way to prove that
the volume of the whole sphere
goes to 0, and that requires
another picture.
So, all right I'll
draw a fresh sphere.
Lets draw a big one here.
So we proved that.
If I put here c times root log
d, in the last slide I wrote only a c,
now I'm putting a c
times root log d.
I wrote root d - 1 but that's
almost the same as root d.
Then the volume was e to the - c
squared, so I put the root log
d so that I get, for this
volume, e to the - c squared
log d, because I squared it,
times the volume of the sphere,
which with large enough c is
less than 1 over d squared.
If c is at least 2, let's say,
it's at most 1 over d
squared; that's for
the x1 axis.
And I could do the same
thing for the x2 axis.
This is x2 and I know.
Okay, so let me.
This shaded region is small.
But from the x2 picture,
I also know.
But this shaded region
is not.
Now, if I take out those
two shaded regions in two
dimensions, in d dimensions I
have to take out d of them,
what's left is a cube.
You have to convince yourself
that what is left is
a cube, right?
I mean, so let's, I think that's
the next statement, okay,
it doesn't say it precisely but,
yeah, so it does say it here.
So, it's 1 over d squared, each
of those things I take out and
I have to take out d of them,
one for each axis.
So, once I take them all out,
I've still only taken out 1
over d of the volume, okay?
So, for example, half
the volume is still left, but
that's all inside the cube.
Now the cube has a much smaller
side than the radius of
the sphere,
because the cube has only side
about root log d over root d.
Again, d is going to infinity,
so this is going to 0.
So, it's a tiny cube, right?
So what this is saying is that,
a tiny cube in the center
of the sphere contains half
the volume or more, okay?
Again, not a surprise
anymore because we've
seen many of these things.
And the cubes volume is
the side raised to the power d.
So that goes to 0, right?
Now the volume of the sphere,
which we saw more exactly,
did not have this log factor,
right?
It still had this factor,
that is correct.
That's a higher order term,
that is correct.
But it didn't have
the log factor.
This is a crude argument which
leaves the log factor in.
Okay, but that's still
going to 0 because log
d is much less than d.
So that's another proof.
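Here is a sketch of that crude argument in numbers, working in logarithms so nothing underflows. It assumes the equator bound with c = 2, so that a central cube of side 2 c root(ln d) / root(d - 1) holds at least half the volume and hence V(d) is at most twice the cube's volume. The function names and the default c = 2 are my choices for illustration.

```python
import math


def log_ball_volume(d: int) -> float:
    """Natural log of the unit-ball volume pi^(d/2) / Gamma(d/2 + 1)."""
    return (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)


def log_cube_bound(d: int, c: float = 2.0) -> float:
    """Natural log of 2 * side^d, where side = 2 * c * sqrt(ln d / (d - 1))
    is the side of the central cube assumed to hold at least half the volume."""
    side = 2.0 * c * math.sqrt(math.log(d) / (d - 1))
    return math.log(2.0) + d * math.log(side)


if __name__ == "__main__":
    for d in (3, 10, 100, 1000):
        print(f"d={d:5d}  ln V(d)={log_ball_volume(d):10.2f}  ln(cube bound)={log_cube_bound(d):10.2f}")
```

The cube bound is far from tight because of the extra log factor, but it still heads to minus infinity on the log scale.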
You can give multiple
proofs that this goes to 0.
Okay, so yeah,
I mean this thing.
The d minus one
dimensional surface.
Yeah, the bulk of the volume is
near the d minus one dimensional
surface, which is
the equatorial plane, right?
Does that answer?
So maybe we should revisit
the symmetry question,
which I meant to.
Which is, remember,
we had this puzzle last time,
maybe you asked
the question too, right?
Anything can be the north pole.
I don't care.
It's not fixed.
So for every north pole, this
region has 99% of the volume,
so does this region, right?
And so on.
So how do we reconcile that?
Okay, maybe we didn't
quite answer that question.
That's what we're revisiting.
So why is that not
contradictory?
>> [INAUDIBLE]
>> Right, so
if it's 99%, if you
intersect ten of them,
the intersection will add up
to only 90% perhaps, right?
So a huge number of them,
the intersection
could be very small.
So it will contain the origin,
but it will be very small.
So it's not a contradiction.
The other little thing was that,
equator and the surface, so
everything was near the surface.
Everything was near the equator,
so these things are very heavy.
These things together
contain 98% of the volume of
the sphere, okay?
It doesn't look at all possible
in 3D or 2D, but it's true.
Okay, because we did say it's
all near the surface, and
it's all in equatorial plane.
So that's true.
Okay, so another
consequence of the equator
thing is that, it's a general
principle in some sense.
If I pick two independent
random vectors u, v.
And so I'm writing here
the expectation of a vector,
what I mean is component wise
you take expectations, right?
Usually you take expectations of
real-valued random variables.
So each component, you take the
expectation, and it's the 0 vector.
So each component of u and
v have expectation 0, and
they're independent.
Then with high probability, it's
another phrase we'll use a lot.
So I'll use whp for
with high probability.
Usually it means
exponentially small
probability of failure, right?
These vectors u and
v will be nearly orthogonal.
v will not have very much of
a component along u, okay?
We'll quantify that.
So intuitively, this is clear.
Because dot products
measure correlation.
And this says that
independent vectors will not
be correlated, right?
So intuitively that
should be the case and
in fact it is the case, right?
So we'll prove that
using the equator thing.
We'll prove something stronger.
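A minimal sketch of the general principle just stated, not part of the lecture. It assumes one particular mean-zero distribution (independent +1/-1 coordinates) and hypothetical helper names, and simply measures how the cosine of the angle between two independent vectors shrinks as d grows.

```python
import math
import random


def random_sign_vector(d, rng):
    """Vector with independent coordinates, each +1 or -1 (so each has mean 0)."""
    return [rng.choice((-1.0, 1.0)) for _ in range(d)]


def cosine(u, v):
    """Cosine of the angle between u and v: dot product over the lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(d, trials, rng):
    """Average |cos(theta)| over independent pairs (u, v) in dimension d."""
    total = 0.0
    for _ in range(trials):
        u = random_sign_vector(d, rng)
        v = random_sign_vector(d, rng)
        total += abs(cosine(u, v))
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {d: mean_abs_cosine(d, 200, rng) for d in (10, 100, 1000)}
for d, c in results.items():
    print(d, round(c, 3), "compare 1/sqrt(d) =", round(1 / math.sqrt(d), 3))
```

The printed averages should track 1 over root d up to a constant factor, which is the sense in which independent mean-zero vectors are nearly orthogonal.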
So, the way to see that for
two vectors and
I'm going to do it for
multiple vectors in a minute.
For two vectors, if u and v are
independent, uniform at random
from the unit sphere,
we'll see that
the dot product is very small
compared to the lengths.
First the length is big, right?
With high probability,
the length of x,
length of u is this big.
So that's clear,
right, why is that?
What result implies that?
Just the sanity check.
Volume near the surface
implies that, so yeah.
So that implies that.
So first we pick u,
and then I'm going to
rotate the coordinate system so
that u is the north pole.
I'm allowed to do that.
I can choose any north pole.
So then u becomes this vector,
with only the first component
nonzero and the other components 0.
Now I pick v.
Now v is independent of u, so
the rotation does not
affect how v looks, right?
This is just a change
of coordinate system,
it doesn't affect
what v looks like.
And we saw from the equator
result that with high
probability, the absolute value
of v1 is order 1 over root d.
And now I take the dot product,
I only have to worry about v1.
And so the cosine of theta,
which is the dot product
divided by the lengths,
is something like this:
the dot product is about 1 over
root d times the length of u.
Then there are higher
order terms, but
it's roughly 1 over root d.
And that implies theta is
almost 90 degrees, okay?
Cosine inverse is that.
Okay, that's for two vectors.
What happens if I
pick n vectors?
If I pick n vectors, iid,
in d-dimensional space,
one can prove first
all of them lie near
the surface if log n is
much smaller than d.
And secondly,
all the dot products are small.
Again, if log n, again,
is not very big compared to d,
if n is smaller than 2 to the d,
then the dot products
also go to 0, okay?
So let me now
actually prove this.
Yes, I don't know, maybe
I'll ask questions about, so
how do we prove the first one?
I want to say that with high
probability, all the xi are big.
And I wanna get the log factor.
First, where would I
get that the first one,
X1 is big in length,
that we saw.
That's from the volume
near the surface.
How would I get that all of
them are near the surface?
Union bound, right?
So because the probability
that one of them.
I'm sorry, the answer was union.
So the probability that one of
them is not near the surface is
whatever it is, and then you
have to multiply it by n, right?
But the failure probability is
exponentially small, so you get a log.
This is a calculation that
you can do at home, okay?
How about this, xi dot xj?
So we proved for one pair u,
v, we get u dot v is small.
We did some rotation of
coordinates and so on, but
that can be ignored.
It's all independent of
the coordinate system, okay?
So how do we get this?
Same union bound, right?
Because there are n
squared of them now.
Okay, maybe I should do them,
but there are n squared of them.
So, if p is
the probability of
failure of one of them,
the probability that x1 dot x2
is greater than or
equal to that quantity,
root 6 log n over root d minus 1,
then I want n squared
times p to be small, right?
Now this probability,
I know, is less than or
equal to e to
the minus c squared over 2.
What is c squared, right?
So yeah, so
it's the square of this basically.
It's log n times some constant,
I think 10, I'll put 10.
6 divided by 2.
We have to go back
to the actual 2.
So this is less than or
equal to 2.
Now the n squared,
we have to union n squared
events, not n events.
But the n squared only costs
you a factor of 2 more,
because we are taking logs.
That's the point, right?
Again, please work that out,
that's just a union bound
that gets you that.
So even if I pick
a lot of vectors,
they're all fairly
mutually orthogonal.
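A rough numerical check of the two claims for n points, offered as a sketch rather than as the lecture's own computation. The sampler (a Gaussian direction with radius U^(1/d)) is a standard technique assumed here, the dot-product bound is read from the transcript as sqrt(6 ln n) / sqrt(d - 1), and all names are hypothetical.

```python
import math
import random


def sample_unit_ball(d, rng):
    """Assumed sampler: Gaussian direction, radius U**(1/d), uniform in the unit ball."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    radius = rng.random() ** (1.0 / d)
    return [radius * x / norm for x in g]


def length(x):
    return math.sqrt(sum(c * c for c in x))


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


d, n = 500, 40
rng = random.Random(1)  # fixed seed so the run is repeatable
points = [sample_unit_ball(d, rng) for _ in range(n)]

# Claim 1: every point is near the surface, so the shortest length is close to 1.
min_length = min(length(p) for p in points)

# Claim 2: all pairwise dot products are small (about n^2 / 2 pairs are checked).
max_abs_dot = max(
    abs(dot(points[i], points[j]))
    for i in range(n)
    for j in range(i + 1, n)
)

# The bound as read from the transcript (an assumption about the garbled text).
dot_bound = math.sqrt(6 * math.log(n)) / math.sqrt(d - 1)

print("shortest length:", round(min_length, 4))
print("largest |xi . xj|:", round(max_abs_dot, 4), "bound:", round(dot_bound, 4))
```

Because the bound only depends on n through log n, squaring the number of events in the union bound changes the constant and not the shape of the result.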
So I want to do one more
thing and then we can stop.
So how do you pick
uniform random points?
Actually, algorithmically,
how do you
sample points from a unit ball?
So in 2 and 3D,
the simplest algorithm to
do that is the following.
So I want to pick a uniform
random point from this circle.
That's a bit difficult,
because the coordinates
are dependent, right?
Cuz we have this constraint,
x1 squared plus x2 squared
less than or equal to 1.
So the coordinates
are not independent.
So you put it in
a box of side 2.
And I pick random
points from the box.
Now I can do that because X1 and
X2 are independent and
I accept the point only if
it's inside the sphere.
Okay, this is called
rejection sampling, right?
In statistics that, I mean,
[INAUDIBLE] I want to sample
from a distribution
which is somewhat hard,
which is the uniform distribution on
the sphere, P of X in this case.
I instead draw a sample from
an easier distribution, Q of X,
which is uniform in the cube
because the coordinates are independent.
And then I accept
with this probability.
It is easy to check that
the samples I get are correct
IID samples from that distribution
P of X. Okay, here's a profound
question: would you do the same
procedure if I told you to pick
a uniform random point from
a 1,000-dimensional sphere?
Would you do the same procedure
in a 1,000-dimensional sphere?
No, the answer is,
he's shaking his head, but
he, no, why wouldn't you do it?
[INAUDIBLE]
Yes, so the answer is, yes.
The volume of the sphere is
exponentially smaller, so
you reject most of the time.
The cube is so much bigger than
the sphere that this would make
rejection very likely okay?
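The procedure and its failure in high dimensions can be sketched as follows. This is an illustration under stated assumptions, not the lecturer's code: the function names are hypothetical, the box is taken to be [-1, 1]^d, and the exact acceptance rate uses the standard volume formula pi^(d/2) / Gamma(d/2 + 1) for the unit ball.

```python
import math
import random


def propose_from_cube(d, rng):
    """Uniform point in the cube [-1, 1]^d: the coordinates are independent."""
    return [rng.uniform(-1.0, 1.0) for _ in range(d)]


def inside_unit_ball(x):
    """Accept exactly when the proposal satisfies x1^2 + ... + xd^2 <= 1."""
    return sum(c * c for c in x) <= 1.0


def rejection_sample(d, rng, max_tries=100_000):
    """Return an accepted point, or None if every proposal was rejected."""
    for _ in range(max_tries):
        x = propose_from_cube(d, rng)
        if inside_unit_ball(x):
            return x
    return None


def ball_to_cube_ratio(d):
    """Exact acceptance rate: volume of the unit ball over the volume 2^d of the cube."""
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return ball_volume / 2.0 ** d


def empirical_acceptance(d, proposals, rng):
    """Fraction of cube proposals that land inside the ball."""
    hits = sum(inside_unit_ball(propose_from_cube(d, rng)) for _ in range(proposals))
    return hits / proposals


rng = random.Random(0)  # fixed seed so the run is repeatable
for d in (2, 3, 5, 10, 20):
    print(d, "exact:", ball_to_cube_ratio(d),
          "observed:", empirical_acceptance(d, 20_000, rng))
```

In 2 dimensions most proposals are accepted, while by 20 dimensions the exact rate is so small that 20,000 proposals will typically produce no accepted point at all.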
So in fact this is
a general phenomenon
which we won't do
in this course.
But if you see one of the later
chapters from our book, on
Markov chains, it turns out
to be interesting to pick
uniform random points
from general sets,
let's say convex sets.
Simplices, cubes,
spheres, polytopes in
very high dimensions.
And rejection sampling does
not work for this very reason,
because the enclosing object
is always much bigger.
So what do you do
if it doesn't work?
You can look up that chapter.
Or I could give you a little.
We won't actually do
that in this course.
You apply a Markov chain
to multi [INAUDIBLE].
So you wouldn't do this
in 1,000 dimensions.
Here's a procedure that does
work for spheres. It doesn't
work for general objects, and
that's much more complicated.
For spheres,
the correct procedure is,
we pick x1 to xd independent
from a standard Gaussian.
This is the mean zero,
variance one normal density.
Now you normalize
the length to one.
That's a uniform random point of
length one from the surface of
the sphere. Or if you want
to pick from inside, I'm
going to scale it by a factor
rho and take that as my sample.
So, again, I have a sphere. I
picked a [INAUDIBLE] point x
with Gaussian density
over all of space, and
then I scaled it down
to x over length of x.
Then I'm going to scale it
down to one of these points.
This is rho times that, but
I have to pick rho with
the correct density.
Rho is a real
random variable.
I have to pick it with
the correct density.
Right?
What is the density of rho?
Uniform is not right.
Right?
It's not uniform over zero one.
>> Yeah.
So which is,
it's not uniform, but
yeah, it's got to be the area,
but what might it be?
So you have to look again at
the differential element, right?
So I want that volume.
And that volume
should be a function
of rho to the d minus 1.
So rho, the probability density
that rho is equal to x is
proportional to x
to the d minus 1,
between 0 and 1.
And I do have a constant, which
is d, to normalize it.
So you have to pick rho with
the correct density, and
then you get a point
uniformly at random
from the sphere.
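A minimal sketch of this procedure, with hypothetical function names. One step is an assumption not spelled out in the lecture: to draw rho with density d times x^(d-1) on [0, 1], the sketch uses inverse-transform sampling, since the CDF is x^d and so rho can be taken as U^(1/d) for U uniform on [0, 1].

```python
import math
import random


def sample_sphere_surface(d, rng):
    """Independent standard Gaussians, normalized to length one."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def sample_radius(d, rng):
    """Radius with density d * x^(d-1) on [0, 1], drawn as U^(1/d) (inverse CDF)."""
    return rng.random() ** (1.0 / d)


def sample_unit_ball(d, rng):
    """Uniform point inside the unit ball: rho times a uniform direction."""
    rho = sample_radius(d, rng)
    return [rho * x for x in sample_sphere_surface(d, rng)]


def length(x):
    return math.sqrt(sum(c * c for c in x))


rng = random.Random(2)  # fixed seed so the run is repeatable
d = 3
radii = [length(sample_unit_ball(d, rng)) for _ in range(20_000)]

# For a uniform point in the ball, P(length <= r) = r^d, so in 3 dimensions
# about 0.5^3 of the points should have length at most 1/2, and the mean
# length should be close to d / (d + 1).
mean_radius = sum(radii) / len(radii)
frac_inside_half = sum(r <= 0.5 for r in radii) / len(radii)
print("mean length:", round(mean_radius, 3), "expected about", d / (d + 1))
print("fraction with length <= 1/2:", round(frac_inside_half, 3),
      "expected about", 0.5 ** d)
```

Picking rho uniformly on [0, 1] instead would put far too many points near the center, which is why the density has to be proportional to x^(d-1).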
Big question,
such a big question.
So I think, yes.
The question is, instead of
picking according to a normal,
what if I pick x1 to xd
independent, minus r to plus r
uniform, right?
So what is wrong?
It doesn't work.
Why not?
Okay, so
we should draw a picture.
In fact, I think that's part of
the homework problem, so anyway,
in the back of the book.
So what the suggestion was this,
all right?
So here's the sphere from
the [INAUDIBLE] points.
I pick a point uniformly at
random from the cube, and
then scale it down
to length one.
So I pick it somewhere
here lets say, and
then scale it down to here.
So would that be correct?
No, so yes the answer's no,
but why won't it be correct?
>> [INAUDIBLE]
>> Some directions get more points.
Which directions get more points?
>> [INAUDIBLE]
>> The corner, right,
because yeah,
the cone here is very long.
So, this volume is much
bigger than this volume.
All right, so, that is what.
So, you need a radially
symmetric density, in any case.
But, any radially
symmetric density will do.
Okay, so if I pick points with
the probability density e to the minus,
so it's e to the minus r squared.
I could instead do e to the minus r,
that's fine.
But it has to be symmetric.
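The corner effect can be seen already in 2 dimensions. The sketch below is an illustration with hypothetical names, not the homework solution: it compares the directions obtained by normalizing a uniform point from the square [-1, 1]^2 with the directions obtained from a Gaussian, and counts how many fall within 10 degrees of a diagonal.

```python
import math
import random


def direction_from_square(rng):
    """Angle of a uniform point in [-1, 1]^2, i.e. its direction after scaling to length 1."""
    return math.atan2(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))


def direction_from_gaussian(rng):
    """Angle of a point with independent standard Gaussian coordinates."""
    return math.atan2(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def near_diagonal_fraction(angles, half_width_deg=10.0):
    """Fraction of angles within half_width_deg of a diagonal (45, 135, 225, 315 degrees)."""
    count = 0
    for a in angles:
        folded = math.degrees(a) % 90.0  # fold all four quadrants onto [0, 90)
        if abs(folded - 45.0) <= half_width_deg:
            count += 1
    return count / len(angles)


rng = random.Random(5)  # fixed seed so the run is repeatable
trials = 20_000
square_frac = near_diagonal_fraction([direction_from_square(rng) for _ in range(trials)])
gauss_frac = near_diagonal_fraction([direction_from_gaussian(rng) for _ in range(trials)])

# A uniform direction puts 20/90 of its mass in these windows.
print("from the square:", round(square_frac, 3))
print("from the Gaussian:", round(gauss_frac, 3), "uniform would give", round(20 / 90, 3))
```

The square sends noticeably more than 20/90 of the points toward the corners, while the radially symmetric Gaussian stays close to 20/90.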
Okay, so next time I am going
to do the following.
We are going to prove
an annulus theorem for
Gaussians. If I pick a Gaussian,
it's also true that there's
a small annulus where the points
lie. And to do that we'll need
a concentration result, which
is actually quite general.
It's saying that sums of
independent [INAUDIBLE]
are concentrated,
which we'll prove, and
that'll imply Chernoff and
all kinds of bounds,
before we go on.
