Let's look at normal quantile quantile
plots, which can help answer the question:
is a set of observations approximately
normally distributed?
If a set of observations is approximately normally distributed,
a normal quantile quantile plot of the
observations will result in an approximately straight line.
We sometimes call a quantile quantile
plot a QQ plot for short.
Here's an example of a normal quantile
quantile plot
for a sample of size 9 from a normally distributed population.
These points fall in a pretty straight line.
We sometimes draw a line in for a little perspective.
More on that line a little bit later.
Assessing normality is important,
as many statistical inference procedures
assume we are sampling from a normally distributed population.
And so we are going to want to
investigate that assumption,
to see if it's reasonable.
Now let's work through some of the details in
how a normal quantile quantile plot is constructed.
Does the following sample come from a
normally distributed population?
First we're going to order the data from
smallest to largest.
And then we're going to plot these values
against the appropriate quantiles from
the standard normal distribution.
Let's look at that in a little more detail.
Here we have the standard normal
probability density function, truncated at -3 and 3.
And down here we have our 9 sample values.
And what we're going to do
is we're going to place 9 values down
here for a standard normal random variable
such that the distribution is split up
into n+1=10 equal areas.
So we're going to split this distribution up into 10 equal areas,
because we're placing 9 values down here,
and so the distribution is going to be split up into 10 areas.
And that looks like this.  
It might be a little hard to tell, but this value says 0.1.
All of these areas are 0.1,
because there were 10 areas, and the area
under the entire curve is 1 of course.
And now we're going to find the values of a
standard normal random variable that make that happen.
If we go to computer software,
or possibly a standard normal table,
we would find that this first value, rounded
to two decimal places, is -1.28.
Now let's do that for a few more of them.
I'd put all 9 of those values in, but
that would look a little messy here,
so I just put in 5 of them.
Now, if we were sampling 9 values from the
standard normal distribution,
a little loosely speaking we would expect the
smallest value to be around -1.28.
And this one's not written there, but
it's -0.84 to two decimal places.
We would expect the second smallest
value to be around -0.84.
And the third smallest value to be about
-0.52 and so on and so forth up
to the largest value around 1.28.
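
The nine values described above can be reproduced with a few lines of code. This is a minimal sketch in Python (my choice of language; it is not the software used in the video), using `statistics.NormalDist.inv_cdf` from the standard library to find the value with a given area to its left. The variable names are just for illustration.

```python
from statistics import NormalDist

n = 9  # sample size used in the example
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Cumulative areas 1/(n+1), 2/(n+1), ..., n/(n+1), i.e. 0.1, 0.2, ..., 0.9
areas = [i / (n + 1) for i in range(1, n + 1)]

# inv_cdf returns the value that has the given area to its left
theoretical_quantiles = [std_normal.inv_cdf(p) for p in areas]

for p, z in zip(areas, theoretical_quantiles):
    print(f"area to the left = {p:.1f}  ->  z = {z:+.2f}")
```

The first three values come out as -1.28, -0.84 and -0.52 to two decimal places, and the last as 1.28, matching the numbers quoted above.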
So what we're going to do is,
we're going to
plot the smallest value in our sample of size 9 against
what we would expect to get as the smallest value
in a sample of size 9 from the standard
normal distribution.
And the second smallest value against
what we would expect to get
as the second smallest value in a sample
of the same size from the standard normal distribution.
All the way up to plotting our largest
value in the sample
against what we would expect to get for
our largest value from
the standard normal distribution at the
same sample size.
And if our data is approximately
normally distributed,
that should result in an approximately straight line.
We could have picked a different normal distribution,
we didn't have to pick the standard normal distribution.
Some things would change in terms of the scaling,
but overall the general idea would be the same
if we were to use any normal distribution,
but we typically simply use the standard normal distribution.
And if we were to plot that out, 
here's what we'd get.
I'm labelling the y-axis here as the
sample quantiles
and those are simply our observed values
in the sample.
And on the x-axis, I've got the theoretical quantiles.
And loosely speaking those are the
values we'd expect to get
in a sample of this size from the
standard normal distribution.
And the resulting plot is a pretty darn straight line.
So this would say that that sample data
is approximately normally distributed.
And just in case it's not clear what
these points represent,
this value here is the smallest value
observed in our sample, 3.89,
and it was plotted against
the corresponding value from the
standard normal curve of -1.28.
And similarly for the rest of them.
What did we just do here? 
We plotted the ith ordered value,
sometimes called the ith order statistic,
against the i/(n+1)th quantile
of the standard normal distribution.
So for example, when i was 1, we took the smallest value in our sample
and we plotted that against the 1/(9+1) = 0.1 quantile
of the standard normal distribution,
since we had a sample of size 9.
Or in other words,
we have 0.1 to the left, this value here
that yields 0.1 to the left
is the 0.1th quantile of the standard normal curve
also called the 10th percentile.
If we wanted to call it that.
And we used that method for all of our
ordered data values.
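
The whole construction, ordering the data and pairing the ith ordered value with the i/(n+1) quantile of the standard normal distribution, can be sketched as follows. This is an illustrative Python sketch, not the method of any particular software package. The function name `normal_qq_points` is hypothetical, and the sample values are made up for illustration; they are not the sample from the video.

```python
from statistics import NormalDist

def normal_qq_points(sample):
    """Return (theoretical quantile, ordered value) pairs using i/(n+1)."""
    n = len(sample)
    ordered = sorted(sample)  # order statistics, smallest to largest
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf(i / (n + 1)), value)
        for i, value in enumerate(ordered, start=1)
    ]

# Hypothetical data, made up for illustration only
sample = [5.1, 4.2, 6.3, 4.9, 5.6, 3.8, 5.3, 4.6, 5.9]

points = normal_qq_points(sample)
for z, y in points:
    print(f"theoretical {z:+.2f}   sample {y:.1f}")
```

Each printed pair is one point on the plot: the theoretical quantile goes on the x-axis and the observed value on the y-axis.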
Instead of using i/(n+1), there's a variety of other possibilities,
such as (i-1/2)/n, or more generally
(i-a)/(n+1-2a),
where a is some number between 0 and 1/2.
If we think about it, (i-1/2)/n is the
same as the general method
with a=1/2,
and what we had up here, i/(n+1), is the same as
the general formulation with a=0.
And different values of a have been
proposed as well.
But let's not get too bogged down in that.
What we are simply trying to do is come
up with a method that approximates
what we would expect to get if we were
sampling from the standard normal distribution.
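
To see how much the choice of a matters, here is a small Python sketch of the general formula (i-a)/(n+1-2a). The function name `plotting_positions` is my own, and the check that a lies between 0 and 1/2 simply mirrors the range stated above.

```python
from statistics import NormalDist

def plotting_positions(n, a=0.0):
    """Cumulative probabilities (i - a) / (n + 1 - 2a) for i = 1..n."""
    if not 0.0 <= a <= 0.5:
        raise ValueError("a is assumed to lie between 0 and 1/2")
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

n = 9
pos_a0 = plotting_positions(n, a=0.0)    # reduces to i/(n+1)
pos_half = plotting_positions(n, a=0.5)  # reduces to (i - 1/2)/n

std_normal = NormalDist()
for p0, p_half in zip(pos_a0, pos_half):
    print(f"a=0: p={p0:.3f}, z={std_normal.inv_cdf(p0):+.2f}    "
          f"a=1/2: p={p_half:.3f}, z={std_normal.inv_cdf(p_half):+.2f}")
```

With a=1/2 the smallest and largest theoretical quantiles sit a little farther out in the tails than with a=0, while the middle of the plot barely changes.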
Now let's look at a few things we might
see in practice.
Usually in practice of course we use
statistical software to actually do the plotting.
And here I'm going to use the statistical software R.
R's method differs slightly from what I
described earlier,
but overall the idea is the same.
In this first plot I'm sampling 50
observations from a normally distributed population
and we get a normal quantile quantile plot here
that results in a pretty darn straight line.
This line is drawn in for a little
perspective.
R draws in a line that passes through the
first and third quartiles.
There are other lines we could draw in, but that's R's method.
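
For the curious, here is a rough Python imitation of those two pieces. It is not R's code. It assumes, based on my reading of R's documentation for `ppoints` and `qqline`, that the default plotting positions use a=3/8 when n is at most 10 and a=1/2 otherwise, and that the reference line passes through the points given by the first and third quartiles of the sample and of the standard normal distribution. The function names, the seed, and the population mean and standard deviation are all arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, quantiles

def r_style_positions(n):
    # Assumption: mirrors the default described in R's ppoints documentation
    a = 3 / 8 if n <= 10 else 1 / 2
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

def quartile_line(sample):
    """Slope and intercept of the line through the two quartile points."""
    std_normal = NormalDist()
    # method="inclusive" interpolates linearly between order statistics
    q1, _, q3 = quantiles(sample, n=4, method="inclusive")
    z1, z3 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return slope, intercept

random.seed(1)  # arbitrary seed so the run is repeatable
sample = [random.gauss(10, 2) for _ in range(50)]  # hypothetical normal population

std_normal = NormalDist()
theoretical = [std_normal.inv_cdf(p) for p in r_style_positions(len(sample))]
ordered = sorted(sample)
slope, intercept = quartile_line(sample)
print(f"line: sample = {intercept:.2f} + {slope:.2f} * theoretical")
```

When the data really are normal, the intercept of this line is a rough estimate of the population mean and the slope a rough estimate of the standard deviation, so here they should land somewhere near 10 and 2.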
So overall our points form a pretty darn
straight line,
which shouldn't be too surprising
because we are indeed sampling from a
normally distributed population.
But let's see a couple of instances
where we're not sampling from a normally distributed population.
Here I'm drawing a random sample of 50
observations from a uniform distribution.
And if I were to superimpose a normal curve over this, say,
we'd see that the uniform distribution has truncated tails,
so we're not going to get extreme values in this distribution.
How that manifests itself in the normal
quantile quantile plot
is that these values, corresponding to
the large values in our sample,
are not as far out in the right tail
as would be expected if we were sampling
from a normally distributed population.
And the smallest values in our sample
are not as far out in the left tail
as would be expected if we were sampling
from a normally distributed population.
So we see this curvature here.
One thing to keep in mind is that there
are different plotting methods.
And sometimes people switch the axes here,
and have the theoretical quantiles on the y-axis,
and the sample quantiles on the x-axis.
In some ways it doesn't really matter, 
we're still going to see the curvature,
but we just have to be careful in our
interpretation of what that means.
Here I'm drawing 50 observations from
a distribution that is strongly right skewed.
And what we see in our normal quantile quantile plot,
is that the largest values in our sample
are much larger than would be expected
if we were sampling from a normally distributed population,
because we've got this heavy right tail here.
But in the left tail, the dots creep
above the line here,
indicating that the smallest values in
our sample
are not as extreme as would be expected
under normality.
And we can see that up here in our pdf,
as it ends pretty quickly here at 0.
Here I'm drawing 50 observations from
a heavy-tailed distribution,
a distribution that has greater area in
the tails than the standard normal distribution,
so we're going to expect to
get more extreme values.
And we see that in our normal quantile quantile plot.
These values in the right tail of our distribution,
the largest values, are even larger than would be expected under normality.
And these values in the left of the
distribution, our smallest sample values,
are even more extreme than would be
expected under normality as well.
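
The three patterns just described (truncated tails, right skew, heavy tails) can be checked numerically without any randomness. The sketch below is an idealized illustration: instead of drawing random samples, it plugs the plotting positions straight into each distribution's quantile function, draws a line through the quartile points, and reports whether the two end points fall above (+) or below (-) that line. The three distributions are my own stand-ins (uniform on 0 to 1, exponential with rate 1, and Laplace); the video does not say which skewed or heavy-tailed distributions were used.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Quantile functions (inverse cdfs) for three non-normal shapes
def uniform_q(p):       # uniform on (0, 1): truncated tails
    return p

def exponential_q(p):   # exponential with rate 1: strongly right skewed
    return -math.log(1 - p)

def laplace_q(p):       # Laplace (double exponential): heavy tails
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))

def end_residuals(quantile_fn, n=50):
    """Vertical distance from the quartile line at the two end points."""
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    z = [STD_NORMAL.inv_cdf(p) for p in probs]
    y = [quantile_fn(p) for p in probs]
    z1, z3 = STD_NORMAL.inv_cdf(0.25), STD_NORMAL.inv_cdf(0.75)
    y1, y3 = quantile_fn(0.25), quantile_fn(0.75)
    slope = (y3 - y1) / (z3 - z1)
    intercept = y1 - slope * z1
    left = y[0] - (intercept + slope * z[0])
    right = y[-1] - (intercept + slope * z[-1])
    return left, right

shapes = [("uniform", uniform_q),
          ("right skewed", exponential_q),
          ("heavy tailed", laplace_q)]
for name, fn in shapes:
    left, right = end_residuals(fn)
    print(f"{name:>12}: left end {left:+.2f}, right end {right:+.2f}")
```

The signs line up with the plots: the uniform case is above the line on the left and below it on the right, the right-skewed case is above the line at both ends, and the heavy-tailed case is below the line on the left and above it on the right.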
Properly interpreting normal quantile quantile plots 
can be a little tricky without some experience.
But fortunately we can get some
experience pretty quickly using simulation.
What I've plotted out here is the normal
quantile quantile plot
for a sample of size 50 from the
standard normal distribution.
And this results in a pretty darn
straight line.
But sometimes, of course, natural variability will lead to us seeing some curvature,
or a major outlier, or something along those lines.
So I'm going to replicate this, or carry
this out 20 times
and see what these different normal
quantile quantile plots look like
when we are in fact sampling from a
normally distributed population.
When the sample size is a little bit
smaller, the variability can be a bit more extreme.
So let's see what it looks like for a sample of size 10.
This is the normal quantile quantile plot for a sample of size 10 
from a normally distributed population.
Let's do this 20 times and see what
the variability is like.
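
If you'd like to try this kind of simulation yourself without looking at 20 plots, one rough option is to summarize each plot by a single number. The Python sketch below draws 20 samples from the standard normal distribution and, for each, computes the correlation between the ordered sample values and the theoretical quantiles, which is close to 1 when the points fall near a straight line. The correlation summary is my own stand-in for eyeballing the plots and is not part of the method described here; the plotting positions (i-1/2)/n, the function names, and the seed are also arbitrary choices.

```python
import math
import random
from statistics import NormalDist, mean

def qq_correlation(sample):
    """Correlation between ordered values and standard normal quantiles."""
    n = len(sample)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    y = sorted(sample)
    z_bar, y_bar = mean(z), mean(y)
    cov = sum((a - z_bar) * (b - y_bar) for a, b in zip(z, y))
    var_z = sum((a - z_bar) ** 2 for a in z)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / math.sqrt(var_z * var_y)

def simulate(n, replications=20, seed=0):
    """Correlations for repeated standard normal samples of size n."""
    rng = random.Random(seed)  # arbitrary seed so the run is repeatable
    return [
        qq_correlation([rng.gauss(0, 1) for _ in range(n)])
        for _ in range(replications)
    ]

for n in (50, 10):
    r = simulate(n)
    print(f"n={n}: lowest correlation {min(r):.3f}, highest {max(r):.3f}")
```

Changing the seed gives a different set of 20 samples, which is a quick way to get a feel for how much these plots can wander even when the population really is normal.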
So that's our introduction to normal
quantile quantile plots.
And we'll be using this method to investigate the normality assumption
in a variety of different statistical
inference procedures.
