In this video we want to identify outliers
in a set of data.
If you are not sure what an outlier is, here
is the definition.
An outlier is an extremely high or extremely
low value in the data set.
Now, in addition to being extremely high or
extremely low, a value has to satisfy the
following criteria to count as an outlier.
It must be greater than
Q3 + 1.5 × (Interquartile Range)
or it must be lower than
Q1 - 1.5 × (Interquartile Range).
This is making sure that it really is an extremely
high value or extremely low value.
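As a rough sketch, the rule just stated can be written as a small Python function. The name `is_outlier` and its parameters are made up for illustration; the only thing taken from the video is the 1.5 times the Interquartile Range rule.

```python
def is_outlier(value, q1, q3):
    """Return True if value lies beyond 1.5 * IQR above Q3 or below Q1."""
    iqr = q3 - q1  # Interquartile Range
    return value > q3 + 1.5 * iqr or value < q1 - 1.5 * iqr
```

The function assumes Q1 and Q3 have already been worked out, which is the part that takes the most effort by hand.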
You can see, though, that we need to compute
a few different things first, namely Q1, Q3,
and the Interquartile Range, if we are going to
properly identify one of these outliers.
So let's look at some data and see how this
works.
In my data I have a chart of how many phone
calls were received on any given day.
So I have 10 phone calls on the first day,
12 phone calls on the second day, and so on
and so forth.
If I'm going to compute things like Q1 and
Q3 and the Interquartile Range, it's probably
a good idea to take all of this data and write
it out in order.
10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 17,
22
Alright, so you can see that when I list out
my data like this, 22 does look like a pretty
high value and 10 looks like a fairly low
value.
To check whether one of these, or maybe both
of them, really is an outlier, let's go ahead
and start breaking down our data to find
Q1 and Q3.
So I want to find the halfway point of my
data. I have twelve data points, so the
halfway point falls after the sixth one: one,
two, three, four, five, six.
Alright, so I need the median of the first
half and the median of the second half.
Let's see, the halfway point of the first
half, let's call this Q1.
And it looks like that is equal to 11.
Remember, the first half has six values, so
you find that by adding the two middle values,
11 plus 11, and dividing by 2.
The median of the second half, let's call
this Q3, would be 14.5. That comes from adding
14 plus 15 and dividing by 2.
Alright, now to find our Interquartile Range,
we subtract Q1 from Q3, so 14.5 minus 11.
This would give us 3.5.
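Here is a minimal Python sketch of the same steps, assuming the "median of each half" method used in the video. The helper name `quartiles` is hypothetical, and built-in quantile functions in statistics libraries may follow other conventions and return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]        # first half of the sorted data
    upper_half = data[(n + 1) // 2 :]  # second half; skips the middle value when n is odd
    return median(lower_half), median(upper_half)

calls = [10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 17, 22]
q1, q3 = quartiles(calls)
iqr = q3 - q1
print(q1, q3, iqr)  # 11.0 14.5 3.5
```

Leaving the middle value out of both halves when the count is odd is one common convention, not the only one; the phone call data has twelve points, so the question does not come up here.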
Alright, we have all of the information we
need, so now we can work out the cutoff values
that tell us which points are outliers.
So for a value to count as extremely high, it
must be larger than Q3, which is 14.5, plus
1.5 times the interquartile range, 3.5.
And for an extremely low value, I'd take
Q1, which is 11, and subtract 1.5 times the
interquartile range.
Let's see what these equal.
14.5 + 1.5(3.5) = 19.75
And 11 - 1.5(3.5) = 5.75
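The arithmetic for the two cutoffs can be checked with a few lines of Python. The variable names `upper_fence` and `lower_fence` are labels chosen here for illustration; the video does not name these values.

```python
q1 = 11      # median of the first half
q3 = 14.5    # median of the second half
iqr = q3 - q1                  # 3.5

upper_fence = q3 + 1.5 * iqr   # values above this are high outliers
lower_fence = q1 - 1.5 * iqr   # values below this are low outliers
print(upper_fence, lower_fence)  # 19.75 5.75
```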
Alright, so here is how this works. If I have
any data points that are larger than 19.75,
they are outliers.
If I have any data points that are smaller than
5.75, those are outliers too.
Well, looking at all of our data, we can see
that the 22 is definitely larger than 19.75,
so it's definitely an outlier.
On the other hand, I have nothing less than 5.75,
so I don't have any lower outliers. Even the 10
that looked fairly low is not an outlier.
So this entire set of data only has one outlier,
and it's just the 22, so it's definitely an
extreme value.
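Putting the whole procedure together, a hedged end-to-end sketch in Python might look like this. The function name `find_outliers` is hypothetical, and the quartiles are again taken as the medians of the two halves, which is an assumption carried over from the method used in the video.

```python
from statistics import median

def find_outliers(values):
    """Return the values lying outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    data = sorted(values)
    n = len(data)  # needs at least two values so that both halves are non-empty
    q1 = median(data[: n // 2])        # median of the lower half
    q3 = median(data[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

calls = [10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 17, 22]
print(find_outliers(calls))  # [22]
```

Run on the phone call data, this reports only the 22, matching the result worked out by hand.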
So remember that you have to find Q1, Q3, and
the Interquartile Range first, but this is how
you go about finding your outliers.
