Other measures we care
about are measures of center.
Median versus mean
is an age-old debate
on the internet,
going all the way back,
about whether the
median or the mean
is the better way to measure
the center of a dataset.
And as is often
the case with age-old
debates on the internet,
the answer is both.
Means are easy to calculate,
but very sensitive to outliers.
Means can also give you
a real sense of the skew
if your data is skewed.
On the other hand, the
median is the number such that
50% of values are below it
and 50% of values are above it.
The median is the
50th percentile value.
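
To make the outlier point concrete, here is a minimal sketch using Python's standard `statistics` module. The salary figures are invented purely for illustration; they are not from the lecture.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); made-up values for illustration only.
salaries = [40, 45, 50, 55, 60]

# The same data with one extreme value added.
salaries_with_outlier = salaries + [1000]

# Without the outlier, mean and median agree.
print("mean:", mean(salaries), "median:", median(salaries))

# With the outlier, the mean is dragged far upward,
# while the median barely moves.
print("mean:", mean(salaries_with_outlier),
      "median:", median(salaries_with_outlier))
```

One extreme value is enough to pull the mean well away from where most of the data sits, while the median only shifts to the midpoint of the two middle values.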
There's also something called
a trimmed mean, which I won't
talk about a great deal.
So medians tell you exactly
where your center is.
So if you really
want to know what
the exact middle of your data
is, such that 50% of people
are below it and 50% are
above it, median's great.
It's basically
immune to outliers.
It's very good that way.
But it's harder to
calculate in some ways,
and it doesn't tell you anything
about the skew of your data.
If you do have a
really long tail,
the mean will let you know
about that in particular.
It's the difference between
the median and the mean
that is often what we care about
because that's what tells us
about how our data is skewed.
We want both numbers.
One is not necessarily
better than the other.
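
As a rough sketch of how the gap between the two numbers hints at skew, the snippet below compares a symmetric sample with a long-tailed one. The helper name `mean_median_gap` and both datasets are assumptions made up for this example, and the sign of the gap is a rule of thumb rather than a guarantee.

```python
from statistics import mean, median

def mean_median_gap(values):
    """Return mean minus median (hypothetical helper for illustration)."""
    return mean(values) - median(values)

# Made-up data: evenly spread values with no long tail.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Made-up data: mostly small values plus a long tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20, 50]

# A gap near zero suggests little skew; a clearly positive gap
# suggests the mean is being pulled toward a right-hand tail.
print("symmetric gap:", mean_median_gap(symmetric))
print("right-skewed gap:", mean_median_gap(right_skewed))
```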
The last summary statistics
that we tend to care about
are measures of spread,
such as range and variance.
So variance and
standard deviation
are the most common measures of
the spread of a set of points.
They tell us very directly
how different the points
are from one another.
Range is the difference
between maximum and minimum,
which is definitely something
we might care about.
But range, variance,
and standard deviation
are all very sensitive
to outliers, so there
are other measures that we use.
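
A small sketch of these three measures, again with Python's `statistics` module. The data is invented, `value_range` is a hypothetical helper name, and the population versions (`pvariance`, `pstdev`) are used here as an assumption; the sample versions (`variance`, `stdev`) would be the choice when the data is a sample from a larger population.

```python
from statistics import pvariance, pstdev

def value_range(values):
    """Range: difference between the maximum and the minimum."""
    return max(values) - min(values)

# Made-up measurements, tightly clustered.
data = [10, 12, 11, 13, 12]

# The same measurements with a single extreme value added.
data_with_outlier = data + [100]

for label, values in [("clean", data), ("with outlier", data_with_outlier)]:
    print(label,
          "range:", value_range(values),
          "variance:", pvariance(values),
          "std dev:", pstdev(values))
```

All three numbers jump when the single extreme value is added, which is the sensitivity to outliers described above.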
So we use interquartile
range, which
is the difference between
the 75th percentile value
and the 25th percentile
value in a set of data.
And we'll sometimes use the
median absolute deviation,
which is the median of the
absolute deviations from the median.
And sometimes, we'll use the
average absolute deviation too,
which is the mean of the
absolute deviations from the mean.
So all of these show up as
we're trying to calculate
summary statistics.
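
The outlier-resistant measures above can be sketched as follows. The function names and data are assumptions for illustration, and the quartiles use the "inclusive" method of `statistics.quantiles`, which is only one of several common percentile conventions.

```python
from statistics import mean, median, quantiles

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def median_abs_dev(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median([abs(x - center) for x in values])

def mean_abs_dev(values):
    """Mean of the absolute deviations from the mean."""
    center = mean(values)
    return mean([abs(x - center) for x in values])

# Made-up data, and the same data with its largest value made extreme.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
data_with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]

for label, values in [("clean", data), ("with outlier", data_with_outlier)]:
    print(label,
          "IQR:", iqr(values),
          "median abs dev:", median_abs_dev(values),
          "mean abs dev:", mean_abs_dev(values))
```

In this example the interquartile range and the median absolute deviation do not change when the largest value becomes extreme, while the average absolute deviation, which is built on the mean, does.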
