welcome back to statistics one
in today's lecture we'll be talking
about descriptive statistics
now as i said in the first lecture
most introductory statistics textbooks
whether they're in psychology or
economics or sociology or political
science
split the field of statistics into
descriptive
and inferential statistics
that's a common technique because
descriptive statistics one are a lot
easier
and two you need to use a lot of the
concepts that you learn in descriptive
statistics to do inferential statistics
so today in just one lecture i'll
provide you with
an overview of the essentials of
descriptive statistics
and then we'll be ready to move on
to the more difficult task of
inferential statistics
so to give you a layout of today's
lecture
it's organized into three segments
in the first segment we're going to take
a broad view sort of look at
all the data
in a very broad
range so
this is common when we collect data in
research
once an experiment is done
we want to look at all of our data all
of our variables all of our cases on all
of our variables
just so we know what we have
in the second segment we're gonna do the
opposite
calculate summary statistics so when
i
gather my research together
and i want to go to a conference to
present my work or want to write it up for
publication i need just a few talking
points and that's when i'll move from
the broad view of histograms and
distributions
to this more
focused view
of summary statistics and then in
the third segment
i'll focus on some of the key tools that
you'll need
as
we move forward to inferential
statistics
so in the first segment
we're gonna talk about histograms and
sample distributions
and what i mean by that that's one
of the
goals of this segment is to understand
what i mean
when i say sample distribution
the sample distribution will be
displayed in a histogram you'll see that
momentarily
you'll see the normal distribution or
bell curve repeatedly throughout this
course
but that doesn't mean that all
variables are distributed normally
there are lots of variables in nature
and in psychology
that aren't normally distributed
so we need to know
how to spot non normal distributions
and
for that i'll talk about the idea of
skew and kurtosis
so first
you might think why are we starting with
histograms
just in your regular
just in your everyday life
browsing the internet or reading
a newspaper
or watching television you're
exposed to lots of fancy graphics
that summarize data
much fancier than histograms
but histograms are the place to start
because they show the entire sample
distribution for one variable
why do we want to do that
we want to overcome
and i'd like for you to work on
this as you go through the course
to overcome the tendency and it's a
natural tendency
to rely upon summary or aggregate
information
so as an analogy
we often stereotype people based on
their group membership
but we all know we shouldn't do that
if we really want to get to know a
person right we want to look at all the
people within a group
to get to know each individual person
so the stereotype is sometimes helpful
'cause it's an aggregate
it is a sound basis for prediction about
people
but we don't want to rely upon that
initially
so at first we'll take this broad view and
look at everybody in the distribution
so the first example i'll start with
is
this very simple normal distribution
measuring
a sample of one hundred people's body
temperature
what i'm plotting is a histogram
so
a histogram
will have the actual score so in this
case it's body temperature in fahrenheit
so i'm already revealing my
american
ethnocentrism there sorry
so
the average is about a hundred fahrenheit
that's about what [inaudible] in celsius
then
what i'm plotting is
body temperature on the x axis
and then just frequency on the y axis
of how many people in the sample
have a body temperature
that's in a particular interval so
we're just looking at frequency on
the y axis
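
The histogram described above can be sketched in a few lines of Python. This is only an illustration: the data are simulated, the mean of 100 echoes the lecture, and the spread of 0.7, the half-degree interval width, and the helper name `histogram_counts` are all assumptions rather than anything taken from the slides.

```python
import random

def histogram_counts(scores, low, high, width):
    """Count how many scores fall in each interval [edge, edge + width)."""
    n_bins = int(round((high - low) / width))
    counts = [0] * n_bins
    for s in scores:
        if low <= s < high:
            counts[int((s - low) // width)] += 1
    return counts

# Hypothetical data: 100 simulated body temperatures in Fahrenheit.
# The center of 100 follows the lecture; the spread of 0.7 is assumed.
rng = random.Random(1)
temps = [rng.gauss(100.0, 0.7) for _ in range(100)]

low, high, width = 96.0, 104.0, 0.5
counts = histogram_counts(temps, low, high, width)

# Text histogram: the interval (x axis) on the left, frequency (y axis) as bars.
for i, c in enumerate(counts):
    left = low + i * width
    print(f"{left:5.1f}-{left + width:5.1f} | {'#' * c} ({c})")
```

Each printed row is one interval of body temperature, and the length of the bar is the number of people in the sample whose temperature falls in that interval.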
what you see here
is just a very normal distribution it has
sort of a high peak
relative to the standard normal
distribution
the other thing you'll notice about it
if you
remember from
you know high school biology
and again forgive me if you're
used to thinking of this in
celsius
but most of us think of ninety eight
point six as the
average normal body temperature
if you look at this
histogram we're getting an average
of about
a hundred
and the reason for this is
this sample distribution
was taken
by going around to a hundred
people healthy people
and using this new infrared wand
thermometer
so there are different ways that you can
measure body temperature and the wand
tends to overestimate
body temperature
and that bias in measurement
is important we'll come back to that
when we talk about measurement issues
here's a better known histogram that you
may be familiar with
the bell curve
representing individual differences in
intelligence
this was the name of the book
published by herrnstein and murray that
caused a lot of controversy
in psychology and social sciences in
general
but again it's a histogram so
here's the frequency
and what we're plotting is intelligence
scores
and you get a
normal distribution so most people
are
around the average
but some people are exceptionally bright
they're up here
and some people not so much they're down
here
so that's the idea of a normal
distribution
and that's a very well known one
and a controversial one
again the reason we're starting with
histograms is they often reveal
important information that's missed
or glossed over if you just provide
summary statistics
and here's another controversial example
that
is perfect to demonstrate this
point
the notion that there might be
sex differences in spatial reasoning
ability in adults
so this has
been observed in some studies in the
intelligence and neuropsychology
literatures
it depends on the type of spatial
reasoning task you administer
but sometimes you do see
if you
rely on the summary statistics the mean
say
the mean for the men
might be a little higher than the mean
for the women
but the variance within each group
the spread within each group the
differences within each gender
is far greater than that little delta in
the mean differences and you'll see that
in the histograms
so let's look at histograms
studies that demonstrate the sex
difference
typically
we can call it an independent variable
sex and then the dependent
variable
is score on a spatial reasoning task
but remember back to lecture one we
talked about randomized experiments
this is not a true independent variable
so that's why i wrote quasi at the
bottom right we can't randomly assign
people
to be male or female
so we have to be careful
characterizing this as an experiment but
just for ease of demonstration
there's our experimental setup
and i just simulated data in r that's
really easy to do i'll show you how to do
that in the intro to r
lecture
and
i just
assumed that we would have about fifty
females and fifty males in the sample
as it turns out
the mean for the males was a
little bit higher than the mean for
females and as i said this has been
observed
in some studies depending on
what type of spatial reasoning task
is used
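
A simulation along these lines can be sketched as follows. The lecture's data were simulated in R; this sketch uses Python instead, and the group means and standard deviations below are assumptions chosen only for illustration, since the lecture does not state the values it used.

```python
import random
import statistics

rng = random.Random(42)

# Assumed parameters for illustration only; the lecture does not give them.
females = [rng.gauss(70, 10) for _ in range(50)]
males = [rng.gauss(73, 12) for _ in range(50)]

mean_f, mean_m = statistics.mean(females), statistics.mean(males)
sd_f, sd_m = statistics.stdev(females), statistics.stdev(males)

print(f"females: mean={mean_f:.1f} sd={sd_f:.1f}")
print(f"males:   mean={mean_m:.1f} sd={sd_m:.1f}")
print(f"difference in means: {mean_m - mean_f:.1f}")

# Share of males whose scores fall inside the observed female range:
# a crude way to see how much the two distributions overlap.
lo_f, hi_f = min(females), max(females)
overlap = sum(lo_f <= s <= hi_f for s in males) / len(males)
print(f"males inside the female range: {overlap:.0%}")
```

Comparing the printed difference in means with the two standard deviations gives a rough sense of how small a mean difference can be relative to the spread within each group.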
but let's look at the histograms
first let me show you
females in red
and what you see is sort of a normal
distribution
the peak
is a little
toward the positive end of the distribution
but it's pretty normal
and it has a pretty high peak
if i wanted to smooth it
but one thing to note in
this graph is the frequency
there's fourteen women
who have a score in the range sort of
like seventy to eighty
actually if we add them
it's
over twenty
that are in this range
let's compare that to the males
so here's the histogram for males
what you see is if you look at
the y axis frequency
the highest number here is
let's go back and look at the women
fourteen was the peak in that histogram
so it's lower here and then the male
distribution
is
also a normal distribution
it's sort of
it doesn't have as high of a peak
the other thing to notice and this will
be clear if we put them side by side
is
there's a difference up here
at the end of the distribution
and this is actually what's observed
in a lot of studies
that show
the sex difference
but before i get to that
again notice the massive amount of
overlap
of these two distributions
so if we look at the overlap
from here to here
that's the overlap
so the two distributions have much more
in common
than separates them
and what really separates them
is
some males at the top top top end of
the distribution
whereas in reality
we have a lot more females
in the middle part of the distribution
around the
average
and again that is common
what's exciting is that new research
very
recent research in the last few years
has demonstrated that these types of
spatial reasoning skills
can be trained through practice
in
video games and things like tetris
and even in certain sports
like martial arts or
wrestling where you have to sort of plan
and execute
in three-dimensional space
and what's even more exciting is that
those training effects appear
to be bigger
in women than men
so i think what we're seeing is just
some of these people
some of the women in the middle of the
distribution
moving out
to the end of the distribution
through practice
okay one more example just to
drive home the point that
looking at histograms can often show you
more than just looking at summary
statistics
this example
is a little bit older
in the nineteen sixties when
contraception was first
made available legally in the united
states
the drug manufacturers were concerned
that certain
drugs
were causing an increase in blood
pressure
so they wanted to do a study to test
whether
women who were taking the pill as it was
called
had
increased blood pressure
relative to women
who weren't taking the pill
again the problem here is it was
difficult to do
a randomized
experiment
especially in the sixties this was a
very controversial topic in the united
states
so it wasn't very easy to just
take people into a lab and randomly assign
them
to contraception or not
that probably still wouldn't fly
so
they just had to rely on
women who were either taking
the pill or not taking the pill so again
this is not a true independent variable
that's again why i have quasi down
there
and then we can measure blood pressure
and again for ease of illustration i'm
looking at
just systolic
blood pressure
again i just made up data in r a
hundred users a hundred non users and if
we just look at the means
it looks like there's a little bit of
an effect of the drug so
one twenty is about normal for
systolic and the users are up at
about one twenty four
whereas the non users are right at one
twenty
so it does look like there's a little
bump
but let's look at the
the histograms to see where that bump is
coming from
so the users are in yellow so these are
the women who were on the pill
and what you see is an average around
one twenty a little higher because of
this peak right here
but you get sort of a normal
distribution
let's look at the non users
and then
put them together
and here you sort of see
a different picture than the spatial
reasoning example right
it looks like the whole
histogram the whole distribution
of users is just sort of shifted
about four or five
points
so it's as if i just
grabbed that
and then moved it over
the whole distribution shifted
which again this isn't a true
independent variable so we can't really
make a causal statement
but the fact that the effect it looks
like
affected the entire distribution in the
same exact way
is suggestive
that indeed
it was causal and
follow-up research actually supported
that conclusion
another thing you can see in these
histograms and
if you've taught statistics a long time
like i have or if you do a lot of
research and stare at these things till
you're dead
you probably saw it already
is it looks like there are almost
two distributions in there right so
in the users i could sort of see like
one there and another one there
and histograms will show you that
and indeed
these data were simulated based on a
study
where they had
smokers and non-smokers in each group
because they wanted to see if the drug
effect interacted with that
and anyway
so what you're seeing in this second
distribution
are the smokers smokers generally have
higher blood pressure than
non-smokers
so we would see that
by looking at
the sample distributions in histograms
we would lose that if we just looked at
summary statistics
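
The point about two distributions hiding inside one group can be illustrated with a small sketch. The numbers here are assumptions, not the lecture's: 50 non-smokers and 50 smokers among the users, with smokers centered about 10 points higher in systolic blood pressure.

```python
import random
import statistics

rng = random.Random(7)

# Assumed values for illustration: two subgroups inside the user group,
# with smokers centered about 10 points higher than non-smokers.
nonsmokers = [rng.gauss(119, 3) for _ in range(50)]
smokers = [rng.gauss(129, 3) for _ in range(50)]
users = nonsmokers + smokers

overall_mean = statistics.mean(users)
mean_ns = statistics.mean(nonsmokers)
mean_s = statistics.mean(smokers)
print(f"overall mean: {overall_mean:.1f}")
print(f"non-smokers:  {mean_ns:.1f}")
print(f"smokers:      {mean_s:.1f}")

# Text histogram in 2-point intervals; with subgroups this far apart,
# two clusters of bars should be visible.
width = 2
bins = {}
for bp in users:
    left = int(bp // width) * width
    bins[left] = bins.get(left, 0) + 1
for left in sorted(bins):
    print(f"{left:3d}-{left + width:3d} | {'#' * bins[left]}")
```

The single overall mean sits between the two subgroup means and says nothing about the two clusters, while the histogram makes them easy to spot.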
so those were all just assuming
that we had
normal distributions
let's look at what some non normal
distributions look like
and there are all sorts of possibilities
and i'll show you
and to do that
we're gonna go wine tasting
i enjoy wine tasting
and i also like that this picture has
wine glasses that have sort of
different distributional shapes
so it reminds me of bell curves and
non normal distributions
so it's a good image to take
and this is
actually done there's a science of
wine
tasting and experts so there are
actual studies like this
so let's assume
that we have thirty
wine experts
rate the overall quality of four
different
red wines
and they've rated them on a scale of one
to ten higher scores
indicate higher quality
i picked four red wines that have really
funny names but they're really
they're true
red wines so these are actual red wines
you can go find them at your local
wine store
so the four that i picked are woop woop
red truck hob nob
and four play
they're all fun to say
they're all pretty good too i'm
not a big fan of red truck but woop woop
i think that
[inaudible]
i'm plotting
four histograms all on one slide
here so forgive me if it's a little hard
to see
you can zoom in
but i'm plotting them all on one
slide
so that we can compare what
these
different distributions look like
so what you see for woop woop is
just sort of a rectangular
form
for the distribution
so the ratings were all over the place
some of the wine raters loved it some
of them hated it some of them
thought it was average
they were all over the place
that's a uniform distribution
red truck is showing a nice normal
distribution
hob nob is showing
positive skew
and four play is showing
a negative skew
and the way to remember this skew idea
is
the skew is where there are few
in the distribution
so for hob nob
there were just a few raters that
really liked it
so that's why it was skewed it wasn't
like red truck where it's normal and
symmetrical
there were just a few that liked it
so that's positively skewed
four play is the opposite there were just
a couple that really didn't like it
so it's negatively
skewed
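
The "skew is where the few are" rule can be checked numerically. This sketch uses made-up ratings from 30 hypothetical raters, not the lecture's wine data, and computes skewness as the average cubed z-score, which is one common (population) definition.

```python
import statistics

def skewness(xs):
    """Population skewness: the average cubed z-score."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

# Hypothetical ratings from 30 raters on a 1-10 scale (not the lecture's data).
symmetric = [3] * 3 + [4] * 6 + [5] * 12 + [6] * 6 + [7] * 3
# Most raters give low scores, a few give very high ones.
few_high = [2] * 10 + [3] * 12 + [4] * 4 + [8] * 2 + [9] * 2
# Mirror image: most raters give high scores, a few give very low ones.
few_low = [11 - r for r in few_high]

for name, ratings in [("symmetric", symmetric),
                      ("few_high", few_high),
                      ("few_low", few_low)]:
    print(f"{name:9s} skew = {skewness(ratings):+.2f}")
```

The sign follows the tail: a few unusually high ratings give a positive value, a few unusually low ratings give a negative value, and the symmetric set comes out at essentially zero.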
i can show
those distributions instead of
me awkwardly drawing with the stylus i
promise it gets better
you can do this in r just plot
what's called a density plot it's
basically a smoothed
histogram
and
this slide shows the difference among the
four distributions
nicely so
it sort of tries to smooth that
rectangle right
but
there's the rectangle
then we've got the nice normal
we've got a little positive and a little
negative skew
so that's taking the broad view
and looking at
sample distributions the entire
distribution via
histograms
you could then summarize that
information and we're gonna do that
basically in four ways we'll look at
measures of central tendency
we'll look at measures of variability
then skew and kurtosis which refers to the
peak
or
like the uniform distribution
is the opposite of a high peak
and if you're coming from
a strong math background particularly if
you
know your calculus inside out
you'll recognize these as the four moments
of the mean
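
The four summaries can be computed together in a short sketch. The ratings below are hypothetical, the helper names are made up for this example, and kurtosis is reported as excess kurtosis (the fourth standardized moment minus 3, so a normal distribution scores 0), which is one common convention and not necessarily the one used later in the course.

```python
import statistics

def standardized_moment(xs, k):
    """Average of z-scores raised to the k-th power (population form)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** k for x in xs) / len(xs)

def four_summaries(xs):
    """Central tendency, variability, skew, and kurtosis for one variable."""
    return {
        "mean": statistics.fmean(xs),
        "variance": statistics.pvariance(xs),
        "skew": standardized_moment(xs, 3),
        # Excess kurtosis: 0 for a normal distribution.
        "excess_kurtosis": standardized_moment(xs, 4) - 3,
    }

# Hypothetical ratings on a 1-10 scale, 30 raters each.
flat = [r for r in range(1, 11) for _ in range(3)]   # every rating equally common
peaked = [1] + [3] * 2 + [5] * 24 + [7] * 2 + [9]    # most raters agree on 5

for name, ratings in [("flat", flat), ("peaked", peaked)]:
    summary = four_summaries(ratings)
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

The flat, uniform-looking set gets a negative excess kurtosis and the sharply peaked set gets a positive one, which matches the idea that a uniform distribution is the opposite of a high peak.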
so to review this segment
on histograms
the important concepts to take away
are
just one what i mean when i say a
distribution
i mean a distribution of scores in the
sample on some variable
we're gonna use that normal distribution
a lot we're gonna assume that some of our
variables are normally distributed we'll
question that assumption but
we'll assume it
and we'll learn how to spot
non normal distributions by looking at
skew and kurtosis
