[MUSIC PLAYING]
LECTURER: So welcome
back this week.
And so what we're
going to do this week
is we want to talk about
correlations and causations
and how you infer causations.
So I want to start
off with where I'd
like you to be by the end of the class.
Some people will
find this funny now.
I'd like everyone to find it
funny by the end of the class.
Here's an xkcd comic:
"I used to think
correlation implied causation.
Then I took Calling Bullshit.
Now I don't."
"Sounds like the class helped."
"Well, maybe."
So if that's not funny now,
it will be in 45 minutes,
if I do my job.
Let's start out with
some definitions.
I know a lot of
you have seen this.
But I want to just make sure
we're all on the same page.
When we talk
about correlation,
we say two
variables are correlated
when knowing the value of
one gives you information
about the value of the other.
So if x and y are correlated,
then knowing x tells me
something about what
y is likely to be.
We're going to focus, in
particular, in this lecture
on linear correlations.
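Since the lecture focuses on linear correlation, a minimal sketch of its usual measure, Pearson's correlation coefficient r, may help. This is an illustration in plain Python, not code from the lecture, and the helper name `pearson_r` is hypothetical. r runs from -1 (a perfect negative linear relationship) through 0 (no linear relationship) to +1 (a perfect positive one).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper, not from the lecture)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations from the means (n times the covariance).
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root sums of squared deviations for each variable.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return co_dev / (spread_x * spread_y)


# Made-up numbers, chosen only to show the two extremes.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # close to 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # close to -1.0
```

Knowing x tells you something about y exactly when r is far from 0; r says nothing about which variable, if either, is the cause.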
And we'll show you
some examples of those.
Causation: two states
are causally related
when one state influences
the other through some kind
of cause-and-effect process.
And of course,
there are thousands
of years of
philosophy getting at,
well, what does this
actually all mean?
But in common language,
I think we all
know roughly what it means for
one thing to cause another.
So let's start off and jump
right in with an example
of correlation.
So here's a
correlation we'll often
hear people talking
about, particularly
if they're trying to
justify limited government.
They'll say, well, one of
the problems with welfare
and other forms of
social assistance
is it takes away people's
incentives to work.
And so what happens is welfare
can actually cause poverty.
And so you might see
a graph like this.
Here are some numbers
I put together.
Each dot is one particular
county in Oregon.
And on the
horizontal axis, my x-axis,
we've got the fraction of
the population in that county
below the poverty line.
And on the y-axis, I've got the
fraction receiving food stamps.
And as you see, there's
this general trend.
These things kind of run
along with one another.
They increase with one another.
And so we say they have
a positive correlation.
And statistically, this is a
very significant correlation.
We don't need to go into the
details about what I mean
by statistical significance.
Do you have a
clicker for me there?
So we take a graph like this.
And so we've got
this correlation.
Now, what I want to do is
think about how to depict that.
How would we draw
that correlation out?
And so we can
diagram it like this.
And what we want to
do in this class today
is take you through
a way to diagram
correlational and
causal relationships
and think clearly about them
and not be fooled by information
that you read or create
yourself or whatever else.
So we'd say, well,
in this case, poverty
is correlated with
receiving food stamps.
And if it were me,
I would probably
say, well, there's a
causal relationship here.
If you're poor, you will
receive food stamps.
So poverty is causing people
to receive food stamps.
And in fact, that causal claim
is sort of implied by the graph
that I showed you.
Some of you will know this;
some won't.
When you take a graph like this,
when you do a scatterplot where
you're looking at
these correlations,
there's a convention
where you put
what we call the independent
variable on the x-axis,
or horizontal axis.
And this is the variable
that we think about
as not being determined
by the other variables.
So we think about
this one just being
this exogenous thing that's set
by nature, the world, whatever
it is.
And then we put the
so-called dependent variable
on the y-axis.
And so that's the one
that we're hypothesizing
is being caused by our
independent variable.
So when you actually
look at a scatterplot,
there is this implicit
suggestion of causality
already there, not that
anyone is necessarily
trying to mislead you.
Often, you'll do scatterplots
for things where you don't
know the causal relationship.
But just realize, when you're
looking at scatterplots,
we do have this convention
of how we display data,
that we do typically put the
cause on the x-axis and the effect on the y-axis.
So be aware of that.
So we could look at it this way:
by this convention, the independent
variable is causing the dependent variable.
We could reverse things and
take the exact same data
and just flip the axes, right?
So now I've got the fraction
receiving food stamps
as my independent variable
here on the horizontal axis
and the fraction
below the poverty
line as my dependent
variable on the y-axis.
Again, I get a positive
correlation, of course.
I've just flipped the axes.
Again, I get the
same strong p-value.
But now the story that
this graph is telling
is sort of implicitly a little
bit of a different story,
because it's sort of suggesting
that the fraction receiving
food stamps is influencing
the fraction below the poverty
line.
So that would be a different
way to view what's going on:
it's not that poverty causes
people to receive food stamps.
It's that food stamps
actually cause poverty.
What I'd like to do is ask
you guys, what do you think?
So get out Poll Everywhere.
And which way do you
think causality goes?
Do you think that poverty causes
people to receive food stamps?
Do you think food
stamps cause poverty?
Or do you think that this is
just a happenstance correlation
and there is no
cause there at all?
Let's see what people think.
OK.
So in this class, a lot of
people think that poor people
get food stamps.
Relatively few, almost no one,
thinks that getting food stamps
makes people poor.
And some people think this
is a coincidence (SARCASTIC)
because, after all, this
is a class on bullshit,
and I'm probably up here
trying to trick you.
Fair enough.
That's not a bad guess.
So that's interesting.
That certainly matches my
own political leanings,
and that leads me to
view the world this way.
And I see a lot of you
share that, not everybody.
That's actually kind of
unusual in the United States.
Here's a poll that NBC did,
along with The Wall Street
Journal.
And they asked people,
what do you think?
Which of the following
reasons is most responsible
for the continuing
problem of poverty?
So 24% say, well,
it's too much welfare.
It keeps people from
having any initiative.
Yeah, that's the problem.
That's why people are
poor: they get welfare.
And only 4% say it's from a
lack of government funding.
The government's not investing
enough in keeping people out
of poverty.
So in the US population at
large, from which this survey was
taken, it goes about 6 to 1 (24% to 4%) in
the direction of food stamps
causing poverty, rather than
poverty causing food stamps.
I just think that's kind of
an interesting observation.
What I think is
actually going on here
is that we've got some
kind of feedback cycle.
So in practice, poverty
is causing people
to receive food stamps.
Food stamps aren't
without effect.
They do have some effects;
I don't know exactly what they are.
But it's certainly
not crazy to say
that they may reduce
people's incentives
to generate their own income.
And so you may have some
feedback in the other direction,
from food stamps back to poverty.
My personal view of all of
this is that, well, poverty
is the dominant driver, and
food stamps generating poverty
is a very minor effect.
[MUSIC PLAYING]
