So we're talking about
conditional probability, right?
We're continuing to do
conditional probability.
And so today we're gonna talk
about a couple of the most
interesting famous problems related
to conditional probability.
First one, a lot of you have
seen in some form or other.
And that's called the Monty Hall problem,
the three door problem.
Very, very famous problem,
it's been in movies, on TV, and
New York Times articles about this.
There are entire popular books written
just about this problem.
So a lot of you have seen it before,
but I'm not assuming you've seen it.
Even if you have seen it, though,
I'm hoping I can give you some additional
ways to think about it.
And we'll approach it from
more than one perspective
because it's a very subtle problem.
This is the sorta problem that almost
everyone gets wrong the first time they
see it.
And then if they think about it for
a while they think they understand it.
And then you ask the same question
just in a slight disguise.
And then everyone gets it wrong again.
So, I'm gonna try to talk a lot about
how you really think about this,
so that you won't get fooled if I ask
the same thing with just a few
things changed around.
What's really going on here?
But anyway I'm not assuming
you've seen the problem before,
so let's start from the beginning.
So Monty Hall is a game show host.
He's retired now, but for
many years he hosted a game show
on TV called Let's Make a Deal.
And so the problem is named after him
because it's not exactly what he did on
his game show but he did kinda
similar kinds of games on his show.
So the game is like this,
there are three doors.
Let's just call them door 1, 2 and 3.
And suppose that you're
the contestant on the game show.
Right now you don't know
what's behind each door.
Monty Hall asks you to choose a door.
So say you pick that one.
So, there are several
assumptions in this problem.
We are supposed to assume that
one door has a car behind it.
The other two doors have goats.
So you don't know which one.
One of these has a car behind it.
The other two have goats.
And we're assuming that you,
the contestant,
have no idea which one has the car.
They're all equally likely, but we assume
that Monty Hall knows which is which.
So Monty knows which one has the car.
Okay, that's a key
assumption of this problem.
Sometimes it's kinda left implicit, and
I think it's important to
state that explicitly.
It's a different problem
if Monty Hall doesn't know.
So one door has a car,
two doors have goats.
You pick one.
Another assumption is that you want
the car, you do not want the goats.
So you pick a door;
suppose you pick this one.
Just by symmetry, we can simplify things,
if we assume that we,
being the contestants, we choose door one.
If you want door 3 instead, that's fine
with me, we could just renumber the doors.
So let's assume that we pick
this one initially, door 1,
just to simplify the notation.
Then what happens is that Monty opens up
either door 2 or door 3 revealing a goat.
So for example, he might open
door 2 and there's a goat there.
So then you know the car's either
the one you initially chose or door 3.
Question is should you switch?
Monty Hall gives you the option of do you
wanna switch to the other door that's
unopened or keep your original choice?
The question is,
is it beneficial to switch?
So that's the problem and
there's one more assumption.
Which is that Monty always opens,
he's always gonna offer you that choice,
so he's always gonna open a goat door.
I'm just saying goat door
is a door with a goat.
That's an assumption.
But if Monty opened the door that had
the car, then it's a stupid game, right?
>> [LAUGH]
>> So
he's always gonna open
the door with the goat.
That's why he needs to know otherwise
by chance he might spoil the game.
There's one more assumption that
usually doesn't get stated, but
it's actually pretty important.
If he has a choice of which door to open.
He picks with equal probabilities.
And the case where that happens is
if initially you guessed right, so
the car is here.
These ones both have goats.
So Monty could open either door 2 or
door 3, assume those are equally likely.
Speaking of practice problems,
I posted the new homework and
the new strategic practice.
On the strategic practice you'll find
a problem there which is kind of
an extension to the case where,
what you might call Lazy Monty Hall.
Where he's standing here and
he prefers opening door 2 to door 3,
cuz he doesn't wanna walk all
the way from here to here.
So suppose he opens this one with
probability p and this one with
probability 1 - p, where maybe p is greater
than one-half, or less than one-half, or whatever.
If he has a choice,
sometimes he has no choice right?
Because he's not going to spoil the game,
but sometimes he has a choice.
But for the basic problem we assumed
he picks with equal probabilities.
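As a preview of that lazy-Monty extension, here's a sketch in Python (function name made up, using Bayes' rule) of how the conditional probability of winning by switching depends on Monty's door preference p, assuming we picked door 1 and he opened door 2:

```python
from fractions import Fraction

def switch_win_given_opens_2(p):
    """P(car is behind door 3 | we picked door 1 and Monty opened door 2),
    where p = P(Monty opens door 2 | car is behind door 1)."""
    prior = Fraction(1, 3)  # each door equally likely a priori
    # P(Monty opens 2): p if car is behind 1, 0 if behind 2, 1 if behind 3.
    p_opens_2 = prior * p + prior * 0 + prior * 1
    # Bayes' rule: P(car = 3 | opens 2) = P(opens 2 | car = 3) P(car = 3) / P(opens 2)
    return (prior * 1) / p_opens_2

print(switch_win_given_opens_2(Fraction(1, 2)))  # 2/3, the standard problem
print(switch_win_given_opens_2(Fraction(1)))     # 1/2, a fully "lazy" Monty
```

In general this works out to 1/(1 + p), which is always at least 1/2, so switching never hurts conditionally, whatever p is.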
Okay, and the question is,
should you switch?
And most people when they first
hear this problem they say,
okay, well suppose you open door 2.
And you're picking between door 1 and
door 3.
There's two doors.
You don't seemingly have any
information about whether door 1 or
door 3 is more likely.
It looks kind of symmetrical,
so it's 50/50.
And controversy has raged about this
problem for years and years and years.
Part of the controversy is because
people don't understand probability and
they haven't taken Stat 110.
Part of the reason for
the controversy is that sometimes some of
these key assumptions are left implicit.
And so then you know different people
might interpret it differently.
So it's important to have
these assumptions clear.
Under these assumptions,
the answer is that you should switch.
And if you switch, your probability
of success is two-thirds, and
if you stick with your original choice,
your probability of success is one-third.
So, it's better to switch.
It's not 50/50.
It's sorta like an abuse of the naive
definition of probability if you just
immediately say, well, it's 50/50
cuz there's two doors and
we don't know which is which.
That's the naive definition, right?
It assumes they're equally likely, but
why would they be equally likely?
We don't know that they're equally
likely given the evidence.
We are assuming, I didn't write this but
I said it, that initially it's one-third,
one-third, one-third as the probabilities.
But that doesn't mean that, conditionally,
after we observe what happens,
it's still equally likely.
Cuz we have the information.
One key intuitive point.
Suppose that Monty opens door 2.
Just for concreteness.
Well, obviously,
we know that Door 2 has a goat.
So the kind of naive approach would be,
well, let's just condition on
the fact that Door 2 has a goat.
And originally, it was one-third,
one-third, one-third.
And we're getting rid of this, so then
it's still one-half, one-half for these.
There's a subtle flaw in that reasoning.
So we know Door 2 has a goat,
but a key thing with conditioning
is we wanna condition on all the evidence,
okay?
The evidence is not just
that Door 2 has a goat.
We need to condition on all the evidence.
There's one more key piece
of information there.
That's the fact that
Monty Hall opened Door 2.
So now we're gonna start thinking about,
why does that matter
that he opened that door?
So we're not just conditioning
on there being a goat there.
We're conditioning on the fact
that Monty Hall opened Door 2.
So we have to kind of see,
why is that relevant,
why does it become two-thirds?
All right, so
I wanna approach this problem
in a couple different ways.
Maybe three different ways:
one intuitive, one with a tree, and
one with a probability calculation, okay?
Actually, two intuitive ways.
There are a lot of ways to think about
this problem, but some of them are wrong.
So I'm gonna show you some
correct ways to think about this.
So first of all, a tree diagram, I think,
is a very good way to
picture this problem.
So let's draw a tree.
So you choose the door, and just to
simplify not having to draw as many
branches, I'm gonna assume that
the contestant chooses Door 1.
So choose Door 1, Door 1, okay?
Now it branches depending on which
door Monty Hall opens, right?
Monty Hall, well, there's two things.
One is, what's the door that has the car?
And the other branching is,
what door does Monty Hall open, okay?
So let's actually do the door
that has the car first.
So it branches three ways, and
these branches each have probability of
one-third, one-third, one-third,
Door 1, Door 2, Door 3.
So this is just the car door.
By car door,
I don't mean the door of the car, but
the door that has a car behind it.
That's either Door 1, 2, or 3, right?
And they're equally likely by assumption,
that's the first branch.
Secondly, we have Monty door, that's
whatever door that Monty Hall opens.
And let's just consider the cases.
We choose Door 1,
the car is behind Door 1.
Then Monty Hall can open either Door 2 or
Door 3.
So that's where he has two choices,
so it branches two ways.
Then he opens Door 2 or Door 3.
And we're assuming those are equally
likely, so I'm putting one-half,
one-half on those branches.
Now if we picked Door 1,
and the car is behind 2,
then Monty Hall has no choice but
to open Door 3, right?
And then he'll offer you,
do you wanna switch to Door 2 or not?
So in this case, he has no choice,
so this has a probability of 1.
And then lastly, we picked Door 1,
car is behind Door 3.
Monty Hall has no choice but
to open Door 2 to reveal a goat, right?
So that is probability 1, Door 2.
Okay, so now how do we actually use
this tree diagram to get the answer?
Well, we just have to consider cases.
So suppose that Monty Hall opens,
just for example,
suppose he opens Door 2, okay?
That means I'm just, basically,
I'm gonna show you how do you do
conditional probability
in terms of a tree, okay?
So what does it mean?
Suppose we condition on the fact
that Monty opened Door 2.
That means that it must be that we took
this path here; I'll circle that one.
Or we took this path here, right?
These two are now irrelevant, just like
when I was drawing the pebble diagram.
And when you condition on something,
you delete from your space everything
that you can now rule out, right?
You rule out everything that's
inconsistent with what you observe.
And whatever's left would
be these two cases.
And now let's just see what happens.
The probability of this
path from here to here:
I'm just multiplying, where one-half is
the conditional probability of going
from here to here, given that you're here.
So one-third times one-half is one-sixth.
And for this branch,
one-third times 1 is one-third.
And remember what I said when we were
doing conditional probabilities using
the pebble world perspective.
We deleted all the pebbles that
are irrelevant, and then we renormalized,
right, to make the total mass equal to 1.
So if we wanna renormalize one-sixth and
one-third, what we should
do is just multiply by 2.
Because then I'm making it two-thirds,
one-third.
Now I've renormalized so
that they add up to 1.
So what that says is that,
conditional on Monty Hall opening Door 2,
there is a two-thirds chance now
that the car is behind Door 3.
And there's a one-third chance
that it's behind Door 1.
So therefore, if we switch,
we have a two-thirds chance of success.
So what the circling just showed
was that the probability of success,
if switching, given that Monty opens
Door 2, cuz I was assuming he opened
Door 2, is two-thirds,
just by this tree diagram.
Of course, you could do the same
thing if he opens Door 3.
Just circle the other two, same thing
again, and it's still two-thirds, okay?
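The tree bookkeeping can be sketched in a few lines of Python, using exact fractions (the path list assumes, as above, that we picked Door 1):

```python
from fractions import Fraction

third, half = Fraction(1, 3), Fraction(1, 2)

# Each path through the tree: (car door, door Monty opens, path probability).
paths = [
    (1, 2, third * half),  # car behind 1: Monty opens 2 or 3, 1/2 each
    (1, 3, third * half),
    (2, 3, third),         # car behind 2: Monty must open 3
    (3, 2, third),         # car behind 3: Monty must open 2
]

# Condition on "Monty opens Door 2": delete inconsistent paths, renormalize.
kept = [(car, prob) for car, opened, prob in paths if opened == 2]
total = sum(prob for _, prob in kept)
posterior = {car: prob / total for car, prob in kept}
print(posterior)  # {1: Fraction(1, 3), 3: Fraction(2, 3)}
```

Deleting and renormalizing here is exactly the pebble-world picture: throw away the ruled-out pebbles, then rescale what's left to total mass 1.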
So that's one way to think of it,
just looking at what
are the different possibilities.
And let me just say, and
what does this say intuitively?
What this says intuitively is one-third of
the time, your initial guess is correct.
And then you would fail if you switched,
right?
No matter what happened,
you would fail cuz you got it right.
But that's only one-third of the time.
The other two-thirds of the time
your initial guess is wrong.
Monty opens up a goat door,
and then you should switch.
So two-thirds of the time switching wins.
I mean, you should always switch,
cuz two-thirds of the time
you'll get it right.
The one-third of the time where sticking
wins is when you were initially correct,
but that's only one-third of the time,
okay?
So that's a tree diagram,
which I think is useful.
But it's also useful to
just see how can we do this
as a conditional probability argument.
You can do this problem using Bayes' rule.
But actually, I would prefer to just
use the law of total probability here.
So let's do this by using the law
of total probability, LOTP.
When we're using the law
of total probability,
the key step is deciding
what to condition on, okay?
And one of the really nice
things about statistics and
probability that's basically
unique to this subject is,
with most mathematical subjects, if you
have a problem and you're stuck, right?
You can't just say well I wish that I
knew this and I wish that I knew that.
Well, you can say that, but
it doesn't help you, right?
In probability you think I wish
I knew this, I wish I knew that.
That's giving you a hint as to
what you should condition on, and
then you just condition on it.
And you act as if you did know that.
Okay?
So
I didn't name the law of
total probability, but
if I had I would have just called
it wishful thinking, all right?
It's like what do we wish that we knew?
Well, I don't know about you, but for
me, the first thing I think of if someone
asks me what I wish that I knew is:
I wish I knew where the car was, right?
So it's a pretty obvious
thing to think about.
That's what we wish we knew: the car door.
We wish we knew where the car is,
so we're gonna condition on that.
That's all, okay?
So we need to define some events.
I'll just say, let S be
the event that we succeed.
Succeed, assuming that we're
using the switching strategy.
So I'm gonna assume that
we're gonna switch.
And I'm gonna see what the probability
that we succeed by that strategy.
So assuming our strategy is always switch.
You know Monty Hall's gonna give us
the chance to switch, and we'll say yes.
That's our strategy, and
we wanna see what's the probability
of success of that strategy.
Okay, and then let's just let, Dj be
the event that Door j has the car.
So I'm using D for Door, Door j has car.
Where j is just 1, 2, or 3.
Okay so that's just some simple notation.
Now let's do a calculation of
the probability of success.
The law of total probability just says:
condition on which door has the car,
the thing we wished we knew.
So that's just P(S) = P(S|D1)(1/3)
+ P(S|D2)(1/3) + P(S|D3)(1/3).
Each 1/3 here is P(Dj),
the prior probability that Door j has
the car, and we're assuming they're
equally likely to start with, so it's just
1/3, 1/3, 1/3 for these weights.
Okay, now what's P(S|D1)?
Remember, we're assuming we picked door 1,
so D1 means we got it right initially.
But we switch, so that's bad.
This is a bad case,
so this is gonna be 0.
It's never gonna work in that case.
Now in this case, we picked door 1.
The car is behind door 2.
Monty Hall will open door 3,
and we should switch, right?
So, we have probability of success 1 here,
1 times 1/3.
Similarly in this case, it's gonna work,
so that's also 1 times 1/3,
and the total is 2/3.
So, it's actually a very easy calculation,
right?
Because these problems are very, very easy
to compute once we know where the car is.
So, it's 2/3.
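That law of total probability computation can be written as a one-line Python check (assuming, as on the board, that we picked Door 1 and always switch):

```python
from fractions import Fraction

# P(S | Dj): switching wins exactly when the car is NOT behind our pick, Door 1.
p_S_given_D = {1: Fraction(0), 2: Fraction(1), 3: Fraction(1)}

# LOTP: P(S) = sum over j of P(S | Dj) * P(Dj), with P(Dj) = 1/3 for each j.
p_S = sum(p_S_given_D[j] * Fraction(1, 3) for j in (1, 2, 3))
print(p_S)  # 2/3
```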
There's one slightly subtle
point here which is that
this is the unconditional probability
that our strategy will be successful.
And you could say well what if we
wanted the conditional probability,
given that Monty Hall opened door 2,
is that gonna be different?
But in this case, I defined things
symmetrically, so by symmetry we also
have, making up a notation for
which door Monty Hall opens,
P(S | Monty opens 2) = 2/3.
You could write out another law of total
probability calculation if you want;
all it does is condition on whether
Monty Hall opens door 2 or door 3.
But by symmetry the two conditional
probabilities must be equal, because
door 2 and door 3 are completely
symmetrical until Monty opens one of them.
So that means both the conditional and
the unconditional probabilities
of success are 2/3.
Now in the extension on the strategic
practice that I mentioned,
the unconditional
probability is still 2/3.
But the conditional probability changes
because in that problem I said that
Monty Hall is too lazy to walk over and
open door 3 unless he has to or
that kind of thing.
Then there's an asymmetry, but
in this version it's symmetrical,
so the conditional and unconditional
probabilities work out the same way.
So okay so we got 2/3.
And you can easily find nice applets
online; I put a link on the webpage to
a nice one that the New York Times had,
if you wanna try it out.
So, just to tell you a little bit about
the controversy over this problem.
The controversy started raging when
someone wrote in to Marilyn vos Savant,
who writes a column in Parade magazine,
and asked this question.
Not all of the assumptions were
100% explicitly specified.
But it was fairly clear that
this was what was intended.
And she basically gave the correct answer.
And then thousands of people started
writing in, telling her that she was
wrong, telling her that she was stupid,
including people with PhDs in math,
just writing all these scathing,
nasty letters to her, and
that controversy raged for a long time.
And one thing that I find
ridiculous about that is,
well, okay, most people's intuitions
about this problem are wrong.
But if you think about it carefully you'll
get the right answer if you have some
familiarity with conditional probability.
But even for
people who had no understanding of
any of the stuff I was doing here,
this is a problem, and this is true
in statistics in general,
that you could just simulate, right?
It's very, very easy to try it
out with a friend, 3 doors.
You can just do it with cups and
props and whatever.
You can also do it on a computer, very
very easily, write a little simulation.
Try it out 1,000 times,
just via computer simulation, and
you'll see that if you switch,
you'll win 2/3 of the time.
So I don't understand how people still
continue to argue with that, when you just
try it out, simulate it, you'll see 2/3
of the time you succeed by switching.
I mean, it's just kind of
mind-boggling, but
maybe some of the math PhDs
didn't want to actually try
it out cuz they thought they'd
proved that it's 1/2 or something.
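Here's a minimal sketch of the kind of simulation just described (the function name, seed, and number of trials are arbitrary choices):

```python
import random

def monty_hall(switch):
    """Play one round of the three-door game; return True if we win the car."""
    doors = [1, 2, 3]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Monty opens a goat door other than our pick,
    # choosing uniformly at random when he has a choice.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

random.seed(110)  # arbitrary seed, for reproducibility
n = 100_000
wins_switch = sum(monty_hall(switch=True) for _ in range(n)) / n
wins_stay = sum(monty_hall(switch=False) for _ in range(n)) / n
print(wins_switch, wins_stay)  # close to 2/3 and 1/3
```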
Anyway, so that's the Monty Hall problem.
I wanted to mention one other intuition
for this, that's kind of unusual.
Usually when we have a complicated
problem, the suggestion would be,
look at a simpler case.
But I did emphasize the fact that
it's useful to consider simple and
extreme cases.
So an extreme case here would
be what if instead of 3 doors,
what if we considered the Monty Hall
problem with a million doors?
So you pick one of those one million
doors, and then Monty Hall proceeds to
open 999,998 doors, leaving just
one other door. Should you switch?
In that case, I've never met anyone
who would not switch, right.
I've never met anyone like that.
That's because with a million
doors, you're extremely confident
that your initial guess is wrong, and
extremely confident that that
one remaining door has the car.
Conceptually though, there's no
difference between that and this.
Whether it's three doors or
a million doors, the argument for
one-half, one-half here would apply
in the same way with a million doors.
It's just in that case everyone sees
intuitively that it's ridiculous and
here people don't have that intuition.
So anyways,
that's another intuition from the problem.
All right, so
that's the Monty Hall problem.
You should look at the strategic practice
problem, and on the homework three,
there's an extension with more than
three doors that you can think about.
On that problem, I'm talking about the
case where Monty Hall possibly leaves
more than one door still unopened.
But anyway, you can work on that later.
But this is just an introduction
to the Monty Hall.
So, that's a very famous, fun problem,
but I think it also illustrates a lot
about how to think conditionally, right?
The tree perspective is useful;
the problem is really easy when
you set it up this way, but if you just
try to apply naive intuition, then
most people get this completely wrong.
Okay.
So let's do another example.
This example is called Simpson's paradox.
It's another kind of notorious
problem, or paradox, that comes up.
It comes up a lot in everyday life.
And it's another problem where
most people see it the first time,
it seems impossible and
then they think they understand it,
then change a few things around and
then they fall for it again.
So, I love paradoxes in general so
we'll be seeing a lot of
paradoxes in this course.
So the best way,
I mean, I could start making up some
abstract notation and stuff, but
the best way to start understanding
Simpson's paradox is to see an example.
So I'll write down an example,
and then we can discuss that, and
then we can try to write out
the more general thing that's going on.
By the way, there is no such thing
as a true paradox, a real paradox.
If there were a real paradox,
we'd have a contradiction,
and the universe would explode,
and we would not be here.
So there are no paradoxes
in the literal sense.
What it is though, is something that's
deeply counter-intuitive to most people.
Which means it forces you to think harder.
And if you think hard enough,
then eventually, you understand
that actually, it does make sense.
But you have to think hard, okay?
So here's the problem, one example:
is it possible to have two
doctors where the first
doctor has a higher success
rate at every single
possible type of surgery
imaginable than the second one?
Name any surgery, first doctor is more
successful than the second doctor,
measured in terms of percentage
of successful outcome.
Yet the second doctor, overall,
has a higher success rate.
That's the basic question.
Okay, well Simpson's paradox
says that that's possible.
And at first that sounds
wrong to most people.
Because it sounds like if
person A is better than person B in
every single category, then when you
aggregate those categories together,
it's not gonna somehow flip, right?
Except that's wrong since this
paradox says it can flip.
The signs of inequalities can flip
when you aggregate data together.
So one thing seems better than
another in every individual case,
yet when you add up all those cases to
get the total, it flips which way it goes.
So that sounds really weird at first, but
I'm just gonna illustrate
through a very simple example.
And if you think carefully
about this simple example,
then you can see what's really going
on there, why is that possible.
Okay, so
I like to illustrate Simpson's paradox
through examples that I make
up based on the Simpsons.
I've been watching The Simpsons
since I was a kid and I like the show,
and it helps me remember the example:
the Simpsons, Simpson's paradox.
So I don't know how many of you watch the
Simpsons, but it doesn't matter if you do.
On the Simpsons there are two doctors Dr.
Hibbert and Dr. Nick, okay?
The data that I'm gonna
give you are made up.
They don't actually tell you the
percentage of success on the show, okay?
But anyway, Dr. Hibbert is kind
of the reasonably well respected
town doctor that everyone who can afford
to goes to, but he's gonna charge a lot.
Dr. Nick is like the guy
with the infomercials on TV
who advertises that he'll perform
any surgery for $129.99, okay?
So probably most
people would consider Dr.
Hibbert to be the better doctor, and Dr.
Nick is kind of like a cheap,
quack doctor, okay?
Now, I made up some numbers.
So, suppose for simplicity,
to understand the paradox,
we only need to consider
two types of surgeries.
Once you understand what happens with two
surgeries you can easily see what would
happen if you had like 50
different types of surgery.
But to understand the crucial
phenomenon, it's enough to assume
there are two doctors and
two different types of surgeries, okay?
So I made up some numbers, and
we can summarize everything in
terms of two 2-by-2 tables.
Okay?
One table for Dr. Hibbert.
And one table for Dr. Nick.
And I'm gonna fill in some numbers.
So these tables represent two types
of surgeries and success or failure.
So let's just write success here.
First row is for success.
Second row is for failure.
So each doctor performs
a certain number of surgeries.
And to simplify things,
I think I assumed that each doctor
performs 100 surgeries total.
So there's nothing going on where you'd
say one doctor is doing a lot more
surgeries total.
This is success failure for Dr. Nick.
And let's assume that there
are only two types of surgeries,
one is heart surgery and the other one is,
I just made up and example,
a bandage removal.
I just tried to make up
something that seems hard.
I'm not a doctor, obviously, well, doctor
of philosophy, but that doesn't count.
One's an easy surgery and
one's a difficult surgery, okay?
Bandage removal I can do;
I can't do heart surgery.
So, all right, I made up some numbers.
So suppose that Dr.
Hibbert performed 90 heart surgeries and
succeeded 70 times, failed 20 times.
And suppose that he performed 10 bandage
removals and succeeded all 10 times.
Now, Dr.
Nick performed 10 heart surgeries.
Succeeded 2, failed 8 times,
that's kind of sad.
He performed 90 bandage removals,
he succeeded 81 times, he failed 9 times.
Somehow,
he failed 9 times at bandage removal.
Now, how many of you would rather
go to Dr. Hibbert than Dr. Nick?
Would anyone here prefer to go to Dr.
Nick, based on this data?
No?
Okay, well, I would definitely
prefer to go to Dr. Hibbert.
But on the other hand,
you could argue from this that Dr.
Hibbert is successful 80% of the time.
And Dr. Nick had 83 successes.
So on his infomercial,
he can say he succeeded 83%
of the time, and that's 3% better
than Dr. Hibbert.
And yet you're paying so much less for
a 3% higher success rate.
So what did we do to get 80 and 83?
We were just aggregating, right?
Here I listed out separately by surgery.
Here we aggregated.
Notice that the direction flipped, okay?
So for each individual surgery, Dr.
Hibbert is better.
So that's conditional, right?
So it's the difference between
conditional and unconditional.
Conditional on which type of surgery.
If you condition on performing heart
surgery, you'd rather have Dr. Hibbert.
If you condition on having band-aid
removal, you'd rather have Dr.
Hibbert, right?
But unconditionally, Dr.
Nick has a higher percentage rate.
And the reason, you can see it here.
I just made this up as kind of an extreme
example so that you get some intuition.
You see what's going on is that 90% of Dr.
Nick's surgeries are band-aid removals.
Well, band-aid removal is so much easier
than heart surgery that it would be very
easy for him to get a higher success rate
because he's doing easier surgeries.
And you can see, of course,
this is just made up.
But in real examples, you can imagine
that maybe the most famous,
most renowned neurosurgeons in
the world could have success rates
that aren't as great, because
they're getting the hardest cases, right?
They're getting the cases that no
one else knows how to deal with.
So the hardest cases get referred to
the world's leading expert, and
that's gonna hurt their success rate,
right?
So that's what's going on.
That's an example of Simpson's paradox.
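Plugging the made-up board numbers into a few lines of Python makes the flip explicit:

```python
# (successes, total surgeries) from the board, per doctor and surgery type.
records = {
    "Hibbert": {"heart": (70, 90), "band-aid": (10, 10)},
    "Nick":    {"heart": (2, 10),  "band-aid": (81, 90)},
}

# Conditional on surgery type, Dr. Hibbert wins both comparisons:
for surgery in ("heart", "band-aid"):
    for doc in records:
        s, n = records[doc][surgery]
        print(f"{doc} {surgery}: {s}/{n} = {s / n:.0%}")

# Unconditionally (adding up successes and totals), Dr. Nick wins:
overall = {}
for doc in records:
    s = sum(succ for succ, _ in records[doc].values())
    n = sum(total for _, total in records[doc].values())
    overall[doc] = s / n
    print(f"{doc} overall: {s}/{n} = {overall[doc]:.0%}")
```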
I think when you see this example
it's clear how that can happen.
And yet, in problems that are exactly
analogous, just in a different language,
a different setting,
it sounds like it's impossible.
How can it flip like that?
Okay?
So let me explain it a couple other ways
and mention a couple other examples.
Another way to think of Simpson's paradox,
well here's another example.
In baseball,
it's possible to have two players
where the first player has
a higher batting average.
A batting average is just the percentage
of times at bat that they got a hit.
One player has a higher batting average
for the first half of the season and for
the second half of the season.
Yet, if you look at the whole season,
the second player has
a higher batting average.
And again,
that sounds impossible at first.
If you are better for the first
half of the season and better for
the second half of the season, how could
you be worse for the whole season?
But you can see that it's basically the
same problem; you can easily make
up some numbers to illustrate it.
So it can flip.
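For instance, here's one made-up set of at-bat records (hypothetical players, not real data) where the flip happens:

```python
from fractions import Fraction

# (hits, at-bats) per half season, made up so player 1 is better each half.
p1 = {"first half": (4, 10),   "second half": (25, 100)}
p2 = {"first half": (35, 100), "second half": (2, 10)}

for half in p1:
    a1 = Fraction(*p1[half])  # .400, then .250
    a2 = Fraction(*p2[half])  # .350, then .200
    print(half, a1 > a2)      # True both times: player 1 is better

# Whole season: add hits and add at-bats.
season1 = Fraction(4 + 25, 10 + 100)   # 29/110, about .264
season2 = Fraction(35 + 2, 100 + 10)   # 37/110, about .336
print("whole season:", season1 > season2)  # False: player 2 is better overall
```

The mechanism is the same as with the doctors: player 1's good half came with few at-bats, so it barely counts in the aggregate.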
What's really going on here,
another way to look at it, is just
in terms of adding fractions, okay?
So if you had one-third plus two-fifths.
Well, let's see,
I haven't added any fractions in a while.
But I think you're not supposed
to just add the numerators and
add the denominators, right?
This is not equal to three-eighths, right?
That's our inequality for the day.
If you haven't studied adding fractions
before, when you're learning for
the first time,
that's kinda the obvious thing to do.
Add the numerators.
Add the denominator.
Well, that doesn't actually work, right?
However, if you add fractions this way,
that is closer to how you aggregate,
right?
Because we're just adding up successes and
adding up trials, right?
So that is sort of how we do
the aggregation here, right?
Add up the successes.
Add up the total number of trials,
that kind of thing.
Now, if aggregating data worked like
genuine fraction addition, Simpson's
paradox could not occur, because adding
larger fractions always gives
a larger sum.
But aggregating really works like this
add-the-tops, add-the-bottoms operation,
and that operation can flip inequalities.
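The contrast between real fraction addition and the add-the-tops, add-the-bottoms operation (sometimes called the mediant) looks like this in Python:

```python
from fractions import Fraction

# Real addition:
print(Fraction(1, 3) + Fraction(2, 5))  # 11/15

# "Add numerators, add denominators" is NOT addition; it's how counts
# aggregate: (successes1 + successes2) / (trials1 + trials2).
def mediant(a, b, c, d):
    """Combine a/b and c/d as (a + c)/(b + d), keeping the raw counts."""
    return Fraction(a + c, b + d)

print(mediant(1, 3, 2, 5))  # 3/8, not 11/15
```

Real addition preserves inequalities term by term, while the mediant is a weighted average of the two rates, with the denominators as weights; that weighting is exactly what leaves room for Simpson's paradox.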
Okay, there are a lot of other examples
of Simpson's paradox; if you just
look at the Wikipedia entry,
you can find as many as you want.
Examples that happen in real life and
actually have policy and
legal implications.
There are some interesting legal
cases involving Simpson's paradox.
There are some interesting legal
cases involving Simpson's paradox.
Let me talk about how we express
Simpson's paradox in terms of
conditional probability.
So, just to map this example into
some events: suppose someone's
going to have surgery, and
let A be the event that
the surgery is successful.
And then we'll try to write out in
probability notation what this
example is saying, okay?
Let B be the event that
we're treated by Dr. Nick.
So B complement would correspond to Dr.
Hibbert; we don't need
separate notation for that.
And then we need one more event,
which is what type of surgery?
So let's call that C.
C be the event that we have heart surgery.
So C complement is band-aid removal, okay?
So those are the events that we need.
And let's just write out,
in probability notation,
what are these tables telling us?
Okay?
Well, what's it saying?
It says the probability of success
given that we're treated by Dr.
Nick and we're having heart surgery
is less than the probability of success
given that we have Dr. Hibbert for
heart surgery:
P(A | B, C) < P(A | B complement, C).
Right, that's just comparing
the two doctors at heart surgery.
Now, what about band-aid removal?
The probability of success for
band-aid removal, again,
is lower with Dr. Nick than with Dr.
Hibbert.
So this is gonna be P(A | B, C complement)
< P(A | B complement, C complement).
Okay?
So those are the two statements we have.
But overall, when you aggregate,
and aggregating means
don't condition on the type of surgery,
then we're just looking at the overall.
So overall, the probability of A given B,
that is, success given that you're
being treated by Dr. Nick,
is greater than the probability of
success given that you're treated by Dr.
Hibbert: P(A | B) > P(A | B complement).
So notice, these expressions look similar.
And it looks like if you
combine these two cases,
all these are A given B on the left,
and we have the two cases.
Given C, given C complement.
Looks like if you combine
those cases, you get this.
So if you add up or
combine these inequalities somehow,
you would guess that it would remain
less than, but instead the inequality flipped.
So, Simpson's Paradox says
that this is possible.
This is an explicit example
showing that that's possible.
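To make this concrete, here's a quick sketch in Python. The specific success counts are made up for illustration, chosen so the flip happens; the events match the ones on the board (A = success, B = Dr. Nick, C = heart surgery).

```python
# Hypothetical (successes, attempts) counts, invented to exhibit the paradox.
# A = success, B = treated by Dr. Nick, C = heart surgery.
nick    = {"heart": (2, 10),  "bandaid": (81, 90)}
hibbert = {"heart": (70, 90), "bandaid": (10, 10)}

def rate(successes, attempts):
    return successes / attempts

# Conditioning on the type of surgery, Dr. Hibbert is better both times:
assert rate(*nick["heart"])   < rate(*hibbert["heart"])    # 0.20 < 0.78
assert rate(*nick["bandaid"]) < rate(*hibbert["bandaid"])  # 0.90 < 1.00

# Aggregating over surgery types, the inequality flips:
nick_total    = rate(2 + 81, 10 + 90)   # 0.83
hibbert_total = rate(70 + 10, 90 + 10)  # 0.80
assert nick_total > hibbert_total
```

The flip happens because in these made-up numbers Dr. Nick mostly does band-aid removals, where almost everyone succeeds, while Dr. Hibbert mostly does heart surgery.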
C is called a confounder.
I chose the letter C cuz it can stand for
confounder or control.
It's something you wanna control for.
In this problem,
it seems a lot more relevant to condition
on what type of surgery you have, right?
So the more relevant comparison would be
the more conditional one in this case.
And everyone here agrees that Dr.
Hibbert is better, that means we
should condition on more things.
If we fail to condition on C, though,
we can get a very misleading answer here,
right?
Because of the fact that if we don't
condition on the type of heart surgery,
what happens is knowing
that we got treated by Dr.
Nick gives us information about
what type of surgery we had.
Which then in turn affects
the probability of success.
So that's Simpson's Paradox,
and this is just kinda like the general
setting of Simpson's Paradox.
So basically any example of Simpson's
Paradox can be written in this form.
So I would take the definition of
Simpson's Paradox to be: whenever you have
these inequalities, but
the inequality flips when you aggregate in this way.
That's a concrete example.
That's just the generic setting.
Okay, so you should try to think
intuitively about why this is possible,
by thinking about examples.
And think about, you know, should
you be conditioning on C.
And then there's one other thing
that I wanted to mention about this,
just to go into a little bit more
of the math of those equations,
why is that actually possible?
In the sense that the first
time that I saw this,
I thought that from here and here,
I could probably prove this,
because it looks like it's gonna
follow from the law of total probability.
So I'm gonna show you what goes wrong.
Obviously it's not gonna work
to prove less than here,
because we just saw an example
that it could go this way.
But what if you were trying
to prove that this and
this implies this with a less than,
where will it go wrong?
I just wanted to show you quickly what
would be wrong with that argument.
So the obvious argument to
make is, I wanna relate P(A given B) to P(A given B, C),
and that means we need to use
the law of total probability.
So by the law of total probability,
I would be conditioning on C.
This is just the conditional form
of the law of total probability.
Conditional probabilities are
probabilities, so I can do Bayes' rule,
law of total probability, anything I want,
it's gonna work the same way,
just everything's given B.
So I'm conditioning on C,
whether C happened or not.
But everything is still given B;
that's the conditional form:
P(A given B) = P(A given B, C) P(C given B)
+ P(A given B, C complement) P(C complement given B), okay?
So if you deleted the given Bs everywhere,
this would just be exactly the law
of total probability, right?
And so therefore this is true,
cuz conditional probabilities
are probabilities, so
it's still true with given Bs.
Now if we compare this with what we had,
we know that P(A given B, C)
is less
than P(A given B complement, C).
And P(A given B, C complement) is less than
P(A given B complement, C complement).
So then you
would wanna plug those in.
But at that point in the calculation,
you're not gonna be able
to reduce it down to this.
It's because of this term and
this term, P(C given B) and
P(C complement given B);
we might call those weights.
Those are conditional on B, right?
And if we tried to write the same
thing for P(A given B complement),
then we would have B complement here,
and B complement there.
But we don't have any way to relate
those weights for one case or
the other, so it's not gonna follow.
[COUGH] So
it's important to contrast this with
the law of total probability, basically.
Intuitively, in the Dr. Nick case,
this weight P(C given B)
is the probability of performing
heart surgery given Dr. Nick, which is
very different from the probability
of heart surgery given Dr. Hibbert.
So the weights change, and that's what
enables Simpson's Paradox to happen.
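You can check this decomposition numerically. Using the same made-up Dr. Nick counts as before (my own invented numbers, not real data), the conditional law of total probability holds exactly, and you can see that the weight P(C given B) is nothing like the corresponding weight for Dr. Hibbert:

```python
# Same hypothetical Dr. Nick counts as before: (successes, attempts).
heart_s, heart_n = 2, 10     # heart surgeries
band_s,  band_n  = 81, 90    # band-aid removals

p_A_given_B_C  = heart_s / heart_n            # P(A|B,C)   = 0.20
p_A_given_B_Cc = band_s / band_n              # P(A|B,C^c) = 0.90
p_C_given_B    = heart_n / (heart_n + band_n) # weight P(C|B)   = 0.10
p_Cc_given_B   = band_n / (heart_n + band_n)  # weight P(C^c|B) = 0.90

# P(A|B) = P(A|B,C) P(C|B) + P(A|B,C^c) P(C^c|B)
lhs = (heart_s + band_s) / (heart_n + band_n)   # aggregate P(A|B)
rhs = p_A_given_B_C * p_C_given_B + p_A_given_B_Cc * p_Cc_given_B
assert abs(lhs - rhs) < 1e-12   # the decomposition holds exactly

# For Dr. Hibbert (B complement) the weight is completely different:
p_C_given_Bc = 90 / (90 + 10)   # 0.90 of his operations are heart surgery
assert p_C_given_B != p_C_given_Bc
```

Because the weights given B and given B complement differ, there's no way to combine the two conditional inequalities into the aggregated one.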
Okay, so let me mention one or
two more examples of Simpson's Paradox.
So here's another pretty
famous one, which involved
a court case. So
what happened was that [COUGH] there
was a lawsuit against
UC Berkeley claiming sex
discrimination in admissions
to their graduate programs.
That is claiming that UC Berkeley
was discriminating against women for
admissions to graduate programs.
And something interesting happened,
in the sense that when
you looked at
the overall admissions rates,
it looked like UC Berkeley
was making it easier for
men to get in than for women for grad school.
So then it seems to be a clear-cut
case of discrimination.
When you look at the data more closely
than that, then something different
turned up, which was what happens if
you look at each individual department.
Cuz when you apply to grad school,
you apply to a specific department, right?
For each individual department,
there was not clear evidence, right?
For each individual department,
I'm not saying they were all exactly fair.
But each individual department generally
did not seem to be discriminating.
But when you aggregated all
of the departments together,
then it seemed like it
was very unfair to women.
And the reason is that certain
departments are more popular for
women to apply to relative to men.
And certain departments are harder
to get into than others and
somehow those things are what led
to Simpson's Paradox occurring.
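Here's a sketch with invented numbers, not the actual Berkeley data, showing how each department can favor women while the aggregate appears to favor men:

```python
# Invented (admitted, applicants) counts for two hypothetical departments;
# NOT the real Berkeley data, just numbers that reproduce the effect.
#              men         women
easy_dept = ((80, 100), (18, 20))     # women admitted at a higher rate
hard_dept = ((15, 100), (170, 900))   # women admitted at a higher rate

def rate(admitted, applicants):
    return admitted / applicants

# Within each department, women do better:
assert rate(*easy_dept[1]) > rate(*easy_dept[0])  # 0.90 > 0.80
assert rate(*hard_dept[1]) > rate(*hard_dept[0])  # 0.19 > 0.15

# Aggregated, men appear to be favored, because in these numbers
# far more women applied to the harder department:
men_total   = rate(80 + 15, 100 + 100)   # 0.475
women_total = rate(18 + 170, 20 + 900)   # about 0.204
assert men_total > women_total
```

Department here plays the role of the confounder C: knowing the applicant's sex gives information about which department they applied to, which in turn affects the admission probability.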
All right,
one more example of Simpson's Paradox.
I've loved paradoxes for a long time,
and this was actually the first one I ever saw, and
I still find it helpful to think about it.
Let's see, how does it work?
We have two jars like
that with jelly beans.
Though, if I were phrasing it now,
I would use gummy bears.
But I first learned this
problem with jelly beans.
There's two types of jelly beans,
let's say open circles and closed circles,
the two flavors of jelly beans, okay?
In these two jars, and
there's two more jars here, okay?
And there's more jelly beans.
I'm not gonna make up numbers.
All I'm saying is that there are two types
of jelly beans in each of these four jars.
Okay, now suppose that
you like one flavor better than the other.
So you get to choose a random jelly
bean from this jar or this jar, right?
But you just have to reach in, and
you can't actually pick
which color you're getting.
So you just pick a random jelly
bean from this jar or this jar.
And suppose this jar is
better than this jar.
By greater than,
I mean that this jar has a higher
percentage of the one you like, right?
And for this jar, you can
make up your own numbers.
It's actually good practice to make up
your own specific numbers for this.
Suppose that this jar is also
better than this jar, okay?
That is, it has a higher percentage
of the jelly bean you like.
Now suppose that someone then created
a bigger jar, and then created another bigger jar.
I should've used a board where
I could write this below.
These two jars
just got everything dumped into one big jar.
So this one and
this one all got dumped into here.
And this one and
this one all got dumped into here, okay?
So I combined the two better jars and
I combined the two worse jars.
Well, naively you'd think, well,
I combine the better ones,
that's gonna be better, okay?
But then it could flip, and
after you aggregate the jars,
suddenly this one has a higher
percentage of your favorite jelly beans.
You should make up your own example, your
own numbers showing that this can happen.
And that'll give you a lot of intuition for
Simpson's Paradox.
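If you want a starting point for your own numbers, here's one invented set of (favorite, total) jelly bean counts per jar that produces the flip:

```python
# Invented (favorite, total) jelly bean counts per jar, for illustration.
jar1, jar2 = (9, 10),   (80, 100)  # jar1 beats jar2: 0.90 > 0.80
jar3, jar4 = (30, 100), (2, 10)    # jar3 beats jar4: 0.30 > 0.20

def frac(fav, total):
    return fav / total

assert frac(*jar1) > frac(*jar2)
assert frac(*jar3) > frac(*jar4)

# Dump the two "better" jars together, and the two "worse" jars together:
better = (jar1[0] + jar3[0], jar1[1] + jar3[1])  # (39, 110)
worse  = (jar2[0] + jar4[0], jar2[1] + jar4[1])  # (82, 110)

# The inequality flips: the combined "worse" jar now has a higher
# fraction of your favorite flavor.
assert frac(*worse) > frac(*better)  # about 0.745 > 0.355
```

Notice how the trick works: each "better" jar wins on percentage, but the big counts sit in jar2 (mostly favorites) and jar3 (mostly not), so the totals tell a different story.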
Okay, so that's all for
today, have a good weekend.
