The following content is
provided under a Creative
Commons license.
Your support will help MIT
OpenCourseWare continue to
offer high quality educational
resources for free.
To make a donation or view
additional materials from
hundreds of MIT courses, visit
MIT OpenCourseWare at
ocw.mit.edu
JOHN TSITSIKLIS: So here's
the agenda for today.
We're going to do a
very quick review.
And then we're going
to introduce some
very important concepts.
The idea is that information is always partial. And the question is, what do we do to probabilities if we have some partial information about the random experiment?
We're going to introduce the
important concept of
conditional probability.
And then we will see three
very useful ways
in which it is used.
And these ways basically
correspond to divide and
conquer methods for breaking
up problems
into simpler pieces.
And also one more fundamental
tool which allows us to use
conditional probabilities to do
inference, that is, if we
get a little bit of information
about some
phenomenon, what can we
infer about the things
that we have not seen?
So our quick review.
In setting up a model of a
random experiment, the first
thing to do is to come up with
a list of all the possible
outcomes of the experiment.
So that list is what we
call the sample space.
It's a set.
And the elements of the
sample space are all
the possible outcomes.
Those possible outcomes must be
distinguishable from each other.
They're mutually exclusive.
Either one happens or the other
happens, but not both.
And they are collectively exhaustive, that is, no matter what, the outcome of the experiment is going to be an element of the sample space.
And then we discussed last
time that there's also an
element of art in how to choose
your sample space,
depending on how much detail
you want to capture.
This is usually the easy part.
Then the more interesting part
is to assign probabilities to
our model, that is to make some
statements about what we
believe to be likely and what
we believe to be unlikely.
The way we do that is by
assigning probabilities to
subsets of the sample space.
So as we have our sample space
here, we may have a subset A.
And we assign a number to that
subset P(A), which is the
probability that this
event happens.
Or this is the probability that
when we do the experiment
and we get an outcome it's the
probability that the outcome
happens to fall inside
that event.
We have certain rules that
probabilities should satisfy.
They're non-negative.
The probability of the overall
sample space is equal to one,
which expresses the fact that
we are certain, no matter
what, the outcome is going
to be an element
of the sample space.
Well, if we set the sample space up right, so that it exhausts all possibilities, this should be the case.
And then there's another
interesting property of
probabilities that says that,
if we have two events or two
subsets that are disjoint, and
we're interested in the
probability, that one or the
other happens, that is the
outcome belongs to A or belongs
to B. For disjoint
events the total probability of
these two, taken together,
is just the sum of their
individual probabilities.
So probabilities behave
like masses.
The mass of the object
consisting of A and B is the
sum of the masses of
these two objects.
Or you can think of
probabilities as areas.
They have, again, the
same property.
The area of A together with B is the area of A plus the area of B.
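As a quick illustration (my own sketch, not from the lecture), here is a small Monte Carlo check of the area analogy: in the unit square with probabilities equal to areas, two disjoint events have probabilities that add. The strips A and B below are an arbitrary choice of disjoint events for this demonstration.

```python
# Monte Carlo sketch: in the unit square with P(event) = area,
# two disjoint events satisfy P(A or B) = P(A) + P(B).
import random

random.seed(0)
N = 100_000
# Disjoint events inside the unit square (chosen for illustration):
# A = left strip x < 0.3, B = right strip x > 0.7.
in_A = in_B = in_union = 0
for _ in range(N):
    x, y = random.random(), random.random()
    a, b = x < 0.3, x > 0.7
    in_A += a
    in_B += b
    in_union += a or b

# Disjointness means the counts add exactly, not just approximately.
assert in_union == in_A + in_B
print(in_A / N, in_B / N, in_union / N)  # roughly 0.3, 0.3, 0.6
```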
But as we discussed at the end
of last lecture, it's useful
to have in our hands a more
general version of this
additivity property, which says
the following, if we take
a sequence of sets--
A1, A2, A3, A4, and so on.
And we put all of those
sets together.
It's an infinite sequence.
And we ask for the probability
that the outcome falls
somewhere in this infinite
union, that is we are asking
for the probability that the
outcome belongs to one of
these sets, and assuming that
the sets are disjoint, we can
again find the probability for
the overall set by adding up
the probabilities of the
individual sets.
So this is a nice and
simple property.
But it's a little more subtle
than you might think.
And let's see what's going
on by considering
the following example.
We had an example last time
where we take our sample space
to be the unit square.
And we said let's consider a
probability law that says that
the probability of a subset is
just the area of that subset.
So let's consider this
probability law.
OK.
Now the unit square is the set --let me just draw it this way-- the unit square is the union of one-element sets, each consisting of a single point. So the unit square is made up by the union of the various points inside the square, a union over all x's and y's.
OK?
So the square is made
up out of all the
points that this contains.
And now let's do
a calculation.
One is the probability of our
overall sample space, which is
the unit square.
Now the unit square is the union
of these things, which,
according to our additivity
axiom, is the sum of the
probabilities of all of these
one element sets.
Now what is the probability
of a one element set?
What is the probability of
this one element set?
What's the probability that our
outcome is exactly that
particular point?
Well, it's the area of that
set, which is zero.
So it's just the sum of zeros.
And by any reasonable
definition the
sum of zeros is zero.
So we just proved that
one is equal to zero.
OK.
Either probability theory is
dead or there is some mistake
in the derivation that I did.
OK, the mistake is quite
subtle and it
comes at this step.
We sort of applied the additivity axiom by saying that the unit square is the union of all those sets. But can we really apply our additivity axiom here?
Here's the catch.
The additivity axiom applies
to the case where we have a
sequence of disjoint events
and we take their union.
Is this a sequence of sets?
Can you make up the whole unit square by taking a sequence of elements inside it that covers the whole unit square?
Well if you try, if you start
looking at the sequence of one
element points, that sequence
will never be able to exhaust
the whole unit square.
So there's a deeper reason
behind that.
And the reason is that infinite
sets are not all of
the same size.
The integers are an
infinite set.
And you can arrange the integers
in a sequence.
But a continuous set like the unit square is a bigger set.
It's so-called uncountable.
It has more elements than
any sequence could have.
So this union here is not of
this kind, where we would have
a sequence of events.
It's a different
kind of union.
It's a union that involves many, many more sets.
So the countable additivity axiom does not apply in this case, because we're not dealing with a sequence of sets.
And so this is the
incorrect step.
So at some level you might think
that this is puzzling
and awfully confusing.
On the other hand, if you think
about areas of the way
you're used to them from
calculus, there's nothing
mysterious about it.
Every point on the unit
square has zero area.
When you put all the points
together, they make up
something that has
finite area.
So there shouldn't be any
mystery behind it.
Now, one interesting thing that
this discussion tells us,
especially the fact that a single-element set has zero area, is the following--
Individual points have
zero probability.
After you do the experiment and
you observe the outcome,
it's going to be an
individual point.
So what happened in that
experiment is something that
initially you thought had zero
probability of occurring.
So if you happen to get some
particular numbers and you
say, "Well, in the beginning,
what did I think about those
specific numbers?
I thought they had
zero probability.
But yet those particular
numbers did occur."
So one moral from this is that
zero probability does not mean
impossible.
It just means extremely,
extremely unlikely by itself.
So zero probability
things do happen.
In such continuous models,
actually zero probability
outcomes are everything
that happens.
And the bumper sticker version
of this is to always expect
the unexpected.
Yes?
AUDIENCE: [INAUDIBLE].
JOHN TSITSIKLIS: Well,
probability is supposed to be
a real number.
So it's either zero or it's
a positive number.
So you can think of the
probability of things just
close to that point and those
probabilities are tiny and
close to zero.
So that's how we're going to
interpret probabilities in
continuous models.
But this is two chapters
ahead.
Yeah?
AUDIENCE: How do we interpret
probability of zero?
If we can use models that
way, then how about
probability of one?
That it's extremely likely but not necessarily certain?
JOHN TSITSIKLIS: That's
also the case.
For example, if you ask in this
continuous model, if you
ask me for the probability that (x, y) is different from (0, 0), this is the whole square, except for one point.
So the area of this is
going to be one.
But this event is not entirely certain, because the (0, 0) outcome is also possible.
So again, probability of one
means essential certainty.
But it still allows the
possibility that the outcome
might be outside that set.
So these are some of the weird
things that are happening when
you have continuous models.
And that's why we start this class with discrete models, on which we will be spending the next couple of weeks.
OK.
So now once we have set up our
probability model and we have
a legitimate probability law
that has these properties,
then the rest is
usually simple.
Somebody asks you a question of
calculating the probability
of some event.
Well, you were told something about the probability law, such as, for example, that the probabilities are equal to areas, and then you just need to calculate.
In this type of example, somebody would give you a set and you would have to calculate the area of that set. So the rest is just calculation, and it is simple.
Alright, so now it's time
to start with our main
business for today.
And the starting point
is the following--
You know something
about the world.
And based on what you know when
you set up a probability
model and you write down
probabilities for the
different outcomes.
Then something happens, and
somebody tells you a little
more about the world, gives
you some new information.
This new information, in
general, should change your
beliefs about what happened
or what may happen.
So whenever we're given new
information, some partial
information about the outcome
of the experiment, we should
revise our beliefs.
And conditional probabilities
are just the probabilities
that apply after the revision
of our beliefs, when we're
given some information.
So let's make this into
a numerical example.
So inside the sample space, this
part of the sample space,
let's say has probability 3/6,
this part has 2/6, and that
part has 1/6.
I guess that means that out here
we have zero probability.
So these were our initial
beliefs about the outcome of
the experiment.
Suppose now that someone
comes and tells you
that event B occurred.
So they don't tell you the full outcome of the experiment.
But they just tell you that the
outcome is known to lie
inside this set B.
Well then, you should certainly
change your beliefs
in some way.
And your new beliefs about what
is likely to occur and
what is not is going to be
denoted by this notation.
This is the conditional
probability that the event A
is going to occur, the
probability that the outcome
is going to fall inside the set
A, given that we are told, and we are sure, that the outcome lies inside the event B. Now once you're told that the outcome lies inside the event B, then our old sample space is in some ways irrelevant.
We then have a new sample space, which is just the set B. We
are certain that the outcome
is going to be inside B.
For example, what is this
conditional probability?
It should be one.
Given that I told you that B
occurred, you're certain that
B occurred, so this has
unit probability.
So here we see an instance of
revision of our beliefs.
Initially, event B had the
probability of (2+1)/6 --
that's 1/2.
Initially, we thought B
had probability 1/2.
Once we're told that B occurred,
the new probability
of B is equal to one.
OK.
How do we revise the probability
that A occurs?
So we are going to have the
outcome of the experiment.
We know that it's inside B. So
we will either get something
here, and A does not occur.
Or something inside here,
and A does occur.
What's the likelihood that,
given that we're inside B, the
outcome is inside here?
Here's how we're going to think about it.
This part of this set B, in
which A also occurs, in our
initial model was twice as
likely as that part of B. So
outcomes inside here
collectively were twice as
likely as outcomes out there.
So we're going to keep the same
proportions and say, that
given that we are inside the set
B, we still want outcomes
inside here to be twice as likely as outcomes there.
So the proportion of the
probabilities should be two
versus one.
And these probabilities should add up to one, because together they make up the conditional probability of B given B, which is one. So the
conditional probabilities should
be 2/3 probability of
being here and 1/3 probability
of being there.
That's how we revise
our probabilities.
That's a reasonable, intuitively
reasonable, way of
doing this revision.
Let's translate what we
did into a definition.
The definition says the
following, that the
conditional probability of A
given that B occurred is
calculated as follows.
We look at the total probability
of B. And out of
that probability that was inside
here, what fraction of
that probability is assigned to
points for which the event
A also occurs?
Does it give us the same numbers
as we got with this
heuristic argument?
Well in this example,
probability of A intersection
B is 2/6, divided by total
probability of B, which is
3/6, and so it's 2/3, which agrees with the answer that we got before.
So the formula indeed matches what we were trying to do.
One little technical detail.
If the event B has zero probability, then here we have a ratio that doesn't make sense.
So in this case, we say that
conditional probabilities are
not defined.
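The definition can be checked numerically. The sketch below (an illustration I am adding, not part of the lecture) plugs in the probabilities from the figure: P(A and B) = 2/6 and P(B) = (2+1)/6.

```python
# Numeric check of the definition P(A | B) = P(A and B) / P(B),
# using the probabilities from the figure: P(A and B) = 2/6,
# and the part of B outside A has probability 1/6.
from fractions import Fraction

p_A_and_B = Fraction(2, 6)
p_B_only = Fraction(1, 6)
p_B = p_A_and_B + p_B_only      # (2+1)/6 = 1/2

p_A_given_B = p_A_and_B / p_B
print(p_A_given_B)              # 2/3, as in the heuristic argument
```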
Now you can take this definition
and unravel it and
write it in this form.
The probability of A
intersection B is the
probability of B times the
conditional probability.
So this is just consequence of
the definition but it has a
nice interpretation.
Think of probabilities
as frequencies.
If I do the experiment over and
over, what fraction of the
time is it going to be the case
that both A and B occur?
Well, there's going to be a
certain fraction of the time
at which B occurs.
And out of those times when B
occurs, there's going to be a
further fraction of
the experiments in
which A also occurs.
So interpret the conditional
probability as follows.
You only look at those experiments in which B happens to occur.
And look at what fraction of
those experiments where B
already occurred, event
A also occurs.
And there's a symmetrical
version of this equality.
There's symmetry between the
events B and A. So you also
have this relation that
goes the other way.
OK, so what do we use these
conditional probabilities for?
First, one comment.
Conditional probabilities
are just like ordinary
probabilities.
They're the new probabilities
that apply in a new universe
where event B is known
to have occurred.
So we had an original
probability model.
We are told that B occurs.
We revise our model.
Our new model should still be
legitimate probability model.
So it should satisfy all sorts
of properties that ordinary
probabilities do satisfy.
So for example, if A and B are
disjoint events, then we know
that the probability of A
union B is equal to the
probability of A plus
probability of B. And now if I
tell you that a certain event C
occurred, we're placed in a
new universe where
event C occurred.
We have new probabilities
for that universe.
These are the conditional
probabilities.
And conditional probabilities
also satisfy
this kind of property.
So this is just our usual additivity axiom, but applied in a new model, in which we were told that event C occurred.
So conditional probabilities
do not taste or smell any
different than ordinary
probabilities do.
Conditional probabilities, given
a specific event B, just
form a probability law
on our sample space.
It's a different probability
law but it's still a
probability law that has all
of the desired properties.
OK, so where do conditional
probabilities come up?
They do come up in quizzes
and they do
come up in silly problems.
So let's start with this.
We have this example
from last time.
Two rolls of a four-sided die, all possible pairs of rolls are equally likely, so every element in this square has probability 1/16.
So all elements are
equally likely.
That's our original model.
Then somebody comes and tells us that the minimum of the two rolls is equal to two. What's that event? The minimum equal to two can happen in many ways: if we get two twos, or if we get a two and something larger. And so this is our new event B.
The red event is the event B.
And now we want to calculate
probabilities
inside this new universe.
For example, you may be interested in questions about the maximum of the two rolls.
In the new universe, what's
the probability that the
maximum is equal to one?
The maximum being equal to
one is this black event.
And given that we're told that
B occurred, this black event cannot happen.
So this probability
is equal to zero.
How about the maximum being equal to two, given the event B?
OK, we can use the
definition here.
It's going to be the probability
that the maximum
is equal to two and B occurs
divided by the probability of
B. The probability that the
maximum is equal to two.
OK, what's the event that the
maximum is equal to two?
Let's draw it.
This is going to be
the blue event.
The maximum is equal to
two if we get any
of those blue points.
So the intersection of the two
events is the intersection of
the red event and
the blue event.
There's only one point in
their intersection.
So the probability of
that intersection
happening is 1/16.
That's the numerator.
How about the denominator?
The event B consists of five
elements, each one of which
had probability of 1/16.
So that's 5/16.
And so the answer is 1/5.
Could we have gotten this
answer in a faster way?
Yes.
Here's how it goes.
We're trying to find the
conditional probability that
we get this point, given
that B occurred.
B consists of five elements.
All of those five elements were
equally likely when we
started, so they remain equally
likely afterwards.
Because when we define
conditional probabilities, we
keep the same proportions
inside the set.
So the five red elements
were equally likely.
They remain equally likely
in the conditional world.
So conditioned on event B having happened, each one of these five elements has the same probability.
So the probability that we
actually get this point is
going to be 1/5.
And so that's the shortcut.
More generally, whenever you
have a uniform distribution on
your initial sample space,
when you condition on an
event, your new distribution is
still going to be uniform,
but on the smaller event that you conditioned on.
So we started with a uniform
distribution on the big square
and we ended up with a uniform distribution just on the red points.
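The dice example can be verified by brute-force enumeration. The sketch below is my own illustration, assuming a four-sided die so that there are 16 equally likely pairs; it conditions on the five-element red event where the minimum of the two rolls is two.

```python
# Enumerating the dice example: two rolls of a four-sided die, all 16
# pairs equally likely. Condition on B = {min of the two rolls is 2}
# and ask for P(max = 2 | B).
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 5) for j in range(1, 5)]
B = [o for o in outcomes if min(o) == 2]
A_and_B = [o for o in B if max(o) == 2]   # only the pair (2, 2)

p = Fraction(len(A_and_B), len(B))
print(len(B), p)  # 5 outcomes in B, conditional probability 1/5
```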
Now besides silly problems,
however, conditional
probabilities show up in real
and interesting situations.
And this example is going
to give you some
idea of how that happens.
OK.
Actually, in this example,
instead of starting with a
probability model in terms of
regular probabilities, I'm
actually going to define the
model in terms of conditional
probabilities.
And we'll see how
this is done.
So here's the story.
There may be an airplane flying
up in the sky, in a
particular sector of the sky
that you're watching.
Sometimes there is one, sometimes there isn't.
And from experience you know
that when you look up, there's
five percent probability that
the plane is flying above
there and 95% probability that
there's no plane up there.
So event A is the event that the
plane is flying out there.
Now you bought this wonderful radar that looks up.
And you're told in the
manufacturer's specs that, if
there is a plane out there,
your radar is going to
register something, a
blip on the screen
with probability 99%.
And it will not register
anything with
probability one percent.
So this particular part of the
picture is a self-contained
probability model of what your
radar does in a world where a
plane is out there.
So I'm telling you that the
plane is out there.
So we're now dealing with
conditional probabilities
because I gave you some
particular information.
Given this information that the
plane is out there, that's
how your radar is going to
behave: with probability 99% it is going to detect it, with probability one percent it is going to miss it.
So this piece of the picture
is a self-contained
probability model.
The probabilities
add up to one.
But it's a piece of
a larger model.
Similarly, there's the
other possibility.
Maybe a plane is not up there
and the manufacturer's specs
tell you something about
false alarms.
A false alarm is the situation
where the plane is not there,
but for some reason your radar
picked up some noise or
whatever and shows a
blip on the screen.
And suppose that this happens
with probability ten percent.
Whereas with probability
90% your radar
gives the correct answer.
So this is sort of a model of
what's going to happen with
respect to both the plane --
we're given probabilities
about this -- and we're given
probabilities about how the
radar behaves.
So here I have indirectly
specified the probability law
in our model by starting with
conditional probabilities as
opposed to starting with
ordinary probabilities.
Can we derive ordinary
probabilities starting from
the conditional ones? Yes, we certainly can.
Let's look at this event, A
intersection B, which is the
event up here, that there
is a plane and our
radar picks it up.
How can we calculate
this probability?
Well we use the definition of
conditional probabilities and
this is the probability of
A times the conditional
probability of B given A.
So it's 0.05 times 0.99.
And the answer, in
case you care--
It's 0.0495.
OK.
So we can calculate the
probabilities of final
outcomes, which are the leaves
of the tree, by using the
probabilities that
we have along the
branches of the tree.
So essentially, what we ended
up doing was to multiply the
probability of this
branch times the
probability of that branch.
Now, how about the answer
to this question.
What is the probability
that our radar is
going to register something?
OK, this is an event that can
happen in multiple ways.
It's the event that consists of this outcome (there is a plane and the radar registers something), together with this outcome (there is no plane but the radar still registers something).
So to find the probability of
this event, we need the
individual probabilities
of the two outcomes.
For the first outcome, we
already calculated it.
For the second outcome, the
probability that this happens
is going to be this probability
95% times 0.10,
which is the conditional
probability for taking this
branch, given that there
was no plane out there.
So we just add the numbers.
0.05 times 0.99 plus 0.95
times 0.1 and the
final answer is 0.1445.
OK.
And now here's the interesting
question.
Given that your radar recorded
something, how likely is it
that there is an airplane
up there?
Your radar registering
something --
that can be caused
by two things.
Either there's a plane there,
and your radar did its job.
Or there was nothing, but your
radar fired a false alarm.
What's the probability that this
is the case as opposed to
that being the case?
OK.
The intuitive shortcut would
be that it should be the
probability--
you look at their relative odds
of these two elements and
you use them to find out how
much more likely it is to be
there as opposed
to being there.
But instead of doing this,
let's just write down the
definition and just use it.
It's the probability of A and
B happening, divided by the
probability of B. This is just
our definition of conditional
probabilities.
Now we have already found
the numerator.
We have already calculated
the denominator.
So we take the ratio of these
two numbers and we find the
final answer --
which is 0.34.
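The radar numbers can be reproduced end to end. Here is a short sketch (my own, not from the lecture) using only the probabilities given above: the prior, the total probability of a blip, and then the posterior by the definition of conditional probability.

```python
# The radar calculation, end to end: prior probability of a plane,
# total probability of a blip, then the posterior P(plane | blip)
# by the definition of conditional probability.
p_plane = 0.05
p_blip_given_plane = 0.99
p_blip_given_no_plane = 0.10   # false-alarm rate

p_plane_and_blip = p_plane * p_blip_given_plane                     # 0.0495
p_blip = p_plane_and_blip + (1 - p_plane) * p_blip_given_no_plane   # 0.1445

p_plane_given_blip = p_plane_and_blip / p_blip
print(round(p_plane_given_blip, 4))  # 0.3426, the surprisingly low answer
```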
OK.
There's this slightly
curious thing that's
happened in this example.
Doesn't this number feel
a little too low?
My radar --
So this is a conditional
probability, given that my
radar said there is something
out there, that there is
indeed something there.
So it's sort of the probability
that our radar
gave the correct answer.
Now, the specs of our radar were pretty good.
In this situation, it gives
you the correct
answer 99% of the time.
In this situation, it gives
you the correct
answer 90% of the time.
So you would think that your radar is really reliable.
But yet here the radar recorded something, and the chance that the answer you get out of it is the right one, the chance that there is indeed an airplane out there given that it recorded something, is only about 34%.
So you cannot really rely on
the measurements from your
radar, even though the specs of
the radar were really good.
What's the reason for this?
Well, the reason is that false
alarms are pretty common.
Most of the time there's
nothing.
And there's a ten percent
probability of false alarms.
So there's roughly a ten percent
probability that in
any given experiment, you
have a false alarm.
And there is about a five percent probability that something is out there and your radar gets it.
So when your radar records
something, it's actually more
likely to be a false
alarm rather than
being an actual airplane.
This has probability ten
percent roughly.
This has probability roughly five percent.
So conditional probabilities
are sometimes
counter-intuitive in terms of the answers that they give.
And you can make similar
stories about doctors
interpreting the results
of tests.
So you tested positive for
a certain disease.
Does it mean that you have
the disease necessarily?
Well if that disease has been
eradicated from the face of
the earth, testing positive
doesn't mean that you have the
disease, even if the test
was designed to be
a pretty good one.
So unfortunately, doctors do get
it wrong also sometimes.
And the reasoning that
comes in such
situations is pretty subtle.
Now for the rest of the lecture,
what we're going to
do is to take this example where
we did three things and
abstract them.
These three trivial calculations that we just did are three very important,
very basic tools that you use
to solve more general
probability problems.
So what's the first one?
We find the probability of a
composite event, two things
happening, by multiplying
probabilities and conditional
probabilities.
More general version of this,
look at any situation, maybe
involving lots and
lots of events.
So here's the story: event A may happen or may not happen.
Given that A occurred, it's
possible that B happens or
that B does not happen.
Given that B also happens, it's
possible that the event C
also happens or that event
C does not happen.
And somebody specifies for you
a model by giving you all
these conditional probabilities
along the way.
Notice how we move along the branches as the tree progresses.
Any point in the tree
corresponds to certain events
having happened.
And then, given that this
has happened, we specify
conditional probabilities.
Given that this has happened, how likely is it that C also occurs?
Given a model of this kind, how do we find the probability of this event?
The answer is extremely
simple.
All that you do is move along
with the tree and multiply
conditional probabilities
along the way.
So in terms of frequencies, how
often do all three things
happen, A, B, and C?
You first see how often
does A occur.
Out of the times that
A occurs, how
often does B occur?
And out of the times where both
A and B have occurred,
how often does C occur?
And you can just multiply those
three frequencies with
each other.
What is the formal
proof of this?
Well, the only thing we have in
our hands is the definition
of conditional probabilities.
So let's just use this.
And--
OK.
Now, the definition of
conditional probabilities
tells us that the probability
of two things is the
probability of one of them
times a conditional
probability.
Unfortunately, here we have the
probability of three things.
What can I do?
I can put a parenthesis in here
and think of this as the
probability of this and that
and apply our definition of
conditional probabilities
here.
The probability of two things happening is the probability that the first one happens times the conditional probability that the second one happens, given that the first one happened. Here, that's the probability of C, given A and B.
So this is just the definition
of the conditional probability
of an event, given
another event.
That other event is a
composite one, but
that's not an issue.
It's just an event.
And then we use the definition
of conditional probabilities
once more to break this apart
and make it P(A), P(B given A)
and then finally,
the last term.
OK.
So this proves the formula
that I have up
there on the slides.
And you can calculate any other probability in this diagram similarly. For example, if you want to calculate this probability, you would still multiply the conditional probabilities along the different branches of the tree.
In particular, here in this
branch, you would have the
conditional probability of
C complement, given A
intersection B complement,
and so on.
So you write down probabilities
along all those
tree branches and just multiply
them as you go.
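The multiplication rule can be checked by brute force on any small discrete model. Below is a sketch on a model of my own choosing, three fair coin tosses, which is not from the lecture; the events A, B, and C are arbitrary choices for illustration.

```python
# Brute-force check of the multiplication rule
# P(A and B and C) = P(A) * P(B | A) * P(C | A and B)
# on a toy model: three fair coin tosses, all 8 outcomes equally likely.
from fractions import Fraction
from itertools import product

omega = list(product('HT', repeat=3))

def P(event):
    """Probability of an event under the uniform law on omega."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == 'H'          # first toss is heads
B = lambda w: w.count('H') >= 2    # at least two heads
C = lambda w: w.count('H') == 3    # all heads

p_AB = P(lambda w: A(w) and B(w))
p_ABC = P(lambda w: A(w) and B(w) and C(w))

lhs = p_ABC
rhs = P(A) * (p_AB / P(A)) * (p_ABC / p_AB)  # chain of conditionals
assert lhs == rhs
print(lhs)  # 1/8
```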
So this was the first skill
that we are covering.
What was the second one?
What we did was to calculate
the total probability of a
certain event B that was made up from different possibilities, which corresponded to different scenarios.
So we wanted to calculate the
probability of this event B
that consisted of those
two elements.
Let's generalize.
So we have our big model.
And this sample space
is partitioned
in a number of sets.
In our radar example, we had
a partition in two sets.
Either a plane is there, or
a plane is not there.
Since we're trying to
generalize, now I'm going to
give you a picture for the case
of three possibilities or
three possible scenarios.
So whatever happens in the
world, there are three
possible scenarios,
A1, A2, A3.
So think of these as there's
nothing in the air, there's an
airplane in the air, or there's
a flock of geese
flying in the air.
So there's three possible
scenarios.
And then there's a certain event
B of interest, such as a
radar records something or
doesn't record something.
We specify this model by giving
probabilities for the Ai's--
that is, the probabilities of the different scenarios.
And somebody also gives us the
probabilities that this event
B is going to occur, given that the i-th scenario Ai has occurred.
Think of the Ai's
as scenarios.
And we want to calculate the
overall probability of the
event B. What's happening
in this example?
Perhaps, instead of this
picture, it's easier to
visualize if I go back to the
picture I was using before.
We have three possible
scenarios, A1, A2, A3.
And under each scenario, B may
happen or B may not happen.
And so on.
So here we have A2 intersection
B. And here we
have A3 intersection B. In the
previous slide, we found how
to calculate the probability
of any event of this kind,
which is done by multiplying
probabilities here and
conditional probabilities
there.
Now we are asked to calculate
the total probability of the
event B. The event B can happen
in three possible ways.
It can happen here.
It can happen there.
And it can happen here.
So this is our event B. It
consists of three elements.
To calculate the total
probability of our event B,
all we need to do is to add
these three probabilities.
So B is an event that consists
of these three elements.
There are three ways
that B can happen.
Either B happens together with
A1, or B happens together with
A2, or B happens together
with A3.
So we need to add the
probabilities of these three
contingencies.
For each one of those
contingencies, we can
calculate its probability by
using the multiplication rule.
So the probability of A1 and
B happening is this--
It's the probability of A1
and then B happening
given that A1 happens.
The probability of this
contingency is found by taking
the probability that A2 happens
times the conditional
probability of B, given
that A2 happened.
And similarly for
the third one.
So this is the general rule
that we have here.
The rule is written for the
case of three scenarios.
But obviously, it has a
generalization for the case of
four or five or more
scenarios.
It gives you a way of breaking
up the calculation of an event
that can happen in multiple ways
by considering individual
probabilities for the different
ways that the event
can happen.
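The calculation described above can be sketched in a few lines of Python. The scenario probabilities and conditional probabilities below are hypothetical numbers chosen only for illustration, not values from the lecture.

```python
# Total probability theorem: P(B) = sum over i of P(A_i) * P(B | A_i)
# The numbers below are hypothetical, chosen only to illustrate the rule.
p_A = [0.5, 0.3, 0.2]           # P(A1), P(A2), P(A3): a partition, sums to 1
p_B_given_A = [0.9, 0.5, 0.1]   # P(B | A1), P(B | A2), P(B | A3)

# One term per branch of the tree: P(A_i and B) by the multiplication rule.
joint = [pa * pb for pa, pb in zip(p_A, p_B_given_A)]

# Summing over the branches gives the total probability of B.
p_B = sum(joint)
print(round(p_B, 2))  # 0.62
```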
OK.
So--
Yes?
AUDIENCE: Does this
have to change for
infinite sample space?
JOHN TSISIKLIS: No.
This is true whether
your sample space
is infinite or finite.
What I'm using in this argument
is that we have a
partition into just three
scenarios, three events.
So it's a partition to a finite
number of events.
It's also true if it's a
partition into an infinite
sequence of events.
But that's, I think, one of the
theoretical problems at
the end of the chapter.
You probably won't
need it for now.
OK, going back to
the story here.
There are three possible
scenarios about what could
happen in the world that
are captured here.
Under each scenario,
event B may or may not happen.
And so these probabilities tell
us the likelihoods of the
different scenarios.
These conditional probabilities
tell us how
likely is it for B to happen
under one scenario, or the
other scenario, or the
other scenario.
The overall probability of
B is found by taking some
combination of the probabilities
of B in the
different possible
worlds, in the
different possible scenarios.
Under some scenario, B
may be very likely.
Under another scenario, it
may be very unlikely.
We take all of these into
account and weigh them
according to the likelihood
of the scenarios.
Now notice that since A1, A2,
and A3 form a partition,
these three probabilities
have what property?
Add to what?
They add to one.
So it's the probability of this
branch, plus this branch,
plus this branch.
So what we have here is a
weighted average of the
probabilities of the B's into
the different worlds, or in
the different scenarios.
Special case.
Suppose the three scenarios
are equally likely.
So P of A1 equals P of A2
equals P of A3 equals 1/3.
What are we saying here?
In that case of equally likely
scenarios, the probability of
B is the average of the
probabilities of B in the
three different worlds, or in the
three different scenarios.
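This special case can be checked directly: with equal priors, the weighted average collapses to a plain average. The conditional probabilities below are again hypothetical values used only for illustration.

```python
# Equally likely scenarios: P(A_i) = 1/3 for each i, so
# P(B) reduces to the plain average of the P(B | A_i).
p_B_given_A = [0.9, 0.5, 0.1]   # hypothetical conditional probabilities

p_B = sum(p_B_given_A) / len(p_B_given_A)
print(round(p_B, 2))  # 0.5
```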
OK.
So finally, the last step.
If we go back again two slides,
the last thing that we
did was to calculate a
conditional probability of
this kind, probability of
A given B, which is a
probability associated
essentially with
an inference problem.
Given that our radar recorded
something, how likely is it
that the plane was up there?
So we're trying to infer whether
a plane was up there
or not, based on the information
that we've got.
So let's generalize once more.
And we're just going to rewrite
what we did in that
example, but in terms of general
symbols instead of the
specific numbers.
So once more, the model that we
have involves probabilities
of the different scenarios.
These we call them prior
probabilities.
They are our initial beliefs
about how likely each
scenario is to occur.
We also have a model of our
measuring device that tells us
under that scenario how likely
is it that our radar will
register something or not.
So we're given again these
conditional probabilities.
We're given the conditional
probabilities for these branches.
Then we are told that
event B occurred.
And on the basis of this new
information, we want to form
some new beliefs about the
relative likelihood of the
different scenarios.
Going back again to our radar
example, an airplane was
present with probability 5%.
Given that the radar recorded
something, we're going to
change our beliefs.
Now, a plane is present
with probability 34%.
Since the radar saw
something, we are going to
revise our beliefs as to whether
the plane is out there or not.
And so what we need to do is to
calculate the conditional
probabilities of the different
scenarios, given the
information that we got.
So initially, we have these
probabilities for the
different scenarios.
Once we get the information,
we update them and we
calculate our revised
probabilities or conditional
probabilities given the
observation that we made.
OK.
So what do we do?
We just use the definition
of conditional
probabilities twice.
By definition the conditional
probability is the probability
of two things happening divided
by the probability of
the conditioning event.
Now, I'm using the definition
of conditional probabilities
once more, or rather I use
the multiplication rule.
The probability of two things
happening is the probability
of the first and the second.
So these are things that
are given to us.
They're the probabilities of
the different scenarios.
And it's the model of our
measuring device, which we
assume to be available.
And how about the denominator?
This is total probability of the
event B. But we just found
that it's easy to calculate
using the formula in the
previous slide.
To find the overall probability
of event B
occurring, we look at the
probabilities of B occurring
under the different scenarios
and weigh them according to
the probabilities of
all the scenarios.
So in the end, we have a formula
for the conditional
probability, A's given B,
based on the data of the
problem, which were
probabilities of the different
scenarios and conditional
probabilities of
B, given the A's.
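Applied to the radar example, the formula looks like this in Python. The 5% prior and the 34% answer are the numbers quoted in the lecture; the two detection probabilities (0.99 and 0.10) are assumptions filled in here so that the arithmetic works out.

```python
# Bayes rule for the radar example.
# The 0.05 prior is from the lecture; the detection probabilities
# below are assumed values for illustration.
p_plane = 0.05                  # prior: P(plane present)
p_B_given_plane = 0.99          # assumed: P(radar registers | plane)
p_B_given_no_plane = 0.10       # assumed: P(radar registers | no plane)

# Denominator: total probability that the radar registers something.
p_B = (p_plane * p_B_given_plane
       + (1 - p_plane) * p_B_given_no_plane)

# Posterior: P(plane | radar registers), reversing the conditioning.
p_plane_given_B = p_plane * p_B_given_plane / p_B
print(round(p_plane_given_B, 2))  # 0.34
```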
So what this calculation does
is, basically, it reverses the
order of conditioning.
We are given conditional
probabilities of these kind,
where it's B given A and we
produce new conditional
probabilities, where things
go the other way.
So schematically, what's
happening here is that we have
a model of cause and effect.
So a scenario occurs and that
may cause B to happen or may
not cause it to happen.
So this is a cause/effect
model.
And it's modeled using
probabilities, such as
probability of B given Ai.
And what we want to do is
inference where we are told
that B occurs, and we want
to infer whether Ai
also occurred or not.
And the appropriate
probabilities for that are the
conditional probabilities
that A occurred,
given that B occurred.
So we're starting with a causal
model of our situation.
It models from a given cause how
likely is a certain effect
to be observed.
And then we do inference, which
answers the question,
given that the effect was
observed, how likely is it
that the world was in this
particular situation or state
or scenario.
So the name of the Bayes rule
comes from Thomas Bayes, a
British theologian back
in the 1700s.
It actually--
This calculation addresses
a basic problem, a basic
philosophical problem, how one
can learn from experience or
from experimental data in
some systematic way.
So the British at that time
were preoccupied with this
type of question.
Is there a basic theory
about how we can incorporate
new knowledge into previous
knowledge?
And this calculation made an
argument that, yes, it is
possible to do that in
a systematic way.
So the philosophical
underpinnings of this have a
very long history and a lot
of discussion around them.
But for our purposes, it's just
an extremely useful tool.
And it's the foundation of
almost everything that gets
done when you try to do
inference based on partial
observations.
Very well.
Till next time.
