The following content is
provided under a Creative
Commons license.
Your support will help
MIT OpenCourseWare
continue to offer high quality
educational resources for free.
To make a donation or
view additional materials
from hundreds of MIT courses,
visit MIT OpenCourseWare
at ocw.mit.edu.
PROFESSOR: OK, so
welcome to 6.041/6.431,
the class on probability
models and the like.
I'm John Tsitsiklis.
I will be teaching
this class, and I'm
looking forward to this being
an enjoyable and also useful
experience.
We have a fair amount
of staff involved
in this course, your
recitation instructors
and also a bunch of TAs, but
I want to single out our head
TA, Uzoma, who is the
key person in this class.
Everything has to
go through him.
If he doesn't know in
which recitation section
you are, then simply you do not
exist, so keep that in mind.
All right.
So we want to jump
right into the subject,
but I'm going to take
just a few minutes
to talk about a few
administrative details
and how the course is run.
So we're going to have
lectures twice a week
and I'm going to use
old-fashioned transparencies.
Now, you get copies of these
slides with plenty of space
for you to keep notes on them.
A useful way of making
good use of the slides
is to use them as a sort
of mnemonic summary of what
happens in lecture.
Not everything that
I'm going to say
is, of course, on the
slides, but by looking at them
you get a sense of
what's happening right now.
And it may be a good
idea to review them
before you go to recitation.
So what happens in recitation?
In recitation, your
recitation instructor
is going to maybe review
some of the theory
and then solve some
problems for you.
And then you have
tutorials where
you meet in very small
groups together with your TA.
And what happens in tutorials
is that you actually
do the problem solving
with the help of your TA
and the help of your classmates
in your tutorial section.
Now probability is
a tricky subject.
You may be reading the
text, listening to lectures,
everything makes perfect
sense, and so on,
but until you actually sit
down and try to solve problems,
you don't quite appreciate the
subtleties and the difficulties
that are involved.
So problem solving is a
key part of this class.
And tutorials are extremely
useful just for this reason
because that's
where you actually
get the practice of solving
problems on your own,
as opposed to seeing someone
else who's solving them
for you.
OK. Now, mechanics: a key part
of what's going to happen today
is that you will turn in
your schedule forms that
are at the end of the handout
that you have in your hands.
Then, the TAs will be working
frantically through the night,
and they're going to be
producing a list of who
goes into what section.
And when that happens,
any person in this class,
with probability
90%, is going to be
happy with their assignment
and, with probability 10%,
they're going to be unhappy.
Now, unhappy people
have an option, though.
You can resubmit
your form together
with your full schedule
and constraints,
give it back to the
head TA, who will then
do some further juggling
and reassign people,
and after that happens,
90% of those unhappy people
will become happy.
And 10% of them will
be less unhappy.
OK.
So what's the probability
that a random person
is going to be unhappy at
the end of this process?
It's 1%.
Excellent.
Good.
Maybe you don't need this class.
OK, so 1%.
We have about 100
people in this class,
so there's going to be
about one unhappy person.
I mean, anywhere you look
in life, in any group
you look at, there's always
one unhappy person, right?
So, what can we do about it?
All right.
Another important
part about mechanics
is to read carefully
the statement
that we have about
collaboration,
academic honesty, and all that.
You're encouraged,
it's a very good idea
to work with other students.
You can consult sources
that are out there, but when
you sit down and
write your solutions
you have to do that by
setting things aside
and just write them on your own.
You cannot copy something that
somebody else has given to you.
One reason is that
we're not going
to like it when it happens,
and then another reason
is that you're not going
to do yourself any favors.
Really the only way to
do well in this class
is to get a lot of practice by
solving problems yourselves.
So if you don't do
that on your own,
then when quiz and
exam time comes,
things are going
to be difficult.
So, as I mentioned
here, we're going
to have recitation
sections; some of them
are for 6.041 students,
and some are for 6.431 students,
the graduate section
of the class.
Now undergraduates can sit
in the graduate recitation
sections.
What's going to happen
there is that things
may be just a little
faster and you
may be covering a problem
that's a little more advanced
and is not covered in
the undergrad sections.
But if you sit in
the graduate section,
and you're an
undergraduate, you're
still just responsible for
the undergraduate material.
That is, you can just do
the undergraduate work
in the class, but
you may be exposed
to a bit more in that section.
OK.
A few words about the
style of this class.
We want to focus on
basic ideas and concepts.
There's going to be
lots of formulas,
but what we try to
do in this class
is to actually
have you understand
what those formulas mean.
And a year from now, when
almost all of the formulas
have been wiped
from your memory,
you will still have the
basic concepts.
You can understand them, so
when you look things up again,
they will still make sense.
It's not the plug and
chug kind of class
where you're given a list of
formulas, you're given numbers,
and you plug in and
you get answers.
The really hard part is
usually to choose which
formulas you're going to use.
You need judgment,
you need intuition.
Lots of probability problems,
at least the interesting ones,
often have lots of
different solutions.
Some are extremely long,
some are extremely short.
The extremely short ones
usually involve some kind
of deeper understanding of
what's going on so that you
can pick a shortcut and use it.
And hopefully you are
going to develop this skill
during this class.
Now, I could spend a lot of
time in this lecture talking
about why the
subject is important.
I'll keep it short because
I think it's almost obvious.
Anything that happens
in life is uncertain.
There's uncertainty anywhere,
so whatever you try to do,
you need to have some way
of dealing or thinking
about this uncertainty.
And the way to do that
in a systematic way
is by using the
models that are given
to us by probability theory.
So if you're an
engineer and you're
dealing with a communication
system or signal processing,
basically you're facing
a fight against noise.
Noise is random, it's uncertain.
How do you model it?
How do you deal with it?
If you're a manager,
I guess you're
dealing with customer demand,
which is, of course, random.
Or you're dealing
with the stock market,
which is definitely random.
Or you play at the casino, which
is, again, random, and so on.
And the same goes for
pretty much any other field
that you can think of.
But, independent of which
field you're coming from,
the basic concepts and tools
are really all the same.
So you may see in
bookstores that there
are books, probability
for scientists,
probability for
engineers, probability
for social scientists,
probability for astrologists.
Well, what all those
books have inside
them is exactly the same
models, the same equations,
the same problems.
They just make them somewhat
different word problems.
The basic concepts are
just one and the same,
and we'll take this
as an excuse for not
going too much into specific
domain applications.
We will have
problems and examples
that are motivated,
in some loose sense,
from real world situations.
But we're not really
trying in this class
to develop the skills for
domain-specific problems.
Rather, we're going to try to
stick to general understanding
of the subject.
OK.
So the next slide, which
you have in your handout,
gives you a few more
details about the class.
Maybe one thing to
comment here is that you
do need to read the text.
And with calculus
books, perhaps you
can live with just a
two-page summary of all
of the interesting
formulas in calculus,
and you can get by just
with those formulas.
But here, because we want to
develop concepts and intuition,
actually reading words, as
opposed to just browsing
through equations,
does make a difference.
In the beginning, the
class is kind of easy.
When we deal with
discrete probability,
that's the material until our
first quiz, and some of you
may get by without being too
systematic about following
the material.
But it does get substantially
harder afterwards.
And I would keep
restating that you
do have to read the text to
really understand the material.
OK.
So now we can start with the
real part of the lecture.
Let us set the goals for today.
So probability, or
probability theory,
is a framework for
dealing with uncertainty,
for dealing with
situations in which we
have some kind of randomness.
So what we want to do is, by
the end of today's lecture,
to tell you everything
that you need to know
about what it takes to set up
a probabilistic model.
And what are the basic rules
of the game for dealing
with probabilistic models?
So, by the end of
this lecture, you
will have essentially recovered
half of this semester's
tuition, right?
So we're going to talk about
probabilistic models in more
detail--
the sample space,
which is basically
a description of
all the things that
may happen during a
random experiment,
and the probability law,
which describes our beliefs
about which outcomes are
more likely to occur compared
to other outcomes.
Probability laws have to obey
certain properties that we
call the axioms of probability.
So the main part
of today's lecture
is to describe
those axioms, which
are the rules of the
game, and consider
a few really trivial examples.
OK, so let's start
with our agenda.
The first piece in a
probabilistic model
is a description of the
sample space of an experiment.
So we do an experiment,
and by experiment we
just mean that
something happens out there.
And that something that happens,
it could be flipping a coin,
or it could be rolling
a die, or it could be
doing something in a card game.
So we fix a
particular experiment.
And we come up with a list of
all the possible things that
may happen during
this experiment.
So we write down a list of
all the possible outcomes.
So here's a list of all
the possible outcomes
of the experiment.
I use the word
"list," but, if you
want to be a little
more formal, it's better
to think of that list as a set.
So we have a set.
That set is our sample space.
And it's a set whose elements
are the possible outcomes
of the experiment.
So, for example, if you're
dealing with flipping a coin,
your sample space would be
heads, this is one outcome,
tails is one outcome.
And this set, which
has two elements,
is the sample space
of the experiment.
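As a minimal sketch (my own illustration, not from the lecture), the coin-flip sample space can be represented as a Python set with two elements:

```python
# The sample space for a single coin flip: two outcomes,
# mutually exclusive and collectively exhaustive.
sample_space = {"H", "T"}  # heads and tails

print(len(sample_space))  # the set has exactly two elements
```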
OK.
What do we need to
think about when we're
setting up the sample space?
First, the list should
be mutually exclusive,
collectively exhaustive.
What does that mean?
Collectively
exhaustive means that,
no matter what happens
in the experiment,
you're going to get one of
the outcomes inside here.
So you have not forgotten any
of the possibilities of what
may happen in the experiment.
Mutually exclusive means
that if this happens,
then that cannot happen.
So at the end of
the experiment, you
should be able to point out
to me just one, exactly one,
of these outcomes and say, this
is the outcome that happened.
OK.
So these are sort of
basic requirements.
There's another requirement
which is a little more loose.
When you set up
your sample space,
sometimes you do
have some freedom
about the details of how
you're going to describe it.
And the question
is, how much detail
are you going to include?
So let's take this coin
flipping experiment
and think of the
following sample space.
One possible outcome is heads,
a second possible outcome
is tails and it's raining,
and the third possible outcome
is tails and it's not raining.
So this is another possible
sample space for the experiment
where I flip a coin just once.
It's a legitimate one.
These three possibilities
are mutually exclusive
and collectively exhaustive.
Which one is the
right sample space?
Is it this one or that one?
Well, if you think that my
coin flipping inside this room
is completely unrelated
to the weather outside,
then you're going to stick
with this sample space.
If, on the other hand, you
have some superstitious belief
that maybe rain has
an effect on my coins,
you might work with the
sample space of this kind.
So you probably
wouldn't do that,
but it's a legitimate
option, strictly speaking.
Now this example is a little
bit on the frivolous side,
but the issue that
comes up here is
a basic one that
shows up anywhere
in science and engineering.
Whenever you're dealing with
a model or with a situation,
there are zillions of
details in that situation.
And when you come
up with a model,
you choose some of those details
that you keep in your model,
and some that you say,
well, these are irrelevant.
Or maybe there are small
effects, I can neglect them,
and you keep them
outside your model.
So when you go to
the real world,
there's definitely an element
of art and some judgment
that you need to do in order
to set up an appropriate sample
space.
So, an easy example now.
So of course, the
elementary examples
are coins, cards, and dice.
So let's deal with dice.
But to keep the diagram small,
instead of a six-sided die,
we're going to think about a
die that only has four faces.
So you can do that
with a tetrahedron,
doesn't really matter.
Basically, it's a die
that when you roll it,
you get a result which is
one, two, three or four.
However, the experiment that
I'm going to think about
will consist of two
rolls of a die.
A crucial point here--
I'm rolling the
die twice, but I'm
thinking of this as
just one experiment, not
two different experiments,
not a repetition twice
of the same experiment.
So it's one big experiment.
During that big
experiment various things
could happen, such as
I'm rolling the die once,
and then I'm rolling
the die a second time.
OK.
So what's the sample
space for that experiment?
Well, the sample space consists
of the possible outcomes.
One possible outcome
is that your first roll
resulted in two and the
second roll resulted in three.
In which case, the outcome
that you get is this one,
a two followed by three.
This is one possible outcome.
The way I'm describing
things, this outcome
is to be distinguished
from this outcome
here, where a three
is followed by two.
If you're playing
backgammon, it doesn't matter
which one of the two happened.
But if you're dealing
with a probabilistic model
in which you want to keep track
of everything that happens
in this composite
experiment, there
are good reasons
for distinguishing
between these two outcomes.
I mean, when this happens,
it's definitely something
different from that happening.
A two followed by a three is
different from a three followed
by a two.
So this is the correct sample
space for this experiment
where we roll the die twice.
It has a total of 16
elements and it's, of course,
a finite set.
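This 16-element sample space can be enumerated mechanically. Here is a sketch in Python (my own illustration, assuming ordered pairs as outcomes, as in the lecture):

```python
from itertools import product

# Sample space for two rolls of a four-sided die: each outcome is an
# ordered pair (first roll, second roll), so (2, 3) and (3, 2) are
# distinct outcomes.
faces = [1, 2, 3, 4]
sample_space = list(product(faces, repeat=2))

print(len(sample_space))  # 16 outcomes in total
```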
Sometimes, instead of
describing sample spaces
in terms of lists, or sets,
or diagrams of this kind,
it's useful to
describe the experiment
in some sequential way.
Whenever you have
an experiment that
consists of multiple
stages, it might
be useful, at least visually,
to give a diagram that shows you
how those stages evolve.
And that's what we do by
using a sequential description
or a tree-based
description by drawing
a tree of the
possible evolutions
during our experiment.
So in this tree, I'm thinking
of a first stage in which I
roll the first die, and there
are four possible results: one,
two, three, and four.
And, given what happened,
let's say in the first roll,
suppose I got a one.
Then I'm rolling
the second die,
and there are four
possibilities for what
may happen to the second die.
And the possible results are
one, two, three, and four again.
So what's the relation
between the two diagrams?
Well, for example,
the outcome two
followed by three corresponds
to this path on the tree.
So this path corresponds
to two followed by a three.
Any path is associated
to a particular outcome,
any outcome is associated
to a particular path.
And, instead of
paths, you may want
to think in terms of the
leaves of this diagram.
Same thing, think of
each one of the leaves
as being one possible outcome.
And of course we have
16 outcomes here,
we have 16 outcomes here.
Maybe you noticed the subtlety
that I used in my language.
I said I rolled the
first die and the result
that I get is a two.
I didn't use the word "outcome."
I want to reserve
the word "outcome"
to mean the overall
outcome at the end
of the overall experiment.
So "2, 3" is the outcome
of the experiment.
The experiment
consisted of stages.
Two was the result
in the first stage,
three was the result
in the second stage.
You put all those
results together,
and you get your outcome.
OK, perhaps we are
splitting hairs here,
but it's useful to keep
the concepts right.
What's special
about this example
is that, besides
being trivial, it has
a sample space which is finite.
There's 16 possible
total outcomes.
Not every experiment has
a finite sample space.
Here's an experiment in which
the sample space is infinite.
So you are playing darts and
the target is this square.
And you're perfect at
that game, so you're
sure that your darts will
always fall inside the square.
So, but where exactly your dart
would fall inside that square,
that itself is random.
We don't know what
it's going to be.
It's uncertain.
So all the possible
points inside the square
are possible outcomes
of the experiment.
So a typical outcome of the
experiment is going to be a pair
of numbers, (x, y), where x and y
are real numbers between zero
and one.
Now there's infinitely
many real numbers,
there's infinitely many
points in the square,
so this is an example
in which our sample
space is an infinite set.
OK, so we're going to revisit
this example a little later.
So these are two examples of
what the sample space might
be in simple experiments.
Now, the more important
order of business
is now to look at
those possible outcomes
and to make some
statements about
their relative likelihoods.
Which outcome is more likely to
occur compared to the others?
And the way we do this is
by assigning probabilities
to the outcomes.
Well, not exactly.
Suppose that all you were to
do was to assign probabilities
to individual outcomes.
If you go back to this example,
and you consider one particular
outcome-- let's say this point--
what would be the probability
that you hit exactly this point
to infinite precision?
Intuitively, that
probability would be zero.
So any individual point in this
diagram in any reasonable model
should have zero probability.
So if you just tell me that
any individual outcome has
zero probability,
you're not really
telling me much to work with.
For that reason, what
instead we're going to do
is to assign probabilities
to subsets of the sample
space, as opposed to
assigning probabilities
to individual outcomes.
So here's the picture.
We have our sample
space, which is omega,
and we consider some
subset of the sample space.
Call it A. And I want
to assign a number,
a numerical probability, to
this particular subset which
represents my belief about how
likely this set is to occur.
OK.
What do we mean by "to occur"?
And I'm introducing
here a language
that's being used in
probability theory.
When we talk about subsets
of the sample space,
we usually call them events,
as opposed to subsets.
And the reason is
because it works nicely
with the language that
describes what's going on.
So the outcome is a point.
The outcome is random.
The outcome may be inside
this set, in which case
we say that event A occurred, if
we get an outcome inside here.
Or the outcome may fall
outside the set, in which case
we say that event
A did not occur.
So we're going to assign
probabilities to events.
And now, how should
we do this assignment?
Well, probabilities are meant
to describe your beliefs
about which sets are more likely
to occur versus other sets.
So there's many ways that you
can assign those probabilities.
But there are some ground
rules for this game.
First, we want probabilities to
be numbers between zero and one
because that's the
usual convention.
So a probability
of zero means we're
certain that something
is not going to happen.
Probability of one
means that we're
essentially certain that
something's going to happen.
So we want numbers
between zero and one.
We also want a few other things.
And those few other
things are going
to be encapsulated
in a set of axioms.
What "axioms" means
in this context,
it's the ground rules that any
legitimate probabilistic model
should obey.
You have a choice of what
kind of probabilities you use.
But, no matter
what you use, they
should obey certain consistency
properties because if they
obey those properties,
then you can
go ahead and do
useful calculations
and do some useful reasoning.
So what are these properties?
First, probabilities
should be non-negative.
OK?
That's our convention.
We want probabilities to be
numbers between zero and one.
So they should certainly
be non-negative.
The probability
that event A occurs
should be a non-negative number.
What's the second axiom?
The probability of the entire
sample space is equal to one.
Why does this make sense?
Well, the outcome is certain
to be an element of the sample
space because we set up a sample
space, which is collectively
exhaustive.
No matter what the
outcome is, it's
going to be an element
of the sample space.
We're certain that event
omega is going to occur.
Therefore, we represent
this certainty
by saying that the probability
of omega is equal to one.
Pretty straightforward so far.
The more interesting
axiom is the third rule.
Before getting into it,
just a quick reminder.
If you have two sets, A and
B, the intersection of A and B
consists of those elements
that belong both to A and B.
And we denote it this way.
When you think
probabilistically,
the way to think of intersection
is by using the word "and."
This event, this
intersection, is the event
that A occurred and B occurred.
If I get an outcome inside
here, A has occurred
and B has occurred
at the same time.
So you may find the word "and"
to be a little more convenient
than the word "intersection."
And similarly, we
have some notation
for the union of two events,
which we write this way.
The union of two
sets, or two events,
is the collection
of all the elements
that belong either to the
first set, or to the second,
or to both.
When you talk about events,
you can use the word "or."
So this is the event that
A occurred or B occurred.
And this "or" means that it
could also be that both of them
occurred.
OK.
So now that we
have this notation,
what does the third axiom say?
The third axiom says that if
we have two events, A and B,
that have no common elements--
so here's A, here's
B, and perhaps this
is our big sample space.
The two events have
no common elements.
So the intersection of the
two events is the empty set.
There's nothing in
their intersection.
Then, the total probability
of A together with B
has to be equal to the sum of
the individual probabilities.
So the probability that
A occurs or B occurs
is equal to the probability that
A occurs plus the probability
that B occurs.
So think of probability
as being cream cheese.
You have one pound of cream
cheese, the total probability
assigned to the
entire sample space.
And that cream cheese is
spread out over this set.
The probability of A is
how much cream cheese
sits on top of A. Probability of
B is how much sits on top of B.
The probability of A union B
is the total amount of cream
cheese sitting on
top of this and that,
which is obviously the sum
of how much is sitting here
and how much is sitting there.
So probabilities behave
like cream cheese,
or they behave like mass.
For example, if you think
of some material object,
the mass of this set
consisting of two pieces
is obviously the sum
of the two masses.
So this property is
a very intuitive one.
It's a pretty
natural one to have.
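For a finite sample space, the three axioms can be illustrated concretely. Here is a sketch in Python (the two-coin-flip sample space and the helper `prob` are my own illustration, not from the lecture):

```python
from fractions import Fraction

# A probability law on a finite sample space, given by a probability
# for each individual outcome (here, two fair coin flips).
law = {outcome: Fraction(1, 4) for outcome in ["HH", "HT", "TH", "TT"]}

def prob(event):
    """P(event) = sum of the probabilities of the outcomes in it."""
    return sum(law[outcome] for outcome in event)

# Axiom 1: probabilities are non-negative.
assert all(p >= 0 for p in law.values())

# Axiom 2: the entire sample space has probability one.
assert prob(law.keys()) == 1

# Axiom 3 (additivity): for disjoint events A and B,
# P(A union B) = P(A) + P(B).
A, B = {"HH"}, {"HT", "TH"}
assert A.isdisjoint(B)
assert prob(A | B) == prob(A) + prob(B)

print(prob(A | B))  # 3/4
```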
OK.
Are these axioms enough
for what we want to do?
I mentioned a while ago that
we want probabilities to be
numbers between zero and one.
Here's an axiom that tells
you that probabilities
are non-negative.
Should we have
another axiom that
tells us that probabilities
are less than or equal to one?
It's a desirable property.
We would like to
have it in our hands.
OK, why is it not in that list?
Well, the people who are in
the axiom making business
are mathematicians
and mathematicians
tend to be pretty laconic.
You don't say something if
you don't have to say it.
And this is the case here.
We don't need that extra
axiom because we can derive it
from the existing axioms.
Here's how it goes.
One is the probability over
the entire sample space.
Here we're using
the second axiom.
Now the sample space
consists of A together
with the complement of A. OK?
When I write the
complement of A,
I mean the complement of
A inside of the set omega.
So we have omega, here's A,
here's the complement of A,
and the overall set is omega.
OK.
Now, what's the next step?
What should I do next?
Which axiom should I use?
We use axiom three because a set
and the complement of that set
are disjoint.
They don't have any
common elements.
So axiom three
applies and tells me
that this is the
probability of A
plus the probability
of A complement.
In particular, the
probability of A
is equal to one minus the
probability of A complement,
and this is less
than or equal to one.
Why?
Because probabilities
are non-negative,
by the first axiom.
OK.
So we got the conclusion
that we wanted.
Probabilities are always
less than or equal to one,
and this is a simple
consequence of the three axioms
that we have.
This is a really nice
argument because it actually
uses each one of those axioms.
The argument is
simple, but you have
to use all of these
three properties
to get the conclusion
that you want.
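The chain of steps just described can be written out compactly; here is a sketch in the lecture's notation:

```latex
\begin{align*}
1 &= \mathbf{P}(\Omega)
    && \text{(axiom 2)} \\
  &= \mathbf{P}(A \cup A^c)
    && \text{(since $\Omega = A \cup A^c$)} \\
  &= \mathbf{P}(A) + \mathbf{P}(A^c)
    && \text{(axiom 3: $A$ and $A^c$ are disjoint)} \\
  &\ge \mathbf{P}(A)
    && \text{(axiom 1: $\mathbf{P}(A^c) \ge 0$)}
\end{align*}
```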
OK.
So we can get interesting
things out of our axioms.
Can we get some more
interesting ones?
How about the union
of three sets?
What kind of probability
should it have?
So here's an event
consisting of three pieces.
And I want to say something
about the probability
of A union B union C.
What I would like to say
is that this probability is
equal to the sum of the three
individual probabilities.
How can I do it?
I have an axiom
that tells me that I
can do it for two events.
I don't have an axiom
for three events.
Well, maybe I can
massage things and still
be able to use that axiom.
And here's the trick.
The union of three sets,
you can think of it
as forming the union
of the first two sets
and then taking the
union with the third set.
OK?
So taking unions, you can
take the unions in any order
that you want.
So here we have the
union of two sets.
Now, A, B, and C are disjoint
by assumption,
or that's how I drew it.
So if A, B, and C are
disjoint, then A union B
is disjoint from
C. So here we have
the union of two disjoint sets.
So by the additivity axiom, the
probability of the union
is going to be the
probability of the first set
plus the probability
of the second set.
And now I can use the
additivity axiom once more
to write that this
is probability
of A plus probability
of B plus probability
of C. So by using this axiom
which was stated for two sets,
we can actually derive
a similar property
for the union of
three disjoint sets.
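The argument for three disjoint sets can be summarized as:

```latex
\begin{align*}
\mathbf{P}(A \cup B \cup C)
  &= \mathbf{P}\bigl((A \cup B) \cup C\bigr) \\
  &= \mathbf{P}(A \cup B) + \mathbf{P}(C)
    && \text{($A \cup B$ and $C$ are disjoint)} \\
  &= \mathbf{P}(A) + \mathbf{P}(B) + \mathbf{P}(C)
    && \text{($A$ and $B$ are disjoint)}
\end{align*}
```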
And then you can repeat
this argument as many times
as you want.
It's valid for the union
of ten disjoint sets,
for the union of a
hundred disjoint sets,
for the union of any
finite number of sets.
So if A1 up to An are
disjoint, then the probability
of A1 union ... union An is equal to
the sum of the probabilities
of the individual sets.
OK.
Special case of this is when
we're dealing with finite sets.
Suppose I have just a
finite set of outcomes.
I put them together
in a set and I'm
interested in the
probability of that set.
So here's our sample space.
There's lots of outcomes,
but I'm taking a few of these
and I form a set out of them.
This is a set consisting of, in
this picture, three elements.
In general, it
consists of k elements.
Now, a finite set,
I can write it
as a union of
single element sets.
So this set here is the union
of this one element set,
together with this one
element set together
with that one element set.
So the total
probability of this set
is going to be the sum of
the probabilities of the one
element sets.
Now, probability
of a one element
set, you need to use
the brackets here
because probabilities
are assigned to sets.
But this gets kind of
tedious, so here one abuses
notation a little bit
and we get rid of those
brackets and just
write probability
of this single,
individual outcome.
In any case, conclusion
from this exercise
is that the total probability
of a finite collection
of possible outcomes,
the total probability
is equal to the sum
of the probabilities
of individual elements.
So these are basically the
axioms of probability theory.
Or, well, they're
almost the axioms.
There are some subtleties
that are involved here.
One subtlety is that this
axiom here doesn't quite
do the job for everything
we would like to do.
And we're going to come back to
this at the end of the lecture.
A second subtlety has
to do with weird sets.
We said that an event is a
subset of the sample space
and we assign
probabilities to events.
Does this mean that we are
going to assign probability
to every possible subset
of the sample space?
Ideally, we would
wish to do that.
Unfortunately, this is
not always possible.
If you take a sample
space, such as the square,
the square has
nice subsets, those
that you can describe by
cutting it with lines and so on.
But it does have some very
ugly subsets, as well,
that are impossible
to visualize,
impossible to imagine,
but they do exist.
And those very weird sets
are such that there's
no way to assign probabilities
to them in a way that's
consistent with the
axioms of probability.
OK.
So this is a very,
very fine point
that you can immediately forget
for the rest of this class.
You will only
encounter these sets
if you end up doing doctoral
work on the theoretical aspects
of probability theory.
So it's just a
mathematical subtlety
that some very weird
sets do not have
probabilities assigned to them.
But we're not going to
encounter these sets
and they do not show
up in any applications.
OK.
So now let's revisit
our examples.
Let's go back to
the die example.
We have our sample space.
Now we need to assign
a probability law.
There's lots of possible
probability laws
that you can assign.
I'm picking one
here, arbitrarily,
in which I say that every
possible outcome has
the same probability of 1/16.
OK.
Why do I make this model?
Well, empirically, if you
have well-manufactured dice,
they tend to behave that way.
We will be coming back
to this kind of story
later in this class.
But I'm not saying that
this is the only probability
law that there can be.
You might have weird dice in
which certain outcomes are
more likely than others.
But to keep things simple,
let's take every outcome
to have the same
probability of 1/16.
OK.
Now that we have in our
hands a sample space
and the probability
law, we can actually
solve any problem there is.
We can answer any question
that could be posed to us.
For example, what's
the probability
that the outcome, which is this
pair, is either 1,1 or 1,2.
We're talking here about this
particular event, 1,1 or 1,2.
So it's an event consisting
of these two items.
According to what we
were just discussing,
the probability of a finite
collection of outcomes
is the sum of their
individual probabilities.
Each one of them has
probability of 1/16,
so the probability
of this is 2/16.
How about the probability of the
event that x is equal to one?
x is the first roll, so
that's the probability
that the first roll
is equal to one.
Notice the syntax
that's being used here.
Probabilities are assigned
to subsets, to sets,
so we think of this as meaning
the set of all outcomes
such that x is equal to one.
How do you answer this question?
You go back to the
picture and you
try to visualize or identify
this event of interest.
x is equal to one corresponds
to this event here.
These are all the outcomes
at which x is equal to one.
There are four outcomes.
Each one has probability
1/16, so the answer is 4/16.
OK.
How about the probability
that x plus y is odd?
OK.
That will take a
little bit more work.
But you go to the
sample space and you
identify all the outcomes at
which the sum is an odd number.
So that's a place where the sum
is odd, these are other places,
and I guess that exhausts
all the possible outcomes
at which we have an odd sum.
We count them.
How many are there?
There's a total
of eight of them.
Each one has probability 1/16,
total probability is 8/16.
And harder question.
What is the probability that
the minimum of the two rolls
is equal to 2?
This is something
that you probably
couldn't do in your head
without the help of a diagram.
But once you have a
diagram, things are simple.
You ask the question.
OK, this is an event, that
the minimum of the two rolls
is equal to two.
This can happen in several ways.
What are the several
ways that it can happen?
Go to the diagram and
try to identify them.
So the minimum is equal to
two if both of them are two's.
Or it could be that x is two
and y is bigger, or y is two
and x is bigger.
OK.
I guess we rediscover that
yellow and blue make green,
so we see here that
there's a total
of five possible outcomes.
The probability of
this event is 5/16.
Simple example,
but the procedure
that we followed in
this example actually
applies to any probability
model you might ever encounter.
You set up your
sample space, you
make a statement that
describes the probability
law over that sample space,
then somebody asks you
questions about various events.
You go to your pictures,
identify those events,
pin them down, and
then start kind
of counting and calculating
the total probability
for those outcomes that
you're considering.
This example is a special
case of what is called
the discrete uniform law.
The model obeys the
discrete uniform law
if all outcomes
are equally likely.
It doesn't have to be that way.
That's just one example
of a probability law.
But when things are that way, if
all outcomes are equally likely
and we have N of them, and you
have a set A that has little n
elements, then each one of those
outcomes has probability
one over capital N;
that's the only way
for the probabilities
to add up to one when all
outcomes are equally likely.
And since A contains little n
of those outcomes, adding up
their probabilities gives
little n over capital N.
That gives you the probability
of the event of interest.
So problems like the one
in the previous slide,
and more generally problems
of the type described here
under the discrete uniform
law, reduce to just counting.
How many elements are
there in my sample space?
How many elements are there
inside the event of interest?
Counting is generally
simple, but for some problems
it gets pretty complicated.
And in a couple of
weeks, we're going
to have to spend the
whole lecture just
on the subject of how
to count systematically.
Now the procedure we followed
in the previous example
is the same as the
procedure you would
follow in continuous
probability problems.
So, going back to
our dart problem,
we get the random point
inside the square.
That's our sample space.
We need to assign
a probability law.
For lack of imagination, I'm
taking the probability law
to be the area of a subset.
So if we have two subsets
of the sample space
that have equal areas, then
I'm postulating that they
are equally likely to occur.
The probability that the dart
falls here is the same as the
probability that it falls there.
The model doesn't
have to be that way.
But if I have sort
of complete ignorance
of which points are
more likely than others,
that might be the
reasonable model to use.
So equal areas mean
equal probabilities.
If the area is twice as
large, the probability
is going to be twice as big.
So this is our model.
We can now answer questions.
Let's answer the easy one.
What's the probability that the
outcome is exactly this point?
That of course is zero because
a single point has zero area.
And since this probability
is equal to area,
that's zero probability.
How about the
probability that the sum
of the coordinates of the
point that we got is less than
or equal to 1/2?
How do you deal with it?
Well, you look at the picture
again, at your sample space,
and try to describe the event
that you're talking about.
The sum being less
than 1/2 corresponds
to getting an outcome that's
below this line, where
this line is the line where
x plus y equals 1/2.
So the intercepts of that line
with the axes are 1/2 and 1/2.
So you describe
the event visually
and then you use
your probability law.
The probability
law that we have is
that the probability of a set is
equal to the area of that set.
So all we need to find is the
area of this triangle, which
is 1/2 times 1/2 times
1/2, which equals 1/8.
OK.
The moral from these two
useful to have a picture
and work with a picture
to visualize the events
that you're talking about.
And once you have a
probability law in your hands,
then it's a matter
of calculation
to find the probability
of any event of interest.
The calculations we did in
these two examples, of course,
were very simple.
Sometimes calculations
may be a lot harder,
but it's a different business.
It's a business of calculus,
for example, or being
good in algebra and so on.
As far as probability
is concerned,
it's clear what
you will be doing,
and then maybe you're faced
with a harder algebraic part
to actually carry
out the calculations.
The area of a triangle
is easy to compute.
If I had put down a
very complicated shape,
then you might need to
solve a hard integration
problem to find the
area of that shape,
but that's stuff that
belongs to another class
that you have presumably
mastered by now.
Good, OK.
So now let me
spend just a couple
of minutes to return to a
point that I raised before.
I was saying that the axiom
that we had about additivity
might not quite be enough.
Let's illustrate what I mean
by the following example.
Think of the experiment where
you keep flipping a coin
and you wait until you obtain
heads for the first time.
What's the sample space
of this experiment?
It might happen on
the first flip; it might
happen on the tenth flip.
Heads for the first time might
occur in the millionth flip.
So the outcome of
this experiment
is going to be an
integer and there's
no bound to that integer.
You might have to wait a very
long time until that happens.
So the natural sample
space is the set
of all positive integers.
Somebody tells you
some information
about the probability law.
The probability that you
have to wait for n flips
is equal to two to the minus n.
Where did this come from?
That's a separate story.
Somebody tells this to us,
and those probabilities
are plotted here
as a function of n.
And you're asked to find the
probability that the outcome is
an even number.
How do you go about
calculating that probability?
So the probability of
being an even number
is the probability of
the subset that consists
of just the even numbers.
So it would be a subset
of this kind, that
includes two, four, and so on.
So any reasonable
person would say,
well the probability of
obtaining an outcome that's
either two or four
or six and so on
is equal to the probability
of obtaining a two,
plus the probability
of obtaining a four,
plus the probability of
obtaining a six, and so on.
These probabilities
are given to us.
So here I have to do my algebra.
I add this geometric series
and I get an answer of 1/3.
That's what any reasonable
person would do.
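Numerically, the geometric series behaves exactly as claimed; a quick check (truncating at n = 100, which is more than enough for double precision):

```python
# P(outcome is n) = 2^(-n).  Restricting to even n gives
# 1/4 + 1/16 + 1/64 + ..., a geometric series with ratio 1/4,
# which sums to (1/4) / (1 - 1/4) = 1/3.
p_even = sum(2.0 ** (-n) for n in range(2, 101, 2))
p_total = sum(2.0 ** (-n) for n in range(1, 101))
print(p_even)   # very close to 1/3
print(p_total)  # very close to 1: the probabilities are normalized
```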
But the person who
only knows the axioms
that we posted just a
little earlier may get stuck.
They would get
stuck at this point.
How do we justify this?
We had this property for
the union of disjoint sets
and the corresponding
property that tells us
that the total probability of
finitely many things, outcomes,
is the sum of their
individual probabilities.
But here we're using it
on an infinite collection.
The probability of
infinitely many points
is equal to the sum
of the probabilities
of each one of these.
To justify this step we need to
introduce one additional rule,
an additional axiom, that tells
us that this step is actually
legitimate.
And this is the countable
additivity axiom,
which is a little stronger,
or quite a bit stronger,
than the additivity
axiom we had before.
It tells us that if we
have a sequence of sets
that are disjoint and we want
to find their total probability,
then we are allowed to add
their individual probabilities.
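Stated compactly, the countable additivity axiom reads:

```latex
% Countable additivity: for any sequence A_1, A_2, A_3, \dots
% of disjoint events,
P\left( \bigcup_{i=1}^{\infty} A_i \right)
  = \sum_{i=1}^{\infty} P(A_i)
```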
So the picture might
be as follows.
We have a sequence of sets,
A1, A2, A3, and so on.
I guess in order to fit them
inside the sample space,
the sets need to get
smaller and smaller perhaps.
They are disjoint.
We have a sequence of such sets.
The total probability
of falling anywhere
inside one of those
sets is the sum
of their individual
probabilities.
A key subtlety
that's involved here
is that we're talking
about a sequence of events.
By "sequence" we mean
that these events
can be arranged in order.
I can tell you the first
event, the second event,
the third event, and so on.
So if you have such a
collection of events
that can be ordered as first,
second, third, and so on,
then you can add
their probabilities
to find the probability
of their union.
So this point is actually
a little more subtle than
you might appreciate right now,
and I'm going to return
to it at the beginning
of the next lecture.
For now, enjoy the
first week of classes
and have a good weekend.
Thank you.
