Hi!
My name is Viktor, and I will be taking you
through the Probability for Statistics and
Data Science course.
I am a Business graduate with a strong affinity
for numbers. I love Mathematics and Statistics.
In fact, I enjoy the subject matter so much
that I have competed in a number of competitions
during my years as a student. I graduated from
Hamilton College in the state of New York
and developed a deep interest in the real-life
application of mathematics, statistics, and
probability theory. I hope my business training,
combined with my fascination for the application
of probability to solve real-life problems,
will give this course a business twist, which
will be extremely useful for you!
Alright.
In this course, we will cover some of the
fundamental theory you need in order to proceed
further and learn more about fascinating topics
such as machine and deep learning.
First, we will start with the very basics.
We will learn about the different types of
distributions, and the building blocks of
probability theory:
• Favorable Outcome
• Sample Space
• Expected Values
• Frequency
• Complements
We will build a solid foundation, which will
help us create our first probability frequency
distribution. And that’s just in the first
hour!
Once we have the basics, we will go deeper.
Bit by bit, we will start unfolding the power
of Combinatorics. We will explore Permutations,
how to compute them, and how to apply them
to real-life scenarios. We will take some
extra time to clarify the difference between
variations and permutations, which will be
explained mathematically, graphically, and
through real-life examples!
We will go over examples in each field which
are impossible to comprehend without a solid
understanding of the fundamentals of probability.
Sounds great, doesn’t it?
This is going to be an incredible adventure!
Life is filled with uncertain events and often
we must consider the possible outcomes before
deciding. We ask ourselves questions like
“What is the chance of success?” and “What
is the probability that we fail?” to determine
whether the risk is worth taking.
Many CEOs need to make huge decisions when
investing in their research and development
departments or contemplating buyouts or mergers.
By using probability and statistical data,
they can predict how likely each outcome is
and make the right call for their firm.
Some of you might be wondering: “What is
this probability we are talking about?”.
Essentially, probability is the chance of
something happening. A more academic definition
for this would be “the likelihood of an
event occurring”.
The word event has a specific meaning when
talking about probabilities. Simply put, an
event is a specific outcome or a combination
of several outcomes.
These outcomes can be pretty much anything
- getting Heads when flipping a coin, rolling
a 4 on a six-sided die or running a mile in
under 6 minutes.
Take flipping a coin for example. There isn’t
only one single probability involved since
there are two possible outcomes: getting heads
or getting tails. That means we have two possible
events and need to assign probabilities to
each one.
When dealing with uncertain events we are
seldom satisfied by simply knowing whether
an event is likely or unlikely. Ideally, we
want to be able to measure and compare probabilities
in order to know which event is relatively
more likely. To do so, we express probabilities
numerically.
Even though we can express probabilities as
percentages or fractions, conventionally we
write them out using real numbers between
0 and 1. So, instead of using 20% or one fifth,
we prefer 0.2.
All right!
Now let us briefly talk about interpreting
these probability values. Having a probability
of ‘1’ expresses absolute certainty of
the event occurring and a probability of ‘0’
expresses absolute certainty of the event
NOT occurring.
You probably figured this out, but higher
probability values indicate a higher likelihood.
Okay!
As you can imagine, most events we are interested
in would have a probability other than 0 and
1. So, values like 0.2, 0.5, and 0.66 are
what we generally expect to see.
Even without knowing any of this, you can
tell some events are more likely than others.
For instance, your chance of winning the lottery
isn’t as great as winning a coin toss.
That’s why you can think of probability
as a field that is about quantifying exactly
how likely each of those events is on its
own.
And that’s what this course is going to
teach you. So how about we start right away?
Let’s get into it!
Generally, the probability of an event A occurring,
denoted P of A, is equal to the number of
preferred outcomes over the total number of
possible outcomes.
By preferred we mean outcomes that we want
to see happen. A different term people use
for such outcomes is “favourable”. Similarly,
“sample space” is the term used for the set of
all possible outcomes. Going forward, we shall
use the respective terms interchangeably.
We will go through several examples to ensure
you understand the notion well.
Say, event A is flipping a coin and getting
Heads. In this case, Heads is our only preferred
outcome. Assuming the coin doesn’t just
somehow stay in the air indefinitely, there
are only 2 possible outcomes – heads or
tails. This means that our probability would
be a half, so we write the following:
P of getting Heads, equals one half, which
equals 0.5.
All right!
Now, imagine we have a standard six-sided
die and we want to roll a 4. Once again, we
have a single preferred outcome, but this
time we have a greater number of total possible
outcomes – 6. Therefore, the probability
of this event would look as follows:
P of rolling 4 equals: one sixth, or approximately
0 point one six seven.
Great!
Events can be simple, or a bit more complex.
For example, what if we wanted to roll a number
divisible by 3? That means we need to get
either a 3 or a 6 so the number of preferred
outcomes becomes two. However, the total number
of possible outcomes stays the same since
the die still has 6 sides. Therefore, we conclude
that the probability of rolling a number divisible
by 3 equals:
2 over 6, which is approximately 0.33.
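The three calculations above follow the same recipe, which can be sketched in a few lines of Python. This is only an illustration: the helper name `probability` is hypothetical, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def probability(favourable, sample_space):
    """Favourable outcomes over all possible outcomes (assumes equally likely outcomes)."""
    return Fraction(len(favourable), len(sample_space))

coin = ["Heads", "Tails"]
die = [1, 2, 3, 4, 5, 6]

p_heads = probability(["Heads"], coin)                          # 1/2
p_four = probability([4], die)                                  # 1/6
p_div_by_3 = probability([x for x in die if x % 3 == 0], die)   # 2/6, stored as 1/3

print(float(p_heads), round(float(p_four), 3), round(float(p_div_by_3), 2))
# prints: 0.5 0.167 0.33
```

Using `Fraction` keeps the exact values (one half, one sixth, one third) and only converts to decimals for display.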
So far so good!
Note that the probability of two independent
events occurring at the same time is equal
to the product of the probabilities of
the individual events. For instance, the likelihood
of getting the Ace of Spades equals the probability
of getting an Ace, times the probability of
getting a spade. In a later lecture, we are
going to define what we mean by independent,
but for now let us observe some more examples
of probability.
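As a quick check of the Ace of Spades example, the sketch below builds a standard 52-card deck and compares the product of the two probabilities with the chance of drawing that exact card. The deck representation (rank and suit pairs) is an assumption made for illustration.

```python
from fractions import Fraction

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]  # 52 cards

p_ace = Fraction(sum(1 for rank, _ in deck if rank == "A"), len(deck))         # 4/52
p_spade = Fraction(sum(1 for _, suit in deck if suit == "Spades"), len(deck))  # 13/52
p_ace_of_spades = Fraction(deck.count(("A", "Spades")), len(deck))             # 1/52

print(p_ace * p_spade, p_ace_of_spades)  # prints: 1/52 1/52
```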
What about the probability of winning the
US lottery? Even though it sounds like something
that is completely different, it actually
follows the same idea. You take the number
of preferred outcomes and divide it by all
outcomes.
Now, the number of preferred outcomes we have
would be equal to the number of different
tickets we bought. The total number of possible
outcomes, on the other hand, is just something
we will learn how to calculate less than an
hour from now. For the moment, just assume
that there exist upward of 175 million outcomes
for the US lottery.
Therefore, each individual ticket only has
a probability of winning equal to 1 over 175
million, or approximately 0.0000000057.
How would your chances improve if you bought
2 tickets?
How about 5?
I don’t know about you, but I like my odds
of flipping a coin a lot more.
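To put numbers on the questions about 2 and 5 tickets, here is a rough sketch. It assumes exactly 175 million equally likely outcomes (the text only says "upward of" that figure) and that every ticket covers a different outcome; the function name `win_probability` is hypothetical.

```python
TOTAL_OUTCOMES = 175_000_000  # assumed round figure taken from the text

def win_probability(tickets, total=TOTAL_OUTCOMES):
    """Chance of winning with `tickets` distinct tickets, one outcome per ticket."""
    return tickets / total

for tickets in (1, 2, 5):
    print(tickets, f"{win_probability(tickets):.10f}")
```

The probability grows in direct proportion to the number of tickets, yet it stays tiny compared with the 0.5 of a coin toss.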
Expected values represent what we expect the
outcome to be if we run an experiment many
times. To fully grasp the concept, we must
first explain what an experiment is.
Okay!
Imagine we don’t know the probability of
getting heads when flipping a coin. We are
going to try to estimate it ourselves, so
we toss a coin several times. After doing
one flip and recording the outcome we complete
a trial.
By completing multiple trials, we are conducting
an experiment.
For example, if we toss a coin 20 times and
record the 20 outcomes, that entire process
is a single experiment with 20 trials.
All right!
The probabilities we get after conducting
experiments are called experimental probabilities,
whereas the ones we introduced earlier were
theoretical or true probabilities.
Generally, when we are uncertain what the
true probabilities are or how to compute them,
we like conducting experiments. The experimental
probabilities we get are not always equal
to the theoretical ones, but are a good approximation.
For instance, eight out of ten times I go
to my local shop, I have to wait in line.
Based on my experience, 80% of the time there
will be a queue and 20% of the time, there
won’t be one. I can try to calculate the
true probability, but it would include far
too many factors. The experimental probability,
on the other hand, is easy to compute and
very useful.
Okay!
The formula we use to calculate experimental
probabilities is similar to the formula applied
for the theoretical ones earlier in the course.
It is simply the number of successful trials
divided by the total number of trials.
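The idea of trials, experiments and experimental probabilities can be simulated. The sketch below is an assumption-laden illustration: the simulated coin is fair (which we would not know in a real experiment), the seed is arbitrary, and the function name `experimental_probability` is hypothetical.

```python
import random

def experimental_probability(n_trials, seed=42):
    """Run one experiment of `n_trials` coin flips and return the share of Heads."""
    rng = random.Random(seed)  # fixed, arbitrary seed so the run is repeatable
    heads = sum(1 for _ in range(n_trials) if rng.random() < 0.5)  # assumed fair coin
    return heads / n_trials  # successful trials over total trials

estimate_small = experimental_probability(20)       # one experiment, 20 trials
estimate_large = experimental_probability(100_000)  # one experiment, 100,000 trials
print(estimate_small, estimate_large)
```

With only 20 trials the estimate can land noticeably away from 0.5, while a much larger number of trials typically brings it closer to the theoretical value.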
Now that we know what an experiment is, we
are ready to dive into expected values!
The expected value of an event A, denoted
E of A, is the outcome we expect to occur
when we run an experiment. To clarify any
confusion around the definition, let us examine
the following example:
We want to know how many times we can expect to get
a spade if we draw a card 20 times. We always
record the suit of the card and then return
it to the deck and reshuffle before the next draw.
For an event with categorical outcomes, like
suits, we calculate the expected value by
multiplying the theoretical probability of
the event, P of A, by the number of trials
we carried out, n.
We’ve already seen how to compute the true
probability of drawing a card from a specific
suit. It is equal to one fourth or point twenty-five.
If we repeat this action 20 times, the expected
value would equal 0.25, times 20, which equals
5.
An expected value of 5 means we expect to
get a spade 5 times if we run the experiment.
However, nothing guarantees us getting a spade
EXACTLY 5 times. Realistically, we could get
a spade 4 times, 6 times or even 20 times.
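The calculation above, and the warning that follows it, can be sketched as below. The simulation part is an illustrative assumption (arbitrary seed, five repeated experiments) and is not part of the lecture.

```python
import random
from fractions import Fraction

p_spade = Fraction(13, 52)  # 13 spades in a standard 52-card deck
n_draws = 20

expected_spades = p_spade * n_draws  # E(A) = P(A) * n
print(expected_spades)  # prints: 5

# Five simulated experiments of 20 draws with replacement (arbitrary seed).
rng = random.Random(0)
counts = [
    sum(1 for _ in range(n_draws) if rng.random() < float(p_spade))
    for _ in range(5)
]
print(counts)  # the individual experiments need not give exactly 5 spades
```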
Now, for numerical outcomes we use a slightly
different formula. We take the value for every
element in the sample space and multiply it
by its probability. Then, we add all those
up to get the expected value.
For instance, you are trying to hit a target
with a bow and arrow. The target has 3 layers:
the outermost one is worth 10 points, the
second one is worth 20 points and the bullseye
is worth 100. You have practiced enough to
always be able to hit the target but not so
much that you hit the centre every time. The
probability of hitting each layer is as follows:
0.5 for the outermost, 0.4 for the second and
0.1 for the centre.
The expected value for this example would
be .5 times 10 plus .4 times 20 plus .1 times
100. This is equal to 5 plus 8 plus 10 or
23. Wait, we can never get 23 points with
a single shot. So why is it important to know
what the expected value of an event is?
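For reference, the archery calculation can be written as a small sketch. The dictionary layout is just one convenient way to pair each point value with its probability.

```python
from fractions import Fraction

# Points for each layer mapped to the probability of hitting it.
target = {
    10: Fraction(5, 10),   # outermost layer, 0.5
    20: Fraction(4, 10),   # second layer, 0.4
    100: Fraction(1, 10),  # bullseye, 0.1
}

expected_points = sum(points * p for points, p in target.items())
print(expected_points)            # prints: 23
print(expected_points in target)  # prints: False, no single shot scores 23
```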
We can use expected values to make predictions
about the future based on past data. We frequently
make predictions using intervals instead of
specific values due to the uncertainty the future
brings.
Meteorologists often use these when forecasting
the weather. They do not know exactly how
much snow, rain, or wind there is going to
be, so they provide us with likely intervals
instead. That is why we often hear statements
like “Expect between 3 and 5 feet of snow
tomorrow morning.” or “Temperatures rising
up to 90° on Wednesday.”.
So far, we learned that the expected value
is used when trying to predict future events.
Sometimes the result of the expected value
is confusing or doesn’t tell us much.
For instance, let us discuss a very famous
example – throwing 2 standard 6-sided dice
and adding up the numbers on top.
We have 6 options for what the result of the
first one could be. Regardless of the number
we roll, we still have 6 different possibilities
for what we can roll on the second die. That
gives us a total of 6, times 6, equals 36
different outcomes for the two rolls.
For clarity, we can write out the results
in a 6 by 6 table, where we write the sum
of the two dice. You can clearly see that
we have repeating entries along the secondary
diagonal and all diagonals parallel to it.
Notice how 7 occurs 6 times in the table.
This means we have 6 favourable outcomes.
As we already mentioned, there are 36 possible
outcomes, so the chance of getting a 7 equals:
six, over 36, or just one sixth.
Let’s also compute the expected value for
this event. Since we are dealing with numerical
data, we should apply the same formula we
used for the archery problem from the last
lecture. To do so, we must assign an appropriate
probability to each unique entry in the table.
Just like with the sum being 7, we do that
based on the number of times the number features
in the table.
If we do so, we are going to get the expected
value which ends up being 7.
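Both results for the two dice can be verified by listing the 36 outcomes. In this sketch, averaging over the 36 equally likely cells of the table is equivalent to weighting each unique sum by its probability.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all (first die, second die) pairs
sums = [a + b for a, b in rolls]

p_seven = Fraction(sums.count(7), len(sums))   # 6 favourable cells out of 36
expected_sum = Fraction(sum(sums), len(sums))  # every cell has probability 1/36

print(len(rolls), p_seven, expected_sum)  # prints: 36 1/6 7
```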
But how important is this value if the probability
associated with it is only one sixth?
The sum being equal to 7 might be the most
probable answer, but it is still very unlikely
to occur. Thus, we cannot reasonably bet on
getting a sum of exactly 7.
Moreover, even though we are suggesting 7
is the most probable sum, how can you be sure?
What we can do is to create a probability
frequency distribution.
Simply put, a probability frequency distribution
is a collection of the probabilities for each
possible outcome – that’s how I know that
7 was the most probable sum of two dice. Usually
it is expressed with a graph or a table. To
understand what the probability frequency
distribution looks like, we are going to construct
one right now.
To do so, we use the sample space table we already
constructed. For each unique sum, we record the number
of times it features in the table. This value
is known as the frequency of the outcome.
For example, getting a sum of 8 in 5 different
cases means that 8 has a frequency of 5.
Okay!
If we write out all the outcomes in ascending
order and the frequency of each one, we construct
a frequency distribution table. By examining
this table, we can easily see how the frequency
changes with the results.
Good job! At this point, we have done most
of the work!
The final step in getting the probability
frequency distribution might be the most intuitive
one.
We need to transform the frequency of each
outcome into a probability. Knowing the size
of the sample space, we can determine the
true probabilities for each outcome.
We simply divide the frequency for each possible
outcome by the size of the sample space. A
collection of all the probabilities for the
various outcomes is called a probability frequency
distribution.
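The steps above (count the frequency of each sum, then divide by the size of the sample space) can be sketched as follows. The variable names are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sums = [a + b for a, b in product(range(1, 7), repeat=2)]  # the 36-cell table
frequency = Counter(sums)                                   # frequency of each sum

# Divide each frequency by the size of the sample space, in ascending order.
distribution = {
    total: Fraction(count, len(sums))
    for total, count in sorted(frequency.items())
}

for total, p in distribution.items():
    print(total, frequency[total], p)
```

The printed rows are the frequency distribution table with one extra column holding the probability of each sum.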
As mentioned earlier, we can express this
probability frequency distribution through
a table, or a graph.
All right!
On the graph, we see the probability frequency
distribution. The X axis depicts the sum of
the two dice, and the Y axis represents the
probability of getting each outcome.
When making predictions, we generally want
our interval to have the highest probability.
We can see that the individual outcomes with
the highest probability are the ones with
the highest bars in the graph. Usually, the
highest bars will form around the expected
value. Thus, the values around it would also
be the values with the highest probability.
This suggests that if we want the interval
with the highest probability, we should construct
it around the expected value.
Before we continue to the next section of
this course, let’s talk about some of the
characteristics of probabilities and events.
For starters, let’s define what a complement
is. Simply put, a complement of an event is
everything the event is not. As the name suggests,
the complement helps complete the rest of
the sample space.
To calculate the probability of the complement
of an event, we need to set up a few things.
For starters, if we add the probabilities
of different events, we get their sum of probabilities.
Now, if we add up the probabilities of all the
possible outcomes in a sample space, we should
always get 1. Remember
that having a probability of 1 is the same
as being 100% certain. We are going to explain
why this is true with several examples.
Okay! Imagine you are tossing a coin.
When it falls, we are guaranteed to get either
heads or tails. Therefore, if we account for
the sum of all the probabilities of getting
heads OR tails, we have completely exhausted
all possible outcomes. We have accounted for
the entire sample space, so we are 100% certain
to get one of the two. Since we are certain
one of these will occur, the sum of their
probabilities should be 1.
So, what would it mean if we have a sum of
probabilities greater than 1?
Recall that a probability of 1 expresses absolute
certainty. By definition, we cannot be any
surer than being absolutely sure, so a probability
of 1.5 does not make intuitive sense. We can
end up with such a sum of probabilities
when some of the assumed outcomes can occur
simultaneously. This means we are double-counting
some of the actual possible outcomes. We will
learn how to deal with such issues when we
introduce Bayesian notation less than an hour
from now.
Now, another peculiar case is if we end up
with a sum of probabilities less than 1. Then
we have surely not accounted for one or several
possible outcomes. Probability expresses the
likelihood of an event occurring, so any event with
a probability less than 1 is not guaranteed to occur.
Therefore, there must be some part of the
sample space we have not yet accounted for.
Great!
Before we move on, we want to tell you that
all events have complements and we denote
them by adding an apostrophe. For example,
the complement of the event “A” is denoted
as “A apostrophe”. It is also worth noting
that the complement of a complement is the
event itself, so “A apostrophe, apostrophe”
would equal “A”!
Now imagine you are rolling a standard
6-sided die and want to roll an even number.
The opposite of that would be NOT rolling
an even number, which is the same as wanting
to roll an odd number.
Complements are often used when the event
we want to occur is satisfied by many outcomes.
For example, you want to know the probability
of rolling a 1, 2, 4, 5 or 6. That is the
same as the probability of NOT rolling a 3.
This concept is extremely useful and will
definitely come in handy during the next section!
We already said that the sum of the probabilities
of all possible outcomes equals 1, so you
can probably guess how we calculate complements.
The probability of the complement equals 1 minus
the probability of the event itself. To make
sure you understand the notion well, we will
look at the example we mentioned earlier.
The probability of getting one, two,
four, five or six is equal to the sum of the
separate probabilities. The likelihood of
each outcome is equal to one sixth, so
their probabilities add up to five
sixths.
Now, another way of describing getting “one,
two, four, five or six” is “not getting
a three”.
Let us calculate the probability of not getting
a 3. This is the complement of getting a 3,
so we know the two should add up to 1. Therefore,
the probability of not getting a 3 equals
1 minus the probability of getting a 3. We
know that P of 3 equals one sixth, so the
probability of not getting three is equal
to 1 minus one sixth. Therefore, the probability
of not getting 3 is five sixths.
This shows that the probability of getting
one, two, four, five or six is equal to the
probability of not getting a three.
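The two routes to five sixths can be compared in a short sketch, assuming a fair six-sided die.

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]
p_face = Fraction(1, 6)  # each face of a fair die

# Route 1: add the separate probabilities of 1, 2, 4, 5 and 6.
p_direct = sum(p_face for face in die if face != 3)

# Route 2: use the complement, P(A') = 1 - P(A).
p_three = p_face
p_complement = 1 - p_three

print(p_direct, p_complement)  # prints: 5/6 5/6
```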
In this section of the course, we are going
to examine Combinatorics and how it relates
to probability.
For starters, Combinatorics deals with combinations
of objects from a specific finite set. In
addition, we will also consider certain restrictions
that can be applied to form combinations.
These restrictions can be in terms of repetition,
order or a different criterion.
We will explore the three integral parts of
Combinatorics: Permutations, Variations and
Combinations. Then we will use each of these
parts to determine the number of favourable
outcomes or the number of all elements in
a sample space.
In a previous lecture, we talked about the
likelihood of winning the lottery without
explaining the whole computational process.
By the end of the section you will be able
to perform such calculations on your own!
In this lecture, we are going to look at one
of the most commonly used parts of Combinatorics
– Permutations.
Permutations represent the number of different
possible ways we can arrange a set of elements.
These elements can be digits, letters, objects
or even people. To clear any confusion, let’s
look at an example.
Imagine you haven’t watched the latest Formula
1 race, but your friend spoiled who the 3
drivers at the podium are – Lewis, Max and
Kimi. A permutation of 3, denoted P of 3,
would express the total number of different
ways these drivers could split the medals
among one another.
Suppose Lewis won the race. Then we have two
possible scenarios – Max finished second
and Kimi finished third, or Kimi finished
second and Max finished third.
Now, suppose that Max won. Once again, we
have two possible outcomes, but this time
it is Lewis and Kimi who have to split the
silver and bronze medals. Either Kimi got
silver and Lewis got bronze or the other way
around.
Not surprisingly, if Kimi won the race, we
would have 2 more ways the drivers can be
arranged on the podium. Either Max gets silver
and Lewis gets bronze or Lewis gets silver
and Max gets bronze. In total, this leaves
us with 6 unique ways these 3 drivers can
split the top 3 spots.
We call these 6 ways permutations and we are
going to show you how to compute the number
of permutations for a finite set of any size
and in different situations!
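The six podium arrangements can be listed with Python's standard library. This is only an illustration of the counting done above.

```python
from itertools import permutations

drivers = ["Lewis", "Max", "Kimi"]
podiums = list(permutations(drivers))  # every possible (gold, silver, bronze) order

for gold, silver, bronze in podiums:
    print(f"gold: {gold}, silver: {silver}, bronze: {bronze}")

print(len(podiums))  # prints: 6
```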
Great!
Now, let’s discuss the intuition behind
computing the total number of permutations
for a set of n-many elements. We start filling
out the positions one by one.
The order in which we fill them out is completely
up to us. For convenience we usually start
with the first slot, which represents the
race winner in our example. Since anybody
out of the n-many drivers in the set could
have won the race, we have n different possible
winners.
After that, we have “n minus one” possible
drivers left and any one of those can finish
second. Regardless of which out of the n elements
we chose to take the first slot, we have “n
minus one” many possibilities for the second
slot.
Similarly, we would have “n minus two”
possible outcomes for who finishes third and
so on. Generally, the further down the ranking
we go, the more options we exhaust, and the
more options we exhaust, the fewer options
we have left. This trend will continue until
we get to the last element, for which we will
only have a single option available.
Therefore, mathematically, the number of permutations
would equal the product of n, n minus 1, n
minus 2 and so on until 1.
We denote this product as “n factorial”.
This is going to be a short lecture explaining
factorials.
The notation ‘n factorial’ is used to
express the product of the natural numbers
from 1 to n. This means that n factorial equals
1, times 2, times 3, all the way up to n.
For instance, 3 factorial is equal to 6, since:
1, times 2, times 3, equals 6
Simple enough, right?
For the remainder of the lecture, we are going
to explain some important properties of factorial
mathematics.
Before we get into the more complicated concepts,
you should know about two special cases:
negative numbers don’t have a factorial,
and zero factorial is equal to 1 by definition.
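A hand-written version of the definition makes both special cases explicit. This is a teaching sketch; Python's built-in `math.factorial` does the same job and is used here only as a cross-check.

```python
import math

def factorial(n):
    """Product of the natural numbers from 1 to n, with 0! defined as 1."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1  # the empty product, which is why factorial(0) returns 1
    for k in range(1, n + 1):
        result *= k
    return result

print(factorial(3), factorial(0))         # prints: 6 1
print(factorial(5) == math.factorial(5))  # prints: True
```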
All right. Let’s explore the first property.
For any natural number n, we know that:
n factorial equals, n minus 1, factorial,
times n
Similarly, n plus one factorial, equals n
factorial, times, n plus one.
For example, six factorial, equals, five factorial
times 6. In the same way, 7 factorial equals
six factorial, times 7.
This notion can be expanded further to express
n plus k factorial, and n minus k factorial.
In mathematical terms, this is equivalent
to:
n plus k factorial, equals, n factorial, times,
n plus 1, times, n plus 2, and so on, up to
n plus k.
Similarly,
n minus k factorial, equals, n factorial,
over n minus k plus 1, times, n minus k plus
2, all the way up to n minus k plus k, which
equals n.
For instance, if n is 5 and k is 2, then:
Five plus two factorial, equals 7 factorial,
or, 5 factorial, times, 6, times 7, and also:
Five minus two factorial, equals 3 factorial,
or, 5 factorial, over 4, times 5.
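The properties above can be checked numerically for n equal to 5 and k equal to 2. The sketch relies on `math.factorial` from the standard library.

```python
from math import factorial

n, k = 5, 2

# n! = (n - 1)! * n   and   (n + 1)! = n! * (n + 1)
assert factorial(n) == factorial(n - 1) * n
assert factorial(n + 1) == factorial(n) * (n + 1)

# (n + k)! = n! * (n + 1) * ... * (n + k), here 7! = 5! * 6 * 7
assert factorial(n + k) == factorial(n) * 6 * 7

# (n - k)! = n! / ((n - k + 1) * ... * n), here 3! = 5! / (4 * 5)
assert factorial(n - k) == factorial(n) // (4 * 5)

print(factorial(n + k), factorial(n - k))  # prints: 5040 6
```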
Ok. Great!
An important observation is that, if we have
two natural numbers k and n, where n is the
greater number, then
n factorial, over k factorial, equals, k plus
1, times, k plus 2, all the way up to n.
Let us look at an example, where n is 7 and
k is 4.
Then 7 factorial, over 4 factorial, equals
the product of the numbers from 1 to 7,
over the product of the numbers from 1 to
4. We can simplify this by crossing out 1,
2, 3 and 4 since they occur in both parts
of the fraction. Doing so leaves us with 5,
times 6, times 7.
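The cancellation can be expressed directly in code. In this sketch the helper name `factorial_ratio` is hypothetical; it multiplies only the factors that survive the crossing out.

```python
from math import factorial, prod

def factorial_ratio(n, k):
    """n! / k! for n >= k >= 0, computed as (k + 1) * (k + 2) * ... * n."""
    return prod(range(k + 1, n + 1))

print(factorial_ratio(7, 4))            # prints: 210, that is 5 * 6 * 7
print(factorial(7) // factorial(4))     # prints: 210
```

Skipping the cancelled factors avoids computing two large factorials only to divide one by the other.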
Great!
Now you have the tools that will allow you
to handle factorial operations.
So far, we explained what permutations are
and how to compute them. In this lecture,
we are going to focus on a similar, but not
too similar concept – variations.
Variations express the total number of ways
we can pick and arrange some elements of a
given set. For example, imagine you went on
vacation and forgot the code for the combination
lock on your carry-on.
Luckily for you, the lock requires a two-letter
code using only the letters A, B and C to
unlock it.
We can approach the problem as we did before
and explore the different positions in a specific
order. Let us start with the first letter.
We have 3 different options – A, B or C.
Suppose we choose A and move on to the next
letter. Since we can repeat values, we once
again have the same 3 options for the second
letter – A, B or C. This indicates that
there are 3 different variations if we decide
to start with A. Now, if we put B in the first
position, again we would have 3 options for
what we choose for the second letter.
In general, regardless of which one of the
3 letters we decide to start with, we are
going to have 3 different options for the
second letter.
Therefore, the total number of variations
we can get is “3, times 3, equals 9”.
The formula we use to calculate variations
with repetition is the following:
“V bar, of n and p, equals n to the power
of p”, where n is the total number of elements
we have available, and p is the number of
positions we need to fill.
The way we interpret this notation is “The
number of variations with repetition when
picking p-many elements out of n elements,
is equal to n to the power of p.”
If we apply this to the combination lock example,
we would write “V bar of 3 and 2, equals
3 to the power of 2, which is equal to 9”.
We interpret this as “There are 9 different
variations of 2-letter passcodes consisting
of A, B or C only”.
What happens if the lock could use any of
the 26 letters? We would have 26 to the power
of 2, which is 676, different variations.
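The lock example can be enumerated to confirm the n to the power of p formula. The sketch uses `itertools.product`, which generates ordered selections where repetition is allowed.

```python
from itertools import product
from string import ascii_uppercase

codes = ["".join(pair) for pair in product("ABC", repeat=2)]
print(codes)       # prints: ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
print(len(codes))  # prints: 9, the same as 3 ** 2

# With all 26 letters available for each of the 2 positions:
print(len(ascii_uppercase) ** 2)  # prints: 676
```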
We’ve already covered how to calculate variations
for events with repetition. In this lecture,
we are going to focus on events where we apply
variations without repetition.
Okay, let’s begin!
Imagine you are a track and field coach and
need to choose which 4 members of your team
run the relay and in what order. The team
consists of 5 people: Tom, Eric, David, Kevin
and Josh and you must decide who starts, who
anchors and who runs in between.
Okay!
We have 5 members in the team, which means
5 different scenarios for who we want to start.
Let’s say we know that David is the best
guy for the job. That decision leaves us with
only 4 options for who gets the second position,
namely - Tom, Eric, Kevin or Josh.
Suppose, we choose Josh to run after David.
That leaves us with only 3 options for who
runs third – Tom, Eric or Kevin.
Now, if we pick Kevin, then we know one of
either Tom or Eric finishes the race. Not
surprisingly, if we chose Eric to run third
instead, we once again have two choices for
who runs last – Tom or Kevin. Finally, if
we choose Tom to run third, we would have
two options available – Eric and Kevin.
This means we can have 6 different variations
for who runs the last two positions if we
have chosen David and Josh to run first and
second respectively. Using a similar logic,
if we pick somebody different than Josh to
run second, we would still have 6 possible
variations for who runs third and fourth.
In fact, regardless of who out of the 4 members
of the team we choose to run second, we would
always have 6 options for who fills out the
remaining spots on the team.
That suggests there are 4, times 6, or 24
different ways to arrange the 3 remaining
positions if we knew David starts.
What if we aren’t sure that David is the
best runner to start? Well, if somebody out
of the remaining 4 people started, we would
still have 24 ways of filling out the remaining
spots on the team.
What happens is, the further down the order
we go, the fewer options we are left with,
since nobody is permitted to run multiple
legs. This is what variations without repetition
is about. We cannot use the same element,
or in this case, person – twice.
In terms of numbers, we have: 5, times 4,
times 3, times 2.
This makes 120 different options of how to
arrange our team for the competition.
That wasn’t too bad, right?
It is time to introduce you to the formula
for calculating variations without repetition.
The number of variations without repetition
when arranging p elements out of a total of
n, is equal to “n factorial, over, n minus
p factorial”.
Applying this formula to our example, the
total number of variations without repetition
would be equal to 5 factorial, over 1 factorial.
Using the properties we introduced in the
bonus lecture about factorials, this is equal
to “2, times 3, times 4, times 5”, which
equals 120.
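The relay example can be checked by listing every ordered line-up of 4 runners out of 5 and comparing the count with the formula. The intermediate counts (24 and 6) match the step-by-step reasoning above.

```python
from itertools import permutations
from math import factorial

team = ["Tom", "Eric", "David", "Kevin", "Josh"]
relay_orders = list(permutations(team, 4))  # ordered picks, nobody runs twice

n, p = len(team), 4
by_formula = factorial(n) // factorial(n - p)  # n! / (n - p)!

david_first = [order for order in relay_orders if order[0] == "David"]
david_then_josh = [order for order in david_first if order[1] == "Josh"]

print(len(relay_orders), by_formula)             # prints: 120 120
print(len(david_first), len(david_then_josh))    # prints: 24 6
```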
In this lecture we are going to introduce
you to combinations, what they are and when
to use them.
Combinations represent the number of different
ways we can pick certain elements of a set.
Imagine you are trying to pick 3 people to
represent your company at a very important
technology-related conference. There are 10
people working in the office, so how many
different combinations are there?
If you calculated this as a variation, your
answer would be 720 but you would be counting
every group of 3 people several times over.
This is because, picking Alex, Sarah and Dave
to go to the conference is the same as picking
Alex, Dave and Sarah.
As you just saw, variations count the same
group of people more than once. That is
where combinations step in.
This next sentence might sound confusing at
first, so please pay close attention.
We can say that all the different permutations
of a single combination are different variations.
Let us look at the Sarah, Alex and Dave example.
Choosing those 3 to represent the company
is a single combination. Since the order in
which we pick them is not relevant, choosing
Sarah, Alex and Dave is exactly the same as
choosing “Sarah, Dave and Alex”, “Dave,
Sarah and Alex”, “Dave, Alex and Sarah”,
“Alex, Dave and Sarah”, or “Alex, Sarah
and Dave”.
Any of the 6 permutations we wrote is a different
variation, but NOT a different combination.
That is what we meant when we said that combinations
take into account double-counting.
Okay!
Recall that the formula for calculating permutations
of n-many elements is simply n factorial.
Since n is 3 in this case, there would be
a total of 6 permutations for choosing Alex,
Dave and Sarah.
Since variations count these 6 as separate,
we are going to have 6 variations for any
combination. This means that we are going
to end up with 6 times fewer combinations
than variations. Using the formulas we already
know, there are 10, times, 9, times 8, or
720 variations. In terms of combinations we
have: 720, divided by 6, or 120 ways of choosing
who represents the company.
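If you would like to check these counts yourself,
here is a minimal Python sketch that lists the
groups instead of using the formulas. Only Alex,
Sarah and Dave come from the example; the other
seven names are made up for illustration.

```python
from itertools import combinations, permutations

# Hypothetical office of 10 people. Only Alex, Sarah and Dave
# appear in the lecture; the remaining names are placeholders.
employees = ["Alex", "Sarah", "Dave", "Emma", "Frank",
             "Grace", "Henry", "Ivy", "Jack", "Kate"]

# Ordered picks of 3 people: these are the variations.
variations = list(permutations(employees, 3))

# Unordered picks of 3 people: these are the combinations.
groups = list(combinations(employees, 3))

# Every single group of 3 can be arranged in 3! = 6 orders.
orders_per_group = len(list(permutations(groups[0])))

print(len(variations))    # 720
print(orders_per_group)   # 6
print(len(groups))        # 120
```

Dividing the first number by the second gives the
third, which is the relationship described above.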
Whew! Good job!
Let’s repeat the sentence we used before
once again. We can say that all the different
permutations of a single combination are different
variations. There are 6 permutations, 120
combinations, and 720 variations. Hope it
is much clearer now!
Okay.
From here you can already imagine what the
formula is. Let’s construct it together!
What’s the number of combinations for choosing
p-many elements out of a sample space of n
elements? As you saw in the last example,
the number of combinations equals the number
of variations, over the number of permutations.
Mathematically, we would write that as C,
equals, V over P.
If we plug in the formulas associated with
variations and permutations, we get the following
formula:
n factorial, over, p factorial, times, n minus
p factorial.
Let’s apply the formula to our example!
The number of combinations equals: “10 factorial,
over 3 factorial, times 7 factorial”. After
simplifying, we get “8, times 9, times 10,
over, 1, times 2, times 3”. This is equal
to 720 over 6, which is 120. Exactly what
we got earlier.
Just to make sure we are all on the same
page, let’s go through another example.
What if we had to choose 4 out of 10 people
to go to the conference?
According to the formula, we are going to
have 10 factorial, over 4 factorial, times
6 factorial ways of doing so. After
some simplifications, this would equal “7,
times 8, times 9, times 10, over 1, times
2, times 3, times 4”. After computing the
values, this equals 5040 over 24, which is
210.
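As a quick sanity check, the formula can be
written as a small Python function. This is only
a sketch: the function name combinations_count is
our own, and math.comb (available in Python 3.8
and later) is used purely as a cross-check.

```python
from math import comb, factorial

def combinations_count(n, p):
    """Ways to choose p elements out of n when order is irrelevant."""
    return factorial(n) // (factorial(p) * factorial(n - p))

print(combinations_count(10, 3))  # 120
print(combinations_count(10, 4))  # 210

# The standard library agrees with the hand-written formula.
print(combinations_count(10, 4) == comb(10, 4))  # True
```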
Good job, everyone!
This is going to be a short lecture where
we focus on the symmetry of combinations.
Unlike permutations and variations, picking
more elements can lead to having fewer combinations.
Imagine you are going on a picnic and have
6 pieces of fruit you want to take with you.
However, your basket only has room for 4 of
them. Using the combinations formula, we are
going to have 15 possible choices.
So, you go out and buy a bigger basket,
but it turns out it can only fit 5 pieces
of fruit. According to the formula, there
are just 6 ways of picking the five fruits.
What if you get an even bigger basket which
can fit 6 fruits? Well, in how many ways can
you pick 6 fruits out of 6 fruits? Only 1
– picking all of them!
So, we see that in this case, picking more
elements leads to having fewer combinations.
This is because we can construct the question
in a different way: instead of picking which
pieces of fruit to take, we can choose the
pieces to leave behind.
Therefore, picking 4 fruits out of 6, is the
same as choosing 2 fruits that will be left
out. Mathematically, selecting 2 elements
out of 6 equals: 6 factorial, over, 2 factorial,
times 4 factorial, or 15.
How about 5 out of 6 fruits? It is the same
as choosing which 1 out of the 6 to leave
behind.
In the general case, we can pick p-many elements
in as many ways as we can pick n minus p many
elements. This shows us that when it comes
to combinations, the number of possible ways
in which p-many elements can be selected is
symmetric with respect to n over 2. And this
symmetry is the topic of this lesson.
So far so good!
Recall the example where we had to select
3 of our 10 employees to represent the company
at a conference. As we already showed, there
are 120 possible selections. What if, instead
of choosing 3, we had to pick 7 people to go
to the conference?
Well, using the formula we defined in the
last lecture, the number of combinations we
would have is 10 factorial, over 7 factorial,
times 3 factorial. That is equivalent to 8,
times 9, times 10, over 1, times 2, times
3, or 720 divided by 6, which is 120.
Thus, we would also have 120 different ways
of picking the 7 employees.
That’s because picking 7 out of 10 employees
to take to the conference is the same as choosing
3 out of 10 to leave behind.
To sum up, when p is greater than n over 2,
n minus p would be smaller than p. In such
instances we can apply symmetry to avoid calculating
factorials of large numbers. Generally, we
use symmetry to simplify the calculations
we need to make.
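Here is one possible way to put the symmetry to
work in Python. It is a sketch rather than a
reference implementation: the helper name choose
is our own, and it simply switches to the smaller
of p and n minus p before multiplying, so no
large factorial is ever computed.

```python
from math import comb

def choose(n, p):
    """Count combinations, using symmetry to keep the loop short."""
    p = min(p, n - p)  # picking p is the same as leaving n - p behind
    result = 1
    for i in range(1, p + 1):
        # After step i, result equals C(n - p + i, i), so the
        # integer division is always exact.
        result = result * (n - p + i) // i
    return result

# The fruit basket example: 6 pieces of fruit.
for p in range(7):
    print(p, choose(6, p), choose(6, 6 - p))

print(choose(10, 7))                 # 120, same as choosing 3 out of 10
print(choose(10, 7) == comb(10, 7))  # True
```

The loop for 7 out of 10 runs only 3 times, which
is exactly the saving the symmetry gives us.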
This is going to be a short lecture, where
we introduce a new type of combination. Sometimes,
a combination can be a mixture of several
smaller individual events.
Imagine the following scenario. The diner
near work just introduced a lunch menu, which
consists of a sandwich, a drink and a side.
Assuming you go there every day, how long
will it take for you to be able to try out
every possible lunch menu?
To solve this, you need to know what is included
in their lunch deal. Each menu consists of
a sandwich, a side and a drink. They offer
3 types of sandwiches – a panini, a Philly
cheesesteak and a veggie wrap. The sides
they have available are only fries and onion
rings and the drinks they offer are cola or
water.
The way to tackle such problems is by thinking
about the different parts of the menu as separate
positions. If we start by choosing a sandwich
first, we have 3 options. For each of them,
we can pick one of two sides – fries or
rings. To complete our menu, we would also
have to add a drink, which can either be water
or cola. Therefore, for any combination of
sandwich and side, we have two ways of completing
the menu.
Thus, we would have a total of 3 sandwiches, times
2 sides, times 2 drinks, or 12 different lunch
menus at the Diner.
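To see all 12 menus written out, we can let
Python build them for us. This is a minimal
sketch that treats each part of the menu as a
separate position, using the items listed in
the example.

```python
from itertools import product

sandwiches = ["panini", "Philly cheesesteak", "veggie wrap"]
sides = ["fries", "onion rings"]
drinks = ["cola", "water"]

# Every menu is one choice from each of the three positions.
menus = list(product(sandwiches, sides, drinks))

for sandwich, side, drink in menus:
    print(sandwich, "+", side, "+", drink)

print(len(menus))  # 12
```

So, eating there once a day, it would take 12
days to try every possible lunch menu.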
That wasn’t that hard, was it?
Good!
In online marketing you often need to try
out several versions of an online ad before
you decide which one is the best. Imagine
you are using Facebook’s promotional advertisements.
Your ad consists of 4 parts - a heading, a
thumbnail, a post description and a clickable
button.
Now, you have 3 different headings, 5 thumbnails,
3 post descriptions and 2 buttons. How many
different ads would you have to generate to
make sure you have tried all possibilities?
Using the method we showed earlier, that
would require you to test 3, times 5,
times 3, times 2, or 90 different ads.
This is important because it shows us how
many different possibilities there are, despite
the choices for each part seeming limited.
Furthermore, this allows project managers
to determine the appropriate amount of time
it would take for such a task to be completed.
When the components are simply too many, they
can flat out remove several of the options
to tremendously decrease the workload.
Good job, everyone!
The way of calculating the total number of
combinations for these kinds of questions
is by simply multiplying the number of options
available for each individual event.
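As a small illustration, here is how the ad
example could be computed in Python. The
dictionary keys are just labels we chose, and
the trimmed scenario at the end (dropping 2 of
the 5 thumbnails) is a hypothetical of our own,
not a figure from the lecture.

```python
from math import prod

# Number of options available for each part of the ad.
options = {"heading": 3, "thumbnail": 5, "description": 3, "button": 2}

# Multiply the sizes of the individual sample spaces.
total_ads = prod(options.values())
print(total_ads)  # 90

# Hypothetical: a project manager removes 2 of the 5 thumbnails.
trimmed = dict(options, thumbnail=3)
print(prod(trimmed.values()))  # 54
```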
In the beginning of the course, we talked
about a person’s chances of winning the
lottery, but we didn’t go through any actual
calculations.
For starters, let us define the rules of this
lottery. We need to pick 5 numbers between
1 and 69 and a “Powerball” number between
1 and 26.
As we mentioned in the beginning of the course,
the likelihood of two independent events occurring
simultaneously equals the product of their
individual probabilities. In this case, one
event would be guessing the “Powerball”
number and the other would be getting the
correct 5 numbers.
Okay, let’s start with the simpler part
– picking our “Powerball” number. Since
there is only 1 favourable outcome, the chance
of picking the correct “Powerball”
number is one twenty-sixth.
Now, how about picking 5 numbers out of 69.
As you know, order does not matter with lottery
numbers, so we would have to use combinations.
Obviously, we cannot have the same value twice.
This means that we are dealing with combinations
without repetition.
Therefore, let’s apply the relevant formula.
This suggests the total number of possible
ways to pick our 5 numbers is 69 factorial,
over 5 factorial, times, 69 minus 5, which
is 64, factorial.
The result is over 11 million. Therefore,
we would have a lower chance than 1 in 11
million of correctly guessing the 5 numbers.
To win the grand prize we would also have
to correctly guess the “Powerball” number,
so guessing all the numbers becomes 26 times
less likely. Therefore, there are almost 300
million different possible outcomes.
Okay.
To get the probability of winning we need
two pieces of information: number of favourable
outcomes and the number of all possible outcomes.
We already have the latter.
What about the number of favourable outcomes?
Well, they are going to be equal to the number
of tickets we buy. Assuming we participate
with a single ticket, we can calculate the
probability of winning.
Using the ‘favourable over all’ formula,
we get approximately 1 over 300 million. In
other words, the probability of winning the
lottery with a single ticket is approximately
equal to a figure that is slightly above zero
point zero, zero, zero, zero, zero, zero,
zero, zero, three.
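If you want the exact figures behind these
rounded numbers, the following short Python
sketch reproduces them. It assumes the rules
stated in this lecture (5 numbers out of 69,
plus 1 “Powerball” out of 26) and a single
ticket.

```python
from math import comb

# Ways to pick 5 distinct numbers out of 69, order irrelevant.
five_number_picks = comb(69, 5)

# Ways to pick the single "Powerball" number.
powerball_picks = 26

# The two events are independent, so the outcome counts multiply.
total_outcomes = five_number_picks * powerball_picks

# One ticket means one favourable outcome.
probability = 1 / total_outcomes

print(f"{five_number_picks:,}")  # 11,238,513
print(f"{total_outcomes:,}")     # 292,201,338
print(f"{probability:.2e}")      # 3.42e-09
```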
Great!
In this case, we had two events that needed
to happen concurrently for us to win the big
jackpot. The first one consisted of choosing
the 5 numbers and the second one was picking
the correct “Powerball” number.
With this notion we more or less exhaust the
Permutations, Variations, and Combinations
topic.
The purpose of this video is to briefly summarize
what we learned about combinatorics in this
part of the course.
Okay!
We use permutations and variations when we
must arrange a set of objects. In such cases,
the order in which we pick them is crucial.
The major difference between the two is that
in permutations you always arrange the entire
set of elements in the sample space.
For instance, we would use permutations when
we need to arrange the 4 runners we have already
chosen for our relay. We have 4 runners and
4 positions, so we rely on permutations. If,
however, we had to pick 4 out of the 6 people
on the team and then decide who runs which
leg, we would need to use variations. Alternatively,
if we only care about which 4 out of the 6
runners made it into the team, we would be
dealing with combinations. In this instance,
we do not care who runs which leg, so order
is irrelevant.
Perfect.
Another important topic we discussed is that
there are two types of variations and combinations
– with and without repetition. When we explore
those without repetition, we see a clear relationship
between permutations, variations and combinations:
“The number of combinations equals the number
of variations, divided by the number of permutations.”
That is because we count all the permutations
of a given set of numbers as a single combination,
but as separate variations.
We also defined formulas we use to compute
their values.
“P equals n factorial”, “V equals n
factorial, over n minus p factorial” and
“C equals n factorial, over p factorial,
times n minus p, factorial”.
Recall that these formulas change when we
include repeating values.
“V bar equals n to the power of p” and
“C bar equals, n plus p minus 1, factorial,
over, p factorial, times, n minus 1, factorial”.
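For reference, here are the five formulas
gathered in one place as Python functions. The
function names are our own choice, and the
sample values use the relay example with 4
positions and 6 runners.

```python
from math import factorial

def permutations_count(n):
    # P = n!
    return factorial(n)

def variations_count(n, p):
    # V = n! / (n - p)!
    return factorial(n) // factorial(n - p)

def combinations_count(n, p):
    # C = n! / (p! * (n - p)!)
    return factorial(n) // (factorial(p) * factorial(n - p))

def variations_with_repetition(n, p):
    # V bar = n^p
    return n ** p

def combinations_with_repetition(n, p):
    # C bar = (n + p - 1)! / (p! * (n - 1)!)
    return factorial(n + p - 1) // (factorial(p) * factorial(n - 1))

print(permutations_count(4))      # 24  ways to order 4 chosen runners
print(variations_count(6, 4))     # 360 ways to pick and order 4 of 6
print(combinations_count(6, 4))   # 15  ways to pick 4 of 6

# Combinations equal variations divided by permutations.
print(variations_count(6, 4) // permutations_count(4))  # 15

print(variations_with_repetition(3, 2))    # 9
print(combinations_with_repetition(3, 2))  # 6
```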
Great!
Furthermore, we talked about how combinations
are symmetric. Choosing p-many elements out
of a set of n-many elements can be done in
exactly as many ways as choosing n minus p-many
elements. The reason is we can reverse the
problem and actually choose the elements to
be omitted.
This video is going to be a practical example,
where we showcase and apply most of the knowledge
gained in this section of the course.
More precisely, we are going to use our newly-acquired
understanding of Combinatorics to aid us in
our quest of ordering something nobody has
ordered before.
So, you and your friend, let’s call her
Amy, want to get some Dominos Pizza for your
night in. You open the Dominos app on your
phone and see the store which delivers to
your home offers several deals – a “One
Person Combo Pack”, a “Family Deal”
and “Double Delight”.
The first one consists of a medium pizza,
a small drink and a dessert for $15.99. The
family deal includes 2 medium-sized pizzas,
a small drink, a large drink and 2 desserts
for $25.99. Lastly, “Double Delight” also
includes two pizzas, two drinks and two desserts
and costs $27.99. The only difference is that
both drinks are large.
Before picking a specific deal, you pull up
the menu to explore the options. You see they
offer a variety of 26 different pizzas, 4
distinct drinks as well as 7 types of desserts.
Initially, Amy says she isn’t that hungry,
so you look at the “One Person Combo Pack”.
Now, to construct such a menu you need to
combine 3 elements – a pizza, a small drink
and a dessert. Since all of these have distinct
sample spaces, we consider every part of the
menu as a separate position.
Dominos offer 26 pizzas, so there are 26 different
ways we can choose our single pizza. Similarly,
we have 4 ways of picking our preferred beverage
and 7 ways to pick our dessert.
This scenario is identical to the diner example
we explored a few lectures ago, where we had
to construct a lunch menu out of a sandwich,
a side and a drink. Therefore, we know how
to find the total number of different combinations.
We simply multiply the size of the sample
space for each position. This would result
in 26, times 4, times 7 or a total of 728
possible ways to fill out this deal. Now,
we use multiplication here because regardless
of which of the 26 pizzas we get, we can accompany
it with any of the 4 beverages. Following
the same logic, for any of those “pizza
& drink” combos, we have 7 different dessert
options.
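A minimal Python sketch of this count is shown
below. Since the actual menu items are not
listed, the pizzas, drinks and desserts are
represented by plain index numbers, which is an
assumption made only for illustration.

```python
from itertools import product

# Placeholders: item 0, item 1, ... stand in for the real menu.
pizzas = range(26)
drinks = range(4)
desserts = range(7)

# One pizza, one small drink and one dessert per order.
combo_packs = list(product(pizzas, drinks, desserts))

print(len(combo_packs))  # 728
print(26 * 4 * 7)        # 728
```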
Fascinated by the huge variety, your friend
reconsiders and decides to get some food too.
Not because she’s hungry, but because she wants
to play the combinations game and increase
the number of options you have. Thus, you
are compelled to get one of the two larger
deals – the “Family Deal” or the “Double
Delight”.
Amy states that since you are getting 2 pizzas
now, you can get different ones and exchange
slices. Furthermore, she proclaims that “Since
we are ordering two of each of the 3 parts
of the meal, we now have 8 times as many options”.
Having recently mastered combinatorics, you
decide to test if her statement holds true.
To do so, you must compute the number of different
menus you can order for each of the other
two deals.
You examine the “Family deal” they offer
first. Recall that it includes 2 pizzas, a
small drink, a large drink and 2 desserts
for the price of $25.99.
Once again, you decide to get different pizzas
to get the most out of your deal. Since the
drinks vary in size and the desserts are small,
you decide to make no such commitment about
the other two parts of the meal.
Alright!
Let’s first look at the pizzas you can choose.
Since you are splitting the two pizzas between
the two of you, no order is involved in this decision.
Thus, you would need to use combinations without
repetition to determine the correct number
of pizza options this deal provides.
You apply the formulas you learned already
which results in 26 factorial, over 24 factorial,
times 2 factorial. This simplifies to 25,
times 26, over 1, times 2 or 325. Your conclusion
is that there are 325 different ways of picking
the pizzas.
Whoa! That is a much greater variety than
the 26 we had available earlier.
Now, for the drinks.
You can both order the same one, so repetition
is allowed. Furthermore, since it matters
which drink is larger, this situation requires
using variations. Therefore, we apply the
formula for variations with repetition for
picking 2 out of the 4 available options.
By plugging values into the formula, we see
that we have 4, to the power of 2, or 16 options
for the drinks.
Alternatively, you can think of these in terms
of combinations of two events with separate
sample spaces. By doing so, we get two events,
both of which have sample spaces of size 4.
As we can expect, the answer would remain
the same since 4, times 4 also equals 16.
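Both ways of thinking can be sketched in Python
as follows. The four drink names are
hypothetical, since the lecture only tells us
that there are 4 distinct drinks.

```python
from itertools import product

# Hypothetical drink names; only the count of 4 comes from the menu.
drinks = ["cola", "sprite", "water", "juice"]

# Each order is a (small drink, large drink) pair.
# Order matters and the same drink may be picked twice.
drink_orders = list(product(drinks, repeat=2))

print(len(drink_orders))           # 16
print(len(drinks) ** 2)            # 16, variations with repetition
print(len(drinks) * len(drinks))   # 16, two separate sample spaces

# A small cola with a large water is a different order
# from a small water with a large cola.
print(("cola", "water") in drink_orders)  # True
print(("water", "cola") in drink_orders)  # True
```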
We once again have much more variety compared
to the first deal.
Lastly, you reach the desserts.
Just like with the drinks, you did not agree
to get different ones, so you can both get
ice-cream. If you end up ordering different
ones, it matters to you who gets each one.
For instance, if Amy wants the “Choco Pizza”
and you want the “Brownie Bites” it is
crucial who gets each one. Therefore, we must
use variations with repetition here as well,
right?
Well, not really. Dominos doesn’t need to
know who the two desserts are for. They just
put them in the bag and let you distribute
them among yourselves. In such a case, you
would order the “Choco Pizza” and the
“Brownie Bites” regardless of who prefers
each dessert. Therefore, we are dealing with
combinations with repetition instead of variations
with repetition.
Recall the formulas from the bonus lecture
on combinations with repetition. The number
of different ways we can choose “p” elements
out of “n”, with possibly recurring values
equals “n plus p, minus 1” factorial,
over “p” factorial, times “n minus 1”
factorial.
Since we are picking 2 desserts out of a possible
7, we can plug in these values to get 7, plus
2, minus 1, or 8, factorial, over 2 factorial,
times 6 factorial. After some simplifications,
this equals 7, times 8, over 1, times 2, or
28. Therefore, there are 28 different ways
to pick our 2 desserts out of the menu.
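To double-check the 28, we can list the dessert
pairs in Python. This is only a sketch: the
lecture names the “Choco Pizza”, the “Brownie
Bites” and ice-cream, and the remaining four
dessert names are placeholders.

```python
from itertools import combinations_with_replacement
from math import factorial

# Three desserts are mentioned in the lecture;
# "dessert 4" to "dessert 7" are hypothetical placeholders.
desserts = ["Choco Pizza", "Brownie Bites", "ice-cream",
            "dessert 4", "dessert 5", "dessert 6", "dessert 7"]

# Unordered pairs in which the same dessert may appear twice.
pairs = list(combinations_with_replacement(desserts, 2))

# The formula: (n + p - 1)! / (p! * (n - 1)!)
n, p = len(desserts), 2
by_formula = factorial(n + p - 1) // (factorial(p) * factorial(n - 1))

print(len(pairs))   # 28
print(by_formula)   # 28
```

Note that the list contains the pair of “Choco
Pizza” and “Brownie Bites” only once, no matter
who ends up eating which.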
Right!
So far, we found out we have 325 ways of choosing
our pizzas, 16 options for our drinks and
28 combinations of desserts we can order.
Therefore, we need to multiply these to get
a total of 325, times 16, times 28, or 145,600
different options for this deal.
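The whole “Family Deal” calculation fits in a
few lines of Python. This is a sketch under the
assumptions made in this example: different
pizzas, drinks where size matters, and desserts
that may repeat.

```python
from math import comb

pizzas = comb(26, 2)          # combinations without repetition
drinks = 4 ** 2               # variations with repetition
desserts = comb(7 + 2 - 1, 2) # combinations with repetition

family_deal = pizzas * drinks * desserts
print(pizzas, drinks, desserts)  # 325 16 28
print(family_deal)               # 145600

# Amy's claim: 8 times the 728 options of the first deal.
print(8 * 728)             # 5824
print(family_deal // 728)  # 200
```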
Wow! That is a lot of variety!
That is definitely more than 8 times greater
than the 728 options you could get with the first
deal. So, Amy was wrong after all.
Before you place your order, you still go
through the third deal to make sure you are
choosing the offer with the greatest variety.
The “Double Delight” contains the same
number of pizzas and desserts as the “Family
Deal”. Therefore, you already know you have
325 ways of choosing the pizzas and another
28 ways of picking the desserts.
Thus, you only have to compute the number
of ways to pick the beverages that go with
your order. The drinks are large so just like
with the pizzas, you decide to get different
ones since you can share.
Coca-Cola is still running the campaign where
they put names on the labels of each bottle,
so it matters who gets each drink. For instance,
if you want a Coke and Amy wants a
Sprite, you would not want the Coca-Cola label
to have her name instead of yours.
Therefore, it is important how you order the
two beverages and the special directions you
leave for the delivery. This suggests you
must use variations without repetition to
determine the appropriate number of possibilities.
According to the formula we introduced earlier,
the number of ways of choosing your beverages
would equal 4 factorial, over 2 factorial.
This results in you having 12 different ways
of getting your drinks.
Since there are 325 ways of picking the pizzas,
12 ways of choosing the drinks and 28 ways
of selecting the desserts, that makes up a
total of 109,200 distinct possible orders
you can make.
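Here is a short Python sketch of the “Double
Delight” count. As before, the four drink names
are hypothetical, and math.perm requires Python
3.8 or later.

```python
from itertools import permutations
from math import comb, perm

# Hypothetical drink names; only the count of 4 comes from the menu.
drinks = ["cola", "sprite", "water", "juice"]

# Ordered pairs of two different drinks: (yours, Amy's).
drink_orders = list(permutations(drinks, 2))

pizzas = comb(26, 2)           # 325 pairs of different pizzas
desserts = comb(7 + 2 - 1, 2)  # 28 dessert pairs, repeats allowed

double_delight = pizzas * len(drink_orders) * desserts

print(len(drink_orders))  # 12
print(perm(4, 2))         # 12, the formula 4! / 2!
print(double_delight)     # 109200
```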
Just like with the “Family Deal” offer,
the number of possible menus you can get is
more than 8 times greater than the 728 available
for the “One Person Combo Pack”. Thus,
you showed your friend that ordering twice
as much food results in having much greater
variety than she had initially thought.
Because 109,200 is less than 145,600, you decide
to go with the “Family Deal” instead of
the “Double Delight”, since it provides the greater
variety. Even though the only
difference between the two deals is the size
of one drink, it results in us having 36,400
more options. This is a perfect example of
how small changes can significantly alter the
possible number of outcomes we have. This
only goes to show that when using combinatorics,
the devil is in the detail because every small
difference matters.
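To put the three deals side by side, here is a
brief Python sketch. It relies on the same
assumptions we made while counting: different
pizzas, desserts that may repeat, and the drink
rules of each deal.

```python
from math import comb, perm

pizza_pairs = comb(26, 2)           # 325
dessert_pairs = comb(7 + 2 - 1, 2)  # 28

deals = {
    "One Person Combo Pack": 26 * 4 * 7,
    "Family Deal": pizza_pairs * 4 ** 2 * dessert_pairs,
    "Double Delight": pizza_pairs * perm(4, 2) * dessert_pairs,
}

baseline = deals["One Person Combo Pack"]
for name, count in deals.items():
    # Show each total and how many times larger it is than the baseline.
    print(f"{name}: {count:,} options, {count // baseline}x the combo pack")

# The only change between the last two deals is the drink rule:
# 16 drink options versus 12.
difference = deals["Family Deal"] - deals["Double Delight"]
print(difference)  # 36400
```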
Great job, everybody!
In this long practical example, we managed
to go over the various different parts of
combinatorics and when we use each one. We
put all the formulas from the section to good
use and made sure we provide logical arguments
before deciding which one to apply. The goal
of this practical example was to show you
how to apply a probabilistic approach to everyday
scenarios and get into an analytical mindset.
That was also the end of this section.
Thanks for watching!
