Hi, this is Stylianos from the Tesseract Academy.
In this video, we're going to talk about data
science in marketing.
Marketing produces lots of data. We have data
around performance, such as the performance
of ads or products. But we also have data
that describes users: their interactions,
their past interactions with the brand, their
demographics, etc. It makes sense that many
techniques from data science, coming either
from statistics or machine learning, have
applications in a field like marketing that
produces so much data. In this presentation,
we're going to talk about some use cases for
machine learning and statistics in the context
of marketing.
The use of data science in marketing is
relatively new, so many things are not
standardized, and many applications might
overlap with traditional business analytics.
We're in a situation where people in the
marketing space are still coming to terms
with things such as predictive modeling,
optimization, or reinforcement learning. I
hope that this video will help you understand
some of these terms better.
So let's start, and let's see our first use
case. We're going to talk about conjoint analysis.
That's a very traditional application of statistical
analysis in marketing, and there are three
main steps in this process. The marketing
analyst will present different product profiles
to consumers, and the product profiles are
defined by their attributes. The customers
then rank or rate these products in terms
of preference. Our goal is to find how each
feature affects the preferences of the users.
So here is an example of this technique in
practice. You can see here different phones
and mobile packages, and there are some variables,
such as the brand, the startup cost, the monthly
cost, etc. You can give this to someone and
ask them to rank the different services in
terms of preference, which one they would
first go with, which one would be second,
etc.
So the goal of conjoint analysis is to analyze
each individual variable and then produce
a score for each one. The scores are centered
on zero. This means that we want to see how
much a variable adds or subtracts from the
ranking. So the reason we do this, the reason
we center at zero, is interpretability. Doing
this makes it easier to understand the results.
We can see, for example, in this graph that
the [inaudible 00:02:30] monthly cost of $100,
in this case, is a very positive factor. It
increases the ranking of the package. Having
a high cost, as would be expected, reduces
the ranking by around three places.
So conjoint analysis can be a powerful tool
for understanding how different products and
packages are perceived by your customers.
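To make the steps above concrete, here is a minimal sketch in Python of how part-worth scores can be estimated: dummy-code each attribute level, fit an additive least-squares model to the preference ratings, and center the scores within each attribute at zero. All brands, costs, and ratings below are invented for illustration; real conjoint studies use dedicated experimental designs and tooling.

```python
import numpy as np

# Hypothetical respondent data: phone packages defined by two attributes
# (brand and monthly cost), each rated on preference (higher = better).
profiles = [
    ("BrandA", "$30"), ("BrandA", "$60"), ("BrandA", "$100"),
    ("BrandB", "$30"), ("BrandB", "$60"), ("BrandB", "$100"),
]
ratings = np.array([6.0, 4.0, 2.0, 5.0, 3.0, 1.0])

brands = ["BrandA", "BrandB"]
costs = ["$30", "$60", "$100"]
levels = brands + costs

# Dummy-code each attribute level and fit an additive model by least squares.
X = np.array([[1.0 if lv in p else 0.0 for lv in levels] for p in profiles])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Center the part-worths within each attribute so they sum to zero:
# this is what makes them easy to read as "adds to / subtracts from rank".
part_worths = dict(zip(levels, coef))
for group in (brands, costs):
    mean = np.mean([part_worths[lv] for lv in group])
    for lv in group:
        part_worths[lv] -= mean

print(part_worths)  # low cost gets a positive score, high cost a negative one
```

On this toy data the $100 level ends up with a clearly negative part-worth and $30 with a positive one, mirroring the kind of graph discussed above.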
Another interesting application of data science
in marketing is predicting the engagement
of online ads. So there are three different
ways to do this. We can just predict whether
an individual user will click on an ad, or the
proportion of clicks over impressions an ad
will get. Or we can try to find the best ad
using some techniques like reinforcement learning,
which we're going to cover later.
So predicting whether someone will click on
an ad is a pretty common application of machine
learning. Facebook has done lots of research
in this area. Here is the link if you want
to learn more. You don't really have to read
the whole paragraph. I've placed in bold the
most important parts. The most important thing
to take away from here is that, according to
Facebook, the most important part of a pipeline
that predicts clicks on online ads is having
the right features. This is what makes the
most difference.
And what does Facebook mean by the term right
features? Well, they're mainly talking about
features that contain historical information,
such as the click-through rate of an ad for
a given time window, let's say the last week,
or the average click-through rate of a user.
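Here is a rough sketch of what computing such historical features from a raw click log might look like. The log format, field names, and numbers are all made up for illustration; a production feature pipeline would run over far larger data with proper time handling.

```python
from collections import defaultdict

# Hypothetical click log: (timestamp, user_id, ad_id, clicked).
log = [
    (1, "u1", "adA", 1), (2, "u1", "adB", 0),
    (3, "u2", "adA", 0), (4, "u2", "adA", 1),
    (5, "u1", "adB", 1), (6, "u3", "adB", 0),
]

def ctr_features(log, now, window):
    """Historical features of the kind discussed above: per-ad CTR in a
    recent time window, and each user's average CTR over their history."""
    ad_clicks, ad_impr = defaultdict(int), defaultdict(int)
    user_clicks, user_impr = defaultdict(int), defaultdict(int)
    for ts, user, ad, clicked in log:
        if now - window <= ts < now:   # only the recent window for ad CTR
            ad_impr[ad] += 1
            ad_clicks[ad] += clicked
        user_impr[user] += 1           # full history for the user's CTR
        user_clicks[user] += clicked
    ad_ctr = {a: ad_clicks[a] / ad_impr[a] for a in ad_impr}
    user_ctr = {u: user_clicks[u] / user_impr[u] for u in user_impr}
    return ad_ctr, user_ctr

ad_ctr, user_ctr = ctr_features(log, now=7, window=7)
```

These per-ad and per-user rates would then be fed as input features to the click-prediction model.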
Another interesting application of data science
in marketing is choosing the best performing
ads. This is related to the multi-armed bandits
problem in machine learning. So I'm sure all
of you must be familiar with those bandit
machines in casinos. You just pull a lever
and sometimes you win some coins.
So the multi-armed bandits problem describes
a situation where there are multiple different
bandits and each bandit has a different probability
of success. However, you don't know this probability
beforehand. So you have to experiment and
find, as fast as possible and with as high
confidence as possible, which bandit has
the highest probability of success, and go
with it.
So this is closely related to ads, because
we face a similar scenario. We face a scenario
where we have different ads. We don't know
which ones are going to perform best. We show
them to users. We get some clicks. We need
to decide as fast as possible which ad seems
to be the most engaging one, so we can stick
with it.
There are different algorithms for multi-armed
bandits. The ε-greedy algorithm is one of
the most popular ones: we start by taking
random actions, and then we stick with the
best choice so far. But still, there is always
an ε probability of us taking a random action.
So this algorithm tries to balance exploitation
with exploration.
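The algorithm just described can be sketched in a few lines of Python. The success probabilities below are invented, and in the simulation they stand in for the unknown click rates of competing ads.

```python
import random

def epsilon_greedy(true_probs, epsilon=0.1, steps=5000, seed=0):
    """Minimal ε-greedy sketch: true_probs are the success probabilities
    of each bandit (unknown to the algorithm itself)."""
    rng = random.Random(seed)
    n = len(true_probs)
    pulls = [0] * n
    wins = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: take a random action
        else:
            # exploit: stick with the best empirical success rate so far
            arm = max(range(n),
                      key=lambda i: wins[i] / pulls[i] if pulls[i] else 0.0)
        pulls[arm] += 1
        wins[arm] += 1 if rng.random() < true_probs[arm] else 0
    return pulls, wins

pulls, wins = epsilon_greedy([0.05, 0.10, 0.20])
```

After enough steps, the pull counts concentrate on the arm with the highest true probability, while the ε share of random actions keeps the other arms under observation.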
Another approach is Thompson sampling, which
is based on Bayesian statistics. According
to this approach, we set a prior distribution,
which can be uniform across all bandits,
encoding our knowledge about which bandit,
or in this case online ad, is the best. Then
we choose bandits in proportion to their
success and update the parameters as we
observe results.
Again, this is another approach that balances
exploration with exploitation.
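A common concrete version of this idea, sketched below, uses a Beta prior for each bandit's success probability: sample once from each posterior, play the bandit with the highest sample, and update its counts. The probabilities and parameters are invented for illustration.

```python
import random

def thompson_sampling(true_probs, steps=5000, seed=0):
    """Beta-Bernoulli Thompson sampling sketch. We start from a uniform
    Beta(1, 1) prior on every bandit's success probability and update
    the (alpha, beta) counts after each observed success or failure."""
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1.0] * n  # 1 + observed successes
    beta = [1.0] * n   # 1 + observed failures
    pulls = [0] * n
    for _ in range(steps):
        # Draw one sample per bandit from its posterior; play the best.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta

pulls, alpha, beta = thompson_sampling([0.05, 0.10, 0.20])
```

Because bandits are chosen in proportion to how likely they are to be the best under the current posterior, exploration fades naturally as the evidence accumulates.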
Here is a graph of the ε-Greedy approach.
You can see that in this approach we start
and we click on various bandits and we try
to figure out which one is the best. But after
a point, the algorithm converges to the best
one. Now, when will this happen? It depends
on the problem and also on the parameter ε.
But at least there is a guarantee that if
you try long enough, then eventually you're
going to converge to something useful.
Here is an example of Thompson sampling. We
don't have to stay on this for too long. It's
just an example so you can see how things
evolve from iteration to iteration. We start
with a uniform prior across all bandits, 10
bandits in this case. Then as we start iterating,
the algorithm forms some opinion about the
probability of success that each bandit offers.
Then as we run more and more experiments, you
see that eventually we get a very accurate
picture of the most successful bandit. The
algorithm then focuses on this and tries to
exploit it as much as possible and [inaudible
00:07:05] the rest of the bandits down there.
Something that I get asked very often is,
"Are bandits really better than A/B testing?"
A/B testing is an alternative to multi-armed
bandits. There is a trade-off between the time
taken to reach statistical significance and
how fast and effective a method is. A/B testing
can provide results that are highly significant:
if you run an A/B test, you can be very confident
in the results you get. However, you might lose
many conversions until this happens, because
when you do an A/B test, you have to assign
two different versions to two groups of users,
and one of them is most likely inferior.
Whereas with multi-armed bandits, if a choice
is clearly inferior, then the algorithm will
very quickly converge to the optimal solution.
However, as you saw in the previous graphs,
you need a large number of iterations before
multi-armed bandits offer a high degree of
statistical significance.
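A quick simulation can illustrate the conversions side of this trade-off. Below, an A/B test keeps a fixed 50/50 split for the whole period, while an ε-greedy bandit shifts traffic toward the better variant as evidence comes in. The conversion rates and traffic volume are invented for illustration only.

```python
import random

def ab_test_conversions(p_a, p_b, steps=10000, seed=0):
    """A/B test sketch: even 50/50 split for the whole test period.
    Statistically clean, but half the traffic keeps seeing the loser."""
    rng = random.Random(seed)
    total = 0
    for t in range(steps):
        p = p_a if t % 2 == 0 else p_b
        total += 1 if rng.random() < p else 0
    return total

def bandit_conversions(p_a, p_b, steps=10000, epsilon=0.1, seed=0):
    """ε-greedy over the same traffic: adapts toward the better variant."""
    rng = random.Random(seed)
    probs = [p_a, p_b]
    pulls, wins, total = [0, 0], [0, 0], 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)
        else:
            arm = max(range(2),
                      key=lambda i: wins[i] / pulls[i] if pulls[i] else 0.0)
        r = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += r
        total += r
    return total

ab = ab_test_conversions(0.05, 0.15)
bandit = bandit_conversions(0.05, 0.15)
```

With a clearly inferior variant, the bandit collects noticeably more conversions over the same traffic, at the cost of a less clean statistical comparison between the two variants.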
