Unlike a machine
learning project,
the output of a data
science project is
often a set of
actionable insights,
a set of insights that may cause
you to do things differently.
So, data science projects have
a different workflow than
machine learning projects.
Let's take a look
at the workflow
of a data science project.
As our running example,
let's say you want to
optimize a sales funnel.
Say you run an e-commerce,
or online shopping,
website that sells
coffee mugs. For
a user to buy
a coffee mug from you,
there's a sequence of steps
they'll usually follow.
First, they'll visit
your website and take a look
at the different coffee mugs
on offer, then eventually,
they have to get
to a product page,
and then they'll have to put
it into their shopping cart,
and go to the shopping cart page,
and then they'll finally
have to check out.
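As a rough illustration of what "getting through all of these steps" means in numbers, here is a minimal sketch that computes the step-to-step conversion rate of a funnel and picks out the leakiest step. The stage names (`home`, `product`, `cart`, `purchase`), the helper names, and the counts are hypothetical placeholders, not real data.

```python
# Hypothetical funnel stages for the coffee-mug shop; names are illustrative.
FUNNEL_STAGES = ["home", "product", "cart", "purchase"]


def step_conversion(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of users at each stage who go on to reach the next stage."""
    rates = {}
    for current, nxt in zip(FUNNEL_STAGES, FUNNEL_STAGES[1:]):
        reached = counts.get(current, 0)
        # Avoid dividing by zero if no one reached this stage.
        rates[f"{current}->{nxt}"] = counts.get(nxt, 0) / reached if reached else 0.0
    return rates


def biggest_drop(rates: dict[str, float]) -> str:
    """The step with the lowest conversion rate, i.e. the leakiest step."""
    return min(rates, key=rates.get)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = {"home": 1000, "product": 400, "cart": 100, "purchase": 40}
    rates = step_conversion(counts)
    print(rates)
    print(biggest_drop(rates))
```

The leakiest step is only a starting point: it tells you where to look, not why users leave.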
So, if you want to optimize
the sales funnel to make
sure that as many people as
possible get through
all of these steps,
how can you use data science
to help with this problem?
Let's look at the key steps
of a data science project.
The first step is
to collect data.
So, on a website
like the one we saw,
you may have a data
set that stores when
different users go to
different web pages.
In this simple example,
I'm assuming that
you can figure out
the country that the users
are coming from, for example,
by looking at
their computer's address,
called an IP address,
and figuring out
which country
they're connecting from.
But in practice,
you can usually get
quite a bit more data about
users than just what
country they're from.
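A minimal sketch of the kind of data set described above, assuming one log row per page visit. The `PageVisit` record and its field names are hypothetical, and the IP-to-country lookup is assumed to have happened upstream; it is not implemented here. Counting distinct users per page gives the raw numbers a funnel analysis starts from.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PageVisit:
    """One row of a hypothetical page-visit log."""
    user_id: str
    page: str            # e.g. "home", "product", "cart", "purchase"
    country: str         # assumed to be looked up upstream from the IP address
    timestamp: datetime


def users_per_page(log: list[PageVisit]) -> dict[str, int]:
    """Count the distinct users who visited each page at least once."""
    seen: dict[str, set[str]] = defaultdict(set)
    for visit in log:
        seen[visit.page].add(visit.user_id)
    return {page: len(users) for page, users in seen.items()}


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    log = [
        PageVisit("u1", "home", "FR", datetime(2024, 5, 1, 9, 0)),
        PageVisit("u1", "home", "FR", datetime(2024, 5, 1, 9, 5)),
        PageVisit("u2", "home", "US", datetime(2024, 5, 1, 10, 0)),
        PageVisit("u1", "cart", "FR", datetime(2024, 5, 1, 9, 7)),
    ]
    print(users_per_page(log))
```

Counting distinct users rather than raw rows keeps a user who reloads a page from being counted twice.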
The second step is to
then analyze the data.
Your data science team may
have a lot of ideas about what
is affecting the performance
of your sales funnel.
For example, they may think that
overseas customers
are scared off by
the international shipping costs,
which is why a lot
of people go to
the checkout page but
don't actually check out.
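One simple way a team might start checking the shipping-cost idea is to compare, among users who reached the checkout page, how many domestic versus overseas users actually completed a purchase. This is a sketch under stated assumptions: `HOME_COUNTRY` and the `(country, reached_checkout, purchased)` tuple format are hypothetical. A gap between the two rates would be consistent with the hypothesis, but would not by itself prove that shipping costs are the cause.

```python
HOME_COUNTRY = "US"  # hypothetical: the country the shop ships from


def completion_rate_by_region(sessions):
    """sessions: iterable of (country, reached_checkout, purchased) tuples.

    Returns, separately for domestic and overseas users, the fraction of
    users who reached the checkout page and then completed a purchase.
    """
    totals = {"domestic": [0, 0], "overseas": [0, 0]}  # [reached, purchased]
    for country, reached_checkout, purchased in sessions:
        if not reached_checkout:
            continue  # only users who got to checkout are relevant here
        region = "domestic" if country == HOME_COUNTRY else "overseas"
        totals[region][0] += 1
        totals[region][1] += int(purchased)
    return {r: (p / n if n else 0.0) for r, (n, p) in totals.items()}
```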
If that's true, then
you might think
about whether to fold
part of the shipping costs
into the actual product costs.
Or your data science team
may think there are blips
in the data whenever
there's a holiday.
Maybe more people
will shop around
the holidays because
they're buying gifts, or
maybe fewer people will shop
around the holidays
because they're staying
home rather than sometimes
shopping from
their work computers.
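To see which way a holiday blip goes, one could compare average daily traffic on holidays with average traffic on other days. A minimal sketch, assuming you already have daily visit counts and a set of holiday dates; both the inputs and the function name are hypothetical.

```python
from datetime import date
from statistics import mean


def holiday_effect(daily_visits: dict[date, int], holidays: set[date]) -> float:
    """Ratio of mean daily visits on holidays to mean daily visits otherwise.

    A ratio above 1 suggests more shopping around holidays; below 1, less.
    """
    on = [v for d, v in daily_visits.items() if d in holidays]
    off = [v for d, v in daily_visits.items() if d not in holidays]
    if not on or not off:
        raise ValueError("need at least one holiday and one non-holiday day")
    return mean(on) / mean(off)
```

With only a handful of holidays per year, a single ratio like this is noisy, so a team would likely look at several years before drawing conclusions.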
In some countries, there
may be time-of-day
blips: in countries
that observe a siesta,
an afternoon rest period,
there may be fewer shoppers
online, and so
your sales may go down.
They may then suggest
that you should spend
fewer advertising dollars during
the period of siesta because
fewer people will go online
to buy at that time.
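Here is a minimal sketch of how a time-of-day pattern could turn into an advertising suggestion: compute each local hour's share of visits, then split a daily ad budget in proportion to that share, so low-traffic hours such as a siesta period get less. The proportional rule and the function names are assumptions chosen for simplicity, not a recommended bidding strategy.

```python
from collections import Counter


def hourly_share(visit_hours: list[int]) -> dict[int, float]:
    """Fraction of visits that fall in each local hour of the day (0-23)."""
    if not visit_hours:
        raise ValueError("no visits to analyze")
    if any(not 0 <= h <= 23 for h in visit_hours):
        raise ValueError("hours must be in the range 0-23")
    counts = Counter(visit_hours)
    total = len(visit_hours)
    return {h: counts.get(h, 0) / total for h in range(24)}


def allocate_ad_budget(daily_budget: float, share: dict[int, float]) -> dict[int, float]:
    """Split a daily ad budget across hours in proportion to traffic share.

    A deliberately simple, hypothetical rule used for illustration.
    """
    return {h: daily_budget * s for h, s in share.items()}
```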
So, a good data science team may
have many ideas, and so they try
many ideas, or, as we'll say, iterate
many times to get good insights.
Finally, the data science team
will distill these insights
down to a smaller number
of hypotheses
about what could
be going well and what
could be going poorly,
as well as a smaller number
of suggested actions, such as
incorporating shipping costs
into the product
costs rather than listing them
as a separate line item.
When you take some of
these suggested actions
and deploy these changes
to your website,
you then start to get new data
back as users behave
differently now
that you advertise
differently at the time of
siesta or have a different
checkout policy.
Then your data science team
can continue to collect
data and reanalyze the new data
periodically to see if
they can come up with
even better hypotheses or
even better actions over time.
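When new data comes back after a change, one common (though not the only) way to ask whether the conversion rate really moved is a two-proportion z-test comparing the period before the change with the period after. The sketch below uses the standard pooled-variance formula; the function name and inputs are hypothetical. Note that a before/after comparison can be confounded by things like seasonality, which is why a randomized A/B test is usually more trustworthy when it is feasible.

```python
import math


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the change in conversion rate from period A to period B.

    conv_* are purchases, n_* are visitors. Under the usual normal
    approximation, |z| > 1.96 roughly corresponds to p < 0.05 (two-sided).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate assuming no real change
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    return (p_b - p_a) / se
```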
So the key steps of
a data science project
are to collect the data,
to analyze the data,
and then to suggest
hypotheses and actions,
and then to continue
to get the data back
and reanalyze the data
periodically.
Let's take this framework and
apply it to a new problem,
to optimizing a
manufacturing line.
So we'll take these three steps
and use them on
the next slide as well.
Let's say you run
a factory that's
manufacturing thousands
of coffee mugs a
month for sale and you want to
optimize the manufacturing line.
So, these are the key steps
in manufacturing coffee mugs.
Step one is to mix the clay,
so make sure the appropriate
amount of water is added.
Step two is to take this clay
and shape the mugs.
Then you have to add the glaze,
that is, the coloring
and a protective coating.
Then you have to heat
this mug, and we call
that firing it in the kiln.
Finally, you would inspect
the mug to make sure there aren't
dents in the mug and it isn't
cracked before you
ship it to customers.
So, a common problem
in manufacturing is
to optimize the yield of
this manufacturing line to
make sure that as few damaged
coffee mugs get produced as
possible because those are
coffee mugs you
have to throw away,
resulting in time
and material waste.
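The yield being optimized here is just the fraction of mugs that pass inspection, and the waste is what the defective ones cost you. A minimal sketch; the function names and the per-mug cost parameter are hypothetical.

```python
def line_yield(produced: int, defective: int) -> float:
    """Fraction of mugs produced that pass inspection."""
    if produced <= 0:
        raise ValueError("produced must be positive")
    if not 0 <= defective <= produced:
        raise ValueError("defective must be between 0 and produced")
    return (produced - defective) / produced


def waste_cost(defective: int, cost_per_mug: float) -> float:
    """Material and labor cost thrown away with defective mugs.

    cost_per_mug is a hypothetical per-unit cost you would supply yourself.
    """
    return defective * cost_per_mug
```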
What's the first step of
a data science project?
I hope you remember from
the last slide that
the first step is
to collect data.
So for example, you
may save data about
the different batches of
clay that you've mixed,
such as who supplied
the clay and how
long you mixed it,
or maybe how much moisture
was in the clay
and how much water you added.
You might also collect data about
the different batches
of mugs you made.
So how much humidity
was in that batch?
What was the temperature
in the kiln and
how long did you
fire it in the kiln?
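A minimal sketch of how those batch records might be stored, with one row per batch holding the fields mentioned above plus inspection results. `MugBatch`, its field names, and the units are hypothetical. The helper computes the defect rate per clay supplier, which is one of the first simple cuts a team might look at.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MugBatch:
    """One hypothetical production batch; field names and units are illustrative."""
    clay_supplier: str
    mix_minutes: float
    water_added_l: float
    clay_moisture_pct: float
    humidity_pct: float
    kiln_temp_c: float
    firing_hours: float
    mugs: int
    defective: int  # mugs rejected at inspection (dents or cracks)


def defect_rate_by_supplier(batches: list[MugBatch]) -> dict[str, float]:
    """Fraction of defective mugs for each clay supplier, pooled over batches."""
    mugs: dict[str, int] = defaultdict(int)
    defective: dict[str, int] = defaultdict(int)
    for b in batches:
        mugs[b.clay_supplier] += b.mugs
        defective[b.clay_supplier] += b.defective
    return {s: defective[s] / mugs[s] for s in mugs if mugs[s]}
```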
Given all this data
you would then ask
the data science team to analyze
the data and they would,
as before, iterate many times
to get good insights.
So, they may find,
for example,
that whenever the humidity is too
low and the kiln
temperature is too hot,
there are cracks in
the mug. Or they may
find out that because
it's warmer in
the afternoon,
you need to adjust
the humidity and temperature
depending on the time of day.
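An insight like "low humidity plus a hot kiln means cracks" might first show up as a comparison of crack rates between batches that meet a condition and batches that don't. A minimal sketch, assuming each batch is a dict with `humidity_pct`, `kiln_temp_c`, `mugs`, and `cracked` keys; the thresholds are hypothetical placeholders, not recommended settings.

```python
LOW_HUMIDITY_PCT = 30.0    # hypothetical threshold for "too dry"
HIGH_KILN_TEMP_C = 1250.0  # hypothetical threshold for "too hot"


def crack_rate(batches: list[dict]) -> float:
    """Pooled fraction of cracked mugs across the given batches."""
    mugs = sum(b["mugs"] for b in batches)
    cracked = sum(b["cracked"] for b in batches)
    return cracked / mugs if mugs else 0.0


def is_risky(batch: dict) -> bool:
    """True when the batch was made in low humidity and fired too hot."""
    return (batch["humidity_pct"] < LOW_HUMIDITY_PCT
            and batch["kiln_temp_c"] > HIGH_KILN_TEMP_C)


def compare_conditions(batches: list[dict]) -> tuple[float, float]:
    """Crack rate for risky-condition batches versus all other batches."""
    risky = [b for b in batches if is_risky(b)]
    other = [b for b in batches if not is_risky(b)]
    return crack_rate(risky), crack_rate(other)
```

The same pattern extends to the time-of-day idea by grouping batches on the hour they were made instead of on a fixed threshold.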
Based on the insights from
your data science team you get
suggestions for hypotheses
and actions on how to
change the operations and
manufacturing line in order to
improve the productivity
of the line.
When you deploy the changes,
you then get new data back
that you can reanalyze
periodically
so that your team can keep on optimizing
the performance of
your manufacturing line.
To summarize, the key steps of
a data science project
are to collect the data,
to analyze the data,
and then to suggest
hypotheses and actions.
In this video and the last video
you saw some examples
of machine learning projects
and data science projects.
It turns out that machine
learning and data science are
affecting almost every
single job function.
What I want to do in
the next video is show you how
these ideas are affecting
many job functions,
including perhaps yours and
certainly that of many
of your colleagues.
Let's go on to the next video.
