Did you know that US retail giant Walmart
generates 2.5 petabytes of data from approximately
1 million customers every hour?
And in case you’re wondering how big
a petabyte is, as I did when I first read this,
it is equal to 1 million gigabytes: the equivalent
of 13.3 years of HD video.
Considering that Walmart locations are open
for business for more than 10 hours a day,
we get a staggering 25 petabytes of data,
or more than 330 years of HD video, collected
on a daily basis!
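The back-of-the-envelope arithmetic above can be worked through in a few lines of Python. This is only a sketch: it reuses the figures quoted in the video (2.5 petabytes per hour, 1 petabyte = 1 million gigabytes = 13.3 years of HD video, at least 10 opening hours a day), the variable names are our own, and the 10-hour day is treated as a lower bound.

```python
# Figures quoted in the video (not independently verified here).
PB_PER_HOUR = 2.5          # petabytes Walmart generates per hour
GB_PER_PB = 1_000_000      # 1 petabyte = 1 million gigabytes
HD_YEARS_PER_PB = 13.3     # years of HD video per petabyte
OPEN_HOURS_PER_DAY = 10    # lower bound: stores are open "more than 10 hours"

daily_pb = PB_PER_HOUR * OPEN_HOURS_PER_DAY      # petabytes per day
daily_gb = daily_pb * GB_PER_PB                  # same volume in gigabytes
daily_hd_years = daily_pb * HD_YEARS_PER_PB      # same volume as years of HD video

print(f"{daily_pb:g} PB/day = {daily_gb:,.0f} GB/day = {daily_hd_years:.1f} years of HD video")
# → 25 PB/day = 25,000,000 GB/day = 332.5 years of HD video
```

Since the stores are open for more than 10 hours, these numbers are a floor rather than an exact daily total.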
Yes, there aren’t many companies like Walmart.
But even smaller enterprises nowadays generate
huge amounts of data, so it becomes increasingly
challenging to take advantage of such an
abundance of information.
And yes, data science is at the heart of all
that. But before we can apply data science,
we must do justice to another crucial player:
the cloud, and cloud computing in general.
That’s exactly what we will focus on in
this video: why cloud computing is essential
for data science in the 2020s.
But before we continue, let me tell you about
something else we’ve put together:
We’ve created ‘The 365 Data Science Program’
to help people enter the field of data science,
regardless of their background. We have trained
more than 350,000 people around the world
and are committed to continuing to do so. If
you are interested in learning more, follow the
link in the description. It will also give
you 20% off all plans if you want to start
a well-rounded data science training program.
Now, back to cloud computing.
To understand the advantages cloud computing
provides when it comes to data science, let’s
imagine a world with as much data as we have
today, but without servers.
In such an unfortunate scenario, firms would
need databases that run locally, right?
So, every time you, as a data scientist,
want to run a new analysis or refresh
an existing algorithm, you’d have to transfer
information from the central database to your
machine and then work on it locally.
This unfortunate world would have several
major drawbacks:
• Manual intervention would be necessary to
retrieve data.
• Your machine would become a single point of
failure for the analyses you have worked on locally.
• Processing speed would be limited by the
computing power of your computer.
• Chances are you would only be able to work with
a limited amount of data, due to the limited
computing resources at your disposal.
• Moreover, under this setup, you wouldn’t
be able to leverage real-time data to build
recommender systems or any other machine
learning algorithms that require ‘live’
data.
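The "limited computing resources" drawback is easy to quantify with a rough estimate. The sketch below is illustrative only: the table dimensions, the 8 bytes per 64-bit numeric value, and the 16 GB of laptop RAM are our own assumptions rather than figures from this video, and the estimate ignores strings, indexes and the temporary copies that analysis tools often make.

```python
def in_memory_size_gb(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Rough size of a dense numeric table held in RAM, in gigabytes (10**9 bytes)."""
    return rows * cols * bytes_per_value / 1e9


def fits_on_laptop(rows: int, cols: int, ram_gb: float = 16.0) -> bool:
    """True if the raw table alone would fit in the given amount of RAM.

    ram_gb is a hypothetical laptop spec; real headroom is lower because the
    operating system and other programs also need memory.
    """
    return in_memory_size_gb(rows, cols) <= ram_gb


# Hypothetical example: 1 billion transaction records with 20 numeric fields each.
rows, cols = 1_000_000_000, 20
print(f"{in_memory_size_gb(rows, cols):.0f} GB needed; "
      f"fits on a 16 GB laptop: {fits_on_laptop(rows, cols)}")
# → 160 GB needed; fits on a 16 GB laptop: False
```

Even this optimistic estimate is ten times the laptop's memory, before any model training starts.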
Doesn’t sound like the perfect scenario,
does it?
Well, that’s why we invented servers. But
servers turned out to have drawbacks of their
own.
• The most obvious one is that a server needs
physical space. A cloud is basically somebody
else’s server, so storage is their problem.
• Server infrastructure is expensive to buy
and set up. Cloud infrastructure is already
there, simply waiting for you to use it.
• In-house data storage requires you to keep
backups, ideally in different locations.
Clouds make data available anywhere, anytime,
usually backed up on many different servers
across the world.
• Servers need planning. For fast-growing companies,
server needs can be unpredictable even for
the current quarter. With in-house servers,
you usually end up buying more servers than
you actually need at any given time. With the
cloud, you pay only for what you use.
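The capacity-planning point can be illustrated with a toy calculation. Everything in it is hypothetical: the monthly demand profile, the cost units, and the simplifying assumption that an owned server and a rented one cost the same per month. Real prices for owned and rented capacity differ, so this only shows why paying for peak capacity wastes money when demand is uneven.

```python
# Hypothetical monthly demand, in "servers' worth" of work,
# for a fast-growing company whose load is hard to predict.
monthly_demand = [4, 5, 7, 6, 12, 8]

COST_PER_SERVER_MONTH = 100  # hypothetical cost units, assumed equal for both options


def in_house_cost(demand: list[int], unit_cost: float) -> float:
    """Owned servers must cover the peak month, so peak capacity is paid for every month."""
    return max(demand) * unit_cost * len(demand)


def pay_as_you_go_cost(demand: list[int], unit_cost: float) -> float:
    """Cloud capacity is billed only for what each month actually uses."""
    return sum(demand) * unit_cost


owned = in_house_cost(monthly_demand, COST_PER_SERVER_MONTH)
cloud = pay_as_you_go_cost(monthly_demand, COST_PER_SERVER_MONTH)
# Share of owned capacity that sits idle over the period.
idle_share = 1 - sum(monthly_demand) / (max(monthly_demand) * len(monthly_demand))

print(f"in-house: {owned}, pay-as-you-go: {cloud}, idle in-house capacity: {idle_share:.0%}")
# → in-house: 7200, pay-as-you-go: 4200, idle in-house capacity: 42%
```

With these made-up numbers, the in-house option pays for idle capacity in every month except the peak one.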
You see my point, right?
Fortunately, we now have clouds. They outperform
local servers in almost every conceivable
aspect. And, in fact, data scientists should
be focused on developing great algorithms,
testing hypotheses, and taking advantage of all
available data, without having to wait hours
to see the results of the tests they are running,
and certainly without having to worry about how
much memory they have left on their
computer. And yes, sometimes data scientists
do end up waiting hours for an algorithm
to train, but with a cloud, they have the
option to pay for more computing power and get
the job done faster. That’s yet another advantage
of cloud computing over in-house servers.
That being said, the biggest winners are smaller
entities, as they get cheap access to the
same tools as enormous corporations. And this
is why cloud technologies are a huge enabler.
They create a level playing field and allow
small players to compete with much bigger
ones.
If you think about it, this technological
progress changed a number of businesses in
a way similar to how the Internet changed
commerce.
Remember when, all of a sudden, people around
the world were able to open e-commerce stores
and compete on a global scale with the established
firms?
Well, in the same way, cloud technologies
democratized data analysis and data science.
The fact that data scientists and data analysts
can rely on data stored in the cloud truly
makes their lives so much easier!
In addition, most cloud providers let data
scientists use preinstalled open-source
frameworks right away. This is not only super
convenient but can also be a huge time saver.
By contrast, if you wanted to use Apache
Spark in the conventional way, you would have
to:
• Start by installing Java,
• Then continue by installing Scala,
• After which you’ll be able to download
Apache Spark and install it.
That’s the setup you need to go through
if you are working on your own PC. However,
if you are using a cloud service, you’ll
be able to start working with the Apache Spark
framework right away! Yep, it’s already been
installed for you. The same goes for many
other open-source frameworks.
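For a sense of what the local route involves, the sketch below only checks whether the command-line tools from the manual setup steps are already on your PATH; it does not install anything. The command names (`java`, `scala`, `spark-submit`) are common defaults but are our assumption and may differ on your system, and the helper `check_local_setup` is hypothetical.

```python
import shutil

# Commands typically provided by each step of a manual Spark setup.
# These names are common defaults and may differ on your machine.
PREREQUISITES = {
    "Java": "java",
    "Scala": "scala",
    "Apache Spark": "spark-submit",
}


def check_local_setup(prereqs: dict[str, str]) -> dict[str, bool]:
    """Report which prerequisites are already installed (i.e., found on PATH)."""
    return {name: shutil.which(cmd) is not None for name, cmd in prereqs.items()}


if __name__ == "__main__":
    for name, found in check_local_setup(PREREQUISITES).items():
        status = "found" if found else "missing: install it before continuing"
        print(f"{name:<13} {status}")
```

On a managed cloud service with Spark preinstalled, this whole checklist is something you can usually skip.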
This type of easy-to-access, easy-to-use infrastructure
is very attractive and could potentially extend
to all sorts of tools that data analysts
and data scientists use in their work.
Over the last few years, Amazon Web Services,
Microsoft Azure, and Google Cloud have worked
to strengthen their cloud services’ ability
to run machine learning algorithms.
The Big Three cloud providers have focused on this
area extensively, as they realized it could
be an important source of competitive advantage
in the long run. And, in case you’re wondering,
one of the biggest selling points of cloud machine
learning is that it allows small and medium-sized
enterprises to access machine learning infrastructure
they otherwise wouldn’t be able to afford.
For example, thanks to cloud-based machine
learning, a small e-commerce retailer could
run a real-time recommender system
to improve the products shown to customers
based on the items they have already added
to their cart. In this type of business, every
click on the website can be interpreted as a signal
of a particular intention, so an algorithm
running in the cloud and updated in real time
will be able to make suggestions that
improve the chances of a conversion
and maximize revenue.
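The in-cart recommendation idea can be sketched, in its simplest possible form, as item co-occurrence counting: products that often end up in the same order get suggested together. This is our own illustrative stand-in, not the method of any particular cloud provider or retailer; the class, its methods and the sample orders are hypothetical. It runs in memory on one machine, whereas the point of cloud-based machine learning is to run continuously updated models like this at scale without owning the infrastructure.

```python
from collections import Counter, defaultdict
from itertools import permutations


class CooccurrenceRecommender:
    """Toy "customers who bought X also bought Y" recommender (hypothetical).

    It counts how often two products appear in the same order and updates
    those counts incrementally, one order at a time, so new data is reflected
    in the very next recommendation.
    """

    def __init__(self) -> None:
        # co_counts[a][b] = number of orders that contained both a and b
        self.co_counts: dict[str, Counter] = defaultdict(Counter)

    def update(self, order: list[str]) -> None:
        """Fold one completed order into the co-occurrence counts."""
        items = set(order)  # count each product once per order
        for a, b in permutations(items, 2):
            self.co_counts[a][b] += 1

    def recommend(self, cart: list[str], k: int = 3) -> list[str]:
        """Suggest up to k products that most often co-occur with the cart."""
        in_cart = set(cart)
        scores: Counter = Counter()
        for item in in_cart:
            # .get avoids creating empty entries for never-seen products
            scores.update(self.co_counts.get(item, Counter()))
        # Never suggest something the customer already has in the cart.
        candidates = [(p, s) for p, s in scores.items() if p not in in_cart]
        # Highest score first; ties broken alphabetically for stable output.
        candidates.sort(key=lambda pair: (-pair[1], pair[0]))
        return [p for p, _ in candidates[:k]]


# Hypothetical order history for a small outdoor-gear shop.
rec = CooccurrenceRecommender()
for order in [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["lantern", "batteries"],
]:
    rec.update(order)

print(rec.recommend(["tent"]))  # ['sleeping bag', 'lantern', 'stove']
```

Because `update` adjusts the counts as each order arrives, the next call to `recommend` already reflects it, which is the "updated in real time" behaviour in miniature. A production system would likely need more, such as persistent storage, click signals in addition to purchases, and normalization so that best-sellers do not dominate every suggestion.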
Without cloud-based machine learning, setting
up the necessary infrastructure to perform
this type of analysis would be really costly
and difficult to execute for small and medium
enterprises.
It is still unclear who will win the cloud
war between giants like AWS, Microsoft Azure,
and Google Cloud. But one thing is certain:
this is a service that greatly benefits small
and medium-sized businesses, enabling them
to level the playing field when competing
against large multinationals with superior
IT infrastructure.
If you liked this video, don’t forget to
give it a like, or a share!
And if data science is what you’d like to
learn more about, subscribe to our channel.
You’ll find plenty of data science insights
and data science career advice.
Thanks for watching!
