♪ I love it when you call me Big Data ♪
Welcome to The Dr. Data Show!
I'm Eric Siegel.
Data science, big data, what
the hell do these buzzwords
really specifically mean?
Are they just cockamamie,
intentionally vague jargon that
overhypes and overpromises?
Or are these terms actually helpful?
Do they somehow designate,
like, the most profound impact
of the Information Age?
Well, I'll start with the
vague and overhyping side
and then circle back
to why these buzzwords
may matter after all.
It's time for the Dr.
Data Buzzword Smackdown.
There are a lotta
problems with these words.
First, data scientist is redundant.
It's like calling a
librarian a book librarian.
If you're doing science,
it involves data, duh.
Furthermore, and don't
tell anyone I said this,
but real sciences like
physics and chemistry
don't have science in their name.
Your science is trying too hard
if it has to call itself a science.
Social science, political
science, data science,
and I gotta say, even though
I have three degrees in it
and was a professor of it,
computer science is an
arbitrarily defined field.
It's just the amalgam of
everything to do with computers,
as a concept and as an appliance,
from the engineering of how to build them
and the deep mathematics about
their theoretical limitations
to how to make them more user friendly,
and even business strategies
for managing a team of programmers.
Universities might as well also have
a toaster science department,
which covers the engineering
of better toasters
as well as the culinary arts
on how to best cook with them.
But I digress.
Okay, next buzzword, big data.
First of all, it's just
grammatically incorrect.
It's like looking at the Pacific
Ocean and going big water.
It should be a lotta
data or plenty of data.
But the real problem with big data
is that it emphasizes the size.
'Cause what's exciting about data isn't
how much of it there is per se.
It's about how quickly it's growing
which is amazing, by the way.
There's always so much more data today
than there was yesterday.
So we're gonna run out of
adjectives really quickly.
Big data, bigger data, even
bigger data, the biggest data.
Actually, there's been a
long-running conference called the
International Conference on
Very Large Data Bases since 1975.
I'm not joking.
That's before the first
Star Wars movie came out.
Now, in some cases, people
use the terms data science
and big data just to
refer to machine learning,
i.e. when computers
learn from the experience
encoded in data.
That's the topic of most
episodes of this program,
The Dr. Data Show.
It's a show about machine learning,
which is a well-defined field
and, by the way, is also often
called predictive analytics,
especially when you're
talking about its deployment
in the private or public sector.
I would urge folks to use
the well-defined terms
machine learning or predictive analytics
if in fact that's what you're
specifically talking about.
But as for data science and big data,
in their general usage they suffer from
a terrible case of vagueness.
They have a wide range of
subjective definitions,
which compete and conflict.
Basically, they're often used
to mean nothing more specific
than some clever use of data.
The terms don't necessarily refer
to any particular technology,
method, or value proposition.
They're just plain subjective.
You can use them to mean
whichever technology you'd like.
Machine learning, data visualization,
or even just basic reporting.
But much worse than that,
this vagueness often serves
to mislead and misrepresent
by alluding to capabilities
that don't exist.
For example, the popular press,
as well as certain analytics vendors,
sometimes use data science to denote
some whole collection of methods
that includes machine learning
as well as some other advanced methods.
The problem is, those other
advanced methods are implied
but often actually don't really exist.
They're vaporware.
This confusion is sometimes inadvertent,
such as when journalists
aren't fully knowledgeable
of the topic yet want it to
sound as powerful as possible,
but either way, the end
result is souped-up hype
that overpromises and
circulates misinformation.
All these issues, by the way,
also apply to the
older-school term data mining,
also totally subjective.
Besides, calling it data mining is like
instead of gold mining,
saying dirt mining.
Malfunction, failed analogy.
'Cause we aren't searching for data,
we're searching within data.
So now you're probably asking yourself,
how could Dr. Data come
down so hard on these words
if he loves data so much?
Well, no, Dr. Data
doesn't hate these words,
only the misleading ways in
which they're often used.
Dr. Data's love for data is fully intact.
After all, he named himself after it.
Anyway, let's talk data for a moment.
These buzzwords are all
data this and data that.
So what exactly is all
the fuss about data?
I mean, most people couldn't
be less interested in data.
The non-geeks out there
think it's the driest,
most boring word ever.
The word data is a deal-killer
at cocktail parties.
I know from personal experience.
I have the data.
And data just grows like a weed anyway.
It's so indiscriminately
collected and warehoused,
like some bland, uninteresting residue
that companies dump into the cloud
as they transactionally
churn away endlessly.
But, no, that's wrong.
Actually, let me make a correction.
It isn't indiscriminate.
The things logged into
all these memory banks
are exactly the things that matter.
That's why they're being recorded.
People think data's boring
because they're overlooking
the fact that data is experience.
It's a long list of prior events
from which it's possible
to analytically learn.
In fact, we could say
that data is powerful
and all-encompassing
for the very same reason
that it's misconstrued as boring,
which is that it's very abstract.
Data can mean anything and everything.
In its most abstract, it
means nothing in particular,
but in the particular, it always means
something valuable and interesting.
Every medical diagnosis,
medical procedure,
credit application, phone
call, Facebook post,
movie viewing, ad click,
fraudulent transaction,
spammy e-mail, traffic
camera passed, flight taken,
earthquake, purchase,
successful or failed sales call,
each positive and negative
outcome of any significance
is encoded as data somewhere.
There are quintillions
and quintillions of bytes.
That's my Carl Sagan impersonation.
Data grows by an estimated
2.5 quintillion bytes per day.
A quintillion is a one
with 18 zeros after it.
And here's the big win.
We can improve everything
with this data.
All the main functions and
day-to-day operational decisions
of companies and governments are exactly
what these data streams are recording.
Therefore, data records exactly the right,
relevant experience to
apply predictive analytics
where it's needed most.
We have just the right
data for this technology
to learn how to streamline
the major operations
behind financial risk management,
fraud detection,
marketing, law enforcement,
healthcare, and manufacturing.
Boom!
This is major.
We're witnessing an
epic, fundamental shift
in how technology integrates with, alters,
and improves society and its functions.
And so data isn't the
most boring after all.
In fact, it's the most
sexy?
The Harvard Business Review
declared data scientist
the sexiest job of the 21st century.
I mean, really?
Data people are the most sexy?
That's great news!
Geek is the new chic.
It's hip to be square.
You know, I had always
assumed the sexiest profession
was firefighters, but who knows.
Maybe it's just the hard hat.
This is a picture of me dressed up
as a data miner for Halloween.
Actually, the New York
City Fire Department
uses predictive analytics to triage
and prioritize the
inspections of buildings
with the highest risk of fire.
Yet another priceless
application of machine learning.
Anyway, we actually
produced a rap music video
about predictive analytics
and how being a data geek
affects your social life.
It's the best ever
educational predictive analytics
rap music video ever created ever, period.
And also the only one.
Just three and a half minutes long.
You can check it out at PredictThis.org.
In conclusion, there's a
lot to be excited about
when it comes to the data explosion
and what we can do with it.
The buzzwords are kinda
inane when viewed up close.
Perhaps an equally appropriate
and less misleading buzzword
for all this would be datapalooza,
but, in any case, the terms
really allude to a culture
of smart people doing creative things
to make value out of all this data.
Today's totally historic
advent of having data
about everything and
using data for everything
is mind-blowingly profound and important.
I'm Eric Siegel, thanks for watching.
Hit like and share this video
if you think your friends
were also wondering what
the hell data science
and big data really mean.
And for access to the entire web series,
go to TheDoctorDataShow.com.
♪ Who's your data? ♪
♪ Provide me the data to improve ♪
♪ And I'll apply the computation ♪
♪ I love it when you call me Big Data ♪
♪ Predictive analytics can
help you with decisions ♪
♪ You can call, mail, credit,
or hire with precision ♪
♪ On law, love, and life,
you can prognosticate ♪
♪ Whom to investigate, incarcerate, ♪
♪ Set up on a date, or medicate ♪
♪ Charlie Brown never gets his kicks ♪
♪ That's why every old dog
needs a brand new trick ♪
♪ If you get sick of chasing sticks ♪
♪ Or clicks with just a quick fix ♪
♪ You need to learn to predict ♪
♪ I can predict your every move ♪
♪ Just gimme all your information ♪
♪ Who's your data? ♪
♪ Provide me the data to improve ♪
♪ And I'll apply the computation ♪
♪ I love it when you call me Big Data ♪
