>> [Announcer] Check the host mike.
>> [Dan] Hey good evening
ladies and gentlemen.
I'm Dan Kaiser, I'm your
assistant stage manager
for the program today.
And welcome to Data Science for All:
It's a Whole New Game.
Now, this is going to be a live webcast.
You're going to see all the moving parts
that are going on up here.
We have cameras in the front of the stage
and our speakers are
going to be addressing
those cameras instead of you guys,
the audience.
So, while we encourage
you guys to be responsive,
this is a live TV broadcast.
And we ask that you take
side conversations outside.
So, we'll be getting
going in just a couple.
We'll see you in just a little bit.
Thank you.
(upbeat dance music)
Okay, back again.
Ladies and gentlemen
this would be a good time
to ensure that all your cell phones
and pagers are silent.
We've got just one minute 'til
we're going to start the show.
Just one minute 'til show.
We'll see you guys in just a minute.
(upbeat techno music)
Music ready.
Playback ready.
Ready to cue video.
Five,
four,
three,
two,
one.
(upbeat dance music)
>> There's a movement that's sweeping
across businesses everywhere
here in this country
and around the world.
And it's all about data.
Today businesses are
being inundated with data.
To the tune of over two and
a half million gigabytes
that'll be generated in
the next 60 seconds alone.
What do you do with all that data?
To extract insights you typically
turn to a data scientist.
But not necessarily anymore.
At least not exclusively.
Today the ability to
extract value from data
is becoming a shared mission.
A team effort that spans the organization
extending far more
widely than ever before.
Today, data science is being democratized.
Data Science for All:
It's a Whole New Game.
>> Welcome everyone, I'm Katie Linendoll.
I'm a technology expert writer
and I love reporting on all things tech.
My fascination with
tech started very young.
I began coding when I was 12.
Received my networking certs by 18
and a degree in IT and new media
from Rochester Institute of Technology.
So as you can tell,
technology has always been
a sure passion of mine.
Having grown up in the digital age,
I love having a career that
keeps me at the forefront
of science and technology innovations.
I spend equal time in
the field being hands on
as I do on my laptop
conducting in depth research.
Whether I'm diving underwater
with NASA astronauts,
witnessing the new ways
in which mobile technology
can help rebuild the Philippines' economy
in the wake of super typhoons,
or sharing a first look
at the newest iPhones
on The Today Show, yesterday,
I'm always on the hunt for the latest
and greatest tech stories.
And that's what brought me here.
I'll be your host for the next hour
as we explore the
new phenomenon that is
taking businesses around
the world by storm.
And data science continues
to become democratized
and extends beyond the
domain of the data scientist.
And why there's also a
mandate for all of us
to become data literate.
Now that data science for
all drives our AI culture.
And we're going to be able
to take to the streets
and go behind the scenes
as we uncover the factors
that are fueling this phenomenon
and giving rise to a
movement that is reshaping
how businesses leverage data.
And putting organizations
on the road to AI.
So coming up,
I'll be doing interviews
with data scientists.
We'll see real world demos
and take a look at how
IBM is changing the game
with an open data science platform.
We'll also be joined by legendary
statistician Nate Silver,
founder and editor-in-chief
of FiveThirtyEight.
Who will shed light on how a
data driven mindset is changing
everything from business to our culture.
We also have a few people who
are joining us in our studio,
so thank you guys for joining us.
Come on, I can do better than that, right?
(audience clapping and cheering)
Live studio audience,
the fun stuff.
And for all of you during the program,
I want to remind you to
join that conversation
on social media using
the hashtag DSforAll,
it's data science for all.
Share your thoughts on
what data science and AI
mean to you and your business.
And, let's dive into a whole
new game of data science.
Now I'd like to welcome my co-host
General Manager IBM Analytics, Rob Thomas.
>> Hello, Katie.
>> Come on guys.
>> Yeah, seriously.
(audience clapping and cheering)
>> No one's allowed to be
quiet during this show, okay?
>> Right.
>> Or, I'll start calling
people out (laughing).
So Rob, thank you so much.
I think you know this conversation,
we're calling it a data
explosion happening right now.
And it's nothing new.
And when you and I chatted about it.
You've been talking about this for years.
You have to ask,
is this old news at this point?
>> Yeah, I mean, well first of all,
the data explosion is not coming,
it's here.
And everybody's in the
middle of it right now.
What is different is the
economics have changed.
And the scale and complexity of the data
that organizations are having
to deal with has changed.
And to this day, 80% of
the data in the world
still sits behind corporate firewalls.
So, that's becoming a problem.
It's becoming unmanageable.
IT struggles to manage it.
The business can't get
everything they need.
Consumers can't consume it when they want.
So we have a challenge here.
>> It's challenging in
the world of unmanageable.
Crazy complexity.
If I'm sitting here as an
IT manager of my business,
I'm probably thinking to myself,
this is incredibly frustrating.
How in the world am I going to get control
of all this data?
And probably not just me thinking it.
Many individuals here as well.
>> Yeah, indeed.
Everybody's thinking about
how am I going to put data
to work in my organization in
a way I haven't done before.
Look, you've got to have
the right expertise,
the right tools.
The other thing that's happening
in the market right now
is clients are dealing with
multi cloud environments.
So data behind the
firewall in private cloud,
multiple public clouds.
And they have to find a way.
How am I going to pull
meaning out of this data?
And that brings us to data science and AI.
That's how you get there.
>> I understand the data science part
but I think we're all starting
to hear more about AI.
And it's incredible that
this buzz word is happening.
How do businesses adapt to this AI growth
and boom and trend that's
happening in this world right now?
>> Well, let me define it this way.
Data science is a discipline.
And machine learning is one technique.
And then AI puts machine
learning into practice
and applies it to the business.
So this is really about
getting your business
where it needs to go.
And to get to an AI future,
you have to lay a data foundation today.
I love the phrase,
"there's no AI without IA."
That means you're not going to get to AI
unless you have the right
information architecture
to start with.
>> Can you elaborate though in terms
of how businesses can really adopt AI
and get started?
>> Look, I think there's
four things you have to do
if you're serious about AI.
One is you need a strategy
for data acquisition.
Two is you need a modern
data architecture.
Three is you need pervasive automation.
And four is you got to expand job roles
in the organization.
>> Data acquisition.
First pillar in this you just discussed.
Can we start there and
explain why it's so critical
in this process?
>> Yeah, so let's think
about how data acquisition
has evolved through the years.
15 years ago, data acquisition was about
how do I get data in and
out of my ERP system?
And that was pretty much solved.
Then the mobile revolution happens.
And suddenly you've got structured
and non-structured data.
More than you've ever dealt with.
And now you get to where we are today.
You're talking terabytes,
petabytes of data.
>> [Katie] Yottabytes, I
heard that word the other day.
>> I heard that too.
>> Didn't even know what it meant.
>> You know how many zeros that is?
>> I thought we were in Star Wars.
>> Yeah, I think it's a lot of zeroes.
>> Yodabytes, it's new.
(audience laughing)
>> So, it's becoming more and more complex
in terms of how you acquire data.
So that's the new data landscape
that every client is dealing with.
And if you don't have a strategy for
how you acquire that and manage it,
you're not going to get to that AI future.
>> So a natural segue,
if you are one of these businesses,
how do you build for the data landscape?
>> Yeah, so the question I
always hear from customers
is we need to evolve our data architecture
to be ready for AI.
And the way I think about that
is it's really about moving
from static data repositories
to more of a fluid data layer.
>> And we continue with the architecture.
New data architecture is an
interesting buzz word to hear.
But it's also one of the four pillars.
So if you could dive in there.
>> Yeah, I mean it's a new twist on
what I would call some
core data science concepts.
For example, you have to leverage tools
with a modern, centralized data warehouse.
But your data warehouse can't be stagnant
to just what's right there.
So you need a way to federate data
across different environments.
You need to be able to bring
your analytics to the data
because it's most efficient that way.
And ultimately,
it's about building an
optimized data platform
that is designed for data science and AI.
Which means it has to
be a lot more flexible
than what clients have had in the past.
>> All right.
So we've laid out what you
need for driving automation.
But where does the
machine learning kick in?
>> Machine learning is
what gives you the ability
to automate tasks.
And I think about machine learning.
It's about predicting and automating.
And this will really change
the roles of data professionals
and IT professionals.
For example, a data scientist
cannot possibly know
every algorithm or every
model that they could use.
So we can automate the process
of algorithm selection.
Another example is things
like automated data matching.
Or metadata creation.
Some of these things may not be exciting
but they're hugely practical.
And so when you think
about the real use cases
that are driving return
on investment today,
it's things like that.
It's automating the mundane tasks.
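One way to picture the automated algorithm selection described above is the minimal sketch below. It is an illustration only, not IBM's method: the three candidate models, the toy data, and the `select_algorithm` helper are all assumptions made for the example. Each candidate is fitted on a training split and the one with the lowest held-out error is chosen.

```python
import statistics

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares line, y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """1-nearest-neighbour: copy the label of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Hypothetical candidate pool; a real platform would search far more models.
CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def select_algorithm(xs, ys, holdout_every=3):
    """Fit every candidate on the training split; return the best name and all scores."""
    rows = list(zip(xs, ys))
    train = [r for i, r in enumerate(rows) if i % holdout_every != 0]
    test = [r for i, r in enumerate(rows) if i % holdout_every == 0]
    tx, ty = zip(*train)
    vx, vy = zip(*test)
    scores = {name: mse(fit(tx, ty), vx, vy) for name, fit in CANDIDATES.items()}
    return min(scores, key=scores.get), scores

# Toy data (made up): a straight line with a small alternating wobble.
xs = list(range(12))
ys = [2.0 * x + 1.0 + (0.3 if x % 2 else -0.3) for x in xs]
best, scores = select_algorithm(xs, ys)
print(best, scores)
```

The point of the sketch is the loop inside `select_algorithm`: the person supplies data, and the comparison across algorithms happens without anyone needing to know each model in advance.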
>> Let's go ahead and
come back to something
that you mentioned earlier
because it's fascinating to be
talking about this AI journey,
but also significant is the new job roles.
And what are those other participants
in the analytics pipeline?
>> Yeah I think we're just at the start
of this idea of new job roles.
We have data scientists.
We have data engineers.
Now you see machine learning engineers.
Application developers.
What's really happening
is that data scientists
are no longer allowed to
work in their own silo.
And so the new job roles is about
how does everybody have
data first in their mind?
And then they're using tools
to automate data science,
to automate building machine
learning into applications.
So roles are going to change
dramatically in organizations.
>> I think that's confusing though because
we have several organizations who are saying,
is that a highly specialized role,
just for data science?
Or is it applicable to
everybody across the board?
>> Yeah, and that's the
big question, right?
Cause everybody's thinking
how will this apply?
Do I want this to be just
a small set of people
in the organization that will do this?
But, our view is data
science has to be for everybody.
It's about bringing data science to everybody
as a shared mission
across the organization.
Everybody in the company
has to be data literate.
And participate in this journey.
>> So overall, group effort,
has to be a common goal,
and we all need to be data literate
across the board.
>> Absolutely.
>> Done deal.
But at the end of the day,
it's kind of not an easy task.
(laughing)
>> It's not.
It's not easy but it's maybe not as big
of a shift as you would think.
Because you have to put
data in the hands of people
that can do something with it.
So, it's very basic.
Give access to data.
Data's often locked up in a
lot of organizations today.
Give people the right tools.
Embrace the idea of choice or diversity
in terms of those tools.
That gets you started on this path.
>> It's interesting to
hear you say essentially
you need to train everyone though
across the board when it
comes to data literacy.
And I think people that are
coming into the work force
don't necessarily have
a background or a degree
in data science.
So how do you manage?
>> Yeah, so in many cases that's true.
I will tell you some universities
are doing amazing work here.
One example, University
of California Berkeley.
They offer a course for all majors.
So no matter what you're majoring in,
you have a course on
foundations of data science.
How do you bring data
science to every role?
So it's starting to happen.
We at IBM provide data science courses
through CognitiveClass.ai.
It's for everybody.
It's free.
And look, if you want to
get your hands on code
and just dive right in,
you go to datascience.ibm.com.
The key point is this though.
It's more about attitude
than it is aptitude.
I think anybody can figure this out.
But it's about the attitude to say
we're putting data first and
we're going to figure out
how to make this real in our organization.
>> I also have to give a
shout out to my alma mater
because I have heard
that there is an offering
an MS in data analytics.
And they are always on the
forefront of new technologies
and new majors and on trend.
And I've heard that the
placement behind those jobs,
people graduating with the MS is high.
>> I'm sure it's very high.
>> So go Tigers.
All right, tangential.
Let me get back to something
else you touched on earlier
because you mentioned that a
number of customers ask you
how in the world do I get started with AI?
It's an overwhelming question.
Where do you even begin?
What do you tell them?
>> Yeah, well things
are moving really fast.
But the good thing is
most organizations I see,
they're already on the path,
even if they don't know it.
They might have a BI practice in place.
They've got data warehouses.
They've got data lakes.
Let me give you an example.
AMC Networks.
They produce a lot of the shows
that I'm sure you watch Katie.
>> [Katie] Yes, Breaking
Bad, Walking Dead,
any fans?
(audience cheering)
>> [Rob] Yeah, we've got a few.
>> [Katie] Well you taught me
something I didn't even know.
Because it's amazing how we have all
these different industries,
but yet media in itself is impacted too.
And this is a good example.
>> Absolutely.
So, AMC Networks, think about it.
They've got ads to place.
They want to track viewer behavior.
What do people like?
What do they dislike?
So they have to optimize
every aspect of their business
from marketing campaigns to promotions
to scheduling to ads.
And their goal was to transform
data into business insights
and really take the burden
off of their IT team
that was heavily burdened by
obviously a huge increase in data.
So their VP of BI took the approach
of using machine learning to process
large volumes of data.
They used a platform that was designed
for AI and data processing.
It's the IBM analytics system
where it's a data warehouse,
data science tools are built in.
It has in memory data processing.
And just like that,
they were ready for AI.
And they're already seeing that impact
in their business.
>> Do you think a movement of that nature
kind of presses other
media conglomerates and
organizations to say
we need to be doing this too?
>> I think it's inevitable that everybody,
you're either going to be playing,
you're either going to be leading,
or you'll be playing catch up.
And so, as we talk to clients
we think about how do you
start down this path now,
even if you have to iterate over time?
Because otherwise you're going to wake up
and you're going to be behind.
>> One thing worth noting
is we've talked about
analytics to the data.
It's analytics first to the data,
not the other way around.
>> Right.
So, look.
We as a practice,
we say you want to bring
analytics to where the data sits.
Because it's a lot more
efficient that way.
It gets you better outcomes in
terms of how you train models
and it's more efficient.
And we think that leads
to better outcomes.
Other organizations will say,
"Hey move the data around."
And everything becomes a
big data movement exercise.
But once an organization
has started down this path,
they're starting to get predictions,
they want to do it where it's really easy.
And that means analytics applied
right where the data sits.
>> And worth talking about the role
of the data scientist in all of this.
It's been called the
hot job of the decade.
And Harvard Business
Review even dubbed it
the sexiest job of the 21st century.
>> Yes.
>> I want to see this
on the cover of Vogue.
Like I want to see the
first data scientist.
Female preferred, on the cover of Vogue.
(audience laughing)
That would be amazing.
(audience clapping)
>> Perhaps you can.
>> People agree.
So what changes for them?
Is this challenging in terms of
we talk data science for all.
Where do all the data science,
is it data science for everyone?
And how does it change everything?
>> Well, I think of it this way.
AI gives software super powers.
It really does.
It changes the nature of software.
And at the center of
that is data scientists.
So, a data scientist has a set of powers
that they've never had
before in any organization.
And that's why it's a hot profession.
Now, on one hand,
this has been around for a while.
We've had actuaries.
We've had statisticians
that have really transformed industries.
But there are a few
things that are new now.
We have new tools.
New languages.
Broader recognition of this need.
And while it's important to recognize
this critical skill set,
you can't just limit it to a few people.
This is about scaling it
across the organization.
And truly making it accessible to all.
>> So then do we need
more data scientists?
Or is this something
you train like you said,
across the board?
>> Well, I think you want
to do a little bit of both.
We want more.
But, we can also train more
and make the ones we have more productive.
The way I think about it is
there's kind of two markets here.
And we call it clickers and coders.
>> [Katie] I like that.
That's good.
>> So, let's talk about what that means.
So clickers are basically
somebody that wants to use tools.
Create models visually.
It's drag and drop.
Something that's very intuitive.
Those are the clickers.
Nothing wrong with that.
It's been valuable for years.
There's a new crop of data scientists.
They want to code.
They want to build with the
latest open source tools.
They want to write in Python or R.
These are the coders.
And both approaches are viable.
Both approaches are critical.
Organizations have to have
a way to meet the needs
of both of those types.
And there's not a lot of
things available today
that do that.
>> Well let's keep going on that.
Because I hear you talking
about the data scientist's role
and how it's critical to success,
but with the new tools,
data science and analytics
skills can extend beyond
the domain of just the data scientist.
>> That's right.
So look, we're unifying
coders and clickers
into a single platform,
which we call IBM Data Science Experience.
And as the demand for data
science expertise grows,
so does the need for these kind of tools.
To bring them into the same environment.
And my view is if you
have the right platform,
it enables the organization
to collaborate.
And suddenly you've changed the nature
of data science from an
individual sport to a team sport.
>> So as somebody that,
my background is in IT,
the question is really is
this an additional piece
of what IT needs to do in 2017 and beyond?
Or is it just another
line item to the budget?
>> So I'm afraid that some
people might view it that way.
As just another line item.
But, I would challenge that and say
data science is going to reinvent IT.
It's going to change the nature of IT.
And every organization
needs to think about
what are the skills that are critical?
How do we engage a
broader team to do this?
Because once they get there,
this is the chance to reinvent
how they're performing IT.
>> [Katie] Challenging or not?
>> Look it's all a big challenge.
Think about everything IT
organizations have been through.
Some of them were late
to things like mobile,
but then they caught up.
Some were late to cloud,
but then they caught up.
I would just urge people,
don't be late to data science.
Use this as your chance to reinvent IT.
Start with this notion
of clickers and coders.
This is a seminal moment.
Much like mobile and cloud were.
So don't be late.
>> And I think it's
critical because it could
be so costly to wait.
And Rob and I were even chatting earlier
how data analytics is just moving
into all different kinds of industries.
And I can tell you even personally
being affected by how
important the analysis is
in working in pediatric cancer
for the last seven years.
I personally implement
virtual reality headsets
to pediatric cancer
hospitals across the country.
And it's great.
And it's working phenomenally.
And the kids are amazed.
And the staff is amazed.
But the phase two of this project
is putting in little
metrics in the hardware
that gather the breathing, the heart rate
to show that we have data.
Proof that we can hand
over to the hospitals
to continue making this program a success.
So just in--
>> That's a great example.
>> An interesting example.
>> Saving lives?
>> Yes.
>> That's also applying a
lot of what we talked about.
>> Exciting stuff in the
world of data science.
>> Yes.
Look, I'd just add this
is an existential moment
for every organization.
Because what you do in this area
is probably going to
define how competitive
you are going forward.
And think about if you don't do something.
What if one of your competitors goes
and creates an application
that's more engaging with clients?
So my recommendation is start small.
Experiment.
Learn.
Iterate on projects.
Define the business outcomes.
Then scale up.
It's very doable.
But you've got to take the first step.
>> First step always critical.
And now we're going to get to the fun
hands on part of our story.
Because in just a moment we're
going to take a closer look
at what data science can deliver.
And where organizations
are trying to get to.
(upbeat dance music)
All right.
Thank you Rob and now we've
been joined by Siva Anne
who is going to help
us navigate this demo.
First, welcome Siva.
Give him a big round of applause.
(audience clapping)
Yeah.
All right, Rob break down what
we're going to be looking at.
You take over this demo.
>> All right.
So this is going to be pretty interesting.
So Siva is going to take us through.
So he's going to play the
role of a financial adviser.
Who wants to help better serve clients
through recommendations.
And I'm going to really
illustrate three things.
One is how do you federate data
from multiple data sources?
Inside the firewall,
outside the firewall.
How do you apply machine learning
to predict and to automate?
And then how do you move analytics
closer to your data?
So, what you're seeing here
is a custom application
for an investment firm.
So, Siva, our financial adviser, welcome.
So you can see at the top,
we've got market data.
We pulled that from an external source.
And then we've got Siva's
calendar in the middle.
He's got clients on the right side.
So page down,
what else do you see down there Siva?
>> [Siva] I can see
the recent market news.
And in here I can see that JP Morgan
is calling for a US dollar rebound
in the second half of the year.
And, I have an upcoming
meeting with Leo Rakes.
I can get--
>> [Rob] So let's go in there.
Why don't you click on Leo Rakes.
So, you're sitting at your desk,
you're deciding how you're
going to spend the day.
You know you have a meeting with Leo.
So you click on it.
You immediately see, all right,
so what do we know about him?
We've got data governance implemented.
So we know his age,
we know his degree.
We can see he's not that
aggressive of a trader.
Only six trades in the last few years.
But then where it gets interesting
is you go to the bottom.
You start to see predicted
industry affinity.
Where did that come from?
How do we have that?
>> [Siva] So these green
lines and red arrows
here indicate the trending
affinity of Leo Rakes
for particular industry stocks.
What we've done here is we've
built machine learning models
using customer's demographic data,
his stock portfolios,
and browsing behavior to build a model
which can predict his affinity
for a particular industry.
>> [Rob] Interesting.
So, I like to think of this,
we call it celebrity experiences.
So how do you treat every
customer like they're a celebrity?
So to some extent, we're reading his mind.
Because without asking him,
we know that he's going to have
an affinity for auto stocks.
So we go down.
Now we look at his portfolio.
You can see okay,
he's got some different holdings.
He's got Amazon, Google, Apple,
and then he's got RACE,
which is the ticker for Ferrari.
You can see that's done incredibly well.
And so, as a financial adviser,
you look at this and you say,
all right, we know he loves auto stocks.
Ferrari's done very well.
Let's create a hedge.
Like what kind of security
would interest him
as a hedge against his
position for Ferrari?
Could we go figure that out?
>> [Siva] Yes.
Given I know that he's gotten
an affinity for auto stocks,
and I also see that Ferrari
has got some tremendous gains,
I want to lock in these gains by hedging.
And I want to do that
by picking an auto stock
which has got negative
correlation with Ferrari.
>> [Rob] So this is
where we get to the idea
of in-database analytics.
Cause you start clicking that
and immediately we're
getting instant answers
of what's happening.
So what did we find here?
We're going to compare Ferrari and Honda.
>> [Siva] I'm going to
compare Ferrari with Honda.
And what I see here instantly is that
Honda has got a negative
correlation with Ferrari,
which makes it a perfect
mix for his stock portfolio.
Given he has an affinity for auto stocks
and it correlates negatively with Ferrari.
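The hedge check described here boils down to a correlation between two return series. The sketch below is a rough illustration under stated assumptions: the price lists are made-up numbers, not real quotes for any security, and `daily_returns`, `pearson`, and `pick_hedge` are hypothetical helper names rather than anything from the demo application.

```python
import math

def daily_returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def pick_hedge(held_prices, candidates):
    """Return (name, correlation) of the candidate most negatively correlated with the holding."""
    held = daily_returns(held_prices)
    scored = {name: pearson(held, daily_returns(p)) for name, p in candidates.items()}
    name = min(scored, key=scored.get)
    return name, scored[name]

# Made-up closing prices for illustration only.
held_stock = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0]
candidates = {
    "auto_stock_x": [50.0, 49.5, 50.2, 49.0, 49.6, 48.8],  # tends to move opposite
    "auto_stock_y": [20.0, 20.4, 20.2, 20.8, 20.6, 21.2],  # tends to move together
}
hedge_name, hedge_corr = pick_hedge(held_stock, candidates)
print(hedge_name, round(hedge_corr, 3))
```

A coefficient near -1 means the two series tended to move in opposite directions over the sample, which is the property the adviser is looking for in a hedge. A real analysis would use far longer histories and would not rely on correlation alone.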
>> [Rob] These are very powerful tools
at the hand of a financial adviser.
You think about it.
As a financial adviser,
you wouldn't think about federating data,
machine learning,
pretty powerful.
>> [Siva] Yes.
So what we have seen here is that using
the common SQL engine,
we've been able to federate queries
across multiple data sources.
Db2 Warehouse in the cloud,
IBM Integrated Analytics System,
and Hortonworks powered Hadoop
platform for the news feeds.
We've been able to use machine learning
to derive innovative insights about
his stock affinities.
And drive the machine
learning into the appliance.
Closer to where the data resides
to deliver high performance analytics.
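To make the idea of one SQL statement spanning several data sources concrete, here is a small stand-in using Python's built-in `sqlite3` module. It is not the common SQL engine from the demo: two in-memory SQLite databases play the roles of a warehouse and a news feed store, and the table names, tickers, and rows are placeholders invented for the sketch. The aggregation runs inside the database engine, so only the summarized rows come back to the application.

```python
import sqlite3

# The main connection stands in for the warehouse.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for another data source.
conn.execute("ATTACH DATABASE ':memory:' AS feeds")

conn.execute("CREATE TABLE main.holdings (client TEXT, ticker TEXT, shares INTEGER)")
conn.execute("CREATE TABLE feeds.headlines (ticker TEXT, headline TEXT)")

# Placeholder rows, not real client or market data.
conn.executemany(
    "INSERT INTO main.holdings VALUES (?, ?, ?)",
    [("client_1", "AAA", 10), ("client_1", "BBB", 5)],
)
conn.executemany(
    "INSERT INTO feeds.headlines VALUES (?, ?)",
    [("AAA", "placeholder headline 1"), ("AAA", "placeholder headline 2")],
)

# One query joins across both sources and aggregates where the data lives.
rows = conn.execute(
    """
    SELECT h.client, h.ticker, COUNT(n.headline) AS mentions
    FROM main.holdings AS h
    LEFT JOIN feeds.headlines AS n ON n.ticker = h.ticker
    GROUP BY h.client, h.ticker
    ORDER BY h.ticker
    """
).fetchall()
print(rows)
conn.close()
```

The application never pulls the raw headline rows across to count them itself, which is the small-scale version of moving the analytics to the data instead of moving the data to the analytics.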
>> [Rob] At scale?
>> [Siva] We're able to run
millions of these correlations
across stocks, currency, other factors.
And even score hundreds of customers
for their affinities on a daily basis.
>> That's great.
Siva, thank you for playing
the role of financial adviser.
(audience clapping)
So I just want to recap briefly.
Cause this is really powerful technology
that's really simple.
So we federated,
we aggregated multiple data sources
from all over the web
and internal systems.
And public cloud systems.
Machine learning models were built
that predicted Leo's affinity
for a certain industry.
In this case, automotive.
And then you see when you deploy analytics
next to your data,
even a financial adviser,
just with the click of a button
is getting instant answers
so they can go be more
productive in their next meeting.
This whole idea of celebrity
experiences for your customer,
that's available for everybody,
if you take advantage of
these types of capabilities.
Katie, I'll hand it back to you.
>> Good stuff.
Thank you Rob.
Thank you Siva.
Powerful demonstration on
what we've been talking about
all afternoon.
And thank you again to Siva
for helping us navigate.
Should we give him one
more round of applause?
(audience clapping)
We're going to be back in just a moment
to look at how we
operationalize all of this data.
But first, here's a message from me.
If you're a part of a line of business,
your main fear is disruption.
You know data is the new gold
that can create huge amounts of value.
So does your competition.
And they may be beating you to it.
You're convinced there
are new business models
and revenue sources
hidden in all the data.
You just need to figure
out how to leverage it.
But with the scarcity of data scientists,
you really can't rely solely on them.
You may need more people
throughout the organization
that have the ability to
extract value from data.
And as a data science
leader or data scientist,
you have a lot of the same concerns.
You spend way too much time looking for,
prepping, and interpreting data
and waiting for models to train.
You know you need to
operationalize the work you do
to provide business value faster.
What you want is an easier
way to do data prep.
And rapidly build models
that can be easily deployed,
monitored and automatically updated.
So whether you're a data scientist,
data science leader,
or in a line of business,
what's the solution?
What'll it take to
transform the way you work?
That's what we're going to explore next.
All right, now it's time to delve deeper
into the nuts and bolts.
The nitty gritty of
operationalizing data science
and creating a data driven culture.
How do you actually do that?
Well that's what these experts
are here to share with us.
I'm joined by Nir Kaldero,
who's head of data science at Galvanize,
which is an education and
training organization.
Tricia Wang, who is
co-founder of Sudden Compass,
a consultancy that helps
companies understand
people with data.
And last, but certainly not least,
Michael Li, founder and CEO
of Data Incubator, which is
a data science training company.
All right guys.
Shall we get right to it?
>> All right.
>> So data explosion happening right now.
And we are seeing it across the board.
I just shared an example
of how it's impacting
my philanthropic work
in pediatric cancer.
But you guys each have
so many unique roles
in your business life.
How are you seeing it just
blow up in your fields?
Nir, your thing?
>> Yeah, for example like in Galvanize
we train many Fortune 500 companies.
And just by looking at the demand
of companies that want us to help them
go through this digital transformation
is mind-blowing.
Data point by itself.
>> Okay.
Well what we're seeing what's
going on is that data science
like as a theme, is that it's
actually for everyone now.
But what's happening is that
it's actually meeting
non technical people.
But what we're seeing is that
when non technical people
are implementing these tools
or coming at these tools
without a base line of data literacy,
they're often times using it in ways
that distance themselves
from the customer.
Because they're implementing
data science tools
without a clear purpose,
without a clear problem.
And so what we do at Sudden Compass
is that we work with companies
to help them embrace and understand
the complexity of their customers.
Because often times they
are misusing data science
to try and flatten their
understanding of the customer.
As if you can just do more
traditional marketing.
Where you're putting people into boxes.
And I think the whole ROI of data
is that you can now understand
people's relationships
at a much more complex level
at a greater scale than before.
But we have to do this
with basic data literacy.
And this has to involve technical
and non technical people.
>> Well you can have all
the data in the world,
and I think it speaks to,
if you're not doing the
proper movement with it,
forget it.
It means nothing at the same time.
>> No absolutely.
I mean, I think that when you look
at the huge explosion in data,
that comes with it a huge
explosion in data experts.
Right, we call them data scientists,
data analysts.
And sometimes they're people
who are very, very talented,
like the people here.
But sometimes you have people
who are maybe re-branding
themselves, right?
Trying to move up their title one notch
to try to attract that higher salary.
And I think that that's one of the things
that customers are
coming to us for, right?
They're saying, hey look,
there are a lot of people
that call themselves data scientists,
but we can't really distinguish.
So, we sort of run a fellowship
where we help companies hire
from a really talented group of folks,
who are also truly data scientists
and who know all those kind
of really important data science tools.
And we also help companies internally.
Fortune 500 companies
who are looking to grow
that data science practice that they have.
And we help clients like McKinsey,
BCG, Bain, train up their customers,
also their clients,
also their workers to
be more data talented.
And to build up that data
science capabilities.
>> And Nir, this is something
you work with a lot.
A lot of Fortune 500 companies.
And when we were speaking earlier,
you were saying many of these companies
can be in a panic.
>> Yeah.
>> Explain that.
>> Yeah, so you know, not
all Fortune 500 companies
are fully data driven.
And we know that the winners in this
fourth industrial revolution,
which I like to call the
machine intelligence revolution,
will be companies who
navigate and transform
their organization to unlock
the power of data science
and machine learning.
And the companies that are not like that.
Or not utilize data science
and predictive power well,
will pretty much get shredded.
So they are in a panic.
>> Tricia, companies have to deal
with data behind the firewall and
in the new multi cloud world.
How do organizations start
to become data driven right to the core?
>> I think the most urgent
question to become data driven
that companies should be asking
is how do I bring the complex reality
that our customers are experiencing
on the ground into a corporate office?
Into the data models.
So that question is critical
because that's how you actually prevent
any big data disasters.
And that's how you leverage big data.
Because when your data
models are really far
from your human models,
that's when you're going to do things
that are really far off from how,
it's going to not feel right.
That's when Tesco had their
terrible big data disaster
that they're still recovering from.
And so that's why I think
it's really important
to understand that when
you implement big data,
you have to further embrace thick data.
The qualitative,
the emotional stuff,
that is difficult to quantify.
But then comes the
difficult art and science
that I think is the next
level of data science.
Which is that getting non
technical and technical people
together to ask how do we
find those unknown nuggets
of insights that are
difficult to quantify?
Then, how do we do the next step
of figuring out how do
you mathematically scale
those insights into a data model?
So that actually is reflective
of human understanding?
And then we can start
making decisions at scale.
But you have to have that first.
>> That's absolutely right.
And I think that when we
think about what it means
to be a data scientist, right?
I always think about it in
these sort of three pillars.
You have the math side.
You have to have that kind of stats,
hardcore machine learning background.
You have the programming side.
You don't work with small amounts of data.
You work with large amounts of data.
You've got to be able to type the code
to make those computers run.
But then the last part
is that human element.
You have to understand
the domain expertise.
You have to understand what it is
that I'm actually analyzing.
What's the business proposition?
And how are the clients,
how are the users actually
interacting with the system?
That human element that
you were talking about.
And I think having
somebody who understands
all of those and not just in isolation,
but is able to marry that understanding
across those different topics,
that's what makes a data scientist.
>> But I find that we don't have people
with those skill sets.
And right now the way I
see teams being set up
inside companies is that
they're creating these isolated
data unicorns.
These data scientists that have graduated
from your programs,
which are great.
But, they don't involve the people
who are the domain experts.
They don't involve the designers,
the consumer insight people,
the people,
the salespeople.
The people who spend
time with the customers
day in and day out.
Somehow they're left out of the room.
They're consulted,
but they're not a stakeholder.
>> Can I actually
>> Yeah, yeah please.
>> Can I actually give a quick example?
So for example, we at
Galvanize train the executives
and the managers.
And then the technical people,
the data scientists and the analysts.
But in order to actually see
all of the ROI behind the data,
you also have to have a
creative fluid conversation
between non technical
and technical people.
And this is a major trend now.
And there's a major gap.
And we need to increase awareness
and kind of like create a new,
kind of like environment
where technical people
also talks seamlessly
with non technical ones.
>> [Tricia] We call--
>> That's one of the
things that we see a lot.
Is one of the trends in--
>> A major trend.
>> data science training
is it's not just for
the data science technical experts.
It's not just for one type of person.
So a lot of the training we do is
sort of data engineers.
People who are more on the
software engineering side
learning more about the stats and math.
And then people who are
sort of traditionally
on the stat side learning
more about the engineering.
And then managers and
people who are data analysts
learning about both.
>> Michael, I think you said something
that was of interest too
because I think we can look
at IBM Watson as an example.
And working in healthcare.
The human component.
Because often times we talk
about machine learning and AI,
and data and you get worried
that you still need that human component.
Especially in the world of healthcare.
And I think that's a very strong point
when it comes to the data analysis side.
Is there any particular example
you can speak to of that?
>> So I think that there was
this really excellent paper
a while ago talking about
all the neural net stuff
and trained on
textual data.
So looking at sort of different corpuses.
And they found that
these models were highly,
highly sexist.
They would read these corpuses
and it's not because neural
nets themselves are sexist.
It's because they're reading
the things that we write.
And it turns out that we
write kind of sexist things.
And they would sort of find
all these patterns in there
that were sort of latent,
that had a lot of sort of things
that maybe we would cringe
at if we sort of saw.
And I think that's one
of the really important
aspects of the human element, right?
It's being able to come
in and sort of say like,
okay, I know what the
biases of the system are,
I know what the biases of the tools are.
I need to figure out how to
use that to make the tools,
make the world a better place.
And like another area where this comes up
all the time is lending, right?
So the federal government has said,
and we have a lot of clients
in the financial services space,
so they're constantly
under these kind of rules
that they can't make
discriminatory lending practices
based on a whole set of
protected categories.
Race, sex,
gender, things like that.
But, it's very easy when you train a model
on credit scores to pick that up.
And then to have a model
that's inadvertently
sexist or racist.
And that's where you
need the human element
to come back in and say okay, look,
you're using the classic
example would be zip code,
you're using zip code as a variable.
But when you look at it,
zip code's actually highly
correlated with race.
And you can't do that.
So you may inadvertently by
sort of following the math
and being a little
naive about the problem,
inadvertently introduce
something really horrible
into a model and that's where
you need a human element
to sort of step in and say, okay hold on.
Slow things down.
This isn't the right way to go.
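The zip code point above can be checked before a variable ever reaches a model.
What follows is a minimal Python sketch, not anything shown on stage: the function
name `proxy_strength` and the sample records are made up for illustration. It asks
how often a single feature, taken alone, lets you guess a protected attribute,
and compares that with the baseline of always guessing the largest group.

```python
from collections import Counter, defaultdict

def proxy_strength(feature_values, protected_values):
    """Return (score, baseline) for one feature against a protected attribute.

    score:    fraction of records where guessing the most common protected
              group for that feature value is correct.
    baseline: fraction correct when always guessing the largest group.
    A score far above the baseline suggests the feature acts as a proxy.
    """
    by_feature = defaultdict(Counter)
    for f, p in zip(feature_values, protected_values):
        by_feature[f][p] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    total = len(feature_values)
    baseline = Counter(protected_values).most_common(1)[0][1] / total
    return correct / total, baseline

# Made-up records (zip code, protected group); illustrative only.
records = [
    ("10001", "A"), ("10001", "A"), ("10001", "A"), ("10001", "B"),
    ("20002", "B"), ("20002", "B"), ("20002", "B"), ("20002", "A"),
]
zips = [z for z, _ in records]
groups = [g for _, g in records]
score, baseline = proxy_strength(zips, groups)
print(f"zip predicts group for {score:.0%} of records (baseline {baseline:.0%})")
```

A gap like this is only a flag for a human reviewer. Deciding whether the
variable may be used is still the judgment call the panel is describing.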
>> And the people who have --
>> I feel like,
I can feel her ready to respond.
>> Yes, I'm ready.
>> She's like let me have at it.
>> And the people here it is.
And the people who are really great
at providing that human intelligence
are social scientists.
We are trained to look for bias
and to understand bias in data.
Whether it's quantitative or qualitative.
And I really think that
we're going to have less
of these kind of problems if
we had more integrated teams.
If it was a mandate from leadership
to say no data science
team should be without
a social scientist, ethnographer,
or qualitative researcher of some kind,
to be able to help see these biases.
>> The talent piece is
actually the most crucial--
>> Yeah.
>> one here.
If you look about how to
enable machine intelligence
in organization there are the pillars
that I have in my head
which is the culture,
the talent and the
technology infrastructure.
And I believe and I saw
in working very closely with
the Fortune 100 and 200 companies
that the talent piece is
actually the most important
crucial hard to get.
>> [Tricia] I totally agree.
>> It's absolutely true.
Yeah, no I mean I think that's
sort of like how we came
up with our business model.
Companies were basically saying hey,
I can't hire data scientists.
And so we have a fellowship where we
get 2,000 applicants each quarter.
We take the top 2% and then
we sort of train them up.
And we work with hiring
companies who then want
to hire from that population.
And so we're sort of helping
them solve that problem.
And the other half of it
is really around training.
Cause with a lot of industries,
especially if you're sort of
in a more regulated industry,
there's a lot of nuances
to what you're doing.
And the fastest way to develop
that data science or AI talent
may not necessarily be to
hire folks who are coming
out of a PhD program.
It may be to take folks
internally who have a lot
of that domain knowledge that you have
and get them trained up on
those data science techniques.
So we've had large insurance
companies come to us
and say hey look, we
hire three or four folks
from you a quarter.
That doesn't move the needle for us.
What we really need is
take the thousand actuaries
and statisticians that we have
and get all of them trained
up to become a data scientist
and become data literate in
this new open source world.
>> [Katie] Go ahead.
>> All right, ladies first.
>> Go ahead.
>> Are you sure?
>> No please, fight first.
(laughing)
>> Go ahead.
>> Go ahead Nir.
>> So this is actually a
trend that we have been seeing
in the past year or so that
companies kind of like start
to look how to upskill
and look for talent within
the organization.
So they can actually move
them to become more literate
and navigate 'em from
analyst to data scientist.
And from data scientist
to machine learner.
So this is actually a
trend that is happening
already for a year or so.
>> Yeah, but I also find that
after they've gone through that training
in getting people skilled
up in data science,
the next problem that I get
is executives coming to say
we've invested in all of this.
We're still not moving the needle.
We've already invested in the right tools.
We've gotten the right skills.
We have enough scale of
people who have these skills.
Why are we not moving the needle?
And what I explain to them is look,
you're still making
decisions in the same way.
And you're still not involving enough
of the non technical people.
Especially from marketing,
which is now,
the CMO's are much more responsible
for driving growth in their companies now.
But often times it's so hard
to change the old way of marketing,
which is still like very segmentation.
You know, demographic variable based,
and we're trying to move people to say
no, you have to understand
the complexity of customers
and not put them in boxes.
>> And I think underlying
a lot of this discussion
is this question of culture, right?
>> Yes.
>> Absolutely.
>> How do you build a data driven culture?
And I think that that culture question,
one of the ways that
comes up quite often in
especially in large,
Fortune 500 enterprises,
is that they are very,
they're not very comfortable
with sort of, for example,
open source architecture.
Open source tools.
And there is some sort of residual bias
that that's somehow dangerous.
So security vulnerability.
And I think that that's part
of the cultural challenge
that they often have in
terms of how do I build
a more data driven organization?
Well a lot of the talent really wants
to use these kind of tools.
And I mean, just to give you an example,
we are partnering with one
of the major cloud providers
to sort of help make open
source tools more user friendly
on their platform.
So trying to help them attract the best
technologists to use their platform
because they want and
they understand the value
of having that kind of
open source technology
work seamlessly on their platforms.
So I think that just
sort of goes to show you
how important open source
is in this movement.
And how much large companies
and Fortune 500 companies
and a lot of the ones we work
with have to embrace that.
>> Yeah, and I'm seeing it in our work.
Even when we're working
with Fortune 500 companies,
is that they've already
gone through the first phase
of data science work.
Where I explain it was all about the tools
and getting the right tools
and architecture in place.
And then companies started moving
into getting the right skill set in place.
Getting the right talent.
And what you're talking about with culture
is really where I think we're talking
about the third phase of data science,
which is looking at communication
of these technical frameworks
so that we can get non technical people
really comfortable in the same
room with data scientists.
That is going to be the phase,
that's really where I see the pain point.
And that's why at Sudden Compass,
we're really dedicated to
working with each other
to figure out how do we
solve this problem now?
>> And I think that communication between
the technical stakeholders
and management and leadership.
That's a very critical piece of this.
You can't have a successful
data science organization
without that.
>> Absolutely.
>> And I think that actually some of
the most popular trainings
we've had recently are
from managers and executives
who are looking to say,
how do I become more data savvy?
How do I figure out what
is this data science thing
and how do I communicate
with my data scientists?
>> You guys made this way too easy.
I was just going to get some popcorn
and watch it play out.
(laughing)
>> Nir, last 30 seconds.
I want to leave you
with an opportunity to,
anything you want to add
to this conversation?
>> I think one thing to conclude is to say
that companies that are not data driven
is about time to hit refresh
and figure how they transition
the organization to become data driven.
To become agile and nimble
so they can actually
seize the opportunities
from this important industrial revolution.
Otherwise, unfortunately
they will have hard time to survive.
>> [Katie] All agreed?
>> [Tricia] Absolutely, you're right.
>> Michael, Trish, Nir, thank you so much.
Fascinating discussion.
And thank you guys again for joining us.
We will be right back
with another great demo.
Right after this.
>> Thank you Katie.
(audience clapping)
(upbeat dance music)
>> Once again, thank you
for an excellent discussion.
Weren't they great guys?
(audience clapping and cheering)
And thank you for everyone who's tuning in
on the live webcast.
As you can hear,
we have an amazing studio audience here.
(audience clapping and cheering)
And we're going to keep things moving.
I'm now joined by Daniel
Hernandez and Siva Anne.
And we're going to turn our
attention to how you can deliver
on what they're talking about
using data science experience
to do data science faster.
>> Thank you Katie.
Siva and I are going to
spend the next 10 minutes
showing you how you can deliver
on what they were saying
using the IBM Data Science Experience
to do data science faster.
We'll demonstrate through new features
we introduced this week
how teams can work
together more effectively
across the entire analytics life cycle.
How you can take advantage
of any and all data
no matter where it is and what it is.
How you could use your favorite
tools from open source.
And finally how you could
build models anywhere
and employ them close
to where your data is.
Remember the financial
adviser app Rob showed you?
To build an app like that,
we needed a team of data scientists,
developers,
data engineers,
and IT staff to collaborate.
We do this in the Data Science Experience
through a concept we call projects.
When I create a new project,
I can now use the new
Github integration feature.
We're doing for data science
what we've been doing
for developers for years.
Distributed teams can work
together on analytics projects.
And take advantage of
Github's version management
and change management features.
This is a huge deal.
Let's explore the project we created
for the financial adviser app.
As you can see,
our data engineer Joane,
our developer Rob,
and others are collaborating on this project.
Joane got things started
by bringing together
the trusted data sources
we need to build the app.
Taking a closer look at the data,
we see that our customer and profile data
is stored on our recently announced
IBM Integrated Analytics System,
which runs safely behind our firewall.
We also needed macro economic data,
which she was able to find
in the Federal Reserve.
And she stored it in our
Db2 Warehouse on Cloud.
And finally, she selected stock news data
from NASDAQ.com and landed
that in a Hadoop cluster,
which happens to be
powered by Hortonworks.
We added a new feature to
the Data Science Experience
so that when it's
installed with Hortonworks,
it automatically uses the native
security and governance
controls within the cluster
so your data is always secure and safe.
Now we want to show you the news data
we stored in the Hortonworks cluster.
This is the main administrative console.
It's powered by an open
source project called Ambari.
And here's the news data.
It's in parquet files stored in HDFS,
which happens to be a
distributed file system.
To get the data from
NASDAQ into our cluster,
we used IBM's BigIntegrate and BigQuality
to create automatic data pipelines
that acquire, cleanse, and ingest
that news data.
Once the data's available,
we use IBM's Big SQL to query that data
using SQL statements
that are much like the ones we would use
for any relational data,
including the data that we have
in the Integrated Analytics System
and Db2 Warehouse on Cloud.
This and the federation
capabilities that Big SQL offers
dramatically simplify data acquisition.
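The idea of one SQL statement spanning several stores can be sketched in a few
lines. This is an analogy only: it uses Python's built-in sqlite3 module with an
attached second database standing in for a separate system, not Big SQL itself.
The table names, column names, and rows are all hypothetical.

```python
import sqlite3

# Two separate in-memory databases stand in for two separate systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS news_store")

# Hypothetical tables: customers in the main store, news in the second one.
conn.execute("CREATE TABLE customers (customer_id INTEGER, ticker TEXT)")
conn.execute("CREATE TABLE news_store.stock_news (ticker TEXT, headline TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "AAA"), (2, "BBB")],
)
conn.executemany(
    "INSERT INTO news_store.stock_news VALUES (?, ?)",
    [("AAA", "Hypothetical headline one"), ("AAA", "Hypothetical headline two")],
)

# One ordinary SQL statement joins rows that live in both stores.
rows = conn.execute(
    """
    SELECT c.customer_id, COUNT(n.headline) AS news_items
    FROM customers AS c
    LEFT JOIN news_store.stock_news AS n ON n.ticker = c.ticker
    GROUP BY c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()
print(rows)
```

The point of the sketch is that the query author writes plain relational SQL
and does not need to know where each table physically lives.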
Now we want to show you how
we support a brand new tool
that we're excited about.
Since we launched last summer,
the Data Science Experience
has supported Jupyter and R
for data analysis and visualization.
In this week's update,
we deeply integrated another
great open source project
called Apache Zeppelin.
It's known for having great
visualization support,
advanced collaboration features,
and is growing in popularity amongst
the data science community.
This is an example of Apache Zeppelin
and the notebook we created through it
to explore some of our data.
Notice how wonderful and easy
the data visualizations are.
Now we want to walk you
through the Jupyter notebook
we created to explore our
customer preference for stocks.
We use notebooks to
understand and explore data.
To identify the features that
have some predictive power.
Ultimately, we're trying to assess
what ultimately is driving
customer stock preference.
Here we did the analysis
to identify the attributes
of customers that are likely
to purchase auto stocks.
We used this understanding
to build our machine learning model.
For building machine learning models,
we've always had tools integrated into
the Data Science Experience.
But sometimes you need to use
tools you already invested in.
Like our very own SPSS as well as SAS.
Through a new import feature,
you can easily import those models
created with those tools.
This helps you avoid vendor lock-in,
and simplify the development,
training, deployment, and
management of all your models.
To build the models we used in the app,
we could have coded,
but we prefer a visual experience.
We used our customer profile data
in the Integrated Analytics System.
Used the Auto Data Preparation
to cleanse our data.
Chose the binary
classification algorithms.
Let the Data Science Experience evaluate
between logistic regression
and gradient boosted tree.
It's doing the heavy work for us.
As you can see here,
the Data Science Experience
generated performance metrics
that show us that the
gradient boosted tree
is the best performing algorithm
for the data we gave it.
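The comparison step described above can be sketched by hand. The Python below is
an illustration under stated assumptions, not the product's implementation:
the demo does not say which metric was used, so ROC AUC is an assumed choice,
and the labels and model scores are made up.

```python
def roc_auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Held-out labels (1 = bought auto stocks) and made-up scores per candidate.
labels = [1, 0, 1, 1, 0, 0]
candidates = {
    "logistic_regression": [0.70, 0.40, 0.55, 0.45, 0.50, 0.20],
    "gradient_boosted_tree": [0.90, 0.30, 0.80, 0.60, 0.40, 0.10],
}

# Score every candidate on the same held-out data and keep the best one.
aucs = {name: roc_auc(labels, scores) for name, scores in candidates.items()}
best = max(aucs, key=aucs.get)
for name, auc in aucs.items():
    print(f"{name}: AUC = {auc:.3f}")
print("best:", best)
```

Scoring all candidates on the same held-out records is what makes the
numbers comparable. Which metric to rank by depends on the business problem.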
Once we save this model,
it's automatically deployed and available
for developers to use.
Any application developer
can take this endpoint and
consume it like they would
any other API inside
of the apps they built.
We've made training and
creating machine learning models
super simple.
But what about the operations?
A lot of companies are struggling
to ensure their model performance
remains high over time.
In our financial adviser app,
we know that customer
data changes constantly,
so we need to always
monitor model performance
and ensure that our models are retrained
as is necessary.
This is a dashboard that
shows the performance
of our models and lets our teams monitor
and retrain those models
so that they're always
performing to our standards.
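The monitor-and-retrain loop described above can be reduced to a small sketch.
The class name `ModelMonitor`, the window size, and the accuracy threshold are
all assumptions chosen for illustration. The dashboard in the demo is not
described at this level of detail.

```python
from collections import deque

class ModelMonitor:
    """Track accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window=4, threshold=0.75):
        self.outcomes = deque(maxlen=window)  # True where prediction was right
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only judge the model once a full window of outcomes is available.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold

monitor = ModelMonitor(window=4, threshold=0.75)
# Made-up (predicted, actual) pairs arriving after deployment.
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_retraining())
```

A rolling window matters here because customer data keeps changing: an
all-time average would hide a recent drop in performance.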
So far we've been showing you
the Data Science Experience
available behind the firewall
that we're using to
build and train models.
Through a new publish feature,
you can build models and
deploy them anywhere.
In another environment,
private,
public,
or anywhere else with just a few clicks.
So here we're publishing our model
to the Watson machine learning service.
It happens to be in the IBM cloud.
And also deeply integrated with
our Data Science Experience.
After publishing and switching
to the Watson machine learning service,
you can see that our
stock affinity model
that we just published is
there and ready for use.
So this is incredibly important.
I just want to say it again.
The Data Science Experience
allows you to train models
behind your own firewall,
take advantage of your
proprietary and sensitive data,
and then deploy those models
wherever you want with ease.
So to summarize what we just showed you.
First, IBM's Data Science
Experience supports all teams.
You saw how our data engineer
populated our project
with trusted data sets.
Our data scientists developed,
trained, and tested a
machine learning model.
Our developers used APIs
to integrate machine
learning into their apps.
And how IT can use our Integrated
Model Management dashboard
to monitor and manage model performance.
Second, we support all data.
On premises,
in the cloud,
structured,
unstructured,
inside of your firewall,
and outside of it.
We help you bring analytics and governance
to where your data is.
Third, we support all tools.
The data science tools that you depend on
are readily available
and deeply integrated.
This includes capabilities
from great partners
like Hortonworks.
And powerful tools like
our very own IBM SPSS.
And fourth, and finally,
we support all deployments.
You can build your models anywhere,
and deploy them right next
to where your data is.
Whether that's in the public cloud,
private cloud,
or even on the world's most
reliable transaction platform,
IBM z.
So see for yourself.
Go to the Data Science Experience website,
take us for a spin.
And if you happen to be ready right now,
our recently created
Data Science Elite Team
can help you get started
and run experiments
alongside you with no charge.
Thank you very much.
(audience clapping)
>> Thank you very much Daniel.
It seems like a great time to get started.
And thanks to Siva for
taking us through it.
Rob and I will be back in just a moment
to add some perspective right after this.
(upbeat dance music)
All right, once again
joined by Rob Thomas.
And Rob obviously we got
a lot of information here.
>> Yes, we've covered a lot of ground.
>> This is intense.
You got to break it down for me cause
I think we zoom out and
see the big picture.
What better data science
can deliver to a business?
Why is this so important?
I mean we've heard it through and through.
>> Yeah, well, I heard it a couple times.
But it starts with
businesses have to embrace
a data driven culture.
And it is a change.
And we need to make data accessible
with the right tools
in a collaborative culture
because we've got diverse skill sets
in every organization.
But data driven companies succeed
when data science tools are
in the hands of everyone.
And I think that's a new thought.
I think most companies think
just get your data scientist
some tools, you'll be fine.
This is about tools in
the hands of everyone.
I think the panel did a
great job of describing about
how we get to data science for all.
Building a data culture,
making it a part of your
everyday operations,
and the highlights of what
Daniel just showed us,
that's some pretty cool features
for how organizations can get to this,
which is you can see IBM's
Data Science Experience,
how that supports all teams.
You saw data analysts,
data scientists,
application developer,
IT staff,
all working together.
Second, you saw how we support all tools.
And your choice of tools.
So the most popular data science libraries
integrated into one platform.
And we saw some new capabilities
that help companies avoid lock-in,
where you can import existing models
created from specialist tools
like SPSS or others.
And then deploy them and manage them
inside of Data Science Experience.
That's pretty interesting.
And lastly, you see we
continue to build on this
best of open tools.
Partnering with companies like H2O,
Hortonworks, and others.
Third, you can see how you use all data
no matter where it lives.
That's a key challenge every
organization's going to face.
Private,
public,
federating all data sources.
We announced new integration
with the Hortonworks data platform
where we deploy machine learning models
where your data resides.
That's been a key theme.
Analytics where the data is.
And lastly, supporting
all types of deployments.
Deploy them in your Hadoop cluster.
Deploy them in your
Integrated Analytics System.
Or deploy them in z,
just to name a few.
A lot of different options here.
But look, don't believe anything I say.
Go try it for yourself.
Data Science Experience,
anybody can use it.
Go to datascience.ibm.com and look,
if you want to start right now,
we just created a team that we call
Data Science Elite.
These are the best data
scientists in the world
that will come sit down with you
and co-create solutions, models,
and prove out a proof of concept.
>> Good stuff.
Thank you Rob.
So you might be asking what
does an organization look like
that embraces data science for all?
And how could it transform your role?
I'm going to head back to
the office and check it out.
Let's start with the perspective
of the line of business.
What's changed?
Well, now you're starting to
explore new business models.
You've uncovered opportunities
for new revenue sources
in all that hidden data.
And being disrupted is no
longer keeping you up at night.
As a data science leader,
you're beginning to collaborate
with a line of business
to better understand and
translate the objectives
into the models that are being built.
Your data scientists are
also starting to collaborate
with the less technical
team members and analysts
who are working closest
to the business problem.
And as a data scientist,
you stop feeling like
you're falling behind.
Open source tools are keeping you current.
You're also starting to
operationalize the work that you do.
And you get to do more of what you love.
Explore data,
build models,
put your models into production,
and create business impact.
All in all,
it's not a bad scenario.
(audience clapping)
Thanks.
All right.
We are back and coming up next,
oh this is a special time right now.
Cause we got a great guest speaker.
New York Magazine called
him the spreadsheet psychic
and number crunching prodigy who went
from correctly forecasting baseball games
to correctly forecasting
presidential elections.
He even invented a
proprietary algorithm called
PECOTA for predicting future performance
by baseball players and teams.
And his New York Times bestselling book,
The Signal and the Noise
was named by Amazon.com
as the number one best
non-fiction book of 2012.
He's currently the Editor in Chief
of the award winning
website, FiveThirtyEight
and appears on ESPN as
an on air commentator.
Big round of applause.
My pleasure to welcome Nate Silver.
(audience clapping)
>> Thank you.
We met backstage.
>> Yes.
>> It feels weird to re-shake your hand,
but you know, for the audience.
>> I had to give the intense firm grip.
>> Definitely.
>> The ninja grip.
So you and I have crossed paths kind
of digitally in the past,
which is really interesting,
is I started my career at ESPN.
And I started as a production assistant,
then later back on air
for sports technology.
And I go to you to talk
about sports because--
>> Yeah.
>> Wow, has ESPN upped their game in terms
of understanding the importance
of data and analytics.
And what it brings.
Not just to MLB,
but across the board.
>> No, it's really infused
into the way they present
the broadcast.
You'll have win probability
on the bottom line.
And they'll incorporate
FiveThirtyEight metrics
into how they cover college
football for example.
So, ESPN ...
Sports is maybe the perfect,
if you're a data scientist,
like the perfect kind of test case.
And the reason being that sports consists
of problems that have rules.
And have structure.
And when problems have
rules and structure,
then it's a lot easier to work with.
So it's a great way to
kind of improve your skills
as a data scientist.
Of course, there are also
important real world problems
that are more open ended,
and those present different
types of challenges.
But it's such a natural fit.
The teams.
Think about the teams playing
the World Series tonight.
The Dodgers and the Astros are
both like very data driven,
especially Houston.
Golden State Warriors, the NBA Champions,
extremely data driven.
New England Patriots,
relative to an NFL team,
it's shifted a little bit,
the NFL bar is lower.
But the Patriots are
certainly very analytical
in how they make decisions.
So, you can't talk about sports
without talking about analytics.
>> And I was going to save the
baseball question for later.
Cause we are moments away from game seven.
>> Yeah.
>> Is everyone else watching game seven?
It's been an incredible series.
Probably one of the best of all time.
>> Yeah, I mean--
>> You have a prediction here?
>> You can mention that too.
So I don't have a prediction.
FiveThirtyEight has the Dodgers
with a 60% chance of winning.
(audience clapping)
>> [Katie] LA Fans.
(audience booing)
(laughing)
>> So you have two teams
that are about equal.
But the Dodgers pitching
staff is in better shape
at the moment.
The end of a seven game series.
And they're at home.
>> But the statistics behind the two teams
is pretty incredible.
>> Yeah.
It's like the first World
Series in I think 56 years
or something where you have
two 100 win teams facing one another.
There has been a lot
of parity in baseball
for a lot of years.
Not that many offensive
overall juggernauts.
But this year,
and last year with the Cubs
and the Indians too really.
But this year,
you have really spectacular
teams in the World Series.
It kind of is a showcase
of modern baseball.
Lots of home runs.
Lots of strikeouts.
>> [Katie] Lots of extra innings.
>> Lots of extra innings.
Good defense.
Lots of pitching changes.
So if you love the modern baseball game,
it's been about the best
example that you've had.
If you like a little bit more contact,
and fewer strikeouts,
maybe not so much.
But it's been a spectacular
and very exciting World Series.
>> It's amazing to talk.
MLB is huge with analysis.
I mean, hands down.
But across the board,
if you can provide a few examples.
Because there's so many
teams in front offices
putting such an,
just a heavy intensity
on the analysis side.
And where the teams are going.
And if you could provide
any specific examples
of teams that have really blown your mind.
Especially over the last year or two.
Because every year it gets
more exciting if you will.
>> I mean, so a big thing in
baseball is defensive shifts.
So if you watch tonight,
you'll probably see a couple of plays
where if you're used to watching baseball,
a guy makes really solid contact.
And there's a fielder there
that you don't think should be there.
But that's really very data
driven where you analyze
where's this guy hit the ball.
That part's not so hard.
But also there's game theory involved.
Because you have to adjust
for the fact that he knows
where you're positioning the defenders.
He's trying therefore to make
adjustments to his own swing
and so that's been a major innovation
in how baseball is played.
You know, how bullpens are used too.
Where teams have realized
that actually having a guy,
across all sports pretty much,
realizing the importance of rest.
And of fatigue.
And that you can be the
best pitcher in the world,
but guess what?
After four or five innings,
you're probably not as good as a guy
who has a fresh arm necessarily.
So I mean, it really is like,
these are not subtle things anymore.
It's not just oh, on base
percentage is valuable.
It really affects kind of every
strategic decision in baseball.
The NBA, if you watch an NBA game tonight,
see how many three point shots are taken.
That's in part because of data.
And teams realizing hey,
three points is worth more than two,
once you're more than about
five feet from the basket,
the shooting percentage gets really flat.
And so it's revolutionary, right?
Like teams that will shoot
almost half their shots
from the three point range nowadays.
Larry Bird, who wound up being one
of the greatest three
point shooters of all time,
took only eight three pointers
his first year in the NBA.
It's quite noticeable if you
watch baseball or basketball
in particular.
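The three-point argument above comes down to
expected points per shot. A minimal sketch in Python,
assuming made-up shooting percentages (the 40% and 36%
figures and all names below are hypothetical, not from the talk):

```python
def expected_points(make_probability: float, shot_value: int) -> float:
    """Expected points from one attempt: P(make) times points awarded."""
    return make_probability * shot_value


def break_even_three_pct(two_point_pct: float) -> float:
    """Three-point percentage that matches a two-point shot's expected value."""
    return two_point_pct * 2 / 3


# Assumed percentages, chosen only for illustration.
ASSUMED_MID_RANGE_PCT = 0.40  # hypothetical long two-point jumper
ASSUMED_THREE_PCT = 0.36      # hypothetical three-point shot

mid_range_value = expected_points(ASSUMED_MID_RANGE_PCT, 2)
three_point_value = expected_points(ASSUMED_THREE_PCT, 3)

print(f"mid-range: {mid_range_value:.2f} points per shot")
print(f"three:     {three_point_value:.2f} points per shot")
print(f"break-even three-point pct: {break_even_three_pct(ASSUMED_MID_RANGE_PCT):.3f}")
```

Under these assumed numbers the three-pointer is worth more
per attempt even though it goes in less often, which is the
trade-off described in the answer above.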
>> Not to focus too much on sports.
One final question.
In terms of Major League Soccer,
and now in NFL,
we're having the analysis
and having wearables
where it can now showcase
if they wanted to on screen,
heart rate and breathing
and how much exertion.
How much data is too much data?
And when does it ruin the sport?
>> So, I don't think, I mean, again,
it goes sport by sport a little bit.
I think in basketball you actually have
a more exciting game.
I think the game is more open now.
You have more three pointers.
You have guys getting
higher assist totals.
But you know, I don't know.
I'm not one of those
people who thinks look,
if you love baseball or basketball,
and you go in to work for the Astros,
the Yankees or the Knicks,
they probably need some help, right?
(laughing)
You really have to be
passionate about that sport.
Because it's all based on
what questions am I asking?
As I'm a fan or I guess
an employee of the team.
Or a player watching the game.
And there isn't really any
substitute I don't think
for the insight and intuition
that a curious human has
to kind of ask the right questions.
So we can talk at great
length about what tools
do you then apply when
you have those questions,
but that still comes from people.
I don't think machine
learning could help with
what questions do I
want to ask of the data.
It might help you get the answers.
>> If you have a mid-fielder
in a soccer game though,
not exerting, only 80%,
and you're seeing that
on a screen as a fan,
and you're saying could
that person get fired
at the end of the day?
One day, with the data?
>> So we found that actually
some in soccer in particular,
some of the better players
are actually more still.
So Leo Messi, maybe the
best player in the world,
doesn't move as much as
other soccer players do.
And the reason being that
A) he kind of knows
how to position himself
in the first place.
B) he realizes that you make a run,
and you're out of position.
That's quite fatiguing.
And particularly soccer, like basketball,
is a sport where it's
incredibly fatiguing.
And so, sometimes the guys
who conserve their energy,
that kind of old school mentality,
you have to hustle at every moment.
That is not helpful to the team
if you're hustling on an irrelevant play.
And therefore, on a critical play,
can't get back on defense, for example.
>> Sports, but also data
is moving exponentially
as we're just speaking about today.
Tech, healthcare, every
different industry.
Is there any particular that's
a favorite of yours to cover?
And I imagine they're
all different as well.
>> I mean, I do like sports.
We cover a lot of politics too.
Which is different.
I mean in politics I think
people aren't intuitively
as data driven as they might
be in sports for example.
It's impressive to
follow the breakthroughs
in artificial intelligence.
It started out just as
kind of playing games
and playing chess and poker
and Go and things like that.
But you really have seen a lot
of breakthroughs in the
last couple of years.
But yeah, it's kind of infused
into everything really.
>> You're known for your
work in politics though.
Especially presidential campaigns.
>> Yeah.
>> This year, in particular.
Was it insanely challenging?
What was the most notable thing
that came out of any of your predictions?
>> I mean, in some ways,
looking at the polling was the
easiest lens to look at it.
So I think there's kind of a
myth that last year's result
was a big shock and it wasn't really.
If you did the modeling in the right way,
then you realized that number one,
polls have a margin of error.
And so when a candidate
has a three point lead,
that's not particularly safe.
Number two, the outcome between
different states is correlated.
Meaning that it's not
that much of a surprise
that Clinton lost Wisconsin and Michigan
and Pennsylvania and Ohio.
You know I'm from Michigan.
Have friends from all those states.
Kind of the same types of
people in those states.
Those outcomes are all correlated.
So what people thought was
a big upset for the polls
I think was an example of how data science
done carefully and correctly
where you understand probabilities,
understand correlations.
Our model gave Trump a
30% chance of winning.
Other models gave him a 1% chance.
And so that was interesting
in that it showed
that number one,
that modeling strategies
and skill do matter
quite a lot.
When you have someone
saying 30% versus 1%.
I mean, that's a very very big spread.
And number two,
that these aren't like
solved problems necessarily.
Although again, the
problem with elections is
that you only have one
election every four years.
So I can be very confident
that I have a better model.
Even one year of data doesn't
really prove very much.
Even five or 10 years doesn't
really prove very much.
And so, being aware of the
limitations to some extent
intrinsically in elections
when you only get one kind
of new training example every four years,
there's not really any way around that.
There are ways to be more robust
to sparse data environments.
But if you're identifying different types
of business problems to solve,
figuring out what's a solvable problem
where I can add value with data science
is a really key part of what you're doing.
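The point about margin of error and correlated states
can be illustrated with a small simulation. This is only
a sketch under stated assumptions: the 3-point lead, the
error sizes, the four-state setup, and the function name
are all hypothetical, and it is not FiveThirtyEight's model.

```python
import random


def sweep_probability(lead, shared_sd, state_sd, n_states=4,
                      trials=20000, seed=0):
    """Estimate how often the trailing candidate wins every state.

    Each state's polling error is a shared error (common to all
    states) plus an independent state-specific error, in points.
    """
    rng = random.Random(seed)
    sweeps = 0
    for _ in range(trials):
        shared_error = rng.gauss(0.0, shared_sd)
        # The leader loses a state when total error exceeds the lead.
        if all(shared_error + rng.gauss(0.0, state_sd) > lead
               for _ in range(n_states)):
            sweeps += 1
    return sweeps / trials


# Both setups have the same total error per state (sd of 4 points);
# only the split between shared and state-specific error differs.
independent = sweep_probability(lead=3.0, shared_sd=0.0, state_sd=4.0)
correlated = sweep_probability(lead=3.0, shared_sd=3.2, state_sd=2.4)

print(f"independent errors: {independent:.3f}")
print(f"correlated errors:  {correlated:.3f}")
```

When most of the error is shared, a miss in one state
tends to come with misses in the others, so a sweep of all
four is far more likely than treating the states as
independent would suggest.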
>> You're such a leader in this space.
In data and analysis.
It would be interesting to
kind of peel back the curtain,
understand how you operate but
also how large is your team?
How you're putting together information.
How quickly you're putting it out.
Cause I think in this right now world
where everybody wants things instantly--
>> Yeah.
>> There's also, you want to be first too
in the world of journalism.
But you don't want to be inaccurate
because that's your credibility.
>> We talked about this before, right?
I think on average,
speed is a little bit overrated
in journalism.
>> [Katie] I think it's a
big problem in journalism.
>> Yeah.
>> Especially in the tech world.
You have to be first.
You have to be first.
And it's just pumping out, pumping out.
And there's got to be
more time spent on stories
if I can speak subjectively.
>> Yeah, for sure.
But at the same time,
we are reacting to the news.
And so we have people that come in,
we hire most of our people
actually from journalism.
>> [Katie] How many people
do you have on your team?
>> About 35.
But, if you get someone who comes
in from an academic track for example,
they might be surprised
at how fast journalism is.
That even though we might be slower
than the average website,
the fact that there's a
tragic event in New York,
are there things we
have to say about that?
A candidate drops out of
the presidential race,
are there things we have to say about that?
In periods ranging from minutes to days
as opposed to kind of
weeks to months to years
in the academic world.
The corporate world moves faster.
What is a little
different about journalism
is that you are expected
to have more precision
where people notice
when you make a mistake.
In corporations, you have
maybe less transparency.
If you make 10 investments and
seven of them turn out well,
then you'll get a lot of
profit from that, right?
In journalism, it's a little different.
If you make kind of 10
predictions or say 10 things,
and seven of them are very accurate
and three of them aren't,
you'll still get criticized
a lot for the three.
Just because that's kind of
the way that journalism is.
And so the kind of combination of needing,
not having that much
tolerance for mistakes,
but also needing to be fast.
That is tricky.
And I criticize other
journalists sometimes
including for not being
data driven enough,
but the best excuse any journalist has,
this is happening really
fast and it's my job
to kind of figure out in real time
what's going on and
provide useful information
to the readers.
And that's really difficult.
Especially in a world where literally,
I'll probably get off the
stage and check my phone
and who knows what President
Trump will have tweeted
or what things will have happened.
But it really is a kind of 24/7.
>> Well because it's 24/7
with FiveThirtyEight,
one of the most well known sites for data,
are you feeling
micromanagey on your people?
Because you do have to hit this balance.
You can't have something come
out four or five days later.
>> Yeah, I'm not --
>> Are you overseeing everything?
>> I'm not by nature a micromanager.
And so you try to hire well.
You try and let people make mistakes.
And the flip side of this is that
if a news organization that
never had any mistakes,
never had any corrections,
that's wrong, right?
You have to have some tolerance for error
because you are trying to
decide things in real time.
And figure things out.
I think transparency's a big part of that.
Say here's what we think,
and here's why we think it.
If we have a model to say it's
not just the final number,
here's a lot of detail
about how that's calculated.
In some cases we release the code
and the raw data.
Sometimes we don't because
there's a proprietary advantage.
But quite often we're saying
we want you to trust us
and it's so important that you trust us,
here's the model.
Go play around with it yourself.
Here's the data.
And that's also I think
an important value.
>> That speaks to open source.
And your perspective on that in general.
>> Yeah, I mean, look,
I'm a big fan of open source.
I worry that I think sometimes the trends
are a little bit away from open source.
But by the way,
one thing that happens
when you share your data
or you share your thinking
at least in lieu of the data,
and you can definitely do both is that
readers will catch embarrassing
mistakes that you made.
By the way, even having open
sourceness within your team,
I mean we have editors and copy editors
who often save you from
really embarrassing mistakes.
And by the way, it's
not necessarily people
who have a training in data science.
I would guess that of our 35 people,
maybe only five to 10 have
a kind of formal background
in what you would call data science.
>> [Katie] I think that
speaks to the theme here.
>> Yeah.
>> [Katie] That everybody's
kind of got to be data literate.
>> But yeah, it is like
you have a good intuition.
You have a good BS detector basically.
And you have a good intuition for
hey, this looks a little
bit out of line to me.
And sometimes that can be based
on domain knowledge, right?
We have one of our copy editors,
she's a big college football fan.
And we had an algorithm we
released that tries to predict
what the human being
selection committee will do,
and she was like, why
is LSU rated so high?
Cause I know that LSU sucks this year.
(laughing)
And we looked at it,
and she was right.
There was a bug where it
had forgotten to account
for their last game
where they lost to Troy
or something and so --
>> That also speaks to
the human element as well.
>> It does.
In general as a rule,
if you're designing a kind
of regression based model,
it's different in machine
learning where you have more,
when you kind of build in
the tolerance for error.
But if you're trying to
do something more precise,
then so much of it is just debugging.
It's saying that looks wrong to me.
And I'm going to investigate that.
And sometimes it's not wrong.
Sometimes your model
actually has an insight
that you didn't have yourself.
But fairly often,
it is.
And I think kind of
what you learn is like,
hey if there's something that bothers me,
I want to go investigate that now
and debug that now.
Because the last thing you
want is where all of a sudden,
the answer you're putting out there
in the world hinges on a mistake
that you made.
Cause you never know if
you have so to speak,
1,000 lines of code
and they all perform
something differently.
You never know when you
get in a weird edge case
where this one decision you made
winds up being the difference
between your having a good
forecast and a bad one.
In a defensible position
and an indefensible one.
So we definitely are quite
diligent and careful.
But it's also kind of knowing like,
hey, where is an approximation good enough
and where do I need more precision?
Cause you could also drive yourself crazy
in the other direction where you know,
it doesn't matter if the
answer is 91.2 versus 90.
And so you can kind of
go 91.2, three, four
and it's like kind of
A) false precision and
B) not a good use of your time.
So that's where I do still spend
a lot of time is thinking about
which problems are
"solvable" or approachable
with data and which ones aren't.
And when they're not by the way,
you're still allowed to report on them.
We are a news organization
so we do traditional reporting as well.
And then kind of figuring out
when do you need precision
versus when is being pointed
in the right direction good enough?
>> I would love to get inside your brain
and see how you operate
on just like an everyday
walking to Walgreens movement.
(laughing)
It's like oh, if I
cross the street in .2--
>> It's not, I mean--
>> Is it like maddening in there?
>> No, not really.
(audience laughing)
I mean, I'm like--
>> This is an honest question.
>> If I'm looking for airfares,
I'm a little more careful.
But no, part of it's like you
don't want to waste time on
unimportant decisions, right?
I will sometimes, if I
can't decide what to eat
at a restaurant, I'll flip a coin.
If the chicken and the pasta
both sound really good--
>> That's not high tech Nate.
We want better. (laughing)
>> But that's the point, right?
It's like both the chicken and the pasta
are going to be really darn good, right?
So I'm not going to waste my
time trying to figure it out.
I'm just going to have an
arbitrary way to decide.
>> Serious and business,
how organizations in the
last three to five years
have just evolved with this data boom.
How are you seeing it as from
a consultant point of view?
Do you think it's an exciting time?
Do you think it's a you must act now time?
>> I mean, we do know
that you definitely see
a lot of talent among the
younger generation now.
That so FiveThirtyEight has
been at ESPN for four years now.
And man, the quality of the interns we get
has improved so much in four years.
The quality of the kind of young hires
that we make straight out of college
has improved so much in four years.
So you definitely do
see a younger generation
for which this is just
part of their bloodstream
and part of their DNA.
And also, particular fields
that we're interested in.
So we're interested in people
who have both a data and
a journalism background.
We're interested in people
who have a visualization
and a coding background.
A lot of what we do is very
much interactive graphics
and so forth.
And so we do see those
skill sets coming into play
a lot more.
And so the kind of shortage of talent
that had I think frankly been
a problem for a long time,
I'm optimistic based on the
young people in our office,
it's a little anecdotal
but you can tell that there
are so many more programs that
are kind of teaching students
the right set of skills that
maybe weren't taught as much
a few years ago.
>> But when you're seeing
these big organizations,
ESPN as perfect example,
moving more towards data and analytics
than ever before.
>> Yeah.
>> You would say that's obviously true.
>> Oh for sure.
>> If you're not moving that direction,
you're going to fall behind quickly.
>> Yeah and the thing is,
if you read my book or I guess people
have a copy of the book.
In some ways it's saying
hey, there are a lot of ways
to screw up when you're using data.
And we've built bad models.
We've had models that were
bad and got good results.
Good models that got bad results
and everything else.
But the point is that the
reason to be out in front
of the problem is so you
give yourself more runway
to make errors and mistakes.
And to learn kind of what
works and what doesn't
and which people to put on the problem.
I sometimes do worry that a company says
oh we need data.
And everyone kind of agrees on that now.
We need data science.
Then they have some big test case.
And they have a failure.
And they maybe have a failure
because they didn't know really
how to use it well enough.
But learning from that
and iterating on that.
And so by the time that
you're on the third generation
of kind of a problem that
you're trying to solve,
and you're watching everyone
else make the mistake
that you made five years ago,
I mean, that's really powerful.
But that does mean that
getting invested in it now,
getting invested both in technology
and the human capital side is important.
>> Final question for you
as we run out of time.
2018 and beyond, what is your biggest project
in terms of data gathering
that you're working on?
>> There's a midterm election coming up.
That's a big thing for us.
We're also doing a lot
of work with NBA data.
So for four years now,
the NBA has been collecting
player tracking data.
So they have 3D cameras in every arena.
So they can actually kind
of quantify for example
how fast a fast break is, for example.
Or literally where a player
is and where the ball is.
For every NBA game now for
the past four or five years.
And there hasn't really
been an overall metric
of player value that's
taken advantage of that.
The teams do it.
But in the NBA,
the teams are a little bit ahead
of journalists and analysts.
So we're trying to have a really
truly next generation stat.
It's a lot of data.
Sometimes I now more oversee things
than I once did myself.
And so you're parsing through
many, many, many lines of code.
But yeah, so we hope to have that out
at some point in the next few months.
>> Anything you've personally
been passionate about
that you've wanted to
work on and kind of solve?
>> I mean, the NBA thing,
I am a pretty big basketball fan.
>> You can do better than that.
Come on, I want something real personal
that you're like I got
to crunch the numbers.
(audience laughing)
>> You know, we tried to figure out where
the best burrito in America
was a few years ago.
(audience laughing)
>> I'm going to end it there.
>> Okay.
>> Nate, thank you so much
for joining us. (laughing)
It's been an absolute pleasure.
Thank you.
>> Cool, thank you.
(audience cheering and clapping)
(laughing)
>> I thought we were going to
chat World Series, you know.
Burritos, important.
I want to thank everybody
here in our audience.
Let's give him a big round of applause.
(audience cheering and clapping).
>> [Nate] Thank you everyone.
>> Perfect way to end the day.
And for a replay of today's program,
just head on over to ibm.com/dsforall.
I'm Katie Linendoll.
And this has been Data Science for All:
It's a Whole New Game.
(upbeat dance music)
Test one, two.
One, two, three.
Hi guys, I just want
to quickly let you know
as you're exiting.
A few heads up.
Downstairs right now there's going to be
a meet and greet with Nate.
And we're going to be doing
that with clients and customers
who are interested.
So I would recommend
before the game starts,
and you lose Nate,
head on downstairs.
And also the gallery is
open until eight p.m.
with demos and activations.
And tomorrow, make sure to come back too.
Because we have exciting stuff.
I'll be joining you as your host.
And we're kicking off at nine a.m.
So bye everybody,
thank you so much.
>> [Announcer] Ladies and gentlemen,
thank you for attending
this evening's webcast.
If you are not attending our cloud
and cognitive summit tomorrow,
we ask that you recycle your name badge
at the registration desk.
Thank you.
Also, please note there are two exits
on the back of the room on
either side of the room.
Have a good evening.
(upbeat techno music)
Ladies and gentlemen,
the meet and greet will be on stage.
Thank you.
(upbeat techno music)
