I'm a 
data scientist by training.
I used to run data science monetization at
Foursquare.
Before that I was the first data scientist
in residence at Andreessen Horowitz.
The company I now run is The Data Incubator,
which I founded.
It's a Cornell Tech backed data science education
company.
We have offices in New York, D.C., and S.F.
We do data science training for corporates
who are looking to have a customized, in-house
data science training offering.
We also help companies with their data science
hiring.
My role at AIG, Michael, was about fundamentally
helping reshape the role of human and machine
intelligence in decision-making across sales,
underwriting, and claims.
In that capacity, I had the privilege of building
what is among the first few C-suite data science
functions at a mature, large firm spanning
the globe.
Here at BCG, at the Boston Consulting Group,
I'm working in particular with the insurance
practice, but generally across industries
with the specific intent of building some
IP in the space of analytics as a service.
If you think about the traditional construct
of many of the high-end strategy consulting
firms, they tend to be much more oriented
toward defined engagements that are time bound
and more people intensive.
My aspiration in joining BCG is to help them
develop some intellectual property through
analytics as a service.
I think we need to start this conversation
with some background about the insurance industry
to give us context around how data science
is used.
Murli, tell us about the insurance industry:
what do we need to know in the context of
data science?
Certainly.
The core challenge for the insurance sector
is similar to some of financial services.
In insurance you're trying to predict your
cost of goods sold at the point of sale.
Getting that right is absolutely critical
in your ability to achieve margins down the
road.
Anything and everything that you can do to
understand that at its core will give you
a significant competitive advantage.
Now if you zoom out from that problem statement,
in general there are many similarities between
insurance and other industries around the role
of data science and machine learning in augmenting
human intelligence and making better decisions--more
structured, granular, sophisticated, consistent
decisions--in sales and marketing, as well
as in pricing, underwriting, and in claims,
which is a significant part of the fulfillment
of the promise that insurance carriers make
to their customers.
What we call data science today is really
part of a long history of the application
of mathematics and computing to industry.
When I joined the industry, I started out
in finance on Wall Street; back then
we used to call these jobs quant roles.
You would figure out how to trade in capital
markets, make predictions about which way
the stock price would move.
I think what we've seen is that the tools
and the technologies that we used there were
then adopted in Silicon Valley, really
turbocharged and, frankly, made much
more usable.
Then the cost of computing made it so that
you could apply this not just to a few select
problems on Wall Street, but all over main
street, all over the rest of the financial
services industry.
Really, if we zoom out, as Michael was just
describing, can you talk about some of the
similarities between data science in the insurance
industry and other non-insurance data science
applications as well, since it seems there
are a lot of commonalities there?
Most certainly.
The first big dissimilarity, so to speak,
when comparing insurance to other sectors
is that the role of the actuarial profession
dates back to the early days when insurance
was actually created as a sector.
The role of analytics in insurance has largely
been driven by the actuarial function, which
brings a certain set of nuanced competencies
and capabilities that are relevant to insurance.
The challenge has been that if you were to
think about the broader role that data science
could play in particular in the world that
we live in today in insurance, you can actually
fundamentally reshape human judgment when
it comes to sales, when it comes to underwriting
judgment, and even when it comes to claims
through the lens of data and technology in
ways that might not have been feasible 10,
12, 15 years ago.
The similarity lies in the fact that, much
like many other sectors, in insurance you've
got a sales or distribution channel.
You've got a product channel that is around
pricing the product.
Some of that is around your cost of goods
sold, and some of that is trying to understand
the market's appetite and the customers' demands,
so to speak, or demand elasticity, if you
will.
Last, but not least, you've got the fulfillment
of that promise that you've made that is very,
very data rich, so if you break down that
value chain to its core elements, there are
similarities to other sectors.
Now the difference could be that if you think
about healthcare, for instance, healthcare
is much more of a transaction-data-rich industry
perhaps compared to insurance because you're
engaging with the customers on a very consistent
basis, just as you are in financial services,
in banking, and credit cards and such.
The difference, perhaps, between insurance and
these other sectors is, while certainly getting
your cost of goods sold right early on is
absolutely critical, you're not necessarily
as data rich, as transaction data rich, as
some other sectors are.
Right, but you see this with retail.
You see this through the smart phone, and
we were doing a lot of that when I was at
Foursquare trying to make that retail brick
and mortar experience a bit more digital through
your smart phone.
You see this all over the place.
I think a major driver of a lot of the consumer
electronics you're going to see coming up is
the need for companies to have data; that need
is going to drive a lot of those interactions
onto smart phones, tablets, [and] wearables.
To build on what you just said, Michael, if
you were to contextualize that to insurance,
where I see the big leap in innovation happening
in the next two to three years is around this
notion of making much more granular, real
time decisions on the basis of machine learning
and by really defining data not just in the
traditional internal structured terms, but
thinking of it in four quadrants: internal
and external on one dimension, and structured
and unstructured on the other dimension.
The ability to build machine learning algorithms
on some of these platforms will reshape what
humans do in terms of decision-making and
judgment and where models harmonize or balance
human judgment with machine intelligence.
The way I would frame it is oftentimes people
think of it as an either/or.
But if you were to reframe machine intelligence
as nothing but the collective experience of
the institution manifested through some data,
what it does is brings more consistency and
granularity to decision-making.
That's not to say that it would obviate the
role of human judgment completely, but it
is to say that that balance, that harmony
should and will look dramatically different
two years, three years from now than it has
for the last decade and before that.
The next big step-change that I see for this
sector as a whole is evolving from a predictor
of risk to an actual risk partner that can
actually mitigate outcomes through the power
of real time insights.
The most obvious example of that is the role
that sensors can play in providing real-time
feedback to drivers of vehicles in a way that
hopefully reduces risky driving and mitigates
the likelihood of accidents.
To me that is the true power of data science
in insurance.
The beauty of that is not only does it mitigate
accidents from happening, or adverse events
from happening, but what it does in doing
so is reduces the cost of insurance and expands
the reach of insurance to a much broader population,
both in the developed and developing world.
To me, that's a beautiful thing if you think
about society having a much higher level of
financial protection across every aspect of
our lives.
If we think about what's new in data science,
that is, why is data science different from
or how does data science expand upon things
like the actuarial tradition, like statisticians,
the quants of [indiscernible, 00:09:27], I
think it really does kind of come down to
this idea that, one, we're using not just
structured data, so it's not just SQL queries
any more, but it's semi-structured and unstructured
data.
How do you start handling things when they
don't come in nice tables that you can load
into Excel or that you can put into SQL?
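As a toy illustration of that shift (the records and field names here are entirely made up, not from any real dataset), semi-structured JSON with inconsistent fields can be projected onto a fixed, table-like shape before analysis:

```python
import json

# Semi-structured records: fields vary from record to record,
# unlike the fixed columns of a SQL table or a spreadsheet.
raw_records = [
    '{"policy_id": 1, "claim": 500.0, "notes": "minor damage"}',
    '{"policy_id": 2, "claim": 1200.0}',
    '{"policy_id": 3, "notes": "pending review"}',
]

def flatten(records, columns):
    """Project variable-shape JSON records onto a fixed set of columns,
    filling missing fields with None so the result is table-like."""
    rows = []
    for line in records:
        rec = json.loads(line)
        rows.append({col: rec.get(col) for col in columns})
    return rows

table = flatten(raw_records, ["policy_id", "claim", "notes"])
print(table[1])  # {'policy_id': 2, 'claim': 1200.0, 'notes': None}
```

Once the data is in this shape, the usual tabular tooling applies again; the work of data science often starts one step earlier, deciding how to impose that shape.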
We are also in a world where data is much
larger.
You mentioned telematics.
If you were taking a reading off of every
car every second, that's a lot of numbers
you've got to store, and that's a very different
paradigm for computation.
You start having to think about, how do you
store this data?
How do you deal with data now that it's stored
across multiple computers?
How do you think about computation in that
context?
Then of course the last thing is always this
idea around real time data.
I think that analytics has historically been--you
might call it--kind of a batch process.
Run it once; generate a report; show it to
people; you're done.
Now it's a continuous process.
You run it; you have to instantly find the
latest trends; put that into production so
that you can adapt to that in an intelligent
way; and then do that again the next hour,
the next minute.
That's kind of where competition is driving
you.
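A minimal sketch of that batch-versus-continuous distinction, using made-up telematics readings and a running mean as the "model" being kept up to date:

```python
# Streaming statistic: updated incrementally per event, so the latest
# value is always available without re-reading the whole history.
class RunningMean:
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean

history = []
def batch_mean(x):
    """Batch approach: append, then recompute over the entire history."""
    history.append(x)
    return sum(history) / len(history)

rm = RunningMean()
for reading in [10.0, 20.0, 30.0]:  # e.g. per-second speed readings
    # Both agree on the answer; the streaming version never re-scans history.
    assert abs(rm.update(reading) - batch_mean(reading)) < 1e-9
print(rm.mean)  # 20.0
```

The batch version re-reads everything on every event, which stops being feasible exactly when the data gets large and the answers are needed continuously.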
If you look at what Silicon Valley has been
doing, it is very much your server is constantly
learning from user behavior and then able
to adjust how it interacts with users in a
way that--to borrow their expression--delights
the user.
I think that we're seeing that.
Traditional companies, that is non-tech-based
companies, are having to emulate that
level of customer service and satisfaction.
I think a lot of that comes down to big data
and being able to have a team that's capable
of understanding how to manipulate this new
type of data faster, more data, different
kinds of data in a world that's rapidly evolving.
That's right, Michael.
If you think about the historic definition
of transactional data in healthcare and banking,
we know that that's been at the core of how
they think about analytics for quite a while
now.
Traditionally, most of insurance has not had
that version.
But if you were to zoom out and define data
in a much broader sense that includes images,
that includes audio, that includes all sorts
of unstructured data, layered on top with IoT
and such, then insurance has its own version
of transactional data.
The ability to harness that and dramatically
change the cycle time of decision-making,
as well as the granularity of decision-making,
is where the goldmine is for insurance in
the coming five years or so.
I was going to say, no, that's right.
Just to give you one example, we work with
a large consumer bank, both on their training
side and to help them hire their data science
talent.
One of the really cool applications they've
been able to develop is around merging data
that they get from multiple channels, so from
the Web, from their mobile, from tablets,
from even in-store visits and phone calls
to customer service.
Right?
[They can] bring all that data together so
that their customer service representatives
can see that in one clear, simple visualization.
When a customer calls in, the reps
instantly get this information, and they know
that the customer has been having trouble opening
a checking account, [for example].
[Then] they can directly target the question
that the customer would like, and then solve
that problem for them.
That's even to the point where, if the customer
has been browsing around trying to get the
answer to a question, the answer might actually
just be populated from their knowledge base
straight onto their screen so that they don't
have to have this awkward process of asking
the customer the question and then slowly
searching for it themselves; instead, the customer
service rep is able just to see that and answer
the question right away.
That creates a much more pleasant customer
experience.
It certainly makes the customer service reps
at the bank seem far more knowledgeable.
I think that that's just one example of how
you can have so much more data in a situation
where none of what I just talked about
was the traditional data of transactions and
moving money in and out of your bank account.
This is all a new type of data and from new
sources that we're talking about.
Murli, we have an interesting comment from
Arsalan Khan on Twitter who is asking about
the question of bias because, of course, sensors
have no bias; but when creating models and
selecting data, people do.
How do we deal with that issue?
All models are wrong; some are useful;
and so the question to me is not about whether
the model is perfection personified, but rather,
how much of an improvement is it relative
to the status quo?
What I've seen very consistently in the sectors
that I've been exposed to in my career is
that if you were to use that litmus test,
models are vastly superior to the judgment-oriented
decision-making that occurs certainly in a
good chunk of the insurance value chain today,
but also in other sectors too.
What we've got to teach ourselves is to not
be naïve to the data gods and assume that
the models are perfection personified, to
understand where they're prone to bias or
error, but also to realize that if we were
to hold human judgment to the same standards
of bias and objectivity that we'd like to
hold models to, it would not be a competition
at any level whatsoever.
Mm-hmm.
The question, to me, isn't whether the model
is biased or not because models do have an
inherent bias, shaped as they are
by historical experiences, but to ask the
question where and when and to what extent
are they biased.
The even more important question is, how much
of a step change are they from the caliber
of decision-making that is the current status
quo in that particular part of the value chain?
We often end up wanting to criticize models
for their imperfections in a way that we may
not apply the same scrutiny to, say, human
judgments.
There was a famous article that just came out
in the last few months about how neural nets
that were being trained on large corpuses
of human text were inherently sexist.
They would say things; they would have gender
associations with, for example, occupations,
or perhaps certain derogatory phrases that
we might really cringe at.
I think that that comes down to the models
are being a mirror, right?
They are holding up to society the data that
we're feeding into them.
They're showing it back to us.
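What that "mirror" looks like can be sketched with hand-picked two-dimensional vectors; a real embedding model learns hundreds of dimensions from a large corpus, so everything here, the words, the numbers, the skew, is an illustrative assumption:

```python
import math

# Hand-picked 2-d "embeddings" chosen to illustrate the effect; a real
# model would learn these from text, absorbing whatever skew the text has.
vec = {
    "man":      (1.0, 0.2),
    "woman":    (0.2, 1.0),
    "engineer": (0.9, 0.3),  # deliberately skewed toward "man"
    "nurse":    (0.3, 0.9),  # deliberately skewed toward "woman"
}

def cosine(a, b):
    """Cosine similarity: a standard association score for embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# The model mirrors the associations baked into its (here, fake) training data:
assert cosine(vec["engineer"], vec["man"]) > cosine(vec["engineer"], vec["woman"])
assert cosine(vec["nurse"], vec["woman"]) > cosine(vec["nurse"], vec["man"])
```

The asymmetry was put into the vectors by hand here; in a trained model it arrives the same way, through the data, which is exactly the point about models reflecting society back at us.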
That's not just an idle, philosophical point.
I think it's a very real point that, for example,
in an industry that's highly regulated like
finance, there are lots of laws around equal
opportunity for lending.
You can't make judgments that disproportionately
negatively impact people of a certain race,
sex, or various other protected categories.
I think that that's one of those places where
we have to be careful as we train these models
that they haven't then picked up some of the
biases that may be inherent in society that
we don't want to keep.
I think that that's part of the reason why
I don't really buy this whole scare mongering
that these computers will take over all of
our jobs because, in the end, there is this
human judgment that comes in where we say,
"Well, okay.
That model probably did learn something about
our society, and we would rather that not
exist, and so we're going to tweak that a
little bit and find ways to mitigate those
effects."
Murli, this issue of bias, when you're inside
an organization like an insurance company,
what are the practical implications of that
and how do you ensure that the model is as
fair as possible and that it doesn't embody
preconceived bias?
Even recognizing what you were just saying
that the question is comparing to what the
human would do, it's still something that
I'm sure you have to be concerned about.
Most certainly.
I think the imperative is to go into this
process and effort with eyes wide open.
I'll give you a classic example of what you
just described.
Oftentimes historically, whether it's in insurance
or even in financial services, healthcare,
credit cards, and such, the ability to detect,
sniff out fraudulent activity has typically
depended on human judgment.
That's not to say that human judgment isn't
valuable.
It's extremely valuable.
However, when you then try to augment that
human intelligence with machine intelligence,
effectively what you are doing is actually
propagating a little bit of the historic bias
that the human judgment has had because you're
using historic data to be able to predict
future fraud based on human judgment in the
past.
The way to break that cycle is two-fold.
Frankly, when you do use algorithms, it'll
tease out noise in human judgment.
The beauty of noise in human judgment is that
it truly is noise, and it's very inconsistent.
Models have that ability to overcome that
inconsistency in judgment in the past, which
is actually a very good thing.
The second thing that I would do in that particular
instance, and there are many other analogies
to this, is also be very purposeful in creating
a particular random sample that stretches
the range of the predictions of some of these
models.
That allows you to go to the periphery and
assess how well the models are or are not
working based on their predictions so that
you create a virtuous cycle where you're actually
challenging the assumptions of your models.
You've got some human beings at the other
end of that spectrum coming up with their
own judgment and throwing out the gauntlet
to create a feedback loop that will allow
the models to get better because they're invariably
going to miss insights that human beings might
have.
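One way to sketch that purposeful sample that stretches the range of the predictions (with entirely hypothetical scores, nothing from a real carrier) is to bin cases by predicted score and draw from every bin, so human reviewers see the periphery as well as the middle:

```python
import random

random.seed(0)
# Hypothetical model scores for 1,000 cases, deliberately skewed
# toward low values, as fraud scores typically are.
predictions = [random.random() ** 2 for _ in range(1000)]

def stretch_sample(preds, bins=5, per_bin=3):
    """Sample across the *range* of predicted scores rather than uniformly
    at random, so low-, mid-, and high-score cases all get reviewed."""
    by_bin = {b: [] for b in range(bins)}
    for i, p in enumerate(preds):
        by_bin[min(int(p * bins), bins - 1)].append(i)
    sample = []
    for b in range(bins):
        sample += random.sample(by_bin[b], min(per_bin, len(by_bin[b])))
    return sample

picked = stretch_sample(predictions)
print(len(picked))  # 15: three cases drawn from each of five score bands
```

A purely random sample would be dominated by the low-score mass; drawing per bin is what lets humans challenge the model where its predictions are most extreme.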
How you choose your sample, or your sample
size, is really, really important for this
kind of data work.
Just to cite an example from my old field
of being a quant, I think we all know about
the 2008 financial crisis and how, if you
train a bunch of models on the bull years,
those models may not apply very well in a
situation that's a bear year.
You have to be very careful to realize that
just because the last 5, 10, 15 years of macroeconomic
data have been bullish doesn't mean that next
year's will be.
You have to really think about: How do I stress
test my model?
How do I give it examples that may be things
that I expect will happen even if I haven't
quite seen them in my data or in the data
I've been training on?
How do you select that data set correctly
so that you do find representative elements
so that your data set isn't horribly biased
and, therefore, giving you a badly biased
model?
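The bull-years failure mode can be shown with a toy trend model; the years and values below are invented purely for illustration:

```python
# Toy illustration: a trend model fit only to "bull" years extrapolates
# badly into a "bear" year it has never seen. All numbers are made up.
bull_years = [(0, 100.0), (1, 110.0), (2, 120.0), (3, 130.0)]  # steady growth
bear_year = (4, 95.0)                                          # sudden drop

def fit_line(points):
    """Ordinary least-squares fit of y = a + b * x."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

a, b = fit_line(bull_years)
prediction = a + b * bear_year[0]
error = abs(prediction - bear_year[1])
print(prediction, error)  # 140.0 45.0 -- the model keeps extrapolating the boom
```

The fit is perfect on the training years and badly wrong on the first year that breaks the pattern, which is why stress testing against scenarios outside the training data matters.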
That's right, Michael.
I think the secret that you and I know quite
well that perhaps is not more widely understood
is that there's quite a bit of art in the
science of data science.
Absolutely.
That art is absolutely critical because the
biggest risk is one of the blind leading the
blind and the data scientists not really appreciating
the context of the historic data and not having
a basis in which they could test the efficacy
of the predictions in a different environment
than the one that the model was actually built
on.
A classic example of that, in addition to
the financial crisis in 2008, is the O-ring
debacle with the Challenger space shuttle,
where they didn't actually test the effectiveness
of the O-rings in a different temperature
setting, and they ended up extrapolating.
Yes.
I think that's really where the magic of human
judgment and machine intelligence actually
comes in.
As important as the science of the data science
is, the art of the data science is perhaps
equally and sometimes even more critical depending
on the consequences of your errors in prediction,
whether they're false positives or false negatives.
That's really where understanding the context;
making sure that you're asking the right questions
and framing them appropriately; and understanding
what data you have, what you don't have, and
how that could bias your understanding of
the future is absolutely critical.
Data science is very technical, lots of math,
lots of programming.
It's really far down the rabbit hole of technical
stuff that you could be doing.
But as a manager, it's still very important
to understand it.
And so, when we are talking to managers and
we're training managers on how to do this,
one of the things we have to really focus
on is, "Well, you may not be able to understand
the probabilities and the nuances perfectly,
but you can understand what happens if you
have a false positive or false negative; which
way you're more willing to make a mistake,"
right?
Then use that to set your thresholds and your
comfort level about, "Okay, I'd rather have
more of one type or the other."
That even goes down to training, right?
You can train models that are focused more
on one type of error than another.
The ultimate call about which way you want
to go, that's a business decision, and that's
why it's such an important lesson for businesspeople
to know about this distinction between false
positives and false negatives.
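That threshold call can be made concrete in a few lines; the scores and labels below are a made-up toy, not a real fraud model, but moving the cutoff trades one error type for the other in exactly the way described:

```python
# Toy fraud scores with ground-truth labels (1 = fraud, 0 = legitimate);
# both the scores and the labels are invented for illustration.
scored = [(0.95, 1), (0.80, 1), (0.65, 0), (0.40, 1), (0.30, 0), (0.10, 0)]

def confusion(threshold):
    """Count false positives and false negatives at a given cutoff."""
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)
    fn = sum(1 for s, y in scored if s < threshold and y == 1)
    return fp, fn

# A low threshold flags more cases: no missed fraud, more false alarms.
print(confusion(0.25))  # (2, 0)
# A high threshold flags fewer: no false alarms, but one fraud slips through.
print(confusion(0.70))  # (0, 1)
```

Which pair of numbers is acceptable is not a modeling question; it depends on the relative cost of a false alarm versus a missed fraud, which is the business decision the passage describes.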
That's right, Michael, which is why I get
very excited about the notion of really pairing
data scientists with economists or business
analysts who can really shape the context
of how those models are built because you've
got issues around stability; you've got issues
around the standard deviation of some of your
predictions and noise in your predictions
perhaps being higher in certain segments;
you've got issues around tradeoffs that you
make between false positives and false negatives.
Depending on the context in which you operate,
you value that dramatically differently.
If you do value that dramatically differently,
that difference should actually be reflected
in the art of the choices that you make in
practicing data science.
The art is understanding those priors, those
assumptions that I'm making: how do they
impact the data science; how do the results
that come out of these models then impact
my business decision-making, my P&L?
I think that that kind of high level understanding
is so important on the business side.
We're talking about the business side.
What about the organizational issues?
How do you introduce this type of thinking
into an organization for which data science
is relatively new?
I'll start with my perspective, Michael, and
I'd be delighted for you to jump in as well,
please.
I don't think there are any obvious, easy answers.
The challenge that I see today in most mature
firms is you've got C-suite leaders that say,
"I'm doing data science," or, "I'm doing data,"
or, "I'm doing digital."
"There's my chief data officer.
There's my chief digital officer.
There's my chief analytics officer."
The way I would reframe that is you help them
fundamentally recognize that this is not just
a separate pillar that you should be thinking
of as being incremental to how you will shape
your business strategy.
These competencies are, in the very near future
or, in fact, even in the here and now, in
effect the mitochondria that will shape
the energy and the life that your firm will
have in terms of its sustainability in a world
of data- and tech-driven disruption.
The challenge then is that typically in many
of these large institutions, you've got leaders
who have risen to those senior positions on
the basis of historic experiences, which are
less relevant if you extrapolate them to the
future.
And so it really does become an issue around
having the humility to develop much more of
a learning mindset; and recognizing that the
more ambitious you are in terms of really
re-sculpting and reshaping your competitive
positioning, the more you have to be willing
to break glass based on the insights that
you achieved through data science.
It is very much of a leadership, courage of
conviction, issue.
It's very much about CXOs having a view on
what legacy do they want to build in that
organization, what is their courage of conviction,
and how do they shape the problems and questions
that they would like to tackle through the
competencies in a way that will fundamentally
shape their sustenance in the medium and longer
term?
You need a broad swath of the organization
to understand the value of data and how you
use data--think about some of the issues that
Murli and I were just talking about earlier--and
to really embrace taking the time to have
employees learn about data science and big data.
On the cultural side, actually, I'd be curious,
Murli, to ask you this question.
I think one of the things that's maybe unique
about insurance or banking is that there is
kind of a legacy of data around the actuarials,
around the statisticians.
How does that change the dynamic of creating
a data culture when you have a legacy group
that's somewhat already steeped in this?
I think there are two parts to that, Michael.
One is, how does that change decision-making
today, and how should that change decision-making
tomorrow?
If one were to zoom out, in general I think
the actuarial function, the profession, and
the exams have not embraced, from my point
of view, the power of data science in its
totality the way perhaps they should.
Maybe they will, looking into the coming few
years.
As you move to that, or if we move to that,
I think it obviates the need for having rigid
titles such as an actuary or a data scientist.
It infuses, into the fundamental DNA of the
organization, a sense of curiosity and a comfort
with challenging one's own assumptions, and
the ability to consistently ask the question,
"What do the data tell us, and where could
the data possibly mislead us?" even if the
models seem perfection personified.
That's one piece of it, I think.
The other piece of it is, if you disaggregate
the entire value chain of insurance, there's
data science that can be applied to many,
many, many aspects of it that can fundamentally
shape the sophistication, timeliness, [and]
granularity of decision-making in ways that
the industry could not have imagined a decade
ago.
To me, the role of data science is very, very
widespread, even if one were to dodge the
traditional domain of the actuarial sciences.
Where I'm hoping the industry is going to
head toward is, rather than have this mindset
of creating rigid silos or pillars, see that
the competencies are interchangeable and they're
one and the same.
Let's actually move to a world where we're
challenging; we understand our assumptions
and are challenging those assumptions to shape
the caliber, effectiveness, and efficiency
of decision-making as opposed to hanging our
hats on what titles we've got, what professional
credentials we've got, or what academic experiences
we have because those are an interesting starting
point, but are really not particularly relevant
in a world where everything around us is changing
at a more profound pace than ever before.
With the actuarials, I think that a lot of
the really farsighted ones, the ones who are
really looking to the future, seem to really
understand this and are embracing a lot of
these new techniques around data science,
around big data, really looking to challenge
the assumptions that maybe their own discipline
has engrained into them through indoctrination.
[They're] really leveraging the existing knowledge
that they have, this really strong knowledge
of probability and statistics, and then seeing
how they can apply that to the data science,
which of course is very rich in probability
and stats.
Much of this falls into the general category
of helping change an organization.
Are there aspects of this that are specific
to the data science as opposed to general
change issues?
The way I would frame that is, Michael, data
science is the engine that is fundamentally
reshaping practically every industry that
we know about.
The pace at which the aggregation of data
is changing and the definition of data itself
is so rapid today that it necessitates this
discussion about the pace at which firms need
to fundamentally question their paradigms
on how they've made decisions historically.
Yes, you're right.
It's a broader sort of change management issue.
If the question is, "Why is this specific
to data science and why isn't it broader?"
the answer is it is broader, but the force
of change that data and technology are imposing
upon our society across sectors make this
issue much, much more critical in the here
and now than perhaps other forces of change
might.
That pace of change is only going to accelerate.
If you think about a lot of the secular trends
that are pushing data into the forefront of
this conversation, those things are not going
away: the falling costs of storage, the falling
costs of being able to transmit data, the
increasing speed of CPUs.
Then on the social side, there's the greater
demand by consumers to have instant responses,
to be on their phones interacting with their
friends as well as companies in digital ways;
I think that trend is not going away.
It's only accelerating.
That's going to be forcing companies to move
more and more in the direction of data.
As we finish up, I'll ask each one of you.
Maybe I'll start with Murli.
What is your advice for organizations that
want to adopt data science in a larger way?
What should they do?
Number one is, develop a sense of humility
about yourselves and evolve from what Carol
Dweck would describe as a fixed mindset to
a growth mindset; i.e. please recognize that
the future is not an extrapolation of the
past, certainly not at least a linear extrapolation
of the past.
The fundamental foundations on which you've
made decisions and built your businesses are
shifting today at a faster pace than ever
before.
That requires you to develop, as an organization,
as individual professionals, that mental agility
to question your assumptions and to challenge
your traditional paradigms in which you run
your functions or businesses.
The first critical aspect of that is to develop
that curiosity and ask questions around the
art of the possible by drawing from learnings
that you have around innovation across sectors
and across fields.
It kind of comes down to two basic first steps.
The first step: get the data, collect it,
[and] store it, what have you.
Second step is to find the talent that's necessary
to deal with the data, manipulate the data,
and be able to come up with actionable insights
from that data.
If you can do both of those things, then I
think you will be at least taking the first
few steps in the direction of building a data
driven culture.
Okay.
Well, thank you so much.
This has been a very interesting show.
We've been talking about the ins and outs
of data science.
I want to thank our guests, Michael Li, who
is the CEO of The Data Incubator; and Murli
Buluswar, who is the former chief science
officer at AIG and currently is working with
Boston Consulting Group.
Thanks so much, everybody.
You've been watching Episode #259 of CxOTalk.
We will see you next time.
Bye-bye.
