MODERATOR: Welcome to the
SAS Global Forum Technology
Connection.
Please welcome the Chief
Operating Officer and Chief
Technology Officer of
SAS, Oliver Schabenberger.
OLIVER SCHABENBERGER:
Good morning.
Good morning, and welcome
to the Technology Connection
at SAS Global Forum.
I'm Oliver Schabenberger,
COO and CTO of SAS
and your emcee for the morning.
And I want to reveal one
big secret right up front--
I do own a pair of jeans.
The theme of the conference
is "Analytics In Action."
And during the
next 90 minutes, we
want to show you exactly that--
how we solve important problems
using data and analytics
using SAS technology.
I speak to organizations and
customers around the world,
and many conversations
have a common thread.
My industry is going
through a transformation,
digital transformation.
Physical assets, books,
cars, computers, stores
are turning into bits and bytes.
The world is drowning
in data, and we're not
taking advantage of it.
We know we have to
do something, and we
know we have to get it right.
But we can't find the
talent to implement
a data-driven business.
We do not know where
and how to get started.
And now there is extreme hype
around artificial intelligence
and machine learning, the
secret weapon in this fight.
But my organization is not
wielding that sword yet.
Are we falling further behind?
We do not want to
add to the hype.
We do not want to
add to the confusion.
Over the next 90
minutes, we want
to make analytics,
machine learning,
and artificial intelligence
real, bring it to life.
Yes, AI is overhyped.
But it's also real and powerful.
Many technologies
and methodologies
are today swept under the
AI umbrella, and that's OK.
Someone quipped that,
quote, "You only call
it AI until it becomes useful.
Then you find
another name for it."
Our current form of narrow
artificial intelligence
is data driven.
And that distinguishes
this era of AI
from the approaches in the past.
We tried to create
machine automation
through handcrafted knowledge
systems, expert systems where
software developers poured
our expertise into machine
instructions.
And that works well
for systems that
are defined by clear rules,
when the task is
to capture logic,
not to interact with a
complex and dynamic world.
The incredible
improvements we experienced
in computer vision and
natural language understanding
in just the last decade are
based on a different approach.
We worked for years
on handcrafted models
for object detection, facial
recognition, natural language
translation, and so on.
And despite honing
those algorithms
by the best of our
species, that performance
does not come close to what
we can accomplish today
with data-driven
approaches, approaches that
let algorithms discover
patterns from data
rather than coding logic.
The powerful message
here is not that machines
are taking over the world.
It is that we are learning
that we can generate
tremendous value by unlocking
the information, patterns,
and the behaviors that
are captured in data,
that we are understanding that
this is a new era of machine
automation governed
by algorithms that
are derived from
data or that have
shaped themselves iteratively.
And we are learning how to
use this power at scale, how
to apply it across
an enterprise.
During the next 90 minutes, you
will see analytics in action.
At the center of the
Technology Connection
this morning and
throughout the conference
are analytics, data, the
SAS user, and not hype.
I've been part of many
changes and transformations
in the analytics market
and in our software.
Our innovation is
customer driven,
innovate to meet your needs, and
create tools and solutions that
help you innovate.
And for this to work, we have
to communicate, work together,
talk to each other,
learn from each other,
exchange openly what
works and what does not.
Your feedback is
all important to us.
Here to recognize
one of our SAS users
with the Annual
User Feedback Award
is Annette Harris,
Senior Vice President
for Technical Support at SAS.
Annette?
[APPLAUSE]
ANNETTE HARRIS: Hello.
Thank you for being with us
at SAS Global Forum 2019.
The theme for this
year's conference
is "Analytics In Action,"
and our winner today
is a perfect example
of someone who
is visionary in examining ways
that artificial intelligence
and machine learning can be
used for demand forecasting
and planning capabilities.
He provided input that resulted
in the creation of a new demand
capacity, Assisted
Demand Planning, that
uses machine learning to
boost forecast value added.
He has shared ways that his
company is using SAS Forecast
Server and SAS Demand-Driven
Planning and Optimization.
He has also engaged with
our product management
at SAS to discuss
functional requirements
for the upcoming Demand
Planning Solution on SAS Viya.
He also led the deployment of
SAS Solution to 48 countries
globally.
So on behalf of SAS, I am proud
to present the 2019 SAS User
Feedback Award to Dr.
Davis Wu of Nestle.
[APPLAUSE]
DAVIS WU: Thank you very much.
Thank you, Annette.
I'm really glad my
contribution adds value
to SAS product developments.
In Nestle, SAS has
become an important tool
for everyday process in
demand planning globally.
And some of the
users and supporters
are with me this morning.
In fact, the success attributes
to the great teamwork
in Nestle.
Today I would like to thank
especially my sponsor Oliver
Gleron, who is
also with us today
and will be in a panel
discussion tomorrow.
Also thanks to the support from
Nestle IT, Francois, Raghav,
and from SAS, Jonathan Riches
and many of your colleagues.
Thank you very much.
Thank you.
MODERATOR: Congratulations
to the 2019 SAS User Feedback
Award winner, Dr. Davis Wu.
OLIVER SCHABENBERGER:
Thank you, Annette.
Thank you, Davis.
Earlier I mentioned
customer driven innovation.
We want to innovate
to meet your needs,
and we want to empower you to
innovate with our products.
Organizations are learning
about the power of analytics,
and we are learning about
their needs for applications.
Working together, we
can generate value
for the organization, for its
constituents, and customers.
It is especially rewarding
when that collaboration
has a positive effect on lives,
affects them, maybe improves
them, maybe even saves them.
Analytics is an
opportunity and a necessity
in the transformation
of health care.
About two years ago, we
partnered with the Amsterdam
University Medical Center
to use computer vision
and predictive analytics
to improve care
for cancer patients.
Ladies and gentlemen,
please join me
in welcoming Dr. Geert
Kazemier, Professor of Surgery
and Director of Surgical
Oncology at the Amsterdam
University Medical Center.
Good morning, Geert.
How are you?
GEERT KAZEMIER: Good.
Good to be here.
OLIVER SCHABENBERGER:
Nice to see you.
Geert, thank you so
much for being with us.
Thank you for the
partnership and for being
here at Global Forum and
sharing with the audience
about the important work
you're doing at Amsterdam UMC.
Tell us about the medical
problem we're trying to solve
and the kind of patients
we are trying to help.
GEERT KAZEMIER: Oliver, in the
project that we do together,
the patients that we're aiming
to help have what we call
colorectal liver metastases.
So those patients have
large bowel cancer,
and the cancer has
spread to the liver.
Colorectal cancer is about
the third most common type
of cancer in the Western
world, and those metastases
occur in about half
of the patients.
So half of the
patients, the tumor
does not stay in the large
bowel but travels to the liver.
OLIVER SCHABENBERGER:
Well, you have
to let that sink
in for a moment.
One of the most common
cancers worldwide,
and half of the patients
experience liver metastases.
What sort of treatments are
prescribed for these patients?
GEERT KAZEMIER: We have
several, but the best available
treatment for these patients
is surgical removal, resection
of the tumor.
That's my daily work.
Unfortunately, it may not
be safe to do this resection
initially because the
tumor is too large,
or you have too many tumors.
And those patients
can become resectable
if we give them
chemotherapy upfront.
So we give them chemo
first and then operate.
And those are the
patients that we
are focusing on in the project.
OLIVER SCHABENBERGER:
So we'd like
to focus on patients who
might undergo therapy
to shrink tumors in order
to make them candidates
for resection.
Today, how do your
physicians assess
whether a patient might be
responding to chemotherapy
and is on the path to
becoming resectable?
GEERT KAZEMIER: Now,
our radiologists do it.
They use what we call
the RECIST criteria.
RECIST is actually
an acronym that
stands for Response Evaluation
Criteria In Solid Tumors.
And to evaluate those
RECIST criteria,
a radiologist selects two
lesions in a patient's image,
as shown on the screen.
And for each lesion,
the radiologist manually
measures its largest
diameter in the slices
before and after the chemo.
If the sum of the diameters
decreases by at least 30%
after treatment,
that's a good thing.
The tumor shrinks.
The patient is classified as
responding to the therapy.
But if the sum of the diameters
increases by 20% or more,
the patient is progressing.
The cancer is progressing,
which is a bad thing.
If it stays about the same,
the patient is called stable.
And this classification
of a patient
determines how we proceed
with the treatment.
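The rule just described can be written down in a few lines. The following is a minimal Python sketch that encodes only the thresholds quoted on stage (a decrease of at least 30% in the summed diameters, or an increase of 20% or more); it is not a full implementation of the RECIST guideline, and the function name `classify_response` is a hypothetical one chosen for illustration.

```python
def classify_response(baseline_sum_mm: float, followup_sum_mm: float) -> str:
    """Classify response from the summed lesion diameters (mm) before and after therapy.

    Encodes only the thresholds quoted in the talk; an assumption-laden sketch,
    not a complete RECIST implementation.
    """
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum of diameters must be positive")
    # Relative change of the summed diameters; negative values mean shrinkage.
    change = (followup_sum_mm - baseline_sum_mm) / baseline_sum_mm
    if change <= -0.30:
        return "responding"
    if change >= 0.20:
        return "progressing"
    return "stable"


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(classify_response(50.0, 30.0))  # 40% decrease
    print(classify_response(50.0, 62.0))  # 24% increase
    print(classify_response(50.0, 48.0))  # 4% decrease
```

Because the rule is a pair of fixed cut-offs on a manual measurement, a patient just short of a cut-off lands in a different class than one just past it, which is the sensitivity the rest of the demo explores.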
OLIVER SCHABENBERGER: OK.
So I'm putting on
my data science hat
for a minute here to frame that
challenge that you're facing.
The selection of
a treatment path
depends on the
classification of a patient
as responding, stable,
or progressing.
The classification is made
based on a rule-based system.
The decision input is
measurements made manually
from medical images,
actually one image.
So it seems to me
that the radiologists
have to make some subjective
decisions in following
the RECIST guidelines, such
as which lesions to look at
and which image slices.
Also, the RECIST criteria
do not take into account
all the details we could have
available from modern scanners,
like the 3D geometry
of the lesions.
And finally, maybe
we could develop a
predictive model that
predicts patient response better
than just summing the
diameters of two lesions.
Am I on the right
track with this?
GEERT KAZEMIER: Yeah,
you're absolutely right.
I mean, up until now
that was not possible.
It's just the two
lesions because more
is too much work for them.
The manual process
at this moment
takes even more than 20 minutes
per scan for a radiologist
to do.
So we believe that those
medical imaging analytics
that you guys have
at the SAS platform
can provide alternative
criteria that
are indeed more objective,
accurate, and automated.
OLIVER SCHABENBERGER:
To make decisions
based on data that are
objective, optimized,
can be applied to all the data
because they scale, and can be
carried out quickly and
consistently-- that sounds
to me like a win-win situation.
Well, let's see how we tackle
this problem with analytics
and how much
progress we've made.
Please meet Fijoy Vadakkumpadan,
Senior Staff Scientist
in our Computer Vision Team.
[VIDEO PLAYBACK]
- I was a very curious
kid growing up, often
tinkering with various
electronic and mechanical
devices at my parents' house.
When I came across
computers in my late teens,
that opened up an
entirely new world
of things that I
could build and fix,
these new things being
computer programs.
And I haven't stopped
coding ever since.
A personal experience
that I had a few years ago
changed the way I view my work.
In 2015, my wife
and I were pregnant
with identical twin girls.
Towards the end
of the pregnancy,
we were getting a detailed
ultrasound imaging
exam almost twice a week.
Because of all these exams,
we discovered early on
that one of the
girls was not growing
as fast as she should have.
So we decided to move forward
with a planned C-section
instead of waiting for
natural delivery, which would
have been unsafe at that point.
The C-section went
well, and we now
have two healthy and
happy girls at home.
If it weren't for
medical image analytics,
the outcome could have
been very different.
And I'm very deeply
touched by that experience.
The realization that I can--
or my work can help make a
similar impact on someone
else's life is very gratifying.
There's no doubt
that medical imaging
has revolutionized medicine.
But at the same
time, this revolution
has brought new
challenges to the clinic.
A radiologist
typically has to look
at thousands of images per day.
And this is where my team at
SAS has stepped in to help.
We have extended
the SAS platform
to process medical images.
SAS platform now
provides an environment
where users can
build applications
that convert medical
image data to insights
that can drive decision making.
My hope is that
this work can help
improve the lives
of radiologists
and associated health
care professionals.
It may even help
save a life one day.
[END PLAYBACK]
OLIVER SCHABENBERGER:
Good morning, Fijoy.
FIJOY VADAKKUMPADAN:
Good morning, Oliver.
OLIVER SCHABENBERGER: Fijoy,
the team has been working
on extending the SAS platform
for medical image processing,
using it to develop applications
that can help oncological teams
like Geert's.
What type of data have you
received from Amsterdam UMC?
FIJOY VADAKKUMPADAN: Oliver,
we have received 3D CT images
from Geert's team,
and also RECIST
data for a number of patients.
The images are stored in DICOM
format, which as you know
is the most popular
format used in the clinic.
Geert's team has also
provided contours
of liver and lesions drawn
by expert radiologists
on each of these scans.
OLIVER SCHABENBERGER:
Well, can we take a look?
Let's see what it looks like.
FIJOY VADAKKUMPADAN: Absolutely.
What you see on screen is
a Python Jupyter notebook
connected to SAS Viya.
The example that I'm
going to show you
is that of a female
patient who was
73 years old at the time
of her hospital visit.
Maybe some of you
in the audience
have a person like that near
and dear to you in your lives.
On the screen are her
data from multiple sources
loaded, integrated, and
processed all in SAS Viya.
This is a 3D visualization
that I can interact with.
The image slices that you see on
screen are three perpendicular
slices from her CT scan.
Along with the slices,
you also see the surface
of her liver in transparent blue
and the surfaces of her lesions
in orange.
OLIVER SCHABENBERGER:
Geert, Fijoy's application
can capture these
highly detailed
3D geometries of the lesions
and the liver from your data.
What are your thoughts on this
when you see these images?
GEERT KAZEMIER: Yeah,
I'm very excited.
I mean, it's amazing to
see how far you guys came.
Those details-- patient
specific geometry is exactly
the kind of information
ignored by the current RECIST
worldwide, actually.
I can't wait-- I have
to be honest-- to see
the new criteria we can come
up with and use those data.
FIJOY VADAKKUMPADAN:
Sure, Geert.
The first criterion
that we looked at
was the total lesion volume
in each of these scans.
We can compute quantities like
that using a specialized action
in SAS Viya now.
Let me run that action
and show you the results.
What you see on the
x-axis are 10 patient IDs.
And on the y-axis, we
have total lesion volumes.
The blue bar shows the lesion
volume before any therapy.
The orange bar shows the total
lesion volume after therapy.
Now, therapy was continued
for some patients.
And for those
patients, the green bar
shows the total lesion volume
after continued therapy.
It's clear from this plot
that this volume metric captures
the shrinkage of tumor
that occurs in most cases
during therapy.
OLIVER SCHABENBERGER: OK.
That sounds great.
It looks like we're
trending in the expected
direction with the criteria.
I assume that this
total lesion volume
is more accurate than just a
RECIST diameter, because we're
working with a 3D volume.
Do we have any
quantitative evidence
for how this might improve
evaluation of the treatment
response?
FIJOY VADAKKUMPADAN: We do.
What I did was to
take each volume value
and from that calculate the
diameter of a sphere that
has the same volume.
Let's call it the 3D diameter
and look at the results
for an example patient.
On the screen are data from
a 69-year-old male patient.
On the left, you see
his RECIST diameter
going from 32 millimeter
to 24 millimeter.
That is about 25%--
that is exactly 25% reduction.
Now, that didn't quite
meet the 30% threshold
that RECIST has to be
considered responsive,
so he was classified as
stable by the radiologist.
Now look at his 3D diameter.
It goes from 33 millimeter
to 23 millimeter, which
is about 30% reduction.
If we use the same threshold
of 30% for this new metric,
he can be classified
as responsive.
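The "3D diameter" described above follows from the sphere volume formula V = (pi / 6) * d^3. The sketch below, in Python, shows that conversion and reproduces the percentage reductions from the diameters quoted on stage. The patient's lesion volumes were not stated, so none are assumed here, and the function names are hypothetical.

```python
import math


def equivalent_sphere_diameter(volume_mm3: float) -> float:
    """Diameter (mm) of the sphere whose volume equals volume_mm3."""
    if volume_mm3 < 0:
        raise ValueError("volume must be non-negative")
    # V = (pi / 6) * d**3  =>  d = (6 * V / pi) ** (1 / 3)
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a measurement shrank between two scans."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Diameters quoted in the talk for the example patient.
    print(percent_reduction(32.0, 24.0))            # RECIST diameter
    print(round(percent_reduction(33.0, 23.0), 1))  # 3D diameter
```

With the quoted values, the RECIST diameter shrinks by 25% and the 3D diameter by a little over 30%, which is why the same 30% cut-off yields different classifications for the two metrics.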
GEERT KAZEMIER: And
that's very important.
That actually can
be life changing,
because we know that patients
who we call responsive,
they can benefit from surgery.
And patients that we
call stable cannot.
So this can save lives, since
we know that chemo alone
can never cure a patient.
We most certainly
need to investigate
this new metric for the test.
OLIVER SCHABENBERGER:
So we have a new metric
that is potentially more
accurate than RECIST.
But it's still based on manual
delineations of the tumor
boundaries, which means
that it doesn't quite
address all the
limitations of RECIST,
in terms of the
subjectivity of the work.
Do you have anything
that will address
those particular limitations?
FIJOY VADAKKUMPADAN: That's
a good point, Oliver.
I want to show you preliminary
results of applying the object
detection capability of SAS
Viya for response assessment.
First I took the
pre-processed data
that I showed earlier
and generated bounding
boxes of lesions in all slices.
Take a look.
What you see on the
screen are example
slices with rectangles
around tumors.
Using these data, I trained a
convolutional neural network
based deep learning
model in SAS Viya.
Let me show you a plot that
illustrates the training
process.
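The bounding-box step mentioned above can be illustrated with a small Python sketch. It assumes each lesion is available as a binary mask for one slice (nonzero pixels inside the radiologist's contour) and returns the tightest axis-aligned rectangle around those pixels. The helper name `bounding_box` and the mask layout are assumptions for illustration; this is not the SAS Viya action used in the demo.

```python
from typing import List, Optional, Tuple

# (row_min, col_min, row_max, col_max), all inclusive
Box = Tuple[int, int, int, int]


def bounding_box(mask: List[List[int]]) -> Optional[Box]:
    """Return the tightest box around the nonzero pixels, or None for an empty mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    # Hypothetical 5x5 slice mask containing one small lesion.
    lesion_mask = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    print(bounding_box(lesion_mask))
```

A slice with several separate lesions would need one mask per lesion (for example, from connected-component labeling) before this helper is applied.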
OLIVER SCHABENBERGER:
So to make it clear,
these are the bounding boxes
determined by radiologists.
FIJOY VADAKKUMPADAN: Yes.
OLIVER SCHABENBERGER: Now we're
training a computer vision
model on that data.
FIJOY VADAKKUMPADAN:
Based on that.
The loss function here on the y-axis
is the objective function that
is minimized during training.
You can see that it
gradually decreases
with the number of epochs
on the x-axis, which
is the number of passes
through the training data,
indicating convergence.
This is what you want to
see when you train a model.
Now, let's score this trained
model on a set of test slices.
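The relationship between epochs and loss described above can be seen on a much smaller problem. The Python sketch below fits a straight line by full-batch gradient descent and records the loss once per epoch. It is a toy stand-in with hypothetical data and a hypothetical `train` function, not the convolutional network trained in the demo.

```python
from typing import List, Sequence, Tuple


def train(
    xs: Sequence[float],
    ys: Sequence[float],
    epochs: int = 50,
    learning_rate: float = 0.05,
) -> Tuple[float, float, List[float]]:
    """Fit y = w * x + b by full-batch gradient descent; return (w, b, losses)."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses: List[float] = []
    for _ in range(epochs):  # one epoch = one pass through all training data
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Mean squared error, measured before this epoch's parameter update.
        losses.append(sum(e * e for e in errors) / n)
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 2x + 1.
    w, b, losses = train([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
    for epoch in (0, 10, 49):
        print(epoch, losses[epoch])
```

Plotting `losses` against the epoch index gives the same kind of falling curve shown on stage; a curve that rises or oscillates would suggest, for instance, a learning rate that is too large.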
OLIVER SCHABENBERGER:
So now we're
looking at how well the
model you trained performs.
FIJOY VADAKKUMPADAN: While
the model is running,
it's called Tiny YOLOv2.
It has nine convolutional
layers and about 11 million
parameters.
It looks like the model
has finished running.
Let me scroll it
up so you can see.
What you see on the
screen are results
of automatic lesion
detection performed in SAS
Viya on some example slices.
OLIVER SCHABENBERGER:
This is very impressive.
So we now have an
AI model trained.
What impact does
such an automatic metric
that we can derive from here
have on teams like yours?
How can this be
deployed in the clinic?
GEERT KAZEMIER: Yeah,
first such automation
will save those radiologists
a lot of time-- as I
explained to you earlier,
20 minutes per scan.
This is very important, given
that some of our radiologists
spend about a third of
their daily work on RECIST.
OLIVER SCHABENBERGER:
A third every day.
GEERT KAZEMIER: And I
can share a secret
with you, Oliver.
They don't consider
these measuring tasks
the most inspiring part of
their job, as you can imagine.
And secondly, it provides a more
objective response assessment
metric that will help us to
treat patients consistently.
I'm very, very, very
impressed with the results.
FIJOY VADAKKUMPADAN: We
have actually a plot that
shows the objective metric.
What I did was to take these
bounding boxes and then
calculate a single lesion size
metric for each scan based
on the side lengths
of the bounding boxes.
Let's call this
the YOLO diameter
and look at the results
for all patients.
Again, on the x-axis,
you see the patient IDs.
On the y-axis, now we
have the YOLO diameter.
The colors have the
same meaning as before.
You can see that this new metric
captures the shrinkage of tumor
that occurs during
therapy in most cases,
just like the 3D volume metric
that we looked at earlier.
What you've just seen
is a demonstration
of the value proposition of
Viya in medical image analytics,
specifically its ability to
support applications that
can almost fully automatically
go from raw images
to objective metrics that
may be used in the clinic.
OLIVER SCHABENBERGER:
That's wonderful.
Looking ahead,
it seems to me
that the new criteria we're
developing and deriving
have applications beyond
liver metastases and
colorectal cancer.
Where do you see
applications outside of this?
GEERT KAZEMIER: Yeah,
most definitely.
First, the new
criteria we're deriving
may be applicable to
other solid tumors.
I mean, this is just a use
case that we came up with--
other tumors, like breast
cancer, lung cancer.
And secondly, some
of those new criteria
by themselves, or in
combination with other data--
I could imagine genomic
data, your DNA, or whatever--
may help us to predict outcome
of surgery and overall patient
survival much better
than we do now.
And such predictive analytics
is extremely important to us.
We know that not all
patients respond to surgery
or chemotherapy equally well.
OLIVER SCHABENBERGER:
Yeah, we've
made some great
progress here to develop
more reliable and repeatable
metrics for medical images.
It helps with automation,
saving precious time
of medical
professionals, and when
we talk about artificial
intelligence augmenting
us, supporting us, making
us better at what we do,
this is exactly what
we have in mind.
But we've really only scratched
the surface of what's possible.
Geert, I totally agree.
Predictive analytics based on
combining better intelligence
about medical images with
other sources of data,
genetic information,
environmental information,
is the next logical
and important step.
And personalized medicine,
reliably predicting what
will happen to the
patient rather than
to an average patient--
that should be our goal.
Fijoy, where can
we find out more
about medical image
analytics on the SAS platform
and the SAS partnership
with Amsterdam UMC?
FIJOY VADAKKUMPADAN: We
have two breakout sessions
on these topics, one presented
by myself and Dr. Joost
Huiskens, and another
by Dr. Xindian Long.
Please check them out.
OLIVER SCHABENBERGER:
Geert and Fijoy,
thank you very much for being
with us today and for the very
important work that you do.
FIJOY VADAKKUMPADAN:
Thank you, Oliver.
OLIVER SCHABENBERGER: Good job.
[APPLAUSE]
Ladies and gentlemen, you just
experienced the following--
medical image
processing in SAS Viya
to improve estimates of
tumor lesion size and volume,
augmenting a clinician by
applying a machine learning
model, and the
power of combining
data sources in the service
of predicting health outcomes.
In this demo, Fijoy worked with
an artificial neural network
to recognize tumor
lesions on those images.
And while the algorithm allows
us to process more images,
extracting better
information faster,
such automation also
raises important questions.
Are the algorithms reliable?
Can they be trusted?
Are they performing as
expected and anticipated
by their designers?
Are they equally accurate
for men and women?
Are factors that
matter accounted for?
Are protected classes
indeed protected?
Saying that software
works as coded has never
been an acceptable answer.
All software works as coded.
In this era of machine learning
and artificial intelligence,
we must rethink our
approach and ask
whether algorithms
work not as designed,
but do they work as intended?
A set of data is a
snapshot of the world.
It does not tell us
how the world works.
Take all the patient
data in the world,
and algorithms can find patterns
and correlate conditions
with outcomes.
But they cannot learn medicine.
The desire and need for
transparent and fair decisions
naturally leads us to questions
about interpretability,
explainability, and
bias of algorithms.
None of this is new, but
it is amplified today
because of the speed and
the scale with which we
can automate human tasks
and the new domains,
as you have just
seen, into which
data automation has penetrated.
We rightly want to
know how we fare
when important decisions
about our lives
are arbitrated by technology
that is outside of our control.
A poorly placed ad is
much less consequential
than a misdiagnosed disease,
a college admission denied,
or financial reputation
harmed by misrepresenting
a disadvantaged group.
Interpretability uses a
mathematical understanding
of the outputs of a
machine learning model.
How does the model react
to changes in the inputs,
for example.
Explainability goes
further than that.
It involves full
verbal explanation
of how a model functions,
what parts of it
were derived automatically,
what parts were modified
in post-processing, how does
the model meet regulations,
and so forth.
Here to discuss and demonstrate
model interpretability and bias
is Xin Hunt, software developer
in the AI and Machine Learning
R&D at SAS.
[VIDEO PLAYBACK]
- The first time I really
got interested in software
was in college.
I was in an engineering
degree, so we
had some programming classes.
One of the classes was
teaching compiled language.
It was really fun and really
got me interested in software
developing.
I think what I'm doing,
what I'm building,
is going to have a big impact on
the future of machine learning
because in order
for general public
to accept certain tools,
these kind of models,
for the society to accept it,
you have to understand it.
And also it's really fun.
I like working with
the people here.
We have a wonderful,
dedicated, hardworking group
of people who are super-smart.
And ever since I came
here as an intern,
I felt like it's a great
group to work with.
Everybody was so
friendly and so smart.
And all our products
are vigorously tested,
so we know it's going to be easy
to use and robust and reliable.
One thing about SAS software is
so much dedication and innovation
goes in there.
We have whole groups working
on the cutting edge machine
learning and AI algorithms.
It's also, I think--
SAS software is for everyone,
from novice practitioners
to data scientists, very
senior data scientists,
you can always find a platform
that suits the best for you.
[END PLAYBACK]
OLIVER SCHABENBERGER:
Good morning, Xin.
Welcome to the stage.
Xin, this is your first
Global Forum, right?
XIN HUNT: Yes, very
excited to be here.
OLIVER SCHABENBERGER:
Way to start out.
Xin, many machine
learning models and AI
models we are building today
are not easily understandable.
We cannot just look at their
parameters and figure out
what's going on and
make sense of it.
And it's these type
of models that we
want to focus on right now.
Xin, how would
interpretability help
the radiologists,
the clinicians,
in the lesion
detection application
Geert and Fijoy just showed us?
XIN HUNT: I'd love to
tell you all about that.
But before that, let's take a
step back and take a quick look
at one of the difficulties
detection algorithms tend
to have.
So if you look at
the demo right here,
we'll see that for each of
the lesions the model detects,
it gives you a
probability the model
decides the lesion
actually exists there.
So this means in the
model the algorithm has
to set a threshold.
And in the end, the model
only shows you a bounding box
if the probability is
higher than that threshold.
This is tricky to set.
OLIVER SCHABENBERGER:
So we could
have, depending on what you
set, more false positives.
Or we might miss a lesion
that actually exists.
How can we mitigate that risk?
XIN HUNT: Yes.
So let me show you
an example first.
Here in the middle, you
see the ground truth
labeled by the clinician.
On the left and right,
we intentionally
set the threshold a
little bit too high
and a little bit too low.
And you can see
that in both cases,
you're met with either false
positives or false negatives.
OLIVER SCHABENBERGER: So
what do we do about this?
How do we set those thresholds?
XIN HUNT: Exactly.
OLIVER SCHABENBERGER:
Or how do we explain
how those images are detected?
XIN HUNT: Right.
For these cases for
medical applications,
it's extremely tricky because
even a small number of mistakes
is dangerous.
So what we really need is a
clinician's final decision
and judgment.
So luckily for a good
model, most of the mistakes
are made right
near the threshold.
I call those the marginal cases.
As you can see on the
right, the marginal cases
tend to have low contrast
and irregular shapes.
Those are best recognized
by a trained professional.
So what we want to
do is have the model
take a look at the
images first, label
those it's confident
about in green
as lesions directly, and
pass on those marginal cases
to the clinician so they
can make the final decision.
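The triage just described can be sketched as a three-way split on the detection probability. In the Python sketch below, the threshold of 0.5, the margin of 0.15 around it, and the name `triage` are all hypothetical; the talk does not state the values used in the demo.

```python
from typing import Dict, List, Tuple

# (label for the candidate box, probability reported by the model)
Detection = Tuple[str, float]


def triage(
    detections: List[Detection],
    threshold: float = 0.5,  # hypothetical decision threshold
    margin: float = 0.15,    # hypothetical half-width of the "marginal" band
) -> Dict[str, List[Detection]]:
    """Split detections into confident, marginal (send to clinician), and discarded."""
    result: Dict[str, List[Detection]] = {"confident": [], "marginal": [], "discarded": []}
    for detection in detections:
        _, probability = detection
        if probability >= threshold + margin:
            result["confident"].append(detection)   # labeled directly as a lesion
        elif probability >= threshold - margin:
            result["marginal"].append(detection)    # passed on for human review
        else:
            result["discarded"].append(detection)   # too unlikely to show
    return result


if __name__ == "__main__":
    # Hypothetical candidate boxes for one slice.
    candidates = [("box_a", 0.92), ("box_b", 0.55), ("box_c", 0.10)]
    for group, items in triage(candidates).items():
        print(group, items)
```

Widening the margin sends more cases to the clinician and fewer to the automatic path, so the band is a direct trade-off between workload and risk.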
OLIVER SCHABENBERGER:
It's almost
like giving the clinician
a virtual assistant.
The model explains,
or tries to explain,
what it sees in the image.
XIN HUNT: Exactly.
It's like an
assistant-- actually,
let's fire up our
assistant here.
In this assistant
here, we combine
the capability of
model interpretability
and our natural
language generation
to generate a short
report for the clinician.
OLIVER SCHABENBERGER: So
you're running Shapley method
in SAS Viya.
XIN HUNT: Yes.
The Shapley method
we're running here,
we actually call it HyperShap.
It's a patent pending algorithm
we developed here at SAS.
We patented this very scalable,
accurate model agnostic
explainer based on Shapley
values, which gives you
an idea how each variable--
or in this case how
each pixel-- contributes
to the final decision made
by the model.
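For readers unfamiliar with Shapley values, the Python sketch below computes them exactly by enumerating every coalition of inputs. This is the textbook definition, not HyperShap, whose internals are not described in the talk. The names `shapley_values` and `toy_value` are hypothetical, and `toy_value` stands in for "model output when only these inputs are present".

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    players: List[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley value of each player for the coalition value function `value`."""
    n = len(players)
    phi: Dict[str, float] = {}
    for player in players:
        others = [q for q in players if q != player]
        total = 0.0
        for size in range(n):  # sizes of coalitions that exclude `player`
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of `player` when joining this coalition.
                total += weight * (value(coalition | {player}) - value(coalition))
        phi[player] = total
    return phi


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical model: 'a' adds 2, 'b' adds 1, and 'a' with 'c' together add 4."""
    score = 0.0
    if "a" in coalition:
        score += 2.0
    if "b" in coalition:
        score += 1.0
    if "a" in coalition and "c" in coalition:
        score += 4.0
    return score


if __name__ == "__main__":
    print(shapley_values(["a", "b", "c"], toy_value))
```

The contributions sum to the difference between the full model output and the empty baseline, and the interaction between `a` and `c` is split evenly between them. The enumeration grows exponentially with the number of inputs, which is why per-pixel explanations need a scalable approximation.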
OLIVER SCHABENBERGER: And
without those performance
improvements, without
that scalability,
we would not be able actually
to automate that virtual assistant
that you're showing us now.
XIN HUNT: Right.
OLIVER SCHABENBERGER:
So the results are back.
What do we see here?
XIN HUNT: Let's take a look.
The report says, hey, I
found two lesions here
with high probabilities.
So I labeled them directly
in green on the left.
There's one more area on
the top of the image labeled
in orange, because I'm
not super-sure about it.
The red pixels in the
explanations in that area
show why the model thought
there could be a lesion.
OLIVER SCHABENBERGER:
And the text,
where does that come from?
XIN HUNT: That is from the
natural language generation
tool I was talking about.
It can be changed to fit
any type of application
we're running.
OLIVER SCHABENBERGER:
So we want to reduce
the workload of the
clinicians by doing
an initial pass with the model.
But why does the clinician need
to know what the model thinks?
XIN HUNT: There
are a few reasons.
First of all, you see
that the marginal cases
we are passing on
to the clinicians
tend to have low contrast, and
it's hard for really anyone
to see.
So if we can highlight here,
with the red pixels,
and show where the
area really is,
the clinicians can
make a decision
faster and more reliably.
It also--
OLIVER SCHABENBERGER:
Yeah, go ahead.
XIN HUNT: It also works
as a feedback loop, where
the explanations-- if the
model makes a mistake,
the clinician can send
the explanations back
to the person who
built the model,
and it can potentially be used
to figure out what went wrong
and to further
improve that model.
OLIVER SCHABENBERGER: That's
a very important point.
When we talk about augmentation,
it's not just the machine
augmenting us.
It's also us
augmenting the machine.
It's really
augmenting both ways.
XIN HUNT: Yes.
OLIVER SCHABENBERGER:
That's exciting.
That's wonderful.
So we have a model that now
makes itself interpretable.
The computer vision
model explains
itself, both visually
and in natural language.
Let's shift gears a little bit.
And I'm going to take on
a different persona now.
I'm in charge of college
admissions at a university
or in a county or state.
And I'm thinking about
using machine learning--
machine learning to gauge
maybe a student's propensity
or aptitude for college.
And I've heard there's some
really cool machine learning
stuff out there in AI.
And so I asked the
data science team
to come up with a
model, which they did.
And they handed it to me.
They said, it's a gradient
boosting thingamajiggy.
It's really, really cool.
I don't know what that means.
So should I now deploy
this model for real?
Should I use this
to score students
and use this in
college admissions?
Xin, you're my ethicist.
Thank god you're here.
Tell me what I do
with this model.
XIN HUNT: Sure.
So let's first load the
model and take a look.
So now we have the model.
The first thing we
will want to see
is what is in there,
what variables
are contributing to the
decision process of that model.
So what we do is we
run partial dependence
to analyze all the potential
variables that possibly would
be used in the data set and take
a look at their contribution.
OLIVER SCHABENBERGER: All right.
We've got a graph back.
What does that tell us?
XIN HUNT: We see in
the data set there
are five relevant
variables, including
SAT score, the highest
math class the student took
in school, GPA, extracurricular
activities, and high school
ranking.
The analysis found that
out of the five variables,
four of them contribute
significantly to the
decision-making process.
And this one variable, the
high school rank variable,
does not affect the
model very much.
So it's probably not
being used by the model.
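
The partial dependence analysis described here can be sketched in a few lines. This is a minimal illustration, not the SAS implementation: the `predict` function is a hypothetical stand-in for the trained model, and the student records and field names (`sat`, `math_level`, `hs_rank`) are invented for the example. The idea is to force one variable to each value on a grid, leave everything else as observed, and average the predictions.

```python
from statistics import mean

# Hypothetical stand-in for a trained model (assumption, not the demo's model).
# It uses SAT score and math level, and ignores high school rank.
def predict(row):
    score = 0.0006 * (row["sat"] - 1000) + 0.1 * (row["math_level"] - 2)
    return min(1.0, max(0.0, 0.5 + score))

def partial_dependence(rows, feature, grid):
    """Average prediction when `feature` is set to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, mean(preds)))
    return curve

# Invented example records.
students = [
    {"sat": 900, "math_level": 1, "hs_rank": 10},
    {"sat": 1100, "math_level": 2, "hs_rank": 50},
    {"sat": 1300, "math_level": 3, "hs_rank": 90},
]

sat_curve = partial_dependence(students, "sat", [800, 1000, 1200])
rank_curve = partial_dependence(students, "hs_rank", [10, 50, 90])
```

In this sketch the SAT curve rises with the score, while the high school rank curve is flat, which is the signature of a variable the model is not using.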
OLIVER SCHABENBERGER: Oh, and
I see you used natural language
generation to help me
actually understand
what that graph says.
That's great.
So that makes sense to me.
I see the probability
for college admission
depends on your SAT score,
goes up with an increasing SAT
score.
That makes sense.
I feel more comfortable
now about this model.
But I still don't quite
know how it works.
What would happen if I applied
this model to the students?
XIN HUNT: So one thing
we will want to see
is if the model is fair
and unbiased, especially
towards different
groups of people.
So here we have,
say, two counties,
and we want to make sure that
the model is behaving fairly
to the students from them.
So what we run here--
OLIVER SCHABENBERGER: So we
have sort of an expectation how
the model should behave.
And now we're
comparing the reality
against the expectation.
XIN HUNT: Yes.
Here I ran two things.
On the left is the ICE
plot, Individual Conditional
Expectation.
On the right is the partial
dependence plot by county.
So on the left, each
line is an individual,
how their probability
of admission
would change if you changed
their SAT score while holding
everything else constant.
On the right-hand side
is the group average.
So what we see here, there is
actually a small discrepancy
between the two groups.
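
The two plots can be sketched as follows, again under stated assumptions: `predict` is a hypothetical stand-in model that never sees the county, and the records are invented. Each ICE curve varies one student's SAT score while holding that student's other values fixed; the grouped partial dependence is the average of the ICE curves within each county.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in model (assumption). County is not an input.
def predict(row):
    score = 0.0006 * (row["sat"] - 1000) + 0.1 * (row["math_level"] - 2)
    return min(1.0, max(0.0, 0.5 + score))

def ice_curve(row, feature, grid):
    """One individual's predictions as `feature` varies, all else held constant."""
    return [predict({**row, feature: value}) for value in grid]

def partial_dependence_by_group(rows, feature, grid, group_key):
    """Average the ICE curves within each group."""
    curves = defaultdict(list)
    for row in rows:
        curves[row[group_key]].append(ice_curve(row, feature, grid))
    return {g: [mean(col) for col in zip(*cs)] for g, cs in curves.items()}

# Invented records: County A students took higher math classes.
students = [
    {"county": "A", "sat": 950, "math_level": 3},
    {"county": "A", "sat": 1150, "math_level": 3},
    {"county": "B", "sat": 950, "math_level": 1},
    {"county": "B", "sat": 1150, "math_level": 2},
]
sat_grid = [800, 1000, 1200]
ice = [ice_curve(s, "sat", sat_grid) for s in students]
by_county = partial_dependence_by_group(students, "sat", sat_grid, "county")
```

Even though the county is not a model input here, the County B average sits below the County A average at every SAT score, because the groups differ on another variable the model does use.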
OLIVER SCHABENBERGER: Well,
I don't know-- what was that,
Individual Conditional--
I don't know what that
means, Individual Conditional
Expectation.
But I can look at the
plot on the right,
and I'm not comfortable.
So if students have the same
SAT score-- say, 1,000-- then
if they live in County B or
go to school in County B,
they are less likely to
get into college compared
to a student in County A.
XIN HUNT: Yes, that's what
the explanation for the model
is saying to us.
OLIVER SCHABENBERGER: I
would not have expected that.
We provide the
same resources,
we have the same quality
teachers in the counties.
What could explain
that difference?
XIN HUNT: Well, since our
models are trained on the data,
usually we want to find out what
was causing it from the data.
So the first step is to
take a look at our data
and see what's different between
those two groups in the data,
and that will give us an idea
of why the model predicts
different
probabilities for them.
Here I plot-- on the left
is the mean difference
between the two counties,
using County B as a baseline.
We have four dots, four
different variables.
And we see, out of the four
variables used by the model,
three of them are
pretty similar.
Their difference
is close to zero.
And only one
variable stands out.
It's the highest math level.
County A students tend
to have a higher math
level than County B.
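
A comparison like this one can be approximated with a standardized mean difference per variable, using County B as the baseline. The sketch below is one plausible way to compute it, not necessarily what the demo used; the records and field names are invented, and dividing by the overall standard deviation is an assumption made so that variables on different scales are comparable.

```python
from statistics import mean, pstdev

def mean_differences(rows, features, group_key, baseline):
    """(mean of other group - mean of baseline group) / overall std, per feature."""
    base = [r for r in rows if r[group_key] == baseline]
    other = [r for r in rows if r[group_key] != baseline]
    diffs = {}
    for f in features:
        spread = pstdev([r[f] for r in rows]) or 1.0  # avoid dividing by zero
        diffs[f] = (mean(r[f] for r in other) - mean(r[f] for r in base)) / spread
    return diffs

# Invented records for two counties.
students = [
    {"county": "A", "sat": 1000, "gpa": 3.2, "math_level": 4},
    {"county": "A", "sat": 1100, "gpa": 3.4, "math_level": 3},
    {"county": "B", "sat": 1000, "gpa": 3.3, "math_level": 2},
    {"county": "B", "sat": 1100, "gpa": 3.3, "math_level": 1},
]

diffs = mean_differences(students, ["sat", "gpa", "math_level"], "county", "B")
largest = max(diffs, key=lambda f: abs(diffs[f]))
```

With this data the SAT and GPA differences are essentially zero, and `largest` picks out the math level as the variable that separates the two groups.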
OLIVER SCHABENBERGER: Oh, OK.
I see what's driving this.
If you take higher
math classes, then this
is a contributing factor to
increasing the probability
that you get into college.
XIN HUNT: Right.
OLIVER SCHABENBERGER: But I
would not have expected that,
because I thought that the
math levels we're offering
in the counties are similar.
XIN HUNT: Well, there
are two possibilities.
One is the two counties
are actually offering
different educational programs.
In that case, you would
want to change the model
to include that county
information so we don't
penalize students
from County B by just
being in a different county.
On the other hand, if
the assumption-- or we
know that two counties are
offering similar classes,
students are taking
them but we are
seeing a difference
in the data, then
that means we
could be collecting
data that's not representative
of the student population.
OLIVER SCHABENBERGER:
So now we're
starting to talk about the
root cause of a model deviating
from our expectation.
It could be the model is
wrong and the model needs
to be corrected,
or the input data
does not represent what
we really had in mind.
And then should we
correct the model,
or should we correct the data?
XIN HUNT: It depends
on the assumption.
Here we assume that
the data is bad
because we assume the
two counties are actually
offering similar classes.
Students take them
similarly too.
So we are seeing a
distribution difference
in the classes students take.
Then we want to either
recollect the data
or, if that's not feasible,
balance the data.
OLIVER SCHABENBERGER:
I don't have
funds to go out and collect
data on all the students in all
the counties now.
But I see that
this is unexpected.
The distribution of students
in the highest math level
should be the same.
Can we just focus
on those students
and add more samples for that?
XIN HUNT: Yes, we can do that.
We can resample the
data to increase
the percentage of
County B students
with high math classes, so that
the distributions for the two
counties are similar in the end.
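
The resampling step can be sketched as simple random oversampling. This is an illustrative assumption about how the balancing might be done, not the procedure used in the demo: County B students with high math levels are duplicated at random until their share matches County A's. The threshold of 3 for a "high" math level, the field names, and the records are all invented.

```python
import random

def share_high_math(rows, county, threshold=3):
    """Fraction of a county's students at or above the math-level threshold."""
    group = [r for r in rows if r["county"] == county]
    return sum(r["math_level"] >= threshold for r in group) / len(group)

def oversample_to_match(rows, target, reference, threshold=3, seed=0):
    """Duplicate high-math rows from `target` until its share matches `reference`."""
    rng = random.Random(seed)
    goal = share_high_math(rows, reference, threshold)
    pool = [r for r in rows
            if r["county"] == target and r["math_level"] >= threshold]
    balanced = list(rows)
    # Cap the number of added rows so the loop always terminates.
    for _ in range(10 * len(rows)):
        if not pool or share_high_math(balanced, target, threshold) >= goal:
            break
        balanced.append(dict(rng.choice(pool)))
    return balanced

# Invented data: half of County A, but one sixth of County B, has high math levels.
levels = {"A": [4, 3, 3, 2, 1, 2], "B": [3, 2, 1, 1, 2, 2]}
students = [{"county": c, "math_level": m} for c, ms in levels.items() for m in ms]

balanced = oversample_to_match(students, target="B", reference="A")
```

After balancing, the model would be retrained on `balanced` rather than on the original records. Oversampling only repeats existing rows, so it cannot add information that was never collected.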
OLIVER SCHABENBERGER:
And we would have
to retrain the model, then?
XIN HUNT: Yes, we will
have to retrain the model.
And we do that and plot out the
partial dependence and ICE plot
again.
On the left are the original
plots we saw earlier.
And on the right,
after the data balancing,
the differences between the two counties
are now very small.
And basically they're not
statistically significant.
OLIVER SCHABENBERGER: Xin, thank
you very much for joining us
and for demoing this morning.
XIN HUNT: Thank you.
OLIVER SCHABENBERGER:
It was wonderful.
[APPLAUSE]
Thank you.
Well, should we
correct the model,
or should we change the data?
We just showed you how
to identify and correct
potential bias in a model.
I think there's a very
important message here--
that this is not a task that's
left to the data scientist
alone.
It requires agreements
on policy, regulations,
and a clear definition of
what success looks like,
as well as an understanding
of the data we expect,
what it should be representative
of, and the data that we have.
This is really a
conversation for all of us.
Ladies and gentlemen, in this
segment you saw the following--
a complex computer vision
model that makes itself
interpretable, a patent-pending
enhancement
to the popular
Shapley method that
makes that
interpretability scalable,
and how to examine and
correct data in a model
for possible bias.
Putting analytics into
action invariably requires
automation of data flows, data
processing, and decisioning.
We are dealing with
increasingly voluminous data,
and automation
allows us to scale
data prep and data processing.
We are dealing with
increasingly varied data,
unstructured data from
logs, transcripts, and voice
recordings.
Automating natural
language processing
ensures that these data
are not left behind.
And we are dealing with
increasingly complex models.
Finding the best model
and its best parameters
and hyperparameters is really
facilitated through automation.
And maybe most
importantly, we
are democratizing analytics, and
allowing and enabling everyone
to consume and to
produce analytics.
The business analyst,
the field engineer,
police officers at
headquarters and on the street
should be able to produce and
consume right-time insights.
Last night, during
the opening session,
we introduced you to New Hanover
County in North Carolina,
home of the city of
Wilmington and ground zero
for the opioid epidemic.
The extent of this
epidemic comes into focus
when you think about
this statistic--
12% of the population of New
Hanover County, one in eight,
are abusing opioids.
This has a huge
impact on children.
With SAS Visual
Investigator on Viya,
the Department of
Social Services
can bring together
disparate data sources
from law enforcement, case
management, 911 calls,
and generate in near real
time rule-based alerts
when a child's risk
level has increased.
Now, let's kick this up a notch.
What if-- what if we could use
the historical data to develop
a machine learning
model to predict a risk
score for every child?
And that score can
accompany the alert
and help the social
worker prioritize visits
and follow-ups.
How then could we automate the
modeling and deployment steps
and derive a model that we feel
good about, a model that we
trust?
Here to put
analytics into action
are Susan Haller, Director
of Advanced Analytics
at R&D, and Dragos Coles, Senior
Machine Developer at SAS--
Machine Learning Developer.
[VIDEO PLAYBACK]
- I have been at SAS for
20 years, over 20 years.
So I've spent half
of my life here.
And what I find exciting
is that every day I
come through the door
I'm happy to be here,
and I'm excited about
the new challenges that
are presented to me,
working with my colleagues
to come up with creative ways
to kind of solve those problems.
- I mean, work is
one thing, right?
Work is important.
It's important you like to work.
But it's probably
just as important
that you like the people
that you work with.
- We have created
a new product that
allows you to build dynamic
and automated machine learning
models.
- If you want to do
machine learning,
a data scientist would
go through multiple steps
to be able to model and build
that final model, right?
We're taking all
that work and we're
hiding it behind one click.
- This particular project
has been super exciting to me
since day one.
If you think about it,
we're taking analytics
and we're making them
accessible to everyone.
- You know, we talk
to a lot of customers
who, when you mention
machine learning,
they're interested in it.
They've heard the terminology.
But they're afraid of it, so
they don't know how to get
started.
This is going to be an enabling
technology for those users.
It's rewarding when
you work on something
that will be a real application
that somebody can use.
So I'm not talking about things
that are just cool because they
sound cool, but things
that are cool because they
can have an impact.
- At the end of the day, I hope
that the work that I'm doing
helps our customers do their
job better and more efficiently,
so make them more productive,
enable them to answer
more complex business
problems, allow
them to look in their data
and find information that may
help them make a difference.
[END PLAYBACK]
[APPLAUSE]
OLIVER SCHABENBERGER:
Susan and Dragos,
thank you for joining us today.
Before we start out,
I want to point out
that the technology
you're about to see
is not yet in use by
New Hanover County.
We are showing
technology that will soon
be available from SAS.
OK-- Susan, your role is now
the senior data scientist,
and you're guiding
Dragos, a business
analyst at the Department
of Social Services.
Dragos, you are
about to be augmented
by artificial intelligence
and machine learning.
Good luck.
DRAGOS COLES: I'm excited.
SUSAN HALLER: Thank you, Oliver.
As you've just
heard, we have been
tasked with building a
machine learning model
to generate and assign a
safety risk score to each
of the kids who
are being followed
by the Department
of Social Services.
As you can imagine,
lots of people
are interested in the
field of machine learning,
but not everybody knows
how to get started
in building such a model.
With that in mind, our
team of data scientists
has built a very simple
and custom web application
that the business analyst
in the department,
such as my colleague
Dragos, can use
to get started building a
dynamic and automated model.
So we're going to spend
just a few minutes with you
this morning walking through
building that model using
this custom application while
at the same time walking you
through each of the steps
that we're executing
underneath the covers.
So Dragos, let's get started.
DRAGOS COLES: OK.
So what do you want me to do,
just fill in these parameters?
SUSAN HALLER: That's it.
DRAGOS COLES: That's
simple enough.
So assign a project name,
select a data source.
SUSAN HALLER: So here,
the data science team
has gone ahead and identified a
handful of tables that could be
useful in this model exercise.
Considering what we've been
asked to build, let's go ahead
and select the
child safety data.
DRAGOS COLES: OK.
And what's our goal here?
SUSAN HALLER: Now
that we have our data,
we're presented with a list
of variables in that data.
And by goal, we're
simply asking for you
to identify the
variable that represents
the goal or the outcome
that we're trying
to predict in this model.
DRAGOS COLES: OK.
So in this case, we're going
with their safety risk flag.
SUSAN HALLER: That's right.
That's it.
You have now provided all
of the required information
that I need for you to go ahead
and start building a model.
All that's left is for Dragos to
click that Build Model button.
Behind that button is
a very powerful tool
coming from SAS that offers an
API for dynamic automated model
building.
DRAGOS COLES: OK.
So, I mean, this sounds really
simple, but what is an API?
SUSAN HALLER: Ah.
API-- anyone can build
their own custom application
as we've seen here based
on their business problem,
while at the same time embedding
and leveraging SAS' machine
learning capabilities.
DRAGOS COLES: Maybe
that's too easy.
I'll just run it.
SUSAN HALLER: Let's run it.
DRAGOS COLES: OK, so
right now machine learning
is running behind the scenes.
Does that include any
data preparation steps?
SUSAN HALLER: Of course.
Imagine, if you will,
that this API is simply
emulating what I
as a data scientist
would do if I had been
tasked with building
this model by hand.
So first I'm going
to explore my data.
Are there any issues
that I need to resolve?
Second, I'm going to iterate
through different data
preparation techniques--
transformations, imputation,
things such as that.
And finally, I'm even going
to automate the building
of features for you.
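
Two of the preparation steps named here, imputation and transformation, can be illustrated with a short sketch. The choices below (mean imputation and a log transform) are common defaults picked for illustration; the transcript does not say which techniques the API actually applies, and the `reports` values are invented.

```python
import math
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def log_transform(values):
    """Compress a right-skewed, non-negative variable with log(1 + x)."""
    return [math.log1p(v) for v in values]

# Invented counts of school reports, with two missing entries.
reports = [0, 2, None, 10, None]
imputed = impute_mean(reports)
transformed = log_transform(imputed)
```

An automated process would try several such options per variable and keep whichever helps the downstream model most.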
DRAGOS COLES: OK.
As a data scientist,
though, you have
to consider different
types of models
when you want to build
the best model, right?
What's available here?
SUSAN HALLER: So
the API is obviously
going to consider a variety
of different models,
finding the best model
type for your data.
It's going to look
at things like gradient
boosting models, neural
networks, random forest
to name a few.
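
Choosing the best model type comes down to scoring each candidate on held-out data and keeping the winner. The sketch below shows only that selection loop. The three candidates are deliberately trivial hypothetical rules standing in for real model types such as gradient boosting or neural networks, and the validation records and field names are invented.

```python
def accuracy(model, rows):
    """Share of rows where the model's prediction matches the known outcome."""
    return sum(model(r) == r["at_risk"] for r in rows) / len(rows)

# Hypothetical candidates; in practice these would be trained models.
candidates = {
    "reports_rule": lambda r: r["school_reports"] >= 3,
    "attachment_rule": lambda r: r["attachment"] < 0.4,
    "combined_rule": lambda r: r["school_reports"] >= 3 or r["attachment"] < 0.4,
}

# Invented held-out records.
validation = [
    {"school_reports": 4, "attachment": 0.8, "at_risk": True},
    {"school_reports": 0, "attachment": 0.2, "at_risk": True},
    {"school_reports": 1, "attachment": 0.9, "at_risk": False},
    {"school_reports": 5, "attachment": 0.3, "at_risk": True},
    {"school_reports": 0, "attachment": 0.7, "at_risk": False},
]

scores = {name: accuracy(model, validation) for name, model in candidates.items()}
champion = max(scores, key=scores.get)
```

Scoring on records the candidates were not fitted to is what keeps the comparison honest.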
DRAGOS COLES: OK, sounds good.
But one thing that I heard
about data scientists
working on projects
like this is they
go through this
iterative process
of data preparation, some
feature engineering, and then
more modeling.
Is that iterative process
running behind the scenes?
SUSAN HALLER: This is
where the intelligent part
of the automation
comes into play.
So at each step along
the way, the API
is going to
continually reassess.
It's going to add
steps to the model.
It's going to remove things
that are no longer necessary.
It may go back and
revisit existing steps
and make modifications to them.
And when the API is happy
with the data preparation
and the model that
it's built, it
goes one step further and
creates an ensemble model,
trying to improve our
overall model accuracy.
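
One simple form of ensemble is to average the predicted probabilities of several models. The sketch below assumes that form for illustration; the transcript does not say how the API combines its models, and the two member models here are hypothetical stand-ins.

```python
from statistics import mean

def make_ensemble(models):
    """Return a predictor that averages the member models' probabilities."""
    def predict(row):
        return mean(m(row) for m in models)
    return predict

# Hypothetical member models returning a risk probability.
def reports_model(row):
    return 0.9 if row["school_reports"] >= 3 else 0.2

def attachment_model(row):
    return 1.0 - row["attachment"]

ensemble = make_ensemble([reports_model, attachment_model])
risk = ensemble({"school_reports": 4, "attachment": 0.5})
```

Averaging tends to help when the member models make different kinds of mistakes, since their errors partly cancel.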
DRAGOS COLES: Wow, Susan.
I mean, it really
sounds like what
we have here is a data scientist
behind the click, right?
It's kind of you
behind a button.
SUSAN HALLER: I guess
you can say that.
And in just a few
short minutes, you
can see here that as we walk
through each of the steps
that we're running
behind that API,
Dragos has gone
ahead and created
a model that helps us predict
that safety risk score.
DRAGOS COLES: OK.
Now, we got all this
output from the API.
Since I'm new to
this, let me see
if I can understand
what's happening here.
If we look at the project
summary on the top left side,
it seems like we're getting
a summary of the project,
but it seems a little bit like
this text might be dynamic.
So it was telling
us that our model
is based on the KS statistic
on the Test partition.
We have an accuracy
rate of about 90%.
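
Both quantities mentioned here can be computed directly from a model's scores on a test partition. The KS statistic is the largest gap between the cumulative score distributions of the two classes, and the misclassification rate is the share of wrong predictions at a cutoff. The scores, labels, and the 0.5 cutoff below are invented for illustration and do not reproduce the numbers in the demo.

```python
def ks_statistic(scores, labels):
    """Largest gap between the score CDFs of the negative and positive classes."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_neg - cdf_pos))
    return best

def misclassification_rate(scores, labels, cutoff=0.5):
    """Share of cases where the thresholded score disagrees with the label."""
    wrong = sum((s >= cutoff) != bool(y) for s, y in zip(scores, labels))
    return wrong / len(labels)

# Invented test-partition scores and true labels (1 = at risk).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.35, 0.65]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

ks = ks_statistic(scores, labels)
error = misclassification_rate(scores, labels)
accuracy = 1 - error
```

Accuracy and misclassification rate always sum to one, which is why a 90% accuracy rate and a 10% misclassification rate describe the same result.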
SUSAN HALLER: I'm
glad you noticed that.
Worth mentioning, included
in this automation process
is natural language generation,
where we're dynamically
building this text for you based
on your model and your data.
DRAGOS COLES: OK.
If I look over to
the right side,
I see that our best model
is a gradient boosting model
with 10% misclassification.
On the bottom left, the
most important variables plot
seems to be listing our
predictive attributes, sorted
by relative importance.
And looking at these attributes,
I can understand some of them,
because I know the data.
So we have school reports
in the last 60 days.
We have the parental
attachment score.
I can intuitively understand
where these prefixes are coming
from, like impute or transform.
This PC1 and PC3, I'm pretty
sure those variables are not
in the original data.
You know, I really
wish I could see
what happened behind the scenes
so I can understand where
these things are coming from.
SUSAN HALLER: You are in luck.
So if you will, go ahead and
select that Open Pipeline link
at the top of your application.
Now, when Dragos executed the
API to build his dynamic model,
he also created a new
project in a SAS product
called Visual Data Mining
and Machine Learning.
Visual Data Mining
and Machine Learning
provides a very nice visual
representation and editable
representation of each of
the steps of the model that
was created for us.
DRAGOS COLES: OK.
So you're saying that the
process is transparent
and now this
project is editable?
SUSAN HALLER: That's
exactly right.
And remember, dynamic as well--
so data specific.
Had Dragos selected a
different data source or even
a different goal
for that matter,
this pipeline could
look vastly different.
DRAGOS COLES: OK, let's
go through this pipeline
a little bit.
It looks like the orange nodes
are data pre-processing nodes.
So we see we have
some transformations,
we have Variable Selection,
Imputation, Feature Extraction
here.
I mean, this is
fairly intuitive,
just understanding the process.
SUSAN HALLER: And it's
these exact data preparation
steps that resulted
in those variables
that Dragos inquired
about just a minute ago
in his variable
importance listing.
The feature extraction
node, for example,
is running a principal
component analysis.
And that principal
component analysis
is creating some
new features for us
that were labeled
PC1, PC2, and PC3,
and we found those as
significant in our model.
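
To make the PC1 feature less mysterious, here is a minimal sketch of how a first principal component can be computed: center the data, form the covariance matrix, and find its dominant direction by power iteration. This illustrates the general technique only, not the feature extraction node itself, and the input rows are invented.

```python
import math
from statistics import mean

def first_principal_component(rows, iterations=100):
    """Return the dominant direction of variation and each row's score on it."""
    n, d = len(rows), len(rows[0])
    means = [mean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):  # power iteration
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    scores = [sum(x * w for x, w in zip(row, vec)) for row in centered]
    return vec, scores

# Invented data: the second column is exactly twice the first.
rows = [[1, 2], [2, 4], [3, 6], [4, 8]]
direction, pc1 = first_principal_component(rows)
```

The `pc1` values are the new feature: a single column that summarizes two correlated inputs, which is why it does not appear in the original data.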
DRAGOS COLES: OK.
Looking further down, we
have our modeling nodes.
It looks like the green
ones are the modeling nodes.
You mentioned that the
project is editable, right?
So if I select a node, now I
get a property panel over there
on the right side.
I can edit those properties?
SUSAN HALLER: That's
exactly right.
So here we're looking at
the properties associated
with the Gradient
Boosting model.
But every node in our pipeline
has a similar property listing.
Not only do you see the
properties themselves,
but you also see
the optimal value
for each property that was
selected by the automation
process.
So I, as a data scientist,
if I wanted to come in here
and start changing
things, see if I
could make some modifications,
I could easily do so.
So for example, I
might want to see
if I could reduce the complexity
of my gradient boosting
model while at the same time
retaining the same accuracy.
The optimization process
selected 75 trees
for the gradient boosting model.
Dragos, why don't you go
ahead and change it to 50?
You see we can easily do
this, he can rerun the node,
and update the model.
DRAGOS COLES: So
what if I want to add
a new node in the project?
Can I do that?
SUSAN HALLER: Of course.
So just like you can edit
the properties to update
your model, you can
also insert new steps.
And you can do that by dragging
nodes from the tools palette
that he has expanded here into
any step within your pipeline.
So it's a very editable
process, also very flexible.
If you notice,
there are two nodes
listed on the palette that allow
you to inject your own custom
code.
That custom code can be
SAS-based code, obviously,
or it can be open source,
if you want to include
R or Python into your model.
DRAGOS COLES: OK.
I mean, we have a
project that gave us
a good model we're
happy with, right?
So how are we going
to give this model
and put it in the
hands of the consumer
so they can start making
a more informed decision?
SUSAN HALLER:
Excellent question.
Obviously we all know
that building the model
is only the first
step in the process.
It's just as important that
we're able to deploy this model
and get the model into
the hands of those
who want to consume it.
So at this point,
Dragos, let's go ahead
and leave the SAS Visual Data
Mining and Machine Learning
product and go back into
your custom application.
You see a Deploy Model button
embedded in this application.
Why don't you go
ahead and click that?
DRAGOS COLES: OK.
Is this another one of those
APIs you were talking about?
SUSAN HALLER: Of course.
Just like we had a button that
allowed us access to an API
for dynamic and
automated model building,
we have embedded a
similar button here
that surfaces another SAS API
for one-click model deployment.
DRAGOS COLES: I mean, Susan,
I'm really excited about this.
In about 10 minutes,
you showed me how
to leverage machine
learning behind the scenes
with the click of a button.
I can open that project that
gets created behind the scenes.
I can use it as a learning
tool or as a prototyping tool,
and then we deployed a
model also fairly easily.
I feel really enabled.
Thank you.
SUSAN HALLER: I'm happy
you're excited about the API.
More importantly,
something like this
will enable and empower
Dragos and other data analysts
in the department to continue
building models such as this
in the future.
And if you consider
our specific use case,
imagine now that when
an agent in the field
gets an alert that a child
needs a follow-up visit,
that alert is now augmented
with a model-based risk score
indicative of their safety.
DRAGOS COLES: Wow.
Awesome.
OLIVER SCHABENBERGER:
(SINGING) Happy birthday.
Happy birthday to you.
Happy birthday, Dear Susan.
Happy birthday to you.
SUSAN HALLER: Thank you.
OLIVER SCHABENBERGER: Well done.
And happy birthday.
SUSAN HALLER: Thank you.
OLIVER SCHABENBERGER: Dragos.
That was amazing.
And DCSH County is quite
advanced in its use
of machine learning.
Of course, it's a
fictitious county
named after Dragos
Coles and Susan Haller,
but there's nothing fictitious
about the application
or the demo.
Susan and Dragos,
thank you very much.
Ladies and gentlemen, you just
experienced the following--
automating the
iterative construction
of a complex machine
learning model in 10 minutes
by simply calling one API;
transparency of the resulting
model--
you can examine, you can
understand, you can modify;
and deploying a final model just
as easily by simply calling one
API.
Digital transformation
and analytics
are not science projects.
While pilot projects
and POCs are
important to prove
feasibility and ROI,
the goal is to impact the
organization positively
by increasing revenue,
lowering costs, raising safety,
maybe by launching
a new business.
And there are many
barriers to success
in data-driven initiatives,
chief among them
lack of talent, lack of data of
the right quality and quantity,
difficulty
operationalizing analytics,
taking it from the
science project
to operational excellence.
Susan and Dragos
showed us how SAS
helps overcome these barriers.
Automation of the
model building process,
automation of the
model selection process
through challenging
existing models,
automation of the data
preparation and feature
engineering steps, abstraction
of steps that previously
required deep programming expertise
and deep analytic expertise,
choosing your desired
level of automation
from an open API to
a visual interface
to programming interfaces.
We call this
intelligent automation.
It is data led,
dynamic, transparent,
and you can look under
the hood any time.
Automation does not
mean to look away.
Automation does not mean
you cannot intervene.
It is not the same as autonomy.
Analytics is not
a science project,
and it is not the domain of
only statisticians and data
scientists--
not anymore.
Everyone can contribute,
everyone can consume,
everyone can produce.
We've just now developed and
deployed predictive analytics.
For each case and child, we
can predict a risk score.
Why have we not fully
operationalized the model yet?
How do we put it in
the hands of the users?
Please meet our next
contestant, Sebastian Charrot,
Senior Manager in our
Scottish R&D team.
[VIDEO PLAYBACK]
- I recently became a dad, so
I don't have much spare time.
But when I do I like to do
a bit of art and drawing.
My dad was a cartoonist for a
number of French newspapers,
so as soon as I
could hold a pen,
I was trying to imitate him.
And there's something quite
satisfying about the emotion
you get when you're
really deep in drawing.
It's quite similar to the
flow that you get when you're
solving a programming problem.
If I think back to when I
first began the world of work
after graduating, I
still remember the sense
of deep satisfaction
of knowing that I
was working on a
real product, solving
real problems for real people.
Once you get a taste for
that, it's hard to give up.
So the bigger police
forces currently
raise around a million
intelligence reports a year.
That's a million trips
back to the office
to raise the information that
they've gathered in the field.
That's a lot of waste of
time and effort and manpower.
Having Mobile Investigator
means that you're
no longer desk bound to
access the information
or the capabilities that
you need to do your job.
It means maximizing the time
that you have in the field
and allowing you to access
all those rich and powerful
capabilities on the go.
And it marks the
first time that we'll
be surfacing the operational
and investigative powers of Viya
to users in the field.
So it's a big step.
So we release a lot
of software at SAS.
And it's easy to fall into
the mindset of thinking about
your work in terms of the
releases that you ship
or the bugs you fix or the
features that you implement.
But in reality, we're
not in the business
of delivering features.
We're in the business of solving
problems for our customers.
I'm very fortunate
to be in a position
where I think I know the
challenges that our customers
face and actually have the
power to do something about it.
I work with some of the most
wickedly smart, terrifyingly
capable, generous,
and creative people,
and it's a real joy
to be able to build
great things with them.
[END PLAYBACK]
[APPLAUSE]
OLIVER SCHABENBERGER:
Seb, welcome to the stage.
SEBASTIAN CHARROT:
Thank you very much.
OLIVER SCHABENBERGER:
Seb, what are
the applications of the model
Dragos has just shown us?
In the lab, we use
machine learning to detect
and flag children who are
potentially at high risk.
Now that we have the data,
what do we do with it?
How do we make use of
it in the field, put it
in the hands of
those who need it?
SEBASTIAN CHARROT: Well,
SAS has a powerful suite
of tools which allow our
users to triage alerts,
manage their intelligence,
and then coordinate
any investigations that
need to follow from those.
Until recently, however,
access to those capabilities
was limited to users sitting
at their desks in the office
or the station, which
is why I'm really
proud to announce that we
recently launched SAS Mobile
Investigator, a mobile
application which surfaces
the operational and
investigative powers of SAS
Viya to users in the field.
So if we pick up where
Susan and Dragos left off
and continue our
scenario, let's say
that I'm a police officer
working in the Child Protection
Unit.
So it's my job to liaise
with social workers,
visit certain at-risk children,
assess the situations,
and then determine any
necessary course of action
that we need to take.
And how do I know who to visit?
Well, using Susan
and Dragos' model,
we can generate
a number of tasks
to visit the highest
risk children
and assign those tasks to
myself and other officers
in the field.
So why don't we just jump
in and see how it plays?
OK, so on my home
screen here, you
see at the bottom right,
I have Mobile Investigator
installed.
So we'll launch the app, and
we'll sign into the system.
Now, the first screen
you'll see here
will be the Mobile
Investigator homepage.
It's your one-stop shop for all
functionality in the system.
And on the banner, you'll see I
have a number of notifications.
Now, clicking this will take
me to my prioritized task view.
So that's a view of
every task that's been
assigned to me in the system.
OLIVER SCHABENBERGER:
So the model
that Dragos and Susan
developed is already running?
It's prioritized your tasks
based on the risk score?
SEBASTIAN CHARROT: Absolutely.
So the highest
risk is at the top.
So it looks like Jack is
indeed the highest risk
child on my list.
So we'll click into his
records and have a look.
So we have an address.
We even have it
plotted on the map.
So how about we
just go visit Jack?
Now, there's a button underneath
my map here to navigate.
If I click that, it'll
take Jack's address
and then launch my
external map app
and show me a route to get there.
Now, that's the first of many
examples of Mobile Investigator
tapping into native
capabilities to streamline
things for its users.
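
The prioritized task view described above amounts to filtering tasks by assignee and sorting them by the model's risk score, highest first. The sketch below shows that logic only. The `Task` class, its fields, and the names and scores other than Jack's position at the top are hypothetical and are not taken from the Mobile Investigator data model.

```python
from dataclasses import dataclass

@dataclass
class Task:
    child: str
    risk_score: float  # produced by the deployed model
    assigned_to: str

def prioritized_tasks(tasks, officer):
    """Tasks assigned to one officer, highest risk score first."""
    mine = [t for t in tasks if t.assigned_to == officer]
    return sorted(mine, key=lambda t: t.risk_score, reverse=True)

# Invented tasks and scores.
tasks = [
    Task("Child A", 0.41, "officer_1"),
    Task("Jack", 0.93, "officer_1"),
    Task("Child B", 0.77, "officer_2"),
    Task("Child C", 0.65, "officer_1"),
]

my_tasks = prioritized_tasks(tasks, "officer_1")
```

The first entry in `my_tasks` is the visit to make first.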
OLIVER SCHABENBERGER: Let's
pause on this for a second,
just so we can appreciate this.
The only other way I could
have previously accessed
all this information is
turn the car around, go back
to the station, go to the
desk, do some research,
and then head back out again.
SEBASTIAN CHARROT: Absolutely.
OLIVER SCHABENBERGER: A lot
of waste of time and effort,
now eliminated by just
placing that information right
into the hands of the police
officer or the child safety
person.
SEBASTIAN CHARROT: Exactly.
Now, let's say
we're heading there.
We jump into the
car, and my partner
is driving using
those directions.
And while we're en route, I
want to do a bit of research
to see what else
we know about Jack.
So I can take a look here.
There's some basic details,
everything you'd expect.
He's a nine-year-old boy.
I can see his
family details, so I
know that Pete and
Jane are his parents.
And crucially, I can
see the risk factors
that have come into play to
determine Jack's high risk
score.
So these are all things
that we should maybe
be looking at which explain
our level of concern.
And I could really
drill into those
and apprise myself
of that if I wanted.
Now, as well as all
this core information,
I can also see
any documents that
have been uploaded and
associated with this file.
So I can see a couple of prior
social care visit reports,
and we even have a
photograph of Jack.
OLIVER SCHABENBERGER:
Recognize Jack?
That was this COO/CTO
a few years ago.
SEBASTIAN CHARROT:
So additionally, I
can see that Jack's
file here forms
part of a much larger network
of information in our system.
So he's actually related to
other reports in our data.
And a couple of things
jump out immediately.
Firstly, I see that
Pete Marsh, his dad,
is suspected to be in possession
of an unlicensed firearm.
Now, that's an officer
safety concern.
And it's going to
change my approach
to how I carry out my task.
OLIVER SCHABENBERGER:
So this information
you're receiving in the field
is now affecting, shaping
how you approach the task.
SEBASTIAN CHARROT: Absolutely.
So I may choose to not
go into the premises.
Or I may choose to bring backup.
But regardless, I'm
aware and informed.
So having this information
in the field ahead of time
can save officer lives.
Now, secondly, I can
see that Jane, his mom,
was arrested only
yesterday on a DUI.
Now, that's timely and
relevant information
which I need to have
access to and which
is going to shape my overall
evaluation of Jack's situation.
And lastly, I see that
Jack has been involved
in a number of school incidents
in the recent past which
maybe I want to discuss with him
and his family when I sit down.
OLIVER SCHABENBERGER:
So you visit Jack.
You conduct an interview
and an assessment.
While the information is fresh
in your mind, what do you do?
How can you record
your findings?
SEBASTIAN CHARROT: Yep.
There's one last
piece of research
I want to do before
we do that, and that's
a neighborhood search.
So we know how crucial the
quality of a neighborhood
is to the welfare of a child.
So what I want to do is
click this top button
to launch my
neighborhood search.
It'll take my current location.
It'll search my
immediate vicinity
for any relevant intelligence
or investigations or incidents
that could be of interest to me.
So we'll kick off that search.
And actually, when
the results come back,
I see there's a fair amount
of drug-related activity
in the neighborhood.
So that's also something
that's going to factor
into my overall assessment.
So as you say, now it's time
to raise a new visit report.
So I'll click a
button to do that,
and I can start filling this
in to my heart's content.
Jack was fine.
I'm always amazed when that
works with a Scottish accent.
OLIVER SCHABENBERGER:
Technology is amazing.
SEBASTIAN CHARROT: It's amazing.
Now, I can really
start fleshing this out
with all the information
I've gathered
during the course of my visit.
And what you'll notice is that
Mobile Investigator is also
capturing and adding
in its own information
to augment my report.
So the visit date has
been set automatically.
I've been set as the
reporter, as well as
details of how to contact me--
that's not my real number--
as well as the county
that I was in-- in fact,
the exact location that I was
in when I raised that report.
OLIVER SCHABENBERGER: Yep.
It's about automating the
obvious, time-consuming,
and possibly
error-prone tasks.
Why spend time on that?
SEBASTIAN CHARROT: Yeah.
And it provides crucial
context for the report
that I'm raising.
Now, if anything else
comes to my attention,
I could always just take a
photo of it and upload that.
So I'll take a photo of
this terrific audience.
But that could just as easily
be a picture of the neighborhood
or drug paraphernalia
or really anything
that I deem to be of relevance.
So now all that information
is in the system.
OLIVER SCHABENBERGER:
Maybe it's obvious,
but I want to point out
just how powerful this is.
The information in your
report is now available
using SAS Visual
Investigator to everyone
who has access to
SAS VI at the station
or through Mobile
Investigator in the field.
Systems are updated
in real time,
not through an
overnight batch job.
And with that new
information available,
we can do additional reporting.
We could even kick off
that modeling pipeline
Dragos and Susan
developed a moment ago.
Because the data collected
through our site visit
might contain important
information and insights
that might change how we do
the risk score calculations.
SEBASTIAN CHARROT: Exactly.
Now, imagine a world
without Mobile Investigator.
I would have had to
go back to the station
to raise that report, and
it might be one of a dozen
that I have to raise every day.
Having this app means that
I can make that information
available to everyone
as soon as we
know it and not as soon as
traffic or bureaucracy allows.
OLIVER SCHABENBERGER: Indeed.
For the first time we have
placed the power of SAS Viya
into the hands of
operational users,
allowing them access to data
and analytics wherever they are.
The users can spend
more time in the field,
are better informed,
better equipped,
and can do their job
more effectively.
Seb, great work.
Thank you very much.
SEBASTIAN CHARROT: Yes.
OLIVER SCHABENBERGER:
I don't want
to imagine a world without
Mobile Investigator.
[APPLAUSE]
Ladies and gentlemen, you just
experienced the following--
a highly flexible
application that
can be customized for
almost any use case;
real-time interaction
between back end and front
line, analytics on the go; the
blending of systems of record,
systems of engagement, and
systems of intelligence.
You heard that term throughout
the morning, the model.
We are building a model, testing
a model, deploying a model.
Models are at the
heart of analytics,
at the heart of data science.
But they are no longer
just narrowly defined
statistical models,
like finite mixture
or proportional hazards models.
Today, models are
complex pipelines
of data transformations,
data reductions,
with internal tournaments
and ensembles of approaches.
The input is data,
the output can
be a report, a prediction,
a recommendation,
a classification, and so on.
How many models do you
have in your organization?
Susan and Dragos
built one for us.
Do you have two,
three, 400, 2,000?
When you work with models,
some of the major challenges
are knowing whether
they are still valid,
how can I track their
version, their vintage?
Is this model superior
to one developed
in a different language
with different libraries?
How do I move the model from
the sandbox into production?
How do I deploy the
model in a data stream,
in Hadoop, inside a database,
or capture its endpoint
with an API?
I have models in
SAS, R, and Python.
How do I manage them all?
Model management
is a key ingredient
in making analytics
real, in making
analytics stick in operation.
With SAS Model
Manager, you can control
the versioning of models,
compare them, test them,
and publish them.
You can monitor their
performance over time,
challenge them, retrain
them, and update.
You can integrate open source
models in your data science
pipeline and govern them
alongside SAS models.
Please look for presentations on
Visual Data Mining and Machine
Learning and Model Manager
in the Quad, super demos,
and in paper sessions.
This is the technology
journey we took you on today.
The theme of the tech
connection this morning
was "Analytics in Action."
We use SAS technology
to tackle problems
in health care,
child safety, fraud,
and security intelligence.
Problems that can only
be solved through data
and analytic automation
exist in many, many fields.
Here to discuss a
domain that is near
and dear to all of our hearts,
the health of our planet,
is John Gibson, Chairman for
Energy Technology at Tudor,
Pickering, Holt, and Company.
Welcome, John.
[APPLAUSE]
JOHN GIBSON: How
you doing, Oliver?
OLIVER SCHABENBERGER:
Hello, my friend.
JOHN GIBSON: Good to see you.
OLIVER SCHABENBERGER:
Come on in.
John, thank you for being
here at Global Forum.
We're in Texas, the nerve center
of the oil and gas industry.
And I admire your boots.
Do you admire my boots?
You have deep roots in
the oil and gas industry,
and you're an absolute
expert in that field.
Share a little bit with the
audience your background.
JOHN GIBSON: Well,
Oliver, believe it or not,
my first use of SAS was about
1988 at Chevron Research.
And so I've been a user since then--
don't ask me to do
anything now, though.
I couldn't do a demo for you.
OLIVER SCHABENBERGER:
We can automate this.
We can visualize it.
Visual programming is very easy.
We'll get you into the Quad
in front of a lectern.
JOHN GIBSON: Well, my
career after Chevron,
I have had the opportunity to
run two of the largest software
companies in oil and gas.
So Landmark Graphics, which
we sold to Halliburton.
Then left Halliburton and
did Paradigm Geophysical,
which is now Emerson
E&P. And so I
was CEO of both of
those organizations
and helped build
those platforms,
which really do a lot
of computer vision
and others for the
subsurface there.
So I had a lot of work in
the software technology area.
OLIVER SCHABENBERGER: John,
probably the most important
topic to the oil
and gas industry--
and I think to the world--
is carbon, CO2,
greenhouse gases.
All link quite closely
to climate change.
How much carbon is being
created on an annual basis?
JOHN GIBSON: Well,
on an annual basis,
we're at about 36 gigatons.
OLIVER SCHABENBERGER: 36--
JOHN GIBSON: Gigatons--
which you and I
were talking about.
It's hard to
visualize 36 gigatons.
So to try to create
a mental image,
if you're familiar
with Jerry's World,
the AT&T Stadium, if we took
all of the air out of it
and extracted the CO2, we'd
get about 2.2 tons of CO2.
So we only need 18 billion
or so Jerry's Worlds
in order to extract
the amount of CO2
we're emitting each year
above the carbon cycle.
OLIVER SCHABENBERGER:
Billion, with a B.
JOHN GIBSON: B, billion.
OLIVER SCHABENBERGER: We
always talk about technology
and we focus on
its urgencies, what
it wants, the progress
we made, that today's
better than the Stone Age
because of technology.
And then there are
these side effects,
the unintended consequences of
technology, like CO2 emissions,
like greenhouse gases.
What is going to
happen to the world
if we do not address
carbon dioxide?
JOHN GIBSON: So, you sort
of put me on the spot
as an oil and gas guy.
And we've been on the spot
for the last few years.
If we don't address carbon
dioxide as a hydrocarbon
industry, we can't sustain
the hydrocarbon industry.
We'll have to go to a
different form of energy.
We can't see CO2 levels
grow from 410 to 450
without having a plan
to begin to address them.
We now estimate it could be up
to 300,000 years for the Earth
to restore the carbon level
if we just left it alone.
And so we're going to have to
make positive actions in order
to actually reduce the
levels as we're growing them.
OLIVER SCHABENBERGER:
You mentioned
this amazing-- this huge
number, this mind-boggling
number of annual output.
So if you know how
much it is, why
don't we do something about it?
Isn't this just an attribution
problem, who generates what?
JOHN GIBSON: Well, it is.
I mean, I kind of
follow politics on this,
and it's getting to
be very political.
The Green New Deal--
I won't ask everybody
to shout out if they're
for it or against it.
But directionally,
that tells you
where our country, where the
sentiment of our government's
going.
And so as a result, you're
going to see regulations come.
We've got about 60 bills that
are going to be introduced,
30 in the House of
Representatives,
30 in the Senate.
In the absence of
a strong EPA, we're
seeing congressional
efforts in that.
So even as we're
speaking now, we're
very close to
launching OCO-3, which
is our newest carbon emission
satellite here in the US.
And so it got no approval
from the White House,
but it got approval
from Congress,
and it will be going up shortly.
OLIVER SCHABENBERGER: So
how does data and analytics
play a role in all this?
Who collects the data today?
What do organizations
need to know?
JOHN GIBSON: Well, the
regulation which is coming
is really--
most people are using the
Greenhouse Gas Protocol.
And so under that
Greenhouse Gas Protocol,
you report in scope one,
scope two, scope three--
which is what do
you use directly,
what do you use indirectly--
so electricity generated
that might come
in in scope two--
and then scope three would
include business travel.
So if you're sitting on
a United Airlines flight
coming here to the
conference, what
portion of the
emissions from that
should you be accounting back?
Now, as it turns out,
one company's scope one
is another company's
scope three.
And so you can see the
hydrocarbon industry
has Uber as scope three.
And then Uber has the
hydrocarbon industry
in scope one and producing it.
So just the sheer accounting
and reporting of this
is going to require some
significant analytical models
going forward.
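The scope one, two, three reporting described above can be sketched as a simple roll-up. This is a minimal Python illustration, not part of the talk or of any SAS product: the `EmissionRecord` type, the `roll_up_by_scope` function, the activity names, and the tonnage figures are all hypothetical and chosen only to show the structure.

```python
from collections import defaultdict
from dataclasses import dataclass

# Scope labels as described in the talk:
#   1 = what you use directly
#   2 = what you use indirectly, such as purchased electricity
#   3 = other indirect activity, such as business travel
VALID_SCOPES = (1, 2, 3)


@dataclass(frozen=True)
class EmissionRecord:
    """One reported activity (hypothetical structure for illustration)."""
    source: str         # free-text activity label
    scope: int          # 1, 2, or 3
    tonnes_co2e: float  # reported emissions, in tonnes of CO2 equivalent


def roll_up_by_scope(records):
    """Sum reported emissions per scope; scopes with no records total 0.0."""
    totals = defaultdict(float)
    for record in records:
        if record.scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope {record.scope!r} for {record.source!r}")
        totals[record.scope] += record.tonnes_co2e
    return {scope: totals[scope] for scope in VALID_SCOPES}


# Made-up figures for a single fictional company.
records = [
    EmissionRecord("company vehicle fuel", 1, 100.0),
    EmissionRecord("on-site boiler", 1, 20.0),
    EmissionRecord("purchased electricity", 2, 45.5),
    EmissionRecord("conference flights", 3, 7.5),
    EmissionRecord("employee ride-hailing", 3, 2.5),
]

for scope, total in roll_up_by_scope(records).items():
    print(f"scope {scope}: {total} tonnes CO2e")
```

Because one company's scope one can be another company's scope three, totals from different reporters cannot simply be added together without counting the same emissions twice, which is part of why the accounting gets complicated.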
OLIVER SCHABENBERGER: I
read a fascinating article
about the actual carbon
footprint of some of the things
we're doing today.
It looked at the carbon footprint
of streaming platforms,
and we thought the carbon
footprint was high when we all
used vinyl on turntables.
But actually it turns
out, the carbon footprint
might be higher for
the streaming music
because of all the back end
computing and the energy
we have to generate
to support that.
JOHN GIBSON: There's
no question there's
unintended consequences.
We try to remove carbon,
and we increase it.
We see that in Europe where
an intent to be carbon neutral
ends up increasing
carbon because we end up
having to outsource
power to coal plants.
We've also seen an
elimination of coal in the US,
and we've seen coal consumption
grow by 3% globally.
So we've underestimated
the human element,
which is that need
for cheap energy
in order to grow the quality
of life in other countries.
And so consequently we're
doing the right thing here,
and we're getting
the wrong outcome.
And so it's a very
complicated problem.
OLIVER SCHABENBERGER: Yeah.
Carbon accounting
systems-- so it's
rolling up all the contribution.
You mentioned scope
one, two, three.
Where do you see SAS fitting
into this urgent need
to address carbon?
JOHN GIBSON: Well, there's
no question that SAS I think
could have a tremendous role.
And I'm hoping that
the end of this session
is the beginning of a new
journey for SAS in climate
accounting, because
each company,
if you're one of the
chief data scientists here
or chief technology officer,
you should be thinking about,
how do you do scope one, how
do you do scope two and scope
three, and build a model?
And then understand, as
you turn those knobs,
do you get the
desired consequence
or an unintended consequence?
And how does that risk
performance really
get coordinated or communicated
to a board of directors?
You're at the board level at
SAS with these carbon models
and how that's going to
create financial risk
for organizations.
I hope next year I'm here and
we have somebody actually doing
a demo for you that's
really showing how they've
done their climate model.
OLIVER SCHABENBERGER: How
about you come back and drive
that demo for us?
JOHN GIBSON: Well, I'm not
sure I'm the right guy.
OLIVER SCHABENBERGER:
Something we have not
mentioned much today is
IOT and connectivity.
But I see opportunities for
when everything is connected,
when devices are
talking to each other,
just as they report how
much electricity they need,
maybe they can start reporting
without us having to know it
their carbon footprint--
you know, their scope one,
two, three contributions--
and we could roll it up.
JOHN GIBSON: There's
no question--
OLIVER SCHABENBERGER:
Use technology
to address that problem
with technology.
JOHN GIBSON: It
has to be that way.
I mean, it can't
be a system where--
if we take a look,
there's a quote
on a slide from KPMG
that'll tell you
that 75% of global
companies that are producing
the majority of
the revenue don't
have any statement
on climate change.
In the US, 50% don't have a
statement on climate change.
Very few are doing Greenhouse
Gas Protocol reporting.
In the absence of data,
we get no progress.
And I think that with SAS and
with a data-driven activity
associated with climate,
we have a future.
Without it, we
have a real problem
that's continuing
to accrete if we put
more and more CO2 in the air.
OLIVER SCHABENBERGER: Well,
let's work on the problem
to secure the future.
John, thank you for sharing
your insights this morning.
JOHN GIBSON: Thank you so much.
I appreciate it so much.
Thank you.
OLIVER SCHABENBERGER:
And John will
be here presenting on
Tuesday about predicting
the unpredictable.
Technology is unstoppable.
It's who we are and what we do--
not just at SAS, as a species.
Technologies are all the
inventions of the human mind,
not just tools and gadgets.
Analytics,
the multidisciplinary effort
to derive insight from data,
is technology.
And as such, it exhibits the
same urgency as all technology.
It wants to reorganize.
It wants to become more
distributed, abundant,
and accessible.
What we have shown this
morning is how these organizing
principles manifest
themselves, enabling insight
and decisioning based on
data by those without degrees
in data science--
jobs made easier,
more productive,
decisions made more reliably
and faster, analytics that
follows the data.
It becomes more distributed.
It's applied by the right
person at the right place
and time.
Analytics moves from
science projects
into operations, the
hospital, the Department
of Social Services, the field
engineer, and police officer.
At SAS, we are on a mission--
on a mission to remove barriers
to producing and consuming
analytics through
visual interfaces
at parity with
programming interfaces,
through open source
integration, through APIs
that make building and
deploying models simpler,
through automation of
analytics, embedding analytics.
You can see this play out
throughout this conference
in talks, super-demos,
and in the Quad.
Look for it-- analytics in
action, hidden in plain sight.
Enjoy the rest of
the conference,
and thank you very much.
[APPLAUSE]
