[MUSIC PLAYING]
JONATHAN WEXLER: Hey, everyone.
This is Jonathan from SAS
Machine Learning and Artificial
Intelligence.
SAS Visual Data Mining and Machine Learning, affectionately known as VDMML, provides an automated, open, and, most importantly, transparent and explainable experience for both business users and data scientists. Unlike many of our competitors, VDMML is not a black box.
Today, I'd like to show you how
SAS automation is transforming
the machine learning landscape.
Let's take a look
at VDMML in action.
I've logged into VDMML,
and in front of us,
we have a dynamic pipeline.
This pipeline was
automatically generated
based on the data I
loaded to the system.
Now, the types of techniques that are available to you range from feature engineering, dimension reduction, and feature extraction to many different machine learning techniques. The system will find the optimal solution based on your data.
How did we get there? Through the ability to add a pipeline. One option is to add a pipeline from the SAS Exchange.
Over the years, you've seen how
the SAS Exchange provides you
with the ability to add in
templates from other users,
so you can add in best practices
from one of your data scientist
colleagues, or you can even
add in best practices from SAS.
SAS provides out-of-the-box
templates for you to use inside
your analysis.
The Exchange is a wonderful
place for collaboration.
What we've added
here is the ability
to automatically
generate pipelines.
So once I hit the OK button,
an intelligent process
is underway.
The system, depending on the type of data that I load, will intelligently determine the right set of transformations and the optimal set of features to help predict my outcome.
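For readers who want to see the general idea in code: a minimal sketch of automated-style feature preparation in Python with scikit-learn. This illustrates the pattern, not VDMML's internal logic, and the file and column names are placeholders.

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Placeholder dataset with a binary target column named "target".
    df = pd.read_csv("customers.csv")
    X, y = df.drop(columns="target"), df["target"]

    numeric = X.select_dtypes(include="number").columns
    categorical = X.select_dtypes(exclude="number").columns

    # Impute and scale numeric features; impute and one-hot encode
    # categorical features.
    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric),
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("encode", OneHotEncoder(handle_unknown="ignore"))]),
         categorical),
    ])
    X_prepared = preprocess.fit_transform(X)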
As it's going
through this process,
it's continually benchmarking.
So as it finds a set
of transformations,
a set of features that it
wants to use in its models,
it will continue to benchmark
and then build another model.
So, for example, as it's determining the right set of features, it will add in a gradient boosting model. It will then add in a forest. And it will optimally tune those models to find the best solution.
And this is an
iterative process.
So as it finds an algorithm
that helps predict the outcome,
it may go back and try
additional features.
So again, this is a
continuous process
that will run through
hundreds, if not
thousands, of permutations to
help find the optimal solution.
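Continuing the scikit-learn sketch above, here is a hedged illustration of that benchmark-and-iterate pattern: tune a couple of candidate models and keep whichever scores best on held-out data. VDMML's actual search is more sophisticated; this only shows the shape of the loop.

    from sklearn.ensemble import (GradientBoostingClassifier,
                                  RandomForestClassifier)
    from sklearn.model_selection import GridSearchCV, train_test_split

    X_train, X_valid, y_train, y_valid = train_test_split(
        X_prepared, y, test_size=0.3, random_state=0)

    candidates = {
        "gradient_boosting": (GradientBoostingClassifier(),
                              {"n_estimators": [100, 300], "max_depth": [2, 4]}),
        "forest": (RandomForestClassifier(),
                   {"n_estimators": [200, 500], "max_depth": [None, 8]}),
    }

    best_name, best_model, best_score = None, None, -1.0
    for name, (model, grid) in candidates.items():
        search = GridSearchCV(model, grid, scoring="roc_auc", cv=3)
        search.fit(X_train, y_train)
        score = search.score(X_valid, y_valid)  # benchmark on holdout data
        if score > best_score:
            best_name, best_model, best_score = (name,
                                                 search.best_estimator_,
                                                 score)
    print(best_name, round(best_score, 3))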
And as this is running through, you're presented with an automated status. The status you're presented with updates in real time and gives you an indication of where it is in the process.
So it looks like it's completed.
Let's take a look
at the results.
As we can see here, it automatically drew a pipeline. Now, there are different types of transformations that it generated, and different algorithms. So, for example, it ran decision trees and, in this case, logistic regression.
And you'll notice
there that it ensembled
a handful of the models.
It didn't just
ensemble everything.
It determined the
smart way to ensemble
to find the best solution.
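That "smart ensembling" idea can be sketched with scikit-learn as well, continuing from the code above: build a soft-voting ensemble of a few candidate models and keep it only if it actually beats the best single model on the validation data.

    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.tree import DecisionTreeClassifier

    def valid_auc(model):
        return roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])

    members = [("tree", DecisionTreeClassifier(max_depth=5)),
               ("logit", LogisticRegression(max_iter=1000)),
               ("gbm", GradientBoostingClassifier())]

    # Score each member on its own, then score the soft-voting ensemble.
    single_scores = {name: valid_auc(model.fit(X_train, y_train))
                     for name, model in members}
    ensemble = VotingClassifier(members, voting="soft").fit(X_train, y_train)

    # Keep the ensemble only when it beats the best individual model.
    chosen = ensemble if valid_auc(ensemble) > max(single_scores.values()) else None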
Let's take a look at
what the system will
generate once it's completed.
If we go back to my
original pipeline,
we can see here
that the system ran
through different permutations.
And the model that it chose
was an ensemble model.
And if we take a
look at the results,
you're presented with
different types of score code.
You're presented with
assessment information.
Typically, when a data scientist is analyzing their results, they're very comfortable with what a lift chart means. But if you're a business analyst, it's possible that you don't understand what cumulative lift is. You don't understand what an ROC plot is.
So as I click through the system, there are little info helpers.
So, for example, I clicked on Cumulative Lift, and you'll notice that, through the use of natural language, the system presented a plain-language explanation. You can see here that the VALIDATE partition has a Cumulative Lift of 1.84 in the top decile. This helps explain to me why this particular metric is important and how to interpret it.
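For the curious, cumulative lift is simple enough to compute by hand. A standalone Python sketch (numpy only; the 1.84 in the demo comes from VDMML's own assessment of its VALIDATE partition, not from this code):

    import numpy as np

    def cumulative_lift(y_true, y_score, depth=0.1):
        """Response rate in the top `depth` fraction of scored cases,
        divided by the overall response rate."""
        y_true = np.asarray(y_true)
        order = np.argsort(y_score)[::-1]           # highest scores first
        top = y_true[order][: int(len(y_true) * depth)]
        return top.mean() / y_true.mean()

    # A lift of 1.84 in the top decile means the 10% of cases the model
    # scores highest contain 1.84x as many events as a random 10% would.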
It's dynamic.
So, for example, if you have new data and the system determines a different model, or runs through different properties, the explanation will be different.
So as I'm clicking
throughout the system,
I'm provided with
different explanations.
Model interpretability is critical to understanding the validity of a model. For those of you out there who are not familiar with techniques such as LIME, ICE, partial dependence, and Kernel SHAP: these are very popular methods in the market that help describe, from a surrogate-model perspective, what the important factors are in your model.
So inside the system, you're presented with partial dependence plots. If I'm a business analyst, it's very easy to understand from a "what if" perspective: what happens if my behavior score changes? My prediction changes accordingly.
For example, if I click
on a different product--
you can see here that if I
have the lavender product,
I have the highest probability.
And again, if I'm not sure how to interpret this information, the system provides me with a real-time interpretability readout.
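Outside of VDMML, the same "what if" view is available as a partial dependence plot in scikit-learn. A sketch continuing from the code above; "behavior_score" is a placeholder feature name, not a column from the demo data.

    import matplotlib.pyplot as plt
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import PartialDependenceDisplay

    # Fit on the numeric features as a DataFrame so we can refer to the
    # (hypothetical) feature by name.
    num = X[numeric]
    pd_model = GradientBoostingClassifier().fit(num, y)
    PartialDependenceDisplay.from_estimator(pd_model, num,
                                            features=["behavior_score"])
    plt.show()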
We also have many other advanced techniques, from LIME and ICE to Kernel SHAP.
These allow you
to go even deeper
into individual observations.
So again, I don't need to be
a data scientist in this case
to understand how to
interpret these models.
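For individual observations, the open-source shap package implements Kernel SHAP (and scikit-learn can draw ICE curves via PartialDependenceDisplay with kind="individual"). A short sketch continuing from the code above; this is the same family of techniques the demo describes, not VDMML's internal code.

    import shap

    # Explain one observation against a small background sample.
    explainer = shap.KernelExplainer(pd_model.predict_proba,
                                     shap.sample(num, 100))
    shap_values = explainer.shap_values(num.iloc[[0]])  # first row only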
It's important to note that the system provided an automated readout and an automated pipeline, so I'm able both to interpret the results and to build the most accurate model.
But I mentioned that this is not a black box, and it's also customizable. So you have the ability, for example, to add in SAS code.
If I wanted to amend my
analysis or extend it,
I can add in
additional SAS code.
I can also add in
open source code.
So the system is also open
to different languages,
from Python to R. So if you
have other data scientists
on your team and they want
to amend the analysis,
they want to add
to the analysis,
they can bring in other
open source techniques.
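As one concrete example of that open-source route, SAS publishes the swat package (SAS Scripting Wrapper for Analytics Transfer) for driving CAS from Python. A hedged sketch; the host, port, and table and column names below are placeholders for your own environment.

    import pandas as pd
    import swat

    conn = swat.CAS("cas-server.example.com", 5570)   # placeholder host/port
    df = pd.read_csv("customers.csv")                 # placeholder data
    conn.upload_frame(df, casout={"name": "customers"})

    # Train a gradient boosting model with the decisionTree action set.
    conn.loadactionset("decisionTree")
    result = conn.decisionTree.gbtreeTrain(
        table="customers",
        target="target",
        inputs=[c for c in df.columns if c != "target"])
    conn.close()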
Once you're, quote unquote, "done" (and getting there could have taken days or even weeks, because this is an iterative process, and perhaps there are approval processes you want to run through), model deployment is critical.
And the system certainly has many ways to deploy models, from one click, to REST APIs, to in-database and in-Hadoop processes.
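A generic sketch of what REST scoring looks like from Python with requests. The URL, payload shape, and auth token here are placeholders, not the actual SAS endpoint contract; your deployment's API documentation defines those.

    import requests

    resp = requests.post(
        "https://sas.example.com/score/my_model",     # placeholder endpoint
        json={"behavior_score": 0.72, "product": "lavender"},
        headers={"Authorization": "Bearer <token>"},  # placeholder token
        timeout=30,
    )
    print(resp.json())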
But let's say you're
a business analyst
and you want to be
able to explain this
to the rest of the
business through the use
of automated Insights.
Automated Insights
allow you to be
able to communicate this
information to a wider
audience.
So when I click on
Insights, you'll
notice that the
system dynamically
generated what I like to call--
it's almost a book report.
This is a summary of
everything that I've done today
inside this analysis.
And through the use of natural language, this gave me an indication of the most important factors.
So again, if I'm not familiar
with any one of these results,
I can just click my
little info helpers.
So again, this system is intended not only for business users; it's also intended for data scientists to work together with those business users.
So I hope you enjoyed this quick
tutorial on SAS automation.
It's a really exciting time
to join in and use VDMML.
You can be a data scientist and build the most advanced machine learning models, but you can also be a business user,
and through the use
of SAS automation,
you can take advantage of either
SAS best practices, or even
best practices from
your colleagues.
And most importantly, you
can communicate your findings
and deploy your
models to production.
Thanks for joining.
