>> Hi, this is Steve Michelotti
of the Azure Government
Engineering Team.
I'm joined here today
by Phil Coachman,
a Cloud Solution Architect
for Microsoft. Welcome, Phil.
>> Thanks Steve.
>> So, we're going to talk about
Data Science and Containers
on Azure Government today,
is that right?
>> That's right.
>> All right. So, why do
Data Science with Containers?
>> Well, Containers
offer several advantages
for a data scientist.
It provides a consistent,
portable framework in order
to train and deploy my models,
and it allows me to scale
my APIs once I get them out
there without having to
worry about managing
different VMs or other resources.
>> Okay. So, you're not
only training but
also deploying the models?
>> Right. So, once I
build an API and I want
to have something available
for people to call my API,
how do I scale that when I
have a million requests a
day and a hundred requests tomorrow?
That's what Containers
allow me to do:
scale that very easily and
make sure I have the same
environment every time.
>> So, I could even train without
Containers but once I
have that model built,
then put it in Containers,
I can do it either
way, it sounds like?
>> Exactly. I could
use Containers for
massive training using
something like Azure Batch,
or I could train in
Jupyter Notebooks and use
my model for my API and
scale that with Containers.
>> Even with something
like HDInsight,
we can train with that.
>> Exactly.
>> But then we use Containers,
Docker Containers,
whatever the case
may be once we start deploying.
>> Exactly.
>> Okay. So, let's dig into
this a little bit more.
>> Okay. So, one of the things
that Containers provides me is
the ability to have
that consistent and
portable environment.
A lot of times what
you will see in
highly regulated industries,
like healthcare, public sector,
or financial services,
is that not all the resources
that are available to
our commercial customers meet
those regulatory requirements,
like FedRAMP or HIPAA.
So, today, we do that with
virtual machines using
an IaaS environment
in Azure Government.
But in the future,
as we get
the PaaS offerings
like Azure Kubernetes Service
and Azure Container Registry,
we can just pick those
up and move those into
that environment without
having to change anything,
so the data scientist just
gets to keep focused on
their data and not having to
worry about what environment
they're in anymore.
>> Okay.
So we have the environment
with Azure Government with
its certifications
and compliance to
do the training as
we build our models?
>> Right. So, you kind
of take the security
requirements away from
the data scientists,
and leave them
up to your deployment model.
Which is really nice because data
scientists don't want to
have to worry about that.
They just want to focus on data,
modeling their data, and
getting it out there.
>> Absolutely. Here,
you've now
outlined doing this in
IaaS, with infrastructure,
and also platform as a service,
but even once I've got that
model in a Docker Container,
I can deploy that anywhere,
even at the Edge, even
outside of Azure.
>> You can take it
to the Edge, you
take it to Azure Stack,
I could take it to Azure Batch,
or I could even take it
On-Premise if I wanted to.
So, using Azure for my
training for that environment,
and if I had to
bring it On-Premise,
I'd still have that Container.
I could take it wherever I want.
>> I could put the Container on
an IoT device if I wanted to.
>> Exactly.
>> Okay. Awesome.
>> Alright. So, today we're
actually going to
use what is called
the Hello World of
data science and
machine learning models.
So, we're going to
predict irises.
There's three different types
of irises: the Versicolor,
the Setosa, and the Virginica.
What they found is that they
could predict which one it
was based on their sepal length
and their petal length.
This screenshot here shows
the difference between the sepals
and the petals and what we're
going to try to predict.
I'm going to use
a support vector machine,
and a package called
Scikit-Learn in Python,
and the datasets are
already loaded there.
So, I guess that this
is the Hello World.
It's not a complicated model
but the idea is to show
how easy it is to use
Containers to do this work.
>> Okay. So, if I
understand this correctly,
because I can't pronounce any
of the words you use there,
I can simplify this by
thinking of it like
a computer vision problem
where the computer vision will
plot the vectors of the petals,
those will give us coordinates,
and then we use that to determine
what type of iris it is?
>> Exactly. So, on this slide,
you can see here the vectors
based on sepal length,
and this is exactly what
our model is doing.
So, that's what we're going to
build using the Hello World.
So, to get started,
I'm going to go
to aka.ms/dscontainers,
which is going to take
me to GitHub, and this is
where I have the code set up.
I'm going to run
two different Containers,
one for training, and one
for deploying my model.
So, I'm just going to clone
this using GitHub desktop.
So I'm going to clone
it locally down
to my work spaces area.
>> Okay, awesome. So, you're
using GitHub desktop,
but if someone wants to
use the command line,
that's fine too.
Totally their choice.
>> They could use
command line, they
could use Visual Studio,
Visual Studio Code, whatever their
environment of choice is.
>> Whatever makes them happy.
>> Exactly. So now that
the repository's down,
let me just pull up my Explorer.
So, you see I have
two different Containers.
One for my training environment,
one for my deploying environment.
So, one I'm going
to train the model
in, and one I'm going to deploy from.
Now, I'm a data scientist,
so I like to use PyCharm.
>> Is that what we're
looking at there?
It was the GitHub
source tree just before that.
>> Yes. This is SourceTree.
So, I'm going to pull up PyCharm.
So, one of the things is that,
I don't want to have to use
a different tool set every
time I do something,
so I want to continue
work in the same
environment that I'm working in.
So, now that I'm in PyCharm,
the first thing I'm going
to do is look and see
all the packages that
I have installed.
This is pretty
standard with PyCharm.
But, if we dig in,
you'll notice that I
have my two Containers.
So, train and deploy.
The train one is going
to train my model,
and then the deploy one is
going to deploy my API.
So, let's look into
the train model.
So, this Python code here,
from this line down,
from line 13 down,
this is all Boilerplate code,
that is just to save my model
out to my storage environment
which is Azure Storage.
>> So, I see. So, on line 10,
I guess, where they had the fit,
that's training the model
and then all this stuff
down here, the Boilerplate,
is just saving the trained model.
>> This is just Boilerplate.
You'll see that I'm
getting my environment
variables: my "Blob Account",
my "Key", my "Container
Name", my "File Name".
I'm going to put those
into my Docker file
through those commands, so,
as a data scientist,
I don't have to worry
about this part.
All I need to focus
on is this top part
where I'm going to say,
"I need to get my data,
and I need to train my model."
That's it. So, I take
what I'm normally doing,
I throw it in there, and now
I'm able to train my model.
So, a couple of things
with the Docker file,
let's take a look at that.
Here I'm passing in my
environment variables.
So, my "Account", my "Key",
my "Container", my "File name",
then, you'll see down here,
I have my "Pip Install
Requirements Texts"
and that's going to
allow me to have
that consistent
environment every time.
So, I can say here
the packages that I need
for my environment,
and I know that they're
going to be installed and I
don't have to worry about
that package dependency issue.
>> Great.
>> Then, at the very
end, it's going to
actually call my training.
So, my requirements.txt
is where I would say,
"Okay, now I need Scikit-Learn,
I need Pandas, I need
Falcon, whatever it may be."
I can have the specific
version numbers,
so that as my Containers scale,
I know that each Container has
the exact packages that
I want every time.
>> Alright. Sounds good.
>> So, once my model is trained,
I'm going to actually
have my API deployed.
So, if you look at
the deploy package,
similar Docker file,
I'm still passing in my
environment variables but,
in this case, I'm
using "Gunicorn."
Using "Gunicorn", I'm exposing
a web service on port
80 on my container,
and that's where my APIs
are going to live.
>> Okay. So, if we think
about the trained Container,
it was training and saving
it to Blob Storage,
and this one here for deploy,
is retrieving it
from Blob Storage.
That's what it's going
to encapsulate.
>> Right. So, my training,
I can schedule,
let's say I need to have my
model retrained nightly.
Well, how do you do that
now as a data scientist?
It's usually a very
complicated process,
but now,
with the Container,
I can have it trained every night
or whatever my build
schedule is,
and then I need to
expose those APIs, so,
that's going to load
into my deploy package,
and then that gets built on
a regular basis as well.
So, building into that
continuous integration type of
environment that people have been
doing with DevOps
for a long time.
The "Requirements Text
File" is the same.
So, I know my environments
are matching up.
But if you notice, when I did my
"Gunicorn," where I
just exposed my API,
it was calling this
"Falcon Gateway."
So, the "Falcon Gateway"
is just, again,
a lot of Boilerplate code
that's exposing and saying,
"Here's an info API".
It's going to say, "Well,
here's what this API is doing."
But now I have my "Post-API."
So, on post, what
am I going to do?
This is how you normally call
RESTful APIs on the web.
So, whether you're hitting
other Microsoft APIs,
like Cognitive Services,
it's the same type of environment.
Send a post with some JSON,
get some JSON back.
That's what this is building out.
You'll see down here,
the same boilerplate
code where we were
saving the file off
in our train model,
now we're pulling it back
using the same account keys,
the same file names,
the same code.
So, as a data scientist,
I haven't touched anything yet.
Then, in the very end,
I'm going to call
this invoke_predict,
which is going to call
my model and pass
that raw JSON in.
My invoke_predict is in
this handle Python module.
This is some
really, really small code
right here, but really powerful.
Again, as a data scientist,
I don't want to have to worry
about setting all this up.
So, I'm coming in,
and all I'm working on is
setting these couple of
features right here.
So, what features do I want
to pull out, and in what order?
After that, nothing else
has to be touched.
So, model after model after
model that I'm building,
I'm touching very little.
I'm setting my
requirements and I'm
saying here are the features
that I want from my JSON.
>> So, it seems like
the beauty of this is
80 percent of the code
is boilerplate.
>> Eighty percent of
the code is boilerplate,
really allowing the
data scientist to focus on
the data and not have to focus
on how do I build my APIs,
how do I deploy them,
how do I build this out.
>> Right.
>> So, as a data scientist,
at this point, I'm done.
I just want to check
in my code and
go work with
some other data.
So, I checked my code
into my build environment,
which is VSTS,
and I'm not a DevOps person.
So, I actually had some of
my DevOps friends help
me with the build,
which is what I would
expect the data
scientists to do as well.
Most of my customers are
doing the same thing.
So, here's my code
and just to show you,
here's my Docker file.
>> Here's all the code you
had locally checked in to-.
>> Here's the code
that I checked in.
You see, here's my blob account
key that I filled in,
here's my container,
here's the name of the file
that I want to save.
Now, I'll erase these containers,
or I'll erase
my blob key, afterwards.
But just to show that,
yes, I've set this all up.
Then, after this, the builds run.
So, I can see my builds,
and you see that they're
running on a regular basis.
Again, as a data scientist,
I don't have to worry
about, what do I need?
What are the Docker
commands to push out
containers or to
build my containers?
So now, I'm in my DevOps
build environment.
I have two steps: build
and run my train container.
So, this is going to
take that Docker file
I built, pull it
down from the code I
checked in, and it's
going to build it.
It's going to push the image
to my container registry,
and then it's going
to run that image.
Running that image means,
run that Python script that
is going to train my model.
Once that's successful, it's
going to build my deploy API.
So again, it's going
to build the image,
push it to my container registry.
But then, because I'm working
in an IaaS environment,
I'm actually going to connect to
my virtual machine
that's sitting in Azure,
connect with SSH and run
this command which is docker run.
I'm going to expose port 8080
for my prediction model
with the Prediction API.
>> So, this is pretty
cool because with
a build pipeline like this,
you can really just configure
it to do whatever you want.
Maybe you want
your container registry
to be Azure Container Registry,
maybe you want a different
container registry,
you can just tweak those by
tweaking the build process.
>> Exactly, and in the future,
you can go from IaaS to
PaaS without having
to change anything.
All I have to do is change
my build environment
to now push to, say,
Azure Kubernetes Service,
instead of my IaaS VM.
So, without having to
change anything as
a data scientist at all,
I don't change my code.
I don't change my models,
I can now go to whatever
environment I need to.
>> Awesome, okay. So,
let's see this in action.
>> So now, this is
working. We're going to
go and open a web browser,
and the IP address for
my VM is 104.210.112.06.
>> So, this is going to be
running the deployed container.
>> It's running
the deployed container.
My model is stored in Azure Blob,
in the storage account.
The container's running here.
It's going to expose the info
and the predicts API endpoints.
So, let's just see.
Make sure it's running.
It's running. This
is the code that I
had in that gateway.
>> Your Python code.
>> Python code.
>> Yeah.
>> Now, to call
the predicts endpoint,
I have to call it
with a post command.
So, I'm going to bring
up an Ubuntu shell,
and I'm going to use curl.
So, curl post.
So, here's where I'm
calling my post,
and then I'm passing
in the sepal length,
sepal width, the petal length,
and petal width as JSON.
>> So, this was
like the vectors we
got from computer vision.
>> Exactly. Now passing those
into my predicts endpoint.
>> Okay, I see. So, it's
the same IP address,
but it's just the slash
predicts instead of slash info.
>> Instead of slash info.
You will see that it came back
with a simple JSON of one.
>> Which corresponds
to one of those types.
>> It corresponds
to the Virginica,
as we were looking at
the three of them.
So, it's saying that with
that length and that width,
this is a Virginica iris.
>> Okay. So, this is pretty
cool because we think
of data science and
machine learning
as this hugely complex thing,
and it can be complex.
But in this case, once we have
that trained model sitting
in a container,
it's so easy to call when
you do something like this.
>> Exactly, by using
this type of framework,
it really allows a data
scientist to focus on training
and building models and not
have to focus on the APIs.
How do I get the APIs out
there or how do I scale my APIs?
We take advantage of what's
inherently possible
with containers,
but apply it to
the data science side.
>> What's also interesting is,
even with what
you've built here,
if someone just wanted
to use the deploy side,
because they have
their own process for how they
train models, they
could do that too.
>> Exactly. I have a lot
of customers that do that,
where they will train
their models in Jupyter or Data
Science VMs, or even
train models in Spark.
>> Yeah.
>> Then they want to deploy,
but they need a scalable
deployment model,
so, they use containers for that.
>> What good is training
a model if you don't have
an easy way to actually use it?
>> Exactly.
>> Okay, great.
So, this has been
highly informative.
So with that, this is
Steve Michelotti from
Azure Government Engineering,
with Philip Coachman
talking about
data science using containers
on Azure Government.
Thanks for watching.
