[MUSIC PLAYING]
MAKOTO UCHIDA: Hello, everyone. My name is Makoto, a software engineer on TensorFlow Enterprise, part of Google Cloud.
Now that we have seen the great story about TensorFlow in production, and its cool use cases even in space, I'm going to talk about enterprise-grade applications with TensorFlow Enterprise.
So what does it mean for TensorFlow to be enterprise-grade? What is so different? What is so difficult?
Well, after talking to many customers, we have identified a couple of key challenges when it comes to enterprise-grade ML.
First is scale and performance. When it comes to production-grade enterprise applications, oftentimes the size of the data and the scale of the model are beyond what fits on your laptop or workstation. As a result, we need to think about the problem differently.
Second is manageability. When developing business applications, it is better not to have to worry about the nitty-gritty details of infrastructure, such as managing software environments and managing multiple machines in clusters. Instead, it is desirable to concentrate only on the core business logic of your machine learning application, so that it delivers the most benefit to your business.
Third is support. If your application is business critical and mission critical, timely resolution of bugs and issues and a commitment to stable support are essential to keep your application running.
TensorFlow Enterprise brings a
solution to those challenges.
Let's take a look at cloud-scale performance.
In a nutshell, with
TensorFlow Enterprise,
we compile and ship
a special build
of TensorFlow, specifically
optimized for Google Cloud.
It is purely based on the open source TensorFlow, but it also contains specialized optimizations for Google Cloud machines and services, in the form of patches and add-ons.
Let's take a look at what it looks like in practice. This code trains a model with potentially very large training data, maybe a terabyte of data, maintained in Google Cloud Storage. As you can see, it is no different from any typical TensorFlow code written with the tf.data Dataset API, except that the path to the training files points to Google Cloud Storage.
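The slide code itself is not captured in the transcript, so here is a minimal sketch of the pattern being described, assuming TFRecord training files and an illustrative gs:// bucket path:

    import tensorflow as tf

    # Illustrative path; the only Cloud-specific part is the gs:// prefix.
    TRAIN_FILES = "gs://my-bucket/training-data/train-*.tfrecord"

    def parse_example(serialized):
        # Hypothetical schema: a flat float feature vector and an int label.
        parsed = tf.io.parse_single_example(serialized, {
            "features": tf.io.FixedLenFeature([64], tf.float32),
            "label": tf.io.FixedLenFeature([], tf.int64),
        })
        return parsed["features"], parsed["label"]

    dataset = (
        tf.data.Dataset.list_files(TRAIN_FILES)
        .interleave(tf.data.TFRecordDataset,
                    num_parallel_calls=tf.data.experimental.AUTOTUNE)
        .map(parse_example,
             num_parallel_calls=tf.data.experimental.AUTOTUNE)
        .shuffle(10000)
        .batch(128)
        .prefetch(tf.data.experimental.AUTOTUNE)
    )

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(dataset, epochs=10)

The same code runs against local files; only the path changes.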
Under the hood, an optimized I/O reader made specifically for Google Cloud Storage makes this performant even with terabytes of training data, keeping your training fast.
This is another example that uses training data from a BigQuery table, a data warehouse that may hold hundreds of millions of rows of business data.
This example is a little more involved, but still similar to the standard Dataset API that all of you are familiar with, so your model can still train in the familiar way. Under the hood, though, the optimized BigQuery I/O can read many millions of rows in parallel in an efficient way and turn them into tensors, so that your training can proceed with full performance.
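A minimal sketch of that pattern, assuming the BigQuery reader from the tensorflow-io package; the project, dataset, table, and column names are all illustrative:

    import tensorflow as tf
    from tensorflow_io.bigquery import BigQueryClient

    # Hypothetical identifiers; replace with your own project and table.
    PROJECT_ID = "my-project"
    DATASET_ID = "sales"
    TABLE_ID = "transactions"

    client = BigQueryClient()
    # Open a read session over selected columns, split across streams.
    read_session = client.read_session(
        "projects/" + PROJECT_ID,
        PROJECT_ID, TABLE_ID, DATASET_ID,
        selected_fields=["amount", "label"],
        output_types=[tf.float64, tf.int64],
        requested_streams=4,
    )
    # Read the streams in parallel; each element is a dict of columns.
    dataset = (
        read_session.parallel_read_rows()
        .map(lambda row: (row["amount"], row["label"]))
        .batch(1024)
        .prefetch(tf.data.experimental.AUTOTUNE)
    )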
This is a little comparison of the throughput when large data is read from Google Cloud Storage, with and without the optimizations that TensorFlow Enterprise brings in. As you can see, there is a nice throughput gain.
The better I/O throughput actually translates into better utilization of processors such as CPUs and GPUs, because I/O is no longer the bottleneck of the entire training.
What it means is that your training finishes faster and your training wall time is shorter. As a result, your cost of training is actually lower, because the compute cost is proportional to the wall time for which you use the compute resources.
Now that you have some idea of the kinds of optimizations we were able to make to TensorFlow specifically for Google Cloud, let's see how you actually get it and how you take advantage of it.
We do this through managed services. We deliver TensorFlow Enterprise through managed environments, which we call our Deep Learning Virtual Machine images and Container images, where the whole environment is pre-managed and pre-configured on top of standard Linux distributions.
Most important, they have the TensorFlow Enterprise build pre-installed, together with all the dependencies, including device drivers and Python packages in the correct version combinations, as well as configuration for the other services in Google Cloud.
Because these are just normal virtual machine images and container images, you can deploy them in many different ways in Cloud. Regardless of where or how you deploy them, the TensorFlow Enterprise optimizations are just there, so you get the benefit of all that good performance.
To get started, you only have to pick the TensorFlow Enterprise image and the desired resources, such as CPUs and RAM, and optionally GPUs, and start the virtual machine with just one command.
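That command is not captured in the transcript; a sketch of what it might look like with the gcloud CLI, where the VM name, zone, machine shape, accelerator, and image family are illustrative (check the Deep Learning VM documentation for the current TensorFlow Enterprise image families):

    gcloud compute instances create my-tf-enterprise-vm \
        --zone=us-central1-a \
        --machine-type=n1-standard-8 \
        --image-project=deeplearning-platform-release \
        --image-family=tf2-ent-2-1-cu100 \
        --accelerator=type=nvidia-tesla-t4,count=1 \
        --maintenance-policy=TERMINATE \
        --metadata=install-nvidia-driver=True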
The next moment, you can already access a machine that has the TensorFlow Enterprise build pre-installed, pre-configured, and ready to use, so you can immediately start writing your code on the machine.
If you prefer a notebook environment, JupyterLab is already hosted in the VM. The only thing you have to do is point your browser to the VM, open up JupyterLab, and open a notebook, so that you can start writing your TensorFlow code, taking advantage of TensorFlow Enterprise.
Once you have a satisfactory model after many iterations of experimentation, it is time to train your model at full scale. It may not fit on one machine, and you may want to take advantage of the distributed training facilities that TensorFlow offers, so that it can support the large scale of the data and the model.
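As a sketch of that facility, here is the general shape of multi-worker training with tf.distribute; the strategy choice and the model are illustrative, not the talk's code:

    import tensorflow as tf

    # Workers discover each other via the TF_CONFIG environment variable,
    # which a managed training service sets up for you.
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

    with strategy.scope():
        # Variables created in this scope are replicated across workers.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

    # dataset is built as in the earlier examples.
    model.fit(dataset, epochs=10)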
For this, AI Platform Training is a managed service that takes care of the distributed training cluster and all the other infrastructure complexity on your behalf.
More importantly, it runs the same TensorFlow Enterprise container image, which is exactly the same environment you used to build your model, so you can be confident that your model just trains with the full scale of data under the managed training service.
You simply need to overlay your application code on top of the TF Enterprise container image, then issue one command to start a distributed training cluster. This example brings up a distributed training cluster of 10 workers, each a larger machine with 8 GPUs attached, to train on a potentially large data set for your [INAUDIBLE] applications, with all the TensorFlow Enterprise optimizations included.
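Again, the exact command is not in the transcript; a sketch of what it might look like with the gcloud CLI, where the job name, region, machine types, accelerators, and container image URI are all illustrative:

    gcloud ai-platform jobs submit training my_training_job \
        --region=us-central1 \
        --scale-tier=CUSTOM \
        --master-machine-type=n1-highmem-16 \
        --master-image-uri=gcr.io/my-project/my-tfe-trainer:latest \
        --master-accelerator=type=nvidia-tesla-v100,count=8 \
        --worker-count=10 \
        --worker-machine-type=n1-highmem-16 \
        --worker-image-uri=gcr.io/my-project/my-tfe-trainer:latest \
        --worker-accelerator=type=nvidia-tesla-v100,count=8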
Now that you can train your model at full enterprise scale, it is time to make it an end-to-end pipeline that keeps running in production, taking advantage of AI Platform Pipelines and TensorFlow Extended.
AI Platform Pipelines is actually hosted on Google Kubernetes Engine, and what this means is that it can also run exactly the same TensorFlow Enterprise container image, so all the optimizations are still there, and you can still be confident that your application in the pipeline just runs, because it is all the same environment.
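As a sketch of what such a pipeline looks like in TensorFlow Extended, assuming a TFX version where CsvExampleGen accepts input_base directly; the names and paths are illustrative:

    from tfx.components import CsvExampleGen
    from tfx.orchestration import pipeline
    from tfx.orchestration.kubeflow import kubeflow_dag_runner

    # Hypothetical GCS locations for the input data and pipeline state.
    DATA_ROOT = "gs://my-bucket/data"
    PIPELINE_ROOT = "gs://my-bucket/pipeline-root"

    # Ingest raw data; a real pipeline would add Trainer, Evaluator,
    # Pusher, and other components downstream of this one.
    example_gen = CsvExampleGen(input_base=DATA_ROOT)

    tfx_pipeline = pipeline.Pipeline(
        pipeline_name="my-tfx-pipeline",
        pipeline_root=PIPELINE_ROOT,
        components=[example_gen],
    )

    # Compile and run on AI Platform Pipelines (hosted Kubeflow Pipelines
    # on Google Kubernetes Engine).
    kubeflow_dag_runner.KubeflowDagRunner().run(tfx_pipeline)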
Once your end-to-end application runs in production, enterprise-grade support becomes essential to mitigate any risk of interruption to operations and to keep operating your application in a business-critical manner.
Our way to mitigate this risk is to provide long-term support. With open source TensorFlow, we typically offer a one-year maintenance window. For TensorFlow Enterprise, we provide three years of support, which includes critical bug fixes and security patches.
Additionally and optionally, we may backport certain functionality and features from future releases of TensorFlow as we see demand.
As of today, we have TensorFlow Enterprise versions 1.15 and 2.1 as our long-term supported versions.
If your business is pushing the boundary of AI and sitting at the cutting edge of AI, where novel applications and use cases are critical to your business model, and your business relies heavily on being able to keep innovating in this space, we want to work with you through the white-glove service program.
We, the engineers and creators of both TensorFlow and Google Cloud, are willing to work with your engineers and data scientists to mitigate any future bugs and issues that we may not have seen yet, to support your cutting-edge applications, to unblock you, and to advance your applications together, as well as TensorFlow and TensorFlow Enterprise as a whole.
Please check out the website for the details of this white-glove service program.
Looking ahead, we are really excited to keep working tightly together across the TensorFlow and Google Cloud teams.
As the creators, experts, and owners of both products, we will continue to make optimizations and improvements to TensorFlow for Google Cloud.
That includes better monitoring and debugging capabilities for your TensorFlow code that runs in Cloud, as well as integration of these capabilities into Google Cloud tooling for better productivity in your applications.
We are also looking at smoother integration between TensorFlow, popular high-level APIs such as Keras and Keras Tuner, and the managed training services, as well as even more managed services, such as a continuous TensorFlow development environment, for the purpose of a coherent and joyful developer experience. Please stay tuned.
Please stay tuned.
This concludes my talk about TensorFlow Enterprise. For more information and details, please do check out the website. Thank you very much.
[MUSIC PLAYING]
