
♪ (music) ♪
Hello, everyone.
First, thanks everyone for coming
to attend the Dev Summit.
And second, thanks
for staying around this long.
I know it's been a very long day.
And there has been a lot of information
that we've been throwing at you.
But we've got much, much more
and many more announcements to come.
So please stick with me.
My name is Clemens, and this is Raz.
We're going to talk about
TensorFlow Extended today.
But before we do this,
I'm going to do a quick survey.
Can I get a quick show of hands?
How many of you do machine learning
in a research or academic setting?
Okay.
Quite a big number.
Now how many of you do
machine learning in a production setting?
Okay.
That looks about half-half.
Obviously, also a lot of overlap.
So for those of you who do
machine learning in a production setting,
how many of you agree
with this statement?

Yeah? Some? Okay.
I see a lot of hands coming up.
So everyone that I speak with
who's doing machine learning
in production agrees with this statement:
"Doing machine learning
in production is hard," and it's too hard.
Because after all, we actually want
to democratize machine learning
and allow more and more people
to deploy machine learning
in their products.
One of the main reasons why
it's still hard is that there is
so much more than the actual
machine learning.
In this small orange box
you actually use TensorFlow;
you may use Keras
to put together your layers
and train your model.
But you need to worry about so much more.
There's all of these other things
that you have to worry about
to actually deploy machine
learning in a production setting
and serve it within your product.
Now the good news
is that this is exactly
what TensorFlow Extended is about.
TFX, internally at Google,
is an end-to-end machine learning
platform that allows our developers
to go all the way from data to production
and serve machine learning models
as fast as possible.

Now before we introduced TFX,
we saw that going through this process
of writing some of these components,
some of which didn't exist before,
gluing them together,
and actually getting to
a launch took anywhere
from six to nine months,
sometimes even a year.
Once we deployed TFX
and allowed developers to use it,
in many cases people could use
this platform, get up and running
with it in a day, and actually get to
a deployable model in production
on the order of weeks or just a month.
Now, TFX is a very large
system and platform that consists
of a lot of components
and a lot of services
so unfortunately I can't talk about
all of this in the next 25 minutes.
So we're only going to be able to
cover a small part of it, but we're talking
about the things that we've already
open-sourced and made available to you.
First, we're going to talk
about TensorFlow Transform
and show you how to apply
transformations on your data
consistently between training and serving.
Next, Raz is going to introduce you
to a new product that we're open sourcing
called TensorFlow Model Analysis.
We're going to give a demo of how
all of this works together end to end
and then make a broader announcement
of our plans for TensorFlow Extended
and sharing it the community.
Let's jump into
TensorFlow Transform first.
So, a typical ML pipeline
that you may see in the wild
is, during training,
you usually have a distributed data
pipeline that applies transformations
to your data,
because usually you train
on a large amount of data,
so this needs to be distributed.
You run this pipeline
and sometimes materialize
the output before you actually
put it into your trainer.
Now at serving time,
we need to find a way to somehow
replay those exact transformations online.
As a new request comes in,
it needs to be sent to your model.

There's a couple of challenges with this.
The first one is, usually those two
things are very different code paths.
The distributed data processing systems
that you would use for batch processing
are very different from the libraries
and tools that you would use
to transform data in real time
and make a request to your model.
So now we have two different code paths.
Second, in many cases,
it's very hard to keep those two in sync.
I'm sure a lot of you have seen this.
You change your batch processing pipeline
and introduce a new feature or change
how it behaves and you somehow
need to make sure that the code
that's actually used in your
production system is changed
at the same time and is kept in sync.
The third problem is,
sometimes you actually want to deploy
your TensorFlow machine learning
model in many different environments.
You want to deploy it on a mobile device;
you want to deploy it on a server;
maybe you want to put it in a car;
now suddenly you have
three different environments
where you have to apply
these transformations,
and maybe there's different languages
that you use for those,
and it's also very hard
to keep those in sync.
And this introduces something
that we call training-serving skew,
where the transformations that you do
at training time may be different
from the ones in serving time,
which usually leads to bad quality
of your serving model.
TensorFlow Transform addresses this
by helping you write
your data processing job
at training time,
so it actually helps you create those
data pipelines to do those
transformations, and at the same time,
it emits a TensorFlow graph that can be
in-lined into your training model
and also your serving model.
Now what this does is,
it actually hermetically seals the model,
and your model takes
a raw data request as input,
and all of the transformations
are actually happening
within the TensorFlow graph.
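[A minimal sketch of this in-lining in the trainer, using the TFTransformOutput helper from later tf.Transform releases; the output path and feature handling here are assumptions, not the talk's exact code:]

    import tensorflow as tf
    import tensorflow_transform as tft

    # Placeholder path: the output directory of a previous tf.Transform run.
    tf_transform_output = tft.TFTransformOutput('/tmp/transform_output')

    def serving_input_receiver_fn():
        # The request carries raw, untransformed examples.
        serialized = tf.compat.v1.placeholder(tf.string, shape=[None])
        raw_features = tf.io.parse_example(
            serialized, tf_transform_output.raw_feature_spec())
        # In-line the transform graph so serving replays exactly the
        # transformations computed at training time.
        transformed = tf_transform_output.transform_raw_features(raw_features)
        return tf.estimator.export.ServingInputReceiver(
            transformed, {'examples': serialized})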
This has a lot of advantages.
One of them is that you no longer
have any code in your serving
environment that does these
transformations because they're all
being done in the TensorFlow graph.
Another one is that wherever you
deploy this TensorFlow model,
all of those transformations
are applied in a consistent way.
No matter where this
graph is being evaluated.
Let's see what that looks like.
This is a code snippet 
of a pre-processing function
that you would write with TF Transform.
I'm just going to walk you
through what happens here
and what we need to do for this.
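[The slide's code isn't captured in the transcript; the following is a minimal sketch of such a preprocessing function, with made-up feature names 'x', 'y', and 's':]

    import tensorflow as tf
    import tensorflow_transform as tft

    def preprocessing_fn(inputs):
        x, y, s = inputs['x'], inputs['y'], inputs['s']
        # Scaling to a z-score needs the global mean and standard
        # deviation, computed by an analyze phase over the full dataset.
        x_centered = tft.scale_to_z_score(x)
        # Plain TensorFlow ops can be mixed in freely.
        x_centered_times_y = x_centered * y
        # Bucketizing needs the bucket boundaries, again computed
        # by a distributed analyze phase.
        y_bucketized = tft.bucketize(y, num_buckets=10)
        # Mappers like n-grams need no analyze phase at all.
        s_ngrams = tft.ngrams(tf.compat.v1.string_split(s), (1, 2), ' ')
        return {'x_centered': x_centered,
                'x_centered_times_y': x_centered_times_y,
                'y_bucketized': y_bucketized,
                's_ngrams': s_ngrams}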
First thing we do
is normalize this feature.
As all of you know,
in order to normalize a feature,
we need to compute the mean
and the standard deviation,
and to actually apply this transformation,
we need to subtract the mean
and divide by the standard deviation.
So what has to happen is,
for the input feature x,
we have to compute these statistics.
That's a trivial task
if the data fits on a single machine;
you can do it easily.
It's a non-trivial task if you have
a gigantic training dataset
and actually have to compute
these metrics efficiently.

Once we have these metrics
we can actually apply this transformation
to the feature.
This is to show you that the output
of this transformation can then be,
again, multiplied with another tensor,
which is just a regular
TensorFlow transformation.
And then in order to bucketize a feature,
you again need to compute
the bucket boundaries to actually
apply this transformation.
And again, this is a distributed data job
to compute those metrics, this time
over the result of an already
transformed feature, which is another
benefit, and then we actually apply
this transformation.
The next examples just show you
that in the same function you can apply
any other tensor-in, tensor-out
function, and there are also some
of what we call mappers in TF Transform
that don't require this analyze phase.
So, N-grams doesn't require us
to actually run a data pipeline
to compute anything.
Now what happens here 
is that these orange boxes
are what we call analyzers.

We realize those as actual data pipelines
that compute those metrics over your data.
They're implemented using Apache Beam.
And we're going to talk
about this more later.
But what this allows us to do is actually
run this distributed data pipeline
in different environments.
There are different runners
for Apache Beam.
And all of the transforms are just simple
instance-to-instance transformations
using pure TensorFlow code.
What happens when you
run TensorFlow Transform
is that we actually run these
analyze phases,
compute the results
of those analyze phases,
and then inject the result 
as a constant in the TensorFlow graph--
so this is on the right--
and in this graph,
it's a hermetic TensorFlow graph
that applies all the transformations,
and it can be in-lined
in your serving graph.
So now your serving graph
has the transform graph
as part of it and can replay
all of these transforms
wherever you want to deploy
this TensorFlow model.

What can be done
with TensorFlow Transform?
At training time for the batch processing,
really anything that you can do
with a distributed data pipeline.
So there's a lot of flexibility here
with types of statistics you can compute.
We provide a lot 
of utility functions for you,
but you can also 
write custom data pipelines.
And at serving time, because we generate
a TensorFlow graph that applies
these transformations,
we're limited to what you can do
with a TensorFlow graph,
but for all of you who know TensorFlow,
there's a lot of flexibility
in there as well.
Anything that you can do
in a TensorFlow graph,
you can do with your transformations.
Some of the common use cases
that we've seen, the ones on the left
I just spoke about: you can scale
a continuous value to the z-score,
which is mean normalization,
or to a value between 0 and 1.
You can bucketize a continuous value.
If you have text features,
you can apply Bag of Words or N-grams,
or for feature crosses,
you can actually cross
those strings and then generate
vocabs of the result of those crosses.
As mentioned before, 
TF Transform is extremely powerful
in actually being able to chain together
these transforms, so you can apply
a transform on the result
of a transform, and so on.
Another particularly interesting
transform is actually applying
another TensorFlow model.
You've heard about the saved model before.
If you have a saved model that
you can apply as a transformation,
you can use it in TF Transform.
Let's say you have an image
and you want to apply
an Inception model as a transform
and then use the output of that
Inception model, maybe to combine it
with some other feature
or use it as an input feature
to your model.
You can use any other
TensorFlow model
that ends up being in-lined
in your transform graph
and also in-lined in your serving graph.
All of this is available today
and you can go check it out
on github.com/tensorflow/transform.
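[A hedged sketch of that pattern, assuming the apply_saved_model helper in tf.Transform's pretrained_models module; the path, tags, and feature names are placeholders:]

    from tensorflow_transform import pretrained_models

    def preprocessing_fn(inputs):
        # Run a pre-trained saved model (say, an Inception-style image
        # model) over a raw feature and use its output as a new feature.
        embedding = pretrained_models.apply_saved_model(
            model_dir='/path/to/saved_model',    # placeholder
            inputs={'images': inputs['image']},  # assumed signature input
            tags=['serve'])
        return {'image_embedding': embedding}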
With this I'm going to hand it
over to Raz who's going to talk
about TensorFlow Model Analysis.
Alright, thanks Clemens.
Hi, everyone.

I'm really excited to talk about
TensorFlow Model Analysis today.
We're going to talk
a little bit about metrics.
Let's see, next slide.
Alright, so we can already
get metrics today right?
We use TensorBoard.
TensorBoard's awesome.
You saw an earlier presentation
today about TensorBoard.
It's a great tool--
while you're training,
you can watch your metrics, right?
If your training isn't going well,
you can save yourself
a couple of hours of your life, right?
Terminate the training, fix some things...
Let's say you have 
your trained model already.
Are we done with metrics? Is that it?
Is there any more to be said
about metrics after we're done training?
Well, of course, there is.
We want to know how well
our trained model actually does
for our target population.
I would argue that we want to
do this in a distributed fashion
over the entire data set.
Why wouldn't we just sample?

Why wouldn't we just save
more hours of our lives, right?
And just sample,
make things fast and easy.
Let's say you start with a large data set.
Now you're going to slice that data set.
You're going to say, "I'm going 
to look at people at noon time."
Right? That's a feature.
From Chicago, my hometown.
Running on this particular device.
Each of these slices reduces the size
of your evaluation dataset by a factor.
This is an exponential decline.
By the time you're looking at
the experience for a particular...
...set of users, you're not
left with very much data.
And the error bars on your
performance measures, they're huge.
I mean, how do you know that
the noise doesn't exceed your signal
by that point, right?
So really you want to start
with your larger dataset
before you start slicing.
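[As a toy illustration of that exponential decline, with made-up numbers:]

    # Each slice keeps only a fraction of the eval set, and the
    # fractions multiply: 1,000,000 examples * 0.05 (noon time)
    # * 0.1 (from Chicago) * 0.2 (this device) = 1,000 examples.
    n = 1_000_000
    for frac in (0.05, 0.1, 0.2):
        n *= frac
    print(int(n))  # 1000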
Let's talk about a particular metric.
I'm not sure--
Who's heard of the ROC Curve?
It's kind of an unknown thing
in machine learning these days.
(laughter)
Okay.
We have our ROC Curve,
and I'm going to talk about a concept
that you may or may not be familiar with
which is ML Fairness.
So what is fairness?
Fairness is a complicated topic.
Fairness is basically how well
does our machine learning model do
for different segments
of our population, okay?
You don't just have one ROC Curve,
you have an ROC Curve for every segment.
You have an ROC Curve
for every group of users.
Who here would run their business
based on their top line metrics?
No one! Right? That's crazy.
You have to slice your metrics;
you have to go in and dive in
and find out how things
are going. So that lucky user,
the black curve
on the top: great experience.
That unlucky user, the blue curve?
Not such a great experience.
When can our models be
unfair to various users?
One instance is if you simply
don't have a lot of data
from which to draw your inferences.
Right?
We use stochastic optimizers,
and if we re-train the model,
it does something slightly different
every time.
You're going to get a high variance
for some users just because
you don't have a lot of data there.
We may be incorporating data
from multiple data sources.
Some data sources are more
biased than others.
So some users just get
the short end of the deal, right?
Whereas other users 
get the ideal experience.
Our labels could be wrong. Right?
All of these things can happen.
Here's TensorFlow Model Analysis.
You're looking here at the UI hosted
within a Jupyter Notebook.
On the X-axis, we have our loss.
You can see there's some
natural variance in the metrics.

We're not always going to
get spot on the same precision
and recall for every
segment of the population.
But sometimes you'll see...
what about those guys
at the top there experiencing 
the highest amount of loss?
Do they have something in common?
We want to know this.
Sometimes our users that...
...get the poorest experience,
they're sometimes 
our most vocal users, right?
We all know this.
I'd like to invite you 
to come visit ml-fairness.com.
There's a deep literature about
the mathematical side of ML Fairness.
Once you've figured out how
to measure fairness,
there's a deep literature
about what to do about it.
How does TensorFlow Model Analysis
actually give you these sliced metrics?
How did you go about 
getting these metrics?
Today you export 
a saved model for serving.
It's kind of a familiar thing.
TensorFlow Model Analysis is simple.

And as it's simple, it's similar:
you export a saved model for evaluation.
Why are these models different?
Why export two?
Well the eval graph that 
we serialize as a saved model
has some additional annotations
that allow our evaluation batch job
to find the features, 
to find the prediction, to find the label.
We don't want those things mixed in
with our serving graphs
so you export a second one.
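[A sketch of that second export, following the early tensorflow_model_analysis API; the feature spec, label, and paths are placeholders, and `estimator` is assumed to be the trained estimator from before:]

    import tensorflow as tf
    import tensorflow_model_analysis as tfma

    feature_spec = {  # placeholder raw feature spec
        'trip_miles': tf.io.FixedLenFeature([], tf.float32),
        'tips': tf.io.FixedLenFeature([], tf.float32),
    }
    eval_input_receiver_fn = (
        tfma.export.build_parsing_eval_input_receiver_fn(
            feature_spec, label_key='tips'))

    # Alongside the usual serving export, emit the annotated eval graph.
    tfma.export.export_eval_savedmodel(
        estimator=estimator,                # assumed trained estimator
        export_dir_base='/tmp/eval_model',  # placeholder
        eval_input_receiver_fn=eval_input_receiver_fn)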
So this is the GitHub.
We just opened it, I think
last night at 4:30 p.m.
Check it out.
We've been using it internally
for quite some time now.
Now it's available externally as well.
The GitHub has an example
that kind of puts it all together
so that you can try all these components
that we're talking about
from your local machine.
You don't have to get
an account anywhere.
You just git clone it,
run the scripts,
and run the codelab.
This is the Chicago Taxi Example.

So we're using public data from--
publicly available data
to determine which riders 
will tip their driver
and which riders, shall we say,
don't have enough money to tip today.
What does fairness mean in this context?
So our model is going 
to make some predictions.
We may want to slice these
predictions by time of day.
During rush hour we're going 
to have a lot of data so hopefully
our model's going to be fair
if that data is not biased.
At the very least it's not
going to have a lot of variance.
But how's it going to do
at 4 a.m. in the morning?
Maybe not so well.
How's it going to do when the bars close?
An interesting question.
I don't know yet,
but I challenge you to find out.
So this is what you can run
using your local scripts.
We start with our raw data.

We run the TF Transform;
the TF Transform emits
a transform function
and our transformed examples.
We train our model.
Our model, again, emits two
saved models as we talked about.
One for serving and one for eval.
And we try this all locally,
just run scripts and play with the stuff.
Clemens talked 
a little bit about transform.
Here we see that we want 
to take our dense features,
and we want to scale them
to a z-score.
And we don't want to do that
batch by batch
because the mean for each batch
is going to differ,
and there's going to be fluctuations.
We may want to do that
across the entire data set.
We may want to normalize
these things across the entire data set.
We build a vocabulary; we bucketize
for the wide part of our model,
and we emit our transform function,
and into the trainer we go.
You heard earlier today
about TF Estimators,
and here is a wide and deep estimator
that takes our transformed features
and emits two saved models.
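[A rough sketch of such a wide-and-deep estimator; the feature columns and hidden units are placeholders, not the example's exact configuration:]

    import tensorflow as tf

    # Placeholder columns standing in for the transformed features.
    wide_columns = [tf.feature_column.categorical_column_with_identity(
        'trip_start_hour_bucket', num_buckets=24)]
    deep_columns = [tf.feature_column.numeric_column('trip_miles_scaled')]

    estimator = tf.estimator.DNNLinearCombinedClassifier(
        linear_feature_columns=wide_columns,
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=[100, 70, 50, 25])
    # After training, this estimator is exported twice:
    # once for serving and once for evaluation.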
Now we're in TensorFlow Model Analysis,
which reads in the saved model
and runs it against all of the raw data.
We call render_slicing_metrics
from the Jupyter Notebook,
and you see the UI.
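[Roughly like this, following the early TFMA API; the paths and slicing column are placeholders:]

    import tensorflow_model_analysis as tfma

    # Evaluate the eval saved model over all of the raw data,
    # sliced by a feature such as the trip's start hour.
    eval_result = tfma.run_model_analysis(
        model_location='/tmp/eval_model/1523000000',  # placeholder export dir
        data_location='/tmp/eval_data/*.tfrecord',    # placeholder
        slice_spec=[tfma.SingleSliceSpec(columns=['trip_start_hour'])])

    # In the Jupyter Notebook, this call renders the interactive UI.
    tfma.view.render_slicing_metrics(
        eval_result, slicing_column='trip_start_hour')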
The thing to notice here
is that this UI is immersive, right?
It's not just a static picture
that you can look at and go,
"Huh" and then walk away from.
It lets you see your errors broken down
by bucket or broken down by feature,
and it lets you drill in
and ask questions
and be curious about how your models
are actually treating various subsets
of your population.
Those subsets may be
the lucrative subsets
you really want to drill into.
And then you want to serve
your models so our demo--
our example has a one-liner here
that you can run to serve your model.

Make a client request--
the thing to notice here
is that we're making
a gRPC request to that server.
We're taking our feature
tensors, we're serializing them
into the gRPC request,
sending them to the server,
and back comes a probability.
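[A condensed sketch of that client call against TensorFlow Serving; the address, model name, and example bytes are placeholders:]

    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2
    from tensorflow_serving.apis import prediction_service_pb2_grpc

    channel = grpc.insecure_channel('localhost:9000')  # placeholder address
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'chicago_taxi'           # placeholder name
    serialized_examples = [b'...']  # your serialized tf.Example bytes
    # Serialize the feature tensors into the request.
    request.inputs['examples'].CopyFrom(
        tf.make_tensor_proto(serialized_examples, dtype=tf.string))
    response = stub.Predict(request, timeout=10.0)     # back comes probability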
But that's not quite enough, right?
We've heard a little bit
of feedback about this server.
The thing that we've heard
is that gRPC is cool,
but REST is really cool.
(laughter)
I tried.
This is actually one
of the top feature requests
on GitHub for model serving.
You can now pack your tensors
into a JSON object,
send that JSON object to the server,
and get a response back [inaudible].
Much more convenient,
and I'm very excited to say
that it'll be released very soon.
Very soon.
(laughter)
I see the excitement out there.
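[For reference, the JSON flavor as it later shipped in TensorFlow Serving's REST API looks roughly like this; the port, model name, and fields are assumptions:]

    import requests

    payload = {'instances': [{'trip_start_hour': 4, 'trip_miles': 3.2}]}
    resp = requests.post(
        'http://localhost:8501/v1/models/chicago_taxi:predict',  # placeholder
        json=payload)
    print(resp.json())  # e.g. {'predictions': [...]}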
Back to the end to end.
You can try all of these pieces
end to end all on your local machine.
Because they're using Apache Beam
direct runners, and direct runners
allow you to take your distributed jobs
and run them all locally.
Now if you swap in
Apache Beam's Dataflow runner,
you can now run against
the entire data set in the cloud.
The example also shows you
how to run the big job
against the cloud version as well.
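[A sketch of that runner swap; the project and bucket are placeholders, and the same pipeline code runs in both cases:]

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run(runner_name):
        opts = PipelineOptions(
            runner=runner_name,                  # 'DirectRunner' locally,
                                                 # 'DataflowRunner' in the cloud
            project='my-gcp-project',            # placeholder
            temp_location='gs://my-bucket/tmp')  # placeholder
        with beam.Pipeline(options=opts) as p:
            (p
             | beam.Create(['a', 'b', 'c'])
             | beam.Map(print))

    run('DirectRunner')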
We're currently working
with the community to develop
a runner for Apache Flink,
a runner for Spark.
Stay tuned to the TensorFlow blog
and to our GitHub...
...and you can find the example
at tensorflow/model-analysis.
And back to Clemens.
Thank you, Raz.
(applause)
Alright, so we've heard
about Transform.
We've heard how to train models,
how to use model analysis
and how to serve them.
But I hear you say you want more.
Right? Is that enough?
You want more? Alright.
You want more.
And I can think of why you want more.
Maybe you read the paper
we published last year and presented
at KDD about TensorFlow Extended.
In this paper we laid out
this broad vision of how
this platform works within Google
and all of the features that it has
and all the impact 
that we have by using it.
Figure one, which has
all of these boxes, describes
what TensorFlow Extended actually is.
Although overly simplified,
this is still much more
than we've discussed today.
Today, we spoke about
these four components
of TensorFlow Extended.
Now it's important to highlight
that this is not yet an end to end
machine learning platform.
This is just a very small piece of TFX.
These are the libraries
that we've open-sourced
for you to use.
But we haven't yet
released the entire platform.
We're working very hard
on this because we've seen
the profound impact 
that it had internally--
how people could start
using this platform
to apply machine learning
in production using TFX.
And we've been working
very hard to actually make
more of these components available to you.
So in the next phase, we're actually
looking into our data components
and looking to make those
available to users so that
you can analyze your data,
visualize the distributions,
and detect anomalies
because it's an important part
of any machine learning pipeline
to detect changes and shifts
in your data and anomalies.
After this we're actually looking
into some of the horizontal pieces
that help tie all of these
components together
because if they're only
single libraries, you still have
to glue them together yourself.
You still have to use them individually.
They have well-defined interfaces,
but you still have to combine them
by yourself.
Internally we have a shared 
configuration framework that allows you
to configure the entire pipeline
and a nice integrated frontend
that allows you to monitor
the status of these pipelines
and see progress and inspect
the different artifacts
that have been produced
by all of the components.
So this is something 
that we're also looking to release
later this year.
And I think you get the idea.
Eventually we want to make
all of this available to the community
because internally, 
hundreds of teams use this
to improve our products.
We really believe that this
will be as transformative
to the community
as it is at Google.
And we're working very hard
to release more of these technologies,
and eventually the entire platform,
to see what you can do
with them for your products
and for your companies.
Keep watching the TensorFlow blog posts
for a more detailed announcement
about TFX and our future plans.
And as mentioned, you can already use
some of these components today.
Transform is released.
Model Analysis was 
just released yesterday,
Serving is also released,
and the end-to-end example is available
under the short link, and you can find it
in the model analysis repo.
So with this, thank you
from both myself and Raz,
and I'm going to ask you
to join me in welcoming
a special external guest, Patrick Brand,
who's joining us from Coca-Cola,
who's going to talk
about applied AI at Coca-Cola.
Thank you.
(applause)
♪ (music) ♪
