
Hi! Good to see you again.
One of the things I like to do in my time
is play music, and that little bit of Mozart
you hear at the beginning of these videos,
that's me and three friends playing a clarinet quartet.
I play in an orchestra, and last night I was
playing some jazz with a little trio.
If you want to hear us play, just go to
Google and find my home page.
Type my name, Ian Witten.
You'll get me here, and every time you visit
this page, I'll play you a tune.

If you refresh the page, I'll play you another
tune.
That's what I do.
Anyway, that's not what we're here for.
We're here to talk about Lesson 2.6, which
is about cross-validation results.
We learned about cross-validation in the last
lesson.
I said that cross-validation was a better
way of evaluating your machine learning algorithm,
your classifier, than repeated holdout.
Cross-validation runs the evaluation 10 times.
You can repeat holdout 10 times as well,
but cross-validation is a better way of doing it.
Let's just do a little experiment here.
I'm going to start up Weka and open the diabetes
dataset.

The baseline accuracy, which ZeroR gives me --
that's the default classifier, by the way, rules/ZeroR --
if I just run that, well, it will evaluate it
using cross-validation.
Actually, for a true baseline, I should just
use the training set.
That'll just look at the chances
of getting a correct result if we simply guess
the most likely class, in this case 65.1%.
That's the baseline accuracy.
That's the first thing you should do with
any dataset.
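
By the way, if you'd rather script this than click through the Explorer, here is a minimal sketch of the same baseline check using Weka's Java API. The path to diabetes.arff is an assumption; it depends on where your Weka distribution keeps its sample datasets.

import weka.classifiers.Evaluation;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaselineDemo {
    public static void main(String[] args) throws Exception {
        // Load the dataset; adjust the path for your installation (assumed here).
        Instances data = DataSource.read("data/diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // ZeroR simply predicts the most likely class.
        ZeroR zeroR = new ZeroR();
        zeroR.buildClassifier(data);

        // For a true baseline, evaluate on the training set itself.
        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(zeroR, data);
        System.out.println(eval.pctCorrect() + "%");   // about 65.1% on diabetes
    }
}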
Then we're going to look at J48, which is
down here under trees.
There it is.
I'm going to evaluate it with 10-fold cross-validation.
It takes just a second to do that.
I get a result of 73.8%, and we can change
the random-number seed like we did before.

The default is 1; let's put a random-number
seed of 2.
Run it again.
I get 75%.
Do it again.
Change it to, say, 3; I can choose anything
I want, of course.
Run it again, and I get 75.5%.
These are the numbers I get on this slide
with 10 different random-number seeds.
Those are the same numbers on this slide in
the right-hand column, the 10 values I got,
73.8%, 75.0%, 75.5%, and so on.
I can calculate the mean, which for that right-hand
column is 74.5%,
and the sample standard deviation,
which is 0.9%, using just the same formulas
that we used before.
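
Here is that experiment as a rough code sketch, continuing the Java setup above. The seeds 1 through 10 are the ones from the slide, and the comments spell out the formulas, which are the ordinary mean and sample standard deviation.

// (imports: java.util.Random, weka.classifiers.trees.J48, weka.classifiers.Evaluation)
// Assumes 'data' has been loaded as in the earlier sketch.
double[] acc = new double[10];
for (int seed = 1; seed <= 10; seed++) {
    // 10-fold cross-validation of J48 with this random-number seed.
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(new J48(), data, 10, new Random(seed));
    acc[seed - 1] = eval.pctCorrect();
}

// Mean: sum(x_i) / n
double mean = 0;
for (double a : acc) mean += a;
mean /= acc.length;

// Sample standard deviation: sqrt( sum((x_i - mean)^2) / (n - 1) )
double ss = 0;
for (double a : acc) ss += (a - mean) * (a - mean);
double sd = Math.sqrt(ss / (acc.length - 1));

System.out.printf("mean = %.1f%%, sd = %.1f%%%n", mean, sd);  // about 74.5% and 0.9%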
We used these same formulas before, for the
holdout method, when we repeated holdout 10 times.
These are the results you get on this dataset
if you repeat holdout, that is, using 90% for
training and 10% for testing, which is, of
course, what we're doing with 10-fold cross-validation.
I would get those results there, and if I
average those, I get a mean of 74.8%, which
is satisfactorily close to 74.5%, but I get
a larger standard deviation, quite a lot larger
standard deviation of 4.6%, as opposed to
0.9% with cross-validation.
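
For comparison, here is one way to script that repeated holdout run. Shuffling with a fresh seed each time and slicing off the first 90% for training is an assumption about how the split is done, but it matches the spirit of the experiment.

// Repeated holdout: 10 runs, 90% training / 10% testing, fresh shuffle each run.
// Assumes 'data' has been loaded as in the earlier sketches.
double[] acc = new double[10];
for (int seed = 1; seed <= 10; seed++) {
    Instances shuffled = new Instances(data);   // copy, so 'data' stays intact
    shuffled.randomize(new Random(seed));
    int trainSize = (int) Math.round(shuffled.numInstances() * 0.9);
    Instances train = new Instances(shuffled, 0, trainSize);
    Instances test = new Instances(shuffled, trainSize,
                                   shuffled.numInstances() - trainSize);

    J48 tree = new J48();
    tree.buildClassifier(train);

    Evaluation eval = new Evaluation(train);
    eval.evaluateModel(tree, test);
    acc[seed - 1] = eval.pctCorrect();
}
// Averaging these gave roughly 74.8%, with a standard deviation around 4.6%.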
Now, you might be asking yourself why we use
10-fold cross-validation.
With Weka we can use 20-fold cross-validation,
or anything else: we just set the number of folds
here beside the cross-validation box to whatever
we want.
So we can use 20-fold cross-validation.
What that would do would be to divide the
dataset into 20 equal parts
and repeat 20 times.
Take one part out, train on the other 95%
of the dataset, and then do it a 21st time
on the whole dataset.
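
In code, that is just a change to the folds argument of the same call:

// 20-fold cross-validation: 20 parts of 5% each, training on the other 95%.
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(new J48(), data, 20, new Random(1));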

So, why 10, why not 20?
Well, that's a good question really,
and there's not a very good answer.
We want to use quite a lot of data for training,
because, in the final analysis, we're going
to use the entire dataset for training.
If we're using 10-fold cross-validation, then
we're using 90% of the dataset.
Maybe it would be a little better to use 95%
of the dataset for training
with 20-fold cross-validation.
On the other hand, we want to make sure that
what we evaluate on is a valid statistical sample.
So, in general, it's not necessarily a good
idea to use a large number of folds with cross-validation.
Also, of course, 20-fold cross-validation
will take twice as long as 10-fold cross-validation.
The upshot is that there isn't a really good
answer to this question, but the standard
thing to do is to use 10-fold cross-validation,
and that's why it's Weka's default.
We've shown in this lesson that cross-validation
really is better than repeated holdout.
Remember, on the last slide, we found that
we got about the same mean for repeated holdout
as for cross-validation, but we got a much
smaller variance for cross-validation.
We know that when we evaluate this machine
learning method, J48, on this dataset, diabetes,
we get 74.5% accuracy, probably somewhere
between 73.5% and 75.5%.
That is actually substantially larger than
the baseline.
So, J48 is doing something for us better than
the baseline.
Cross-validation reduces the variance of the
estimate.
That's the end of this class.
Off you go and do the activity.
I'll see you at the next class.
Bye for now!
