
Hello again! We're up to the last lesson in
the fourth class, Lesson 4.6 on Ensemble Learning.
In real life, when we have important decisions
to make, we often choose to make them using
a committee.
Having different experts sitting down together,
with different perspectives on the problem,
and letting them vote, is often a very effective
and robust way of making good decisions.
The same is true in machine learning.
We can often improve predictive performance
by having a bunch of different machine learning
methods, all producing classifiers for the
same problem, and then letting them vote when
it comes to classifying an unknown test instance.
One of the disadvantages is that this produces
output that is hard to analyze.
There are actually approaches that try to
produce a single comprehensible structure,
but we're not going to be looking at any of
those.
So the output will be hard to analyze, but
you often get very good performance.
It's a fairly recent technique in machine
learning.
We're going to look at four methods, called
"bagging", "randomization", "boosting", and
"stacking".
They're all implemented in Weka, of course.
With bagging, we want to produce several different
decision structures.
Let's say we use J48 to produce decision trees;
then we want to produce slightly different decision trees.
We can do that by having several different
training sets of the same size.

We can get those by sampling the original
training set.
In fact, in bagging, you sample the set "with
replacement", which means that sometimes you
might get two of the same instances chosen
in your sample.
We produce several different training sets,
and then we build a model for each one -- let's
say a decision tree -- using the same machine
learning scheme, or using some other machine
learning scheme.
Then we combine the predictions of the different
models by voting, or if it's a regression
situation you would average the numeric result
rather than voting on it.
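
To make that loop concrete, here is a minimal sketch of bagging done by hand with the Weka Java API. This assumes Weka 3.8; the HandBagging class name is made up for illustration, and weather.nominal.arff is the small example dataset that ships with Weka.

```java
// Bagging by hand: sample with replacement, train one model per sample, vote.
// A sketch assuming the Weka 3.8 Java API.
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HandBagging {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1);

        int numModels = 10;
        Random rng = new Random(1);
        Classifier[] models = new Classifier[numModels];
        for (int i = 0; i < numModels; i++) {
            // Sampling with replacement: same size, but some instances repeat.
            Instances bootstrap = data.resample(rng);
            models[i] = new J48();
            models[i].buildClassifier(bootstrap);
        }

        // Combine the models' predictions for one instance by voting.
        // (A training instance is reused here purely for illustration.)
        Instance test = data.instance(0);
        int[] votes = new int[data.numClasses()];
        for (Classifier m : models) {
            votes[(int) m.classifyInstance(test)]++;
        }
        int winner = 0;
        for (int c = 1; c < votes.length; c++) {
            if (votes[c] > votes[winner]) winner = c;
        }
        System.out.println("Voted class: " + data.classAttribute().value(winner));
    }
}
```

For a regression scheme you would average the models' numeric predictions at the end instead of voting.
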
This is very suitable for learning schemes
that are called "unstable".
Unstable learning schemes are ones where a
small change in the training data can make
a big change in the model.
Decision trees are a really good example of
this.
You can get a decision tree and just make
a tiny little change in the training data
and get a completely different kind of decision
tree.
Whereas with NaiveBayes, if you think about
how it works, little changes in the training
set aren't going to make much difference to
the result, so that's a "stable" machine
learning method.
In Weka we have a "Bagging" classifier in
the meta set.
I'm going to choose meta > Bagging: here it
is.
We can choose here the bag size -- this is
saying a bag size of 100%, which is going
to sample the training set to get another
set the same size, but it's going to sample
"with replacement".
That means we're going to get different sets
of the same size every time we sample, but
each set might contain repeats of the original
training instances.
Here we choose which classifier we want to
bag, and we can choose the number of bagging
iterations here, and a random-number seed.
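
Those same options can also be set programmatically. Here is a hedged sketch using Weka's meta.Bagging class through the Java API (assuming Weka 3.8; the BaggingDemo name and the dataset are just for illustration):

```java
// Weka's own Bagging meta-classifier, mirroring the GUI options described
// above: base classifier, bag size, number of iterations, and random seed.
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Bagging bagger = new Bagging();
        bagger.setClassifier(new J48());   // the classifier to bag
        bagger.setBagSizePercent(100);     // each bag is the size of the training set
        bagger.setNumIterations(10);       // number of bagging iterations
        bagger.setSeed(1);                 // random-number seed

        // Estimate performance with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(bagger, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```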

That's the bagging method.
The next one I want to talk about is "random
forests".
Here, instead of randomizing the training
data, we randomize the algorithm.
How you randomize the algorithm depends on
what the algorithm is.
Random forests are what you get when the
algorithm you randomize is a decision tree learner.
Remember when we talked about how J48 works?
It selects the best attribute to split on
each time.
You can randomize this procedure by not necessarily
selecting the very best, but choosing a few
of the best options, and randomly picking
amongst them.
That gives you different trees every time.
Generally, if you randomize decision trees
and bag the result, you get better performance.
In Weka, we can look under the "trees" classifiers
for RandomForest.

Again, that's got a bunch of parameters.
The maximum depth of the trees produced -- I
think 0 would be unlimited depth.
The number of features we're going to use.
We might select, say, 4 features: every time
we decide on a split to put in the tree, we
select it from among the top 4 candidates.
The number of trees we're going to produce,
and so on.
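
As a sketch, here is how those parameters look through the Java API. This assumes Weka 3.8, where the number of trees is the numIterations parameter (older releases call it numTrees); the class name is made up.

```java
// RandomForest with the parameters discussed above: tree depth,
// number of candidate attributes per split, and number of trees.
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomForestDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1);

        RandomForest forest = new RandomForest();
        forest.setMaxDepth(0);        // 0 = unlimited depth
        forest.setNumFeatures(4);     // consider 4 candidate attributes at each split
        forest.setNumIterations(100); // number of trees to grow
        forest.setSeed(1);            // random-number seed

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(forest, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```
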
That's random forests.
Here's another kind of algorithm: it's called
"boosting".
It's iterative: new models are influenced
by the performance of previously built models.
Basically, the idea is that you create a model,
and then you look at the instances that are
misclassified by that model.
These are the hard instances to classify,
the ones it gets wrong.
You put extra weight on those instances to
make a training set for producing the next
model in the iteration.
This encourages the new model to become an
"expert" for instances that were misclassified
by all the earlier models.
The intuitive justification for this is that
in a real life committee, committee members
should complement each other's expertise by
focusing on different aspects of the problem.
In the end, to combine them we use voting,
but we actually weight models according to
their performance.
There's a very good scheme called AdaBoostM1,
which is in Weka; it's a standard, very good
boosting implementation that often produces
excellent results.
There are a few parameters to this as well,
particularly the number of iterations.
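
As a minimal sketch, here is AdaBoostM1 through the Java API (assuming Weka 3.8; DecisionStump is Weka's default weak base learner for it, and the class name is made up):

```java
// AdaBoostM1: the key parameter is the number of boosting iterations.
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BoostingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1);

        AdaBoostM1 booster = new AdaBoostM1();
        booster.setClassifier(new DecisionStump()); // weak base learner
        booster.setNumIterations(10);               // number of boosting iterations
        booster.setSeed(1);                         // random-number seed

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(booster, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```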

The final ensemble learning method is called
"stacking".
Here we're going to have base learners, just
like the learners we talked about previously.
We're going to combine them not with voting,
but by using a meta-learner: another learning
scheme that combines the output of the base
learners.
We're going to call the base learners level-0
models, and the meta-learner is a level-1 model.
The predictions of the base learners are input
to the meta-learner.
Typically you use different machine learning
schemes as the base learners to get different
experts that are good at different things.
You need to be a little bit careful in the
way you generate data to train the level-1
model: this involves quite a lot of cross-validation,
and I won't go into that here.
In Weka, there's a meta classifier called
"Stacking", as well as "StackingC" -- which
is a more efficient version of Stacking.

Here is Stacking; you can choose different
meta-classifiers here, and the number of stacking folds.
We can choose different classifiers; different
level-0 classifiers, and a different meta-classifier.
In order to create multiple level-0 models,
you need to specify a meta-classifier as the
level-0 model.
It gets a little bit complicated; you need
to fiddle around with Weka to get that working.
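
If the GUI proves fiddly, the same setup is perhaps easier to read in the Java API. Here is a minimal sketch (assuming Weka 3.8; the particular level-0 classifiers and the Logistic meta-classifier are just example choices):

```java
// Stacking: several different level-0 classifiers, combined by a
// level-1 meta-classifier instead of a vote.
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.Logistic;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.Stacking;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StackingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Stacking stacker = new Stacking();
        // Different level-0 learners: experts that are good at different things.
        stacker.setClassifiers(new Classifier[] {
            new J48(), new NaiveBayes(), new IBk()
        });
        stacker.setMetaClassifier(new Logistic()); // level-1 model combines their predictions
        stacker.setNumFolds(10);                   // stacking folds for the level-1 training data
        stacker.setSeed(1);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(stacker, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```
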
That's it then.
We've been talking about combining multiple
models into an ensemble for learning, and
the analogy is with committees of humans.
Diversity helps, especially when learners
are unstable.
And we can create diversity in different ways.
In bagging, we create diversity by resampling
the training set.
In random forests, we create diversity by
choosing alternative branches to put in our
decision trees.
In boosting, we create diversity by focusing
on where the existing model makes errors;
and in stacking, we combine results from a
bunch of different kinds of learner using
another learner, instead of just voting.
There's a chapter in the course text on Ensemble
learning -- it's quite a large topic, really.
There's an activity that you should go and
do before we proceed to the next class, the
last class in this course.
We'll learn about putting it all together,
taking a more global view of the machine learning
process.
We'll see you then.
Bye for now!
