
Creating an ensemble of learners
is one way to make the learners
you've got better.
So we're not talking about
creating a new algorithm, but
instead assembling
several different algorithms, or
several different models, to
create an ensemble learner.
One thing I want to emphasize here is
that you can take what you learn here
about ensemble learners and plug it
right in to what you're already doing
with your KNN and
linear regression models.
Now, what we've been doing so far
is using one kind of
learning method, say KNN: we plug our
data into it and we learn a model.
We can query our model with an X and
it will give us a Y.
So this is not an ensemble learner,
this is just a single learner.
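That single-learner workflow can be sketched in a few lines of Python. This is only an illustration, assuming a simple train/query interface on 1-D data; the class name and method signatures are my assumptions, not the course's actual code.

```python
import numpy as np

# A minimal KNN regression learner: train on (X, Y) pairs,
# then query with new X values to get predicted Y values.
class KNNLearner:
    def __init__(self, k=3):
        self.k = k

    def train(self, x, y):
        # KNN just memorizes the training data.
        self.x = np.asarray(x, dtype=float)
        self.y = np.asarray(y, dtype=float)

    def query(self, points):
        # For each query X, average the Ys of the k nearest training points.
        preds = []
        for p in np.asarray(points, dtype=float):
            nearest = np.argsort(np.abs(self.x - p))[: self.k]
            preds.append(self.y[nearest].mean())
        return np.array(preds)

learner = KNNLearner(k=3)
learner.train([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(learner.query([2.4]))  # one X in, one Y out: [4.]
```

This is the "single learner" picture: one model, trained once, queried with an X to get a Y.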
And the idea with ensemble
learners is that we have
several different learners.
So, we might have a linear regression
based model, we might have a decision
tree based model, we might have
a support vector machine based model.

You could continue this with any
number of different algorithms.
They're all trained using the same data,
and so now we have,
in this case, four different models.
To query this ensemble of learners,
we query each model by itself and
combine the answers.
So if we wanted to query this ensemble
with X, we plug X into each model,
the same X, and then our Ys come out.
So we have a Y output from each of
these models; how do we combine them?
If we're doing classification where for
instance we're trying to identify
what the thing is, we might have
each of these Ys vote on what it is.
But we're doing regression, and so
the typical thing to do here is to
take the mean, and that is the result
for this ensemble learner.
We can then test this
overall ensemble learner
using this test data that we set aside.
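The query-each-model-and-average idea above can be sketched as an ensemble wrapper. The learner classes and their train/query interface here are assumptions for illustration, not the course's actual implementation.

```python
import numpy as np

# Two simple learners with the same train/query interface.
class LinRegLearner:
    def train(self, x, y):
        # Least-squares fit of y = m*x + b.
        self.m, self.b = np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)

    def query(self, points):
        return self.m * np.asarray(points, float) + self.b

class NearestLearner:
    def train(self, x, y):
        self.x, self.y = np.asarray(x, float), np.asarray(y, float)

    def query(self, points):
        # 1-nearest-neighbor lookup.
        return np.array([self.y[np.argmin(np.abs(self.x - p))]
                         for p in np.asarray(points, float)])

class EnsembleLearner:
    def __init__(self, learners):
        self.learners = learners

    def train(self, x, y):
        for learner in self.learners:   # every learner sees the same data
            learner.train(x, y)

    def query(self, points):
        # Plug the same X into each model, then take the mean of the Ys.
        # (For classification you would take a majority vote instead.)
        preds = np.array([learner.query(points) for learner in self.learners])
        return preds.mean(axis=0)

ens = EnsembleLearner([LinRegLearner(), NearestLearner()])
ens.train([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(ens.query([2.0]))  # mean of the two models' Ys: [4.]
```

Because the wrapper exposes the same train/query interface as a single learner, the ensemble can be tested on held-out data exactly like any individual model.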
Why ensembles?
Why do we use them,
why might they be better?
Well, there are a few reasons.
First of all,
ensembles often have lower error than
any individual method by itself.
Second, ensemble learners are
less prone to overfitting.
The ensemble of learners typically
does not overfit as much as any
individual learner by itself.
Now why is that?
Here's at least an intuitive answer.
Each kind of learner that you
might use has a sort of bias,
and it's easiest to explain
what I mean by bias
in terms of linear regression.
So clearly, with linear regression
our bias is that the data is linear.
KNN has its own kind of bias, decision
trees have their own kind of bias, but
when you put them together you tend
to reduce the biases because they're
fighting against each
other in some sort of way.
Anyway, that's what
an ensemble learner looks like
when we use multiple types of learners.
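The biases-canceling intuition can be seen in a tiny numerical example. This is a sketch I constructed, not data from the course: a line underfits y = x² one way, a nearest-neighbor step function another way, and averaging the two answers lands closer to the truth.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2  # clearly non-linear training data

# Model 1: least-squares line (its bias: "the data is linear").
m, b = np.polyfit(x, y, 1)

# Model 2: 1-nearest-neighbor (its bias: a locally flat step function).
def knn1(points):
    return np.array([y[np.argmin(np.abs(x - p))] for p in points])

# Evaluate on held-out points between the training points.
xt = np.array([0.4, 1.4, 2.4, 3.4])
yt = xt ** 2
lin_pred = m * xt + b
knn_pred = knn1(xt)
ens_pred = (lin_pred + knn_pred) / 2.0  # the ensemble: mean of the two Ys

def mse(pred):
    return float(np.mean((pred - yt) ** 2))

print(mse(lin_pred), mse(knn_pred), mse(ens_pred))
```

On these held-out points the averaged prediction has lower squared error than either model alone: the linear model's errors and the nearest-neighbor model's errors point in different directions, so they partly cancel.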
