Hi! I'm sitting here in New Zealand.
It's on the globe behind me.
That's New Zealand, at the top of the world,
surrounded by water.
But that's not where I'm from originally.
I moved here about 20 years ago.
Here on this map, of course, this is New Zealand
-- Google puts things with the north at the
top, which is probably what you're used to.
I came here from the University of Calgary
in Canada, where I was for many years.
I used to be head of computer science for
the University of Calgary.
But, originally, I'm from Belfast, Northern
Ireland, which is here in the United Kingdom.
So, my accent actually is Northern Irish,
not New Zealand.
This is not a New Zealand accent.

We're going to talk here, in the last lesson
of Class 3, about another machine learning
method: nearest neighbor, or instance-based,
learning.
When people talk about rote learning, they
just talk about remembering stuff without really
thinking about it.
It's the simplest kind of learning.
Nearest neighbor implements rote learning.
It just remembers the training instances,
and then, to classify a new instance, it searches
the training set for one that is most like
the new instance.
The representation of the knowledge here is
just the set of instances.
It's a kind of lazy learning.
The learner does nothing until it has to make
some predictions.
Confusingly, it's also called instance-based
learning.
Nearest neighbor learning and instance-based
learning are the same thing.
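To make the lazy idea concrete, here's a minimal sketch of 1-nearest-neighbor classification. The class and method names are hypothetical (this is not Weka's API), and it assumes purely numeric attributes with the Euclidean distance discussed below.

```java
import java.util.List;

// Minimal 1-nearest-neighbor sketch (hypothetical names, not Weka's API).
// "Training" is just storing the instances; all the work happens at
// prediction time -- that's the lazy part.
class NearestNeighbor {
    private final List<double[]> instances; // numeric attribute vectors
    private final List<String> labels;      // class label of each instance

    NearestNeighbor(List<double[]> instances, List<String> labels) {
        this.instances = instances; // lazy learning: nothing else to do here
        this.labels = labels;
    }

    // Classify by scanning for the stored instance most like the new one.
    String classify(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < instances.size(); i++) {
            double d = squaredEuclidean(instances.get(i), x);
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return labels.get(best);
    }

    // Sum of squared attribute differences; the square root is skipped
    // because it doesn't change which instance comes out closest.
    private static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```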
Here is just a little picture of 2-dimensional
instance space.

The blue points and the white points are two
different classes -- yes and no, for example.
Then we've got an unknown instance, the red
one.
We want to know which class it's in.
So, we simply find the closest instance in
each of the classes and see which is closest.
In this case, it's the blue class.
So, we would classify that red point as though
it belonged to the blue class.
If you think about this, that's implicitly
drawing a line between the two clouds of points.
It's a straight line here, the perpendicular
bisector of the line that joins the two closest points.
The nearest neighbor method produces a linear
decision boundary.
Actually, it's a little bit more complicated
than that.
It produces a piecewise linear decision boundary,
sometimes made up of a whole bunch of little
linear segments.
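To see why each piece of the boundary is straight, note that the points x equidistant from two training points a and b satisfy

```latex
% Points x equidistant from training points a and b form the
% perpendicular bisector of the segment joining them:
\[
  \|x - a\|^2 = \|x - b\|^2
  \;\Longleftrightarrow\;
  2\,(b - a)^{\top} x = \|b\|^2 - \|a\|^2 ,
\]
% which is linear in x: a straight line in two dimensions,
% a hyperplane in more.
```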
Of course, the trick is what do we mean by
"most like".

We need a similarity function, and conventionally,
people use the regular distance function,
the Euclidean distance, which is the sum of
the squares of the differences between the attributes.
Actually, it's the square root of the sum
of the squares, but since we're just comparing
two instances, we don't need to take the square root.
Or, you might use the Manhattan or city block
distance, which is the sum of the absolute
differences between the attribute values.
Of course, I've been talking about numeric
attributes here.
If attributes are nominal, we need to define
the difference between different attribute values.
Conventionally, people just say the distance
is 1 if the attribute values are different
and 0 if they are the same.
It might be a good idea with nearest neighbor
learning to normalize the attributes so that
they all lie between 0 and 1, so the distance
isn't skewed by some attribute that happens
to be on some gigantic scale.
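As a sketch of those definitions (hypothetical helper methods, not Weka's internals):

```java
// Hypothetical helpers illustrating the distance measures just described;
// not Weka's internals.
class Distances {
    // Euclidean distance: square root of the sum of squared differences
    // (the root can be skipped when we only compare distances).
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Manhattan (city block) distance: sum of absolute differences.
    static double manhattan(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Nominal attributes: distance 1 if the values differ, 0 if the same.
    static double nominal(String a, String b) {
        return a.equals(b) ? 0 : 1;
    }

    // Min-max normalization to [0,1], so an attribute on a gigantic
    // scale can't skew the distance.
    static double normalize(double value, double min, double max) {
        return (value - min) / (max - min);
    }
}
```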
What about noisy instances?

If we have a noisy dataset, then by accident
we might find an incorrectly classified training
instance as the nearest one to our test instance.
You can guard against that by using k nearest neighbors.
k might be 3 or 5, and you look for the 3
or the 5 nearest neighbors and choose the
majority class amongst those when classifying
an unknown point.
That's the k-nearest-neighbor method.
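Here's a hedged sketch of that voting step (hypothetical names again, assuming numeric attributes):

```java
import java.util.*;

// k-nearest-neighbor voting sketch (hypothetical names, not Weka's
// internals): find the k closest training instances and take the
// majority class among them, to guard against the odd mislabeled one.
class Knn {
    static String classify(List<double[]> train, List<String> labels,
                           double[] x, int k) {
        // Order training indices by squared Euclidean distance to x
        // (a full sort is simple but wasteful; see the remarks on
        // smarter data structures later in the lesson).
        Integer[] idx = new Integer[train.size()];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> {
            double sum = 0;
            for (int j = 0; j < x.length; j++) {
                double d = train.get(i)[j] - x[j];
                sum += d * d;
            }
            return sum;
        }));

        // Majority vote among the k nearest neighbors.
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++)
            votes.merge(labels.get(idx[i]), 1, Integer::sum);
        return Collections.max(votes.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }
}
```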
In Weka, it's called IBk (instance-based learning
with parameter k), and it's in the lazy class.
Let's open the glass dataset.
Go to Classify and choose the lazy classifier
IBk.
Let's just run it.
We get an accuracy of 70.6%.
The model is not really printed here, because
there is no model.

It's just the set of training instances.
We're using 10-fold cross-validation, of course.
Let's change the value of k; this kNN parameter
is the k value.
It's set by default to 1.
(The number of neighbors to use.) We'll change
that to, say, 5 and run that.
In this case, we get a slightly worse result:
67.8% with k set to 5.
This is not such a noisy dataset, I guess.
If we change it to 20 and run it again, we
get 65% accuracy, slightly worse again.
If we had a noisy dataset, we might find that
the accuracy figures improved as k got a little
bit larger.
Then, it would always start to decrease again.
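The same experiment can be driven from Weka's Java API instead of the Explorer. This is a sketch assuming Weka 3 on the classpath; the glass.arff path is illustrative.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch of the Explorer steps above via Weka's API (assumes Weka 3;
// the dataset path is illustrative).
public class IBkGlass {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/glass.arff");
        data.setClassIndex(data.numAttributes() - 1);

        for (int k : new int[] {1, 5, 20}) {
            IBk ibk = new IBk();
            ibk.setKNN(k); // the kNN parameter shown in the Explorer
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(ibk, data, 10, new Random(1));
            System.out.printf("IBk, k=%d: %.1f%%%n", k, eval.pctCorrect());
        }

        // Baseline for comparison: ZeroR just predicts the majority class.
        Evaluation base = new Evaluation(data);
        base.crossValidateModel(new ZeroR(), data, 10, new Random(1));
        System.out.printf("ZeroR baseline: %.1f%%%n", base.pctCorrect());
    }
}
```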

If we set k to be an extreme value, close
to the size of the whole dataset, then we're
taking the distance of the test instance
to all of the points in the dataset and averaging
those, which will probably give us something
close to the baseline accuracy.
Here, if I set k to be a ridiculous value
like 100, I'm going to take the 100 nearest
instances and average their classes.
We get an accuracy of 35%, which, I think,
is pretty close to the baseline accuracy for
this dataset.
Let me just check that with ZeroR: the
baseline accuracy is indeed 35%.
Nearest neighbor is a really good method.
It's often very accurate.
It can be slow.
A simple implementation would involve scanning
the entire training dataset to make each prediction,
because we've got to calculate the distance
of the unknown test instance from all of the
training instances to see which is closest.

There are more sophisticated data structures
that can make this faster, so you don't need
to scan the whole dataset every time.
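For example, Weka's IBk can be pointed at a kd-tree search instead of its default linear scan; a sketch, assuming the weka.core.neighboursearch package:

```java
import weka.classifiers.lazy.IBk;
import weka.core.neighboursearch.KDTree;

// Sketch (assuming Weka 3's weka.core.neighboursearch package): swap
// IBk's default linear scan for a kd-tree, which finds nearest neighbors
// without measuring the distance to every training instance.
public class FasterSearch {
    public static IBk kdTreeIBk(int k) {
        IBk ibk = new IBk();
        ibk.setKNN(k);
        ibk.setNearestNeighbourSearchAlgorithm(new KDTree());
        return ibk;
    }
}
```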
The method also assumes all attributes are equally important.
If that wasn't the case, you might want to
look at schemes for selecting or weighting
attributes depending on their importance.
If we've got noisy instances, then we can
use a majority vote over the k nearest neighbors,
or we might weight instances according to
their prediction accuracy.
Or, we might try to identify reliable prototypes,
one for each of the classes.
This is a very old method.
Statisticians have used k-nearest-neighbor
since the 1950s.
There's an interesting theoretical result.
If the number n of training instances approaches
infinity, and k also gets larger in such a
way that k/n approaches 0 but k itself approaches
infinity, then the error of the k-nearest-neighbor
method approaches the theoretical minimum
error for that dataset.
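In symbols, with err_Bayes standing for that theoretical minimum (the Bayes error):

```latex
% As n grows and k grows more slowly than n, the k-NN error rate
% approaches the theoretical minimum (Bayes) error for the dataset:
\[
  n \to \infty, \qquad k \to \infty, \qquad \frac{k}{n} \to 0
  \quad\Longrightarrow\quad
  \operatorname{err}_{k\text{-NN}} \;\longrightarrow\; \operatorname{err}_{\text{Bayes}} .
\]
```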
There is a theoretical guarantee that with
a huge dataset and large values of k, you're
going to get good results from nearest neighbor
learning.
There's a section in the text, Section 4.7
on Instance-based learning.
This is the last lesson of Class 3.
Off you go and do the activity, and I'll see
you in Class 4.
Bye for now!
