
Hello!
Well, we've come to Class 5, the last class
of More Data Mining with Weka.
Congratulations on having got this far.
In this class, we're going to look at some miscellaneous
things.
We'll have a couple of lessons on neural networks
and the Multilayer Perceptron.
Then we'll take a quick look at learning curves
and performance optimization in Weka.
Then we'll come back and have another look
at the ARFF file format before we do a summary
in Lesson 5.6.
You've been listening to me talking for quite
a long time now, and I just wonder if you
might be interested in finding out a little
bit more about me.
If so, if you go to the web and search for
"A stroll through the gardens of computer

Chinese: 
大家好！
我们来学习第五部分，More Data Mining with Weka的最后一部分。
恭喜大家。
这一部分，我们要学习其它方面内容。
有几节课我们学习神经网络和多层感知器。
然后我们在Weka中看看学习曲线和性能优化。
最后在课程5.6节总结之前，我们要再
回顾ARFF文件格式。
你已经听我讲课很长时间了。
如果你想对我有些许了解，
可以在互联网上搜索“A stroll through the gardens of computer science”，

Chinese: 
括在引号中。
一点都不能错：“A stroll through the gardens of computer science”。
你会得到一条搜索结果，我得到了一条结果：News from New Zealand。
这其实是一个对我的访谈，一个较长的采访。
从下一页开始。
和我的对话。
你会获知我从哪里来，都做过什么，一直从事什么，在想什么，
以及我的一些个人见解。
或许你会感兴趣，或许不会。
这取决于你。
我们回到课程5.1。
我们将要学习简单神经网络。
现在，很多人热衷于神经网络。
我并不热衷。
我认为这是一个不错的术语，因为“神经网络”使我们联想到左图，
很酷的脑机制图。

English: 
science" in quotes.
So you've got to get it exactly right: "A
stroll through the gardens of computer science".
You'll get just one result; at least, I got just
one result: News from New Zealand.
This, in fact, is an interview with me, an
extended interview.
It starts on the next page: a dialogue with me.
You'll learn where I came from and what I've
done and what I've been doing and what I've
been thinking of and some of my biases.
So, that might be interesting or not.
It's up to you.
Let's get back to the lesson, Lesson 5.1.
We're going to talk about simple neural networks.
Now, a lot of people love neural networks.
I'm not one of them.
I think it's a brilliant term, "neural network",
because it conjures up the image on the left
of some really cool brain-like mechanism.

Actually, you should think of the rather grungy
picture on the right, a linear sum.
We'll talk about that in a minute.
The very name is suggestive of intelligence.
However, the reality, I think, is not.
In this lesson, we're going to talk about
the simplest neural network, the Perceptron.
It's a simple learning method that determines
the class in a two-class dataset using a linear
combination of attributes.
For a test instance a--that is, with attribute
values a1, a2, a3--we take the sum w0 plus w1a1
plus w2a2 and so on over all the attributes.
We'll express that as a sigma from j=0.
We're implicitly defining a0 as 1 here just
to make the notation look nice.
If the result, x, is greater than zero, then
we're going to say that instance belongs to
class 1;
otherwise, we're going to say it belongs to
class 2.
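
Written out, the rule just described looks like this (a sketch in LaTeX; k stands for the number of attributes, which the spoken description leaves implicit):

```latex
x = \sum_{j=0}^{k} w_j a_j \qquad (a_0 = 1)

\text{class} =
\begin{cases}
1 & \text{if } x > 0 \\
2 & \text{otherwise}
\end{cases}
```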

This, of course, works most naturally with
numeric attributes.
Where do the weights come from? That's the
big question.
We have to learn them.
Here's the algorithm.
We start by setting all weights to zero until
all the instances in the training data are
classified correctly.
We continue for each instance in the training
data.
If it's classified correctly, then we do nothing.
If it's classified incorrectly, then, if it
belongs to the first class we'll add it to
the weight vector, and if it belongs to the
second class, we'll subtract it from the weight
vector.
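
As a minimal sketch of that update rule (my own illustrative code, not Weka's implementation; it assumes each instance is a double array with the implicit a0 = 1 already prepended, and labels of +1 for the first class and -1 for the second):

```java
// Illustrative perceptron training sketch, not Weka's implementation.
// instances[i] is a double[] with instances[i][0] = 1; labels[i] is +1 or -1.
class PerceptronSketch {
    static double[] train(double[][] instances, int[] labels) {
        double[] w = new double[instances[0].length]; // all weights start at zero
        boolean mistakes = true;
        while (mistakes) {  // cycle until every instance is classified correctly
            mistakes = false;
            for (int i = 0; i < instances.length; i++) {
                double x = 0;
                for (int j = 0; j < w.length; j++) {
                    x += w[j] * instances[i][j];
                }
                int predicted = (x > 0) ? 1 : -1;
                if (predicted != labels[i]) {
                    // misclassified: add the instance for class +1, subtract for class -1
                    for (int j = 0; j < w.length; j++) {
                        w[j] += labels[i] * instances[i][j];
                    }
                    mistakes = true;
                }
            }
        }
        return w; // note: the loop terminates only if the data are linearly separable
    }
}
```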
There's a theorem, the Perceptron Convergence
Theorem, which says that this process will converge
if you cycle repeatedly through the training data,
perhaps many times.
It will converge provided the problem is
linearly separable, that is, there exists
a straight line that separates the two classes,
class 1 and class 2.
Actually, we talked about linear decision
boundaries before when we talked about Support Vector Machines.
They were also restricted to linear boundaries,
but they can get more complex boundaries using
the "Kernel trick", which I mentioned but
did not explain back then in Data Mining with
Weka Lesson 4.5.
And I'm not going to explain it now, but I'm
just going to tell you that the Perceptron
can use the same trick to get non-linear boundaries.
The Weka implementation is called the Voted
Perceptron, a slightly different algorithm.
It stores all of the weight vectors, all versions
of the weight vector, and lets them vote on
test examples.
To reflect their importance, the weight vectors are
themselves weighted according to the length of time
they survived before the weights got changed.
You know, we're going to use a weight vector,
keep classifying training instances, and when
the system makes a mistake, then we're going
to change the weight vector.
The survival time is some kind of indication
of how successful that version of the weight
vector is.
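
A rough sketch of the voting step (again my own simplification, not the exact algorithm in Weka; it assumes each stored weight vector is paired with a count of how many instances it survived):

```java
import java.util.List;

// Illustrative voted-perceptron prediction sketch, not Weka's implementation.
class VotedPerceptronSketch {
    // weights.get(i) survived counts.get(i) instances before a mistake changed it
    static int predict(List<double[]> weights, List<Integer> counts, double[] a) {
        double vote = 0;
        for (int i = 0; i < weights.size(); i++) {
            double x = 0;
            double[] w = weights.get(i);
            for (int j = 0; j < w.length; j++) {
                x += w[j] * a[j];
            }
            // each stored weight vector casts a +1/-1 vote, weighted by survival time
            vote += counts.get(i) * ((x > 0) ? 1 : -1);
        }
        return (vote > 0) ? 1 : -1; // class 1 or class 2
    }
}
```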

This is claimed to have many of the advantages of
Support Vector Machines, but it's faster,
simpler, and nearly as good.
We'll take a look.
I'm going to look at the ionosphere dataset.
I've got it open here in Weka.
I'm going to go to Classify, and the VotedPerceptron
is in the functions category.
If I select that--there's a bunch of options,
but we won't worry about that--and just run
it using cross-validation,
I get 86%.
If I were to choose SMO, then I would get
89%.
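
The same experiment can also be run from Weka's Java API rather than the Explorer; here is a minimal sketch (the file path is an assumption, and the exact figures will depend on your Weka version and random seed):

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.VotedPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class VotedPerceptronDemo {
    public static void main(String[] args) throws Exception {
        // Load the ionosphere data; the path is an assumption, adjust to your copy
        Instances data = DataSource.read("ionosphere.arff");
        data.setClassIndex(data.numAttributes() - 1); // class is the last attribute

        // 10-fold cross-validation with default options, as in the Explorer
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new VotedPerceptron(), data, 10, new Random(1));
        System.out.printf("VotedPerceptron accuracy: %.1f%%%n", eval.pctCorrect());
    }
}
```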
Back to the slide.
For the German credit data, we also get slightly
better performance with SMO.
For the breast cancer dataset, they are almost
exactly the same, and for the diabetes, again
SMO is a little bit better.
It's certainly true that the VotedPerceptron
is faster, maybe 2 times, 5 times, perhaps
up to 10 times, depending on the dataset.
The Perceptron's got a long history.
The basic Perceptron algorithm was first published in 1957.
It was derived from theories about how the
brain works.
The name is short for "a perceiving
and recognizing automaton",
and a guy called Rosenblatt published a book in
1958 called "Principles of neurodynamics:
Perceptrons and the theory of brain mechanisms".
Very suddenly, in 1970, it went out of fashion
with a book by two well-known computer scientists,
called "Perceptrons", and they showed that
there were some simple things that Perceptrons
simply couldn't do.
They proved theorems about what Perceptrons
could and couldn't do.
This is the cover of their famous book that
basically took Perceptrons off the map.
Until 1986, when they came back rebranded as
"connectionism"; the movement was the "connectionist
movement", and a couple of guys wrote another book, "Parallel
distributed processing".
Some people claim that artificial neural networks
mirror brain function, just like Frank Rosenblatt
did back in the 50's.
The main form of Perceptron the connectionists
use is a multilayer Perceptron, which is capable
of drawing nonlinear decision boundaries using
an algorithm called the backpropagation algorithm
that we'll look at in the next lesson.
Here's the summary.
The basic Perceptron algorithm implements
a linear decision boundary.
It's very reminiscent of classification by
regression.
It works with numeric attributes.
It's an iterative algorithm, and the result depends
on the order in which it encounters the training instances.
Actually, many years ago, in 1971, I described
a simple improvement to the Perceptron in
my Master's thesis, but I'm still not very
impressed with the Perceptron stuff;
sorry about that.
Recently, there have been some improvements:
the use of the Kernel trick to get more complex
boundaries, and this Voted Perceptron strategy
with multiple weight vectors and voting.
There are some chapters in the textbook about
this,
and there's an activity, which will get you
learning a little bit more about this very
simple Perceptron algorithm.
We'll see you in the next lesson.
Until then, bye for now!
