
Hi! This is the last lesson in the course
Data Mining with Weka, Lesson 5.4 - Summary.
We'll just have a quick summary of what we've
learned here.
One of the main points I've been trying to
convey is that there's no magic in data mining.
There's a huge array of alternative techniques,
and they're all fairly straightforward algorithms.
We've seen the principles of many of them.
Perhaps we don't understand the details, but
we've got the basic idea of the main methods
of machine learning used in data mining.
And there is no single, universal best method.

Data mining is an experimental science.
You need to find out what works best on your
problem.
Weka makes it easy for you.
Using Weka you can try out different methods,
you can try out different filters, different
learning methods.
You can play around with different datasets.
It's very easy to do experiments in Weka.
Perhaps you might say it's too easy, because
it's important to understand what you're doing,
not just blindly click around and look at
the results.
That's what I've tried to emphasize in this
course -- understanding and evaluating what
you're doing.
There are many pitfalls you can fall into
if you don't really understand what's going
on behind the scenes.
It's not a matter of just blindly applying
the tools in the workbench.
We've stressed in the course the focus on
evaluation, evaluating what you're doing,
and the significance of the results of the
evaluation.
Different algorithms differ in performance,
as we've seen.

In many problems, it's not a big deal.
The differences between the algorithms are
really not very important in many situations,
and you should perhaps be spending more time
on looking at the features and how the problem
is described and the operational context that
you're working in, rather than stressing about
getting the absolute best algorithm.
It might not make all that much difference
in practice.
Use your time wisely.
There's a lot of stuff that we've missed out.
I'm really sorry I haven't been able to cover
more of this stuff.
There's a whole technology of filtered classifiers,
where you want to filter the training data,
but not the test data.
That's especially true when you've got a supervised
filter, where the results of the filter depend
on the class values of the training instances.
You want to filter the training data, but
not the test data, or maybe take a filter
designed for the training data and apply the
same filter to the test data without re-optimizing
it for the test data, which would be cheating.
You often want to do this during cross-validation.
The trouble in Weka is that you can't get
hold of those cross-validation folds; it's
all done internally.
Filtered classifiers are a simple way of dealing
with this problem.
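Weka handles this with the FilteredClassifier (in weka.classifiers.meta), which refits the filter on each training fold and applies it, unchanged, to the test fold. Here is a toy Python sketch of the principle only, not Weka's API; the MeanCenterFilter is a made-up stand-in for a real filter:

```python
# Sketch (not Weka's API): the FilteredClassifier idea -- fit a filter
# on each training fold only, then apply it, frozen, to the test fold.

def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        start += size
        yield train, test

class MeanCenterFilter:
    """A stand-in 'filter': centers values on the training-fold mean.
    Fitting it on all the data at once would leak test information."""
    def fit(self, values):
        self.mean = sum(values) / len(values)
        return self
    def transform(self, values):
        return [v - self.mean for v in values]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for train_idx, test_idx in kfold_indices(len(data), 3):
    # fit on the training fold only...
    f = MeanCenterFilter().fit([data[i] for i in train_idx])
    # ...then apply the same frozen filter to the test fold
    test_transformed = f.transform([data[i] for i in test_idx])
```

The point is in the last two lines: the filter never sees the test fold while being fitted, which is exactly what re-optimizing it on the test data (the "cheating" above) would violate.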
We haven't talked about costs of different
decisions and different kinds of errors, but
in real life different errors have different
costs.
We've talked about optimizing the error rate,
or the classification accuracy, but really,
in most situations, we should be talking about
costs, not raw accuracy figures, and these
are different things.
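To see how cost and accuracy can disagree, here is a small sketch with invented confusion and cost matrices (all numbers hypothetical):

```python
# Two classifiers on the same (hypothetical) test set: raw accuracy
# ranks them one way, expected cost the other.

def accuracy(cm):
    """Fraction correct; cm[actual][predicted] holds counts."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def total_cost(cm, cost):
    """Sum of count * cost over all (actual, predicted) cells."""
    return sum(cm[a][p] * cost[a][p]
               for a in range(len(cm)) for p in range(len(cm)))

# rows = actual class (0 = negative, 1 = positive), cols = predicted
cm_a = [[90, 10], [5, 95]]   # classifier A: 92.5% accurate
cm_b = [[80, 20], [1, 99]]   # classifier B: 89.5% accurate
cost = [[0, 1], [10, 0]]     # a missed positive costs 10x a false alarm

print(accuracy(cm_a), total_cost(cm_a, cost))  # 0.925, cost 60
print(accuracy(cm_b), total_cost(cm_b, cost))  # 0.895, cost 30
```

Classifier B loses on accuracy but halves the total cost, so under this cost matrix B is the one to deploy.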
There's a whole panel in the Weka Explorer
for attribute selection, which helps you select
a subset of attributes to use when learning,
and in many situations it's really valuable,
before you do any learning, to select an appropriate
small subset of attributes to use.
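One common ranking criterion in attribute selection is information gain. As an illustrative sketch (the toy dataset and attribute names here are invented), it can be computed like this:

```python
# Rank attributes by information gain: how much splitting on the
# attribute reduces the entropy of the class labels.
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def info_gain(attr_values, labels):
    """Entropy reduction from splitting the data on an attribute."""
    n = len(labels)
    splits = {}
    for v, y in zip(attr_values, labels):
        splits.setdefault(v, []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in splits.values())
    return entropy(labels) - remainder

# Toy data: attribute a1 predicts the class perfectly, a2 not at all.
labels = ['yes', 'yes', 'no', 'no']
a1 = ['t', 't', 'f', 'f']
a2 = ['t', 'f', 't', 'f']
attrs = {'a1': a1, 'a2': a2}
ranked = sorted(attrs, key=lambda name: info_gain(attrs[name], labels),
                reverse=True)
print(ranked)  # ['a1', 'a2'] -- a1 has gain 1.0, a2 has gain 0.0
```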
There are a lot of clustering techniques in
Weka.
Clustering is where you want to learn something
even when there is no class value: you want
to cluster the instances according to their
attribute values.
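As a minimal illustration of that idea, grouping instances by attribute values with no class label in sight, here is a toy one-dimensional k-means sketch (k-means is one of the clusterers Weka provides, as SimpleKMeans; the data here is invented):

```python
# Minimal 1-D k-means: cluster instances by attribute value alone,
# with no class labels -- the essence of clustering described above.

def kmeans_1d(values, centers, iters=10):
    """Lloyd's algorithm on 1-D data with given initial centers."""
    for _ in range(iters):
        # assignment step: each value goes to its nearest center
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, clusters = kmeans_1d(values, [0.0, 10.0])
print(centers)  # roughly [1.0, 9.0]: two natural groups found
```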
Association rules are another kind of learning
technique where we're looking for associations
between attributes.
There's no particular class, but we're looking
for any strong associations between any of
the attributes.
Again, that's another panel in the Explorer.
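The standard measures behind association rules are support and confidence. A sketch with invented transactions and a candidate rule {bread} -> {butter}:

```python
# Support: how often an itemset occurs. Confidence: of the transactions
# matching the rule's left-hand side, how many also match the right.

def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, lhs, rhs):
    """Of transactions containing lhs, the fraction also containing rhs."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)

transactions = [
    {'bread', 'butter'},
    {'bread', 'butter', 'milk'},
    {'bread'},
    {'milk'},
]
print(support(transactions, {'bread', 'butter'}))       # 0.5
print(confidence(transactions, {'bread'}, {'butter'}))  # ~0.667
```

An association-rule learner searches for rules whose support and confidence both exceed user-set thresholds, over any combination of attributes.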
Text classification.
There are some fantastic text filters in Weka
which allow you to handle textual data as
words, or as characters, or n-grams (sequences
of three, four, or five consecutive characters).
You can do text mining using Weka.
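The character n-gram idea is simple to sketch. This toy function (not Weka's tokenizer) just enumerates every run of n consecutive characters:

```python
# Character n-grams: all substrings of length n, in order.
from collections import Counter

def char_ngrams(text, n):
    """All runs of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("weka", 3))  # ['wek', 'eka']

# A text filter would typically turn these into frequency features:
features = Counter(char_ngrams("banana", 2))  # e.g. 'an' occurs twice
```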
Finally, we've focused exclusively on the
Weka Explorer, but the Weka Experimenter is
also worth getting to know.
We've done a fair amount of rather boring,
tedious calculations of means and standard
deviations manually by changing the random-number
seed and running things again.
That's very tedious to do by hand.
The Experimenter makes it very easy to do
this automatically.
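The manual procedure just described, rerun with a new seed, collect the numbers, average them, looks like this in outline (run_evaluation is a hypothetical stand-in for one cross-validated run, not a Weka call):

```python
# What the Experimenter automates: repeat an evaluation over several
# random seeds and report the mean and standard deviation.
import random
from statistics import mean, stdev

def run_evaluation(seed):
    """Hypothetical stand-in for one cross-validated accuracy estimate."""
    rng = random.Random(seed)
    return 0.85 + rng.uniform(-0.02, 0.02)  # pretend accuracy near 85%

accuracies = [run_evaluation(seed) for seed in range(1, 11)]
print(f"accuracy: {mean(accuracies):.3f} +/- {stdev(accuracies):.3f}")
```

The Experimenter also runs significance tests across datasets and schemes, which is what makes the comparison meaningful rather than just a pair of numbers.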
So, there's a lot more to learn, and I'm wondering
if you'd be interested in an Advanced Data
Mining with Weka course.
I'm toying with the idea of putting one on,
and I'd like you to let us know what you think
about the idea, and what you'd like to see
included.
Let me just finish off here with a final thought.
We've been talking about data, data mining.
Data is recorded facts, a change of state
in the world, perhaps.
That's the input to our data mining process,
and the output is information, the patterns
-- the expectations -- that underlie that
data: patterns that can be used for prediction
in useful applications in the real world.

We're going from data to information.
Moving up in the world of people, not computers,
"knowledge" is the accumulation of your entire
set of expectations, all the information that
you have and how it works together -- a large
store of expectations and the different situations
where they apply.
Finally, I like to define "wisdom" as the
value attached to knowledge.
I'd like to encourage you to be wise when
using data mining technology.
You've learned a lot in this course.
You've got a lot of power now that you can
use to analyze your own datasets.
Use this technology wisely for the good of
the world.
That's my final thought for you.
There is an activity associated with this
lesson, a little revision activity.

Go and do that, and then do the final assessment,
and we will send you your certificate if you
do well enough.
Good luck! It's been good talking to you,
and maybe we'll see you in an advanced version
of this course.
Bye for now!
