
English: 
Hi! Well, they say all good things come to
an end, and this is the end of "More Data Mining
with Weka":
the last class.
Let's just summarize a few things here.
This summary is actually from the previous
course, "Data Mining with Weka".
These are the main messages I wanted to convey
there, and it's the same main messages this time.
There's no magic in data mining.
There's no single universal "best method".
It's an experimental science.
Weka makes it easy to experiment, especially
now that you know how to use the Experimenter.
But there are many pitfalls, and many ways
to go wrong.
You really need to understand what you're doing, and focus closely on evaluation
and statistical significance using the Experimenter.

I talked more about all of these points at
the end of the last course.
You could go back and look at that video if
you'd like some more expansion on these.
This slide is also from the last course.
This is what we missed from "Data Mining with
Weka": filtered classifiers, working with cost
matrices, selecting attributes, clustering,
association rules, text classification, and
using the Experimenter.
These should all sound very familiar to you,
because we've talked about them all extensively
in this course.
Plus more besides!
We talked about big data.
You experienced big data.
We talked about the Command Line Interface;
the Knowledge Flow Interface;
streaming data using the Command Line Interface
through NaiveBayesUpdateable;
discretization and discretization filters;
the difference between rules and trees --
the similarities and differences;

Multinomial Naive Bayes for text classification.
We had a little look at neural networks: the
simple Perceptron and the Multilayer Perceptron.
We learned about ROC curves, and learning curves,
and some more stuff about the ARFF format
and the XML version of it.
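Since ROC curves came up in that list, here is a minimal sketch of how one is built from classifier scores. This is my own illustration, not Weka's code, and the scores and labels below are invented; it assumes no tied scores (ties would need a shared step).

```python
def roc_points(scores, labels):
    """Compute ROC curve points from classifier scores.

    Sort instances by decreasing score; sweeping the threshold down,
    each positive instance moves the curve up (TPR) and each negative
    instance moves it right (FPR).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    P = sum(labels)              # number of positives
    N = len(labels) - P          # number of negatives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in pairs:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / N, tp / P))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical scores: higher should mean "more likely positive".
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc(pts))   # prints 0.888..., i.e. 8/9
```

A perfect ranker (all positives scored above all negatives) gives AUC 1.0; random scoring gives about 0.5.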
We've done a lot.
You've done a lot, actually, and I congratulate
you on having got this far.
This has been pretty intensive stuff.
You've learned a lot about a lot of important
things.
Of course, there's always more! Time series
analysis is a really important area:
how to do data mining on time series.
Stream-oriented algorithms:
NaiveBayesUpdateable is stream-oriented,
but there exist stream-oriented versions
of other algorithms, like decision tree methods.
They're in the MOA package (Massive Online Analysis),
also from the University of Waikato.

Multi-instance learning, where it's not single
instances, but bags containing several instances
that are labeled positive or negative.
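The standard multi-instance assumption can be sketched in a few lines: a bag is positive if at least one instance in it is positive. The per-instance threshold rule here is made up purely for illustration; a real multi-instance learner would learn the instance-level model from the bag labels.

```python
# Toy sketch of the standard multi-instance assumption: a bag is
# positive iff at least one of its instances is positive.
def instance_positive(x, threshold=0.5):
    # Hypothetical instance-level rule, invented for this example.
    return x > threshold

def classify_bag(bag, threshold=0.5):
    """Label a whole bag, not individual instances."""
    return any(instance_positive(x, threshold) for x in bag)

bags = {
    "bag_a": [0.1, 0.2, 0.9],   # one strong instance -> positive bag
    "bag_b": [0.1, 0.3, 0.4],   # no instance fires   -> negative bag
}
for name, bag in bags.items():
    print(name, classify_bag(bag))   # bag_a True, bag_b False
```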
One-class classification, where you don't
have any information about the negative class,
just about the positive class.
That makes things very difficult, but there
are some things you can do.
Other data mining packages.
There's a package called R, which has a lot
of excellent resources.
Actually, you can interface to this from Weka,
so Weka can take advantage of those resources.
Also, there's the LibSVM package for support
vector machines and the LibLinear package
for linear classification.
They can all be reached through the Weka interface
with the appropriate wrapper package.
There's a distributed version of Weka with
the Hadoop system for distributing processing.
Finally, there's a technique called "latent
semantic analysis" that you really need to
know about to work on text classification.
All of these things are available as packages
for Weka.
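Of those, latent semantic analysis is simple enough to sketch directly. The snippet below is my own NumPy illustration, not the Weka package: it factors a toy term-document count matrix with a truncated SVD so that documents are compared in a low-rank "topic" space. The documents and the choice of two latent dimensions are invented.

```python
import numpy as np

# Made-up documents: two about data mining, two about music.
docs = [
    "weka data mining",
    "data mining weka weka",
    "guitar music guitar",
    "music guitar blues",
]
vocab = sorted({w for d in docs for w in d.split()})

# Term-document count matrix: rows are words, columns are documents.
A = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

# Truncated SVD: keep k latent dimensions ("topics").
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # each row: a document in topic space

def cos(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents about the same topic should be much closer in LSA space.
print(cos(doc_vecs[0], doc_vecs[1]))   # near 1: both data-mining documents
print(cos(doc_vecs[0], doc_vecs[2]))   # near 0: different topics
```

The useful property for text classification is that documents can be similar in the latent space even when they share few exact words.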

The new version of Weka has a system of "packages",
where a lot of stuff has been taken out of
the core and put into optional, downloadable
packages.
That would be interesting to learn about, and
I'm wondering whether we should be thinking
of an "Advanced Data Mining with Weka" course.
We called this course "More Data Mining", which
cunningly leaves room for a third, advanced course.
I'd be interested to know if you'd be interested
in that kind of thing.
Here are just a few, final remarks.
Data mining is really important.
They're talking about data as the new oil.
The economic and social importance of data
mining will rival that of the oil economy --
some people say by 2020;
it might be happening as we speak.
You're right in there.
Data mining is a wonderful thing to know about.
It's an exploding field.
It will continue to explode.
Personal data is becoming a new economic asset
class.

You know, it used to be that the data revolution,
the internet revolution, was about our ability
to learn stuff from the internet.
You know, Wikipedia and all the things you
can learn.
But a lot of it now is about personal data,
our own personal data and the economic importance
of that.
We need a lot more trust than we have at the
moment between individuals and governments
and the private sector in order to take full
advantage of this new, economic asset.
We had a lesson on ethics in the last course.
We haven't had a lesson on ethics here, but
it's just as important.
I would urge you to think ethically whenever
you're working with data.
"A person without ethics is a wild beast loosed
upon this world," Albert Camus said.
I don't want to loose a whole bunch of wild
beasts through this course.

So please think of ethics and what is ethical
and the right kind of thing to do when you're
working with other people's data.
Finally, wisdom:
you know, the value attached to knowledge.
This is the really important thing. Jimi
Hendrix is supposed to have said: 
"knowledge speaks, but wisdom listens", 
which is worth pondering.
There's no activity associated with this lesson.
There's just the end-of-class assessment,
which you should go and do now.
If you do well enough in that and the mid-class
assessment, then we'll be sending you a signed
Statement of Completion from the University
of Waikato.
Meanwhile, I've enjoyed giving this course,
and I hope maybe I'll meet you again in another
version of this course.
But for now, I'm just going to relax and play
some music while you do the assessment.

Bye for now!
