
Chinese: 
大家好！
欢迎学习More Data Mining with Weka
我是Ian Witten，
这门课的主讲教师。
这门课由新西兰怀卡托大学计算机科学系
推出，
是Data Mining with Weka
的后续课程。
这是一门实用课程，讲解如何使用Weka的高级功能完成数据挖掘。
和上一门课一样，
我们不需要学习编程，
只学习使用Weka的交互界面。
与此同时，我们学习一些数据挖掘的基本理论。
通过学习前期课程Data Mining with Weka，你应该了解如下内容：

English: 
Hello,
and welcome to More Data Mining with Weka.
I'm Ian Witten,
and I'm presenting the videos for this course,
which is brought to you by the Computer Science
Department at the University of Waikato in
New Zealand.
This course follows on from a previous course,
Data Mining with Weka.
It's a practical course on how to use the
advanced facilities of Weka for data mining.
As in the previous course,
we're not going to cover programming,
just the interactive interfaces to Weka.
We're going to pick up some basic principles
of data mining along the way.
We're assuming that you know about a number
of things that you will have learned in Data

Chinese: 
数据挖掘是什么？它有哪些用途？
我们的动机，
由简单开始，
Explorer界面的使用，
常用的分类器和过滤器算法，
结果评估，
解释输出
避免训练和测试集的陷阱、
以及整个数据挖掘的步骤。
这门课将不再重复上述内容。
如果你打算复习，
请观看优酷网WekaMOOC专辑中
的视频。
我们讲过，
weka是新西兰特有的鸟。
在我们看来，
它是数据挖掘工作台——怀卡托智能分析系统，
包含了很多机器学习的算法。
大量数据挖掘的算法：预处理算法、
属性选择、
聚类、
关联规则等等。
Weka是一个多功能的机器学习平台。

English: 
Mining with Weka: what data mining is and
why it's useful,
all the motivation,
simplicity first,
using the Explorer interface,
popular classifier and filter algorithms,
evaluating the result,
interpreting the outputs,
avoiding the pitfalls of training and testing sets,
and the overall data mining process.
We're not going to cover any of that in this course.
If you want a refresher,
then you can go to YouTube and look at the
WekaMOOC channel where you'll see all the
videos for the previous course.
As you know,
a weka is a bird found only in New Zealand,
but from our point of view,
it's a data mining workbench -- the Waikato
Environment for Knowledge Analysis,
which contains a lot of machine learning algorithms.
A very large number of algorithms for data
mining tasks: preprocessing algorithms,
feature selection,
clustering,
association rules -- things like that.
It's a pretty comprehensive machine learning
workbench.

Chinese: 
这门课我们将要学习如何使用Weka的其他功能。
我们已经知道了如何使用Explorer，
我们将要学习Experimenter，
知识流界面（ Knowledge Flow Interface）
和指令行界面（ Command Line interface）。
我们将要学习大型数据以及如何使用Weka处理。
我们会做一些文本挖掘。
我们会使用有指导和无指导过滤器。
我们会学习离散和取样、
属性选择、
分类规则、
规则和树的区别、
关联规则、
聚类、
重要的是
你将学会使用Weka分析自己的数据，
更重要的是，理解你所做的事。
这就是本节课的内容。
“探索Weka界面，处理大型数据”
我们将会学习Weka 的Experimenter，

English: 
What you're going to learn in this course
is how to use the other interfaces to Weka.
We already know how to use the Explorer,
but we're going to talk about the Experimenter,
the Knowledge Flow Interface,
and the Command Line interface.
We're going to talk about "big data" and how
you deal with that in Weka.
We'll do some text mining.
We'll look at filtering using supervised and
unsupervised filters.
We'll learn about discretization and sampling.
We'll learn about attribute selection.
We'll learn about classification rules,
rules vs. trees,
association rules,
clustering,
cost-sensitive evaluation and classification.
Most of all,
I'm trying to get you to a point where you
can use Weka on your own data,
and -- most importantly -- understand what
it is that you're doing.
This is the first class.
It's called "Exploring Weka's interfaces,
and working with big data." We're going to
look at the Experimenter interface,

Chinese: 
用它来比较不同分类器，处理不同数集。
我们将会学习知识流界面、
简单命令行界面、
大型数据、
以及如何处理大型数据。
Explorer可以处理大量数据，
可以是大约一百万的实例，每个可有25个属性。
正确使用简单命令行界面，
你可以处理更多数据，
无限的数据集或数据流。
在练习中，
打个比方，
你会处理几百万、上千万的数据实例。
数据量非常大。
这门课和上门课的结构完全相同，共5部分。
每部分包含6节课。
第二部分是离散和文本分类。
然后我们学习分类规则、
关联规则和聚类。
之后我们学习属性选择和成本敏感分类。
最后，第五部分我们学习Weka中的神经网络、
学习曲线、

English: 
which is used for comparing different classifiers
on different datasets.
We're going to look at the Knowledge Flow interface.
We're going to look at the Simple Command
Line interface.
And we're going to talk about big data,
how to work with big data.
The Explorer works with pretty big data,
up to maybe a million instances with maybe
25 attributes each.
Using the Simple Command Line interface in
the right way,
you can deal with much larger datasets,
effectively unlimited datasets or "data streams".
In the activity,
for example,
you'll process a multi-million-instance dataset
with 10 million instances.
It's pretty big stuff.
The course is organized just like the previous
one with 5 classes.
Each class has got about 6 lessons.
The next class is on discretization and text classification.
Then we're going to look at classification rules,
association rules, and clustering.
Then we'll look at attribute selection and
cost-sensitive classification.
Finally, in Class 5 we'll look at neural networks,
learning curves,

English: 
and performance optimization.
There are six lessons in this class.
Each lesson consists of a little video,
like this one.
Following each lesson is an activity,
which you're strongly encouraged to do.
You're going to learn by doing here,
by doing the activities,
not just by listening to me talk about them.
We very strongly encourage you to do the activities.
However, you don't have to do them.
You don't have to complete them all.
The assessment for the course is based on
a Mid-class assessment and a Post-class assessment
worth 1/3 and 2/3 of the marks, respectively.
If you get more than 70% on these assessments
then you'll get a Certificate of Completion
at the end of the course.
I'd like you to download Weka now.
I'd like you to download the new version of Weka,
which has appeared recently: 3.6.11.

Chinese: 
和优化性能。
这部分有六节课，
每节课都是一个短小的视频，
和这段视频一样。
课后有练习。
我们希望你去做，
你将会在
做中学，
而不是仅凭听我讲课学习。
我们希望你去做练习，
但是练习不是强制的，
你不必完成所有练习。
这门课的测评是基于期中和期末考试的，
各占1/3和2/3的比例。
如果你能完成70%以上，就可以在课程结束后
得到结业证书。
请先下载Weka,
Weka的最新版本3.6.11。

English: 
That's not the one that was used
for the previous course.
It's the latest stable version of Weka.
You know how to download it,
so I'm not going to talk about that at all.
Just do it! 
We have the same textbook as
we had for the previous course,
and the same excerpts the publisher has kindly
made available for free in e-book format.
Let me just finish off by saying this is
where New Zealand is,
at the top of the world.
We think of you as being "down under",
not us as being "down under".
We're in the top center of the world.
Here in New Zealand -- actually,
I've turned this map around with North at
the top,
which is probably what you're used to -- you
can see where the University of Waikato is
pointed to by the red arrow.
That's where I am.
That's the end of the first lesson.
Please go and do the activity associated with this lesson,
which actually involves all of the assessment
questions from the last course.

Chinese: 
和上一门课使用的版本不同，
这是最新版本。
你知道如何下载，
我不会再讲述下载指南。
请下载！
我们使用和前一门课一样的教材。
经出版商同意，我们可以免费使用书的部分电子版。
结束前，我们来看这张世界地图，
新西兰在这里，世界的顶端。
我们认为你在我们下面。
而不是我们在下面。
我们在世界的正上方。
这里是新西兰。
我把地图转了过来，让北方朝上，
这大概是大家习惯的样子。可以看到，这里是怀卡托大学，
红箭头所指的地方。
我就在这。
这是我们的第一节课，
请大家做课后的练习。
练习包括上一门课的测试内容。

English: 
Go through and do these assessments again.
Well,
you don't *have* to do them,
but you should certainly look at them.
If you know your stuff,
it won't take you long to do them,
and if you don't know your stuff,
it's particularly valuable that you do them.
Go ahead and look at that now,
and I'll see you in the next lesson.
Bye for now!

Chinese: 
希望大家再测验一下掌握的知识。
但是，
就算你不全部做一遍，
至少应该看看这些内容。
如果都会，
练习不会花费太多的时间；
如果还没有掌握，
做练习可以帮助你巩固知识。
希望大家自我检测。
下节课见！
再见!
