
English: 
Hi! Welcome back to the course on Data Mining
with Weka. I'm Ian up here in New Zealand.
This is Lesson 1.2. Remember there are five
classes in this course, and each class consists
of about six lessons. This is the second lesson
of the first class. We're going to explore
the Explorer, the Weka Explorer interface.
Actually, first we're going to download the
Weka system. This is something you're going
to have to do on your computer. We're going
to download it from this URL. Without delay,
let's go straight there. Here we are.

English: 
This is www.cs.waikato.ac.nz/ml/weka. You can
read about Weka here. I'm going to go straight
to the Download button and download and install
Weka on my computer. I'm running on a Windows
machine here, but there are versions down
at the bottom you can see for Mac OS X and
Linux and so on. You need to download the
appropriate version for your machine.
We want Weka 3.6.10. That's the latest version of
Weka. I'm going to download a self-extracting
executable without the Java Virtual Machine.
I already have the Java Virtual Machine on
my computer. I'm going to click here, but
you're going to need to do whatever's appropriate
for your computer.
While it's downloading, let's have a word
about the pronunciation of the word 'Weka'.

English: 
It's called Weh-kuh. We don't like calling
it 'weaker' system. It's not 'weaker', it's
Weka, pronounced to rhyme with 'Mecca'. That's
the name of the bird, that's the name of our
software. Weka. I think it might have downloaded.
I'm going to open it. This is a standard kind
of setup wizard. We're installing Weka 3.6.10.
I'm just going to keep clicking next here.
Yes, I'm happy with this GNU public license.
I'm going to have a full install.
I'm going to install it in the default place—just
need to remember the name of this place.
We're going to need to visit there in a moment.
We're going to install the whole thing.
This is going to take a couple minutes. I'm just
off for a cup of coffee; I'll be back in a second.
 

English: 
Now, it's installed. Let's just carry on here.
I want to click Finish, but actually I'm not
going to start Weka. I'm going to uncheck
that, and click Finish, because there are
a couple of things I want to do first. Let's
go and see where Weka is. It's on my computer
in Program Files. It should be down here—Weka
3.6. I'm going to create a shortcut to that,
because we're going to be using it a lot in
this course. I'm just going to put it on the desktop.
Then, I'm going to do one more thing.
I'm going to go inside this folder, and I'm
going to look at the data folder. This contains
a bunch of datasets we're going to be using.
I'm going to take this folder and copy it
and put it somewhere convenient.
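
The copy step can also be scripted. The following is a minimal Python sketch, not part of the lesson: the function name `copy_datasets` and the folder names in the demo are assumptions, and the demo runs on a throwaway directory so that no real Weka install is touched.

```python
import shutil
import tempfile
from pathlib import Path


def copy_datasets(weka_home, destination):
    """Copy weka_home/data to destination; return the copied .arff file names."""
    source = Path(weka_home) / "data"
    # copytree copies rather than moves, so the installed data folder is left intact.
    # It raises an error if the destination folder already exists.
    shutil.copytree(source, destination)
    return sorted(p.name for p in Path(destination).glob("*.arff"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fake_home = Path(tmp) / "Weka-3-6"  # hypothetical install folder
        (fake_home / "data").mkdir(parents=True)
        (fake_home / "data" / "weather.nominal.arff").write_text("@relation weather\n")
        print(copy_datasets(fake_home, Path(tmp) / "Weka datasets"))
```

On a real machine, `weka_home` would be wherever the installer put Weka, which is why the lesson says to remember that location.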

English: 
Let's cut that, and I'm going to put it in My Documents
folder. I'm going to rename it Weka datasets.
I'm all set. I finished installing Weka. 
I've got my shortcut to Weka here.
I made my shortcut to the wrong place. I meant to 
make the shortcut to this here. Let me just make
a shortcut here. Create shortcut, put it on the desktop.
That's the one I want. Now, when I click here,

English: 
it will open Weka. Back to the slide. There
are four interfaces in Weka. The Explorer
is the one that we'll be using throughout
this course. We're just using the Explorer.
But also, there is the Experimenter for large
scale performance comparisons for different
machine learning methods on different datasets.
There's the KnowledgeFlow interface, which
is a graphical interface to Weka tools, and
there's a command-line interface. But we're
just going to use the Explorer, so let's get
on with it. Here's the Explorer. Across the
top, there are six panels: the Preprocess
panel; the Classify panel, where you build
classifiers for datasets; clustering, another
procedure Weka is good at, although we won't
be talking about clustering in this course;
association rules; attribute selection; and

English: 
visualization. In this course, we'll be using
mainly the Preprocess panel to open files
and so on, the Classify panel to experiment
with classifiers, and the Visualize panel
to visualize our datasets. I'm going to open
a dataset. The dataset that I'm going to open
is the weather data; it's a little toy dataset
that we'll be seeing a lot of in this course.
It's about 14 instances, 14 days, and for
each of these days, we have recorded the values
of five attributes. Four to do with the weather:
Outlook, Temperature, Humidity, and Windy.
The fifth, Play, is whether or not we're going
to play a particular, unspecified game.
Actually, what we're going to be doing is predicting
the Play attribute from the other attributes.
Let's not worry about that at the moment.
Let's just open the dataset and take a look
at it in Weka. Here's My Documents. Here are
the Weka datasets; this is what I copied.

English: 
I'm going to open weather.nominal.arff. All
Weka data files are called ARFF files.
We'll talk about that later on. This is the weather
data. Just ignore these colorful bars at the moment.
There are 14 instances; those correspond
to the 14 days that we saw in the dataset
on the slide. For each day, we have five attributes:
outlook, temperature, humidity, windy, and
play. If you select one of these attributes—outlook
is selected at the moment—we can see the
values. The values for the outlook attribute
are sunny, overcast, and rainy. These are
the number of times they appear in the dataset:
5 sunny days, 4 overcast days, and 5 rainy
days for a total of 14 days, 14 instances.
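
The counts that the Preprocess panel displays can be recomputed outside Weka. The sketch below is an illustration in plain Python rather than anything Weka does internally: the ARFF text is reproduced from memory of the file shipped with Weka, so treat it as an assumption, and `parse_arff` is a hypothetical helper that handles only simple nominal files like this one.

```python
from collections import Counter

# Assumed contents of weather.nominal.arff, embedded so the sketch is self-contained.
WEATHER_ARFF = """\
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def parse_arff(text):
    """Return (attribute_names, rows) for a simple, nominal-only ARFF file."""
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


names, rows = parse_arff(WEATHER_ARFF)
outlook_counts = Counter(row[names.index("outlook")] for row in rows)

print(len(rows), "instances,", len(names), "attributes")
print(dict(outlook_counts))  # {'sunny': 5, 'overcast': 4, 'rainy': 5}
```

The three counts sum to 14, one per instance.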
If we look at the temperature attribute, hot,

English: 
mild, and cool are the possible values, and
these are the number of times they appear
in the dataset. Let's go to the play attribute.
There are two values for play, yes or no.
Now, let's look at these two bars here. Blue
corresponds to yes, and red corresponds to no.
If you look at one of the other attributes,
like outlook, you can see that when the outlook
is sunny—this is like a histogram—there
are three no instances and two yes instances.
When the outlook is overcast, there are four
yes instances and zero no instances.
These are like a histogram of the attribute values
in terms of the attribute we're trying to predict.
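
The colored bars amount to a count of class values within each attribute value. Here is a small Python sketch of that idea, using only the outlook and play columns. The pairs are an assumed copy of those two columns from the weather data, and `class_histogram` is a hypothetical helper, not a Weka function.

```python
from collections import Counter

# (outlook, play) for each of the 14 days, assumed from weather.nominal.arff.
PAIRS = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]


def class_histogram(pairs):
    """Count yes/no class values for each attribute value, in first-seen order."""
    counts = Counter(pairs)
    values = list(dict.fromkeys(value for value, _ in pairs))
    return {v: {"yes": counts[(v, "yes")], "no": counts[(v, "no")]} for v in values}


hist = class_histogram(PAIRS)
for value, c in hist.items():
    # One '#' per instance, standing in for the blue (yes) and red (no) bars.
    print(f"{value:<9} yes {'#' * c['yes']:<5} no {'#' * c['no']}")
```

For sunny this gives two yes and three no instances, and for overcast four yes and zero no, matching what the bars show.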
It makes it kind of useful to click
around and visualize your data. We've opened

English: 
the weather data, weather.nominal.arff. We've
looked at the attribute values and the attributes
in Weka. There's one more thing I want to
do before we summarize here. I want to go
to the Edit panel. If I go to the Edit panel,
I see the data in the form that it was on
the slide, with the 14 days down here and
the 5 attributes across here. This is another
view of the data. I can actually change this
dataset. If I click here, I can change this
no to yes. Or, if I click here, I can change
on this day, the outlook from rainy to sunny.
If only it were so easy in real life to change
a day from rainy to sunny. Then I can click
OK, and we've got this edited dataset, which
we could save if we'd like. We haven't saved

English: 
any of this. The dataset on the disk is still
the same as it was. I'm not going to save
it, and I don't think you should save it,
because we're going to be using this dataset
quite a bit in this course. This is what we've
done in this lesson. We've installed Weka.
We've got the datasets. We've opened the Explorer.
We've looked at a dataset—the weather.nominal.arff
dataset. We've looked at the attributes and
their values. We've edited the dataset, and
we didn't save it. You can read more about
this in the course text. Section 1.2 talks about
the weather data, and Chapter 10 is a little
introduction to the Weka system. Now you should
go and do the activity associated with this
lesson. Good luck, and I'll see you in the
next lesson. Bye for now!
