
English: 
Hello! In the last lesson, we looked at
using a classifier in Weka, J48.
In this lesson, we're going to look at
another of Weka's principal features:
filters.
One of the main messages of this course
is that it's really important when you're
data mining to get close to your data,
and to think about preprocessing it,
or filtering it in some way
before applying a classifier.
I'm going to start by using a filter
to remove an attribute from the weather data.
Let me start up the Weka Explorer and open
the weather data.
That's the one.

Chinese: 
大家好！上一节课，我们学习了如何在Weka中使用J48分类器。
这次课，我们将了解Weka的另一个主要功能：
过滤器。
这次课旨在提高大家在进行数据挖掘时，
对深入了解数据，
和在使用分类器之前预处理
或过滤数据的重要性的认识。
我们来使用过滤器从天气数据中删除一个属性。
打开Weka Explorer 界面。
载入天气数据。
就是这个。

Chinese: 
接下来，删除
属性湿度（humidity）。
湿度的序号是3。
我们可以点击“Choose”按钮打开过滤器列表，就像我们之前
在分类器面板选择分类器一样。
点击“Choose”，选择过滤器。
这里有很多种不同的过滤器：
AllFilter 和 MultiFilter 用于合并使用多种过滤器。还有监督和
无监督过滤器。监督过滤器在过滤时会
使用类的值。
它们不如不使用类值的无监督过滤器更为广泛应用。
 这些是属性和实例过滤器。 我们要删除一个属性，
因此要寻找一个属性过滤器。
Weka提供许多种过滤器，你得学会如何找到
想要的过滤器。
我要删除一个属性，
点这里，Remove。
像之前
配置分类器J48一样，我们可以单击这里，
我单击这里来
配置过滤器。

English: 
I'm going to remove
the humidity attribute.
That's attribute number 3.
I can look at filters. Just as we chose a
classifier using the Choose button
on the Classify panel,
we choose filters by using
the Choose button here.
There are a lot of different filters.
AllFilter and MultiFilter are ways of
combining filters. We have supervised and
unsupervised filters. Supervised filters
are ones that use a class value for
their operation.
They aren't as common as unsupervised
filters, which don't use the
class value. There are attribute filters and
instance filters. We want to remove an attribute,
so we're looking for an attribute filter.
There are so many filters in Weka that
you just have to learn to kind of look around
and find what you want.
I'm going to look for removing an attribute.
Here we go, Remove.
Now, as before
when we configured the J48
classifier, we clicked here.
I'm going to click here, and we can
configure the filter.

English: 
This is "A filter that removes a
range of attributes from the dataset".
I can specify a range of attributes here.
I just want to remove one. I think it was
attribute number 3 we were going to remove.
I can invert the selection and remove
all the other attributes and leave 3,
but I'm just going to leave
it like that. Click OK,
and watch humidity go
when we apply the filter. Nothing
happens until you apply the filter.
I've just applied it,
and here we are, the humidity
attribute has been removed.
Luckily, I can undo the effect of that
and put it back by pressing the Undo button.
That's how to remove an attribute.
Actually, the bad news is there is a
much easier way to remove an attribute.
You don't need to use a filter at all.
If you just want to remove an attribute,
you can select it here and click
the Remove button at the bottom.
It does the same job.
Sorry about that.
But filters are really
useful and can do much more
complex things than that.
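
To make the effect of removing attribute number 3 concrete, here is a minimal sketch in plain Java. It is not Weka's Remove filter, only an illustration of what that step does to each row; the class name, the method name, and the hand-typed header and sample row are assumptions made for this example. The invert flag mirrors the "invert the selection" option mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: plain Java lists stand in for a dataset.
// This is not Weka's Remove filter, just the same idea on one row.
class RemoveAttributeSketch {

    // Returns a copy of `row` without the attribute at the 1-based `index`.
    // With invert = true, only that attribute is kept instead.
    static List<String> removeAttribute(List<String> row, int index, boolean invert) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            boolean selected = (i + 1 == index);
            // Keep unselected attributes normally, selected ones when inverted.
            if (selected == invert) {
                result.add(row.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed attribute names and one assumed row of the weather data.
        List<String> header = Arrays.asList("outlook", "temperature", "humidity", "windy", "play");
        List<String> row = Arrays.asList("sunny", "hot", "high", "FALSE", "no");

        System.out.println(removeAttribute(header, 3, false)); // [outlook, temperature, windy, play]
        System.out.println(removeAttribute(row, 3, false));    // [sunny, hot, FALSE, no]
        System.out.println(removeAttribute(header, 3, true));  // [humidity]
    }
}
```

As in the Explorer, nothing changes until the operation is applied: the method returns a new list and leaves the original row untouched, which is also why an undo is possible.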

Chinese: 
这是一个可以从数据集删除一个范围属性的过滤器。
可以在这里输入要删除的属性的序号范围。
我只想删除一个属性，要删除的属性序号是3。
我也可以颠倒设定，删除除属性3以外的所有属性，
但是现在我不想那么做。点击“OK”,
单击“Apply”之后。
看着湿度属性消失。任何事都不会发生直到你单击Apply。
我已经点击了Apply，
现在我们看到，湿度属性已经被删除了。
幸运的是，通过单击“Undo”,我可以撤消刚才的操作，恢复湿度属性。
这就是如何删除一个属性。
实际上，我们有更简单的删除属性的方法。
我们根本不需要使用一个过滤器。
如果想删除某个属性，
只需要在这里选择它，并且单击面板底部的Remove按钮，
会得到一样的结果。
非常抱歉。
但是过滤器是十分有用的而且可以做比这更加
复杂的事情。

Chinese: 
让我们看下个例子。试想这次不删除
某个属性, 而是删除所有湿度值为high的实例。
也就是，第三个属性的第一个值。数据集中的7个实例
将被删除。数据集一共有
14个实例，所以我们
还剩下只包含7个实例
的数据集。
我们来找一个合适的过滤器。我们想
删除实例，所以要选择一个实例过滤器。
看看这里，
有没有一个合适的过滤器。
找到RemoveWithValues
过滤器。
单击打开配置面板，
单击More获取相关介绍。这里写着
“滤除具有特定属性值的实例”，
正是我们想要的。

English: 
Let's, for example, imagine removing,
not an attribute, but let's remove
all instances where humidity has the value 'high'.
That is, attribute number 3 has this first
value. That's going to remove seven
instances from the dataset. There are
fourteen instances
altogether, so we're going to get
left with a reduced dataset of
seven [instances].
Let's look for a filter to
do that. We want to
remove instances, so it's
going to be an instance filter.
I just have to look down here and
see if there is anything suitable.
How about RemoveWithValues?
The RemoveWithValues filter.
I can click that to configure it,
and I can click More to see
what it does. Here it says it
"Filters instances according to
the value of an attribute",
which is exactly what we want.
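
The instance filtering described above can be sketched in plain Java as follows. This is not Weka's RemoveWithValues filter, only an illustration of the idea: drop every instance whose value for attribute number 3 is the first value, 'high'. The class and method names are assumptions, and the 14 rows are typed in by hand on the assumption that they match the nominal weather data used in the lesson.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: a hand-typed copy of the nominal weather data,
// filtered with plain Java. This is not Weka's RemoveWithValues filter.
class RemoveWithValuesSketch {

    // Declared values of attribute 3 (humidity); 'high' is the first one.
    static final String[] HUMIDITY_VALUES = {"high", "normal"};

    // Columns: outlook, temperature, humidity, windy, play (assumed to match the lesson's file).
    static final String[][] WEATHER = {
        {"sunny", "hot", "high", "FALSE", "no"},
        {"sunny", "hot", "high", "TRUE", "no"},
        {"overcast", "hot", "high", "FALSE", "yes"},
        {"rainy", "mild", "high", "FALSE", "yes"},
        {"rainy", "cool", "normal", "FALSE", "yes"},
        {"rainy", "cool", "normal", "TRUE", "no"},
        {"overcast", "cool", "normal", "TRUE", "yes"},
        {"sunny", "mild", "high", "FALSE", "no"},
        {"sunny", "cool", "normal", "FALSE", "yes"},
        {"rainy", "mild", "normal", "FALSE", "yes"},
        {"sunny", "mild", "normal", "TRUE", "yes"},
        {"overcast", "mild", "high", "TRUE", "yes"},
        {"overcast", "hot", "normal", "FALSE", "yes"},
        {"rainy", "mild", "high", "TRUE", "no"},
    };

    // Keeps only the rows whose attribute at the 1-based `attributeIndex`
    // differs from `value`; all attributes of the kept rows are preserved.
    static List<String[]> removeWithValue(String[][] rows, int attributeIndex, String value) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (!row[attributeIndex - 1].equals(value)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String firstValue = HUMIDITY_VALUES[0]; // "high"
        List<String[]> reduced = removeWithValue(WEATHER, 3, firstValue);
        System.out.println(WEATHER.length + " -> " + reduced.size()); // 14 -> 7
    }
}
```

Note that the humidity attribute itself survives; only the rows are filtered, which is the difference between an instance filter and an attribute filter.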

Chinese: 
我们现在设置属性序号，我们想要的是第三个
属性，湿度，
的第一个值。我们可以删除一系列不同的值。我们现在来删除
第一个值。
现在我们已经配置好了。
需要Apply过滤器。
当我点击Apply，看看会有什么变化。
我们保留了属性湿度，但是没有了湿度值为high
的实例。 实际上，数据集被减少到
七个实例。
完成过滤处理后，你可以保存过滤结果。如果需要，我们可以保存
过滤后的数据集，但是
我现在不想保存。
我要撤消过滤操作。
我们已经删除了湿度值为high的实例。
在选择过滤器的时候，我们必须考虑
是用监督过滤器还是无监督过滤器，
用属性过滤器还是实例过滤器。
之后，就是用你的常识

English: 
We're going to set the attributeIndex;
we want the third
attribute, humidity,
and the first value. We can remove
a number of different values; we'll just remove
the first value.
Now we've configured that.
Nothing happens until we apply the filter.
Watch what happens when we apply it.
We still have the humidity attribute
there, but we have zero
elements with high humidity. In fact,
the dataset has been reduced to only
seven instances.
Recall that when you do anything here,
you can save the results. So, we could save that
reduced dataset if we wanted, but
I don't want to do that now.
I'm going to undo this.
We removed the instances where humidity is high.
We have to think about,
when we're looking for filters, whether we want
a supervised or an unsupervised filter,
whether we want an attribute
filter or instance filter,
and then just kind of use your

English: 
common sense to look down the list
of filters to see which one you want.
Sometimes when you filter data
you get much better classification.
Here's a really simple example.
I'm going to open the glass dataset
that we saw before.
Here's the glass dataset. I'm
going to use J48, which we did before.
It's a tree classifier.
I'm going to start that,
and I get an accuracy of 66.8%.
Let's remove Fe,
that is, Iron. Remove this attribute,
and we get a smaller dataset.
Go and run J48 again.
Now we get an accuracy of
67.3%. So, we've improved the
accuracy a little bit
by removing that attribute.
Sometimes the effect is pretty dramatic.
Actually, in this dataset, I'm going to remove

Chinese: 
在过滤器列表中找到你想要的过滤器。
有时候，通过过滤数据你可以得到较好的分类。
举个非常简单例子。
我们下面载入之前见过的
玻璃数据集。
这是玻璃数据集。我打算使用之前用过的J48分类器。
J48是一个树分类器。
我运行一下，
得到的准确度是66.8%。
让我们删除属性铁（Fe），
这里，铁。删除这个属性，
得到一个更小的数据集。
再次运行J48。
这次我们得到的准确度是
67.3%。所以，通过删除属性铁，
我们提高了分类的准确度。
有时候，效果会十分明显。实际上，我将删除折射率和镁(Mg)之外的

English: 
everything except the refractive index
and Magnesium (Mg). I'm going
to remove all of these attributes.
I'm left with a much smaller
dataset with two attributes.
Apply J48 again.
Now, I've got an even better result,
68.7% accuracy.
I can visualize that tree,
of course, remember, by right-clicking
here and visualizing the tree,
and have a look and see what it means.
It is much easier to visualize the trees
when they are smaller.
This is a good one to
look at and consider what the
structure of this decision is.
That's it for now.
We've looked at: filters in Weka;
supervised versus unsupervised,
attribute versus instance filters;
to find the right filter, you need to look around;
they can be very powerful and
judiciously removing attributes can

Chinese: 
所有属性。
删除所有的其他属性。
得到一个只包含两个属性的小数据集。
再次运行J48。
现在，我们得到了一个更好的结果，
68.7%的准确度。
右击这里，创建可视化决策树模型，
选择可视化决策树。
我们来看一下，它代表什么。小的决策树更加
容易理解。
这是应用过滤器的
一个很好的实例。
以上是这节课的主要内容。
我们了解了Weka中的过滤器，
监督和无监督过滤器，属性和实例过滤器；
学习了如何找到正确的过滤器。
明智而审慎地使用过滤器可以

English: 
both improve performance and
increase comprehensibility.
If you'd like, for some background reading
on this, go to the textbook and
have a look at Section 11.2 on
Loading and filtering files.
Then, go and do the activity
associated with this lesson.
Bye for now!

Chinese: 
提高分类的准确度并且使分类结果更加清晰。
如果你有兴趣了解更多内容，可以阅读课本
的11.2节，加载和过滤文件，
然后去做本节课的课后练习。
下次见！
