
Hello again.
In the last lesson, we looked at cost-sensitive
evaluation, where you use a cost matrix to
evaluate the result of a classifier;
and cost-sensitive classifiers, where the
classification is performed with the aim of
minimizing the cost rather than maximizing
the percentage accuracy.
But we didn't talk about how you do cost-sensitive
classification,
and that's what we're talking about in this
lesson, 4.6.
There are two ways of making a classifier
cost-sensitive.
The terminology is a little bit confusing.
The first method is going to be called "cost-sensitive
classification" and the second method is going
to be called "cost-sensitive learning".

For cost-sensitive classification, what we
do is adjust a classifier's output by re-calculating
the probability threshold.
I've opened the german_credit dataset, with
1000 instances.
I'm going to classify this with Naive Bayes.
I get this matrix here.
If I set Output predictions, which I've done,
then I can see in the output the actual predictions
for the 1000 instances.
I've written those down here--not all 1000,
I've just taken every 50.
I've got 20 results here: the actual class
of the instance, the predicted class of the
instance, and Naive Bayes' probability that
the instance is a "good" one rather than a
"bad" one.

Chinese: 
对于成本敏感分类，我们需要做的是调整分类器的
概率阈值。
我们打开german_credit数据集，有一千个实例。
我们用Naive Bayes分类。
这里我们得到一个矩阵。
如果设定输出预测结果，然后我们可以在输出窗口看到
一千个实例的实际预测结果。
我已经把它们记录在这里了--不是全部，每五十个取一个。
我们得到了二十个结果：实例的真实的分类，
预测分类， Naive Bayes得到的概率，即实例属于
“好”的一类而不是“坏”的一类的概率。

English: 
And I've sorted this list by the probability
column.
In fact, the effect of Naive Bayes is, it
looks to see if the "good" probability is
bigger than the "bad" probability, which is
the same as saying, "is the good probability
bigger than 0.5?" It's like drawing a horizontal
line at 0.5, between instance number 750 and
instance number 800.
Everything above that line is going to be
classified as "good", and everything below
the line is going to be classified as "bad".
Going back to that "classified as" matrix,
the confusion matrix, 605 plus 151, that's
756 instances that are going to be classified
as "good".
95 plus 149, that's 244, are going to be
classified as "bad".
Then, within those, if we were to look at
the matrix with the actual classes, and we
counted the number of "bad" ones above the
line, we find 151 "bad" ones above the line--those
are misclassifications--and 95 "good" ones
below the line, which are misclassifications.
We don't actually have to use a threshold of
0.5.
This is exactly the same table of the actual
and predicted, but I've changed the threshold
to 0.833.
That gives me the classification matrix that's
shown here, and a total cost--using the cost
matrix we were talking about in the last lesson,
where one kind of error costs 5 times the
cost of the other kind of error--we get a
total cost here of 517, versus 850 for the
threshold of 0.5 on the previous slide.
You can see that if you count up the numbers
above the line, then there are 501 of them
(448+53), of which 53 are "bad".
Then count up the number below the line and
look at the number of "good" ones there; those
are the errors.
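As a sketch of that bookkeeping (this is plain arithmetic, not Weka code), the total cost for a given threshold follows directly from the misclassification counts above and below the line. The counts here are read off the slides; the 252 "good" instances below the 0.833 line is inferred from the 700 "good" instances in total minus the 448 above the line.

```python
# Total cost = ("bad" classified as "good") * 5
#            + ("good" classified as "bad") * 1
# Counts taken from the german_credit example on the slides.

def total_cost(bad_above, good_below, fp_cost=5, fn_cost=1):
    """Cost of the misclassifications: 'bad' instances above the line
    are costly false 'good' predictions; 'good' instances below the
    line are the cheaper false 'bad' predictions."""
    return bad_above * fp_cost + good_below * fn_cost

print(total_cost(151, 95))   # threshold 0.5  -> 850
print(total_cost(53, 252))   # threshold 0.833 -> 517
```

Raising the threshold trades many cheap errors for few expensive ones, which is exactly why the total cost drops from 850 to 517.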
In general, it's not hard to show that, given
a general cost matrix (0, λ; μ, 0), you minimize
the expected cost by classifying an instance
as "good" only when its "good" probability exceeds
the threshold μ/(λ+μ), which is where we got the
0.833 from for this problem.
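A quick sketch of that derivation with λ = 1 and μ = 5, the costs we have been using (the helper name is illustrative):

```python
# Cost matrix (0, λ; μ, 0): predicting "good" for an actually "bad"
# instance costs μ; predicting "bad" for an actually "good" one costs λ.

def expected_costs(p_good, lam=1.0, mu=5.0):
    """Expected cost of each possible prediction given P(good) = p_good."""
    cost_if_predict_good = mu * (1 - p_good)  # pay μ when it's really bad
    cost_if_predict_bad = lam * p_good        # pay λ when it's really good
    return cost_if_predict_good, cost_if_predict_bad

lam, mu = 1.0, 5.0
threshold = mu / (lam + mu)    # predict "good" only above this
print(round(threshold, 3))     # 0.833

# At the threshold the two expected costs balance exactly.
g, b = expected_costs(threshold)
```

Below the threshold, predicting "bad" has the lower expected cost; above it, predicting "good" does, so μ/(λ+μ) is the break-even point.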
That's what you do for Naive Bayes, but what
about methods that don't produce probabilities?
Well, they almost all do produce probabilities.
Let's look at J48.
Imagine J48 with minNumObj set to 100.
I've done this to force a small tree.
I won't do it for you, but I'd get the tree
shown here.

Chinese: 
（448+53），其中53个是“坏”的实例。
然后，数一数直线下的实例，看有多少“好”的实例，
即有多少误差。
一般来说，想得到这个结果并不难，假设一个常用的成本矩阵（0，λ，μ，0），
为了最小化把实例归为“好”的带来的成本，把阈值设为μ/(λ+μ)，
对这个问题，我们这样得到了0.833。
这就是我们对Naive Bayes做的，但对于那些不产生概率的方法应该怎么做呢？
当然，它们几乎都会产生概率。
让我们看看J48。
想像把J48的minNumObj（最小实例数）设为100。
我们将得到一个小型的树。
我不给你演示，但就是这棵树。

English: 
If I look at the tree, the leaves of the tree
have effectively got probabilities.
The leftmost leaf at the bottom is predicting
"good", and there are 37 exceptions, 37 "bad"
instances.
The "good" probability for this leaf is 1
- 37/108 (108 being the total number of instances
that reach that leaf), which is 0.657.
You'll find that number in the list of probabilities
in the table on the right.
The next leaf is predicting "bad", and there
are 68 out of 166 exceptions.
So the "good" probability for that leaf is
0.410, and you'll see that number in the list
in the table on the right.
And so on.
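The leaf probabilities above can be sketched as a small helper (an illustrative function, not part of Weka):

```python
from fractions import Fraction

def leaf_good_probability(predicted_class, exceptions, total):
    """P(good) at a J48 leaf: the majority-class proportion when the
    leaf predicts 'good', otherwise just the exception proportion."""
    p_exceptions = Fraction(exceptions, total)
    return 1 - p_exceptions if predicted_class == "good" else p_exceptions

# The two leaves described above:
print(f"{float(leaf_good_probability('good', 37, 108)):.3f}")  # 0.657
print(f"{float(leaf_good_probability('bad', 68, 166)):.3f}")   # 0.410
```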
We can get probabilities from J48, and from
other methods as well.
Let's do this.
To do this in Weka, we use the CostSensitiveClassifier
with "minimizeExpectedCost = true".

Chinese: 
根据这棵树，我们可以有效地得到每个叶节点上的概率。
底部的最左侧的叶节点被预测为“好”，并有37个例外，
37个“坏”的实例。
这个叶子“好”的概率是108-37/108
（到达这个叶节点的总的实例数），得到0.657。
你可以在右边表格的概率一列找到这个数字。
下一个叶节点被预测为“坏”，166个当中有68个例外。
所以，那个叶子“好”的概率是0.410，
你可以在右边表格里看到这个数字。
以此类推。
我们可以从J48得到概率，从其他的方法里也可以。
让我们试试。
在Weka里，我们用CostSensitiveClassifier，设定为最小化成本期望。

English: 
I've got the credit dataset open.
If I just run J48 with that cost matrix, I
get a cost of 1027.
Over in Weka, I'm going to select the CostSensitiveClassifier,
Meta > CostSensitiveClassifier.
I'm going to configure that to have the appropriate
cost matrix.
I need to put in the cost matrix here, a 2
by 2 cost matrix, and I want the one we've
been using all along, with a 5 there.
Then I want to set minimizeExpectedCost to
true.
That gives us cost-sensitive classification.
If I run that with J48 (did I select J48?
No; I should have selected J48 here).

Now, if I run that with J48, I get this little
matrix here, and a total cost of 770.
In fact, back on the slide, that's the middle
section of the slide, the cost of 770 with
the confusion matrix that's shown.
Actually, J48 isn't very good at producing
probabilities, and it's advantageous to use
bagging.
We talked about bagging in Data Mining with
Weka, Lesson 4.6.
J48 produces a restricted set of probabilities,
but using the bagging technique enriches the
set of probabilities produced.
If you just used bagged J48--I won't do this
for you, but if you used that as the classifier--then
you'd get a lower cost, a better confusion
matrix, with a cost of 603.
Or 0.603, because there are 1000 instances.
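To see why bagging helps here, consider an illustrative sketch (invented numbers, not Weka output): a single small tree can only ever emit one "good" probability per leaf, but averaging the leaf probabilities across several bagged trees produces a much richer set of distinct values, which gives the threshold finer distinctions to work with.

```python
from itertools import product

# Hypothetical leaf probabilities of one small J48 tree: only a
# handful of distinct values, one per leaf.
single_tree = [0.657, 0.410, 0.880, 0.300]

# With 3 bagged trees, an instance may reach a different leaf in each
# tree, and its probability is the average of the three leaf values.
bagged = {sum(combo) / 3 for combo in product(single_tree, repeat=3)}

print(len(set(single_tree)))  # 4 distinct probabilities
print(len(bagged))            # many more distinct averages
```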

Chinese: 
现在，运行J48，得到这个小的矩阵，总成本是770。
实际上，回到课件，在课件的中间，成本770，
还有混淆矩阵也在那里。
实际上，J48并不擅长产生概率，
装袋是更好的方法。
我们在Data Mining with Weka的4.6课提到过装袋法。
J48产生一组有限的概率
但使用装袋法可以得到丰富的概率集合。
如果使用装袋的J48--这里就不演示了，但是如果你那么做
--你将得到更低的成本，更好的混淆矩阵，成本是603。
或者0.603，因为一共一千个实例。

English: 
That was what we're calling "cost-sensitive
classification", where you adjust the probability
threshold.
The second method we're going to call "cost-sensitive
learning", where, instead of adjusting the
output of the classifier, the probability
threshold, we're going to learn a different
classifier.
Here's a way to think about that.
Suppose we created a new dataset by replicating
some instances in the old dataset.
To simulate the cost matrix we've been talking
about, suppose we added 4 copies of every
"bad" instance.
The new dataset would have 700 "good" instances
and 1500 "bad" instances.
And rerun, say, J48.
When you think about it, that effectively makes
errors on the "bad" instances 5 times more
expensive than errors on the "good" instances.
In practice, we won't actually copy the instances,
we'll re-weight them internally in Weka.
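The arithmetic of that reweighting is simple enough to sketch (a weight of 5 on each "bad" instance is equivalent to adding 4 extra copies of it):

```python
# Simulating the cost matrix by weighting instances rather than
# physically duplicating them, as Weka does internally.
n_good, n_bad = 700, 300           # german_credit class counts
weight_good, weight_bad = 1, 5     # weights implied by the cost matrix

effective_good = n_good * weight_good
effective_bad = n_bad * weight_bad
print(effective_good, effective_bad)  # 700 1500
```

The effective dataset matches the duplicated one described above: 700 "good" instances and 1500 "bad" ones.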

Chinese: 
这就是“成本敏感分类”，我们可以
调节概率阈值。
第二种方法叫做“成本敏感学习”，
与通过调节概率阈值改变分类器的输出不同，
“成本敏感学习”是一种不同的分类器。
有一种解释是这样的。
假设我们通过复制旧的数据集中的实例创建了一个新的数据集。
为了模拟我们之前用到的成本矩阵，假设我们
把每个“坏”实例拷贝四遍。
新的数据集将包含700个“好”的实例和1500个“坏”的实例。
然后，再次运行J48。
让我们想想看，这有效地将“坏”实例的误差成本
增加为“好”实例的误差成本的5倍，
实践中，我们不会真的拷贝实例，而是在Weka内部对实例重新加权。

English: 
The way to do this is to use the same classifier,
CostSensitiveClassifier, but set minimizeExpectedCost
to false.
We had it true before, now we're going to
set it to false, which is the default.
We're going to try that with Naive Bayes and
J48.
Here we are.
Let's use J48 first.
We're going to set minimizeExpectedCost to
false, and run that.
Now we get a total cost of 658 with this confusion
matrix.
That corresponds to the middle line on this
slide: J48 has a cost of 658.
If we were to use Naive Bayes, we'd get a
cost of 530; and if we used bagged J48, we'd
get a cost of 581.
In general, these are a little bit better--certainly
for J48, the results of cost-sensitive learning
are a little better than the results of cost-sensitive
classification that we looked at before.
Here's what we've learned.
Cost-sensitive classification adjusts a classifier's
output to optimize a given cost matrix.
Cost-sensitive learning, on the other hand,
learns a new classifier to optimize with respect
to a given cost matrix, effectively by duplicating--or,
really, internally re-weighting--the instances
in accordance with the cost matrix.
Both of these are done with the Weka classifier
CostSensitiveClassifier; it implements both
methods, with a switch to choose which one
to use.
And there are ways in Weka to store and load
the cost matrix automatically.
In the activity, you're going to look more
systematically at the different effects of
cost-sensitive learning and cost-sensitive
classification.
Off you go and do that.
That's the end of Class 4, we'll see you again
in Class 5.

Bye for now!
