
Hi! This is Lesson 4.2 on Linear Regression.
Back in Lesson 1.3, we actually mentioned
the difference between a classification problem
and a regression problem.
A classification problem is when what you're
trying to predict is a nominal value, whereas
in a regression problem what you're trying
to predict is a numeric value.
We've seen examples of datasets with nominal
and numeric attributes before, but we've never
looked at the problem of regression, of trying
to predict a numeric value as the output of
a machine learning scheme.
That's what we're doing in this lesson:
linear regression.

We've only had nominal classes so far, so
now we're going to look at numeric classes.
This is a classical statistical method, dating
back more than 2 centuries.
This is the kind of picture you see.
You have a cloud of data points in 2 dimensions,
and we're trying to fit a straight line to
this cloud of data points and looking for
the best straight-line fit.
Only in our case we might have more than 2
dimensions; there might be many dimensions.
It's still a standard problem.
Let's just look at the 2-dimensional case
here.
You can write a straight line equation in
this form, with weights w0 plus w1a1 plus
w2a2, and so on.
Just think about this in one dimension where
there's only one "a".
Forget about all the things at the end here,
just consider w0 plus w1a1.
That's the equation of this line -- it's the
equation of a straight line -- where w0 and
w1 are two constants to be determined from
the data.

This, of course, is going to work most naturally
with numeric attributes, because we're multiplying
these attribute values by weights.
We'll worry about nominal attributes in just
a minute.
We're going to calculate these weights from
the training data -- w0, w1, and w2.
Those are what we're going to calculate from
the training data.
Then, once we've calculated the weights, we're
going to predict the value for the first training
instance, a1.
The notation gets really horrendous here.
I know it looks pretty scary, but it's pretty
simple.
We're using this linear sum with these weights
that we've calculated, using the attribute
values of the first [training] instance in order
to get the predicted value for that instance.
We're going to get predicted values for the
training instances using this rather horrendous
formula here.
I know it looks pretty scary, but it's actually
not so scary.
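
For reference, that rather horrendous formula has this shape in LaTeX notation; the superscript (1) marks the first training instance, the subscripts index the attributes, and k is my label for the number of attributes:

    x^{(1)} = w_0 + w_1 a_1^{(1)} + w_2 a_2^{(1)} + \cdots + w_k a_k^{(1)}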

These w's are just numbers that we've calculated
from the training data, and then these things
here are the attribute values of the first
training instance a1 -- that 1 at the top
here means it's the first training instance.
This 1, 2, 3 means it's the first, second,
and third attribute.
We can write this in this neat little sum
form here, which looks a little bit better.
Notice, by the way, that we're defining a0
-- the zeroth attribute value -- to be 1.
That just makes this formula work.
For the first training instance, that gives
us this number x, the predicted value for
the first training instance with this particular
value of a1.
Then we're choosing the weights to minimize
the squared error on the training data.
This is the actual x value for this i'th training
instance.

This is the predicted value for the i'th training
instance.
We're going to take the difference between
the actual and the predicted value, square
them up, and add them all together.
And that's what we're trying to minimize.
We get the weights by minimizing this sum
of squared errors.
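
Putting that into LaTeX notation -- using the a_0 = 1 trick from a moment ago, and with n as my label for the number of training instances -- the predicted value and the quantity being minimized are:

    \hat{x}^{(i)} = \sum_{j=0}^{k} w_j a_j^{(i)}, \qquad a_0^{(i)} = 1

    \min_{w_0, \dots, w_k} \sum_{i=1}^{n} \Bigl( x^{(i)} - \sum_{j=0}^{k} w_j a_j^{(i)} \Bigr)^2
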
That's a mathematical job; we don't need to
worry about the mechanics of doing that.
It's a standard matrix problem.
It works fine if there are more instances
than attributes.
You couldn't expect this to work if you had
a huge number of attributes and not very many instances.
But providing there are more instances than
attributes -- and usually there are, of course
-- that's going to work ok.
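
Here is a minimal sketch of that matrix problem in Python with NumPy -- just an illustration with made-up numbers, not Weka's implementation. Note that there are more instances (4 rows) than attributes (3 columns, counting the constant column of 1s that plays the role of a0 = 1):

    import numpy as np

    # Rows are training instances; the leading column of 1s stands in
    # for a0 = 1, so w[0] becomes the constant weight w0.
    A = np.array([[1.0, 2.0, 3.0],
                  [1.0, 4.0, 1.0],
                  [1.0, 5.0, 6.0],
                  [1.0, 7.0, 2.0]])
    x = np.array([10.0, 12.0, 25.0, 20.0])  # actual class values (made up)

    # Least squares: choose w to minimize the sum of squared errors
    # ||A w - x||^2 -- the same criterion as on the slide.
    w, residuals, rank, _ = np.linalg.lstsq(A, x, rcond=None)

    predicted = A @ w  # predicted class values for the training instances
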
If we did have nominal values -- say, just
a 2-valued, binary attribute -- we could just
convert its values to 0 and 1 and use those numbers.
As for multi-valued nominal attributes, you'll
have a look at those in the activity
at the end of this lesson.
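
As a tiny Python illustration of the binary case (the attribute values here are invented, purely for illustration):

    # Map a 2-valued nominal attribute onto the numbers 0 and 1
    # so it can take part in the weighted sum.
    binary_map = {"no": 0.0, "yes": 1.0}
    values = ["yes", "no", "yes"]              # hypothetical attribute values
    numeric = [binary_map[v] for v in values]  # -> [1.0, 0.0, 1.0]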

We're going to open a regression dataset and
see what it does: cpu.arff.
This is a regular kind of dataset.
It's got numeric attributes, and the most
important thing here is that it's got a numeric
class -- we're trying to predict a numeric
value.
We can run LinearRegression; it's in the functions
category.
We just run it, and this is the output.
We've got the model here.
The class has been predicted as a linear sum.
These are the weights I was talking about.
It's this weight times this attribute value
plus this weight times this attribute value,
and so on.
Minus this -- and this is w0, the constant weight,
not multiplied by any attribute value.
This is a formula for computing the class.
When you use that formula, you can look at
the success of it in terms of the training data.

The correlation coefficient, which is a standard
statistical measure, is 0.9.
That's pretty good.
Then there are various other error figures
here that are printed.
On the slide, you can see the interpretation
of these error figures.
It's really hard to know which one to use.
They all tend to produce the same sort of
picture, but I guess the exact one you should
use depends on the application.
There's the mean absolute error and the root
mean squared error, which is the standard
metric to use.
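
As a sketch of how those figures are computed -- these are the standard definitions, not Weka's code, and the actual/predicted values are made up:

    import numpy as np

    actual = np.array([10.0, 12.0, 25.0, 20.0])     # made-up class values
    predicted = np.array([11.0, 11.5, 23.0, 21.0])  # made-up predictions

    # Correlation coefficient between actual and predicted values
    correlation = np.corrcoef(actual, predicted)[0, 1]

    # Mean absolute error and root mean squared error
    mae = np.mean(np.abs(actual - predicted))
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
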
That's linear regression.
I'm actually going to look at nonlinear regression
here.
A "model tree" is a tree where each leaf has
one of these linear regression models.
We create a tree like this, and then at each
leaf we have a linear model, which has got
those coefficients.

It's like a patchwork of linear models, and
this set of 6 linear patches approximates
a continuous function.
There's a method under "trees" with the rather
mysterious name of M5P.
If we just run that, that produces a model
tree.
Maybe I should just visualize the tree.
Now I can see the model tree, which is similar
to the one on the slide.
You can see that each of these leaves -- in
this case 5 of them -- has a linear model:
LM1, LM2, LM3, and so on. And if we look back
here, the linear models are defined like this:
LM1 has this linear formula; this linear formula
is for LM2; and so on.
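
To make the "linear model at each leaf" idea concrete, here is a toy Python sketch -- the split attribute, threshold, and coefficients are all invented for illustration, not taken from the M5P output:

    def lm1(a):
        # Linear model at the left leaf (invented coefficients)
        return 2.0 + 0.5 * a[0] + 1.5 * a[1]

    def lm2(a):
        # Linear model at the right leaf (invented coefficients)
        return -1.0 + 3.0 * a[0] + 0.2 * a[1]

    def model_tree_predict(a, split_attr=0, threshold=4.0):
        # Route the instance down the tree to a leaf, then apply
        # that leaf's linear model.
        if a[split_attr] <= threshold:
            return lm1(a)
        return lm2(a)

    print(model_tree_predict([2.0, 3.0]))  # this instance lands in the LM1 leaf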

We chose trees > M5P, we ran it, and we looked
at the output.
We could compare these performance figures
-- 92-93% correlation, mean absolute error
of 30, and so on -- with the ones for regular
linear regression, which got a slightly lower
correlation, and a slightly higher absolute
error -- in fact, I think all these error
figures are slightly higher.
That's something we'll be asking you to do
in the activity associated with this lesson.
Linear regression is a well-founded, venerable
mathematical technique.
Practical problems often require non-linear
solutions.
The M5P method builds trees of regression
models, with linear models at each leaf of
the tree.
You can read about this in the course text
in Section 4.6.
Off you go now and do the activity associated
with this lesson.

See you soon.
Bye!
