Hello, and welcome back to New Zealand for
another few minutes of Advanced Data Mining
with Weka.
In Lesson 1.4, we're going to continue our
exploration of the timeseriesForecasting package.
In the last lesson I showed you some graphs,
which I actually made with Excel for the purposes
of presentation, but the timeseriesForecasting
package can make such graphs itself, and we're
going to show you how to look at the output
of the package.
I think you should restart the Explorer, just
to reinitialize all of the options in the
timeseriesForecasting stuff and load airline.arff.
I've done that.
I'm going to go to Forecast and click Start
here, and we get this output, which we haven't
looked at before: "Train future predictions"
it's called; and you can see this is a graph
of passenger numbers, and
if you look very carefully, you can see that
these are square data points, and the very
last one is a round data point.
That's the predicted passenger number.
We're only predicting one time unit here,
but we can change that.
Let's go up to the interface and change the
number of time units to forecast to, say,
12, and try again.
Now you can see that we've got these 12 predicted
points and a dashed line.
So we're forecasting ahead, from the end of
the training data.
Let's go to the Lag creation panel, and remember
we removed the leading instances with unknown
lag values.
That will remove the first 12 instances,
and we can do that again.
Actually, it doesn't affect the graph. We
still get the same graph, but we know that
the first 12 instances are not being used to
create the model.
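To make the idea of leading instances concrete, here is a minimal sketch in Python. It is not the package's own code: the function names are made up for illustration, and the series is 30 invented values rather than the airline data. It builds 12 lagged values for every instance and then drops the instances whose lag values are unknown.

```python
def make_lagged_rows(series, max_lag):
    """Pair each value with its previous max_lag values (lag 1 first).

    Lags that would reach back before the start of the series are None.
    """
    rows = []
    for t in range(len(series)):
        lags = [series[t - k] if t - k >= 0 else None
                for k in range(1, max_lag + 1)]
        rows.append((lags, series[t]))
    return rows


def drop_leading_unknown(rows):
    """Keep only the rows in which every lag value is known."""
    return [(lags, y) for lags, y in rows if None not in lags]


# Illustrative data only: 30 made-up values, not the airline dataset.
series = list(range(100, 130))
rows = make_lagged_rows(series, max_lag=12)
usable = drop_leading_unknown(rows)

print(len(rows), len(usable))  # 30 18
```

With a maximum lag of 12, exactly the first 12 rows contain an unknown lag, so they are the ones that disappear from the data used to build the model.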
Coming back to the slide, think about
the timeline like this.
Here's the dataset, that top line, and underneath
we've got the dashed line with the leading
instances, 12 of them, and then the training
data for future predictions, and then the
future predictions leading ahead after the
end of the dataset.
All right. Now let's do some evaluation here.
We're going to evaluate on the training data
and on 24 held-out instances.
I'm going to go to the Evaluation panel and
evaluate on the training data and
24 held-out instances,
two years' worth. Run that.
Now I get the "train future predictions" output
here, which ends at the end of the training
data and then shows us 12 future
predictions from that point.
Coming back to the slide, we've got the dataset.
We've got the training data now, which is
all of the dataset, except for the last 24
instances, and the future predictions from
the training data is the dashed line there.
Then if we look at the other output here--going
back to Weka--Test future predictions,
you can see now that we've got the test data
here and future predictions from the end of
the test data, this dashed line with the round
points.
Coming back to the slide, we've got the whole
dataset,
then we've got the training data, and then
we've got the test data and future predictions
from the end of the test data, that is after
the end of the dataset.
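The pieces of the timeline on the slide can be written down as index ranges. This is only a sketch with hypothetical names. It assumes the dataset has 144 instances (a figure not stated in this lesson), 12 leading instances, 24 held-out instances and 12 future predictions.

```python
def timeline_segments(n_instances, n_leading, n_holdout, n_future):
    """Return the index ranges for each part of the timeline.

    All argument and key names are hypothetical, chosen for this sketch.
    """
    train_end = n_instances - n_holdout
    return {
        "leading": range(0, n_leading),
        "training": range(n_leading, train_end),
        "test": range(train_end, n_instances),
        # Forecasts made from the end of the training data.
        "train_future": range(train_end, train_end + n_future),
        # Forecasts made from the end of the test data, past the dataset.
        "test_future": range(n_instances, n_instances + n_future),
    }


# Assumed sizes: 144 instances, 12 leading, 24 held out, 12 steps ahead.
seg = timeline_segments(n_instances=144, n_leading=12,
                        n_holdout=24, n_future=12)
for name, idx in seg.items():
    print(f"{name:13s} {idx.start:4d} .. {idx.stop - 1:4d}")
```

Notice that the "train future" range covers positions that also belong to the held-out test data, whereas the "test future" range lies entirely beyond the end of the dataset.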
Now, it would be nice to see the one-step-ahead
estimates for the test data.
There are a lot of graphing options here.
First of all, I'm going to turn off the evaluation
on training, because that's going to
give us too much data to look at.
Let's just look at evaluating on the test
data.
I'm not going to graph the future predictions
at all.
Now if I run this, I get no graphical output.
There's nothing.
Let's turn on Graph the predictions at step
1 and run it.
Now you can see here the test predictions
for the target.
You can see in blue the predicted passenger
numbers and in red the actual passenger numbers.
So we can see there the discrepancy on the
test data between the one-step-ahead predictions
and the actual data itself.
We're going to then do a little bit more on
this panel.
We're going to graph the predictions at step
12, that is 12-steps-ahead predictions, and
then we're going to compare 1-step-ahead,
6-steps-ahead, and 12-steps-ahead predictions.
Let's go back here.
I'm going to graph the predictions at step
12.
Now, I of course get worse predictions,
because we're predicting 12 steps ahead.
You'd expect that to get worse.
There's a consistent error, where they
undershoot the actual data values because,
of course, with multi-step ahead predictions,
with any step ahead predictions, once you
make an error on the first prediction, then
that error continues to propagate through
the future predictions.
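Here is a small sketch of why errors build up when you predict several steps ahead. It assumes the forecasts are made recursively, with each prediction fed back in as if it were real data. The toy model and the numbers are invented for illustration and have nothing to do with the airline data or with the package's internals.

```python
def forecast_recursive(history, steps, predict_next):
    """Forecast `steps` values ahead, feeding each prediction back in."""
    window = list(history)
    forecasts = []
    for _ in range(steps):
        y_hat = predict_next(window)
        forecasts.append(y_hat)
        # The prediction, not the true value, becomes the next lag input.
        window.append(y_hat)
    return forecasts


def toy_model(window):
    """Hypothetical model that believes the series grows by 8 per step."""
    return window[-1] + 8


# Made-up series that really grows by 10 per step.
history = [100, 110, 120]
actual_future = [130 + 10 * i for i in range(12)]

forecasts = forecast_recursive(history, 12, toy_model)
errors = [a - f for a, f in zip(actual_future, forecasts)]
print(errors)  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
```

The model is only off by 2 on any single step, but because each forecast starts from the previous forecast, the undershoot has grown to 24 by step 12.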
Let's graph the target.
We've only got one possible target here.
If we had other attributes, we could graph
them, but we're just going to graph passenger_numbers
at step 12, and actually that's going to give
the same result.
I've got two graphs here, the one we had before,
and the new one, which looks exactly the same.
However, you can do better things here.
I'm going to turn the old one off just to
stop too much confusion, and I'm going to
graph--we can put in a comma-separated list
of numbers here--so I'm going to graph 1-step-ahead,
6-steps-ahead, and 12-steps-ahead predictions.
Now, you can see them in different colors.
You can see the difference between 1-step-ahead predictions,
the most accurate, that's the blue line; 6-steps-ahead
predictions, the green line, which is considerably worse; and 12-steps-ahead
predictions, which are a bit worse still,
the yellow line.
You can compare predictions at different points
ahead.
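One way to picture what "predictions at step k" means on test data is sketched below. For each test point, we forecast from the data that ends k steps earlier and keep only the k-th forecast. This is an illustrative reading, not a description of how the package computes its graphs. The persistence model, the made-up series and the function names are all assumptions.

```python
def predictions_at_step(series, test_start, step, predict_next):
    """For each test index t, forecast from data ending at t - step
    and keep the forecast that lands on t."""
    preds = []
    for t in range(test_start, len(series)):
        window = list(series[: t - step + 1])  # last known value: t - step
        for _ in range(step):
            window.append(predict_next(window))
        preds.append(window[-1])
    return preds


def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence(window):
    """Hypothetical stand-in model: predict that nothing changes."""
    return window[-1]


# Made-up series rising by 3 per step; the last 12 values act as test data.
series = [100 + 3 * t for t in range(40)]
test_start = 28

# The same kind of comma-separated list of steps as typed into the panel.
steps = [int(s) for s in "1,6,12".split(",")]
mae = {k: mean_abs_error(series[test_start:],
                         predictions_at_step(series, test_start, k, persistence))
       for k in steps}
print(mae)  # {1: 3.0, 6: 18.0, 12: 36.0}
```

Even with this crude stand-in model, the pattern matches the graph: the further ahead the prediction, the larger the error on the test data.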
I'm just going to improve these predictions
just to finish off.
I'm going to go to my base learner and change
it from linear regression to SMOreg, which we
found in one of the activities tended to be
better than linear regression.
Let's have a look at that.
You can see those predictions are quite a
bit better than they were with linear regression.
Let's go and change the model.
We're using this large model with a large
number of attributes here.
I'm going to reduce the number of attributes.
I'm going to just use a lag of 12, and then
I'm not going to include powers of time.
I'm not going to include products of time
and lag variables.
I'm going here, and I'm going to customize
this by not including any of these periodic attributes.
If I run this again, well, I've got a much
simpler model here.
This is the model based on just the date and
the lag by 12.
Now if I look at those graphs that I saw
before, well, you can't see them.
You can't see them, because they're all on
top of each other.
It's plotting the red and the blue last and
the green and the yellow are kind of hidden
underneath the 1-step-ahead predictions.
I've shown you several different options for
visualizing time series predictions.
We talked about the need to distinguish different
parts of the timeline: the initialization
part of the leading instances, which contain
unknown values for the lag variables;
extrapolation past the end of the dataset
into future predictions; the full training data;
the test data, if evaluation is specified;
and the training data with the test data held
out, past the end of which we extrapolate
for so-called "future predictions" based on
the training data.
We showed how you can look at different numbers
of steps ahead when making predictions.
You can read more about this in a document
about the time series analysis and forecasting
package with Weka, referenced there at the
bottom, and now it's time for you to go and
have a look at the activity associated with
this lesson, which will take you through
some of the different output options, but
looking at the textual output rather than
the graphical output.
Good luck with that, and we'll see you in
the next lesson.
Bye for now!
