Hello again! It's great that you're back, because otherwise you'd be missing out on some great visualization stuff.
As I mentioned in the first lesson, we'll be using JFreeChart for some of the plotting, because Weka's own plotting is a little complicated and it's much, much nicer doing JFreeChart plots.
If you haven't done so already, please install the "jfreechartOffscreenRenderer" package, which I mentioned earlier. If you're looking for the Javadoc of the JFreeChart library, you can find it on the jfree.org website.
The JFreeChart classes that we'll be touching on are the datasets that JFreeChart needs for plotting, "ChartFactory" for creating plots, and "ChartPanel", which is used for embedding plots in the GUI.
And, finally, some Weka classes for displaying
trees and graphs.
First of all, I'm going to start up Weka in
the Jython console.
For the first script, we want to plot the classifier errors obtained from a linear regression regressor on a dataset: not just actual versus predicted, but also taking into account how large each error is.
So, first thing, we're going to import a whole
bunch of classes again.
"Evaluation" for evaluating our classifier.
We're going to use "LinearRegression" as a simple
classifier for doing the regression.
DataSource for the usual loading of the dataset.
"DefaultXYZDataset" is a JFreeChart dataset,
which allows you to store 3 dimensions for
each data point.
We're basically using the z as the error.
"ChartFactory" for generating the plot.
"ChartPanel" for embedding it, and the "BubbleRenderer"
basically plots a bubble at the x, y position
using the z value as the radius.
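In Jython, that import block might look roughly like this (a sketch based on the standard Weka and JFreeChart packages, not necessarily the exact script used here):

    from javax.swing import JFrame
    from java.util import Random
    from weka.classifiers import Evaluation
    from weka.classifiers.functions import LinearRegression
    from weka.core.converters import ConverterUtils
    from org.jfree.data.xy import DefaultXYZDataset
    from org.jfree.chart import ChartFactory, ChartPanel
    from org.jfree.chart.plot import PlotOrientation
    from org.jfree.chart.renderer.xy import XYBubbleRenderer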
Okay.
So, we're loading our data.
In this case, it's the "bodyfat" UCI dataset, which has a numeric class.
Then we configure our LinearRegression classifier, turning off some options we don't need, which also makes it a bit faster.
Once again, we are cross-validating our classifier
with 10-fold cross-validation.
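Continuing that sketch, loading, configuring, and cross-validating might look like this (the file path and the "-S 1" option, which skips attribute selection, are my assumptions):

    data = ConverterUtils.DataSource.read("bodyfat.arff")  # file path is an assumption
    data.setClassIndex(data.numAttributes() - 1)

    cls = LinearRegression()
    cls.setOptions(["-S", "1"])  # assumed option: skip attribute selection

    evl = Evaluation(data)
    evl.crossValidateModel(cls, data, 10, Random(1))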
And, after the cross-validation is done, we need to collect the predictions and compute the errors.
So, what we're going to do here is quite simple.
We're going to start with three empty lists,
the actual, the predicted and the error.
We're going to loop through all the predictions, which we can retrieve with the predictions() method, store the actual and predicted values, and calculate the error, which is simply the absolute value of actual minus predicted.
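Continuing the sketch, that collection step might be:

    actual = []
    predicted = []
    error = []
    for pred in evl.predictions():
        actual.append(pred.actual())
        predicted.append(pred.predicted())
        # the error is the absolute difference between actual and predicted
        error.append(abs(pred.actual() - pred.predicted()))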
Having done that, we can then create our dataset,
which is a "DefaultXYZDataset".
We add a series to this dataset, giving it a name, like "LinearRegression" plus the name of the dataset, along with the actual, predicted, and error lists.
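Continuing the sketch (Jython conveniently converts the nested list into the double[][] array that addSeries() expects):

    plotdata = DefaultXYZDataset()
    plotdata.addSeries("LinearRegression on " + data.relationName(),
                       [actual, predicted, error])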
Then we use our "ChartFactory" to create a plot, in this case a scatter plot, with "Actual" and "Predicted" as the axis titles. Since we not only want to plot a little dot at each (x, y) location, we use a specific renderer, the one I mentioned earlier, the "XYBubbleRenderer".
Then, we are simply embedding the whole thing
in the frame and displaying that.
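The plotting part of the sketch might look like this (chart title and frame size are my own choices):

    chart = ChartFactory.createScatterPlot(
        "Classifier errors", "Actual", "Predicted", plotdata,
        PlotOrientation.VERTICAL, True, True, False)
    chart.getXYPlot().setRenderer(XYBubbleRenderer())  # bubbles instead of dots

    frame = JFrame("Classifier errors - " + data.relationName())
    frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE)
    frame.getContentPane().add(ChartPanel(chart))
    frame.setSize(800, 600)
    frame.setVisible(True)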
Let's run that, and here we go.
As we can see, some of the outliers are quite
large, and the ones that are closest to the
diagonal, the optimal case, are the smallest
ones.
We can even zoom in if we want to, and the plot adjusts accordingly.
The next script handles ROC curves for classification: the area under the curve, and what the curves for the various class labels look like, tell an important story about how well your classifier is doing.
In this case, once again, new tab, and we're
going to import a whole lot of classes again.
In this case, we're evaluating "NaiveBayes",
and we're using a "ThresholdCurve" class from
Weka, which allows us to calculate the ROC
curve data, among other things.
Since we're only plotting x and y in this case, we don't need an "XYZDataset"; a plain XY one will do. Once again, "ChartFactory" and so on, which we've already seen in the previous script.
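A sketch of those imports, again assuming the standard packages:

    from javax.swing import JFrame
    from java.util import Random
    from weka.classifiers import Evaluation
    from weka.classifiers.bayes import NaiveBayes
    from weka.classifiers.evaluation import ThresholdCurve
    from weka.core.converters import ConverterUtils
    from org.jfree.data.xy import DefaultXYDataset
    from org.jfree.chart import ChartFactory, ChartPanel
    from org.jfree.chart.plot import PlotOrientation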
Now, once again, we'll load a dataset.
In this case, we're loading the "balance-scale"
UCI dataset, which has a nominal class.
Setting the class attribute to the last one
again.
Instantiating our "NaiveBayes" classifier.
No options to be set, and cross-validating
that once again with 10-fold cross-validation
to obtain the statistics.
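Continuing this second sketch (the file path is again an assumption):

    data = ConverterUtils.DataSource.read("balance-scale.arff")  # path is an assumption
    data.setClassIndex(data.numAttributes() - 1)

    cls = NaiveBayes()  # no options needed
    evl = Evaluation(data)
    evl.crossValidateModel(cls, data, 10, Random(1))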
We're creating our dataset again and, since
we want to plot the ROC curves for all the
class labels, we're going to have to look
through all the labels, of course.
So, what we're going to do here is use a loop variable that ranges from 0 to the number of values of the class attribute minus 1.
In each iteration, we create the threshold curve data: we instantiate a "ThresholdCurve" and generate curve data from the predictions of the Evaluation object and the index of the label we're currently interested in. We can then simply extract the relevant columns from the generated curve dataset and put them into lists; it's the "False Positive Rate" versus the "True Positive Rate" that we want to plot.
Then, since we already have a dataset, we add a plot series to it and, to make it a bit more interesting, we also calculate the area under the ROC curve for each class label and use it in the series label.
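That loop might look roughly like this in the sketch ("False Positive Rate" and "True Positive Rate" are the column names that ThresholdCurve generates):

    plotdata = DefaultXYDataset()
    for i in range(data.classAttribute().numValues()):
        # compute the ROC curve data for class label i
        curve = ThresholdCurve().getCurve(evl.predictions(), i)
        fpr = curve.attributeToDoubleArray(
            curve.attribute("False Positive Rate").index())
        tpr = curve.attributeToDoubleArray(
            curve.attribute("True Positive Rate").index())
        # include the area under the curve in the series label
        auc = ThresholdCurve.getROCArea(curve)
        label = "%s (AUC: %.3f)" % (data.classAttribute().value(i), auc)
        plotdata.addSeries(label, [fpr, tpr])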
Okay.
Now we're creating an XY line plot, because we're connecting the dots rather than just scattering them around, as in the bubble plot earlier. We set the axis titles, "False Positive Rate" and "True Positive Rate".
Then, once again, put that in a frame and
display it.
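In the sketch:

    chart = ChartFactory.createXYLineChart(
        "ROC - " + data.relationName(), "False Positive Rate",
        "True Positive Rate", plotdata,
        PlotOrientation.VERTICAL, True, True, False)

    frame = JFrame("ROC - " + data.relationName())
    frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE)
    frame.getContentPane().add(ChartPanel(chart))
    frame.setSize(800, 600)
    frame.setVisible(True)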
Let's run that, and we have our three class
labels L, B, and R.
As you can see, the blue line is the worst one; if you look it up, it also only has an AUC of 0.719, whereas the other ones have almost 1. They go straight up, really nestle quite nicely into the corner here, and then plateau out at pretty much 1 there.
So that looks pretty sweet.
This was using JFreeChart to plot some graphs.
However, we can also plot some data using
simple Weka classes.
In this case, we want to plot a tree that was generated by J48.
Once again, we import stuff, and for visualization,
we're going to use the "TreeVisualizer".
First of all, once again we have to load some
data in, in this case, the "iris" dataset.
We're going to configure an unpruned J48 tree, build it on the dataset, and then create a "TreeVisualizer" using the graph that the built classifier returns. Then we embed the whole thing in a frame and display it; once the frame has been displayed, we can also fit the tree to the size available on screen.
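Putting that together, a self-contained sketch might look like this (file path and frame size are my assumptions):

    from javax.swing import JFrame
    from java.awt import BorderLayout
    from weka.classifiers.trees import J48
    from weka.core.converters import ConverterUtils
    from weka.gui.treevisualizer import PlaceNode2, TreeVisualizer

    data = ConverterUtils.DataSource.read("iris.arff")  # path is an assumption
    data.setClassIndex(data.numAttributes() - 1)

    cls = J48()
    cls.setOptions(["-U"])  # unpruned tree
    cls.buildClassifier(data)

    # the classifier's graph() method returns the tree in dot format
    tv = TreeVisualizer(None, cls.graph(), PlaceNode2())

    frame = JFrame("J48 - " + data.relationName())
    frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE)
    frame.getContentPane().setLayout(BorderLayout())
    frame.getContentPane().add(tv, BorderLayout.CENTER)
    frame.setSize(800, 600)
    frame.setVisible(True)
    tv.fitToScreen()  # fit the tree once the frame is displayed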
Running that.
We have our nice little tree of the "iris" dataset.
Now, trees aren't the only thing that Weka can plot. The "BayesNet" classifier allows you to plot network graphs, and this is what we're going to do now.
In this case, we're going to use the "BayesNet"
classifier and the "GraphVisualizer" from Weka
to plot a graph that this classifier generates.
Once again, load the iris dataset, and we're
going to configure our "BayesNet" classifier.
To make the graph a little bit more interesting,
I'm using two parents rather than just one.
I am building the classifier, and then I'm initializing the "GraphVisualizer" using the graph that the classifier returned. In this case, it's in the BIF format, the Bayesian Network Interchange Format, if I'm not mistaken.
Okay.
Once again, we embed the whole thing in a frame and display it, and, just like with the TreeVisualizer, we also want to make sure that the layout is all right.
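As a complete sketch (the K2 search options allowing two parents, the file path, and the frame size are my assumptions):

    from javax.swing import JFrame
    from java.awt import BorderLayout
    from weka.classifiers.bayes import BayesNet
    from weka.core.converters import ConverterUtils
    from weka.gui.graphvisualizer import GraphVisualizer

    data = ConverterUtils.DataSource.read("iris.arff")  # path is an assumption
    data.setClassIndex(data.numAttributes() - 1)

    cls = BayesNet()
    # assumed options: K2 search with up to 2 parents per node
    cls.setOptions(["-Q", "weka.classifiers.bayes.net.search.local.K2",
                    "--", "-P", "2"])
    cls.buildClassifier(data)

    gv = GraphVisualizer()
    gv.readBIF(cls.graph())  # graph() returns the network in BIF format

    frame = JFrame("BayesNet - " + data.relationName())
    frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE)
    frame.getContentPane().setLayout(BorderLayout())
    frame.getContentPane().add(gv, BorderLayout.CENTER)
    frame.setSize(800, 600)
    frame.setVisible(True)
    gv.layoutGraph()  # tidy up the layout once displayed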
Let's run that, and we have our little network
graph.
If we click on the various nodes, we can see the probability tables and inspect the network further.
What we've done in this lesson: we used JFreeChart for plotting classifier errors and ROC curves, and we used Weka's own visualization classes to display a J48 tree and a BayesNet network graph.
All right.
That's it for today.
I'll see you next time.
