KNIME Analytics Platform is the open source software for creating data science applications and services.
Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing
data science workflows and reusable components accessible to everyone. In particular,
thanks to its graphical workbench, KNIME Analytics Platform is extremely intuitive, fully visual, and
requires no coding.
We are going to demonstrate this ease of use by building a simple workflow step by step in this
video. The workflow we've chosen implements a churn prediction solution,
but it could just as well address any other classification task.
We have documented the steps required for this workflow with annotations in the workflow editor.
This is one of the main advantages of using a visual interface:
we can document the workflow steps inside the workflow itself. The input files are available in this
data folder in the KNIME Explorer.
The .csv file contains the contract data for the customers, while the .xls file contains operational data
for the same customers.
Customers are identified by a customer ID key in both files.
All we need to do is drag and drop the .csv file from the local workspace into the workflow editor.
This automatically creates the required reader node with most of the right settings to read the file.
The configuration window opens automatically. After a brief check,
we see that all settings are correct. Then we execute the File Reader node and take a look at the output
table.
Next, we drag and drop the .xls file to the canvas.
This creates an Excel Reader node. Looking at the configuration window,
we notice that the node did not recognize the first row in the file as the column names.
We adjust this by enabling the option that reads the column names from the first row.
We close the configuration window, execute the node, and we take a look at the output table.
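For viewers curious about what these two reader nodes correspond to in code, here is a rough pandas sketch. The file names are assumptions, since the video only shows a .csv and an .xls file:

```python
import pandas as pd

# Rough pandas equivalent of the File Reader and Excel Reader nodes.
# The file names "ContractData.csv" and "CallsData.xls" are assumptions.
contract_data = pd.read_csv("ContractData.csv")

# header=0 mirrors enabling the "first row contains column names" option.
operational_data = pd.read_excel("CallsData.xls", header=0)

print(contract_data.head())
print(operational_data.head())
```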
The next step in our workflow is pre-processing.
First, we will need to join the content of the two files we have just read.
This typically requires a Joiner node.
However, if we are ever in doubt about what to do next, we can consult the Workflow Coach on the left.
The Workflow Coach constantly anticipates our next step.
Here, the Workflow Coach is right;
we need the Joiner node for our next action.
The Joiner node allows us to blend data from different sources.
We configure the node by selecting the key columns for the join operation and the type of join.
We then execute the node and we view the result, which is now a single table containing all the contract
and operational data for each customer in a single row. Ours is a classification task, and classification
algorithms often require the class values to be of type String. The churn classes are currently integers, so we
use the Number To String node to convert them.
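Continuing the code sketch from above, the join and the type conversion might look like this; the CustomerID and Churn column names are assumptions based on the narration:

```python
# Rough equivalent of the Joiner and Number To String nodes.
# "CustomerID" and "Churn" are assumed column names.
joined = contract_data.merge(operational_data, on="CustomerID", how="inner")

# Classification learners often expect a string-typed class column.
joined["Churn"] = joined["Churn"].astype(str)
```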
Next we use the Color Manager node to color the rows in our table. In this configuration window, we choose
the colors according to State values. When we view the output table we see that each row now has the
color of its corresponding state. We add the Partitioning node and select an 80% vs. 20% relative
split with stratified sampling.
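A rough scikit-learn equivalent of this partitioning step, continuing the sketch:

```python
from sklearn.model_selection import train_test_split

# Rough equivalent of the Partitioning node: an 80/20 relative split,
# stratified on the class column so both partitions keep the class ratio.
train_df, test_df = train_test_split(
    joined, test_size=0.2, stratify=joined["Churn"], random_state=42
)
```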
Now it is time to train the model.
A number of machine learning algorithms of varying complexity are available. For this example,
we choose the Decision Tree Learner node, purely for demonstration purposes.
We open the node's configuration window and choose our own settings, such as minimum number of records,
number of threads, and binary nominal splits.
We execute the node and open the view. In the view we can inspect all of the decision rules produced
by our decision tree.
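Continuing the sketch, training the tree and printing its rules might look like this in scikit-learn; min_samples_leaf stands in for the node's "minimum number of records" setting:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Rough equivalent of the Decision Tree Learner node.
features = [c for c in train_df.columns if c != "Churn"]
X_train = pd.get_dummies(train_df[features])  # scikit-learn needs numeric inputs
y_train = train_df["Churn"]

tree = DecisionTreeClassifier(min_samples_leaf=2, random_state=42)
tree.fit(X_train, y_train)

# Comparable to browsing the decision rules in the node's view.
print(export_text(tree, feature_names=list(X_train.columns)))
```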
Before we move on to scoring the performance of our model
we make sure that the classes are equally represented in the test data set, which we can do by adding
the Equal Size Sampling node.
We set the necessary configuration settings, and execute the node.
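A rough pandas sketch of equal size sampling, continuing the example: the majority class is downsampled so both classes occur equally often.

```python
# Rough equivalent of the Equal Size Sampling node.
n = test_df["Churn"].value_counts().min()
balanced_test = (
    test_df.groupby("Churn", group_keys=False)
    .apply(lambda g: g.sample(n=n, random_state=42))
)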
Now we apply the Decision Tree Predictor node to the balanced test set.
After execution,
a column with the model's predictions has been added at the end of the input table. Next, the Scorer node compares
the original and the predicted classes, and calculates the confusion matrix and accuracy measures.
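Continuing the sketch, prediction and scoring might look like this in scikit-learn:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix

# Rough equivalent of the Decision Tree Predictor and Scorer nodes.
X_test = pd.get_dummies(balanced_test[features]).reindex(
    columns=X_train.columns, fill_value=0
)
y_test = balanced_test["Churn"]
predictions = tree.predict(X_test)

print(confusion_matrix(y_test, predictions))
print("Accuracy:", accuracy_score(y_test, predictions))
```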
We can also use the ROC Curve node to score the model predictions.
We choose Churn as the class column,
we arbitrarily set 1 as the positive class, and we plot the probability that a customer will churn.
We execute the node and we open the view where we have the ROC curve and the measure of the area
under the curve.
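A rough scikit-learn equivalent of the ROC evaluation, continuing the sketch, with "1" as the positive class:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# Rough equivalent of the ROC Curve node.
pos_index = list(tree.classes_).index("1")
churn_prob = tree.predict_proba(X_test)[:, pos_index]

fpr, tpr, _ = roc_curve(y_test, churn_prob, pos_label="1")
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```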
There are a number of visualizations to choose from.
For this example we use the Pie Chart node to visualize the sum of churn predictions across states.
We choose State as the category column, sum as the aggregation measure, and the predicted probability
of churn as the frequency column. We execute the node
and we take a look at the output view.
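A rough matplotlib sketch of the same aggregation and pie chart, continuing the example; "State" is an assumed column name from the contract data:

```python
import matplotlib.pyplot as plt

# Sum the predicted churn probability per state and plot it as a pie chart.
by_state = (
    balanced_test.assign(churn_prob=churn_prob)
    .groupby("State")["churn_prob"]
    .sum()
)
by_state.plot.pie()
plt.ylabel("")
plt.show()
```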
Finally, the PMML Writer node saves the model in this output location.
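KNIME writes PMML natively; from Python there is no single standard route. One option, shown purely as a sketch, is the third-party sklearn2pmml package, which requires a local Java runtime:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Rough equivalent of the PMML Writer node, via the sklearn2pmml package.
pipeline = PMMLPipeline([("tree", DecisionTreeClassifier(min_samples_leaf=2))])
pipeline.fit(X_train, y_train)
sklearn2pmml(pipeline, "churn_model.pmml")
```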
We have now created a workflow to predict the churn risk of customers. Using the intuitive, visual user interface
of KNIME Analytics Platform,
we have built a valuable workflow in record time.
There was no need for coding, and it was easy to find and reuse the nodes to build the complete
pipeline.
You can try the platform yourself and build a workflow to create your own data science application.
You can download the open source and free KNIME Analytics Platform at www.knime.com/downloads.
