Use the Open Source Neuroph Java Framework
for Cognitive
Cognitive computing has many facets.
In this video, I'll show you how to use supervised learning
to train, validate, and run a multilayer perceptron,
or MLP, network using the Neuroph Java neural network framework,
and then use that network to make selections for
the 2017 NCAA Division I Men's Basketball
Tournament, also known as March Madness.
Here's what you'll see in the video:
* Install the Neuroph framework
* Download the NcaaMarchMadness code from GitHub
* Download regular season stats from ncaa.com
* Load the data into a PostgreSQL database
* Train and validate the MLP network
* Run the tournament predictor to make selections
Let's get started.
To download Neuroph, visit the downloads page.
Click on the Neuroph framework ZIP file, and
it will be downloaded to your computer.
Now expand the ZIP file and notice the Neuroph
core JAR file.
The latest version of Neuroph may not be in
Maven Central,
so to use it for local development,
I recommend you install it to your local Maven
repository.
Drop out to a Terminal window or command prompt,
navigate to the directory where the Neuroph
core JAR is located, and issue this command:
mvn install:install-file -Dfile=neuroph-core-2.94.jar -DgroupId=org.neuroph -DartifactId=neuroph -Dversion=2.94 -Dpackaging=jar
This will install the JAR into your local
Maven repository so it's available to your
Maven builds.
Next, you'll need to clone the NcaaMarchMadness
code from GitHub.
Open a Terminal window or command prompt,
navigate to the directory where you want the
code to land, and enter this command:
git clone https://github.com/makotogo/developerWorks
Now that you have the code on your computer,
let's import it into Eclipse.
Start Eclipse, and choose File > Import > Maven
> Existing Maven Projects.
Navigate to the directory where you cloned
the code.
Select the project to import, and click Finish.
To train a neural network you need data.
I'll show you how to download data from ncaa.com
now.
Go to www.ncaa.com/stats/basketball-men/d1
Scroll down the page until you see "Archived
National Stats" and click on that link.
On the next page, select the season you want
to download, then click View.
Under division, select Division I.
The 2010 Tournament started on March 16, 2010,
so under the reporting week, select Through
Games 03/14/2010.
Next to the statistical category, select Team
> All Statistics.
Finally, click Show Report (CSV).
It takes a few minutes for the server to crunch
the data, but when it's finished, it will
download the CSV file, called rankings.csv,
to your computer.
Rename the file to include the tournament
year, so in this case I'll call the file rankings-2010.csv.
Repeat this process for every year for which
you need statistical data, including the tournament
year itself.
I recommend you download data from 2010 through
2017.
The CSV data looks like this:
[Screen capture: the CSV data]
Each statistical category is preceded by
a header indicating the category, and that
is followed by the data.
When the data is exhausted for a particular
category, the next statistical category follows,
then its data, and so on, until end of file.
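To make that layout concrete, here's a minimal Java sketch that groups data rows under their category. The exact header format of rankings.csv isn't spelled out here, so the sketch assumes that a category header is a non-blank line with no commas and that every data row contains at least one comma. The class name SectionedCsvSketch and the sample categories are made up for illustration; this is not the parser from the NcaaMarchMadness project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SectionedCsvSketch {

    // Groups data rows under the most recent category header.
    // ASSUMPTION: a header is a non-blank line without commas;
    // data rows contain commas. The real file may differ.
    static Map<String, List<String[]>> parse(List<String> lines) {
        Map<String, List<String[]>> sections = new LinkedHashMap<>();
        List<String[]> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank separator lines
            }
            if (!line.contains(",")) {
                // A new category starts; later rows belong to it.
                current = new ArrayList<>();
                sections.put(line, current);
            } else if (current != null) {
                // Naive split: does not handle quoted fields with commas.
                current.add(line.split(","));
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        // Made-up sample data, not real NCAA statistics.
        List<String> sample = List.of(
            "Scoring Offense",
            "1,Team A,82.5",
            "2,Team B,80.1",
            "Scoring Defense",
            "1,Team C,58.3");
        parse(sample).forEach((category, rows) ->
            System.out.println(category + ": " + rows.size() + " row(s)"));
    }
}
```

A LinkedHashMap keeps the categories in the order they appear in the file, which matches the "header, data, next header" structure described above.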
To load the CSV stats data into the DB, you
have to do a few things.
First, you have to generate SQL load scripts
from the CSV data you downloaded.
I have provided Bash shell scripts for Linux
and Mac users.
If you're a Windows user, you can set up a
run configuration in Eclipse to drive the
SqlGenerator program.
To generate the scripts, navigate to the scripts
directory of the NcaaMarchMadness source you downloaded
earlier, which is under src/main/script.
Execute the run-sql-generator.sh script, and
specify the full path to the CSV file you
want to convert into a SQL load script.
When the script runs, it creates a single
SQL file containing the INSERT statements
that load the statistical data.
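If you're curious what turning a row of stats into an INSERT statement might look like, here's a minimal Java sketch. The table name, the column names, and the SqlInsertSketch class are all hypothetical and are not taken from the SqlGenerator program; the sketch only shows the general idea, including escaping single quotes in team names.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SqlInsertSketch {

    // Wraps a value in single quotes, doubling any embedded
    // single quotes (standard SQL string-literal escaping).
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds one INSERT statement.
    // ASSUMPTION: table and column names are placeholders.
    static String toInsert(String table, String[] columns, String[] values) {
        if (columns.length != values.length) {
            throw new IllegalArgumentException("column/value count mismatch");
        }
        String cols = String.join(", ", columns);
        String vals = Arrays.stream(values)
            .map(SqlInsertSketch::quote)
            .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + vals + ");";
    }

    public static void main(String[] args) {
        // Made-up row for illustration only.
        String[] columns = {"year", "team_name", "avg_points"};
        String[] values = {"2010", "St. John's", "71.3"};
        System.out.println(toInsert("team_stat", columns, values));
    }
}
```

Quoting every value keeps the sketch short. A real generator would more likely emit numeric columns unquoted, or skip generated SQL text and use JDBC prepared statements.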
Next you have to define the DB, including
all the tables and views the application needs
to do its processing.
From a Terminal window or command prompt,
get into the PostgreSQL psql interactive shell
as the default user, sperry in my case.
Then create the ncaabb database.
Now exit the shell, and connect to the ncaabb
database.
From the shell, execute the build_db.sql script,
which will build the database from scratch.
When the script finishes, the data is loaded
and you're ready to train and validate the
network.
The first step in training and validating
the network is to define the network structures
you want to use.
To do this, open Networks.java in Eclipse.
This file is where you define the structure
of the networks you want to train and validate.
You only specify the hidden layers here for
each network you want to train.
In this network, for example, the first hidden
layer has 90 neurons, the second hidden layer
has 30, and the third hidden layer has 20
neurons.
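Here's a small Java sketch of how a hidden-layer definition like that can be expanded into a full layer structure. The 90, 30, and 20 come from the example above; the input and output counts (42 and 2) and the LayerStructureSketch class are placeholders I made up, and this is not the code in Networks.java.

```java
import java.util.Arrays;

class LayerStructureSketch {

    // Combines input, hidden, and output layer sizes into one
    // array, in order from the input layer to the output layer.
    static int[] fullStructure(int inputs, int[] hiddenLayers, int outputs) {
        int[] layers = new int[hiddenLayers.length + 2];
        layers[0] = inputs;
        System.arraycopy(hiddenLayers, 0, layers, 1, hiddenLayers.length);
        layers[layers.length - 1] = outputs;
        return layers;
    }

    public static void main(String[] args) {
        // Hidden layers from the example: 90, 30, and 20 neurons.
        int[] hidden = {90, 30, 20};
        // ASSUMPTION: 42 inputs and 2 outputs are placeholder values.
        int[] layers = fullStructure(42, hidden, 2);
        System.out.println(Arrays.toString(layers));
    }
}
```

An array of per-layer neuron counts like this is the kind of value a network constructor, such as Neuroph's MultiLayerPerceptron, takes to build the network.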
Now build the MarchMadness JAR file.
You can do this from within Eclipse, or from
the command line, which is what I'll show
you now.
Drop out to a command line,
navigate to the root directory of the NcaaMarchMadness
project,
and kick off a Maven build,
specifying the clean and package phases (mvn clean package).
Now that the JAR file is built, kick off the
training and validation run by running the
run-mlp-trainer.sh script. Specify the
years to use to train the network, separated by spaces,
followed by the years to use for validation, separated by commas.
For the 2017 tournament, I want to train using
data from 2010-2014, and validate using data
from 2015 and 2016.
So the command looks like this:
./run-mlp-trainer.sh 2010 2011 2012 2013 2014 2015,2016
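To show how an argument list shaped like that can be pulled apart, here's a minimal Java sketch. It assumes every argument except the last is a training year and that the last argument holds the validation years, separated by commas. The TrainerArgsSketch class is hypothetical and is not how the trainer in the project actually reads its arguments.

```java
import java.util.ArrayList;
import java.util.List;

class TrainerArgsSketch {

    // All arguments except the last are treated as training years.
    static List<Integer> trainingYears(String[] args) {
        requireTwoOrMore(args);
        List<Integer> years = new ArrayList<>();
        for (int i = 0; i < args.length - 1; i++) {
            years.add(Integer.valueOf(args[i].trim()));
        }
        return years;
    }

    // The last argument holds the validation years, comma-separated.
    static List<Integer> validationYears(String[] args) {
        requireTwoOrMore(args);
        List<Integer> years = new ArrayList<>();
        for (String year : args[args.length - 1].split(",")) {
            years.add(Integer.valueOf(year.trim()));
        }
        return years;
    }

    private static void requireTwoOrMore(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "expected: <training year>... <validation years, comma-separated>");
        }
    }

    public static void main(String[] args) {
        String[] sample = {"2010", "2011", "2012", "2013", "2014", "2015,2016"};
        System.out.println("train: " + trainingYears(sample));
        System.out.println("validate: " + validationYears(sample));
    }
}
```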
Training your networks can take hours or days,
so be patient.
It may also take several tries before you get
a network that performs above the 70% threshold.
Once a network performs above this threshold,
it's saved on your computer to the Networks
directory,
to be run as part of the tournament simulator.
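As a rough illustration of that 70% check, here's a minimal Java sketch. It assumes that "performs" means the fraction of validation games where the network picked the actual winner, which is my reading rather than something confirmed by the project code. The ValidationThresholdSketch class and the sample picks are made up.

```java
class ValidationThresholdSketch {

    // The 70% threshold mentioned in the video.
    static final double THRESHOLD = 0.70;

    // Fraction of games where the predicted winner matches the actual winner.
    // ASSUMPTION: this is how "performance" is measured.
    static double accuracy(String[] predicted, String[] actual) {
        if (predicted.length == 0 || predicted.length != actual.length) {
            throw new IllegalArgumentException("need equal, non-empty arrays");
        }
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i].equals(actual[i])) {
                correct++;
            }
        }
        return (double) correct / predicted.length;
    }

    // A network is kept only if it performs strictly above the threshold.
    static boolean shouldKeep(double accuracy) {
        return accuracy > THRESHOLD;
    }

    public static void main(String[] args) {
        // Made-up picks: three of four games predicted correctly.
        String[] predicted = {"Team A", "Team B", "Team C", "Team D"};
        String[] actual = {"Team A", "Team B", "Team C", "Team X"};
        double acc = accuracy(predicted, actual);
        System.out.println("accuracy=" + acc + " keep=" + shouldKeep(acc));
    }
}
```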
Okay, so you've downloaded stats data
and loaded it into the DB,
along with other supporting data from the
scripts in GitHub.
Then you trained and validated one or more
networks.
Now you're ready to run the tournament simulator.
To run the tournament simulator, open a Terminal
window or command prompt and navigate to the
scripts directory.
Execute the run-tournament-simulator.sh script,
and specify the tournament year.
For 2017 I'll execute this command:
./run-tournament-simulator.sh 2017
The output lands in the data/Simulations directory,
directly under the base directory defined
in the network settings.
You'll notice there is one CSV file for each
team in the tournament.
Inside each file is a list of all the other
teams, along with the results of simulating
that team against each of them
using the networks you trained.
When you're filling out your brackets, pick
one of the teams, open its CSV file, locate
its opponent, make a note of the network
pick, and fill out your bracket.
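If you'd rather look up a pick in code than by eye, here's a minimal Java sketch of the idea. The column layout of the simulator's CSV files isn't described here, so the sketch assumes each row is simply "opponent,pick". That layout, the BracketLookupSketch class, and the sample rows are all hypothetical.

```java
import java.util.List;
import java.util.Optional;

class BracketLookupSketch {

    // Finds the network's pick for a matchup against the given opponent.
    // ASSUMPTION: each row is "opponent,pick"; the real files may differ.
    static Optional<String> pickAgainst(List<String> rows, String opponent) {
        for (String row : rows) {
            String[] fields = row.split(",", -1);
            if (fields.length >= 2
                    && fields[0].trim().equalsIgnoreCase(opponent)) {
                return Optional.of(fields[1].trim());
            }
        }
        return Optional.empty(); // opponent not listed in this file
    }

    public static void main(String[] args) {
        // Made-up rows standing in for one team's CSV file.
        List<String> rows = List.of(
            "Team B,Team A",
            "Team C,Team C");
        System.out.println(pickAgainst(rows, "Team C").orElse("no entry"));
    }
}
```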
That's how you train, validate, and run a Multilayer
Perceptron neural network.
I hope you enjoyed the video.
I'm Steve Perry.
Thanks for watching.
