This is a run-through of Chetna Warade's tutorial,
Sentiment Analysis of reddit AMAs.
AMAs are Ask Me Anything web events, in which anyone can
get on the reddit site live with luminaries like Bill Gates, Madonna,
Stephen King, or President Obama, and ask them any question that comes to mind.
These conversations offer interesting data analysis opportunities,
and today we'll perform some sentiment analysis
on a conversation hosted by some IBM software developers.
You'll see how to move JSON data from reddit through our handy Simple Data Pipe
into Cloudant, enriching the output using Watson Tone Analyzer.
Then the Spark-Cloudant connector makes it so easy
to analyze the data. Finally, we'll organize and report
on the reddit conversation's social and emotional tones using an iPython notebook.
 
First, deploy the Simple Data Pipe app to Bluemix,
IBM's cloud development platform.
Deployment automatically provisions and binds the Cloudant service too.
 
Once deployed, click EDIT CODE.
Here, we make a quick code change to add the reddit AMA connector.
This connector was built by our team, and lets you specify a reddit AMA as your data source.
 
To add the connector, open package.json,
and under dependencies, add the following line, which you'll find in the tutorial.
 
Save your changes, then click the Deploy button.
 
Now we can return to Bluemix to provision the services that the reddit connector needs.
First, Apache Spark.
 
By adding the Spark service, we can leverage the Spark-Cloudant connector,
which lets you connect your Apache Spark instance
to a Cloudant NoSQL database instance and analyze the data.
 
This is a great way to leverage Spark's lightning-fast processing power
directly on your Cloudant JSON data.
And, of course, Watson Tone Analyzer,
which will perform the sentiment analysis.
Bind the Tone Analyzer service to your new Simple Data Pipe app
and name it only Tone Analyzer.
Bluemix prompts you to restage, which you want to do, so click RESTAGE.
And that's how easy it is to add a connector to the Simple Data Pipe, by the way.
 
Add the dependency to package.json, provision any additional services you need, and you're good to go.
 
Now, we're ready to move some data.
Launch your Simple Data Pipe app,
click Create A New Pipe.
Within the Type dropdown, choose reddit AMA.
This choice exists now because of the changes we made to package.json a few minutes ago.
Name it ibmama.
Enter the URL for your reddit AMA.
This can be just the straight URL, or you can append .json to it.
We want Top comments only and flattened JSON, which is best for use with Apache Spark.
We'll skip scheduling
and run the pipe.
Let's see what we've got.
Almost 400 comments have landed in Cloudant.
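
To make the flattened JSON option a little more concrete, here is a minimal sketch in plain Python of what flattening a nested record means. The field names and the underscore separator are assumptions for illustration only; the documents the Simple Data Pipe actually writes to Cloudant may be shaped differently.

```python
def flatten(record, parent_key="", sep="_"):
    """Collapse nested dicts into one level, joining key names with sep."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Hypothetical comment record; not the real reddit or Cloudant schema.
nested_comment = {
    "author": "example_user",
    "text": "Thanks for hosting this AMA!",
    "tones": {
        "emotion": {"joy": 0.82, "anger": 0.03},
        "social": {"agreeableness": 0.91},
    },
}

flat_comment = flatten(nested_comment)
for name, value in flat_comment.items():
    print(name, "=", value)
```

With every value sitting at the top level, each key can map to its own column when the data is loaded into a Spark dataframe.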
 
If you want to view them in Cloudant, click Details, then Top Comments Only.
But let's move on, set up our iPython notebook, and run some analysis.
We do that back in Bluemix, here in our Apache Spark service.
 
 
You want to create the notebook from a URL,
and that URL is provided in Chetna's tutorial.
Here's our notebook.
The final setup step is to enter our Cloudant credentials here in cell 3.
To get them, launch your Bluemix dashboard in a new tab.
 
Click Cloudant to open the service. Then click Add Credentials.
If prompted, create credentials.
Then copy and paste the values for host, username, and password,
replacing any XXXXs that you see in cell 3.
 
At the end of the line is the name of our Cloudant database.
This default entry works for me, because I named my pipe ibmama.
If you gave your pipe a different name, enter it here.
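
Before running the notebook, it can help to check that no XXXX placeholder is left behind. The sketch below is a hypothetical helper, not part of Chetna's notebook; the dictionary keys and the database name are made up for illustration, and cell 3 of the real notebook may lay out the values differently.

```python
def find_placeholders(settings, marker="XXXX"):
    """Return the names of settings that still contain the placeholder marker."""
    return sorted(name for name, value in settings.items() if marker in str(value))


# Hypothetical values; paste your own host, username, and password instead.
cloudant_settings = {
    "host": "XXXX",
    "username": "my-cloudant-user",
    "password": "XXXX",
    "database": "my_database",  # assumed name, use the one shown in cell 3
}

missing = find_placeholders(cloudant_settings)
if missing:
    print("Still need real values for:", ", ".join(missing))
else:
    print("All credentials filled in.")
```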
Now we're ready to run analysis with our notebook. Let's start at the top.
The first couple cells connect to Spark and to our Cloudant database.
Here, we create the dataframe
and get it in tabular format.
Prep the dataframes for SQL commands.
Now we'll compute the distribution of comments by sentiment scores.
We really want only scores greater than 70%.
Here we plot the data as a bar chart.
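
Here is a rough sketch of that counting step in plain Python. The notebook itself does this with Spark SQL on the dataframe, so treat this only as an illustration of the logic; the sample comments, tone names, and record layout are assumptions.

```python
from collections import Counter

THRESHOLD = 0.70  # keep only scores greater than 70%


def tone_distribution(comments, threshold=THRESHOLD):
    """Count, for each tone, how many comments score above the threshold."""
    counts = Counter()
    for comment in comments:
        for tone, score in comment["tones"].items():
            if score > threshold:
                counts[tone] += 1
    return counts


# Hypothetical sample data; real scores come from Watson Tone Analyzer.
sample_comments = [
    {"text": "Happy to answer that one.",
     "tones": {"agreeableness": 0.91, "joy": 0.40, "analytical": 0.75}},
    {"text": "Great question, thanks for asking.",
     "tones": {"agreeableness": 0.88, "joy": 0.10, "analytical": 0.20}},
    {"text": "How does the connector handle nested JSON?",
     "tones": {"agreeableness": 0.30, "joy": 0.65, "analytical": 0.95}},
]

distribution = tone_distribution(sample_comments)
for tone, count in sorted(distribution.items()):
    print(tone, count)
```

The resulting counts per tone are what a bar chart would plot, one bar per tone.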
Watson Tone Analyzer captures tones in the text,
gauging social traits like agreeableness and extraversion,
language styles like analytical and tentative,
and emotions like joy, disgust, anger, fear, and sadness.
You'll notice that no comments scored over 70% in any emotion.
This conversation just didn't get that heated.
The last cell groups comments by tone values.
Scroll through these, and you see that many comments fall under more than one tone.
You can also see what kinds of comments appear under certain tones.
For example, comments from IBMers take up most of the Agreeableness section,
probably because they were acting as gracious hosts of this event.
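
The grouping in that last cell can be sketched in plain Python like this. It is an illustration under assumed field names and made-up sample data, not the notebook's actual code, but it shows why one comment can appear under more than one tone.

```python
from collections import defaultdict


def group_by_tone(comments, threshold=0.70):
    """Map each tone to the authors of comments scoring above the threshold."""
    groups = defaultdict(list)
    for comment in comments:
        for tone, score in comment["tones"].items():
            if score > threshold:
                # A comment is added once per tone it scores highly on.
                groups[tone].append(comment["author"])
    return dict(groups)


# Hypothetical sample data; authors and scores are invented.
sample_comments = [
    {"author": "ibm_host", "tones": {"agreeableness": 0.92, "analytical": 0.81}},
    {"author": "curious_user", "tones": {"agreeableness": 0.35, "analytical": 0.88}},
    {"author": "ibm_host", "tones": {"agreeableness": 0.79, "analytical": 0.40}},
]

groups = group_by_tone(sample_comments)
for tone, authors in sorted(groups.items()):
    print(tone, authors)
```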
Now go try the tutorial yourself.
You'll see how easy it is to add a new data source, like reddit, to the Simple Data Pipe.
Here, we used a Python notebook to help us leverage Watson Tone Analyzer and get results like this.
But you could use the analysis tool of your choice. Go give it a try.
