 
BRIAN KOKENSPARGER: All right.
Well, the first thing I'd like
to say is, I want you all to
applaud for yourselves.
I mean, this is the
very last session.
You're not up in the
room dolling
anything up and all that.
Thank you so much for
holding out, and
coming to the last session.
So I applaud you on this.
I also want to say, I've seen
some awesome people standing
up here, giving some
really nice talks.
So I really feel honored to be
up here speaking with you
about this topic.
I love this topic.
I'm kind of a newcomer to it,
though, so in some ways, we'll
kind of learn together.
I think, as we go.
I want to ask to start with, how
many people have actually
done any kind of data
mining at all?
Raise your hand.
OK, for those of you who aren't
raising your hands, how
many of you have actually--
say you're given a test, and
students have come up and
handed in their tests
and moved on.
And then you thought, oh,
they're the early ones.
They probably really know
this stuff, or they don't
care-- one or the other.
And then you get to the middle
ground, and it's like, OK,
those are the middle people.
They'll probably all
do pretty well.
They know this stuff, but they
wanted to really make sure.
And then you get to the end,
and you think, OK, these
people, some of them don't have
a clue, and others are
just really careful.
So how many people have had
thoughts like that when you've
been given a test?
Probably everyone.
So you've done data mining.
Data mining is looking
for patterns in data.
Doesn't matter where
the data are.
Doesn't matter how you get
them, how you do the
collecting, how you
do the analysis.
You're looking for patterns.
That's data mining.
So you've all done educational
data mining.
I thought I would
use this title--
Educational Data Mining
Process Using Canvas--
that's much better.
Thank you, whatever you did.
I thought I would use this,
because we're kind of in a
silver mining area here--
at least at one time.
And I saw The Treasure of the
Sierra Madre or whatever, and
that Huston guy there-- hey,
with the shaggy beard.
That's me.
I'm a prospector.
And also, I was bored in a hotel
lobby, waiting for my
wife and kids to get up, so I
wrote these little haiku.
So you'll see them along.
I don't think they're very good,
but I thought if I went
to the trouble of writing
them, I'll use them.
Actually, they're
senryu,
I guess.
I don't know how to
pronounce that.
But it's haiku in non-natural
situations.
So the definition of a
prospector is one who explores
an area, especially for
mineral deposits.
So prospectors explore.
They search, with no guarantee
of success.
So they think there
may be gold.
There might be silver.
They go, and they look for it.
They have no guarantee.
That's a little different
than researchers.
Researchers set up studies to
try to answer a question, and
they try to kind of hedge
their bets with a high
possibility of success.
So educational data mining, I
think, is actually more along
the prospecting line than it is
along the researching line.
You don't know if you're going
to be successful or not.
So I'd like to kind of
make my tagline--
there's gold in them
thar hills--
because I think there really
is gold in the educational
data mining area, but perhaps
not the gold that you might
think about.
So a quick and dirty
definition of
educational data mining--
it's basically using mainstream
data mining tools--
so not special tools,
mainstream tools--
to explore educational data.
Things like grades.
SIS information, page views in
LMSes, tutoring software, that
type of thing for educational
purposes, or to answer
educational questions.
So mainstream tools--
everything else about
it is education.
It's really a pretty
untouched area.
There's a small group
of educational
data miners that meet.
They're meeting in
Memphis this year
in a couple of weeks.
I plan on going.
And it's a small, virtually
untouched area, for
three reasons.
There are so many
data.
Educators love to collect
data, and they
collect a lot of them.
And so there's a mountain--
literally a mountain of data.
But on the other side, there's
so little time and resources,
especially among faculty members
and administrators.
They have a lot of other things
to do than to actually
go out and pan for gold, in a
sense, in these mountains.
So there's not a lot
of resources.
And then finally, the
payoff is subtle.
You can find data mining
professionals in business a
lot because the more data mining
you do, the better
chance that you get that you're
actually going to make
a lot of money, or increase
your revenue.
In educational data mining,
it's more about
teaching and learning.
Yes, that's important.
We all know that's important,
but that extrinsic value isn't
necessarily there.
So we're really doing it for
the students, and I think
that's also part of the
reason why it's a
virtually untouched area.
So what I'm going to do is
kind of just whet your
appetite a little bit about
basically workflow.
If you want to get into data
mining, if you want to go to
the party and then go to your
room and get on your computer,
and start data mining,
you can do it.
You can do it today--
this evening, if you like.
And I'm going to show you some
tools that you can do it with,
and a basic way to
go about it.
So I'm going to show there
are four different
ways to collect data.
There are actually more ways in
Canvas, but these are some
four that I'd like
to talk about.
First is analytics, and I
probably showed the most
undescriptive part
of analytics.
And so for those of you who
worked on that, I'm sorry.
Everyone knows you can go to
individual analytics and see a
lot richer data on an
individual person.
But I wanted to use this one,
because I kind of wanted to
point out how sometimes we think
we see patterns, but
they're really elusive.
Take a look at this, and you
can look at the percentages
over here, and they just
don't match up.
For example, you might see
here this person--
and I'll wiggle my little mouse
here-- who had a lot of
page views, average
participations, turned
everything in--
95.8%.
That's pretty good.
You might see this person, and
you might expect low page
views, participations, and then
some late assignments--
65%.
OK, so you can see some
general patterns here.
So you can do some data mining
just by looking directly at
something like analytics.
But also, there's the
access report.
And I'm actually going
to get in a lot
more to student access.
But basically, this is actually
pretty valuable.
And when I look at the
difference between, say, data
mining and classroom management,
it's kind of like
the difference between
predicting the weather.
We have some really good tools
about predicting weather.
And tools like this are actually
really good for
classroom management.
You can kind of generally tell
how your students are doing by
jumping into these pages,
and looking over them.
That's like predicting
the weather.
But what if you want to actually
study climate change?
That's a little different than
predicting the weather.
They really don't have
tools built.
You kind of have to ask a
question, and you kind of have
to wing it.
So classroom management tools
don't really work so well for
data mining.
So that's the difference
there.
Also, the API.
This actually came right out--
and I know you probably can't
see it very well--
but came right out of a little
session I did at the workshop,
which was a great workshop
on Tuesday.
So you can collect data
through the API.
A lot of people have been
talking about that, and
showing it, so I don't think it
needs a lot of additional
explanation.
And then finally, page views,
and there's actually two
different sort of
segments here.
One is sort of the whole thing,
and then the other one
is sort of a magnified
version, in case
you can't see it.
This I actually got--
I'm doing a study right now
with 367 students over 14
different courses.
And I went course by course,
student by student, went into
page views, drag and scrolled,
drag and scrolled, drag and
scrolled all the way
to the bottom.
And then Control-C, go over,
Control-V into a notepad text
pad, and then saved it.
So 367 times I went through
that, but it was worth it.
I have the data now.
I'm working with them.
I know there are easier
ways to do it.
Probably, I could have grabbed
them through the API, but
sometimes you have to put
in the sweat equity.
But the nice thing about
page views-- and I'm
going to focus on them--
is it looks like there's
not a lot here.
I mean, yeah, you have the URLs
that they visited, and
then you have the time and date
that they visit, and then
the platform that they
visited from.
And so basically, and then
if they participated in
something, I think that's
listed here.
And for some reason, I think I
didn't set up my course so
that it would do that.
So it doesn't look like
there's a lot.
But you have this data--
these data-- and you have
from page views to--
eureka.
You can look at these data in
different ways, work with
them, do some processing,
pre-processing.
Go into some data mining tools,
and you can actually
get some really great data and
great results out of something
like page views data that has
simply, what, one, two,
basically three components, and
I only use two of them.
So the idea here is grab your
axe, start asking questions.
This is supposed to be
a little gold nugget.
My wife thought it looked
like a peanut.
And I thought, well you know,
sometimes that's what you get
is a peanut, and that's OK.
Peanuts are pretty tasty at 2
o'clock in the morning, so
keep that in mind.
So there are really two
different ways of going about
data mining.
One is kind of the classical
approach, and that is you use
computerized tools--
automated tools--
that emulate a really smart
person with a lot of really
good memory.
So it emulates the
human brain.
It's kind of the classical
approach.
The other approach is the
visualization approach, and
that is you take data, you
format them, and show them in
such a way that a human being
can look at the data in a way
that makes sense to
them, and where
they can see the patterns.
It's visualization.
Seth [? Gorrell ?], some
others have shown some
excellent tools for that.
And so I think there are
probably some of you have
attended those.
I'm going to actually go more
at the classical route,
because I think that, in
some ways, that part is
underrepresented.
The visualization is a really
sexy way to go, a lot of fun,
easy to show other people.
But sometimes, I think we leave
out the classical tools.
So in terms of page views, I
went through a quite a bit of
pre-processing.
And one thing I wanted to do,
and I was looking at a lot of
the data about LMS
usage patterns.
So basically, it's log usage
patterns for internet, but LMS
usage inside Canvas.
And there's a lot of data that
says that students who are
disengaged in the LMS
are fast clickers--
so they click through very,
very quickly, kind of, I
think, gaming the system
to a certain extent--
and they're also high
timeout students.
So they'll get in, and then
they'll go do something or
whatever, then they'll
come back and they'll
have been timed out.
And then they'll kind of do a
little more, and then they'll
get distracted, and that
type of thing.
So those are two groups
of students.
So I thought, I need to kind
of create session data.
How long was this student
actually engaged with these
course materials?
And I can do that with page
views to a certain extent by
taking the URLs and taking the
time stamps, and then turning
them into sessions, looking at
the sequence of time stamps,
and then figuring out and sort
of guessing when they were
away from the computer.
So these are all derived--
except the total web
page views and
the course page views.
Everything else is derived
from the time stamp data.
You take the URLs,
and derive them.
Like total sessions is based
on the actual time that
students have actually
spent inside
Canvas, clicking on Canvas.
Average clicks per session,
the total accumulated
minutes-- and this is actually
over the entire semester.
Sure, they may have been on
longer, and maybe I didn't
catch that.
But the general idea is if you
keep all the rules consistent,
and you do your research from
there, you can generally get
some pretty good data,
especially if you have a
pretty high n.
So that's what I was
working on there.
And then, the average minutes
per session, and then the
average minutes per click.
So this average minutes per
click here is getting at the
fast clickers.
So you see here, even with
these five students, one
tended to click about every
0.26 of a minute.
Another tended to click about
every 0.9 of a minute.
So that's a big difference.
Some are clicking a lot
faster than others.
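The session-building step described above can be sketched roughly as follows. This is a minimal Python illustration, not the speaker's own tooling. It assumes that a gap of more than 20 minutes between clicks starts a new session (the threshold mentioned later in the talk), and the function names and sample timestamps are made up.

```python
from datetime import datetime, timedelta

# Assumed inactivity threshold; the talk settles on roughly 20 minutes.
SESSION_GAP = timedelta(minutes=20)


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group click times into sessions; a pause longer than `gap` starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still inside the current session
        else:
            sessions.append([ts])     # first click, or the student was away
    return sessions


def session_metrics(timestamps, gap=SESSION_GAP):
    """Derive the per-student measures named in the talk from raw click times."""
    sessions = split_sessions(timestamps, gap)
    n_sessions = len(sessions)
    n_clicks = sum(len(s) for s in sessions)
    # Session length is measured from first click to last click.
    minutes = sum((s[-1] - s[0]).total_seconds() / 60 for s in sessions)
    return {
        "total_sessions": n_sessions,
        "avg_clicks_per_session": n_clicks / n_sessions if n_sessions else 0.0,
        "total_minutes": minutes,
        "avg_minutes_per_session": minutes / n_sessions if n_sessions else 0.0,
        "avg_minutes_per_click": minutes / n_clicks if n_clicks else 0.0,
    }


# Made-up click times for one student on one day.
clicks = [
    datetime(2013, 9, 2, 9, 0),
    datetime(2013, 9, 2, 9, 5),
    datetime(2013, 9, 2, 9, 15),
    datetime(2013, 9, 2, 14, 0),
    datetime(2013, 9, 2, 14, 10),
]
metrics = session_metrics(clicks)
```

As the speaker notes, time spent after the last click of a session is not captured, so these numbers are only comparable across students if the same rules are applied to everyone.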
And then this is a little odd.
I don't know if I'm going to
continue using this, but it's
repeated sessions per day.
What I basically did
was looked at--
assuming a student every day,
or every time they get on,
they get on once a day,
they do their work,
and they get off.
These are students who actually
got on more than once
in a given day.
And you can kind of
capture that.
If you look at all of
the day time stamps,
you can capture that.
And so, some students get on a
lot of times during the day,
others very rarely.
So like this 36, there were 36
days during the semester where
that student had more than one
time where they logged in to
Canvas into my course, and
actually had course page views
within that time period.
This student only, three
the entire semester.
So I'm getting at those data,
fast click, or high timeout
sort of situations.
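One way to compute the repeated-sessions-per-day measure, sketched in Python under the assumption that you already have one start time per session. The function name and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime


def repeated_session_days(session_starts):
    """Count the calendar days on which more than one session began."""
    sessions_per_day = Counter(start.date() for start in session_starts)
    return sum(1 for count in sessions_per_day.values() if count > 1)


# Made-up session start times: two sessions on Sept 2, one on Sept 3,
# three on Sept 5.
starts = [
    datetime(2013, 9, 2, 9, 0),
    datetime(2013, 9, 2, 14, 0),
    datetime(2013, 9, 3, 10, 0),
    datetime(2013, 9, 5, 8, 0),
    datetime(2013, 9, 5, 13, 0),
    datetime(2013, 9, 5, 21, 0),
]
repeat_days = repeated_session_days(starts)
```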
Another one is actually
running-- and I created a
quick and dirty HTML JavaScript
kind of a page for this.
I fed all of the students' page
views in for the entire
semester, and then grabbed just
those that had to do with
the course, or else general page
views, like going to
the calendar.
You can't really tell
which course
they're in in the calendar.
And then after taking that,
looking at the percentage of
basically tokens
inside the URL.
So for student 1, 31% of their
page views had to do with the
assignments, and had assignment
in the URL that
they visited.
And then, you see student
5 was actually 46% there.
Interesting one I
think is grade--
10.1% down to student
6 that's 1.3%.
How often they check grades
might make a big difference.
If we have some time
at the end, I'll
actually show you that.
It's pretty simple--
just copy, paste in, hit it,
and it actually generates
these values.
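The URL-token percentages can be approximated with a few lines of Python. This sketch is not the speaker's HTML/JavaScript page. The token list, the function name, and the example URLs (including the host name) are assumptions made for illustration.

```python
# Hypothetical token list; the talk mentions assignment, module, and grade.
TOKENS = ("assignment", "module", "grade", "file", "conversation")


def token_percentages(urls, tokens=TOKENS):
    """Return, for each token, the percentage of page-view URLs that contain it."""
    total = len(urls)
    if total == 0:
        return {token: 0.0 for token in tokens}
    return {
        token: 100.0 * sum(1 for url in urls if token in url) / total
        for token in tokens
    }


# Made-up page views for one student.
urls = [
    "https://canvas.example.edu/courses/101/assignments/7",
    "https://canvas.example.edu/courses/101/modules",
    "https://canvas.example.edu/courses/101/grades",
    "https://canvas.example.edu/courses/101/assignments",
]
pct = token_percentages(urls)
```

Simple substring matching is used here, so a URL containing two tokens counts toward both percentages.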
And then I also did a
date time matrix.
So again, this is just
URLs and date times.
That's it.
Those are the only fields
I'm working with in these
pre-processing steps.
And this is one student when
they tended to do most of
their clicking in the course.
Monday morning, they
had 20 clicks
through the entire semester.
109 clicks Sunday--
actually, I'm sorry, 55 clicks
on Sunday evening.
So it looks like that
was a time.
So we can use this, as well, to
look for patterns.
Do students who work on Sunday
afternoons tend to do better
than students who work on
Tuesdays right before class,
for example?
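A day-by-time matrix like the one described can be built from the time stamps alone. The sketch below is in Python, and the morning/afternoon/evening cut-offs are an assumption, since the talk does not say how the day was divided. The names and sample data are made up.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def part_of_day(hour):
    """Assumed buckets: before noon, noon to 6 pm, 6 pm onward."""
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def day_time_matrix(timestamps):
    """Count clicks per (weekday, part of day) cell."""
    return Counter(
        (DAYS[ts.weekday()], part_of_day(ts.hour)) for ts in timestamps
    )


# Made-up clicks: two on a Sunday evening, one on a Monday morning.
clicks = [
    datetime(2013, 9, 1, 19, 0),
    datetime(2013, 9, 1, 20, 30),
    datetime(2013, 9, 2, 9, 0),
]
matrix = day_time_matrix(clicks)
```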
So I do work a little bit
with setting thresholds.
What counts as a session?
How many minutes should I look
for between clicks that were
within the course to consider
that a session has happened,
and a new session has started?
So I looked at these average
minutes per click and the
repeated sessions per day, and
did a little correlation on
them, and actually ended up
with a p value of 0.002.
So actually, a pretty
significant p value, with the
correlation r value down here
at negative 0.73, which kind
of makes sense--
fast clicking is kind of one
type of student, and then sort
of go in, do a little bit,
wait a long time, do
a little bit more.
That doesn't lend itself
to fast clicking
statistics by any means.
So there's kind of a negative
correlation there.
But it kind of set up the
20-minute session time for me
pretty well.
And so this is actually
how that plotted out.
And it's not the tightest, but
it's not the loosest, either.
And you can kind of see
the line there, so
it's not too bad.
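For readers who want to reproduce this kind of check, here is a small Python sketch of the Pearson correlation coefficient r. It does not compute the p value, and the numbers below are invented, so it will not reproduce the r of negative 0.73 reported in the talk.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up values for five students.
minutes_per_click = [0.26, 0.9, 0.5, 1.2, 0.7]
repeat_days = [36, 3, 20, 5, 12]
r = pearson_r(minutes_per_click, repeat_days)
```

A negative r here would mean that students with shorter gaps between clicks tend to have more days with repeated sessions, which is the direction of the relationship described above.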
So let's actually get into
the data mining and
the decision tree--
the data mining that I actually
did with this.
I kind of used three different
ways of looking at them.
One is decision trees.
And what I actually used was
this really excellent software
package called R. Who has
experience with R? It's a statistics package.
It does data mining.
It has great tools.
Kind of a really steep learning
curve, though.
So my recommendation is
use the free tools.
Get out there, you can go back
to your apartment or your room
or whatever, you can download
R, and get it installed in
about 10 minutes, and start
playing with it.
But you kind of have
to go out.
There are a lot of
books on it.
There's a lot of stuff
out there on the web.
And then, you can just copy and
paste stuff in, and try it
out, and say, OK, that works.
And then modify it a little
bit, and go from there.
Yes?
It's R. Just the letter
R. In fact, R--
yeah, that kind of keeps
with arr-some.
So this is my installation.
If you search for R space statistics,
it'll come up.
Very, very easy to use.
So let me just show
you building--
actually, not necessarily
a decision tree.
Let me actually go on.
The decision tree, I
kind of looked at
two different values.
This shows students who had
module in their URL and grade,
and then, this repeated
session per day.
How does that all stack up in
terms of those fast-clicking
students, or those high-timeout,
wiggle-their-way-through-the-course
kind of students?
And so, I ran a decision tree.
And this is the plotting that
came out of R. And you can see
the biggest determinant
was module.
And this was a module-based
course.
So if they spent less than 14.3%
of their clicks in a
module, then they were more
likely to be one of those sort
of disengaged students, than if
their module URL percentage
was over 14.3%.
Then we get down here with the
repeated sessions per day.
If it's less than 12.5, then
only 0.2 of those students
tended to be in sort of the
disengaged grouping.
However, if it's greater
than 12.5--
so 13 or greater--
80%.
That represents 80%
of those students.
So that was the most significant
sort of predictor.
So you can actually generate
predictors out using R, and
based on page views data,
and just combination.
Notice checking grades really
didn't make that much
difference.
A lot of people think that
students who are disengaged check
their grades a lot, and
don't do much else.
Didn't seem to make much
difference in this case.
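To give a feel for how a decision tree picks a threshold such as 14.3%, here is a single-split sketch in Python. It is not what R does internally, and the talk does not say which tree algorithm or settings were used. Gini impurity is assumed as the split criterion, and the data are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Find the threshold on one feature that best separates the labels.

    Returns (threshold, weighted_gini); rows with value < threshold go left.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up data: percentage of clicks on module URLs, and whether the
# student was labeled disengaged (1) or not (0).
module_pct = [5.0, 9.0, 12.0, 16.0, 22.0, 30.0]
disengaged = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(module_pct, disengaged)
```

A full tree repeats this search on each resulting group and across every feature, which is how a second split on repeated sessions per day could appear below the module split.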
Another one is K-means
clustering.
This one here with decision
trees, you kind of have to
classify first.
You kind of have to know which
students that you want to
build predictors for.
For clustering, you
don't have to.
You just have to have a bunch
of data points, and you ask
the system to cluster them for
you, and show you which data
points, which students
belong together in
various different ways.
And that's what I have my quick
little demo here for R.
And what I basically do is I
kind of go through, and I type
these into little notepad files,
and so I can just copy
and paste when I want
to run them again.
And it makes it really easy,
so I don't have to like
formulate these on the fly.
 
So I throw these in, and it's
basically taken it.
It's read the table in.
It's attached, it's
done the names.
And so, if I want to
find two clusters--
and this I just kind of pulled
out of the book.
It's not like I'm the great
R guru or whatever.
Just throw this in, and
there's a plot for the
clustering.
Very, very simple and easy.
And the red on the left is one
cluster, the black on the
right are the others.
And interestingly enough, this
is plotting grade versus
average minutes per click.
So those kind of fast clickers,
and if [INAUDIBLE]
these are the fast clickers down
here, so you see those
2.5, 2.3, minutes
between clicks.
And most people who check
their grades a lot are in
the lower half of the
fast clicker group.
So that's something
to take a look at.
And then, we can also--
and we don't have to just look
for two clusters.
We can look for, say, five.
 
And I got to kill this.
 
And so, this actually breaks
it into five clusters.
And you'll notice one outlier
over here on the left is a
cluster of its own, as well as
that other outlier way down
south there.
And so, you can do
various clusters.
And then, it's pretty easy to
just track those to the actual
students themselves.
You can find that student that
has like the 0.1 click rate.
So kill that.
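The clustering demo was run in R, and the commands themselves are not in the transcript. As a rough stand-in, here is a small k-means sketch in Python. It assumes a naive start from the first k points so that the result is repeatable, and both the function names and the data are made up.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20):
    """Very small k-means: returns (centroids, cluster index for each point)."""
    centroids = [points[i] for i in range(k)]  # naive deterministic start
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[i]      # keep an empty cluster's centroid
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break                              # assignments have settled
        centroids = updated
    return centroids, [nearest(point, centroids) for point in points]


# Made-up students: (percentage of grade-page views, average minutes per click).
students = [
    (10.0, 0.3), (1.0, 2.4), (9.0, 0.4),
    (2.0, 2.5), (11.0, 0.2), (1.5, 2.3),
]
centroids, assignments = kmeans(students, k=2)
```

Because the two measures are on different scales, real data would normally be standardized first. Tracing a cluster back to individual students is just a matter of reading `assignments` alongside the student list.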
 
OK, so that's, again,
the clustering.
And then you can also use
association rules.
And this is something that
I've just started playing
with, but I think it's
kind of exciting.
Most of you who know about
marketing, and data mining in
marketing, like with Walmart
and all that, association
rules are used a lot for--
you go to the store, what
do you buy with bread?
Milk and beer, or something.
And they kind of know what you
buy together, so they put
those things together.
You can do this in
education, too.
You can run in R an
association rule.
You can use that tool to
actually look at the types of
things that students do
together, like conversations
happen to be the lowest
area in terms of
the number of URLs.
So that's what it hit.
And then it looked at, OK, how
many students had both
conversations and files,
and hit their files
on a Friday in August?
That type of thing.
There's a lot to be done here.
I can't really go into
a lot, because we
don't have the time.
But I think there's a lot you
can do here with those kinds
of things, and looking for
those kinds of patterns.
And then it will actually--
this is, again, right in R. It's
freely available, give
you all of these visuals
for that.
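The two basic quantities behind association rules, support and confidence, can be sketched in a few lines of Python. This is an illustration of the idea only, not the R tooling used in the talk. Each set below is a made-up record of which URL tokens one student touched.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Made-up data: one set of visited areas per student.
visits = [
    {"conversations", "files", "friday"},
    {"files", "modules"},
    {"conversations", "files"},
    {"assignments", "grades"},
]
both = support(visits, {"conversations", "files"})
rule = confidence(visits, {"conversations"}, {"files"})
```

Here `rule` answers the question: of the students who used conversations, what share also hit their files?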
So if you want to get started
with data mining-- and I kind
of showed you the steps--
first, you have to build
a data warehouse.
Canvas, you've already really
completed this step.
All of those data
are out there.
All you need to do is collect
it, and start working with it.
Sometimes you might need your
administrator to help you with
the collection.
But once you have
it, you have it.
Decide on what you want
to find out-- what
your question is--
then gather the kind
of data that might
best answer that question.
So I wanted to look at LMS usage
patterns, so for me,
page views is really
the best way to go.
And then pre-process
for the tool.
Remember, I made sessions.
I looked at grids--
created grids, that
type of thing--
and then run the tool.
And the most important thing
I can say is, play.
Have fun with it.
You can't break anything.
You pull down the data, it's not
like you're changing the
data for the students
or whatever.
You can play with
it all you want.
What did you learn?
Sometimes you learn that that's
really not the best way
to go, and you need to
redo your approach.
And then repeat steps three
to six until you or your
administrator thinks that your
question has been answered.
And then, finally-- and
this is my dream--
full integration of
R with Canvas.
Can you imagine if you were
inside Canvas, and you typed
cluster on page views
URL token module--
goom--
and then a visualization
came up.
How cool would that be?
That is data mining.
I mean, I could do all this
stuff that has taken me weeks
to do in one night with
this kind of a tool.
And how about correlate
page views URL token
module with quiz six.
Correlations--
how cool would that be?
You can ask all kinds of
questions.
So this is, knowing that there
may be an engineer or
two in the room, this is my
dream that I'm throwing out.
And hopefully, some of you share
this dream, as well.
So finally, questions.
We have a couple of minutes,
I think, for questions.
And thank you all for being
very patient in
listening to my spiel.
Yes, sir?
AUDIENCE: [INAUDIBLE]
BRIAN KOKENSPARGER: Yes,
I could post it.
I think there's a place on the
InstructureCon course to
share-- yes.
Yeah, I'll be happy
to post it.
Sure.
It's all made up data,
so there are no
student data on there.
Other questions?
 
AUDIENCE: [INAUDIBLE]
BRIAN KOKENSPARGER:
Sure, yeah.
Yes.
The question was, besides the
manual copy and paste that I
was doing, can you automate
the process?
And the answer, of
course, is yes.
I'm the type, though, that goes
out with clippers and
trims in my yard, because
I really love hands-on.
I don't get out the
weed whacker or
the Roundup or whatever.
But there's a weed whacker and
Roundup version, I'm sure, of
copy and paste that you
can automate, sure.
AUDIENCE: [INAUDIBLE]
BRIAN KOKENSPARGER: Oh, yes.
Yeah.
All you need is a text editor.
For me, HTML--
so something, any browser.
And then, there's a version
of R for every platform.
Yeah, very simple.
AUDIENCE: [INAUDIBLE]
 
BRIAN KOKENSPARGER: I think
you can get page
views through the API.
And then, administrators can
get it for you in a course,
but I think that's
the only access.
Some of the guys in the
back may correct me if
I'm wrong on that.
That is correct?
OK.
Yeah.
And the question was,
is that the only
way to get page views?
Any other questions?
Thank you all very much.
Enjoy the party tonight.
[APPLAUSE]
