Welcome to another Coding Like Mad MATLAB tutorial. Today we're doing part 2 of our neural nets in MATLAB series; if you missed part 1, which was about making regression networks, you can click the link right here. In today's video we're going to talk about how you classify input data using a neural network. We're going to use an example where we classify colors. For example, here you can see some labeled red, green, and blue data for the color gray, and then what happens if we run that through a neural network in order to classify what color it actually is. You can see the data is pretty messy to begin with, but at the end the neural net knows exactly what the color gray looks like. If you like this type of video, don't forget to subscribe to our channel, and comment below if you have suggestions for what you'd like to see next.
OK, so to get started, we're going to look at some example data. The idea with a classification problem is basically that we need an input space and an output space. For all my examples here, I'm going to be looking at some color data from Randall Munroe that has the red, green, and blue values and then what someone would actually call those colors. What we want to do is train a computer to understand what the color green actually looks like in terms of red-green-blue space, or the color blue, or the color purple. I have provided a data set on the GitHub called "short color data set" (a CSV file) so you can follow along at home. What's in this data set? Let's just take a quick little look-see. I'm going to read it with readtable, and if I do this and wait a couple of seconds, I get 170,000 lines, I think; yeah, it's a hundred and seventy-six thousand lines. As you can see, I took the unique values of the table's color-name column, and we have things like magenta, orange, pink, purple, red, sky blue, tan, teal, and so forth. I think it's worthwhile to actually look at what these different data sets look like, so if we visualize the red-green-blue color space like I did during the opening sequence, you can see that, for example, the blue color is indeed mostly blue data points, but there's a bunch of other junk here.
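The loading step just described can be sketched like this (the file name and column name here are assumptions; check the repository for the exact ones):

```matlab
% Read the color survey data into a table; readtable infers the column
% types, and with roughly 176,000 rows this takes a couple of seconds.
t = readtable("short_color_data.csv");

% List the distinct color labels people used (column name assumed).
labels = unique(t.colorName)
```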
And I think this is important: there are different approaches to machine learning, but here we're trying to teach a mapping between this red-green-blue space and some labels, and there can be real disagreements between people on what the color blue is. Like, when does blue become sky blue? Or, for that matter, did the person interpreting the survey think sky blue was a real option, or did they think only blue was an option? The other thing you can see is that there are a bunch of data points in this blue plot which clearly aren't blue, so I would say there is, let's call it, a jerk factor on the internet: people who fill in just terrible things. So the point I want to make is that in this data set there is no known right or wrong answer; we just know what people call those colors, and I want to teach a machine that. So how do I
do that? What we want to do now is make our data usable for training a neural net, and I'm going to do this in two parts. First, for ease of use, we're going to convert our input space into just a vector. That's very straightforward: we're just taking the red, green, and blue features and putting them side by side. What we're doing for the output space, the target, is much more interesting. I'm using a command called ind2vec here, and I'm feeding into it the group indices which were output by the findgroups command. If I take a look at the values in g, say the first ten values, you can see that we have a set of integers. It turns out integer 2 was blue; in fact, we can even look at the id values to see that 2 is blue and id value 10 was light blue. So the colors as labeled by the people doing the survey here were blue, blue, light blue, and then whatever color number 7 was. For the neural net, however, we want to convert this into what's called a one-hot encoding. So if I look at target, and take a quick little look-see at the size of target, good; and then look at the first column of target, this is supposed to encode the color blue, but what you see is a weird notation that says there's a 1 at position (2,1). That's because this is a sparse matrix; I can convert it to a full matrix using the full command, and what you see is that its actual values are 0, 1, and then a long series of zeros. This format is called a one-hot encoding, where the idea is that instead of telling the neural net that it needs to learn that the color blue is the value 2, the color blue corresponds to a vector with unit direction in the second dimension. So we've gone from a 1-dimensional space with 24 possible values to a 24-dimensional space where each dimension can only be 0 or 1. This is a really standard approach for dealing with categorical variables in machine learning, no matter where you are, so you'll see this all the time. And we need this because, although I could make a neural net predict the integer directly, it then starts to suddenly matter a lot what order the colors are in, which is weird: it shouldn't really matter what order they're in, and how do you order a color, anyway? OK, so now that we've got our
input and output space set up, we need to actually go ahead and train our network. We do that very simply with a piece of code that looks like this. If you remember the regression tutorial that I previously did, we had a very similar format. The first thing we have to do is create the network, and I do this with the patternnet command; in the regression case this was a feed-forward network, but the argument is the same. That 10 says that I want one layer with 10 hidden nodes, and I can check this using the view command: as you can see here, there is one hidden layer with 10 nodes in it. But you see that the output and the input both have dimension zero. What's up with that? Well, I haven't actually run the train command yet.
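As a sketch of what was just shown (the hidden-layer size is the one from the video):

```matlab
% Create a pattern-recognition network with one hidden layer of 10 nodes.
net = patternnet(10);

% Inspect the architecture. The input and output sizes read 0 here
% because the network is not configured to the data until train is called.
view(net)
```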
In MATLAB, the input and output spaces are not actually populated until you run the training. We can do that by using the train command: we just tell the train command which network to use, the input space, and the output space, so it's very straightforward. Let's go ahead and do that, and what you can see, looking at the top of the screen, is that the input space is now three-dimensional and the output space is 24-dimensional. It outputs 24 numbers, and those 24 numbers are the probabilities that the sample is in each individual category. So if, say, the second dimension is very high, that's an indication that you are most likely in the blue case; we saw that the tenth dimension is light blue, so if your tenth dimension is high, maybe you're more likely to be light blue, and if they're both high, maybe it can't tell the difference between them. You can see there's a lot of information available on this screen: in the second section, the algorithms, you can see that it's using a scaled conjugate gradient training algorithm, that it's using a random data division, and, maybe most importantly, that it's using cross-entropy.
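Putting the preparation and training steps above together, the whole pipeline looks roughly like this (the table variable and column names are assumptions):

```matlab
% Input space: a 3-by-N matrix of red, green, blue values.
x = [t.R, t.G, t.B]';

% Output space: an integer class per row, then one-hot encode it.
[g, id] = findgroups(t.colorName);   % id(k) is the name of class k
target = ind2vec(g');                % sparse 24-by-N one-hot matrix

% Train; the training window reports scaled conjugate gradient
% and cross-entropy as the algorithm and performance function.
net = patternnet(10);
[net, tr] = train(net, x, target);
```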
I'm going to talk about cross-entropy in some detail in this video, so here's a little bit of math on screen. What you can see is that the mean squared error approach taken by the regression networks previously tries to minimize your squared error, which makes a lot of sense for a single continuous output. However, for a categorical system like this, where we're trying to classify, what we really want is not to minimize our squared error to a label, because what does it mean to be 0.5 away from being blue? That doesn't make any sense.
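Written out, the two losses being contrasted here are (with $t_i$ the one-hot target and $y_i$ the network's predicted probability for class $i$):

```latex
\text{MSE:} \quad E = \frac{1}{N}\sum_{i=1}^{N} \left(y_i - t_i\right)^2
\qquad\qquad
\text{cross-entropy:} \quad E = -\sum_{i} t_i \log y_i
```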
What we want to do is penalize the network for making guesses it's confident about when it's wrong, and that's what the cross-entropy function shown here actually does. When the predicted probability of a particular class is high, the penalty for making a mistake there is high; when the probability is low, we don't penalize it as heavily. Cross-entropy sounds like a scary name, but basically we're just going to weight by the confidence the network has, and the result is that it tries to make high-confidence guesses where it has good information and low-confidence guesses where it doesn't. That's really nice, because we then have the ability to look at this and give the network the chance to say, "I don't know what the right answer is," and for machine learning that's really useful. Like, you know, if you have a self-driving car, boy, I'd like it to say "I don't know what to do" and go to a default option like hitting the brakes. There are times when it's good to say "I don't know." OK, so let's go back to our network, which is now finished training, and you can see it took 209 iterations to finish, in a minute and 42 seconds, with a particular performance score, and at the end you can see it did some validation checks. Let's click the performance icon here at the bottom, and what you can see is that over time this cross-entropy metric improved. If you want, you can actually watch this live while it's training; it's pretty mesmerizing, I do it quite a bit, and if you're using this data set it shouldn't take that long to run. In fact, we can more or less rerun it now, so let's do that.
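If you would rather pull these plots up from the command line than through the training window, the toolbox ships helper functions for them, using the training record (here assumed to be the second output of train):

```matlab
plotperform(tr)      % performance (cross-entropy) vs. epoch
plottrainstate(tr)   % gradient and validation-check history
```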
So if I click performance, we can watch it evolve over time, and this is actually the most interesting part. Of course that's going to improve, so let's pop up the training state. The training state here shows the gradient the network is following. You might think this is always going to go down, and I think for this network it more or less will, but actually for more complicated problems you'll see this jump around, because the gradient is one of the knobs that the optimizer will use to make things converge better or worse. More interesting is this "val fail" plot. The way that neural networks are trained in MATLAB, and you'll see the same trick used in TensorFlow pretty frequently, is that a validation data set is used. This validation data set is basically a second data set, never used for training, that gets checked, and training will continue until this validation data set actually starts giving worse answers as the training loop runs. The idea is that this is data the network hasn't seen before, so if the network is generalizing well, the validation data set should continue to improve over time, and if we start to see that not happening, it's a sign that overtraining is occurring. We can also generate, for example, the receiver operating characteristic curve. This is the ROC curve, and it's a standard tool in data science. Here I have an ROC curve for a classifier: all it's trying to do is tell the difference between two different data sets. As you can see, the true positive rate is plotted on the y-axis and the false positive rate on the x-axis, and the idea is that depending on where we set the threshold for our classifier, we trade the two off. If we set a very high threshold, we'll barely ever flag; we won't make many mistakes, but we also won't make many correct guesses. On the other hand, at a very low threshold we'll flag constantly, so we'll catch everything, but we'll also make tons of mistakes. So the ROC curve is a really nice way to check how we're doing: what you want to see is a curve like this one, where we are very close to the top-left corner. The area underneath this curve is called the area under the curve, and it's a measure of how good our ability is to differentiate between the classes we're interested in; the diagonal line you see corresponds to random guessing. Here we have 24 classes, and you can see that actually only this purple one is being struggled with at all, and it's the largest class. OK, so next, let's
talk about how you deal with the outputs here. If I run the neural network, I can do it basically just like a function: here I'm going to plug in all of the training data I had and output a set of predictions labeled y. We're just using it like a function; the network is just a black box, I can put my vectors in and it will go ahead and classify them. So what does this actually look like? Well, remember I said we're predicting 24 outputs; here you can see I indeed got 24 outputs. Conveniently, though, there is a command, vec2ind, which inverts ind2vec and will classify everything more or less automatically for us. If I run that command, there we go, classes 1 to 10, and you can see what it actually classified them as. If you remember our labels from before, if I plug this in, there we go; OK, let's just do the first three. The first three outputs were blue, purple, and cyan, and the real labels for those inputs were blue, blue, and light blue. So here it got the first one exactly right, blue and blue, whereas the second, which was really blue, got classified as purple, and the third, really light blue, got classified as cyan.
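The prediction steps just walked through, as a sketch (x, target, and id are the variables assumed earlier):

```matlab
% Run the trained network like a function: a 24-by-N matrix of scores.
y = net(x);

% Collapse each 24-element column back to a single class index.
pred = vec2ind(y);

% Look up the names of the first three predictions (id from findgroups).
id(pred(1:3))
```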
Now, cyan and light blue are almost identical, and for purple you can make some arguments over where the line is between purple and blue. To be honest, looking at the data, I don't particularly trust that the internet labeled things in the most accurate way, so to me this seems like a very reasonable result. I'll show the output from the model here: as you can see, if we visualize the outputs of the model for the color blue, there's a little bit of purple and a little bit of light blue. Let's actually compare the blue 3D render to the cyan render, and you can see the cyan color is actually on the border of the blue color. This explains why it would have trouble distinguishing between blue and cyan: they touch. The same thing actually happens with purple, and this is weird, I didn't expect it, but there is a little sliver of purple in blue, and I think that this is inherent to the English language's color words. Finally, one of the things that I skipped when I did this before was the confusion matrix. When you're doing the training you can pop this up, but honestly, with 24 classes it gets pretty bad, so here I just wrote some code to calculate it anyway. The idea is basically that I'm going to look at what the output class was and what the input class was, and we're just going to count how often it got each class mixed up. So let's take a quick look at that.
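One way to tally such a count yourself (a sketch; the code in the video may differ, and g, net, and x are the variables assumed earlier):

```matlab
% C(i, j) counts samples whose true class is i and predicted class is j,
% so a perfect classifier would put everything on the diagonal.
pred = vec2ind(net(x));
C = accumarray([g, pred'], 1, [24 24]);
```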
As you can see, here's a display of that count, and you can see right away, first of all, this data set's imbalance: there is way more blue, brown, green, pink, purple, and red than any other color. People are way more likely to call something red, for example, than pink, and they're way more likely to call it green instead of dark green or light green; light green and dark green are both relatively rare in this data set. In fact, you could argue that, according to the people surveyed on the internet, blue, brown, green, pink, purple, and red are really the only colors, and maybe magenta. This confusion matrix essentially shows how often things are getting mixed up. You can, for example, see that light purple and light green were never classified correctly; they were always shifted, so light purple usually got classed as purple, whereas light green usually got classed as green. Probably the reason for this is just that there's so much extra green there that the things we would call light green, you would say, well, OK, they're also green, so there's sort of an overlap between the spaces. A confusion matrix here lets you work out where the neural network, or any classifier for that matter, is getting confused, like which things are not distinguishable, and if you wanted, you could try to push it toward the ideal situation, which is a diagonal line here. But the truth is, in this case, I don't think there's a right answer. I don't think it's worth pushing a neural network to try to produce a diagonal here, because I think there's a fundamental ambiguity between the colors green and light green and dark green, and I don't feel that it makes sense to force it. Likewise, there will always be some confusion because people on the internet are bad people.
No offense to those of you watching my video; I mean the other people. Because of that, there are always going to be some junk answers in here where people have filled in the wrong answer, and there can also be completely well-meaning differences of opinion on what the color is, as well as differences of opinion due to monitor variation, and there may even be some people who are colorblind who also filled in the survey. In all of these cases, I think there are good reasons not to try to force the neural net to capture everything. What is cool here is that we can see the real color space, so I'm going to close on that: here is an animation showing exactly what each of these colors is, and I hope you enjoy watching both this video and this animation, because I think it's really cool to see what these colors really are. Anyway, guys, have a good day, and I hope you will subscribe if you enjoyed the video.
