And you thought we were done
with the ML5 neural network
tutorials.
But no.
There is one more because
I am leading to something.
I am going to-- you will
soon see in this playlist
a section on convolutional
neural networks.
But before I get to
convolutional neural networks,
I want to look at why
a convolutional layer is useful.
I have to answer the question:
what is a convolution?
I've got to get to that.
But before I do,
I want to see why these layers
exist in the first place.
So I want to start
with another scenario
for training your
own neural network.
That scenario is an
image classifier.
Now you might
rightfully be sitting
there saying to yourself,
you've done videos
on image classifiers before.
And in fact, I have.
The very beginning
of this whole series
was about using a pre-trained
model for an image classifier.
And guess what?
That pre-trained model had
convolutional layers in it.
So I want to now take the time
to unpack what that means more
and look at how you could train
your own convolutional neural
network.
Again, first though,
let's just think
about how we would make
an image classifier
with what we have so far.
We have an image.
And that image is being sent
into an ml5 neural network.
And out of that neural network
comes either a classification
or regression.
And in fact, we could
do an image regression.
And I would love to do that.
But let me start
with a classifier
because I think it's a
lot simpler to think about
and consider.
So maybe it comes out
with one of two things,
either a cat or a dog and
some type of confidence score.
I previously zoomed in
on the ml5 neural network
and looked at what's
inside, right?
We have this hidden
layer with some number
of units and an output
layer, which, in this case,
would have just two units
if there are two classes.
Everything is connected, and
then there are the inputs.
With PoseNet, you might
recall, there were 34 inputs
because there were
17 points on my body,
each with an x,y position.
What are the inputs here?
Let's just say, for
the sake of argument,
that this image is
10 by 10 pixels.
So I could consider
every single pixel
to be an individual input
into this ml5 neural network.
But each pixel has
three channels:
R, G, and B. So that would
make 100 times 3 inputs,
or 300 inputs.
That's reasonable.
So this is actually what
I want to implement:
take the idea of a two-layer
neural network performing
classification, the same thing
I've done in previous videos,
but, this time, use the actual
raw pixels as the input.
Can we get meaningful
results from just doing that?
After we do that, I want
to return back here
and talk about why this
approach is limited,
or rather, how it can
be improved on
by adding another layer.
The inputs will still be there.
We're always going
to have the inputs.
The hidden layer
will still be there.
And the output layer
will still be there.
But I want to
insert right in here
something called a
convolutional layer.
And I want it to be
a two-dimensional
convolutional layer.
So I will come back.
If you want to, you can
skip to that next video,
if and when it exists;
that's where I'll start
talking about it.
But let's just get this working
as a frame of reference.
I'm going to start with
some prewritten code.
All it does is open
a connection to the webcam
in a simple p5.js sketch,
resize the video to 10 by
10 pixels, and then
draw a rectangle on the canvas
for each and every pixel.
This could be
unfamiliar to you:
how do you look at an
image in p5.js
and address every single
pixel individually?
If that's unfamiliar,
I would refer you
to my video on that topic,
which is appearing
next to me right now.
Go take a look at that
and then come back here.
But really, this is just looking
at every x and y position,
getting the R, G, B values,
filling a rectangle,
and drawing it.
So what I want to do
next is think about
how to configure this
ml5 neural network,
which expects that 10 by
10 image as its input.
I'm going to make a
variable called pixelBrain.
And pixelBrain will be
a new ml5 neural network.
I should mention that,
in case you want
to code along with me,
both the finished code
and the code I'm starting with
will be linked
in this video's description.
So to create a neural network,
I call the neural network
function and give
it a set of options.
One thing I should mention:
while in all the videos
I've done so far
I've said that you
need to specify the
number of inputs
and the number of outputs to
configure your neural network,
the truth is that ml5
is set up to infer
the number of
inputs and outputs
based on the data
you're training it with.
But to be really
explicit about things
and make the tutorial
as clear as possible,
I'm going to write
those into the options.
So how many inputs?
Think about that for a second.
The number of columns times
the number of rows times
R, G, B. Maybe I could
use a grayscale image,
so I wouldn't need a
separate input for each of R,
G, and B. But let's
keep all three.
Why not?
I have the 10 by 10 in a
variable called videoSize.
So let's make the inputs
videoSize times videoSize
times three.
Let's just make a really
simple classifier that
detects whether I'm
here or not here.
So I'm going to make
the outputs two.
The task is classification.
And I want to see debugging
when I train the model.
Now I have my pixelBrain,
my neural network.
Oops.
That should be three.
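Putting the configuration together, here is a minimal sketch of how I'd set it up, assuming ml5.js is loaded in the browser alongside p5.js. The names videoSize and pixelBrain follow what I just described.

```javascript
// Sketch of the neural network configuration described above.
// Assumes ml5.js (v0.x) is loaded in the browser.
const videoSize = 10;

const options = {
  inputs: videoSize * videoSize * 3, // 10 x 10 pixels, R/G/B each = 300
  outputs: 2,                        // two classes: "here" / "not here"
  task: 'classification',
  debug: true,                       // show the training visualization
};

let pixelBrain;
if (typeof ml5 !== 'undefined') {
  // only in the browser, where ml5 exists
  pixelBrain = ml5.neuralNetwork(options);
}
```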
Let's go with my usual,
typically terrible interface,
meaning no interface.
And I'm just going to train
the model based on when
I press keys on the keyboard.
So I'll add a
keyPressed() function.
And let me be a little
goofy here: when
I press a key, I'm going
to call addExample(key).
So I need a new function
called addExample()
that receives a label.
So basically, I'm going to make
the key that I press the label.
I'm going to press one key
when I'm standing
in front of the camera
and a different key
when I'm not standing
in front of the camera.
Now comes the harder work.
I need to figure out how
to make an array of inputs
out of all of the pixels.
Luckily for me,
this is something
that I have done before.
And in fact, I
actually have some code
right in here that
I could pull from,
which goes through
all the pixels
to draw them.
But here's the thing.
I am going to do something
to flatten the data.
I am not going to keep the
data in its original columns
and rows orientation.
I'm going to take the
pixels and flatten them out
into one single array.
Guess what?
This is actually
the problem that
convolutional neural
networks will address.
It's bad to flatten the data
because its spatial arrangement
is meaningful.
I'll start by creating an
empty array called inputs.
Then I'll loop through
all of the pixels.
And to be safe,
I should probably
say video.loadPixels().
The pixels may already
be loaded because I'm
doing that for the drawing
down here.
And since I'm drawing them
anyway, I might as well
create the data in
that same loop.
But I'm going to be
redundant about it.
And I'm going to say--
ah, but here's the weird thing.
I thought I wasn't going to
talk about the pixel array
in this video and just refer
you to the previous one.
But I can't escape it right now.
For every single pixel
in an image in p5.js,
there are four spots in
the pixels array: a red value,
a green value, a blue value,
and an alpha value,
which is for transparency.
The alpha value, I can
ignore because it's
going to be 255 for everything.
There's no transparency.
If I wanted to
include transparency,
I could make that an input too
and have 10 by 10 times 4.
But I don't need
to do that here.
So in other words, pixel
zero starts here and occupies
indices 0, 1, 2, 3.
And the second pixel
starts at index 4.
So as I'm iterating
over all of the pixels,
I want to move through the
array four spaces at a time.
There's a variety of ways
I could approach this,
but that's going to make
things easiest for me.
So that means, right
over here, this
should be i += 4.
Then I can say the red value
is video.pixels at index i,
the green value is at i + 1,
and the blue value
is at i + 2.
And just to be
consistent, I'm going
to put a + 0 in there
so everything lines up nicely.
So that's the R,
G, and B values.
Then I want those
R, G, and B values
for this particular pixel
to go in the inputs array.
The chat is making
a very good point,
which is that I have all of
the stuff in an array already.
And all I'm really doing is
making a slightly smaller array
that's removing
every fourth element.
I could do that with
the filter function
or some kind of
higher order function
or maybe just use
the original array.
I'm not really sure why
I'm doing it this way.
But I'm going to emphasize
this data preparation step.
So I look forward to
hearing your comments,
and maybe seeing
reimplementations of this
that just use the
pixel array directly.
But I'm going to keep it
this way for right now.
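For reference, the chat's idea could be sketched with filter(). Here rgbOnly is a hypothetical helper, not code from my sketch; in practice you'd pass it something like Array.from(video.pixels), since the pixels array is a typed array.

```javascript
// Keep every element whose index is not an alpha slot.
// In RGBA order, alpha sits at every index where i % 4 === 3.
const rgbOnly = (pixels) => pixels.filter((_, i) => i % 4 !== 3);
```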
So I'm taking the R, G,
and B and putting them
all into my new array.
Then the target
is just the label,
a single label in an array.
And I can now add
this as training data:
pixelBrain.addData(inputs,
target).
Let's console.log the inputs
and the target,
just to see that something
is coming out
and that this is working.
So, 'a'. Yeah,
we can see there's
an array there,
and there's the 'a'.
And now if I press 'b',
I'm getting a different
array with 'b' there.
So I'm going to assume
this is working.
I could check inputs.length
to make sure
that that's the right idea.
Yeah.
It's got 300 things in it.
OK.
Next step is to train the model.
So I'm going to say:
if the key pressed is 't',
don't add an example but
rather train the model.
Let's train it over 50 epochs
and have a callback for
when it's finished training.
Let's also add an
option to save the data,
just in case I want to stop
and start a bunch of times
and not collect the data again.
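A sketch of that training step, assuming ml5's train() accepts a { epochs } options object plus a done callback, and saveData() stashes the collected examples. The output filename here is my own placeholder.

```javascript
// Called when 't' is pressed: save the collected data, then train.
function trainModel() {
  if (typeof pixelBrain === 'undefined') {
    return; // browser-only: needs the ml5 network
  }
  pixelBrain.saveData('pixel-data'); // so I can reload instead of re-collecting
  pixelBrain.train({ epochs: 50 }, finishedTraining);
}

function finishedTraining() {
  console.log('training complete');
}
```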
And I'm ready to go, except
I missed something important.
I have emphasized
before that when
working with neural
networks, it's
important to
normalize your data,
to take the data that you're
using as inputs or outputs,
look at its range,
and standardize it
to some specific range,
typically between zero and one
or maybe between
negative one and one.
And it is true that ml5
will do this for you;
I could just call
normalizeData().
But this is a nice opportunity
to show that I can just
do the normalization myself.
For example (and this is
sort of another reason
to make a separate array),
I know that the range
of any given pixel color value
is between 0 and 255.
So let me take the opportunity
to just divide every R, G,
B value by 255 to squash
it, to normalize it
between zero and one.
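That normalization is just one division per channel. A tiny helper makes the idea concrete; inside the pixel loop it amounts to pushing r / 255, g / 255, and b / 255 instead of the raw values.

```javascript
// Squash a 0-255 color channel value into the 0-1 range.
const normalize = (value) => value / 255;
```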
Let's see if this works.
I'm going to collect some data.
This is a little bit silly,
but I'm going to
press H for me being
here in front of the camera.
Then I'm going to
move off to the side,
and I'm going to use N for not
being in front of the camera.
So I'm not here.
And I'm just going to do
a little bit right now,
and then I'm going
to hit T for train.
And the loss function
is going crazy.
But eventually, it gets down.
It's a very small amount of
data that I gave it to train
with. But we can see
that I'm getting a low loss.
If I had built the
inference stage into the code,
it would start to
guess Dan or no Dan.
So let's add that in.
When I'm finished training,
then I'll start classifying.
The first thing I need to do if
I'm going to classify the video
is pack all of those pixels
into an input array again.
Then I can call classify()
on pixelBrain
and add a function to
receive the results.
Let's do something fun
and have it say hi to me.
So I'm going to make this label
a global variable with nothing
in it.
And then I'll say the label
equals the label
from the results.
After I draw the pixels,
let's either write hi or not
write hi.
So just to see that this
works, let's make the label H
to start.
It says hi.
Now let's not make
it H. And let's go
through the whole process.
Train the model.
And it says hi.
Oh, I forgot to
classify the video again
after I get the results,
so it classified only once.
I want to recursively continue:
after I get the results,
classify the video again.
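That classification loop could look roughly like this. The (error, results) callback order follows ml5's convention, and flattenAndNormalize() is a hypothetical helper standing in for the same pixel-flattening and divide-by-255 done during training.

```javascript
let label = '';

function classifyVideo() {
  if (typeof video === 'undefined' || typeof pixelBrain === 'undefined') {
    return; // browser-only: needs the p5.js video and the ml5 network
  }
  video.loadPixels();
  // same preparation as the training data: drop alpha, normalize to 0-1
  const inputs = flattenAndNormalize(video.pixels);
  pixelBrain.classify(inputs, gotResults);
}

function gotResults(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  label = results[0].label; // the top classification
  classifyVideo();          // recursively classify the next frame
}
```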
Just so we can finish
this out, I actually
saved all of the data
I collected to a file
called data.json.
And now I can say
pixelBrain.loadData('data.json').
And when the data is loaded,
then I can train the model.
So now I've eliminated
the need to collect
the data every single time.
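A sketch of that load-then-train flow, assuming ml5's loadData(path, callback) and a data.json file produced earlier by saveData().

```javascript
// Once the saved examples are loaded, go straight to training.
function dataLoaded() {
  pixelBrain.train({ epochs: 50 }, finishedTraining);
}

function finishedTraining() {
  console.log('training complete');
}

if (typeof pixelBrain !== 'undefined') {
  // in the sketch, this would run in setup() after creating pixelBrain
  pixelBrain.loadData('data.json', dataLoaded);
}
```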
Let's run the sketch.
It's going to train the model.
I don't really even
need to see this.
When it gets to the end, hi.
Hooray.
I'm pleased that that worked.
I probably shouldn't,
but I just want
to try having three outputs.
So let's try something
similar to what
I did in my previous videos
using teachable machine
to train an image classifier.
And we'll look at this
ukulele, coding train notebook,
and a Rubik's cube.
So let me collect a
whole lot of data.
I'm going to press U for
ukulele, R for Rubik's cube,
and N for notebook.
Save the data in case I need
it later and train the model.
All right, so now ukulele,
U, N for notebook.
And can we get an R?
I stood to the side when I
was doing the Rubik's cube,
so that is pretty important.
So it's not working so well.
So that's not a surprise.
I don't expect it
to work that well.
This is why I want to
make another video that
covers how to take this
very simplistic approach
and improve upon it
by adding something
called a convolutional layer.
So what is a convolution?
What are the elements of
a convolutional layer?
How do I add one
with the ML5 library?
That's what I'm going to start
looking at in the next section
of videos.
But before I go, I
can't resist doing
one more thing,
because I really
want to demonstrate
what happens if you
change from using pixel inputs
for a classification
to using them for a regression.
So I took code from my previous
examples that demonstrated
how regression works
in ml5, and I
changed the task to regression.
I had to lower
the learning rate.
Thank you to the live chat
who helped me figure this
out after like over
an hour of debugging.
I had to lower the learning
rate to get this to work.
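The regression variant only changes the configuration. Here's a sketch; note the learningRate value is an assumption on my part, since the video says the rate was lowered but not what it was set to.

```javascript
// Regression configuration: same pixel inputs, one continuous output.
const videoSize = 10;

const regressionOptions = {
  inputs: videoSize * videoSize * 3,
  outputs: 1,          // a single continuous value, e.g. a frequency
  task: 'regression',
  learningRate: 0.01,  // assumed value; ml5's default is higher
  debug: true,
};
```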
I trained the model
with me standing
in different positions,
each associated
with a different frequency
that the p5.js sound
library played.
And you can see some examples
of me training it over here.
And now, I am going to run
it and see if it works,
and that'll be the
end of this video.
So I had saved the data.
And now it's training the model.
And as soon as it
finishes training,
you'll be able to hear.
All right, so I will leave
that to you as an exercise.
I'll obviously include
the link to the code
for this in the
video's description
and on the page for this
particular video
at thecodingtrain.com.
I can come back
and implement it,
or you can go find the link
to a livestream
where I spend over an
hour implementing it.
But I'll leave that
to you as an exercise.
If you followed this video
and have image classification
working, see if you can change
it to a regression
and have it control something
with a continuous output.
OK, if you made it this far,
[KISSING NOISE] thank you.
And I will be back
in the next video
to start talking about
convolutional neural networks
and what they mean.
[MUSIC PLAYING]
