In this video, we're going to discuss convolutional
neural networks.
And in this section, we're going to do things
a little differently: we'll build our network
and train it, and then in the next video,
we'll analyze the results to try to better
understand what's going on.
So, first we'll go over the convolutional
neural network constructor, discuss the forward
step, and then we'll go over training.
And just to note, we can take many of the
methods we learned in Chapter 5 and apply
them to the convolution kernels, just as we
applied them to the linear terms in our neural
network.
So, here's a standard picture you might see
for a convolutional neural network. Everything is
pretty much the same as in a regular neural network;
we have the fully connected layers.
But we have another set of parameters for
the convolution kernels, and they're also
obtained via training.
So, the training step is identical; only
our network architecture is different, because
we have to include the convolution operations.
So, we're not going to get into the details
of training, but it's pretty much identical
to a fully connected network; the math
is just a little more complicated.
So, let's start off with a really simple example
and this is the simplest example I can come
up with.
And we're going to try to distinguish between
a horizontal line and a vertical line.
For a vertical line, y will equal 0, and
for a horizontal line, y will equal 1.
And to make things interesting, we're going
to add some noise.
So, for this iteration the values might look
like this, and for the next iteration those
little black points will be in different areas.
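As a concrete illustration, here's a minimal sketch of how such a dataset might be generated in PyTorch. The Lines class, the 11-by-11 image size, and the noise level are assumptions for illustration, not code from the video.

```python
import torch
from torch.utils.data import Dataset

class Lines(Dataset):
    # Hypothetical dataset: 11x11 grayscale images containing either a
    # vertical line (y = 0) or a horizontal line (y = 1), plus random noise
    # so each sample looks a little different.
    def __init__(self, n_samples=100, size=11, noise_std=0.1):
        self.x = torch.zeros(n_samples, 1, size, size)
        self.y = torch.zeros(n_samples, dtype=torch.long)
        for i in range(n_samples):
            if i % 2 == 0:
                self.x[i, 0, :, size // 2] = 1.0   # vertical line -> class 0
                self.y[i] = 0
            else:
                self.x[i, 0, size // 2, :] = 1.0   # horizontal line -> class 1
                self.y[i] = 1
            # the noise: those "little black points" land somewhere new each sample
            self.x[i] += noise_std * torch.randn(1, size, size)
        self.len = n_samples

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.len
```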
So, we'll take our input image, and let's just
get rid of that to clarify everything.
We're going to build a very simple network,
and for our first convolutional layer we'll
have two output channels.
And then we'll get our activation map, well,
two activation maps.
We'll apply an activation function and pooling,
and our second layer will have only one output;
so, because it has two inputs, it will require two kernels.
And if you recall, each input will have
its own kernel.
We'll convolve each kernel with its corresponding
input, add the results together, and we'll
get something like that.
And let's just clarify it in an independent
step.
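Just to make that convolve-and-sum step concrete, here's a small sanity check you could run (illustrative, not code from the video): a Conv2d with two input channels and one output channel holds one kernel per input channel, convolves each kernel with its channel, and sums the results into a single activation map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=2, bias=False)
print(conv.weight.shape)       # torch.Size([1, 2, 2, 2]): one 2x2 kernel per input channel

x = torch.randn(1, 2, 4, 4)    # a batch of one two-channel image
out = conv(x)                  # each channel convolved with its kernel, results summed
print(out.shape)               # torch.Size([1, 1, 3, 3])

# The same output, computed one channel at a time and added together:
manual = (F.conv2d(x[:, 0:1], conv.weight[:, 0:1]) +
          F.conv2d(x[:, 1:2], conv.weight[:, 1:2]))
print(torch.allclose(out, manual))   # True
```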
So, the next thing we're going to do is apply
an activation function and then max pooling;
let's put it in our 3D diagram, so we have
a smaller output.
And finally we'll take this layer over here,
and we're going to flatten it.
And remember this is just an activation value,
and we're going to use this activation value
as an input to a fully connected layer.
So, this diagram has two hidden layers, but for our
actual example, we will only require an output layer.
Here's the final diagram.
So, this is kind of hard to fit on one page;
so, let's do one more diagram to clarify everything.
So, we have the two channels corresponding
to our first layer, and the second two channels
corresponding to our second layer, and we
apply the activation and max pooling.
So, we take the two outputs from our first
layer, convolve them with the second layer's
kernels, then we'll apply the activation function
and pooling, and then connect to our fully connected layer.
Before we use it as an input to our fully
connected layer we're going to have to flatten it.
So, we'll take the output of this layer, which
is just a set of activations, apply the activation
function and pooling, flatten it, and then
these will be the input to our fully connected layer.
So, just to note, to make things a little
easier, our network will have two parameters:
the number of outputs
for our first layer and the number of outputs
for our second layer.
So, here's our object constructor. We'll
have one input channel because we're using grayscale
images; the out parameter, in this case, will be 2;
and we'll hardcode the kernel size and padding
because they take a little
more work to determine.
So, we'll add our ReLU activation and max
pooling layer; again, we'll have to take the
max pooling kernel and stride sizes into
account when we calculate the size of our
flattened layer.
Now we'll construct our second convolutional
layer, and notice how the number of outputs
for our first layer is equal to the number
of inputs for our second layer, and the number
of outputs for the second layer will be 1;
then we'll add a ReLU
and max pooling layer.
Finally, we'll create a linear layer; notice
that the number of inputs to this layer is
equal to the number of outputs in the preceding
layer.
And just to note, these numbers over here
are not easy to calculate; you have to take
into account the size of the image after all
of these layers.
So, this line of code assumes we're going
to flatten our image, and because we only
have two classes, we'll have two outputs.
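Putting those pieces together, here's a minimal sketch of the constructor. The 11-by-11 input size, kernel size of 2, and stride of 1 are illustrative assumptions, and the size comments use the formula (n + 2p - k)/s + 1 at each layer.

```python
import torch.nn as nn

class CNN(nn.Module):
    # Sketch of the constructor described above; out_1 and out_2 are the two
    # parameters: output channels of the first and second convolutional layers.
    def __init__(self, out_1=2, out_2=1):
        super(CNN, self).__init__()
        # First convolutional layer: 1 input channel (grayscale), out_1 outputs.
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=out_1,
                              kernel_size=2, padding=0)
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=1)
        # Second layer: its input count equals the first layer's output count.
        self.cnn2 = nn.Conv2d(in_channels=out_1, out_channels=out_2,
                              kernel_size=2, padding=0)
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=1)
        # Tracking the spatial size with (n + 2p - k)/s + 1 at each step:
        # 11 -> conv1 -> 10 -> pool1 -> 9 -> conv2 -> 8 -> pool2 -> 7,
        # so the flattened input to the linear layer has out_2 * 7 * 7 values,
        # and there are 2 outputs because there are two classes.
        self.fc1 = nn.Linear(out_2 * 7 * 7, 2)
```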
So, let's go over the forward step.
And again, this is kind of big, so we're going
to shrink this down a little.
So, for our first set of layers, we'll perform
the convolution, then activation and max pooling;
then, for our second set of layers, we'll convolve
again, followed by activation and pooling.
And we'll take that output, resize it, and
then apply it to our final layer, and that'll
give us our two outputs.
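Here's a sketch of that forward step as a method of the CNN class sketched above; the flattening via view is one common way to do the resize.

```python
import torch

# Forward step: conv -> ReLU -> pool, twice, then flatten and classify.
def forward(self, x):
    x = torch.relu(self.cnn1(x))   # first convolution + activation
    x = self.maxpool1(x)           # first max pooling
    x = torch.relu(self.cnn2(x))   # second convolution + activation
    x = self.maxpool2(x)           # second max pooling
    x = x.view(x.size(0), -1)      # resize (flatten) for the linear layer
    return self.fc1(x)             # two outputs, one per class
```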
We can create a training dataset and a validation
dataset; again, technically this
is a test set, because we don't have any hyperparameters to select with it.
And again, the only parameters for our model will
be the number of outputs for our first layer
and the number of outputs for our second layer.
So, everything else is pretty much the same
as softmax for a regular neural network.
Because we have multiple classes, we'll create
a cross-entropy loss; we'll create an optimizer, and
its input will be the model parameters.
We'll create some lists to store our parameters.
We'll create a train loader object and a validation
loader object, and we'll perform training via
gradient descent.
And the only difference is, we're not going
to have to reshape our tensor, because we're
going to leave it as a rectangular image.
And that's pretty much it; it's almost identical
to softmax or a fully connected network.
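For completeness, here's a generic training-loop sketch under the same assumptions; the Lines dataset, batch sizes, learning rate, and epoch count are all illustrative, not values from the video.

```python
import torch
from torch.utils.data import DataLoader

train_dataset = Lines(n_samples=100)        # hypothetical dataset from earlier
validation_dataset = Lines(n_samples=20)

model = CNN(out_1=2, out_2=1)               # the two model parameters
criterion = torch.nn.CrossEntropyLoss()     # multiple classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

train_loader = DataLoader(dataset=train_dataset, batch_size=10)
validation_loader = DataLoader(dataset=validation_dataset, batch_size=20)

cost_list, accuracy_list = [], []           # lists to store results
for epoch in range(10):
    for x, y in train_loader:
        optimizer.zero_grad()
        # no reshape of x: it stays a rectangular image, shape (batch, 1, 11, 11)
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    cost_list.append(loss.item())

    correct = 0
    for x_val, y_val in validation_loader:
        _, yhat = torch.max(model(x_val), 1)
        correct += (yhat == y_val).sum().item()
    accuracy_list.append(correct / len(validation_dataset))
```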
