LAURENCE MORONEY: Welcome
back to "Machine Learning
Foundations for
Google Developers."
I'm Laurence Moroney
from the TensorFlow team,
and I'm here to be your guide.
In the last video, you
learned all about convolutions
and how they can use filters
to extract information
from images.
You also saw how to create
pools that can reduce
and compress your
images without losing
the vital information that
was extracted by the filters.
In this video, you're going
to get hands-on and create
your own convolutional neural
networks, so let's get started.
In earlier videos, for
the simple neural network
for spotting fashion
or handwriting digits,
you defined a model
architecture like this.
You used layers, primarily
Dense layers of densely
connected neurons.
To use convolutions
and pooling, you
have the Conv2D and
MaxPooling layers, like this.
Now, they can be stacked on
top of your dense network.
You define a convolutional layer
with a number of parameters.
In this case, the
64 is the number
of filters for this layer.
Remember that the filters
will be randomly initialized,
and then the best
filters to match
the pictures to their labels
will be learned over time.
The 3 by 3 is the
size of the filter.
Earlier, we saw filters
for the current pixel
and its immediate
neighbors that were 3 by 3,
and that's what
we're defining here.
As before, we have
an input shape,
which is the shape of
the images being fed in,
and that's 28 by 28
with a color depth of 1,
since the images are grayscale.
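In code, that single layer might look like this. This is a hedged sketch rather than the exact lab code; I'm using a `tf.keras.Input` to declare the input shape, which is equivalent to passing it to the first layer:

```python
import tensorflow as tf

# A sketch of the convolutional layer described above:
# 64 filters, each 3 by 3, over 28x28 single-channel images.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
])

# 64 filters x (3*3 weights + 1 bias) = 640 learnable parameters
print(model.count_params())

# The 3x3 filter trims a 1-pixel border, so 28x28 becomes 26x26
print(model.output_shape)
```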
Similarly, the pooling is
done like this, with a layer,
and the 2 by 2 defines the
size of the chunks to pool.
So in this case, 4
pixels will become 1.
There's also average pooling
and other variants, but we'll
focus on MaxPooling here.
These layers can then be
stacked on top of each other,
so the results of the 64
filters from the top layer
will each be pooled,
and then their results
will each be filtered 64
times, and they, of course,
will get pooled again.
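Stacked together, the whole network might look like this. Again, a sketch of the architecture being described, not the verbatim lab code:

```python
import tensorflow as tf

# Two rounds of convolution and pooling, then the familiar
# Flatten/Dense classifier underneath.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.summary()
```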
So let's take a look
at the model's summary
so we can see how
the data is changing
as it goes through the network.
You'll see something like this.
There's a lot going on
here, so let's unpack it.
First of all, the initial
output probably looks weird.
Our images are 28 by 28
and we get 64 filters,
so we'd expect our output to
be 28 by 28, but it's 26 by 26.
This looks like a bug, but it
isn't, so let me explain why.
Consider a picture like this
one of a very sleepy doggy.
On the left, I've zoomed into
the top left of the picture
so you can see the pixels.
When doing a filter,
you scan every pixel
and take its neighbors.
But what happens if we pick
the top pixel like this?
It doesn't have any
neighbors above it
and it doesn't have
any to the left.
Similarly, the
next pixel doesn't
have any neighbors on top, but
it does have some on the left.
It's not until you
get to this pixel
that you'll have one that has
neighbors on all sides, which
you can see here.
So a 3-by-3 filter requiring
a neighbor on all sides
can't work on the pixels around
the edges of the picture.
You effectively have to remove
one pixel from the top, bottom,
left and right, and this
reduces your dimensions
by 2 on each axis.
So a 28-by-28
becomes a 26-by-26,
which you can see here.
Each filter will learn 9 values
for the filter coefficients,
plus a bias, for a
total of 10 parameters.
So the 64 filters have
640 learnable parameters.
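That arithmetic is easy to check for yourself in plain Python; this just restates the numbers above:

```python
filters = 64
kernel_h, kernel_w = 3, 3
input_channels = 1                # grayscale input, one channel

# Each filter learns 9 coefficients (3x3x1) plus 1 bias
weights_per_filter = kernel_h * kernel_w * input_channels
params_per_filter = weights_per_filter + 1
total_params = filters * params_per_filter

print(total_params)  # 640
```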
Our pooling reduces
the dimensionality
by half on each axis, so 26
by 26 will become 13 by 13
but no parameters are
learned on this layer.
The 3-by-3 filter then
reduces 13 by 13 to 11 by 11
by removing a pixel
border, like before.
The MaxPooling halves
that, rounding down,
so we end up with 5 by 5 images.
At this point, we
have 64 filters
and the images are 5
by 5, for 25 pixels.
Multiply all that out,
and you get 1,600, which
then gets fed into the Flatten.
This set of 1,600
values can then
be classified with a
dense network, as before.
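All of that shape bookkeeping can be traced with a few lines of plain Python. The helper names here are mine for illustration, not Keras APIs:

```python
def conv_valid(size, kernel=3):
    # A filter with no padding trims (kernel - 1) pixels per axis
    return size - (kernel - 1)

def max_pool(size, window=2):
    # 2x2 pooling halves each axis, rounding down
    return size // window

side = 28
side = conv_valid(side)   # 28 -> 26
side = max_pool(side)     # 26 -> 13
side = conv_valid(side)   # 13 -> 11
side = max_pool(side)     # 11 -> 5

filters = 64
print(side * side * filters)  # 1600 values fed into the Flatten layer
```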
So now that you've seen
how the code works,
let's take a look at a lab that
updates your fashion classifier
from last time to use
convolutions, as well
as dense layer types.
So let's take a look at
improving computer vision
accuracy using convolutions.
Here's the deep neural network
that you've created already
for the fashion_mnist data set.
And we can see that
we have Flatten(),
followed by a Dense
with 128 neurons,
followed by another
Dense with 10 neurons,
because we have 10 classes.
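That dense-only model might be defined like this; a hedged sketch of the architecture just described, with a typical compile step included:

```python
import tensorflow as tf

# The original dense-only classifier: no convolutions, just
# Flatten followed by two Dense layers.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```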
When I run this, I'm just
going to train for five epochs.
Let's see how quick it is and
let's see how accurate it is.
First, it needs to
download the data.
And we can see that after five
epochs, it's up to about 89%
accuracy on the training set,
and a little over 87%,
almost 88%, accuracy
on the test set,
which is really strong
performance, considering it's
only been five epochs.
So now let's take a
look at what happened
with a convolutional
neural network.
So here you can see
the model architecture.
We have our same
Flatten(), Dense(),
Dense() that we had earlier.
But in this case,
on top of that,
we have a couple of
convolutional layers,
and these convolutional
layers have
their associated
MaxPooling layers.
Note that the input
shape is 28 by 28
by 1, because the
convolutional layer expects
it to be in three dimensions,
with one dimension
for the color depth.
And that means we have to
reshape our training images
and our test images arrays.
They were 60,000 by 28 by 28
for the training images,
so we have to add another
dimension, making them
60,000 by 28 by 28 by 1,
and likewise 10,000 by 28 by 28
by 1 for the test images.
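The reshape itself is a one-liner per array. Here I'm using zero-filled arrays as stand-ins for the real Fashion MNIST data, just to show the shapes:

```python
import numpy as np

# Stand-ins for the real Fashion MNIST arrays (zeros, same shapes)
training_images = np.zeros((60000, 28, 28))
test_images = np.zeros((10000, 28, 28))

# Add the trailing channel dimension that Conv2D expects
training_images = training_images.reshape(60000, 28, 28, 1)
test_images = test_images.reshape(10000, 28, 28, 1)

print(training_images.shape)  # (60000, 28, 28, 1)
print(test_images.shape)      # (10000, 28, 28, 1)
```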
So now when I run it,
it's going to compile.
It's going to show me
the model architecture.
And it's going to
start training.
Now, this is going to
be a little bit slower,
because it's training a
convolutional neural network.
But if you're using
the GPU runtime
(choose Change runtime type
and make sure it's set to GPU,
ideally before you begin),
you'll see it's not too bad.
It's five, six
seconds per epoch.
And in this case, with
only five epochs of training,
it's gone up to about 93% on
the training data and 91%
and change on the test data.
So we can see it's
actually improved.
It's a significant step
in the right direction.
So have a play with it
yourself, and as you're
working through
the Codelab, take
a look towards the
bottom of the Codelab,
where you can visualize
the convolutions
and pooling to see
what they look like.
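If you want a head start on that visualization, one common approach, sketched here with an untrained model standing in for your trained one, is to build a second model that exposes the convolutional layers' feature maps:

```python
import numpy as np
import tensorflow as tf

# An untrained stand-in for your trained convolutional model
base = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# A second model whose outputs are the conv layers' feature maps
conv_outputs = [layer.output for layer in base.layers
                if 'conv' in layer.name]
activation_model = tf.keras.Model(inputs=base.inputs,
                                  outputs=conv_outputs)

# Run one (blank) image through and inspect the feature map shapes;
# with a trained model you'd pass a real test image instead
image = np.zeros((1, 28, 28, 1), dtype='float32')
feature_maps = activation_model.predict(image, verbose=0)
print(feature_maps[0].shape)  # first conv layer's output
print(feature_maps[1].shape)  # second conv layer's output
```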
And there's also some
exercises at the bottom
where you can try different
things for yourself.
Once you're done
with that, you'll
be ready to take
this week's exercise.
So now you're ready
to experiment.
Pause the video and
give this lab a try.
See how far you can get,
and have fun experimenting
with the visualization
of the convolutions.
Welcome back.
Now that you've had a chance
to play with convolutions,
it's time to do the exercise.
Give the one at this URL a try.
I'll share the
code for the answer
next time, so don't forget
to hit that Subscribe
button for more great videos
and the rest of this series.
[MUSIC PLAYING]
