Welcome back. Let's continue a short
example on back propagation. So in our
last video, we had a very basic
perceptron that could see the features
for a cat and a dog. The features
were has whiskers, and is a good
boy. The cat had the values 1 0, the
dog had the values 0 1. We fed those
through a structure that multiplied each
of the inputs by a certain weight.
We call these neurons, and these are the
weights that connect neurons across layers.
Once we went from the input layer to the
output layer, we had certain values, 0.93
and 0.06, which we passed through an
activation function that transformed
those into 1 and 0, and these are the
values that we wanted. So those random
initialization values worked. What do we
do if they don't? We use a procedure called
back propagation, so let's have the same
example but with different random
weights when we start. So I have two
neurons, one that accepts the feature has
whiskers and one that accepts the
feature is a good boy, but now our
weights are 0.47 and 0.38. And let's just
say for the sake of the exercise that
these are assigned at random. We have the
same activation function which we'll
look at in a moment, and we have the same
input 1 0 for cat, 0 1 for dog. We want
the same output: 1 for cat, 0 for dog.
When we process the input cat, we have
1 times 0.47 plus 0 times 0.38, which equals 0.47.
That's the value that goes from
the input layer to the output layer for
the input cat. For dog, we have 0 times
0.47 plus 1 times 0.38, which equals 0.38.
That is the value that arrives at
the output layer.
Then we're gonna use the same activation
function, just rounding to the nearest
integer: 0.47 would round down to 0,
and 0.38 would also round down to 0.
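This forward pass and the rounding activation can be sketched in a few lines of Python. The weights and inputs come straight from the example; the function names are my own, for illustration only:

```python
def forward(inputs, weights):
    # Weighted sum of the inputs: the value that arrives at the output layer.
    return sum(x * w for x, w in zip(inputs, weights))

def activation(value):
    # The activation function is just rounding to the nearest integer.
    return round(value)

weights = [0.47, 0.38]     # the randomly assigned weights
cat, dog = [1, 0], [0, 1]  # features: has whiskers, is a good boy

print(forward(cat, weights))              # 0.47
print(activation(forward(cat, weights)))  # 0 -- the cat is misclassified
print(activation(forward(dog, weights)))  # 0 -- the dog is correct
```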
So we take the 0.47 for cat, pass it
through the activation function, which is
just rounding to the nearest integer, and
the output is 0. In this case, the network
made a mistake for cat: the cat was
misclassified as a dog. We wanted
a 1, but instead we got a 0. Notice
that it was correct for the dog. We
wanted a 0, and 0 is what we got from the network. But how do we correct
this? How is the network going to learn
the right answer for cat? The first thing
we need to do is calculate how wrong we
are. We're going to call this a delta, and
there are many variations of this kind of
error function; I'm just gonna give you a
very simple example. How wrong we were is
based on the value that arrives at the
output layer: the input for has whiskers
multiplied by its weight, plus the input
for is a good boy multiplied by its
weight. That gives 0.47 for cat and 0.38
for dog. The output that we wanted was
1 for a cat and 0 for a dog, so for cat
the difference between what we got in the
output layer and the output we wanted is
0.47 minus 1, that is minus 0.53. For dog,
we got 0.38 and the output that we
wanted was 0,
so the delta there is plus 0.38. So these
are going to be the deltas. This is the
error between what we got at the output
layer and where we want to be. Another thing that
we need to do is to define how fast we
should learn. We're gonna call this the
learning rate, and we're gonna discuss
much more about this later. But for the
sake of example, let's set it at 0.1,
which means the network learns in steps
of about 10% of the error. So
once we have these numbers, minus 0.53 and
plus 0.38, you might want to pause the
video and get like a piece of paper to
write things down, so write the delta
values for each of them, and write the
alpha, the learning rate, on your piece of
paper, and I'll give you a few seconds.
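If you'd rather check your paper against code, here is a minimal sketch of the delta calculation and the learning rate from the example (the variable names are my own):

```python
weights = [0.47, 0.38]
cat, dog = [1, 0], [0, 1]  # features: has whiskers, is a good boy

# Values arriving at the output layer (the weighted sums).
out_cat = sum(x * w for x, w in zip(cat, weights))  # 0.47
out_dog = sum(x * w for x, w in zip(dog, weights))  # 0.38

# delta = what we got minus what we wanted (targets: cat -> 1, dog -> 0)
delta_cat = out_cat - 1  # -0.53
delta_dog = out_dog - 0  # +0.38

alpha = 0.1  # learning rate: learn in roughly 10% steps of the error
```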
Welcome back. We'll use these numbers to
calculate the new weights and we can do
it like this. The new weight that connects
neuron 1 to the output layer will be the
old weight, minus the delta for the
category cat, minus 0.53, multiplied by
the input for cat in the first neuron,
which is 1, and all of this multiplied by
the learning rate. How much should we
learn from this error? 0.1. Indeed the cat
was the one where we made the mistake, so
we should learn a little bit
from that here. Then we have the delta for
the dog, which is 0.38, multiplied by the
input for the dog in neuron 1, has
whiskers equals 0, and the learning rate.
Because that input is 0, we get a 0 here,
and this makes sense: the dog was correct,
so we don't want to learn as much from it
as we want to learn from the cat. This is
the old weight 0.47 minus what we're
learning from the cat minus what we're
learning from the dog for the first
neuron. And so the new weight is going to
be 0.523, so we replace
the 0.47 we had and make this the new
weight for the network. Likewise we're
going to do this with the connection for
the second neuron. This is the old weight,
0.38, minus the delta for the cat
multiplied by the input for the cat in
the second neuron, 0, and the learning
rate, which gives 0, minus the delta for
the dog, 0.38, multiplied by the input for
the dog, 1, multiplied by the learning
rate. Notice that here we reduce the
contribution of this neuron a little bit,
because the cat has a 0 there; for cats,
the important information is in the first
neuron. This diminishes the weight of that
neuron to 0.342, and indeed we replace
the old value with 0.342. So look at what
we just did. We forward propagated the
feature information: we took the features
for cat and dog, multiplied them by the
weights, and passed them through the
activation function to get output values.
Then we measured how wrong those values
were, and we used those errors to back
propagate through the network
so that our weights would be corrected.
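The weight update we just walked through can be sketched like this, assuming the rule described above: each new weight is the old weight minus delta times input times learning rate, summed over both examples (names are mine):

```python
weights = [0.47, 0.38]
alpha = 0.1
examples = [([1, 0], 1), ([0, 1], 0)]  # (features, desired output) for cat, dog

# Deltas from the forward pass: output-layer value minus desired output.
deltas = [sum(x * w for x, w in zip(inputs, weights)) - want
          for inputs, want in examples]  # roughly [-0.53, 0.38]

new_weights = []
for i, w in enumerate(weights):
    # Each example's error corrects this weight in proportion to its input.
    correction = sum(d * inputs[i] * alpha
                     for d, (inputs, _) in zip(deltas, examples))
    new_weights.append(w - correction)

print([round(w, 3) for w in new_weights])  # [0.523, 0.342]
```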
And we're taking our errors into account
when we are correcting the weights. This
is called back propagation, and as you
can see, it has resulted in a change
of the weights. And by the way, performing
this procedure where you go forth and
then back and learn a little bit is
called an epoch. So we just performed one
epoch of training on our network. This is
the shape that our network will have in
this second epoch. It's still the same
structure, two neurons in the input layer,
two weights connecting it to the output
layer, same activation function, but it
now has different weights. We calibrated
it through back propagation, by figuring
out how wrong we were and then returning
that error back into the
network. Let's run the whole thing again
so now, for cat, in the input layer we
have 1 for has whiskers and 0 for is a
good boy, times the new weights: 1 times
0.523, plus 0 times 0.342. The sum of
those is 0.523, and this is what arrives
at the output layer. For dog, has whiskers
is 0 and is a good boy is 1: 0 times 0.523,
plus 1 times 0.342, is 0.342, which is
what arrives at the output layer. Again we pass
those through the activation function. The
0.523 that we have for cat, rounded to the
nearest integer, is 1, and this is the
output that we wanted for a cat. For dog,
0.342 rounded to the nearest integer is 0,
and this is the output that we wanted.
So this worked.
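Putting it all together, one epoch of forward pass, error measurement, and back propagation can be sketched as below, then checked against the second epoch's outputs. This is a sketch of the procedure from the example; `train_epoch` is my own name:

```python
def train_epoch(weights, examples, alpha=0.1):
    # Forward pass: deltas are output-layer values minus desired outputs.
    deltas = [sum(x * w for x, w in zip(inputs, weights)) - want
              for inputs, want in examples]
    # Back propagation: correct each weight by delta * input * learning rate.
    return [w - sum(d * inputs[i] * alpha
                    for d, (inputs, _) in zip(deltas, examples))
            for i, w in enumerate(weights)]

examples = [([1, 0], 1), ([0, 1], 0)]          # cat -> 1, dog -> 0
weights = train_epoch([0.47, 0.38], examples)  # one epoch of training

# Second epoch's forward pass: both inputs now classify correctly.
for inputs, want in examples:
    out = round(sum(x * w for x, w in zip(inputs, weights)))
    print(out == want)  # True, then True
```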
We used the error of the network to back
propagate, change our weights, and then
get weights that actually work in
giving us the answers that we need. So
we have in effect trained our neural
network using one epoch of back
propagation. Notice again that here we
have a structure that takes the input 1
0, and transforms it into a 1, and it also
takes the input 0 1, and transforms it
into a 0, therefore completing a
classification task. In the next video,
we're going to work on this together.
We're going to go through multiple
epochs and you're going to be
calculating all the numbers.
