Hi everyone, and thank you for being here with me in this video. I decided to do this video in English because the topic we will talk about is backpropagation, and it's a really popular topic in the artificial intelligence field. So I really recommend you take a pen and paper and try to do by yourself what we will do in this video. If you have any question, please leave a comment, and by the way, if you like the video and the channel, please don't forget to subscribe and to like the video.
THANK YOU :) !
Okay, let's start to go deeper into the backpropagation algorithm! Let's go! :)
So we know that a neural network can be represented globally like this. Neurons are connected to each other and organized into what we call layers. In the case of supervised learning, backpropagation is a common method to train a neural network by adjusting the weights through an error calculation at each iteration. I could describe the backpropagation algorithm with a lot of letters and complex notation, but instead I will describe this algorithm through a really simple step-by-step demonstration. Please do not hesitate to pause the video and do the calculations by yourself; it's an exercise I really recommend to every artificial intelligence engineer. Knowing the mathematics hidden behind the lines of code is really important. For the demonstration in this video, you just need to know how addition, multiplication and subtraction work, plus the partial derivative.
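As a quick reminder, taking a partial derivative just means differentiating with respect to one variable while treating all the others as constants, for example:

\[
\frac{\partial}{\partial x}\,(x \cdot y + z) = y
\]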
Our demonstration will be based on a simple neural network. My inspiration for this demonstration comes from Matt Mazur; do not hesitate to check his blog post linked in the description of this video. So the neural network we will use has one input layer, one hidden layer and one output layer, with two neurons in each. We will feed our neural network with two numbers, here in yellow, and we are looking to predict 0 and 1, here in red.
First of all, we initialize the weights of our neural network randomly. We also set a bias in each layer, here in green. The first phase is the forward propagation: in this phase we will find the value of each neuron, here in blue.
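To fix the notation for the small sketches that follow, here is a hypothetical setup in Python; the concrete inputs, weights and biases shown on screen are not in this transcript, so every number below is a placeholder.

import random

# Placeholder setup: treat all of these values as hypothetical.
i1, i2 = 0.1, 0.3                  # the two inputs, in yellow in the video
target_o1, target_o2 = 0.0, 1.0    # the two targets, in red: we want 0 and 1

w1, w2, w3, w4 = [random.uniform(-1.0, 1.0) for _ in range(4)]  # input -> hidden weights
w5, w6, w7, w8 = [random.uniform(-1.0, 1.0) for _ in range(4)]  # hidden -> output weights
b1, b2 = 0.3, 0.6                  # one bias per layer, in green (placeholder values)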
We will first calculate the value of the j1 neuron. In order to do so, we will use the two inputs and the two edges connected to j1. We calculate the value of j1 with this formula: the sum of each input multiplied by its respective weight, plus the bias. A really simple formula, isn't it? We replace all the values and we obtain 0.5 for j1, but the work doesn't end here. 0.5 is the value entering j1 in the network, but now we need to calculate the value that j1 will deliver to the rest of the network. To do so, as most of you know, we are going to use what we call an activation function, in order to break the linearity of our neural network. We will use here the sigmoid function as our activation function, and we obtain 0.62 as the output of the neuron j1. We repeat exactly the same process for j2, but please note that we change the edges and, by doing so, the values: here we use w3 and w4 as the weights. The input of j2 is around 0.46 and its output is 0.61. We can update our schema with those new values.
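As a rough sketch of this forward step in Python (the helper names are hypothetical; only the pre-activation values 0.5 and 0.46 come from the video):

import math

def sigmoid(x):
    # The activation function used in the video: squashes any value into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron_forward(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, then the sigmoid activation
    net_input = sum(i * w for i, w in zip(inputs, weights)) + bias
    return net_input, sigmoid(net_input)

print(round(sigmoid(0.5), 2))     # -> 0.62, the output of j1
print(round(sigmoid(0.46), 2))    # -> 0.61, the output of j2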
Now we can calculate the values of our output layer, which we will compare later to our targets. We hope here to get as close as we can to 0 and 1, but as the initialization was random, we would have to be very lucky. The formula is exactly the same as previously: the simple sum of each input multiplied by its respective weight. Exactly as previously, we can calculate the input and the output of each neuron of our output layer. For o1 we get 1.42 as input and 0.80 as output.
Following again the same process for o2, we get 1.51 as input and 0.82 as output. We can also report all those values on our schema. Do not hesitate to pause this part of the video and try to retrieve the same numbers. You can also set new weight values at the initialization, change the input and target values, and feed your own neural network. Now that we have calculated those values, we can notice that we have not been lucky at all: instead of getting 0 we got 0.80, and instead of getting 1 we got 0.82. The second value is not too bad, however.
Now the real work of the backpropagation algorithm can start. We will calculate the error of each part of our neural network and adjust the weights to get closer to the desired output values. Here in this video we will do just one iteration, because it can take hundreds of iterations before getting to the values we want. But do not worry: in the next video we will implement this method in TensorFlow, and we will see how machines are our friends for doing this quicker. We will calculate the error for each neuron with the squared error formula. We could decide to remove the 1/2 if we wanted; we introduce it here only to cancel the factor of 2 introduced later by the derivative of the error. So it is just a simplification, and it doesn't matter that we introduce a constant here, as the error will be multiplied anyway by the learning rate, which we will see later.
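Written out, the error of one output neuron and the derivative that the 1/2 is there to simplify look like this (a standard squared error, consistent with the numbers used in the video):

\[
E_{o} = \frac{1}{2}\left(\text{target}_{o} - \text{out}_{o}\right)^{2}
\qquad\Longrightarrow\qquad
\frac{\partial E_{o}}{\partial \text{out}_{o}} = -\left(\text{target}_{o} - \text{out}_{o}\right)
\]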
So we calculate the error for the output layer. Here the value we expected was 0 and we got 0.80, so the error of o1 is 0.32 using this formula. Doing the same for the error of o2, we get 0.0162.
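A quick numeric check of those two errors and their sum, using a hypothetical helper name:

def squared_error(target, output):
    # Error of one output neuron, with the 1/2 factor discussed above
    return 0.5 * (target - output) ** 2

error_o1 = squared_error(0.0, 0.80)    # -> 0.32
error_o2 = squared_error(1.0, 0.82)    # -> approximately 0.0162
error_total = error_o1 + error_o2      # -> approximately 0.3362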
So the total error of this first iteration is 0.3362. Now that we know this total error, we are going to try to find the impact of each weight on it, in order to reduce this error at the next iteration. So we will calculate the impact of w8 on the total error. It is what we call the partial derivative of E_total with respect to w8; we can also call it the gradient with respect to w8. Later we can calculate the same thing for w4 and for any other weight of any other layer in our network. So let's calculate the impact of w8 on the total error by applying the chain rule. You can refer to the Wikipedia article if you need it, or you can trust me! :D
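In symbols, with w8 being the weight connecting j2 to o2, this chain rule reads:

\[
\frac{\partial E_{total}}{\partial w_{8}}
= \frac{\partial E_{total}}{\partial \text{out}_{o2}}
\cdot \frac{\partial \text{out}_{o2}}{\partial \text{in}_{o2}}
\cdot \frac{\partial \text{in}_{o2}}{\partial w_{8}}
\]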
We can write this formula visually: it is what we do to calculate the impact of w8 on E_total. We need to go back through each computation step that allowed us to calculate the total error during the forward propagation phase. For the first part, we had previously defined the errors of o1 and o2, and we defined the total error as the sum of the error of o1 and the error of o2. So we can differentiate this formula with respect to the output of o2, and we keep only the terms where out_o2 appears. Do not hesitate to pause the video and calculate this derivative on your own; here we get -0.18 as the value. Let's move on to the next term, which is really simple to compute, since between the input and the output of our neuron we only have the activation function, the sigmoid, as a reminder. So we just need to differentiate the sigmoid function, and we obtain a really simple formula, out_o2 times (1 - out_o2), which leads us to a value of 0.1476.
And finally, we need to retrieve the impact of w8 on the input of o2. To do so, we take the formula of the input of o2 that we previously used and we differentiate it with respect to w8, and we are left only with the output of the neuron j2, which is 0.61. So by combining all those terms, we find that the partial derivative of E_total with respect to w8 is -0.016.
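Putting the three terms together numerically, a minimal sketch of this gradient computation (variable names are just for illustration):

# The three chain-rule factors for w8 (the weight from j2 to o2)
d_Etotal_d_out_o2 = -(1.0 - 0.82)         # -> -0.18
d_out_o2_d_in_o2 = 0.82 * (1.0 - 0.82)    # sigmoid derivative -> 0.1476
d_in_o2_d_w8 = 0.61                       # out_j2

grad_w8 = d_Etotal_d_out_o2 * d_out_o2_d_in_o2 * d_in_o2_d_w8
print(round(grad_w8, 3))                  # -> -0.016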
Now we can calculate the updated value of w8, which is the old value of w8 minus the learning rate multiplied by the partial derivative we just calculated. We obtain a new value of 0.5328, to be compared to the previous one of 0.52.
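A sketch of this update step; note that the learning rate used in the video is not stated in this transcript, so the value below is an assumption (0.8 happens to reproduce 0.5328 from 0.52):

learning_rate = 0.8       # assumed: the actual value is not given in this transcript
w8_old = 0.52
grad_w8 = -0.016

w8_new = w8_old - learning_rate * grad_w8
print(round(w8_new, 4))   # -> 0.5328 with the assumed learning rate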
Through this computation, we can simplify the formula used to calculate the partial derivative for each weight, and we can now compute the updated value for each of the weights of this layer. I recommend you do that by yourself.
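A small sketch of that simplified formula, assuming every weight of this output layer follows the same three-factor pattern we just derived for w8 (the helper name is hypothetical):

def output_weight_gradient(target, out_o, out_j):
    # Same three chain-rule factors as for w8:
    # dE_total/d out_o  *  d out_o/d in_o  *  d in_o/d w
    return -(target - out_o) * out_o * (1.0 - out_o) * out_j

# Example with the w8 values from the video:
print(round(output_weight_gradient(1.0, 0.82, 0.61), 3))   # -> -0.016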
Now that we have seen how we can apply backpropagation to an output layer, let's apply the algorithm to a hidden layer. Here we are going to compute the new value for w4. As previously, with the chain rule we can write the partial derivative of the total error with respect to w4 as this formula. First we can decompose the partial derivative of the total error as the sum of the partial derivatives of the error of o1 and the error of o2, and then we can decompose each of those partial derivatives. We can notice that the deeper we go back, the more the complexity increases. Imagine doing this for a layer of hundreds of neurons in a network of thousands of layers... This is why we have deep learning frameworks such as TensorFlow, which we will use in the next video.
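Written out, with w4 being a weight going into j2, the decomposition described here is (and the E_o2 term expands in the same way as the E_o1 term):

\[
\frac{\partial E_{total}}{\partial w_{4}}
= \left(\frac{\partial E_{o1}}{\partial \text{out}_{j2}}
      + \frac{\partial E_{o2}}{\partial \text{out}_{j2}}\right)
\cdot \frac{\partial \text{out}_{j2}}{\partial \text{in}_{j2}}
\cdot \frac{\partial \text{in}_{j2}}{\partial w_{4}},
\qquad
\frac{\partial E_{o1}}{\partial \text{out}_{j2}}
= \frac{\partial E_{o1}}{\partial \text{out}_{o1}}
\cdot \frac{\partial \text{out}_{o1}}{\partial \text{in}_{o1}}
\cdot \frac{\partial \text{in}_{o1}}{\partial \text{out}_{j2}}
\]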
By replacing with the values, we get the value of the partial derivative of the error of o2 with respect to the output of j2. By following the same process, we also get the value of the partial derivative of the error of o1 with respect to the output of j2. Finally, by combining those two terms, we get the partial derivative of E_total with respect to out_j2. The second term is really simple to compute, and we get 0.16. Similarly for the third term, which is just i2 when we differentiate with respect to w4, we get 0.02. And finally, by combining all those values, we get a really small number, as you can notice. It means that the impact of w4 on the total error is really low. And finally we can adjust the weight w4, which does not change a lot.
You now know how you can compute those values by applying the backpropagation algorithm. At the end you get the new weights for the hidden layer and the new weights for the output layer. So I hope you learned a lot in this video. It can be really complex for a lot of people, so I really recommend you to go back through this video, don't hesitate to pause when you need it, and try to calculate the other weights by yourself. In the next video we will see how we can implement this backpropagation using TensorFlow. It will be the start of the coding, and I know a lot of you are waiting for that! :) I hope you enjoyed this video, and see you in the next one. Bye bye! :)
