neural networks are one of the most powerful machine learning algorithms
and today we are going to cover how learning is handled in neural networks
learning is handled by the backpropagation algorithm
it is called backpropagation because
prediction is applied from left to right
whereas learning is applied from right to left
in the reverse direction of prediction
during learning or training
we propagate the error backward
from right to left
learning in neural networks is actually based on updating all of these weights
and this algorithm decides how much each weight should be updated
for example
suppose that this network produces netoutput-y
as its prediction
and suppose that it should produce actual as the expected output
this means that the network made an error
the network's error is actual
minus the prediction netoutput-y
and mostly mean squared error is used as the error metric
in this case,
we need to take the square of this difference
and divide it by 2
MSE is common because its derivative has a simple form
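this error calculation can be sketched in a few lines of python (the concrete values for actual and the prediction here are just illustrative):

```python
# squared error divided by 2, as described above
def mse(actual, prediction):
    return ((actual - prediction) ** 2) / 2

actual = 1.0          # expected output (illustrative value)
net_output_y = 0.8    # network's prediction (illustrative value)
error = mse(actual, net_output_y)  # (1.0 - 0.8)^2 / 2 = 0.02
```

dividing by 2 is a common convention because it cancels the factor of 2 that appears when we take the derivative.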
now the question is
how much of this calculated error should be reflected to ... for example ... w15
this question refers to calculating the
derivative of the total error with respect to w15
similarly
how much of this error should be reflected to w12
this means
we need to calculate the derivative of the total error with respect to w12
now we will walk through these calculations
I have extracted the related pieces for w15
backward
from right to left
w15 connects the h5 unit to the y unit
as seen
and we need to calculate the
derivative of the total error with respect to w15
here the chain rule will help us
and
I'll write it here
chain rule
says
if a variable z depends on a variable y
and the variable y itself depends on a variable x
then
the variable z depends on x as well
and we can write the derivative of z with respect to x
like this
this calculation can also be expressed as
the derivative of z with respect to y
times the derivative of y with respect to x
this is the chain rule
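as a quick sanity check, the chain rule can be verified numerically; here is a small sketch assuming z = y² and y = 3x (both functions are arbitrary choices for illustration):

```python
# chain rule check: dz/dx = (dz/dy) * (dy/dx)
# assuming z = y**2 and y = 3*x (arbitrary illustrative functions)
def g(x):          # y depends on x
    return 3 * x

def f(y):          # z depends on y
    return y ** 2

x = 2.0
y = g(x)
analytic = (2 * y) * 3           # dz/dy = 2y, dy/dx = 3

# finite-difference approximation of dz/dx for comparison
h = 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)
# analytic and numeric should agree closely (both near 36.0 here)
```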
and we will apply this chain rule
like that
before that, note that every unit applies an activation function
for example, suppose that we apply the sigmoid
function as the activation function
in this case
the input would be transformed as
1 over (1 plus e to the power of minus x)
and we will apply the chain rule to this calculation
like this
the derivative of the total error with respect to netoutput-y
times the derivative of netoutput-y with respect to netinput-y
and finally, the derivative of netinput-y with respect to w15
let's work through all of these calculations
for example, let's label them 1, 2 and 3
for number 1
I need to calculate the first derivative
please remember that
the error was (actual minus netoutput-y)
squared
over 2
then, the derivative of the error
with respect to netoutput-y would be
the power, 2, comes to the front
times
(actual minus netoutput-y)
to the power of 1
the derivative of the term in the parentheses would be
-1
because the derivative of -netoutput-y is -1
times 1 over 2
multiplying 2 and 1/2 gives 1
and we are left with
netoutput-y minus actual
that is the result of the 1st equation
let's calculate number 2
again, this unit applies an activation function
and
suppose that sigmoid is the activation function
for this example
please remember that
the sigmoid function is 1 over (1 plus e to the power of minus x)
and the derivative of the sigmoid function is
y times (1 - y)
you can watch the sigmoid lecture
I've also published that one
based on this rule
the derivative of netoutput-y with respect to netinput-y
would be netoutput-y times (1 minus netoutput-y)
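the sigmoid derivative rule above can be checked with a short sketch (the test point x = 0.5 is arbitrary):

```python
import math

# sigmoid(x) = 1 / (1 + e^(-x)); its derivative is y * (1 - y) with y = sigmoid(x)
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(y):
    # note: takes the *output* of sigmoid, not the input
    return y * (1 - y)

x = 0.5  # arbitrary test point
analytic = sigmoid_derivative(sigmoid(x))

# finite-difference check for comparison
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
```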
and finally
the equation for number 3
please remember that
w13, w14 and w15
are connected to this unit
and the h5, h4 and bias units
are connected here
I mean that netinput-y is calculated as
(w13 x 1) + (w14 x netoutput-h4) + (w15 x netoutput-h5)
now, I need to calculate the derivative of netinput-y with respect to w15
then ... the derivative of this term is 0
and the derivative of this term is 0 too
because they do not depend on w15
but this term does depend on w15
that's why its derivative would be ...
netoutput-h5
we have calculated all 3 of these pieces
now we need to multiply them
the 1st one is ...
netoutput-y minus actual
that is
number one's result
then, number two's result would be
netoutput-y times (1 minus netoutput-y)
times the result for number 3
that would be netoutput-h5
this product is the derivative of the total error with respect to w15
notice that we can collect the first two terms into sigma-y
so it becomes
sigma-y times netoutput-h5
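putting the three pieces together, a minimal sketch in code (the numeric values for actual, netinput-y and netoutput-h5 are made up for illustration; sigmoid is assumed as the activation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

actual = 1.0            # expected output (assumed value)
net_input_y = 0.9       # weighted sum feeding the output unit (assumed value)
net_output_h5 = 0.6     # output of the h5 unit (assumed value)
net_output_y = sigmoid(net_input_y)

term1 = net_output_y - actual               # number 1: dE / d netoutput-y
term2 = net_output_y * (1 - net_output_y)   # number 2: d netoutput-y / d netinput-y
term3 = net_output_h5                       # number 3: d netinput-y / d w15

dE_dw15 = term1 * term2 * term3
# equivalently: sigma_y = term1 * term2, and dE_dw15 = sigma_y * net_output_h5
```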
I'll save this screen
after that,
we need to apply the update
for example,
how do we change w15
it would be like this
w15 is equal to w15 minus alpha times
the derivative of the total error with respect to w15
for example, alpha can be 0.1
that is a small number
the learning rate
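the update rule in code; both the current weight and the gradient value below are made-up illustrative numbers:

```python
alpha = 0.1           # learning rate, 0.1 as in the lecture
w15 = 0.45            # current weight (made-up value)
dE_dw15 = -0.0356     # derivative of the total error w.r.t. w15 (made-up value)

w15 = w15 - alpha * dE_dw15   # gradient descent step
# the weight moves opposite to the gradient, so the error decreases
```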
we can also ...
for example, here I have extracted the related pieces for w12; in this case
again, we'll calculate
the derivative of the total error with respect to w12
we can apply the chain rule here
like in our previous example
the derivative of the total error with respect to netoutput-y
n-o-y, for short
then, the derivative of netoutput-y with respect to netinput-y
then,
we can jump here
the derivative of netinput-y with respect to netoutput-h5
then
the derivative of netoutput-h5 with respect to netinput-h5
and finally,
the derivative of netinput-h5 with respect to w12
we can calculate all of these terms
like in our previous example
and multiply them
the product of all of those terms
is the derivative of the total error with respect to w12
then,
we change w12 like this
w12 is equal to w12 minus alpha times
the derivative of the total error with respect to w12
again with 0.1 for alpha
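the longer chain for a hidden-layer weight such as w12, sketched end to end (all numeric values and the name of the input that w12 multiplies are illustrative; sigmoid is assumed at both units):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

actual = 1.0            # expected output (assumed value)
net_input_h5 = 0.4      # weighted sum feeding h5 (assumed value)
net_input_y = 0.9       # weighted sum feeding y (assumed value)
w15 = 0.45              # weight from h5 to y (assumed value)
input_to_w12 = 0.3      # the value that w12 multiplies (assumed value)

net_output_h5 = sigmoid(net_input_h5)
net_output_y = sigmoid(net_input_y)

# the five factors of the chain, from right to left in the network
t1 = net_output_y - actual                  # dE / d netoutput-y
t2 = net_output_y * (1 - net_output_y)      # d netoutput-y / d netinput-y
t3 = w15                                    # d netinput-y / d netoutput-h5
t4 = net_output_h5 * (1 - net_output_h5)    # d netoutput-h5 / d netinput-h5
t5 = input_to_w12                           # d netinput-h5 / d w12

dE_dw12 = t1 * t2 * t3 * t4 * t5

alpha = 0.1
w12 = 0.2                                   # current weight (assumed value)
w12 = w12 - alpha * dE_dw12                 # update step
```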
so, in this video we have covered the math behind the backpropagation algorithm
it might seem complex
but
if you understand the concepts
you can apply it in practice easily
thank you for watching
and see you next time
