Let's understand in more depth how the neural network model works. For this we are going to understand the backward propagation algorithm, and in the process we are going to understand several terms like case updating, batch updating, how weight and bias updates happen, and then an intuitive understanding of why it works. Then we are going to understand the stopping criteria and the analyst decisions that can affect the operation of a neural network.

First thing you need to understand: the neural network is an iterative approach, and it solves the problem through iteration. In the beginning, when you are starting, you put the number of input nodes equal to the number of independent variables, and you assign random weights to these synapses. We are calling the weight from input 1 to hidden node 1 as w11, from input 2 to hidden node 1 as w21, and so on.
So you assign these weights, usually between -0.5 and 0.5. You assign random biases in the same way, again between -0.5 and 0.5, and then you calculate your output, that is, the y-hat which you are trying to predict. You also have an actual output; y actual minus y predicted gives you the error, that is, how far your prediction is from the actual value. Now we will use that error to correct these thetas (the biases) and each w_ij (the weights). That's what backward propagation is.
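To make this setup concrete, here is a minimal sketch in Python. The network size, variable names, and input values are my own illustration, not from the lecture: random weights and biases drawn between -0.5 and 0.5, one forward pass, and the resulting error.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden = 3, 2   # e.g. 3 independent variables, one hidden layer

# random weights and biases between -0.5 and 0.5, as described above
W1 = rng.uniform(-0.5, 0.5, size=(n_inputs, n_hidden))   # w_ij: input i to hidden node j
b1 = rng.uniform(-0.5, 0.5, size=n_hidden)               # hidden-layer biases (thetas)
W2 = rng.uniform(-0.5, 0.5, size=(n_hidden, 1))          # hidden-to-output weights
b2 = rng.uniform(-0.5, 0.5, size=1)                      # output bias

def logistic(z):
    return np.exp(z) / (1.0 + np.exp(z))   # the exp / (1 + exp) activation

def forward(x):
    """One forward pass: returns the predicted output y_hat."""
    hidden = logistic(x @ W1 + b1)
    return logistic(hidden @ W2 + b2)

x = np.array([0.2, 0.7, 0.1])   # one record of 3 independent variables
y_actual = 1.0                  # its actual output
error = y_actual - forward(x)   # y actual minus y predicted
```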
For this there are actually two ways to do it. One way is that you pass one record at a time, calculate the error, and then correct for the error: you correct the theta (the bias) and you correct the weights. So here you are going one record at a time. This is the case updating I talked about, where the weights are updated after each record is run through the network, and once all the records have passed through, one epoch is over (you can also call it one sweep or iteration). After one epoch is completed, you return to the first record and repeat the process. That is case updating.
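A minimal sketch of one epoch of case updating. This is a simplification under my own assumptions: a single output node, the plain y actual minus y predicted error introduced above, and a hypothetical forward pass.

```python
import numpy as np

def forward(x, w, b):
    # hypothetical single-node forward pass with the logistic activation
    z = x @ w + b
    return float(np.exp(z) / (1.0 + np.exp(z)))

def case_updating_epoch(records, w, b, learning_rate=0.5):
    """One epoch of case updating: weights and bias are corrected
    after every single record, before the next one is passed."""
    for x, y_actual in records:
        err = y_actual - forward(x, w, b)   # error for this one record
        w = w + learning_rate * err         # correct immediately
        b = b + learning_rate * err
    return w, b   # after the epoch, return to the first record and repeat

records = [(np.array([0.2, 0.7]), 1.0), (np.array([0.9, 0.1]), 0.0)]
w, b = np.array([0.1, -0.3]), 0.05
w, b = case_updating_epoch(records, w, b)
```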
You can also do it this way: you pass the first record, calculate the error, and keep track of it. So essentially you have recorded the error, but right now you are not going to use it for correcting the thetas and the w_ij; you are just calculating the error and recording it. You pass another observation, again calculate the error, and keep a record of it. So if you think of what you are doing, you are passing all the records and keeping a record of the errors. Now you need to adjust the biases and weights with a function of all the errors; you can take the sum of squared errors, and based on that you can correct all the thetas and w_ij. That is called batch updating. So if you think of batch updating, what you are doing is in fact calculating all the errors as you pass the records one by one, and at the end of the day you are using a function of all the errors to correct them, not correcting with each input. That is why it is called batch updating: you pass all the records in one go, calculate all the errors, and then use a function of all the errors to correct the weights and thetas.
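A matching sketch of batch updating. Again this is my own simplification: the lecture mentions a function of all the errors such as the sum of squared errors; here I use the plain sum so the sign of the correction is preserved.

```python
import numpy as np

def forward(x, w, b):
    # hypothetical single-node forward pass with the logistic activation
    z = x @ w + b
    return float(np.exp(z) / (1.0 + np.exp(z)))

def batch_updating_epoch(records, w, b, learning_rate=0.5):
    """One epoch of batch updating: record every error as the records
    pass through, then correct weights and bias once at the end."""
    errors = [y_actual - forward(x, w, b) for x, y_actual in records]
    total = sum(errors)              # one function of all the errors
    w = w + learning_rate * total    # a single correction for the whole epoch
    b = b + learning_rate * total
    return w, b
```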
Now think of what happens when you pass these errors back to the thetas and the w's: why do they get corrected?
This is how. Suppose the current weights and biases of the neural network produce an output, call it y_k-hat, and the expected output is y_k. So what is your error? You calculate the error like this:

err = y_k-hat × (1 − y_k-hat) × (y_k − y_k-hat)

where y_k-hat is the prediction and y_k is the actual output; y_k minus y_k-hat is how far what you are getting is from the actual. Now suppose your output is 0.6 and the expected output is 0.4. Then 0.6 is y_k-hat here, 1 − 0.6 is the second term, and 0.4 − 0.6 is the last term.
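Plugging those numbers in (assuming, as read here, that the expected output is 0.4):

```python
y_hat = 0.6      # the network's current prediction
y_actual = 0.4   # the expected output

err = y_hat * (1 - y_hat) * (y_actual - y_hat)
print(err)       # 0.6 * 0.4 * (-0.2) = -0.048
```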
You take this error and then you use it to correct the weights and biases. How do you correct them? Usually like this: to the old weights you add l times the error, where l is the learning rate. So these are your old weights and old biases, this is the error, and this is the learning rate. A learning rate of 0.5 means you are going to take half of the error; a learning rate of 2 means you are going to take double the error. And if you think of what you are doing: if your error is positive you are going to add to theta and w; if your error is negative you are going to reduce theta and reduce w. In fact, because the errors are used to adjust the weights, and the adjustment is proportionate to the error, big errors will lead to a big change in the weights, a small error will lead to a small change in the weights, and a very small error will leave the weights almost unchanged.
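As a sketch with invented starting values (the 0.5 learning rate and the -0.048 error are from the example above; the old weight and bias are hypothetical):

```python
learning_rate = 0.5   # take half of the error
err = -0.048          # the error computed above

old_weight, old_bias = 0.3, 0.1   # hypothetical current values

# new value = old value + learning_rate * error:
# a positive error pushes theta and w up, a negative error pulls them down
new_weight = old_weight + learning_rate * err   # 0.3 - 0.024 = 0.276
new_bias = old_bias + learning_rate * err       # 0.1 - 0.024 = 0.076
```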
When you do this many times, with many such updates, many such iterations, the weights keep changing until the error associated with the weights is negligible. After several iterations the weights are not going to change much; at that point they change very little, and you can take them as final.
But how will you stop? How will you know that this is the time to stop? Because, all said and done, the weights will change a little bit here and there every time. So this is how you stop: when the weights and biases hardly change, because then you understand you are very near the optimal value; when the errors are below a threshold, that is, when the errors themselves are in an acceptable range, you take it as fine; or when the maximum number of iterations has been reached. If you have set the maximum iterations to a thousand, then whatever has happened in a thousand iterations, you take that as final. So these are the three ways by which you can stop a neural network from further iteration.
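The three stopping criteria together might look like this. This is a sketch: train_one_epoch is a hypothetical stand-in for one pass of case or batch updating, and all the thresholds are invented.

```python
import numpy as np

def train_one_epoch(weights):
    """Hypothetical stand-in: one updating pass over all records.
    Returns the updated weights and the total error for the epoch."""
    new_weights = weights * 0.9   # dummy update so the loop terminates
    total_error = float(np.abs(new_weights).sum())
    return new_weights, total_error

weights = np.array([0.3, -0.2, 0.1])

max_epochs = 1000        # criterion 3: cap on the number of iterations
error_threshold = 1e-3   # criterion 2: errors in an acceptable range
change_threshold = 1e-6  # criterion 1: weights hardly changing

for epoch in range(max_epochs):
    previous = weights.copy()
    weights, total_error = train_one_epoch(weights)
    if np.abs(weights - previous).max() < change_threshold:
        break  # weights and biases hardly change any more
    if total_error < error_threshold:
        break  # errors below the threshold
# if neither fired, the loop ends at max_epochs anyway
```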
And if you think of it, there are several things on which the analyst can take a decision, and these can make an impact on the neural network's operation: how many hidden layers should be there; what should be the learning rate used to adjust the weights of the synapses, and how to adjust the biases (let me just add that particular term here); how many nodes should be in each layer; and what should be the activation function. Usually, I should say, most people take the number of input nodes equal to the number of independent variables and just one hidden layer, and that's how they proceed. What should be the activation function in the different nodes? The function most used you have already seen: exp(x) / (1 + exp(x)), the logistic function. However, people are free to take other activation functions as well.
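For instance, the logistic function alongside one common alternative (tanh is my own example of "other activation functions", not one named in the lecture):

```python
import numpy as np

def logistic(x):
    return np.exp(x) / (1.0 + np.exp(x))   # exp(x) / (1 + exp(x))

def tanh(x):
    return np.tanh(x)   # an alternative that squashes into (-1, 1)

print(logistic(0.0))   # 0.5: logistic output always lies in (0, 1)
print(tanh(0.0))       # 0.0
```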
How do you avoid overfitting, and what is overfitting? Overfitting is like this. Let's say these points are responders and these are non-responders. If you are making a model like this, a simple boundary, it's fine. Suppose you go for a non-linear boundary and develop a model like that; that also is fine. But when you try to take care of each and every data point and then try to make a model around them, that is overfitting. And what is the problem? The problem is that such a model will work very well on the development data, but on the validation data it will create problems. So never go for overfitting. Good enough should be good enough, and you should live with some errors; some errors are bound to happen, and you should be ready to live with that.
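A toy illustration of that development-versus-validation gap (all the data points and both models here are invented for this sketch):

```python
# compare a simple model against one that memorizes the development data
def sse(predict, data):
    return sum((y - predict(x)) ** 2 for x, y in data)

development_data = [(0.1, 0), (0.4, 0), (0.45, 1), (0.6, 1), (0.9, 1)]
validation_data  = [(0.2, 0), (0.5, 1), (0.7, 1), (0.8, 1)]

def simple_model(x):
    return 1 if x > 0.5 else 0   # a plain boundary; lives with one error

def overfit_model(x):
    # memorizes every development point instead of generalizing
    return dict(development_data).get(x, 0)

for name, model in [("simple", simple_model), ("overfit", overfit_model)]:
    print(name, sse(model, development_data), sse(model, validation_data))
# the overfit model scores a perfect 0 on development data
# but does worse than the simple model on validation data
```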
So that is how this whole neural network works: you are using the errors to adjust the biases and weights, and through several iterations it takes the shape by which it can do the job.
