Hello people, welcome back to my channel.
The topic for today's video is how we can
generalize a particular neural network.
So in the last video, we saw how we represented the weights in matrix form so that computation becomes much easier for our computing engine, especially TensorFlow or PyTorch.
So in today's video, we'll be covering what biases are and where they are represented, which we did not cover in the last video. Say we draw a network: a first hidden layer with three neurons, then a second hidden layer with two neurons, and an output neuron. The inputs feed into the first hidden layer like this, those outputs feed into the second hidden layer like this, and finally into the output neuron, which gives us some estimate ŷ.
So we know this part becomes our input layer, that is IL; this is our first hidden layer, HL1; this is our second hidden layer, HL2; and then we have the final output layer.
Now, this network is used for binary classification; that's why we have only one output neuron. If it were multi-class classification, we could have a number of neurons at the output representing the different classes.
So, as we said, I'll just write one particular equation. Each neuron has two parts: one for the simple linear computation, and then a sigmoid function, the nonlinear function that transforms that linear result into something nonlinear.
In the real world, not everything is linear in nature; if everything were linear, then everything in the world would be easier. We have relationships which are not linear.
That's why we are mainly using this nonlinear function, and it also keeps the layers from collapsing into a single linear computation. It can be understood with an analogy: the human brain has both a left-brain and a right-brain aspect, and it is the same logic here.
So we essentially write one output: say we have z1 at layer 1. We are talking about whatever comes out of this summation over these inputs, so we have z1^[1] = w1·x1 + w2·x2 + w3·x3 + w4·x4; along with the inputs, you have the weights as well. The subscript says it is the very first unit, and the superscript says which layer it sits in. If you try to remember it this way, your equation will not go wrong.
Now comes our function for transforming. You apply the sigmoid and you get some output, represented as a1^[1], the output from this particular unit: a1^[1] = σ(z1^[1]).
Now, if we remember what we talked about in linear regression, we try to fit a line, y = mx + c, where the slope in this case is represented by the weights. That represents the strength of that particular neural connection. This weight, or the strength of the correlation, can be represented within the range -1 to 1, so your weight can be negative and it can also be positive. It represents how strong your neural connectivity is.
So now what we need is a constant; we need an intercept. Our error function, when we differentiate this equation or any equation, will depend on our weights as well as the bias. But we don't find any bias in our equation yet. Note that the bias does not depend upon the input. What do we mean by that?
Say this is our neuron, with its linear function and nonlinear function. We basically have x1 and x2 and we get some output, and the bias sits at this particular portion of the linear computation. For each neuron, there is one and only one bias. So when you have weights w1 and w2, what goes inside this particular neuron for computation is z = w1·x1 + w2·x2 + b, a linear combination of the inputs, the weights, and the bias. All of that, transformed, will give you a final estimate, that is ŷ.
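As a minimal sketch of that single-neuron computation (in NumPy; the input, weight, and bias values here are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    # Nonlinear part: squashes any real z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example values, just for illustration
x = np.array([0.5, -1.2])   # inputs x1, x2
w = np.array([0.8, -0.3])   # weights w1, w2 (can be negative or positive)
b = 0.1                     # the one and only bias for this neuron

z = np.dot(w, x) + b        # linear part: w1*x1 + w2*x2 + b
y_hat = sigmoid(z)          # transformed output, the estimate ŷ
print(y_hat)
```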
So essentially we can see there are 1, 2, 3, 4, 5, and 6 neurons, so accordingly we can have six different biases. Say the biases are represented as b1^[1], b2^[1], and b3^[1] at the first layer; similarly you have b1^[2] and b2^[2] at the second layer; and you have b1^[3] at the third layer.
So now, for this particular unit, our equation becomes z1^[1] = w1·x1 + w2·x2 + w3·x3 + w4·x4 + b1^[1]: you add the bias constant. Similarly, when you do the computation for the second unit you add b2^[1] from the first layer, for the third unit b3^[1] from the first layer, and so on.
So essentially, say we have this matrix. We represented the output from the first layer, from all these units, as a column vector. Our biases would also be like this: b1^[1] through b3^[1], also a column vector. So now each and every thing, our weights, our biases, and all the inputs, we are representing in matrix form, and that makes the calculation easy. These essentially become tensor quantities, and matrix multiplication is what is done across the neural net.
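Here is a short NumPy sketch of that matrix form, with random placeholder values whose shapes match the 4-input, 3-unit first layer from the drawing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x  = rng.standard_normal((4, 1))   # input column vector (4 inputs)
W1 = rng.standard_normal((3, 4))   # weight matrix for layer 1: 3 units x 4 inputs
b1 = rng.standard_normal((3, 1))   # bias column vector: one bias per unit

z1 = W1 @ x + b1                   # all three z values in one matrix multiplication
a1 = sigmoid(z1)                   # column vector of outputs from the first layer
```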
Look, so now what we try to do is generalize this particular equation. Say the output at any layer l is represented as capital Z^[l]. That is given by the weights at that particular layer times the inputs, and the inputs come from the previous layer. So the input for this particular hidden layer would be coming out of the previous one: a1^[1], a2^[1], a3^[1]. Essentially we can generalize that input as A^[l-1], and then you have the bias at that particular layer: Z^[l] = W^[l]·A^[l-1] + b^[l]. So this is the generalized equation for representing any neural network.
So this is exactly of the form y = mx + c, where your particular neuron will compute with different weights, adjusting each and every weight, and at each particular layer your bias will basically adjust that computation.
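A sketch of that generalized equation looped over every layer of the 4-3-2-1 network drawn earlier (the weights and biases are random placeholders, not trained values, and I assume a sigmoid at every layer for simplicity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

layer_sizes = [4, 3, 2, 1]  # inputs, hidden layer 1, hidden layer 2, output
rng = np.random.default_rng(0)

# One (W, b) pair per layer: W has shape (units, inputs), b has shape (units, 1)
params = [(rng.standard_normal((n_out, n_in)), rng.standard_normal((n_out, 1)))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

a = rng.standard_normal((4, 1))  # A^[0]: the input column vector
for W, b in params:
    z = W @ a + b    # Z^[l] = W^[l] · A^[l-1] + b^[l]
    a = sigmoid(z)   # A^[l] = sigma(Z^[l])

y_hat = a  # final estimate ŷ, shape (1, 1)
```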
So essentially, if you had something like this, for the case of curve fitting in linear regression, the bias essentially represents where the curve has to sit. Assume you have a particular structure, some wireframe with some units: initially you build some structure, and then you try to adjust each and every neuron. That shift of each particular neuron up or down is done with the help of this bias. So you can see geometrically how you can shift this either up or down.
Now, before considering the bias, if we were trying to build a matrix: take the weight matrix W for the connectivity between this input layer and this hidden layer. What would its dimension be? You can essentially say it is a 3 × 4 matrix, 3 units by 4 inputs. For the next connectivity it will be 2 × 3, and for the third, the connectivity to the output, you have 1 × 2. So earlier it was 12 plus 6 plus 2, a summation total of 20 weights. Now, adding the biases at each particular layer: you add 3 here, you add 2 there, and there you will be having 1, so that is essentially 20 plus 6, which is 26.
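A quick sketch that reproduces that count for the 4-3-2-1 network (weights per layer are units × inputs, plus one bias per neuron):

```python
def count_parameters(layer_sizes):
    # Weights: one matrix per layer of shape (units, inputs)
    weights = sum(n_in * n_out
                  for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
    # Biases: one per neuron; the input layer has none
    biases = sum(layer_sizes[1:])
    return weights, biases

w, b = count_parameters([4, 3, 2, 1])
print(w, b, w + b)  # 20 weights, 6 biases, 26 total
```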
So essentially, by considering the biases, you will be learning not only the weights but also the biases, so you need to have an idea of how to adjust the weights and where each neuron should sit. That essentially becomes the question of generalization. If you have a large number of layers, and thereby a large number of neurons, this number will shoot up: it will not be 26, but in real applications some much higher number. So you need to learn all those parameters to generalize your particular neural network.
Now, essentially, generalization of any particular neural network is affected by three different factors, so we will just quickly summarize those factors.
So the very first factor is the training set size. We build this particular neural network with the help of our training set, as is the normal case when we build other models as well. So generalization depends essentially upon your training set size, say represented by capital N. Second, you consider the architecture of your neural net.
This is also one major important factor when we need to generalize a model. Say you initially built a very large neural network, and then after certain epochs (an epoch means one full pass of your training set through the entire network) you find that you need to remove some particular neurons which are not adding value to your computation. Then you need to re-generalize your model. So there the architecture of the neural network becomes very crucial.
Essentially, you spend a huge amount of time building over and over again; then you find that some of these nodes are not essential, or the connectivity between these two is not required, and it takes a considerable amount of computation time to remove that, so it becomes difficult.
And the third is the problem complexity, the complexity of the problem at hand. This third factor affecting generalization is not given that much attention, because you are doing all this precisely because you want to solve a large problem. You can't blame the problem for being complicated and refuse to build; the neural network is built for that, mainly for making your computations easier.
So essentially, you can see there are only these two factors, and you can achieve generalization by keeping either of them constant. By keeping the training set size fixed, you can generalize the architecture of your neural network. So if your N is set fixed, that is, the training set size for building your neural network is fixed, then the complexity, in big-O notation, is given by this quantity.
Here, W represents the weights, or the number of free parameters. The free parameters are all these 26 parameters in this case, though there can be any number, like 100, 150, 200, 250. So those are the number of free parameters, that is, the weights. And epsilon represents the permissible error, say you allow a 0.01 error: only this much error is assumed in this particular setup.
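Assuming the quantity on screen is the classic bound N = O(W/ε) (the formula itself is not spelled out in the transcript, so this is my reading of it), a quick worked example with the numbers above:

```latex
N = O\!\left(\frac{W}{\varepsilon}\right), \qquad
W = 26,\ \varepsilon = 0.01
\;\Rightarrow\;
N = O\!\left(\frac{26}{0.01}\right) = O(2600)\ \text{training examples.}
```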
So thereby your complexity will reduce. Like I said before, if you want to remove a particular connection or a particular neuron, then this quantity would suddenly change, and that will affect the generalization of the neural network significantly.
So well, that was all regarding the generalization of a neural network in deep learning.
So I hope you guys enjoyed this video. If you found it educational, please do like, share, and comment.
And if you're new to this Channel, please
consider subscribing.
Thank you very much for watching this video.
