Good morning, and we will be starting with today's lecture, which is on neural networks for visual computing. To set out what this covers: in the previous two lectures and the one lab, we learned how to operate on images with some basic operations, done the classical way. When you start with any kind of visual computing task, say a recognition task, you begin by taking an image and then expressing it in terms of its features, which is basically a way to compress all the information you have in that big corpus of pixel space available to you. Once you have extracted those features, the next question that comes to you is whether they can be classified. To classify them, the simple definition you have learned in some related subjects, and which we gave in the first few lectures, is that you need to be able to relate certain features to a certain kind of object category, and this kind of relationship is what is called a classification problem.
To make it even simpler, what this essentially means is that if I have an image represented in terms of some different parameters, say x1, x2, x3, x4, and these are all, say, scalar parameters, then if I arrange these scalars into a single array, that is what we call a vector, or, in the standard parlance of our definitions, a feature vector. Now, once you have that feature vector, how do I associate it with one single categorical label, where that label may be cat, dog, horse, rose, bus, or anything of that sort? This whole business of associating a set of attributes collected from features with a categorical variable is what is called classification. And where we stand is this: neural networks, as we know, when they first came about in the early days, all started from the perspective of visual computing itself, and it is from that perspective that we start here. So what today's lecture will contain, in simple form, is: we will start with a simple neuron model; from there we will go to the neural network formulation; and then we go to what is called learning with error back propagation, along with gradient checking and optimizations.
What this essentially means is that we will first define what a neuron is, since in a neural network you will always have neurons. So what do we call a neuron? What is its mathematical definition? How do we learn to relate something from feature space, which is a set of numbers, to another number, which is the categorical variable you get out? From a single neuron we will enter into what is called a neural network, or, in simple terms, some sort of network-like behavior in which one parameter is connected to multiple parameters, or multiple parameters are connected to one parameter, and it can even go through successive stages of such parametric connections. That finally brings us into what we call learning, and in this learning phase we have learning with error back propagation. What this does is that we define something as a neural network and then say that this neural network is able to classify images. One important fact you will have to keep in mind here is that since we are learning, it means that somehow, with the experience we are gaining, we will be able to do the task much better.

That means, necessarily, that as we show the network more samples of features together with their associated categorical variables, this whole network, which I call a neural network, should become able to associate any unknown feature sample coming to it with a categorical variable, and thereby classify it. In order to achieve this learning, you will have to go through gradient checking and optimizations, and we will see which update rules come into play; as we go on, eventually you will get to know them in more detail.

So, a simple neuron
model is defined like this. We have the mathematical equation, but let us first get a feel for what it looks like. Say I have three different variables, x1, x2 and x3, and these can be three features.

As features of an image: say x1 is the brightness of the image, x2 is the contrast of the image, and x3 is the average entropy of the image. So we can have three different features for one single image. Each of them is a scalar value: the brightness of the image is a scalar, the contrast x2 is also a scalar, and the entropy x3 is also a scalar. Together, we can associate with them a categorical variable, say p hat, which is called the predicted class, or the predicted category label. We associate each of these scalars with a certain weight and then, in a sense, sum them up: x1 is multiplied by a weight w1, x2 is multiplied by a weight w2, x3 is multiplied by a weight w3, and on top of that we add a certain scalar value, a dc offset, which is also called a bias. Putting all of this into a summation, I get the term y, and from that my output. The form of y is written as

y = w0 + w1 x1 + w2 x2 + w3 x3.

So if I have some n variables, I can have n weights associated with them, and this becomes an (n + 1)-term summation, the extra term being the bias. From there, as we see, we can also write this kind of form as a matrix representation, and that matrix representation is

y = W x + b,

where W is basically a matrix collecting all of these weights; b is the bias, which, since we have only one predictor variable here, is a scalar value equal to w0 itself; and x is a vector of all the scalar features arranged together. So this is the matrix form of the representation.
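To make this concrete, here is a minimal sketch of one such neuron in NumPy; the feature and weight values are made-up numbers for illustration, not anything from the lecture:

```python
import numpy as np

# three scalar features of one image: brightness, contrast, entropy
x = np.array([0.8, 0.3, 0.5])

# one weight per feature, plus the scalar bias w0 (the "dc offset")
w = np.array([0.2, -0.4, 0.7])
w0 = 0.1

# y = w0 + w1*x1 + w2*x2 + w3*x3, written compactly as a dot product
y = w @ x + w0
print(y)  # a single unbounded scalar
```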
Now, once I have that, what I can do is relate this y to my predicted class label through some sort of nonlinear function, which we will call f_nl. This f_nl can take different forms; two common ones are the following: the first is what is also called the sigmoid form of nonlinearity, and the second is what is called the tan-hyperbolic (tanh) form of nonlinearity. What this essentially addresses is that I don't have any control over my x's; they can be anything from minus infinity to plus infinity. For the purpose of simplicity we keep to the assumption that these are real-valued numbers and not complex-valued numbers; that does assist us in keeping the whole problem mathematically tractable.

All of these w's, likewise, are scalar weights which can vary over an open range, open in the sense that they can go from minus infinity to plus infinity. Now that these are also open-ranged, what I get as y can be open-ranged too, anything from minus infinity to plus infinity. But this p hat has to be a categorical label, maybe a true/false question, as in: I want to classify into two classes whether this is a ball or not a ball, so it is a zero-or-one problem. If the y that comes to me can be anything from minus infinity to plus infinity, that would typically pose a real challenge, so you will need to somehow map it into a zero-to-one problem. One way to map it to a zero-to-one problem is to just put a threshold: say that if the value is greater than zero, make it one; if the value is less than zero, make it zero. That is obvious, it is perfectly possible, and there is no harm in it. But what we do is also make use of these kinds of functions, say the sigmoid nonlinearity. What it does is that as my y tends towards minus infinity, the value goes to zero, and as y tends to plus infinity, or just a very high number, the value goes to one.

So typically my sigmoid function gives me a value in the range of zero to one. In the same way, take my second nonlinearity, which is called the tan-hyperbolic: let the values of y vary from minus infinity to plus infinity, and you can see quite intuitively that as y goes down to minus infinity this value tends to minus one, and as y goes to plus infinity this value tends to plus one. So while you could just as well use a simple threshold, with the argument that anything greater than zero becomes one and anything less than zero becomes zero, you can also use this kind of smooth function. A bit later on, we will find out why these two kinds of nonlinearities are the ones preferred for making that decision.
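As a small sketch (NumPy again; the sample y values are arbitrary), here is how the sigmoid and the tanh squash an unbounded y, next to the hard threshold mentioned above:

```python
import numpy as np

def sigmoid(y):
    # squashes (-inf, +inf) into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-y))

y = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])

print(sigmoid(y))           # -> [~0, 0.27, 0.5, 0.73, ~1]
print(np.tanh(y))           # -> [~-1, -0.76, 0, 0.76, ~1], range (-1, 1)
print((y > 0).astype(int))  # hard threshold: [0, 0, 0, 1, 1]
```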
From there, now that we know about a simple neuron model, the next point is how we construct a neural network. For that, let us look at this: if I have three scalar feature values, what I can do is associate them with some number of different predictors, that is, different predictions I want to make. It may be that, based on these three features, x1 being the average brightness, x2 being the contrast, and x3 being the average entropy of the image, I want to classify whether there is a ball in the image, yes or no, and also whether the image is a photograph or a painted one, so maybe two different questions. So p1 is the standard problem of whether there is or isn't a ball in the image, and I can write it down in this form. What happens here, as you see, is that the weights now carry a double subscript. With this dual weight subscripting, the first subscript is the target pattern, the target class to which I want to classify, and the second subscript indicates which feature is being connected to which target neuron.

So that is how it is done: the feature x2 is connected to the class predictor p1 via the weight w_{1,2}, and so on accordingly. This is how you get the equation for y1 and p1. Now if I take another predictor p2, which is what I was saying, another different thing to predict, namely whether this is a natural image or a sketched image, then you get a similar set of equations for it. You can clearly see that using the same set of features, just by varying the weights, you can produce two different classification outputs: two different questions can be asked, and their outputs designed, over the same features.
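A minimal sketch of this two-predictor layer in NumPy (the weight values are invented for illustration): each row of W holds one predictor's weights, so the entry W[k, j] is the weight w_{k,j} connecting feature j to output k.

```python
import numpy as np

x = np.array([0.8, 0.3, 0.5])    # brightness, contrast, entropy

# row 0 holds the weights of p1 ("is there a ball?"),
# row 1 holds the weights of p2 ("photograph or painted?")
W = np.array([[ 0.2, -0.4, 0.7],
              [-0.1,  0.5, 0.3]])
b = np.array([0.1, -0.2])        # one bias per output neuron

y = W @ x + b                    # two scalars, one per question
p_hat = 1.0 / (1.0 + np.exp(-y)) # squash each into (0, 1)
print(p_hat)
```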
Similarly, I can extend this to some k number of output neurons. On the input side, my j-th input neuron will link to my k-th output neuron via the weight w_{kj}, and this is the generalized formulation you get. What happens after this is that if you arrange all of these scalars, the y's, into a matrix arrangement, which you can do because each is independent, you will see that all the weights also fall into a matrix arrangement: the weights for each output are initially row vectors, and all of the y's together form a column vector, so you can stack all of those row vectors. Once you stack all the row vectors, you get a rectangular matrix W, and that is the matrix form you see in this equation. Accordingly, your predicted neurons will also be stacked into one single vector, and that is called p hat. As for the nonlinearity, it was applied on a scalar, and that is why it extends to the vector-valued form: a scalar nonlinearity function can be applied elementwise to a vector-valued quantity, and it gives you a vector output of the same size. Essentially, what this gives you is that you can relate some j number of input neurons to some k number of output neurons, in plain simple terms. Now, once you are able to relate them, what comes next is that there will be a certain amount of error when I do so: using these three features and some combination of weights, I will be able to predict whether there is a ball or not, but not always correctly.
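In the same spirit, the general layer relating j input neurons to k output neurons is one matrix-vector product plus an elementwise nonlinearity; a sketch with arbitrary j = 4, k = 3 and random weights:

```python
import numpy as np

rng = np.random.default_rng(1)
j, k = 4, 3                      # counts of input and output neurons

x = rng.random(j)                # j input neurons
W = rng.standard_normal((k, j))  # stacked row vectors: W[k, j] links input j to output k
b = rng.standard_normal(k)       # one bias per output neuron

# the scalar nonlinearity applied elementwise to the vector W @ x + b
p_hat = np.tanh(W @ x + b)
print(p_hat.shape)               # (3,): k outputs from j inputs
```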
For every image, whether there is a ball or not, there is a true value which I know, and there is a value which comes out of the neuron itself. Consider the difference between the two. Say in the first case there was a ball, but the network predicted there wasn't one: that is a clear case of an error, and the error value I get is one. The other way around, say there wasn't a ball, so the true value was zero, but the network predicted that there was one: that is also an error. In case the network predicted a ball and the ground truth is also a ball, then it is fine, you don't have an error; and if it predicts there isn't a ball and the ground truth also says there isn't one, that again is a correct case. Similarly, we will have the same situation for the second predictor. Now, as you see, all of these predictors are independent of each other, which means the errors are also independent of each other. In that case, typically, instead of trying to take some sort of direct summation over them, the better way is to take the Euclidean distance between the predicted vector, this p hat,
and the actual ground truth p. The distance between your p hat and your p gives me a single scalar value, and that is my error E. Now, the whole statement of learning is that I should somehow be able to obtain a network such that this error comes down, ideally all the way to zero. What essentially happens in that case is that we use a method called error back propagation. Say I have a set of observations x1, and so on; note that the subscript here no longer refers to one particular feature, but to which particular sample number I am speaking about.
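Before moving on, here is a sketch of that per-sample error (NumPy; made-up prediction and ground-truth vectors): the squared Euclidean distance collapses the independent per-predictor errors into one scalar.

```python
import numpy as np

p_true = np.array([1.0, 0.0])  # ground truth: ball present, not a painting
p_hat  = np.array([0.2, 0.1])  # the network's predictions for that image

# squared Euclidean distance: one scalar error for this sample
E = np.sum((p_hat - p_true) ** 2)
print(E)  # 0.65
```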
So what I do is take one image, the first image, for which I know the ground truth for all the predictions I want to make, and I have the predicted output coming from my network. I take the second image, whose features are x2, and again I have my ground truth and my predicted value. I keep doing this until I have N images in something called a training set. What happens in a training set, in this kind of problem, which is a supervised learning problem, is that you have a set of N images, and for each image you also know the class label given to it. Here we were asking two questions: whether there is a ball in the image or not, and whether this is a natural image or a rendered or hand-drawn kind of image.

So there is a two-dimensional vector here, two parameters whose class labels I want to predict, and those must also be known to me. There are N such images for which this is given, and that is what is called the training set. On the other side, I am going to predict all of this with some given values of my weights. Initially, I would start with a neural network in which all of those weights are randomly initialized, so they hold some random values. Once I start there, I will be able to compute this difference, the Euclidean distance, for each sample: for x1, x2, x3, and similarly up to xN, I get this difference for all N samples. Once I have this difference for all N samples, I can take a simple algebraic summation over the whole dataset, and that gives me the error on the dataset. I write that down as something called J(W), a cost function J in terms of the parameter W. The reason we do that becomes clear if you look carefully at this expression.
My p hats are what depend on all my x's here, but these x values don't change over the whole dataset, right? The only thing within the network that can affect this whole function is the weights of the network, and that is why this J is written as J(W); we will eventually get into that part as well. Now, the whole objective is that I want to find a particular value of W, the only thing that is variable and adjustable within my neural network, such that my cost function is minimum, and it has to be minimum when you have made the minimum error. As you can see, if all the predictions p_n hat match all the ground truths p_n, you get your minimum error, and in this case that minimum has to be zero, so your J(W) is zero. So what I want is to somehow find the weights W such that my total error comes down to zero, and what we do is use a particular technique for this, something called the gradient descent way of solving it.
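A minimal sketch of this cost in NumPy (random arrays stand in for the training set, and the bias is left out to keep it short): J(W) is just the per-sample squared distances summed over the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))                        # N = 100 images, 3 features each
P = rng.integers(0, 2, (100, 2)).astype(float)  # labels for the 2 questions

def cost(W):
    # J(W): sum over the training set of the squared Euclidean
    # distance between the predictions and the ground truth
    P_hat = 1.0 / (1.0 + np.exp(-(X @ W.T)))
    return np.sum((P_hat - P) ** 2)

W = rng.standard_normal((2, 3))                 # k = 0: random initialization
print(cost(W))                                  # the error that learning must reduce
```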
The problem statement, as it goes, is something like this: you initially start with a random guess of W. Index the iterations by k, so at the start k = 0, and at k = 0 I start with some W. With that W, I will be able to obtain my predictions p1 hat to pN hat; from those I get my J(W), and then I can evaluate the partial derivative of the cost function with respect to my weights at that particular point. Once I am able to get this partial derivative, I just subtract it (scaled by a step size, the learning rate) from the current weights and obtain my new estimate of the weights, w(k+1):

w(k+1) = w(k) - eta * dJ/dw, evaluated at w(k).

With this w(k+1) I start the whole process again: I obtain all the predictions, from them I get J for w(k+1), and I iterate this over and over, so that at some point in time I reach the value of W at the minimum and then just stop there.
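Putting the pieces together, here is a self-contained sketch of that iteration (NumPy; random data again, and the gradient is estimated by finite differences, which is essentially the gradient-checking idea; the learning rate eta is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))
P = rng.integers(0, 2, (100, 2)).astype(float)

def cost(W):
    P_hat = 1.0 / (1.0 + np.exp(-(X @ W.T)))
    return np.sum((P_hat - P) ** 2)

def numerical_grad(W, eps=1e-5):
    # finite-difference estimate of dJ/dW (the "gradient checking" idea);
    # real training computes it analytically via error back propagation
    g = np.zeros_like(W)
    for i in np.ndindex(W.shape):
        W[i] += eps; hi = cost(W)
        W[i] -= 2 * eps; lo = cost(W)
        W[i] += eps                     # restore the weight
        g[i] = (hi - lo) / (2 * eps)
    return g

W = rng.standard_normal((2, 3))         # k = 0: random initial guess
eta = 0.01                              # learning rate (arbitrary choice)
for k in range(200):
    W = W - eta * numerical_grad(W)     # w(k+1) = w(k) - eta * dJ/dw
print(cost(W))                          # lower than at the random start
```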
Essentially, through this whole procedure, what we are doing is what is called gradient descent learning. In gradient descent learning, you have your weights W, and, if you look, this term is the gradient. Notice the negative sign: we move opposite to the gradient, or, as we say, we descend against the gradient; the whole descent proceeds along the negative of the gradient direction. What essentially happens is something like this: as I go through my different epochs, varying from zero up to k, so as k goes 1, 2, 3 and onwards, my weights keep changing, and accordingly my cost function keeps changing, so I get some sort of curve of cost versus epoch; and finally, when I reach the minimum point, that is my stopping criterion, and that is what I would observe.
Once we have seen that, the interesting part is to look at what happens in the weight space itself. Suppose I draw two different plots, and say I have a very simple neuron which just associates two features with one single output, to make it much simpler, so that you have just a two-dimensional weight space, and the third axis, the third dimension, is the cost function. As we saw, we start with some random initial value, and with this random initial value we have a point in the weight space as well as in the space of epochs versus cost. Accordingly, I update: I go to a new value of the weights, new values of w1 and w2, based on my gradient descent update rule. With that, I get a different value of the cost; accordingly, I move to my next point, and it keeps on updating, and as this whole update proceeds, the trajectory comes down to a point at the minimum.
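A minimal sketch of that trajectory (NumPy; a made-up one-output, two-weight problem, with the gradient worked out analytically this time): each epoch logs a point (w1, w2, J), and the cost falls as the point rolls toward a minimum.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((50, 2))                   # two features, so a 2-D weight plane
p = rng.integers(0, 2, 50).astype(float)  # one output per sample

def J(w):
    p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
    return np.sum((p_hat - p) ** 2)

w = rng.standard_normal(2)                # random start in the (w1, w2) plane
eta = 0.05
path = [(w[0], w[1], J(w))]
for epoch in range(100):
    p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
    # analytic gradient of the squared error through the sigmoid
    grad = X.T @ (2.0 * (p_hat - p) * p_hat * (1.0 - p_hat))
    w = w - eta * grad
    path.append((w[0], w[1], J(w)))

print(path[0], path[-1])                  # the cost falls along the path
```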
Now, as you can clearly see, the interesting part is that this surface looks like an egg-carton design, where you have a lot of minima and maxima spread all around. This is a case which genuinely can exist; in most of our neural networks this is the major pitfall that comes up, and as we go deeper and deeper we will come to a much better understanding of why it exists. Typically the situation is that you are at some value of W which is not exactly the minimum. In case you are at the minimum value of W in the first epoch itself, you get J(W) equal to the minimum, or equal to zero, basically the absolute minimum, and then you don't have to worry about it. In case you are at some other value with a higher cost, the updates will eventually roll down and bring you to this minimum, and that is your learning problem.
With this, the basics come to an end. We will carry this forward in the next lecture, where we have a lab session, with an understanding of how to implement all of this; through that implementation you will be able to relate the ways of connecting input images, via certain features, to a class of outputs. Finally, if you want to read about these neural networks in more detail, the best book to go through is actually Simon Haykin's book, Neural Networks and Learning Machines. As for toolboxes, we will be using scikit-learn on Python.
You are also welcome to use MATLAB with the Neural Network Toolbox, which has a much better GUI, and given the fact that a lot of people are most experienced with MATLAB, you might be able to get going much faster there; but we will be sticking to our option of doing it in Python. The other implementations out there are notably based on Lua Torch, which also has acceleration with cuDNN; and the next one, on which we will be doing our deep learning work from next week onwards, is what is called PyTorch. PyTorch is basically that original library, implemented on Lua, which got completely ported over for integration with Python. So it is the syntax and the base library of Torch, one of the fastest ones to date, integrated within Python. That is all I have for this particular lecture; just stay tuned for the next one, where we do some hands-on work as well. With that,
thanks
