Hey all, welcome to this lecture on
artificial neural networks. In this
lecture, we will be discussing the
learning process of an artificial neural
network. Before going to that, what is
learning in the sense of an artificial
neural network? The learning process in the case of
neural networks consists of the following
sequence of events: first, the neural
network (NN) is stimulated by an
environment; then the NN undergoes
changes due to this stimulation; and
finally the neural network responds in a
new way to the environment because of this
change. So, the learning process is as
simple as this. If you look at it, you
can see the similarity between this
sequence and how our brain learns, right?
Now coming to the formal definition of
learning: Learning is a process by
which the free parameters of a neural
network are adapted through a continuing
process of stimulation by the environment in which the network is embedded, and the
type of learning is determined by the manner in
which the parameter changes take place.
So, based on this, we can define five
learning mechanisms, or learning rules,
or learning algorithms. They
are: error-correction learning, memory-based
learning, Hebbian learning, competitive
learning and Boltzmann learning. In
this lecture, we will be concentrating mainly
on error-correction learning and memory-based
learning. Now, without wasting
time, we'll go to error-correction learning. Suppose
we provide a stimulus vector x(n)
as the input to the network in which
neuron k is embedded. Then the actual response of
neuron k to this x(n) is yk(n), and let the desired response of neuron k to input x(n) be dk(n).
So we can see that there is a difference between the desired response dk(n) and the
actual response yk(n) at time n, and we
define this quantity dk(n) - yk(n) as the error signal, ek(n). So
what is the error signal? The error signal is the
difference between the desired response dk(n)
and the actual response yk(n). Now, the ultimate purpose of error-correction
learning is to minimize some
cost function
based on ek(n). Now there are a lot of cost functions used
in this learning, depending on the
environment, the inputs and so on, and one of
the most common types of cost function
is the mean-square-error criterion, which is defined as: E(n) = (1/2) ek²(n).
As I already said, in each iteration we try to
bring the value of E(n) down to the minimum
possible value, so that E(n+1) < E(n).
For that, we will
update the synaptic weights using the delta rule, which is given as: Δwkj(n) = η ek(n) xj(n).
This is also called the Widrow-Hoff rule,
and another name for it is the error-correction
learning rule. Here
η is a positive constant that
determines the rate of learning. In other
words, we can say that the adjustment made to
a synaptic weight is proportional to
the error signal and to the input of the
synapse under consideration, and this
will update the weight of the synapse
such that: wkj(n+1) = wkj(n) + Δwkj(n).
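The delta-rule update can be sketched in code as follows. This is only an illustrative sketch: the input vector x, target d and learning rate eta below are made-up values, not from the lecture.

```python
import numpy as np

def delta_rule_step(w, x, d, eta):
    """One error-correction (delta rule) update for a single linear neuron."""
    y = np.dot(w, x)          # actual response yk(n)
    e = d - y                 # error signal ek(n) = dk(n) - yk(n)
    w = w + eta * e * x       # wkj(n+1) = wkj(n) + eta * ek(n) * xj(n)
    return w, e

# Repeating the update drives the error toward zero for a learnable target.
w = np.zeros(2)
for _ in range(50):
    w, e = delta_rule_step(w, np.array([1.0, 0.5]), d=2.0, eta=0.2)
print(abs(e) < 1e-3)  # the error has essentially vanished
```

Note that each update moves the weight in proportion to both the error and the input, exactly as stated above.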
Now,
with what we have learnt till now, we
will try to draw the signal-flow graph
of error-correction learning. As
you all know, this is the signal-flow
graph for the model of a neuron.
Now, for simplicity, I will consider
only one of the inputs, xj(n),
and delete all the other inputs. For a signal-flow graph, in
the first case let the input node be x,
the synaptic weight be w and the output be y. In the second case, let
the input node be w, the synaptic
weight be x and the output be y. In both
cases we know that the output would be xw, right? Using the same principle, I
exchange the positions of the synaptic weight and the input
here, so the input node will
contain wkj(n) and the synaptic
weight will be xj(n). Now the
first thing we have to generate is the
error signal ek(n), which is dk(n) - yk(n).
For that, I will
update the signal-flow graph in such a
way that I take another node
which is fed from the output yk(n)
multiplied by -1. Now I will take another source node with dk(n) as input and
feed its output to this node. So what
will be the output over here? The output
over here will be ek(n), right? See, from
this part we will get -yk(n) and
from this part we will get +dk(n), which is nothing but ek(n).
Now the second part of the equation is Δwkj(n) = η ek(n) xj(n),
and we know that wkj(n+1) = wkj(n) + Δwkj(n).
So, in order to get Δwkj(n),
I will take this ek(n) and multiply it with ηxj(n).
Now I'll feed this into here. Thus we
have completed half of the
equation; that is, we have provided Δwkj(n).
Now we have to provide wkj(n) to this
node. For that, I will take the wkj(n) from here and feed it
into here. Now the output of this node will be the summation of this
signal and this signal. What is this
signal? This signal is nothing but
ηxj(n) * ek(n) and this signal
is nothing but wkj(n). Now, this sum,
wkj(n+1), should be the weight in
the next iteration; in other words, we can
say that this wkj(n+1) should be
fed to this node only in the next
iteration. For that, we will introduce a
delay element over here. What does the
delay element do? For example, passing x(n) through a delay element of order k gives us
x(n-k); here we need a unit delay (k = 1). So, using this logic, wkj(n+1) will be used as
the new weight in the next iteration. So,
to summarize, this is the signal flow graph
of error-correction learning. Now, another
thing to discuss is the shortcoming, or
drawback, of error-correction learning. We
know the cost function: E(n) = (1/2) ek²(n).
Now the objective of the error correction
learning is to minimize this
function. So if we draw a graph
between E(n) and a synaptic weight wkj(n), with j held constant,
and suppose we start the iterations from n = 0, then with each
iteration the graph goes on like this. Now we
take the lowest point of this
curve. Let this be n1. If we
take the next iteration, that is n2, we
can see that E(n2) is greater than E(n1).
So the learning process assumes that
we have reached the lowest point at n1
and terminates the learning process. But
if you extend the graph, we may get
another point which is even lower; let this be n3, and the error E(n3)
will be less than E(n1). So,
actually, the algorithm should have
terminated only at this point, but because of this
local minimum over here, the algorithm
terminated at this point. This is
a shortcoming. So, in a multi-dimensional
space with nonlinear inputs, the
algorithm may get trapped at a local
minimum and may never reach the global
minimum.
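The local-minimum trap can be demonstrated on a simple one-dimensional cost function. This is only an illustration: the function E(w) = (w² - 1)² + 0.3w and the starting points below are assumptions, not from the lecture.

```python
def grad_descent(w, lr=0.05, steps=200):
    """Minimise E(w) = (w^2 - 1)^2 + 0.3*w by plain gradient descent."""
    for _ in range(steps):
        grad = 4 * w * (w * w - 1) + 0.3  # dE/dw
        w -= lr * grad
    return w

E = lambda w: (w * w - 1) ** 2 + 0.3 * w

w_right = grad_descent(1.5)    # starts near the local minimum, gets stuck there
w_left = grad_descent(-1.5)    # starts near the global minimum, reaches it
print(E(w_right) > E(w_left))  # the stuck run ends with a higher error
```

Both runs stop where the gradient vanishes, but only the second one finds the lower of the two minima; which one you reach depends entirely on where you start.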
Thus we have concluded the error-correction
learning rule for artificial
neural networks. Now we will move on to
the next
learning rule, that is, memory-based learning. In memory-based learning, we
mimic the way the brain recollects
information. To elaborate on this point,
let us take an example. Suppose you are
writing an exam and you encounter a
question which is similar to one you
came across while learning. So, even
though the values in the question are
changed, you still know how to
proceed with the answer. So how does our
brain do this? The brain simply compares
the question with all the sets of
questions you have seen before and finds
the one which is closest to this
question. That is how the brain recollects
or remembers the question. Now
this is what is implemented in memory-based
learning. In memory-based learning,
we feed the system, or make the system
memorize, that is, store in the system, a lot
of input-output pairs. For example, let x̅
be an input vector. This
input vector may contain a lot of components:
x1, x2, and so on up to xn.
The corresponding desired output d for
the input vector x̅ is also fed into the
system. Now, this d can also be a vector, like d1, d2,
d3 and so on up to dn, but for the sake
of simplicity let's take d to be a scalar quantity.
So for a corresponding x̅i, we get the output di, and by output I
mean the desired output. Now, we feed the
system with a lot of such
input-output pairs, and by a lot, I
mean large chunks of data. Now suppose
the system encounters a test input x̅T that it hasn't seen
before. What it does is compare this x̅T with all the
inputs it has already been fed
and find the closest one to x̅T.
Suppose x̅j is the closest one to x̅T; then the desired output dj
corresponding to x̅j will be taken as
the output for x̅T. To elaborate on
this, let x̅j belong to the set of
inputs x̅1, x̅2, and so on up to
x̅N. Now how did we
determine that x̅j is closest to the
test input x̅T? By taking the
Euclidean distance between x̅T and
each of the inputs x̅1, x̅2, ..., x̅N. For x̅j
to be the closest, the Euclidean distance between x̅T and x̅j
should be the minimum of the Euclidean
distances between x̅T and all the x̅i.
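The minimum-distance criterion can be sketched in code as follows. The stored patterns and labels here are made-up examples for illustration only.

```python
import numpy as np

def nearest_neighbor(x_t, stored_inputs, stored_outputs):
    """Return the desired output dj of the stored input xj that is
    closest (in Euclidean distance) to the test input x_t."""
    dists = [np.linalg.norm(x_t - x_i) for x_i in stored_inputs]
    j = int(np.argmin(dists))  # index of the minimum distance
    return stored_outputs[j]

# Made-up memorized input-output pairs.
X = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([4.0, 4.0])]
D = ["A", "A", "B"]
print(nearest_neighbor(np.array([3.5, 3.9]), X, D))  # prints B
```

The system does no computation at training time beyond storing the pairs; all the work happens at test time, in the distance comparison.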
Now, this is the most important criterion
of memory-based learning, so keep it in
mind. To clarify this, let us take an example. Consider a neural
network that classifies inputs into two
classes, say class A and class B; so,
basically, a binary classification system.
What the algorithm does is that, when
an input pattern is provided, it puts
the pattern into either class A or class B. So,
initially, we must have fed the algorithm a large number of inputs
and the corresponding outputs. Let us map those inputs in a space.
Now suppose the output for the input at
this point is A; that is, the input at this
particular point of the space is classified
into class A. Let this also be A, let
this point also be classified into A, and this point too. Now we will
also have some inputs which are classified
into B. So let the
input at this point be classified into class
B, the input here be classified
into B, the input here into B, and here also B. Now suppose that we have
a test input over here. Intuitively, we
can say that the output for this
particular input will be B. Why? Because we can see that the closest
input to our test input is in B. So this is
how memory-based learning works, but here is
a drawback. Suppose there was an
error while we fed the data into the
neural network, and accidentally
one of the inputs over here got classified into A. But we know that this is a
false output and the correct output
should be B. Now consider the whole
situation again: we have a test input
over here, and the closest input pattern
to this test input is that A. So the
algorithm will classify the test input
into class A, but we know that this
is wrong. So, in order to overcome
this, we will compare the test input
to its nearest neighbors. Stress on that:
nearest neighbors, not neighbor. So what
it does is take the
distance from the test input to all the
stored patterns in a neighborhood around it, and
we can define the size of the
neighborhood beforehand. So when the
algorithm processes the data, it will find
that the test input is closer to
more B's than A's. Thus it will rule
out the falsely fed A and classify the
test input into B itself. Thus we have overcome the drawback. That's all
for memory-based learning, and now we
will move on to Hebbian learning.
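Before moving on, the nearest-neighbors fix just described can be sketched as a majority vote among the k nearest stored patterns. The data points below, including the wrongly fed "A", are made up for illustration.

```python
from collections import Counter
import numpy as np

def knn_classify(x_t, stored_inputs, stored_labels, k=3):
    """Classify x_t by a majority vote among its k nearest stored
    patterns, which can rule out a single mislabelled neighbor."""
    dists = np.array([np.linalg.norm(x_t - x_i) for x_i in stored_inputs])
    nearest = np.argsort(dists)[:k]                 # indices of k closest
    votes = Counter(stored_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# One B-region point is wrongly labelled "A"; with k=3 the vote still says B.
X = [np.array([5.0, 5.0]), np.array([4.0, 5.0]), np.array([5.0, 4.0]),
     np.array([4.6, 4.6]),                          # the wrongly fed "A"
     np.array([0.0, 0.0]), np.array([0.5, 0.5])]
y = ["B", "B", "B", "A", "A", "A"]
print(knn_classify(np.array([4.5, 4.5]), X, y, k=1))  # prints A (fooled)
print(knn_classify(np.array([4.5, 4.5]), X, y, k=3))  # prints B (corrected)
```

With k = 1 the single mislabelled point decides the answer; widening the neighborhood to k = 3 lets the two correct B's outvote it.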
Hebbian learning is the one which tries to
resemble the physical structure of the
brain, and by physical structure, I mean how the brain strengthens
or weakens a neuron's synapses. It has been found by experiments that if we
repeat a process continuously, the
neurons in the part of the brain
that are responsible for coordinating
that activity will fire constantly, and
if the neurons fire constantly, the
synaptic weights of those neurons will
increase over time. This is how the brain
perfects a process.
The opposite is also true: if we do
not practice an activity for a
long period of time, the neurons
won't be firing, and so the synaptic
strength weakens.
Now, in Hebbian learning, we try
to implement this. We'll talk about Hebbian
learning in the next lecture. So stay
tuned. Thank you :)
