So welcome to this next lecture. Here we continue from what we did in the earlier one, the introduction to deep neural networks. As I said in the earlier class, we would be discussing the history of deep learning with neural networks as it has evolved, so this lecture is centered around the theme of the family and history of deep learning and how these deep neural networks have come about. If you look into the origin and growth of these networks — as we have also been discussing for quite some time — these neural networks are essentially not something new.
Although deep learning as such has come into the limelight only in the recent past — within even less than half a decade as of now — neural networks have been there for quite a long time. Somewhere around the 1950s is what is called the age of these neural networks: this is when the mathematical definition of a neural network and the basic perceptron, which we studied in last week's lecture, were proposed. Eventually what it laid down was a progression: from the very simple McCulloch and Pitts model of a neuron in 1943, through the unsupervised way of learning following the Hebbian rule, then on to supervised learning with Rosenblatt's perceptron in 1958, and then, after a long gap, Hopfield's associative memory concept in 1982. These are what laid down the precursors to what deep learning today is based on.
Around the 1960s some more interesting things started happening. Until around the 1950s, the mathematicians were working independently, and there was not much interdisciplinary interaction between the different fields. Around 1960 there started to be interdisciplinary collaborations between the mathematicians who were working on developing neural networks and the neuroscientists. This is the first time you could see electrical engineers, people from mathematics and information theory, and also neuroscience researchers coming together, and the whole objective was to find out whether this mathematical model of a neural network has some sort of analogy to, or provides a plausible explanation of, how biological neurons — within, say, the human body, or within any living organism — form a neural network and a neurotransmission pathway.
So the question was whether it came down to the same sort of neurotransmission pathway in a living organism — that is what was going on around 1960. The first finding was by Hubel and Wiesel, on the visual sensory cells which respond to edges, and what they found eventually had a very interesting culmination: when we get into those initial neural networks and try digit recognition, you would find that the first few hidden layers are the ones responsive to edges and to complex patterns of edges. In fact, these discoveries of the 1960s did help us find that within our biological vision system, from our eyes, the first few things we recognize are basically line-like features — straight lines, curved lines, circular arcs. These are the first level of recognition which happens in order to make us recognize a particular object and then associate it with a class. The next development was the feed-forward multilayer perceptron — the standard multilayer perceptron which we are looking at here and which we had studied; in the subsequent lecture we will be going into mathematical depth on it. Then came what is called the neocognitron (around 1980).
These were the first theories being proposed out of this kind of association with neuroscientists in terms of understanding whole images. What they found out is that, apparently, while we were initially thinking of these neural networks as fully connected structures, within the biological system — within our own bodies — they are not fully connected, but are instead something like what is called convolutional.
So if you remember clearly, in the first week's lecture on neural networks, where I was writing out the mathematical model, you had an x into w: there is a unique weight associated with the connection from one neuron to another. Whereas what comes out of this neocognitron perspective is that these weights are not a huge family of weights — each connection does not have its own unique weight — but there is basically a small set of weights with a translational property. That would mean you can operate with a convolution-like operator: x is convolved with a weight matrix w, and the resultant is the convolution sum that comes out. Then we got into something called weight replication, which works across structures: if my left eye has a certain set of weights, my right eye will also have a replica of those weights. And as we get a deeper understanding of these networks, we will find that weight replication within these kinds of stereo networks or pairwise networks is
again a common thing, which you either impose explicitly, or, even if you do not impose it, the networks turn out to learn it naturally through the learning rules. From there we went on to the discovery of what is called max pooling, a very, very important concept as far as convolutional neural networks within today's deep architectures go. Then came the idea of backpropagation, the learning rule. What we were doing in the last class was gradient descent: you remember that we took the derivative of the cost function with respect to the weights of the network. Now, when you try to work out this derivative for a multilayer perceptron, you will see that it goes across the different depth layers — from the final target output layer, via the immediately preceding hidden layer, to the next hidden layer, and eventually down to the input layer itself. Since this whole computation progresses along the depth from the output to the input, it is called backpropagation. We will come to the mathematics and more details of it in a subsequent lecture. So this is where the history stood: this crucial learning rule, on which the whole of deep learning rests today, is a discovery from 1985 — that is almost thirty years ago as of now.
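Since backpropagation is the rule the whole field rests on, here is a minimal sketch of it — a toy two-layer network of my own, with sigmoid units and a squared-error cost, not the lecture's exact formulation:

```python
import numpy as np

# Toy illustration of backpropagation: one hidden layer, sigmoid
# activations, squared-error cost. The chain rule is applied layer by
# layer, from the output back to the input.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))         # a single input vector
t = np.array([[0.75]])              # target output
W1 = rng.normal(size=(3, 4)) * 0.5  # input -> hidden weights
W2 = rng.normal(size=(1, 3)) * 0.5  # hidden -> output weights
lr = 1.0                            # learning rate

for _ in range(500):
    # forward pass
    h = sigmoid(W1 @ x)                     # hidden activations
    y = sigmoid(W2 @ h)                     # network output
    # backward pass: derivative of 0.5*(y - t)^2 w.r.t. the weights
    delta2 = (y - t) * y * (1 - y)          # error term at the output layer
    delta1 = (W2.T @ delta2) * h * (1 - h)  # error propagated back to hidden
    # gradient-descent updates
    W2 -= lr * delta2 @ h.T
    W1 -= lr * delta1 @ x.T

print(float(abs(y - t)))  # the error shrinks as training proceeds
```

The two `delta` terms are exactly the chain-rule factors moving from the output layer toward the input, which is why the procedure carries the name backpropagation.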
Going on from there, more things came along between the 1980s and 2000, and this was a point where we took on even more complicated problems. One of them was what is called the recurrent neural network, which started appearing around the 1980s to 2000. Then came local learning within feed-forward neural networks, advanced gradient descent methods, and then sequential network construction, which is quite critical: when you have a complex problem to solve, you would not like to solve it in one shot from start to end, but rather go down a certain route and keep solving it one piece at a time.
So it is like breaking a bigger, complex problem down into multiple smaller problems. Then came unsupervised pre-training, or what we will also be doing subsequently as autoencoders. As we go along in the next few lectures, we will start by going from the multilayer perceptron to an autoencoder, then understand the relationship between a multilayer perceptron and an autoencoder, and that is what we will carry through backpropagation into convolutional neural networks as well. These very simple models are what constitute the basic building blocks for understanding a very deep neural network, and they all took shape between the 1980s and 2000. From there, the start of this millennium, around 2000, is what is heralded as the era of deep learning, because all the theories developed before 2000 needed a lot of compute power, and around this time is when the compute power, the software libraries and implementations, and the data sets — and you definitely need a huge amount of data as well — started coming together. Eventually, around the 2000s, we had enough consumer-grade compute power to make this much mathematics solvable within a human lifetime.
So today, if you train a deep neural network, you can pretty much train a very complex model on challenges like ImageNet within one to two days, or at most a week, on computers within your reach; whereas back in the early 2000s, this would have taken more than a month's stretch, and some of these problems even required training for over a year. That was not feasible engineering — very few people had resources to spare for it — and that is one of the prime reasons why deep learning was still out of the reach of a lot of people and was not coming into the consumer space.
From there, in 2006, some interesting things happened with the advent of GPUs: NVIDIA and a lot of other partners strategically repositioned their business — from just computer graphics generation, or mesh-grid-like solvers for multi-physics and physical simulations, to a more compute-centric role — building architectures for memory interfacing and data transfer which support the high bandwidth requirement that neural network implementations have. Because, if you see it clearly: I have one layer, and via a certain number of weights I connect it to the other layer. Each of these layers requires a certain amount of memory, the weights also require memory, and for this operation to happen there is a lot of memory transfer. Whenever I do an x into w — say x1 into w1 — there are two memory fetch operations, then a product, and then a write to memory. So for every single arithmetic operation there are three memory read/write operations going on, and this is against a very high-volume RAM; basically, your CPU-to-RAM access bandwidth needs to be really high.
And these getting better and better is what led to where we are now. From there, in 2009, came a GPU implementation of deep belief networks, which was very crucial in terms of being able to get these belief networks working, and then in 2011 came the max-pooling CNN on the GPU. This was with the advent of certain critical architectural features within the hardware itself, which made it much faster; earlier, I think max pooling had to be done only on the CPU side. As we get into more details, you will see where these libraries accelerate, and which hardware constructs can be addressed and referenced directly by the software libraries for the best access.
Then, in 2012, came the ImageNet-winning model by Alex Krizhevsky — the AlexNet of 2012. This was the first deep learning model which beat all of the classical models on the ImageNet challenge, a challenge which recently closed down in 2017 and then got remodeled into others. So this is broadly the history, and in the subsequent classes we will be touching on one single attribute of this history, one single model, at a time, and seeing how each has contributed in a big way to what we see as deep learning today. Now that we have gone through the history, the next thing that comes is the family of these deep neural networks. These deep neural networks can typically be divided into three families, as we call them.
So one of them is the fully connected networks. Within the fully connected networks comes the concept of autoencoders, and they can be stacked (( )), denoising, as well as convolutional autoencoders — a convolutional autoencoder being some sort of hybrid between a convolutional network and an autoencoder. What autoencoders do is what we will typically be studying in a subsequent lecture, but to give you the gist of it: if I have a pattern x, I somehow encode it through certain weights in order to get the same pattern x back as the output. Now you might ask what the use of all this is — whatever I put at the input, I get the same thing at the output — but you see, there are multiple uses. One is that you can do denoising with it: if I have a noisy input here, and I somehow make this network give me a noise-free output, then I can use it as an image-cleaning filter, a denoising option. You can also use it to find a latent representation, or a compressed version, of whatever is given at the input. If my hidden layers keep getting smaller than my input layer and my output layer, then somewhere in between — say my input is some one thousand neurons, I can get down to a hidden layer of one hundred neurons, and through this network still get the thousand neurons back at the output — it means I can compress one thousand pixels down to one hundred. So this is an image compression task which it can solve, and we will come to those examples as well, of how to get image compression running with these neural networks.
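As a minimal sketch of this compression idea — a toy linear autoencoder of my own, not a model from the lecture — assume 8-dimensional inputs that really carry only 2 dimensions of information, so an 8 → 2 → 8 network can compress and reconstruct them:

```python
import numpy as np

# Toy linear autoencoder: the data lives on a 2-D subspace of an
# 8-D space, so a 2-neuron hidden layer suffices to reconstruct it.

rng = np.random.default_rng(1)
basis = rng.normal(size=(2, 8))        # hidden 2-D structure in the data
X = rng.normal(size=(200, 2)) @ basis  # 200 samples, each 8-dimensional

W_enc = rng.normal(size=(8, 2)) * 0.1  # encoder weights: 8 -> 2
W_dec = rng.normal(size=(2, 8)) * 0.1  # decoder weights: 2 -> 8
lr = 0.01

for _ in range(2000):
    H = X @ W_enc        # code: the compressed 2-D representation
    Y = H @ W_dec        # reconstruction back to 8 dimensions
    E = Y - X            # reconstruction error
    # gradients of the mean squared error w.r.t. both weight matrices
    W_dec -= lr * H.T @ E / len(X)
    W_enc -= lr * X.T @ (E @ W_dec.T) / len(X)

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(mse)  # small compared with the variance of X
```

The same shape of network, scaled up to a 1000-neuron input and a 100-neuron hidden layer, is the 10-to-1 image compression scenario described above.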
Then the next one is what is called a belief network. The typical one is the restricted Boltzmann machine, which is already known quite widely within the community. This is where you have some sort of a Boltzmann distribution being carried along: I have an input and an output connected by a hidden layer, and this hidden layer produces outputs which are Boltzmann distributed — any variable state in this inner layer is a Boltzmann-distributed variable. So given an input you can get an output, and — since input and output are not so predefined here, it is just a pair of x and y — if you give a y, it can also give you back an x, given that the hidden layer is Boltzmann distributed. And when you stack them one on top of the other, that is what leads to something called a deep belief network.
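A minimal sketch of this bidirectional sampling, with untrained toy weights of my own (a real RBM would be trained, for example with contrastive divergence):

```python
import numpy as np

# One Gibbs step in a restricted Boltzmann machine. Hidden units are
# sampled as stochastic binary variables — the "Boltzmann distributed"
# part — and the same weights work in both directions.

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden = 6, 3
W = rng.normal(size=(n_visible, n_hidden)) * 0.1  # symmetric coupling weights
b_v = np.zeros(n_visible)                         # visible biases
b_h = np.zeros(n_hidden)                          # hidden biases

v = rng.integers(0, 2, size=n_visible).astype(float)  # a binary visible vector

# sample hidden given visible ...
p_h = sigmoid(v @ W + b_h)
h = (rng.random(n_hidden) < p_h).astype(float)

# ... and visible given hidden: the model is bidirectional, which is
# why, given a y, an RBM can also give you back an x
p_v = sigmoid(h @ W.T + b_v)
v_new = (rng.random(n_visible) < p_v).astype(float)

print(h, v_new)
```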
So this is where all inputs, all outputs, and all intermediate layers are directly connected. When you change all of these direct, dot-product-like connections into convolution-like connections, that would necessarily help you get spatial invariance, because now non-locality can be addressed as well; these kinds of networks are what are called convolutional networks, also briefly termed ConvNets. So whatever you hear of today — say GoogLeNet, AlexNet, LeNet, U-Net, and then ResNet, the residual networks — these all rely predominantly on convolutions in their first few operational layers, and are typically defined as convolutional networks.
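To make the weight sharing concrete, here is a tiny one-dimensional sketch of my own of a convolution followed by max pooling — the two operations this lecture traced back to the neocognitron days:

```python
import numpy as np

# A convolution-like layer shares one small weight kernel across the
# whole input, instead of a unique weight per connection; max pooling
# then keeps the strongest response in each neighbourhood.

def conv1d(x, w):
    # slide the shared kernel w over x: weight sharing in action
    k = len(w)
    return np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])

def max_pool(y, size=2):
    # keep the maximum of each non-overlapping window
    return np.array([y[i:i + size].max() for i in range(0, len(y) - size + 1, size)])

x = np.array([0., 0., 1., 1., 1., 0., 0., 1.])
w = np.array([1., -1.])  # a tiny edge-detecting kernel

y = conv1d(x, w)         # responds exactly at the edges of the runs of 1s
print(y)
print(max_pool(y))
```

The kernel responds wherever the signal steps up or down — the same edge-sensitivity Hubel and Wiesel observed in visual cortex cells — and the pooled output keeps only the strongest such response per window.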
From there comes the next family, which is a neural network of a temporal sort — one which operates on the time domain — also called a recurrent neural network. What happens is that the output of the neuron gets added to the input of the neuron at the next time step, not in the same time step. So if I am processing a sequence, whatever the output is at the first time step, that output gets fed in when I am processing the next item in the time-sequence data. This is very useful for natural language processing — say you want an error-correction feature: you will often see, when you are typing on your smartphone, that if you just start typing a message, after one letter it starts suggesting a few letters or even words, and as you keep typing more letters, it keeps getting better and better, closer to the exact word.
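A minimal sketch of that recurrence, with toy weights of my own rather than a trained model:

```python
import numpy as np

# The defining property of a recurrent network: the hidden output from
# time step t is fed back in as part of the input at time step t+1.

rng = np.random.default_rng(3)
W_x = rng.normal(size=(4, 3)) * 0.5  # input -> hidden weights
W_h = rng.normal(size=(4, 4)) * 0.5  # previous hidden -> hidden weights

h = np.zeros(4)                      # initial hidden state
sequence = rng.normal(size=(5, 3))   # 5 time steps of 3-dim inputs

for x_t in sequence:
    # the previous step's output h enters the current step's computation
    h = np.tanh(W_x @ x_t + W_h @ h)

print(h)  # final state summarizes the whole sequence
```

It is this carried-over state that lets the network refine its guess as each new letter of a word, or each new frame of a video, arrives.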
So these are the kinds of things associated with recurrent neural network behavior, and we will be getting into more details of them. While we will not be doing this sentence- or word-correction kind of task, we will be using exactly these recurrent neural networks for our video analytics problem, where frames which are not so distinct — different but somewhat related — come in a time sequence, and we can use some sort of recurrence property of object appearance across the different frames of a video in order to analyze or classify the video. So broadly, those are the three families. And today, if you see, this whole deep learning thing is no longer the sort of science fiction it was initially thought to be. You can find very interesting examples — there is a website demo (just google it and find it), and what it does is this:
if you look at it, those black and white dots are basically neuron outputs of a restricted Boltzmann machine. As it generates Boltzmann-distributed zero-or-one outputs, you get a face corresponding to them. It is a bit creepy, because just from certain sequences of zeros and ones you can generate a whole human face, and every time it generates a different face. The other is a paper where, given a face, you can use these kinds of deep neural networks to synthesize different facial expressions. As we go into more of generative modeling in the later lectures, in the subsequent weeks, we will come closer and closer to how you can even synthesize these kinds of images from some simple black and white dots as well. So, from there,
going on to the application side of it, where we stand today: take Facebook's face recognition or object recognition — whenever you upload an image, it says whether certain people are present, or whether it is you or not. That has been built on top of the years of corpus you have created by tagging individual faces. In the initial days — almost a decade back, when Facebook was just starting up — you could put up images, draw a square box on them, and tag your face or your friends, and that was helping them create a large corpus. Initially those boxes were all fixed-size squares; eventually they became variable-sized but still square; then they became rectangular, irregularly shaped boxes, not necessarily of square aspect ratio; and then you could annotate objects too. That is what helped create a large corpus for this supervised learning.
Eventually, from there — Baidu, the large search engine within China, where deep learning powers its retrieval engine: if you put in an image of a person, it fetches all possible images of that person, not restricted to photographs. They can even be hand-sketched versions, as you see in the last row, or the person tilting the head in different poses, and this is the really interesting part, because if you put in some object or a person's face, you would like to get back all other matching instances. That is a very critical search task which this particular kind of technology, deep learning, is helping us achieve in a real-time scenario. Then there is this particular website, Cortica: what it does is pull in a random stream of images from the web and start generating small captions — more typically, one or two words about what is present in the image — and it does this on the browser side; there is nothing running on the server side. It was really fun to watch. More along these lines are products on fashion: in fact, some of the big e-commerce companies like Amazon have now also launched this — you can take an image of somebody wearing a dress, and it somehow searches through its visual catalogs and finds you the product category on their e-store, and you can buy that sort of dress. So this is where it is going down into impacting the consumer space as well. From there, you see a huge push into self-driving cars and autonomous mobility. And not much left behind is Microsoft: somewhere in 2014 they started with the public release
of what is called Project Adam, and today there is Microsoft's Cortana, the speech assistant for PC systems — they are really building up huge capabilities, and it has its own libraries as well. There is something called the Microsoft CNTK, which has a cool API you can use within your website or your apps — anything you are developing — and what it can do is, given an image, say whether the person is male or female, what the age of the person is, this kind of intuitive information; and also, from facial expressions, give an expression analysis — whether the person is angry, or some sort of emotional inference, whether he is happy, smiling, these kinds of things. So this is what is becoming increasingly deep-learning-powered AI as of today, and that is where it is going. But the challenge with this is even bigger — and that is where we are, almost at the end now. As Trishul Chilimbi put it, an interesting observation was that
this whole thing of deep learning is quite like quantum physics at the beginning of the twentieth century. The reason behind this is that, experimentally — driven by practitioners and software coders — the experiments have run far ahead of their time: we keep getting more and better results. But the problem is that there is another group of people, the theoreticians, who come at this from the aspect of explainable AI — drawing out an explanation of why a particular model works — and that is something we are still not in a position to understand fully. We know some of these explanations, but not all of them, and that is a prime line of research and the major challenge within deep learning, and learning with deep neural networks, as of today. So as we go through these lectures, where I will be covering a substantial part of what works, I will also be working on why it works, and on the possible explanations of what these particular deep neural networks can do,
expecting that you can also build newer architectures on your own by going through this whole route; or, if you have a different kind of candidate search — say you have some n different architectures, three or four of them — how you can choose the architecture which is most suited to solve the particular problem at hand. That is what we will be doing, from theory to experiments, and eventually that is the whole objective of this course. Finally, as we come to an end, I have a few take-home messages for you. One: you do require hardware resources, and what you can do is get any of these, which are really good; for, let's say, a custom workstation, you would need some sort of GPU to build it out, and in India it is easier to get a GTX 980 Ti or a GTX 1060 from NVIDIA, and this can help you put together a machine. Eventually, we have one session in which we will be unwrapping
and unboxing a machine and showing you the different parts, so that you can get a hardware setup at your place. We have also arranged cluster access for participants of this course, so that you can get access to an HPC. Or you can obviously buy, say, the DGX-1 dev box from NVIDIA, which comes at quite a premium price — maybe suited for a few institutional purchases, but not so much for personal use. Then, on the toolboxes: all of these are open source as of now, so you can use any of them, other than the one in MATLAB, for which you will definitely have to pay for the licenses; the MATLAB Neural Network Toolbox since 2016 does support autoencoders and convolutional neural networks as well. If you want to read more, go to the website of the Deep Learning Book, which is now also available as a printed book from MIT Press; you can get this book, and that is what we will be using as the major reading material here. Whenever there is something else,
I will be putting up pointers to those exact materials. To follow the conferences: it is NIPS and ICLR which form the major corpus of what we present today and of newly disseminated research in the field of deep learning. With that, we come to the end of this particular lecture on the introduction to deep learning. In the next class I will be getting started on the toolboxes and toolkits — how to get started — and eventually we will get into writing down the main math of a multilayer perceptron, getting into an autoencoder, and then subsequently going into coding exercises for autoencoders as well. So with that, this comes to an end.
Thanks.
