In the last video, we briefly talked about
the history of neural network research,
and we discussed how in the early 2010s
neural network architecture started 
to win machine learning competitions.
In particular, there was
the ImageNet competition,
which involved classifying images,
in which the neural network entry
did significantly better
than any non-neural-network entry.
And, interestingly, this was so
even though the non-neural-network entries
used hand-coded features
and hand-tuned algorithms.
This reminds us a little bit
of the development
of machine learning in go-playing,
where AlphaGo actually succeeded
and improved by reducing
the impact of hand-coded features
and replacing them with features
learned entirely automatically
from large amounts of data.
Now, what is it that made
the neural network entry
in ImageNet so successful?
Today, we would call 
the kind of architecture
that was entered into
ImageNet and won it
a deep neural network 
or a deep learning architecture.
And, even though 
this is a bit of a buzzword,
there are a few typical characteristics
that define such neural networks
and distinguish them from 
previous uses of neural networks.
So, what happened? 
What was different?
First, the very design and architecture
of the neural network was deeper --
and more structured.
And I'll define what I mean 
by both of those terms in a minute.
Second, there was
a huge data set provided.
The data sets used in these competitions
grew bigger and bigger every year,
and it turns out that neural networks
seem to do really well
when there's a lot of data provided.
Lastly, the team that entered
this neural network
used graphics cards, which were developed
essentially for faster gaming
and better graphics in games,
but which also do an operation
called matrix multiplication
that's used extensively
in neural networks,
and they used these to train
the neural network
several orders of magnitude faster than
would be possible with regular processors.
So this allowed them to train the network
for a long time on large amounts of data,
and it turns out that simply using
more data and more training
makes a big difference with neural networks.
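To see why graphics cards help so much, note that the weighted sums a layer computes, over a whole batch of examples at once, are exactly one matrix multiplication. Here's a minimal NumPy sketch (the batch and layer sizes are my own arbitrary choices, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

batch = rng.standard_normal((64, 784))     # 64 example inputs, 784 features each
weights = rng.standard_normal((784, 256))  # one layer: 784 inputs -> 256 neurons

# Every neuron's weighted sum, for every example, in a single matrix multiply.
# This is the operation GPUs were built to do quickly, in parallel.
activations = batch @ weights
print(activations.shape)  # (64, 256)
```

On a GPU, that single `@` runs across thousands of parallel cores, which is where the orders-of-magnitude speedup comes from.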
So let me now go back to my first point
that these deep networks are deeper,
obviously, and more structured.
Remember, I talked about
multi-layer neural networks
where, instead of just having an input 
that goes directly to the final output,
there's intermediate weighted sums
and non-linearities.
And each layer of those weighted sums 
and non-linearities,
in between the input and the final output,
is called a hidden layer.
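The idea of stacked weighted sums and non-linearities can be sketched in a few lines of NumPy (a toy illustration of the idea, not any particular network from the video; the ReLU non-linearity and the layer sizes are my own choices here):

```python
import numpy as np

def relu(x):
    # A common non-linearity: zero out negative values.
    return np.maximum(0.0, x)

def forward(x, layers):
    # Each (W, b) pair is one hidden layer: a weighted sum (x @ W + b)
    # followed by a non-linearity.
    for W, b in layers:
        x = relu(x @ W + b)
    return x

rng = np.random.default_rng(0)
# Two layers between a 4-dimensional input and a 3-dimensional output.
layers = [(rng.standard_normal((4, 8)), np.zeros(8)),
          (rng.standard_normal((8, 3)), np.zeros(3))]
output = forward(rng.standard_normal(4), layers)
print(output.shape)  # (3,)
```

A deep network is just this loop with many more (W, b) pairs.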
In deep neural networks, there are typically
many, many hidden layers,
many more than were used
even in the 80s or 90s,
when you might have had one or two
hidden layers in a typical neural network.
Nowadays, a few dozen
or easily hundreds of hidden layers
are possible in modern deep architectures.
Now, I should note two things about this.
Prior to the rise of deep architectures,
training networks with
many, many hidden layers
ran into various kinds of
technical difficulties.
However, by tweaking the
non-linear function,
it turned out that it actually is possible
to train neural networks
with many hidden layers and resolve
some of these computational difficulties.
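The video doesn't name the tweak, but a commonly cited example is replacing the sigmoid non-linearity with ReLU: sigmoid's gradient shrinks toward zero for large inputs, so with many layers the training signal vanishes, while ReLU's gradient stays at 1 for positive inputs. A quick numerical check (my own illustration):

```python
import numpy as np

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s).
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return np.where(np.asarray(x) > 0, 1.0, 0.0)

x = 10.0
print(sigmoid_grad(x))        # ~4.5e-05: nearly zero, vanishes when layers stack
print(float(relu_grad(x)))    # 1.0: the gradient passes through unchanged
```

Multiplying many near-zero sigmoid gradients through a deep stack is one of the technical difficulties that made early deep networks hard to train.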
The other thing I'd like to note is that 
we don't have a very good sense
about why having more hidden layers
helps improve performance.
We have some ideas.
Generally, consider neural networks
with many hidden layers.
For example, I've diagrammed
this prototypical example neural network
that takes in images,
so it takes in pixel information and outputs,
for example, the name of the person;
you might see this being used
on Facebook, for example,
when it recognizes
your friends from pictures.
In a network like this, when 
there's many, many hidden layers,
and we look at the kinds of patterns
that seem to activate the neurons
in the different layers,
they seem to become, in some sense,
more and more abstract and conceptual.
So, at the earliest layers,
what really turns the neurons on
are things like edges
or high-contrast spots.
At intermediate layers, the neurons seem 
to be activated by things like
noses, ears, mouths, parts of the face.
And towards the final layers,
it almost seems to be that
the neurons are responding
to what might be called
prototypical faces
or some kind of underlying variation
in the types of faces and 
expressions that people have.
So we see that by adding more layers,
it might be that we're able to capture 
higher and higher-level concepts
and more abstract concepts that 
are then recombined in useful ways.
So essentially, we can think of 
deep neural networks
as encoding some assumption that 
the kind of data we're interested in
is frequently hierarchical.
It has many scales and it reuses 
some of the lower scale components
in various ways in the higher scales.
The other thing that was different between
more recent deep network architectures
and more traditional approaches 
in neural networks,
was that many of the deep learning 
architectures have a lot more structure.
So here on the screen,
on the left-hand side,
you see a more traditional neural network.
Even though it has many hidden layers,
you see essentially all the neurons
in one layer are connected
to all the neurons in the next layer.
For example, for the winning neural network
in ImageNet that we discussed previously,
we show the topology of that network
on the right-hand side
using a kind of block diagram.
Each of the cubes represents 
a whole group of neurons.
Here, you can see that
there's a lot of structure there.
There's kind of two streams,
the sizes of blocks are changing,
some of them are densely interconnected,
some are not interconnected.
So there's a lot of knowledge
and design put into
how the neurons are 
interconnected to each other.
I should also add, especially 
for image tasks including ImageNet,
what's often used are 
so-called convolutional layers.
Convolutional layers have very structured
repetitive weight patterns.
And so they also impose
a kind of constraint on the neurons
and a certain kind of structure
on the connectivity patterns
that are possible for the neural network.
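The "repetitive weight pattern" of a convolutional layer means one small set of weights is slid across the whole image and reused at every position. Here's a toy NumPy version (my own illustration; real convolutional layers also handle many channels and run on GPUs):

```python
import numpy as np

def conv2d(image, kernel):
    # Slide one small kernel over the image. The SAME weights are
    # reused at every position -- that is the structural constraint
    # convolutional layers impose.
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 3x3 vertical-edge detector applied to a toy 6x6 image.
image = np.zeros((6, 6))
image[:, 3:] = 1.0                         # right half bright, left half dark
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to left-to-right contrast
result = conv2d(image, kernel)
print(result)  # strongest response where the dark/bright edge sits
```

Because the same 9 weights cover the whole image, a convolutional layer has far fewer parameters than a fully connected one, and it detects the same pattern wherever it appears.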
So we see that, unlike more
traditional neural networks,
deep nets are often very structured.
They don't just have everything
connected to everything else
as was assumed to be acceptable before.
As I mentioned, designing such architectures
requires quite a bit of domain knowledge,
and it's actually more 
of an art than a science.
People don't really 
understand how it works,
but it seems to make a big difference
on the performance of the neural networks.
But interestingly, there's been 
some recent work showing
that we can actually train 
machine learning algorithms
to themselves design the topology 
of the neural networks
which are then trained on big data sets.
And this is very interesting because 
it's a kind of meta-learning
or meta-design of machine learning 
algorithms designing
other machine learning algorithms
and doing just as well 
or even better than people can.
So, probably, this is 
the beginning of the singularity.
Now, given this recipe that I mentioned
of large amounts of data,
lots of computing power and training 
on graphics processors,
and structured architectures of 
the connectivity between neurons,
deep networks are coming to dominate
almost all domains of machine learning
or at least many, many of them.
We already talked about image recognition,
classifying images according 
to the object inside of them.
Now, voice recognition.
So many people noticed that,
for example, Siri on the iPhone
or the voice recognition on Android 
got much, much better all of a sudden.
They could suddenly recognize 
what people were saying
with very high accuracy.
A lot of this was due to neural networks
and deep neural networks
being used in this application.
Translation is another aspect.
So machine translation, translating from
one human language to another,
is traditionally an extremely
difficult task for artificial intelligence,
and it was thought that statistical models
like neural networks
and many other kinds of
machine learning algorithms
would never really do
very well at such tasks
because languages are too structured:
there's too much syntax
and too many rules to follow.
It turns out that, given enough data,
deep neural networks 
actually do great at this task.
And, if you've used Google Translate,
they moved from a system 
that used hand-designed features,
designed over many decades by linguists,
to essentially training
a huge, deep neural net
on large bodies of text from 
the internet,
and it does better translating 
than the hand-designed algorithms.
And finally,
we already mentioned things like 
video games and board games
being supervised learning problems.
Brendan talked about the development 
of go-playing algorithms,
and, actually, a big chunk of 
the machine learning
that was used in AlphaGo
was a deep neural net.
And so, in combination
with other techniques,
we saw in AlphaGo that deep neural nets
actually solved an AI task
that was thought to be intractable
for many, many years.
And there's many other examples 
of deep learning doing very well
at tasks that were thought
to be very difficult.
