Let's dive into artificial neural networks and how they work at a high level. Later on we'll get our
hands dirty and actually create some,
but first we need to understand how they work and where they came from.
So it's pretty cool stuff,
I mean, this whole field of artificial intelligence is based on an understanding of how our own brains
work,
so, you know, over millions of years of evolution nature has come up with a way to make us think and if
we just reverse engineer the way that our brains work we can get some insights on how to make machines
that think.
So within your brain, specifically your cerebral cortex which is where all of your thinking happens,
you have a bunch of neurons,
these are individual nerve cells, and they are connected to each other via axons and dendrites. You can
think of these as connections, you know, wires, if you will, that connect different neurons together.
Now an individual neuron will fire or send a signal to all the neurons that it's connected to
when enough of its input signals are activated. So at the individual neuron level it's a very simple
mechanism,
you just have this cell, this neuron that has a bunch of input signals coming into it and if enough of
those input signals reach a certain threshold, it will in turn fire off a set of signals to the neurons
that it in turn is connected to as well.
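That firing rule can be sketched in a few lines of Python. This is purely illustrative (real neurons are vastly more complicated), but it captures the idea: count the active inputs, and fire if the count reaches a threshold.

```python
# A toy model of the firing rule just described: a neuron fires
# when enough of its input signals are active.

def neuron_fires(inputs, threshold):
    """inputs: list of 0/1 signals; fires (True) if enough are active."""
    return sum(inputs) >= threshold

print(neuron_fires([1, 0, 1], threshold=2))  # True: two active inputs
print(neuron_fires([1, 0, 0], threshold=2))  # False: only one active
```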
But when you start to have many, many, many of these neurons connected together in many, many different
ways with different strengths between each connection,
things get very complicated.
So this is kind of the definition of "emergent behavior,"
you have a very simple concept, a very simple model, but when you stack enough of them together you can
create very complex behavior at the end of the day
and this can yield learning behavior.
And this actually works, and not only does it work in your brain,
it works in our computers as well.
Now think about the scale of your brain.
You have billions of neurons each of them with thousands of connections and that's what it takes to
actually create a human mind.
And this is a scale that, you know, we can still only dream about in the field of Deep Learning and Artificial
Intelligence, but it's the same basic concept,
you just have a bunch of neurons with a bunch of connections that individually behave very simply, but
once you get enough of them together wired in enough complex ways you can actually create very complex
thoughts, if you will, and even consciousness. The plasticity of your brain is basically tuning where those
connections go to and how strong each one is
and that's where all the magic happens, if you will.
Furthermore, if we look deeper into the biology of your brain, you can see that within your cortex neurons
seem to be arranged into stacks or cortical columns that process information in parallel.
So, for example, in your visual cortex different areas of what you see might be getting processed in parallel
by different columns, or cortical columns, of neurons.
Now each one of these columns is in turn made of these mini columns of around 100 neurons per mini column
that are then organized into these larger hyper columns and within your cortex there are about 100
million of these mini columns. So again, they just add up quickly.
Now coincidentally this is a similar architecture to how the video card, the 3D video card in your computer
works,
it has a bunch of very simple, very small processing units that are each responsible for computing a
little group of pixels on your screen at the end of the day,
and it just so happens that that's a very useful architecture for mimicking how your brain works.
So it's sort of a happy accident that the research that went into making video games run really
quickly, whether it's Call of Duty or whatever it is that you like to play,
lent itself to the same technology that made Artificial Intelligence possible on a grand scale and at
low cost, the same video cards you're using to play your video games can also be used to perform Deep
Learning and create artificial neural networks.
Think about how much better it would be if we actually made chips that were purpose-built specifically for
simulating artificial neural networks.
Well, it turns out some people are designing chips like that right now;
by the time you watch this they might even be a reality,
I think Google's working on one as we speak.
So at one point someone said "hey! The way we think neurons work is pretty simple,
it actually wouldn't be too hard to replicate that ourselves and maybe try to build our own
brain,"
and that idea goes all the way back to 1943.
People proposed a very simple architecture where if you have an artificial neuron, maybe you can set
up an architecture where that artificial neuron fires
if more than a certain number of its input connections are active, and when they thought about this more
deeply in a computer science context, people realized you can actually create logical expressions,
boolean expressions
by doing this.
So depending on the number of connections coming from each input neuron, and whether each connection
activates or suppresses a neuron (both of which happen in nature as well),
you can implement different logical operations.
So this particular diagram is implementing an OR operation,
so imagine that our threshold for our neuron was that if you have two or more inputs active, you will
in turn fire off a signal.
In this setup here we have two connections coming in from neuron A and two connections coming in from neuron B.
If either of those neurons fires, it feeds neuron C two input signals, which meets our threshold of two.
So you can see we have created an OR relationship here: if either neuron A or neuron B is active,
that will cause neuron C to fire and produce a true output,
so we've implemented here the boolean operation C = A OR B just using the same kind of wiring that happens
within your own brain. I won't go into the details, but it's also possible to implement AND and NOT
by similar means. Then we start to build upon this idea, and we create something called the Linear Threshold
Unit, or LTU for short,
in 1957. This builds on the idea by assigning weights to those inputs, so instead of just simple ON
and OFF switches, we now have weights on each of those inputs that can be tuned further,
and again this is working more toward our understanding of the biology, different connections between
different neurons may have different strengths and we can model those strengths in terms of these weights
on each input coming into our artificial neuron.
We're also going to have the output be given by a step function.
So this is similar in spirit to how we were using it before,
but instead of saying we're going to fire if a certain number of inputs are active, well, there's no
concept anymore of active or not active,
there are weighted inputs coming in, and those weights could be positive or negative.
So we'll check whether the sum of those weighted inputs is greater than zero:
if it is, we'll go ahead and fire, and if it's less than zero,
we won't do anything.
So just a slight adaptation to the concept of an artificial neuron here where we're introducing weights
instead of just simple binary ON and OFF switches.
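Here's a minimal sketch of that idea in Python (not the original 1957 formulation, just the concept): an LTU takes a weighted sum of its inputs and passes it through a step function. With suitable made-up weights, it even reproduces the OR wiring from earlier.

```python
def step(x):
    """Step activation: fire (1) if the weighted sum is greater than zero, else 0."""
    return 1 if x > 0 else 0

def ltu(inputs, weights):
    """Linear Threshold Unit: weighted sum of the inputs, then a step function."""
    return step(sum(i * w for i, w in zip(inputs, weights)))

# With both weights set to 1, this LTU behaves like the OR wiring from earlier:
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} OR {b} = {ltu([a, b], [1.0, 1.0])}")

# An AND can be built by adding a third, always-on input with weight -1,
# so the sum only exceeds zero when both real inputs are active:
print(ltu([1, 1, 1], [1.0, 1.0, -1.0]))  # AND(1, 1) -> 1
print(ltu([0, 1, 1], [1.0, 1.0, -1.0]))  # AND(0, 1) -> 0
```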
So let's build upon that even further and we'll create something called the perceptron, and a perceptron
is just a layer of multiple linear threshold units.
Now we're starting to get into things that can actually learn,
OK? So by reinforcing the weights between these LTUs that produce the behavior we want, we can create
a system that learns over time how to produce the desired output,
and again, this also is working more toward our growing understanding of how the brain works. Within the
field of neuroscience there's a saying that goes "cells that fire together wire together,"
and that's kind of speaking to the learning mechanism going on in our artificial perceptron here where
we have weights that are leading to the desired result that we want, you know, we can think of those weights
again as strengths of connections between neurons,
we can reinforce those weights over time and reward the connections that produce the behavior that we
want.
OK? So you see here we have our inputs coming in with weights just like we did with LTUs before, but now
we have multiple LTUs ganged together in a layer, and each one of those inputs gets wired to each individual
neuron in that layer,
OK?
And we then apply a step function to each one. This might be used for classification,
maybe a perceptron that tries to classify an image into one of three things or something
like that.
Another thing we introduce here is something called the bias neuron, off there on the right.
That's just there to make the mathematics work out: sometimes we need to add in a little fixed constant
value, and that bias is something else you can tune as well.
So this is a perceptron,
we've taken our artificial neuron, extended it into a linear threshold unit, and now we've put multiple
linear threshold units together in a layer to create a perceptron,
and already we have a system that can actually learn,
you know, you can actually try to optimize these weights and you can see there's a lot of them at this point.
If you have every one of those inputs going to every single LTU in your layer, they add up fast and
that's where the complexity of deep learning comes from.
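A single-layer perceptron can be sketched as a fully connected layer of LTUs, each with its own weights plus a bias (the bias neuron is modeled here as a constant added to each unit's sum). All the numbers below are made up for illustration; the point is how fast the weight count grows.

```python
def step(x):
    return 1 if x > 0 else 0

def perceptron_layer(inputs, weight_rows, biases):
    """Each output unit sees every input (fully connected), plus its own bias."""
    return [
        step(sum(i * w for i, w in zip(inputs, weights)) + bias)
        for weights, bias in zip(weight_rows, biases)
    ]

# Three inputs feeding two LTUs already means 3 * 2 weights + 2 biases
# = 8 numbers to tune, and real networks scale this up enormously.
inputs = [0.5, -1.0, 2.0]
weight_rows = [[0.1, 0.4, 0.3],    # weights into unit 1
               [-0.2, 0.1, 0.2]]   # weights into unit 2
biases = [0.0, -0.1]
print(perceptron_layer(inputs, weight_rows, biases))  # -> [1, 1]
```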
Let's take that one step further and we'll have a multi-layer perceptron.
So now instead of a single layer of LTUs,
we're going to have more than one, and we now have a hidden layer in the middle there. You
can see that our inputs go into a layer at the bottom, the output layer is at the top,
and in between we have this hidden layer of additional LTUs, linear threshold units, that can perform
what we call Deep Learning.
So here we have already what we would call today a Deep Neural Network.
Now there are challenges of training these things because they are more complex, but we'll talk about
that later on,
it can be done
and again, the thing to really appreciate here is just how many connections there are,
so even though we only have a handful of artificial neurons here you can see there's a lot of connections
between them and there's a lot of opportunity for optimizing the weights between each connection.
OK? So that's how a multi-layer perceptron works.
You can just see that again we have emergent behavior here, an individual linear threshold unit is a
pretty simple concept,
but when you put them together in these layers and you have multiple layers all wired together you can
get very complex behavior because there's a lot of different possibilities for all the weights between
all those different connections.
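A multi-layer perceptron is just one layer feeding the next: the hidden layer's outputs become the output layer's inputs. As one illustration of the extra power those hidden layers buy, the hand-picked weights below (my own made-up example, not from the lecture) implement XOR, something a single layer of LTUs cannot do on its own.

```python
def step(x):
    return 1 if x > 0 else 0

def layer(inputs, weight_rows, biases):
    """One fully connected layer of LTUs with a step activation."""
    return [
        step(sum(i * w for i, w in zip(inputs, ws)) + b)
        for ws, b in zip(weight_rows, biases)
    ]

def mlp(inputs, hidden_w, hidden_b, out_w, out_b):
    """Feed the inputs through a hidden layer, then an output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

# 2 inputs -> 2 hidden units -> 1 output.
# Hidden unit 1 computes OR of the inputs, hidden unit 2 computes AND;
# the output fires for "OR but not AND", which is exactly XOR.
hidden_w, hidden_b = [[1, 1], [1, 1]], [-0.5, -1.5]
out_w, out_b = [[1, -1]], [-0.5]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlp([a, b], hidden_w, hidden_b, out_w, out_b))
```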
Finally we'll talk about a modern Deep Neural Network
and really this is all there is to it,
you know, the rest of this course we're just going to be talking about ways of implementing something
like this,
OK?
So all we've done here is we've replaced that step function with something better,
we'll talk about alternative activation functions,
this one is illustrating something called ReLU, that we'll talk about later.
The key point there though is that a step function has a lot of nasty mathematical properties, especially
when you're trying to figure out its slope and derivatives, so it turns out that other shapes work
out better and allow you to converge more quickly when you're trying to train a neural network. 
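We'll cover ReLU properly later, but as a quick sketch: it passes positive values straight through and clips negatives to zero, so unlike the step function it has a useful slope wherever the input is positive.

```python
def step(x):
    return 1 if x > 0 else 0   # flat on both sides: slope is zero almost
                               # everywhere, which is bad news for training

def relu(x):
    return max(0.0, x)         # slope 1 for x > 0, so gradients can flow

for x in (-2.0, -0.5, 0.5, 2.0):
    print(x, step(x), relu(x))
```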
We'll also apply softmax to the output, which we talked about in the previous lecture,
that's just a way of converting the final outputs of our neural network or deep neural network into
probabilities, from which we can just choose the classification with the highest probability.
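As a reminder from that previous lecture, a minimal softmax exponentiates each raw output and normalizes so the results sum to 1 (the input values here are just made-up example scores):

```python
import math

def softmax(outputs):
    """Convert raw network outputs into probabilities that sum to 1."""
    m = max(outputs)  # subtract the max for numerical stability
    exps = [math.exp(o - m) for o in outputs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)                     # three probabilities summing to 1
print(probs.index(max(probs)))   # -> 0: pick the most probable class
```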
And we will also train this neural network using gradient descent or some variation thereof,
there are several of them to choose from, we'll talk about that in more detail as well.
That might use autodiff, which we also talked about earlier, to make that training more
efficient.
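Gradient descent gets a proper treatment later, but the core move fits in a few lines: repeatedly nudge a weight in the direction opposite the slope of the loss. This one-parameter example uses a made-up loss, (w - 3)^2, with the derivative worked out by hand (in a real framework, autodiff would compute it for us).

```python
def loss(w):
    return (w - 3.0) ** 2      # made-up loss, minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)     # derivative of the loss, done by hand here

w = 0.0                        # start from a bad guess
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)   # step downhill along the slope

print(round(w, 4))  # -> 3.0 (approximately: each step shrinks the error)
```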
So that's pretty much it!
You know, in the past five minutes or so that we've been talking, I've given you pretty much the entire
history of deep neural networks and Deep Learning, and those are the main concepts. It's not that complicated,
right!?
That's really the beauty of it.
It's emergent behavior, you have very simple building blocks,
but when you put these building blocks together in interesting ways, very complex and frankly mysterious
things can happen.
So I get pretty psyched about this stuff.
Let's dive into more details on how it actually works
up next.
