Good day everyone.
This is Dr. Soper here, and today I’ll be
discussing the foundations of artificial neural
networks and deep Q-learning.
If you’re unfamiliar with Q-learning, then
I would recommend that you watch the earlier
video in this series entitled “Foundations
of Q-Learning” before watching this video.
Before we begin discussing the foundations
of artificial neural networks and deep Q-learning,
let’s briefly review what you’ll learn
in this lesson.
By the time you have finished this video,
you will know:
What artificial neurons are;
What activation functions are;
How neural networks work;
How neural networks learn;
What deep Q-learning is; and
How deep Q-learning works.
Once we understand all of these concepts,
we’ll be fully equipped to start building
some AI models that rely on artificial neural
networks and deep Q-learning.
Let’s get started!
Before we can understand artificial neural
networks and how they work, we’ll first
need to learn about artificial neurons and
activation functions.
So, what is an artificial neuron?
Well, artificial neurons are the elementary
building blocks from which all artificial
neural networks are built.
They were first proposed by Warren McCulloch
and Walter Pitts in 1943 as a mathematical
model of biological neurons, which form the
basis of all animal brains, including the
human brain.
It was these biological neurons that served
as the primary inspiration for the artificial
neuron model.
As shown in this diagram, the purpose of an
artificial neuron is simply to transform one
or more input values into an output value.
Each input value is multiplied by a weight,
the effect of which is to adjust the strength
of the input.
For example, if the input value is 0.8, and
the value of the weight is 0.5, the result
after multiplying these two values together
would be 0.4.
In this scenario, the weight would have served
to reduce the strength of the input value.
If the value of the weight had been greater
than 1, then the strength of the input value
would have been increased.
After the weighted input values have been
calculated, they are run through an activation
function in order to produce the artificial
neuron’s output value.
Finally, note that the values of the weights
can be adjusted during training in order to
minimize error.
We will return to this idea in a few minutes.
Next, let’s learn about activation functions.
In brief, an activation function is simply
a mathematical function that an artificial
neuron uses in order to transform its weighted
input values into an output value.
As shown in this equation, the activation
function takes just one input value, which
is calculated as the linear sum of each input
value multiplied by that input value’s associated
weight.
Put differently, this equation tells us that
we just need to multiply each input value
by its associated weight, and then add all
of the results together in order to get the
single value that will be passed into the
artificial neuron’s activation function.
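To make this concrete, here is a minimal Python sketch of that weighted-sum calculation, using the example values from earlier (an input of 0.8 and a weight of 0.5); the function name is illustrative:

```python
def weighted_sum(inputs, weights):
    # Multiply each input value by its associated weight,
    # then add all of the results together to get the single
    # value passed into the neuron's activation function.
    return sum(x * w for x, w in zip(inputs, weights))

# The example from the lesson: 0.8 * 0.5 = 0.4.
print(weighted_sum([0.8], [0.5]))  # 0.4
```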
Let’s learn a bit more about different types
of activation functions.
It’s important to note that many different
activation functions can be used in an artificial
neuron.
Each activation function behaves differently
with respect to the way that it transforms
its input value into an output value.
To better understand this idea, let’s consider
these four activation functions, which are
among the most common types of activation
functions used with artificial neurons.
In this first example, we see what is called
a threshold activation function.
The threshold activation function returns
one of just two possible values: 0 or 1.
When the value passed into the threshold function
is greater than or equal to zero, the function
returns a 1.
Otherwise, if the value passed into the function
is less than zero, then the function returns
a zero.
The output values for artificial neurons that
use a threshold activation function will therefore
always be 0 or 1.
This next activation function is called a
sigmoid activation function.
The output of the sigmoid activation function
will always be a number between 0 and 1.
Positive input values will yield output values
that get closer and closer to 1.0 as the input
values increase.
Conversely, negative input values will yield
output values that get closer and closer to
0.0 as the input values decrease.
Again, the output of the sigmoid activation
function will always be a number between 0
and 1.
The third activation function is called a
hyperbolic tangent activation function.
The hyperbolic tangent function is very similar
to the sigmoid activation function, except
that it will always yield an output value
between negative 1.0 and positive 1.0.
Thus, positive input values will yield output
values that get closer and closer to positive
1.0 as the input values increase, while negative
input values will yield output values that
get closer and closer to negative 1.0 as the
input values decrease.
This final activation function has become
very popular in the past few years, and is
known as the rectified linear unit (or ReLU)
activation function.
For input values that are less than or equal
to zero, the ReLU function will simply return
a zero.
For input values that are greater than zero,
the ReLU activation function simply returns
the input value itself.
Thus, output values from the ReLU activation
function will always be between zero and positive
infinity.
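The four activation functions just described can be sketched in Python as follows (the function names are illustrative):

```python
import math

def threshold(x):
    # Returns 1 when the input is greater than or equal to zero,
    # otherwise returns 0.
    return 1 if x >= 0 else 0

def sigmoid(x):
    # Always returns a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

def hyperbolic_tangent(x):
    # Always returns a value between -1.0 and 1.0.
    return math.tanh(x)

def relu(x):
    # Returns 0 for inputs less than or equal to zero,
    # otherwise returns the input value itself.
    return max(0.0, x)
```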
Before proceeding to the next topic, let’s
revisit our artificial neuron diagram so that
we can be sure that we understand the big
picture.
Remember, the basic process is that the sum
of the weighted input values is passed into
an activation function, which yields an output
value.
The nature of the output value will depend
on the specific activation function that is
being used.
For example, if we’re using the sigmoid
activation function, then we know that the
output value will always be between zero and
one.
Finally, note the important role of the weights.
When we were learning about activation functions,
we saw that the output value always depends
on the input value.
Since the values of the weights directly affect
the input value, the weights also therefore
affect the output value.
This idea will be important when we learn
about how the values of the weights are adjusted.
Now that we’re familiar with artificial
neurons and activation functions, let’s
learn about neural networks.
Put simply, an artificial neural network is
an interconnected collection of artificial
neurons that are arranged into layers.
In the example depicted in this diagram, the
neural network contains three layers – an
input layer, a hidden layer, and an output
layer.
Each of the circles or nodes in the figure
represents an artificial neuron, while each
of the arrows shows how information flows
through the network from one node to the next.
Remember that there is a weight value associated
with each of these paths.
As you can see on the diagram, the outputs
from one layer become the inputs into the
next layer.
Keeping in mind what we’ve learned about
artificial neurons, this means that the output
of an artificial neuron can become an input
into other artificial neurons, with each inter-neuron
connection having its own weight value.
Now let’s consider these layers in a bit
more detail.
To begin, the input layer consists simply
of the values for each input (or predictor)
variable.
These variables are often called features.
The artificial neurons that comprise the input
layer typically do not have an activation
function.
This simply means that the values for each
input variable are passed directly to the
artificial neurons in the next layer without
being transformed by an activation function.
Next, note that an artificial neural network
can contain what are called hidden layers.
Such layers are called “hidden” because
they are not visible outside of the model.
Put differently, when viewed from the outside,
we know what the input values are, and we
can see what the output values are, but we
have no knowledge about how those inputs are
actually converted into the outputs.
This information is hidden inside the hidden
layers.
Finally, the output layer contains the neural
network’s predictions, which are often referred
to as labels.
And that’s it!
As you can see, neural networks are not actually
as difficult to understand as many people
seem to think.
We just take some input values, multiply them
by some weights, and then run the results
through an activation function.
This yields output values that are then multiplied
by more weights, and then run through the
activation function in the next layer.
The process is repeated until we reach the
last layer, which is the output layer.
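As a rough illustration of this layer-by-layer process, here is a minimal forward-propagation sketch in Python, using the sigmoid activation from earlier; all of the weight values here are arbitrary, made-up numbers:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weight_rows):
    # Each row of weights produces one neuron's output: the weighted
    # sum of the inputs, run through the activation function.
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

# Illustrative weights: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_weights = [[0.2, -0.5], [0.7, 0.1], [-0.3, 0.8]]
output_weights = [[0.5, -0.6, 0.9]]

inputs = [0.8, 0.4]                          # input layer (features)
hidden = layer_forward(inputs, hidden_weights)   # hidden layer
output = layer_forward(hidden, output_weights)   # output layer (label)
print(output)  # a single prediction between 0 and 1
```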
Now, for those of you who are familiar with
linear regression, I would like to make a
very interesting point here.
Specifically, note that a neural network with
no hidden layers and just one node in the
output layer is conceptually identical to
a standard linear regression model!
Without any hidden layers, we make predictions
simply by multiplying each input value by
a weight, and then summing the results.
I hope you’ll agree that this connection
between neural networks and linear regression
is quite interesting!
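To make this point concrete, here is a tiny sketch: with no hidden layers and a single output node with no activation function, the prediction is just the sum of the weighted inputs, which is exactly the form of a linear regression prediction (the values shown are illustrative):

```python
def predict_linear(inputs, weights):
    # A "network" with no hidden layers and one output node:
    # multiply each input value by a weight and sum the results,
    # just like a linear regression model.
    return sum(x * w for x, w in zip(inputs, weights))

print(predict_linear([1.0, 2.0], [0.5, -0.25]))  # 0.0
```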
Next, let’s talk about how a neural network
learns to transform input values into predictions.
The big picture view is that a neural network
learns by repeatedly considering a set of
input values and trying to predict their corresponding
output values as accurately as possible.
This means that we need to provide the neural
network with many examples of input values
with known output values in order for it to
learn.
Neural networks therefore rely on supervised
learning.
From a more detailed perspective, the way
that a neural network learns is by iteratively
adjusting the weights for all of the nodes
in the input layer and the hidden layers with
a view toward minimizing overall prediction
error.
Recalling the role that the weights play in
determining the input and output values for
each artificial neuron, we can now understand
how adjusting those weights can cause the
neural network’s predictions to change.
Each iteration of the learning process proceeds
as follows:
First, beginning at the input layer, one or
more sets of input values are passed through
the network, yielding one or more predictions
– this is called forward propagation.
Next, we measure the amount of error in the
predictions.
Working backward from the output layer toward
the input layer, the neural network next uses
the information about the prediction error
to adjust the weights in such a way as to
reduce that error – this is called back
propagation.
During the back propagation process, the weights
themselves are updated using a method called
stochastic gradient descent, or one of its
many variants.
Finally, after all of the weights have been
adjusted, the process is repeated using the
next set or sets of input values.
This training process continues until we either
run out of training examples, or until the
average amount of prediction error becomes
sufficiently small.
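The learning loop just described can be illustrated, in miniature, by training a single sigmoid neuron with stochastic gradient descent. This is only a sketch: a constant 1.0 input is included as a bias term (a detail beyond the lesson’s diagram), and the gradient is the standard chain-rule derivative for a sigmoid neuron with squared error:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(inputs, weights):
    # Forward propagation for one neuron: weighted sum -> activation.
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)))

# Toy supervised-learning examples (logical OR); the constant 1.0
# input acts as a bias term.
examples = [([0.0, 0.0, 1.0], 0.0), ([0.0, 1.0, 1.0], 1.0),
            ([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]

random.seed(0)
weights = [random.uniform(-0.5, 0.5) for _ in range(3)]
lr = 1.0  # learning rate

for epoch in range(5000):
    for inputs, target in examples:
        pred = predict(inputs, weights)   # forward propagation
        error = pred - target             # measure prediction error
        # Back propagation for one neuron: chain rule through the
        # sigmoid, then a stochastic-gradient-descent weight update.
        grad = error * pred * (1.0 - pred)
        weights = [w - lr * grad * x for w, x in zip(weights, inputs)]

print([round(predict(i, weights), 2) for i, _ in examples])
```

After training, the neuron’s predictions are close to the known output values, because each weight update nudged the prediction error downward.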
And that’s all there is to it!
You now understand how a neural network works,
and how neural networks learn to generate
accurate predictions from a set of input values.
Next, let’s learn about deep Q-learning.
For this part of the video, it will be useful
to recall what you learned earlier in our
“Foundations of Q-Learning” video about
states, actions, rewards, and Q-values.
So, what is deep Q-learning?
Well, deep Q-learning is simply a combination
of Q-learning and what is known as deep learning.
The term “deep learning” is being used
more and more frequently in the artificial
intelligence and cognitive computing community,
but it’s really not as complex or intimidating
as many people might believe.
After all, a deep learning model is simply
a neural network with multiple hidden layers!
Thus, if you were to take the simple neural
network that we discussed earlier and add
one or more additional hidden layers, you
would have created a deep learning model.
Recalling what you learned earlier in this
series about reinforcement learning in general,
and Q-learning in particular, in deep Q-learning,
the inputs into the deep neural network are
the states of the environment, and the outputs
are the set of Q-values for each action associated
with the input state.
In deep Q-learning, we’re thus using the
neural network to predict the Q-values, rather
than using the Bellman equation!
Just as in regular Q-learning, in deep Q-learning
the AI agent will usually take the action
that has the largest Q-value.
We say “usually” rather than “always”
because, just as with all reinforcement learning
methods, we need a mechanism that encourages
the AI agent to explore its environment!
I’ll explain exactly what this mechanism
is in a few moments, but first, let’s talk
briefly about the outputs of a deep Q-learning
neural network.
Just as with any neural network, in order
to train our deep Q-learning network, we’ll
need to provide the network with examples
of inputs and outputs so that it can learn.
In deep Q-learning, the targets of an input
state (that is, the outputs of the neural
network) are the Q-values associated with
each possible action for that input state.
The Q-values, as always, are our current predictions
of the sum of all future rewards that we would
receive if we were to take each action.
These Q-values are computed using part of
the temporal difference formula.
As shown in this equation, the target value
that the deep Q-learning network is trying
to predict for each possible action for the
current input state is equal to the sum of
the reward received from the action taken
in the previous state, and the largest Q-value
available for any action in the current state
multiplied by the discount factor.
As with regular Q-learning, the discount factor
(gamma) provides us with a way of discounting
future rewards.
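That target calculation can be sketched as a one-line Python function (the function name and the example numbers are illustrative):

```python
def q_target(reward, next_q_values, gamma=0.9):
    # The value the network should learn to predict for the action
    # taken: the reward received, plus the largest Q-value available
    # in the resulting state multiplied by the discount factor.
    return reward + gamma * max(next_q_values)

# Illustrative example: reward 1.0, best next Q-value 2.0, gamma 0.9,
# so the target is 1.0 + 0.9 * 2.0.
print(q_target(1.0, [0.5, 2.0, -1.0], gamma=0.9))
```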
Finally, let’s return to the issue of exploration.
In deep Q-learning, we encourage the agent
to explore by using what is called the softmax
function.
Remember the exploration-exploitation dilemma
that pervades all of reinforcement learning:
the AI agent needs to explore its options
in order to identify how each option might
influence the agent’s goal of maximizing
its total rewards.
This means that it cannot simply always choose
the action that appears to yield the greatest
reward.
Instead, the AI agent will occasionally need
to take actions that appear to be sub-optimal
at that moment in time in order to try to
discover new paths or approaches that may
yield greater overall rewards in the long
run.
When applied in the context of deep Q-learning,
then, the softmax function converts the set
of Q-values for a state into a set of probabilities
for each possible action that can be taken
in that state.
The softmax function thus yields a probability
distribution for our Q-values!
The action chosen by the AI agent is then
determined by taking a random draw from the
probability distribution that was returned
by the softmax function!
In this way, the AI agent is most likely to
take the action that appears to yield
the greatest reward, but it will occasionally
take actions that currently appear to be sub-optimal
in order to try to discover new information
that may yield greater overall rewards in
the long run.
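Here is a minimal Python sketch of softmax-based action selection as just described; the Q-values are made-up illustrative numbers, and subtracting the maximum before exponentiating is a standard numerical-stability trick not mentioned in the lesson:

```python
import math
import random

def softmax(q_values):
    # Convert a set of Q-values into a probability distribution.
    m = max(q_values)  # subtracted for numerical stability
    exps = [math.exp(q - m) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def choose_action(q_values):
    # Take a random draw from the softmax distribution: the action
    # with the largest Q-value is the most likely choice, but the
    # other actions can still occasionally be selected.
    probs = softmax(q_values)
    return random.choices(range(len(q_values)), weights=probs)[0]

q_values = [1.0, 2.5, 0.3]      # illustrative Q-values, three actions
action = choose_action(q_values)  # usually 1, occasionally 0 or 2
```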
Now that we have a good understanding of the
principles of artificial neural networks and
deep Q-learning, we can begin to create some
sophisticated artificial intelligence models
that rely on these technologies.
In the next video in this series, we’ll
see how neural networks and deep Q-learning-based
AI can be used to create a self-driving car.
We will definitely be getting a lot of practical,
hands-on experience in creating an AI model
in Python in the next video, so I hope you’ll
join me as we continue our adventures in cognitive
computing and artificial intelligence!
Well my friends, thus ends our lesson on the
foundations of artificial neural networks
and deep Q-learning.
I hope that you learned something interesting
in this lesson, and until next time, have
a great day.
