Hi there and thanks for joining me again
today. In this video I want to take quite
a relaxed look at the topic of neural
networks and work through a bit of an
online course as a way to introduce
people to the subject. Neural networks are
not something that I am an expert in;
it's something that I'm just learning
for my own interest, so for people
who want to do the same, hopefully
you can sit back and watch this video
and enjoy just learning a few new things
along the way. This study series is part
of a partnership I have with Brilliant.org
so thanks to them for letting me
show you through their course. There will
be a link in the description if you
would like to sign up for your own
account so that you can follow through
the course with me.
So I'm here on introduction to neural
networks and I'm going to go down to the
quiz I want to work through today and
that is this one here called neural
networks. So far in the course they have
gone through a bit of an introduction
into artificial intelligence and the
challenge of trying to create a machine
that can essentially think or play games
in a way that a human player might do as
well. It's not such an easy job because
as it says here, "the human brain
automatically performs tasks that don't
break easily into algorithmic rules for
a computer to follow." So to create an
intelligence that resembles that of a
human what we're trying to do in neural
networks is kind of model the way that
the brain is structured and that is
with neurons. So it says here, "the brain
contains more than 80 billion cells
called neurons, which influence one
another via small pulses of electricity.
A particular neuron doesn't talk
directly with every other neuron; they are
connected into structures that perform
specialized functions. These structures
are biological neural networks." We've got
a little picture of a neuron here. So for
us, "every time we learn a new game, a
dance or a mathematical skill, neurons
strengthen their lines of communication
with some neurons and prune their
connections with others. The structure of
neural networks evolves as you gain new
abilities." "Artificial neurons, the basic
units of an artificial neural network,
behave more or less like biological
neurons. An artificial neuron responds to
information in signals it receives, and
sends out its own signal to other
neurons." We have one artificial neuron
down here with four inputs and each of
these inputs can either be on or off.
When the neuron is completely colored
in black like it is now that means it
is fully activated by the inputs, and if
it was fully white that means its
activity is inhibited. And if it's some
mixture in between maybe something like
this halfway then it is just partially
activated. Some of these inputs increase
the activation (those are the ones in
green), and the one in pink decreases it;
see, if I turn them all on and then turn
the pink one on, the activation actually
goes down. So,
inputs can either be positive or
negative in that sense. In this next example
we have a snack dispenser with cookies
or celery, and if either the cookies or
the celery or both are turned on at the
input, the neuron will be fully activated.
This little white triangle here is the
bias of the neuron so we could increase
that and now when we turn on either of
these inputs they don't quite reach the
bias, so the neuron isn't activated. The
bias needs to be back in the middle here
for the neuron to be activated by either input. The
point is that these little artificial
neurons follow rules mechanically as we
go from inputs to outputs, and they can
respond differently to the same inputs
if we make slight changes to their
internal configuration. The real power of
the neurons comes from wiring them
together into a network. Our neural
network is kind of a waterfall of
individual neurons. We feed the output of
one neuron into the input of another and
it cascades like that along the network.
It says here that, "By stacking neurons
in layers we create the potential to
make increasingly complex predictions...
as long as we can find the right
connections between the layers."
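Just to make that concrete, here is a rough sketch in Python of a single artificial neuron like the one in the course. The weights, the bias, and the choice of a sigmoid are my own illustrative values, not anything taken from Brilliant's code:

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs minus the bias,
    squashed through a sigmoid so the activation lies between 0 and 1."""
    total = sum(i * w for i, w in zip(inputs, weights)) - bias
    return 1 / (1 + math.exp(-total))  # sigmoid activation

# Four on/off inputs: three excitatory (positive weights), one inhibitory (negative)
weights = [1.0, 1.0, 1.0, -2.0]
bias = 0.5

all_on = neuron([1, 1, 1, 1], weights, bias)
pink_off = neuron([1, 1, 1, 0], weights, bias)
# Turning on the negatively weighted ("pink") input drags the activation down
print(pink_off > all_on)  # True
```

With all four inputs on, the inhibitory input partially cancels the excitatory ones, which is exactly the dip you see when the pink input is switched on.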
And that's what it seems to come down to:
forming the right connections with the
right weights, so we know which inputs
we want to cascade throughout the
network. In this little example here, we
have an artificial network that can
recognize the numbers 0 through 9. So we
can draw something down here and
hopefully this neuron up here
corresponding to 2 will be quite filled
quite activated and that will indicate
that it knows that it's a 2. You can see
that it's never completely binary; as we
go on there are elements of activation
in many of these neurons in the top
layer. But it seems to be doing a
pretty good job at recognizing which
number I'm drawing. Maybe my 4 doesn't
look quite like theirs; maybe I need an
open-top 4 here. There we go, that worked better.
So as I'm drawing on the input grid the
input moves into two layers of 25
neurons which then result in our guess
in the top layer up there. We'll come
back to this particular network and
actually have a look at the real code
that is behind it a little bit later in
the video but we'll continue going with
the course for now just keep in mind
that we'll come back to this.
We've jumped into a quiz called 'decision
boundaries' and here they are modeling a
binary neuron which just means that it
can either be completely activated or
not activated on or off: there's no
in-between, and they're building this
model of the binary neuron just by
thinking about a little laser and an LED.
So the LED will turn on when the input
intensity from the laser is enough to
reach the bias setting on the LED. So
anything above the setting, the LED will
turn on, anything below and it's off. We
could move that bias to be a little bit
lower and yeah we'd be able to turn it
on with less intensity of light. That
enables us to answer how the intensity
and the bias are related when the LED is
activated: it's when the intensity
is greater than or equal to the bias
level. We could also have an LED with two
lasers. Remember that this is the input
to our neuron, and if it's in a neural
network it might be the output from the
previous row of neurons. In this case to
activate the LED we would need some
intensity from either one or both lasers.
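In code, this two-laser LED is nothing more than a sum compared against the bias; a tiny sketch, with made-up intensity values:

```python
def led_on(intensity_1, intensity_2, bias):
    """Binary neuron: the LED lights only when the combined
    laser intensity reaches the bias setting."""
    return intensity_1 + intensity_2 >= bias

print(led_on(3, 5, bias=7))  # True: 3 + 5 = 8 reaches the bias
print(led_on(2, 4, bias=7))  # False: 6 falls short, the LED stays off
```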
All that matters is that the intensity
from laser 1 plus the intensity from
laser 2 is bigger than or equal to the
bias level. In this example in the quiz
we're imagining that we're playing a
game against a robot opponent, and at
some point the robot decides that it's
had enough and doesn't want to play
anymore. And we're going to plot some
data to try and see if we can predict
when the robot will want to keep playing
and when it will want to quit. So on this
plot here we have some data points where
the input 1 along the x axis is the
number of games that
we have won, and input 2 on the y-axis is
the number of games that we have lost. If
the data point is colored in green it
means that the robot wanted to quit and
if it's colored in pink that means it
wanted to keep playing. Then we can apply
our little model of the neuron where we
were firing lasers at an LED, and in this
case input 1 is like our first laser and
input 2 is like our second laser. So
we're getting a total intensity of light
depending on our inputs, and we're also
able to control our bias. And in this
case, because we're plotting it on this
graph, moving the bias will actually move
this line through our plot. You see this
bias line is in fact the place where
intensity 1 plus intensity 2 is equal to
the bias. If we set our bias to 7 then we
are able to separate the inputs into
what we want. We want these two green
inputs to be able to activate the LED,
that's when the robot wants to quit, and
we want these pink ones to not activate
the LED - so that looks good. The fact that
the line goes through this green point
up here is actually fine since anything
equal to the bias or above will be able
to activate the LED, as opposed to if we
put it at number 6: this one would have
been activated and we don't want it to
be. So I've set the bias of this LED to 7
like we worked out in the previous graph
and we can try the little problem that
they give us. So they say we've won three
games and the opponent has won four and
indeed the LED is activated here, so that
would mean that it is predicted that our
friend will want to quit. We could check
it graphically: it just means that our
combination of inputs is putting us up
here in the green region that turns the
LED on. You see we have an input of 3 and
an input of 4, we're actually on that
bias line there. On the next page it
tells us that this line in the input
space is called a 'decision boundary'
and placing this
boundary optimally is how a neuron or a
neural network makes predictions about data.
It is also possible to have a
decision boundary with a negative slope.
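The whole robot example fits in a couple of lines. The bias of 7 comes from the quiz, but the sample points below are ones I've made up to stand in for the plot:

```python
def robot_quits(games_won, games_lost, bias=7):
    """Binary neuron as a decision boundary: the robot quits once
    the total number of games played reaches the bias."""
    return games_won + games_lost >= bias

print(robot_quits(3, 4))  # True: 3 + 4 = 7, exactly on the boundary line
print(robot_quits(2, 3))  # False: below the line, the robot keeps playing
```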
Jumping then to the next quiz, which is
called 'classification' we then have
another example but this time using
continuous data. The particular example
that they've given us is a bunch of
marbles, and some of the marbles are
defective. So ordinarily the marbles
should be balanced, but some of them are
defective because they have impurities
inside them. The person in the example
has measured the diameter and the mass
of every marble and plotted that on a
graph, and we're going to try and
separate, based on diameter and mass,
which marbles might be defective. To
handle this now-continuous data we have
our little model of the neuron back, and
we can adjust these sliders to try and
push the activation over the bias, and
there we go. Our inputs
range from zero to nearly ten but it
says here on the next page a neuron
doesn't understand grams and centimeters
or any other units for that matter. So to
map between our raw data, which is mass
and diameter, and the neuron's input space,
which is the intensity one and intensity
two, we need to introduce two unknown
conversion factors. W_d multiplies the
diameter to give one intensity, and W_m
multiplies the mass to turn it into the
other input.
We still need our total intensity to be
greater than or equal to the bias for
the neuron to activate, so our new
activation equation would be W_m times m
plus W_d times d being bigger than or
equal to the bias. That inequality that
we just came up with imposes a decision
boundary on the scatter plot of data.
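As a sketch, the marble-classifying neuron now looks like this; the weight and bias numbers here are placeholders of my own, not values from the course:

```python
def marble_defective(mass, diameter, w_m, w_d, bias):
    """Weighted neuron: the weights w_m and w_d convert grams and
    centimeters into the neuron's unitless intensity space."""
    return w_m * mass + w_d * diameter >= bias

# Changing w_m and w_d tilts the decision boundary; the bias shifts it.
print(marble_defective(5.0, 2.0, w_m=1.0, w_d=2.0, bias=8.0))  # True: 5 + 4 = 9
print(marble_defective(3.0, 1.0, w_m=1.0, w_d=2.0, bias=8.0))  # False: 3 + 2 = 5
```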
So we can adjust the values of W_m and W_d,
as well as our bias, to adjust this line
accordingly. It asks: what aspect of the
decision boundary do W_m and W_d control?
Well, just from fiddling around,
they seem to be influencing the
slope of the boundary. This gives us more
flexibility than just being able to
adjust the bias alone. See, only adjusting
the bias, we'd never really be able to
get a good fit here. Now, what's important
here is that these conversion factors,
W_m and W_d, are actually called the
input weights, or often just the
'weights'. One weight multiplies each
input to the neuron. Weights can be set
to whatever values produce the best
classification of your data. Throughout
this little quiz here, we've been
adjusting the neuron's weights and bias
to place a decision boundary on the
scatterplot. It says that "Adjusting the
weights and bias of a neuron to classify
labeled data is known as training."
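The course doesn't show any training code at this point, but the classic perceptron update rule is one simple way to do exactly what's described: nudge the weights and bias whenever a labeled point ends up on the wrong side of the boundary. A sketch, with invented data:

```python
def train(points, labels, epochs=20, lr=1):
    """Perceptron-style training: adjust the weights and bias
    whenever the current decision boundary misclassifies a point."""
    w1, w2, bias = 0, 0, 0
    for _ in range(epochs):
        for (x1, x2), label in zip(points, labels):
            guess = 1 if w1 * x1 + w2 * x2 >= bias else 0
            error = label - guess        # 0 if correct, +1 or -1 if wrong
            w1 += lr * error * x1
            w2 += lr * error * x2
            bias -= lr * error
    return w1, w2, bias

# Invented, linearly separable data: label 1 when x1 + x2 is large
points = [(1, 1), (2, 1), (5, 4), (6, 5)]
labels = [0, 0, 1, 1]
w1, w2, bias = train(points, labels)
guesses = [1 if w1 * x1 + w2 * x2 >= bias else 0 for x1, x2 in points]
print(guesses == labels)  # True: the learned boundary separates the data
```

The update rule only fires on mistakes, which is the same intuition the course builds later: if the guess is correct, nothing happens; if it's wrong, the machinery is adjusted.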
So really what I hope we've gained here is
a little bit of intuition about the idea
of training a neuron, or even training
an entire network of neurons. It would
seem that this idea of adjusting the
weights is really crucial in making our
network do what we want it to and become
good at classifying data. And with that I
think we will jump back into our first
quiz that we looked at and go and look
at some of the actual code that would
run this thing.
So I'm back here at our digit identifier
but this time it is one where all the
connections between layers are
randomized. So it's one where the weights
of our inputs are not correct as they
were before. So let's see if I try to
draw a 1 in here, well it's not very
good anyway, but you can see it has no
idea. What about a 2, or a 3? It's really
giving us no clear guess here. So we can
see how important getting the
connections right is. Just as a high-level
look at how it learns before we take a
look at the code, it says that "during
training an artificial neural network is
fed examples of digits. For each input
the neural network makes its best guess
about what digit is presented and its
guess is compared to the correct answer.
If its guess is correct then nothing
happens but if it's incorrect then the
computational machinery within the
neural network is updated so it's more
likely to be correct the next time."
Now, the code that I'm going to show you
isn't actually part of the Brilliant
course; I got a copy of it by contacting
the course's author and asking him if I
could take a look. The intention of
this course is just to give us more
of an intuition for the overall ideas
at work in neural networks, not for us
to understand the code or to learn
Python, which is the coding language
behind it. But I thought it would be
interesting to show you anyway. So
that's just a disclaimer to
not expect to find the code if you're
working through the course yourself, but
this is the actual code which is running
our digit identifier. Some of you might
know more coding than others and if you
know some already you might be able to
read into this a bit more, but I'll just
go through it quite briefly and give you
a real example of what these neural
networks actually look like if you were
to code one for yourself. At the start we
import a bunch of things including
TensorFlow, which is a library that I
believe was created by
Google for doing a lot of this
artificial intelligence stuff.
And TensorFlow will kind of do a lot of the
hard work for you. The code then moves on
to reading in some data. We have some
training data and some test data. And
actually the data set is called MNIST,
which is this big collection of
hand-drawn digits. I believe it was a
bunch of digits drawn by high school
students and other people in the U.S.
We move down to here, where we're making
lists for our training and test data.
This processing involves calling this
function up here, which shifts the image:
it's going to shift around the images in
our dataset by some random number of
empty rows and columns. That will ensure
that the test and training data we're
using is more spread out. Not every digit
will be perfectly in the center; they'll
be shifted around a bit, which will
hopefully lead to better
performance for our neural network. The
next important step is organizing the
labels for the training and test data.
Now, our labels will actually be a vector
of length 11 (that's our 10 digits plus a
blank), and this vector will have zeros
in every spot except the one
corresponding to the number in question.
So if it was confident that it was a 2,
then the vector would be 0 everywhere
except for a 1 in the spot corresponding
to the 2. It is strict zeros and ones for
the training data because we know the
ground truth, we're given that with the
data set, but when the network is
actually trying to work it out for
itself it wouldn't be as binary. Like we
have here: maybe 0.9 in the spot
corresponding to 2, and a little bit,
maybe 0.1, in the spot corresponding to
3. Try a little squiggle here and, yeah,
they've got maybe 80% in the 7, maybe 20%
in the 2, and a little bit in the 3 even.
And here's the part where
we actually make the model. We are using
TensorFlow
and some part of the TensorFlow library
called Keras, which must be the part
that deals with the neural networks, and
we're setting up a sequential network
probably meaning that we're feeding the
output from one layer of neurons to be
the input for the next layer and it sort
of cascades like that. We're setting up
each layer to be dense, so each neuron
will be connected to every neuron in the
neighboring layers. We have a row of 25,
a row of 25, and then a final row of 11.
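Roughly, here is what those stacked dense layers compute, written out in plain Python instead of Keras. The input size of 100 pixels and the random weights are placeholders of mine (in the real network the weights come from training):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One dense layer: every output neuron takes a weighted sum of
    every input, adds its bias, and squashes the result with a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def random_layer(n_in, n_out):
    # Placeholder weights; training would replace these with useful values.
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [random.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

# Same shape as the course's network: input pixels -> 25 -> 25 -> 11
n_pixels = 100  # assumed input grid size, purely for illustration
layers = [random_layer(n_pixels, 25), random_layer(25, 25), random_layer(25, 11)]

activations = [random.random() for _ in range(n_pixels)]  # a fake drawing
for weights, biases in layers:
    activations = dense_layer(activations, weights, biases)

print(len(activations))  # 11: one output neuron per digit, plus the blank
```

Keras's Sequential/Dense machinery is doing essentially this cascade for us, just much faster and with trained weights.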
Our activation function is a sigmoid
shape, so our neurons are not binary:
they don't just hold 0 or 1, they can
have anything in between. We then compile
the model down here. We're using
something called 'Adam', which is a kind
of gradient descent, a way of
learning: a way of trying to optimize a
function by looking for the bottom, by
descending the gradient of the function.
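As a toy picture of what "descending the gradient" means, here is plain gradient descent on a simple one-variable function; Adam layers refinements such as momentum and per-parameter step sizes on top of this basic idea:

```python
def grad(x):
    """Gradient of the loss f(x) = (x - 3)^2, which has its minimum at x = 3."""
    return 2 * (x - 3)

x = 0.0              # arbitrary starting point
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * grad(x)  # step downhill, against the gradient

print(round(x, 4))  # 3.0: we have descended to the bottom of the function
```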
And we define our loss, which I believe
is quite important, because the loss is
really the thing that you are trying to
optimize. You want your loss to be as
small as possible, because that would
mean you're getting closer to getting
the answers correct. Our loss is the
distance between our prediction and the
ground truth, and with the training data
we know the ground truth (that's the
number it actually is; we're told that
in the dataset), so we can compare it to
our prediction and get this scalar value,
our loss, and we want that to go down,
to be as small as possible. Then you
can see here what it looks like when we
start to train the network. We have a few
iterations I guess showing up here. And
then all this stuff at the bottom is
just to do with extracting our weights
and biases so I think they can plot them
on this nice little visualizer here.
And there you go, that was the code
behind making something like this. I hope
that this video jumping from a little
bit of the intuition of individual
neurons to kind of how they fit together
in the actual code example has been
interesting for you. You can jump along
to the link in the description if you
would like to view this Brilliant course
for yourself. So thank you for watching
and I'll see you next time :)
