What’s going on everybody?
Today I’d like to talk to you guys about
neural networks.
There’s a lot of information out there about how to code different machine learning algorithms, most of it in Python, and that’s freakin’ great…
but just importing Logistic Regression or an SVM and fitting it to the data doesn’t really help you understand what’s going on behind the scenes…
Before starting to code a neural network, I think it’s more important to understand what the neural part of it actually means.
And yes, the Logistic Regression I was talking about earlier is a neural network, more specifically a single-layer neural network.
So today I’ll give a brief explanation of what exactly a neuron is, and then explain how single-layer neural networks actually work.
If we understand that, then the deep learning part of it will be a piece of cake.
Ok, so to better understand the idea behind
neurons, I’m gonna start with a classification
example.
A binary classification example where we have
two classes.
We have a positive and a negative class.
For example, let’s consider email classification, where the algorithm decides whether an email is spam or not.
In order to classify an email, we first need to define a decision function that takes a linear combination of the input values (x) and their corresponding weights (w).
This will be a linear combination of the form z = w1*x1 + w2*x2 + ... + wm*xm, the sum of the inputs multiplied by their weights.
The classification will be made given a defined threshold, let’s say 0.5.
If the net input is greater than 0.5, the prediction will be positive; otherwise, the prediction will be negative.
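If we wrote that decision function in Python, a minimal sketch could look like this (net_input, predict, and the default 0.5 threshold are just my own names and choices for illustration):

```python
import numpy as np

def net_input(x, w):
    # Linear combination: z = w1*x1 + w2*x2 + ... + wm*xm
    return np.dot(w, x)

def predict(x, w, threshold=0.5):
    # Positive class if the net input is above the threshold, negative otherwise
    return 1 if net_input(x, w) > threshold else -1
```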
And this is super simple, right?
And the learning rule is simple as well.
We initialise the weights to be close to 0,
or very small numbers.
And then for each training sample we compute the prediction and then we update the weights.
But how do we update the weights?
The update will be of the form delta w equals the input x, multiplied by the learning rate, and then multiplied by the difference between the true label and the predicted label.
And the learning rate will be a value between 0 and 1, depending on how big we want the update to be.
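In Python, a rough sketch of that rule might look like this (eta is the learning rate; the function name is my own):

```python
def update_weights(w, x, y_true, y_pred, eta=0.1):
    # Perceptron rule: delta_w = eta * (y_true - y_pred) * x
    # If the prediction is correct, y_true - y_pred is 0 and w stays unchanged.
    return w + eta * (y_true - y_pred) * x
```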
And this is a beautiful rule: if the prediction is correct, the weights remain unchanged; otherwise the weights are moved in the direction of the positive or negative true label.
This is the first concept of a neuron, and it was proposed in 1957.
It’s called the perceptron, and it’s one of the earliest examples of a machine learning classification algorithm.
In this basic diagram of the algorithm you can see how the net input is calculated from the inputs multiplied by the weights.
That net input is then passed to the threshold function, which classifies the data as either +1 or -1, the binary classification we were talking about.
And from that output, the delta is calculated and the weights are updated for the next run.
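Putting those pieces together, a minimal perceptron could be sketched like this (just an illustration under the assumptions above, not a polished implementation):

```python
import numpy as np

class Perceptron:
    def __init__(self, eta=0.1, epochs=10, threshold=0.5):
        self.eta = eta              # learning rate, between 0 and 1
        self.epochs = epochs        # passes over the training set
        self.threshold = threshold  # decision threshold from the example above

    def net_input(self, x):
        return np.dot(x, self.w)

    def predict(self, x):
        # Threshold (unit step) function: classifies as +1 or -1
        return np.where(self.net_input(x) > self.threshold, 1, -1)

    def fit(self, X, y):
        rng = np.random.default_rng(0)
        self.w = rng.normal(scale=0.01, size=X.shape[1])  # weights close to 0
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                # Move the weights toward the true label on misclassifications
                self.w += self.eta * (yi - self.predict(xi)) * xi
        return self
```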
An improvement to this algorithm came just three years later, in 1960, and it was named the Adaptive Linear Neuron.
With this algorithm, the weights are updated with the help of an activation function rather than a unit step function that returns either +1 or -1.
Here, the activation function is a function of the net input itself.
The type of activation function is hinted at in the name, Adaptive Linear Neuron.
So it’s a linear activation function; in fact, it’s simply the identity, so it returns the net input unchanged.
The net input is passed to the activation function, the true labels are compared to the output of the activation function, and the weights are updated from this continuous output rather than from the predicted labels given by the threshold function, as was the case with the perceptron.
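To make that difference concrete, here’s a rough sketch of one Adaline weight update (the names are mine; because the linear activation is the identity, the error is measured against the continuous output instead of the thresholded prediction):

```python
import numpy as np

def adaline_update(w, X, y, eta=0.01):
    output = np.dot(X, w)       # linear activation: the net input itself
    errors = y - output         # true labels compared to the continuous output
    return w + eta * np.dot(X.T, errors)  # one update over all samples
```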
For supervised machine learning algorithms we need a function that we can optimize during the training process.
This function is usually a cost function (for Adaline, the sum of squared errors between the true labels and the continuous output).
And a great way to minimise it is an optimization algorithm called gradient descent.
I’m not gonna go into it in depth now, as it’s pretty complex.
But the main idea is that gradient descent tries to find a local minimum of a function by taking small steps, and the size of those steps is defined by the learning rate.
So it moves down this convex curve until it reaches its minimum.
It starts at a random point and then takes steps in the opposite direction of the gradient.
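As a toy one-dimensional illustration (my own sketch, nothing more):

```python
def gradient_descent(grad, start, eta=0.1, steps=100):
    x = start
    for _ in range(steps):
        x -= eta * grad(x)  # small step opposite to the gradient
    return x

# Minimise f(x) = (x - 3)**2, whose gradient is 2 * (x - 3)
print(gradient_descent(lambda x: 2 * (x - 3), start=0.0))  # approaches 3.0
```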
Another important thing is that the weight
updates are made based on all the samples
in the training set.
And this is what we call an epoch.
So an epoch is when we use the whole training
set to update the weights.
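In code, an epoch-based training loop could look like this, reusing the adaline_update sketch from above (again, just an illustration):

```python
import numpy as np

def train(X, y, eta=0.01, epochs=50):
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])  # weights close to 0
    for _ in range(epochs):
        # One epoch: a single update computed from the whole training set
        w = adaline_update(w, X, y, eta)
    return w
```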
As I was saying earlier, in the beginning
we set the weights close to 0 and then the
learning rate decides how big the next weight
update will be.
So these are the building blocks of a learning
algorithm.
With these fundamentals, I think you’ll have an easier time understanding other classifiers like logistic regression, and then moving on to more complex deep learning algorithms.
Alright, don’t forget to click the subscribe
button and make sure you check out my other
machine learning basics videos.
I put a link to the Machine Learning Fundamentals
playlist down in the description.
I’ll see you in the next video.
