Hey there, I'm Rohan Kokkula, and welcome to my YouTube channel.
In this video I'll be explaining 10 activation functions in 10 minutes.
So what are activation functions?
In a neural network, an activation function decides whether a neuron should be activated or not.
So if you look at the sample model over here, the inputs are multiplied with their weights,
and their summation, along with the bias, is passed to the activation function.
So this is just x1*w1 + x2*w2 + ... + xm*wm + bias.
This calculated value is fed as input to the activation function, and depending on the threshold of the function,
it decides whether the neuron fires or not.
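If you'd like to see this in code, here's a rough sketch in Python. The input values, weights, and bias below are made-up numbers, and the simple 0-threshold step is just one possible activation:

```python
import numpy as np

# made-up example inputs, weights, and bias
x = np.array([0.5, -1.2, 3.0])   # inputs x1 .. xm
w = np.array([0.8, 0.1, -0.4])   # weights w1 .. wm
b = 0.2                          # bias

# x1*w1 + x2*w2 + ... + xm*wm + bias
z = np.dot(x, w) + b

# a simple step activation: fire (1) if z crosses the 0 threshold, otherwise don't (0)
output = 1 if z > 0 else 0
print(z, output)
```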
Okay, that's cool. But why do we use an activation function?
So if we don't have an activation function, the weights and bias would simply do a linear transformation,
and the neural network becomes a linear regression model.
The derivatives of activation functions are also very important in back propagation, where they are used to compute the gradients that update the weights along the loss curve.
Activation functions help the gradient descent algorithm converge to a local minimum.
So let's start with the very first activation function: the sigmoid, or logistic, function.
The formula is 1 / (1 + e^(-βx)).
Here β is responsible for the smoothness of the curve
So if you keep on increasing the value of beta,
there will be a point where the sigmoid function becomes a replica of the step function.
x = Σ (inputs * weights) + bias
And this value of x goes as an input to the sigmoid function
So basically the sigmoid function normalizes this input into the 0 to 1 range:
whatever your input value is, the output will always be between 0 and 1.
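Here's a minimal sketch of the sigmoid in Python. Beta defaults to 1, which gives the standard sigmoid; the input values are arbitrary:

```python
import numpy as np

def sigmoid(x, beta=1.0):
    # squashes any real-valued input into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-beta * x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))       # ~[0.00005, 0.5, 0.99995]
print(sigmoid(np.array([-1.0, 1.0]), beta=100.0))  # ~[0, 1] -- with a large beta it looks like a step function
```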
So there are a few advantages, such as being differentiable, having a smooth gradient, and normalizing the output.
You can check out this web app from the description
So let's quickly jump to our next activation function: the hyperbolic tangent, or tanh.
If you compare this with the sigmoid function, you'll observe that the sigmoid function is not zero-centered,
which is actually a disadvantage of the sigmoid function. Let's get back to tanh.
Tanh overcomes this disadvantage of the sigmoid function being non-zero-centered.
Just like beta in the sigmoid function, θ is responsible for the curvature here.
So unlike the sigmoid function,
the range of the hyperbolic tangent function is -1 to +1.
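As a quick sketch in Python (theta defaults to 1, which is the plain tanh; the inputs are arbitrary):

```python
import numpy as np

def tanh(x, theta=1.0):
    # theta plays the same role as beta in sigmoid: it controls how sharp the curve is
    return np.tanh(theta * x)

print(tanh(np.array([-5.0, 0.0, 5.0])))   # ~[-0.9999, 0.0, 0.9999] -- zero-centered, range (-1, 1)
```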
If you are working on a binary classification problem, this is a good fit:
use tanh in the hidden layers and the sigmoid function in the output layer.
The major advantages are being zero-centered and having a smooth gradient;
the disadvantages are vanishing gradients, slow convergence, and slower computation due to
the exponential calculations.
The third activation function is the rectified linear unit, or ReLU.
Let me put it in simple terms:
if the value of x is greater than 0, the output will be x, and if it's less than 0, the output will be 0.
So for any positive value, if your value is x, the output will be x,
but if your value is negative, the output will be zero.
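In code, ReLU is just a one-liner; here's a small sketch in Python with arbitrary inputs:

```python
import numpy as np

def relu(x):
    # positive inputs pass through unchanged, negative inputs become 0
    return np.maximum(0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))   # [0.  0.  2.5]
```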
This means the neuron is dead for all negative values, which contributes to its main disadvantage:
the dying ReLU problem.
Also, it is not a zero-centered function.
The advantages are that the function and its derivative are both monotonic, it is computationally efficient,
and it is non-linear. To stop neurons from dying for negative values,
Leaky ReLU comes to the rescue.
The formula stays the same for positive values, that is, linear, and for negative values it is
0.01*x,
which is 1% of the negative value.
This helps the neuron not to die in the case of negative values.
But 0.01 is just a constant multiplier of x.
What if we convert this into a learnable parameter that is updated through back propagation?
Oh yes, we already have that: the parametric ReLU function.
If you replace this alpha with 0.01,
it will act exactly like the leaky ReLU function.
So by doing this, we give the neuron more learnability through back propagation.
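Here's a rough sketch of both in Python. In leaky ReLU the negative slope is fixed at 0.01; in parametric ReLU that same slope, alpha, would be learned during back propagation, but here it's just passed in as a number for illustration:

```python
import numpy as np

def leaky_relu(x):
    # negative inputs are scaled by a fixed 0.01 instead of being zeroed out
    return np.where(x > 0, x, 0.01 * x)

def parametric_relu(x, alpha):
    # same shape, but alpha is a learnable parameter; alpha = 0.01 reproduces leaky ReLU
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-10.0, 5.0])))             # [-0.1  5. ]
print(parametric_relu(np.array([-10.0, 5.0]), 0.2))   # [-2.  5.]
```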
What if, for negative values, we want an exponential curve instead of a linear multiplier?
Well, that's our next activation function: the exponential linear unit, or ELU.
Note that for positive input values, ELU is the same as ReLU.
Here alpha is a constant multiplier that scales the exponential part for negative values.
If we talk about advantages,
there is no dying ReLU problem, and it is also zero-centered.
But since there are exponent calculations involved, you pay for it with extra computation.
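A minimal sketch of ELU in Python; alpha = 1.0 is a common default here, not something fixed by the video:

```python
import numpy as np

def elu(x, alpha=1.0):
    # same as ReLU for positive inputs; negatives decay smoothly towards -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-5.0, -1.0, 2.0])))   # ~[-0.993, -0.632, 2.0]
```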
Moving further, we have the swish function.
This is simply x * sigmoid(βx).
This function was released by researchers at Google Brain in 2017.
If the value of beta is very high, it becomes a replica of the ReLU function.
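Here's a quick sketch of swish in Python. The beta = 10 call just illustrates how a large beta pushes it towards ReLU; the input values are arbitrary:

```python
import numpy as np

def swish(x, beta=1.0):
    # x * sigmoid(beta * x)
    return x / (1.0 + np.exp(-beta * x))

print(swish(np.array([-2.0, 0.0, 2.0])))             # ~[-0.238, 0.0, 1.762]
print(swish(np.array([-2.0, 0.0, 2.0]), beta=10.0))  # ~[-0.0, 0.0, 2.0] -- close to ReLU
```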
Let's move to the next activation function: softmax.
The softmax function takes the input values to a layer of neurons and converts them into probabilities.
So if you have 5 inputs, there will be 5 outputs, each one a probability relative to the others.
And if some input values are large, their probabilities will also be on the larger side.
Let's try it out
If there's a single element, then obviously its probability will be 1.
And if there is more than one element, for example 6 and 4,
the probability will be higher for 6 and lower for 4.
One thing to notice is that the sum of the outputs is always equal to 1,
which is a major advantage of using the softmax function.
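Here's a small sketch in Python, reusing the 6 and 4 example from above (subtracting the max before exponentiating is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax(x):
    # exponentiate and normalize so the outputs sum to 1
    e = np.exp(x - np.max(x))   # subtracting the max keeps the exponentials from overflowing
    return e / e.sum()

probs = softmax(np.array([6.0, 4.0]))
print(probs)         # ~[0.881, 0.119] -- the larger input gets the larger probability
print(probs.sum())   # 1.0
```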
Here comes the next activation function: SoftPlus. This is just a smoother version of the ReLU function.
And as I said previously, if there is an exponent involved, you pay for it with higher computation.
Suppose the input is -0.001.
In the case of ReLU, this negative value deactivates the neuron, but SoftPlus still gives a small positive output, so the neuron stays active.
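A minimal sketch in Python comparing the two on that -0.001 input:

```python
import numpy as np

def softplus(x):
    # a smooth version of ReLU: ln(1 + e^x)
    return np.log1p(np.exp(x))

x = -0.001
print(max(0.0, x))    # ReLU: 0.0 -- the neuron is deactivated
print(softplus(x))    # ~0.6926 -- a small positive output, so the neuron stays active
```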
Let's go to our last function: maxout.
The maxout activation is a generalization of the ReLU and leaky ReLU functions.
It basically outputs the maximum of its calculated linear terms.
Let's try it with an example.
Let's select the input value x1 as 0, its corresponding weight as 2, and the bias as -2.
So 0*2 = 0,
minus 2, and the output of the first term is -2.
Suppose we have another input neuron with value 3 and its corresponding weight as 3.
So 3*3 - 2 = 7,
and max(-2, 7) = 7.
Let's change the value of w1 and the bias to 0.
I'm doing this to create a replica of the ReLU function.
So now we have max(0,x)
And if we further change x2 to some negative value,
the output of the maxout function will be zero.
This is how the maxout function can act like the ReLU function.
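Here's a rough sketch in Python following the walk-through above; the weights and bias come straight from that example, and the per-input form here mirrors the demo rather than a full maxout layer (which takes the max over several complete linear units):

```python
def maxout(xs, ws, bias):
    # output the maximum of the linear terms x_i * w_i + bias
    return max(x * w + bias for x, w in zip(xs, ws))

# the walk-through example: max(0*2 - 2, 3*3 - 2) = max(-2, 7) = 7
print(maxout([0, 3], [2, 3], -2))   # 7

# with w1 and the bias set to 0 this becomes max(0, x2*w2), a replica of ReLU
print(maxout([0, -4], [0, 1], 0))   # 0 -- just like ReLU for a negative input
```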
Thanks for watching
Are you on LinkedIn? Let's connect!
Like this video if it helped you build your fundamentals in neural networks.
Share it with your friends who are beginners to neural networks or who need a recap of activation functions.
And finally, subscribe to my channel and press the bell icon so that you'll be notified about my future videos.
