Hello everyone.
Today, I'm here to cover the
most feared topic in machine learning,
that is, interpreting the AUC ROC curve.
Hopefully by the end of the video you will have a clear
idea of what an AUC ROC curve is. So let's get started.
The AUC ROC curve is basically the plot
of Sensitivity versus (1 - Specificity).
What are these terms?
They are complicated names for simple ideas.
Sensitivity is nothing but
your positive recall.
Again, for people who
are still unsure what recall is:
recall is, out of all the positive samples,
how many was my classifier able to pick up?
Sensitivity is also called the true positive rate.
The formula for this is TP / (TP + FN),
where TP is true positives and FN
is false negatives. I hope this is clear.
Now, Specificity is basically negative recall.
That simply means: out of all the negative
samples, how many was my classifier
able to pick up? That's it. That is what specificity is.
The formula for this would be TN / (TN + FP).
Then you calculate 1 minus specificity.
The value turns out to be
FP / (TN + FP), which is the false positive rate.
This is simple maths.
I take the term 1 - TN / (TN + FP)
and put it over the common denominator,
which gives (TN + FP - TN) / (TN + FP).
TN cancels and you're only left with
FP on top, so you reach FP / (TN + FP).
So I hope it is now clear what sensitivity
and specificity are.
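If you'd like to check these formulas in code, here is a minimal sketch in Python. The function names and the example counts are made up purely for illustration; they are not from the video or from any library.

```python
def sensitivity(tp, fn):
    """True positive rate (positive recall): TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Negative recall: TN / (TN + FP)."""
    return tn / (tn + fp)


def false_positive_rate(tn, fp):
    """1 - specificity, written directly as FP / (TN + FP)."""
    return fp / (tn + fp)


# Hypothetical confusion-matrix counts, chosen only for this example.
tp, fn, tn, fp = 8, 2, 15, 5

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
fpr = false_positive_rate(tn, fp)

print(sens, spec, fpr)
```

With these counts, 1 - specificity and FP / (TN + FP) give the same number, which is the identity derived above.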
So let's move ahead now.
In order to understand the concept,
let's first take a very simple example of a classifier.
So I've built a classifier and scored four
samples with it.
Say, for example, the actual values
of those samples were 1, 0, 1, 0.
I had some training data, I fitted a basic
logistic regression model, and I got the output
in terms of probability scores as
0.8,
0.6,
0.4 and 0.2.
What the ROC curve does is take threshold values
from zero to one (how many depends on the number
of samples in the data set you are scoring)
and at each threshold it gives you
the true positive rate that we've covered previously
and the false positive rate.
Okay, so we start
by finding out what the classifier predicts
at threshold zero.
I'll write the threshold as y, so at y = 0
all my probability values are greater
than 0. If I build my classifier with a
threshold of 0,
all the samples would be classified
as 1, 1, 1 and 1.
So this is what you get at threshold 0.
Similarly, at threshold 0.2,
my 0.8 probability score will be classified as 1,
0.6 would be 1,
0.4 would be 1,
and since I've kept the rule as greater
than or equal to the threshold, 0.2 would also be classified as 1.
Okay, so far so good.
Now let's go to 0.4.
At y equal to 0.4
the first sample would be classified as 1,
the second sample would be classified as 1,
the third would be classified as 1,
but the fourth one would now be classified
as zero, because 0.2 is less than 0.4.
You do the same thing
for 0.6,
so for 0.6 you will have 1, 1, 0, 0.
For y of 0.8
you'll have the values 1, 0, 0, 0,
because just one score is greater than or equal
to your threshold value.
And at one, since none of the
probability scores is even equal to one,
you will have the values
0, 0, 0 and 0.
Okay, all the samples
are classified as zero.
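The thresholding steps above can be sketched in a few lines of Python. This is only an illustration of the walkthrough: the `predict` helper is a name I am assuming here, and it uses the same "greater than or equal to" rule as in the example.

```python
# Probability scores from the four-sample example.
scores = [0.8, 0.6, 0.4, 0.2]
thresholds = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def predict(scores, threshold):
    """Label a sample 1 when its score is >= the threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]


# Predicted labels at every threshold, keyed by threshold.
predictions = {t: predict(scores, t) for t in thresholds}

for t, labels in predictions.items():
    print(t, labels)
```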
Now let's calculate the
true positive rate and false positive
rate at thresholds 0, 0.2, 0.4,
0.6, 0.8 and 1.
Okay,
so at threshold zero,
my true positive rate
is as follows. The two samples whose actual value
is one are both predicted as one,
so I have two true positives.
That is divided by true positives plus
false negatives. Nothing is predicted
as zero here, so there are no false negatives,
and the value turns out to be 2 / (2 + 0), which is one.
Similarly, the
false positive rate is the ratio
of false positives to false positives plus true negatives.
A false positive means my
actual value was zero and I classified it as one.
That happens for both negative samples, so I have two false positives.
For the true negatives, I don't
have any here, because nothing is predicted
as zero. So true negatives is zero and false
positives is 2, and I have
a value of FPR
at threshold zero of 2 / (0 + 2), which is one, okay.
Similarly, I do the same for threshold 0.2.
At threshold 0.2 my true
positive rate comes out to
be 1 and my false positive rate comes out to
be 1. It is the same calculation,
because the predictions at y = 0.2 are the same as at y = 0.
I hope this
is clear so far.
Next up I will calculate the
rates at threshold 0.4. At 0.4,
if you repeat the calculation,
the true positive rate would be one and the false
positive rate would be 0.5.
The same calculation goes again.
At 0.6 we get
a TPR of 0.5
and an FPR of 0.5,
okay.
At 0.8
you get a TPR of 0.5
and an FPR of 0.
And similarly at 1 you will get a TPR of 0
and an FPR
of 0 again, okay.
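To double-check the rates worked out above, here is a small Python sketch that counts true and false positives and negatives at each threshold. The `rates` helper and the variable names are my own assumptions for illustration.

```python
# Actual labels and probability scores from the four-sample example.
actual = [1, 0, 1, 0]
scores = [0.8, 0.6, 0.4, 0.2]
thresholds = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def rates(actual, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted as 1."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# One (threshold, TPR, FPR) tuple per threshold.
roc_points = [(t,) + rates(actual, scores, t) for t in thresholds]

for t, tpr, fpr in roc_points:
    print(f"threshold={t}: TPR={tpr}, FPR={fpr}")
```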
I hope all of this is making sense.
Now what we have to do, since
I have a set of points, is just plot
these points on a graph
where my x axis
would be my false positive rate
and my y axis would be my true positive rate.
True positive rate
and false positive rate cannot exceed 1, so
that is my maximum limit,
and both
values start from 0, okay,
so let's plot a bigger graph to make things clear.
Now I plot true positive rate here
and false positive rate here, okay.
This value ranges from 0 to 1
and this value ranges from 0 to 1.
If you notice, the area of this square
is 1,
1 into 1 is 1,
so my AUC, the area
under the curve that I'm drawing,
cannot exceed 1, okay.
That is the upper bound.
Now let's start
plotting, okay.
At threshold zero I had a TPR and FPR of 1 and 1,
so this is one point.
At threshold 0.4, I had a TPR of 1 and an FPR of 0.5,
so this value is somewhere here.
At threshold 0.6, I had a TPR and FPR of 0.5 and 0.5,
so this value would be somewhere
here. So this is at threshold zero,
this is at threshold 0.4,
and this is at threshold 0.6.
Then at threshold
0.8, I have my TPR as 0.5 and FPR as 0,
so this value is here.
And at threshold 1 my TPR
and FPR are zero and zero.
So this is at threshold one
and this is at threshold 0.8, okay.
Now I draw a straight diagonal line,
which is the
baseline on the ROC plot.
It is what random guessing would give you, with an AUC of 0.5.
A useful classifier should sit above this line.
An AUC below 0.5 is possible, but it means the classifier
is doing worse than random guessing.
Now, to interpret the area under
the curve, I join the points like this.
Okay, now all this area
is the area under the curve,
okay?
The part below the diagonal, if you see, is
a right-angle triangle inside the square, so
this area is 0.5.
The area between the diagonal and the curve
comes to half of that, which
gives 0.25.
So the total area
under the curve for this classifier is 0.75.
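If you'd rather not add up shapes by eye, the same area can be checked numerically. The sketch below uses the trapezoidal rule, which is a different technique from the geometric argument in the video but should give the same number for these points. The function name `area_under_curve` is my own.

```python
# (FPR, TPR) points from the example, ordered from threshold 1 down to 0.
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]


def area_under_curve(points):
    """Trapezoidal rule over consecutive (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Width of the strip times the average height of its two ends.
        area += (x1 - x0) * (y0 + y1) / 2
    return area


auc = area_under_curve(points)
print(auc)
```

Vertical segments contribute nothing because their width is zero, so only the two horizontal steps add area.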
So whenever you're evaluating classifiers,
it's good practice to check their AUC scores.
Now, you don't have a lot of control once you've
built a classifier: you cannot modify the area
under the curve. But what you can always do
is choose the threshold, and with it what your classifier is
good at.
If you can tolerate some false positives,
this would be a good threshold.
But if you're very particular that you want
true positives and no false positives
entering your system, then ideally the best
threshold you can choose is 0.8.
What will happen because of this is that your
true positive rate will be 0.5, which
ideally you would want to
be higher, but in this case you won't have any
false positives coming in, okay.
This is one way to choose the threshold: you can
keep a threshold of 0.8.
But now if
you want your true positive performance to
be really good, what you can do is, instead
of having the threshold at 0.8,
have a threshold at 0.4, where you're
allowing half of the possible false positives (an FPR of 0.5)
but you are guaranteeing that
all of your true positives are captured (a TPR of 1).
So this can also be a good operating point. Again,
deciding which threshold to choose is a business
decision that you will take based on the data
that you have.
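As a rough illustration of turning that kind of business decision into a rule, here is a Python sketch that picks the threshold with the highest true positive rate among those whose false positive rate stays within a limit. The `best_threshold` function and its `max_fpr` parameter are hypothetical names of my own, and this is only one of many possible selection rules.

```python
# (threshold, TPR, FPR) values from the four-sample example.
roc_points = [
    (0.0, 1.0, 1.0),
    (0.2, 1.0, 1.0),
    (0.4, 1.0, 0.5),
    (0.6, 0.5, 0.5),
    (0.8, 0.5, 0.0),
    (1.0, 0.0, 0.0),
]


def best_threshold(roc_points, max_fpr):
    """Among thresholds with FPR <= max_fpr, pick the highest TPR.

    Ties on TPR are broken in favour of the lower FPR.
    """
    allowed = [p for p in roc_points if p[2] <= max_fpr]
    return max(allowed, key=lambda p: (p[1], -p[2]))[0]


# No false positives tolerated at all.
strict = best_threshold(roc_points, max_fpr=0.0)
# Up to half of the negatives may be flagged.
lenient = best_threshold(roc_points, max_fpr=0.5)

print(strict, lenient)
```

For this example the strict rule lands on 0.8 and the lenient rule on 0.4, matching the two choices discussed above.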
So yeah, that's it from my end.
I hope this video was informative.
Please do subscribe to the channel. Thank you so
much.
