Welcome to this quick introduction 
to the confusion matrix. If you've ever
looked at a confusion matrix 
for the first time, you might have found it,
well, confusing. But it doesn't have 
to be. A confusion matrix is a simple way to
lay out how many predicted categories 
or classes were correctly predicted and how
many were not. It is used to 
evaluate the results of a predictive model with a
class outcome, showing how many cases were correctly predicted as their true class. To understand what's going on inside this confusion matrix of correct versus incorrect classes, we first need to understand true
positives, true negatives, 
false positives and false negatives. Kind of confusing,
right? Well, let's relabel these terms
to make it a bit clearer. Essentially, the
confusion matrix is just keeping 
track of Class A correctly predicted as Class
A, Class B correctly predicted as 
Class B, Class A incorrectly predicted as Class B
and Class B incorrectly predicted as
Class A. Where true and false come into it is that we want to know whether our target Class A was correctly predicted as A, which is true, or incorrectly predicted as B when it was in fact A, which is false.
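As a quick sketch of what those four outcomes look like in practice, here is a minimal pure-Python example. The two label lists are illustrative made-up data, not from the video, and Class A is treated as the positive class:

```python
# Count the four confusion-matrix outcomes, with Class A as the positive class.
actual    = ["A", "A", "A", "B", "B", "B"]  # true labels (illustrative)
predicted = ["A", "A", "B", "B", "B", "A"]  # model predictions (illustrative)

tp = sum(1 for a, p in zip(actual, predicted) if a == "A" and p == "A")  # A correctly called A
tn = sum(1 for a, p in zip(actual, predicted) if a == "B" and p == "B")  # B correctly called B
fp = sum(1 for a, p in zip(actual, predicted) if a == "B" and p == "A")  # B wrongly called A
fn = sum(1 for a, p in zip(actual, predicted) if a == "A" and p == "B")  # A wrongly called B

print(tp, tn, fp, fn)  # → 2 2 1 1
```

With these toy lists, two A's and two B's land on the correct side, and one of each is confused with the other class.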
Our target Class A is our positive and the other Class B is our negative, so a true positive is positive Class A correctly predicted as A, and a true negative is negative Class B correctly predicted as B. We want to get as many correct predictions of A and B as possible, aiming for more trues than falses. So then how do we
organize this in a way to lay out the number of
correct A's and B's versus
incorrect A's and B's? Well, we place these counts into a matrix grid
where the x-axis is the predictions made
and the y-axis is the actual
class label. So let's just say we have 200 subjects
of which 100 are from Class A 
and 100 from Class B. 60 of the actual A cases on
the y-axis were correctly 
predicted as their true class, A on the x-axis.
For Class B, 30 of the actual B cases on the y-axis were correctly predicted as Class B
on the x-axis. If you look at the
diagonal counts, that's how many subjects were correctly predicted as their true class. These are all the trues for
the positive and negative classes.
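The example above can be laid out as a small matrix in code. This is a minimal pure-Python sketch using the video's numbers (100 subjects per class; the 40 and 70 off-diagonal counts follow from the 60 and 30 correct predictions):

```python
# Rows are the actual class (y-axis), columns the predicted class (x-axis).
labels = ["A", "B"]
matrix = [
    [60, 40],  # actual A: 60 predicted as A (correct), 40 predicted as B
    [70, 30],  # actual B: 70 predicted as A, 30 predicted as B (correct)
]

# The diagonal holds the correct predictions for each class.
correct = [matrix[i][i] for i in range(len(labels))]
total_correct = sum(correct)

print(correct)        # → [60, 30]
print(total_correct)  # → 90 correct out of 200 subjects
```

Reading down the diagonal gives each class's correct count at a glance, which is exactly what makes the grid layout useful.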
So now you have a way of identifying
which class is predicted
correctly most of the time compared to
other classes, and of evaluating whether your predictive model is guessing right most of the time or guessing wrong on each class. One last
thing, how do we decide what is 
Class A and what is Class B? What should be the
positive class, and what should be the
negative class? Well, most of the time it
doesn't matter which class you assign to positive or negative, as the confusion matrix will tell you how many subjects were correctly predicted from
each class. But here are some 
examples of a target class you might want to
differentiate from a non-target class.
In standard binary classification you could
be interested in returning customers as the positive target class
versus new customers as the 
negative class. Or it could be one target class
versus all other classes, such as
aggressive cancer versus all other
passive cancer types as the
single negative class. Either way, you're
comparing how many were 
correctly or incorrectly classified from each class.
The confusion matrix will tell you how
many times actual Class A was predicted as B and vice versa, or whether subjects were correctly classed as their true
labels. And that sums up the 
confusion matrix. Thanks for watching,
give us a like if you found it 
useful, or you can check out our other
video tutorials at 
tutorials.datasciencedojo.com
