Hi everyone, welcome back to my channel. So today in this video we'll be seeing what KNN, that is the K nearest neighbours algorithm, is. We already saw what the naive Bayes classifier was, and today we'll be seeing KNN. So KNN is basically a supervised classification algorithm in which you have some data points, or data vectors, that are separated into several categories, and it tries to predict the classification of a new sample within that particular population set. Now this K nearest neighbours algorithm, that is the KNN algorithm, is a lazy algorithm. What I mean by lazy is that it only memorizes the training data; it does not learn a model from it.
So just consider that you have certain points: if you tell the KNN algorithm to come here, then it will come to this point, or if you tell KNN to go there, then it will move to that point. So it does not take any decisions of its own; that's why it's a lazy algorithm. Now, it classifies any new points based on a similarity measure, and that similarity measure can be anything; you have, for example, the Euclidean distance. So it basically calculates the Euclidean distance between two points; it's a proximity measure, and then it tries to identify which points are its nearest neighbours. So let's understand the working
of KNN. How does it work? So basically on this axis you have some set of points, say 30 points, that is 30 vectors or 30 data points. Now some data points are labelled in black, some in blue, and some in red. We just chose one point, that is, we are interested in this point, and we want to find the neighbourhood of this particular point. What it does is it basically calculates the distances from this query point to each of the current points. So these points are the current points, this is the query point, and all of these are the example points,
or the current points. So what it basically does is it calculates the distances of this query point from each current point, to this one, to this one, to all of these points, and it basically uses the Euclidean distance, or you can use any other distance measure, the Manhattan distance or the Minkowski distance or any other distance. So the distance will basically be the square root of (x1 − x2) squared plus (y1 − y2) squared, and it builds up a collection of distances for all these particular data points. Now, whichever points are in its neighbourhood, it classifies those as its neighbours.
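As a rough sketch of this distance step (plain Python, with made-up example coordinates), it might look like this:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points: sqrt((x1-x2)^2 + (y1-y2)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Distances from one query point to every current (example) point
query = (1.0, 2.0)
current_points = [(0.0, 0.0), (2.0, 2.0), (5.0, 1.0)]
distances = [euclidean_distance(query, p) for p in current_points]
```

Swapping in the Manhattan or Minkowski distance would only change the body of `euclidean_distance`; the rest of the procedure stays the same.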
Now, for example, say you have K equal to 3: then whichever points are closest to the query point, it will label those three as its neighbours.
Now, there are two things you need to remember about the K parameter. First, K must be an odd number, like 3, or it can be 5 or 7. Now why should it be odd? Because if you assign K any even number, like 2, 4 or 6, then there can be a tie in the classification of that particular label. So for example, say you have this query point and these neighbours: there are two blue points and there is one black point. Now, to decide which class the query point goes to, there is basically a majority scheme; the majority voting scheme is there.
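The majority voting described here can be sketched in a few lines of Python (the label names are just for illustration):

```python
from collections import Counter

def majority_vote(neighbour_labels):
    """Return the most common class label among the K neighbours."""
    return Counter(neighbour_labels).most_common(1)[0][0]

# Two blue neighbours versus one black neighbour: blue wins the vote
print(majority_vote(["blue", "blue", "black"]))  # blue
```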
So here blue wins, since there are two blue points and you have only one black point among its neighbours. So this query point would be assigned the class label blue; blue will be the class label of this query point. Now another condition is that K must not be a multiple of the number of classes. So if you have, say, seven classes, then K should not be equal to 7, or to 7 into 2, that is 14, since that would also create a tie in the class assignment. Now, there is one particular
variation, or one particular scenario, where you have K equal to 1. In that case, what happens is that for the particular data points you have, the space gets divided into certain regions, into partitions; say, for example, this is one region, this is another region, and this is another region: region 1, region 2, region 3, and so on. So what it basically does is it partitions the space into several regions. This is called a Voronoi partition of the space.
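A minimal sketch of this K = 1 case (the points, labels and function name are my own invented example): each query simply takes the label of its single closest stored point, which is exactly the region assignment of the Voronoi partition.

```python
import math

def nearest_label(query, points, labels):
    """K = 1: assign the query the label of its single nearest point."""
    distances = [math.dist(query, p) for p in points]
    return labels[distances.index(min(distances))]

points = [(0, 0), (5, 5), (10, 0)]   # one representative point per region
labels = ["circle", "square", "triangle"]
print(nearest_label((1, 1), points, labels))  # circle
```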
So what it suggests is that any data element that belongs to this region, any data point that falls into region 1, would be assigned this class label, that is, this circle; or if there is any data element falling into this other region, it would be assigned that class label, and similarly it works like that. So this is the particular scenario, or case, where you have K equal to 1: it divides the space into a Voronoi partition. Now, let's talk about the algorithm of KNN. How does it work? So we
basically start with loading the data: you have a number of data points which you load from a CSV or .xls file. Then you initialize K; K is a hyperparameter in this case, so you just assign the number of nearest neighbours to it. That is, we will assign K equal to 3, or K equal to 5, or K equal to 7, any odd number; that is the only condition. Now, for
each of the samples in the training data: you will be having your training data and test data, and for each data point in the training data you have to calculate the distance between the query point and the current point, as I have explained here; this will be our query point and this will be our current point. So with any distance measure, like the Euclidean distance or the Manhattan distance or any other distance measure, you need to calculate the distance of the query point from the current point, and you have to add the distance as well as the index of the example to an ordered collection. That means there would be some collection like this: there would be indexes 0, 1, 2, 3, 4, 5, and for each traversal it would assign each of the distances to each of these indexes. Now you have this ordered collection; next you need a sorted collection. So for that, we need to sort the ordered collection of distances and indexes from small to large; that is, we need to sort it into ascending order.
That means the first entry is the point which has the minimum distance from the query point. Now, you need to pick the first K entries from that sorted collection. So say you have 30 points: from the query point you would be calculating all the Euclidean distances, which you put into your memory, or the RAM; then you sort them, and with the K that was specified here, you pick the first K entries. Now, you
need to get the labels of the selected K entries. If it's a regression problem, then you need to return the mean of the K labels, that is, the sum of those labels divided by K; if it's a classification problem, then you need to return the mode of the K labels, and if K is equal to 3, the mode is taken over those 3 labels. So this KNN algorithm can be used for regression as well as for classification, for classifying a particular problem in supervised learning.
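Putting all the steps above together, a minimal sketch of the whole algorithm might look like this (a toy implementation under my own names, not production code):

```python
import math
from collections import Counter

def knn_predict(query, examples, k=3, mode="classification"):
    """examples is a list of (point, label) pairs.
    1. Calculate the distance from the query point to every example.
    2. Sort the (distance, label) pairs in ascending order.
    3. Pick the first K entries.
    4. Return the mode of their labels (classification) or the mean (regression)."""
    ordered = sorted((math.dist(query, point), label) for point, label in examples)
    k_labels = [label for _, label in ordered[:k]]
    if mode == "regression":
        return sum(k_labels) / k                    # mean of the K neighbours' values
    return Counter(k_labels).most_common(1)[0][0]   # majority (mode) label

examples = [((0, 0), "black"), ((1, 1), "blue"), ((1, 2), "blue"), ((8, 8), "red")]
print(knn_predict((1, 1.5), examples, k=3))  # blue
```

In a real project the data would come from a CSV file, but the core stays this small because, as a lazy learner, KNN does all its work at prediction time.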
So well, that was all regarding this video. I hope you guys enjoyed it. If you found this video helpful, please do like, share and comment, and if you are new to this channel, please consider subscribing. Thank you very much for watching this video.
