Hello everyone, welcome to my channel. In this video we will talk about one of the popular machine learning algorithms, called KNN, or K-nearest neighbors. Here K is a whole number from 1 up to n, the number of data points we have, and it tells us how many neighbors to look at. The idea behind the algorithm is: I am as good as my K neighbors. Okay?
So by the end of this video, you will understand what KNN is and why you should use it.
So, let's start.
KNN, or K-nearest neighbors, is a supervised learning algorithm. You know that there are three broad types of machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning. Note that reinforcement learning is not the same as semi-supervised learning, which mixes labeled and unlabeled data. Supervised learning is of two types, classification and regression: linear regression is a regression algorithm, and KNN is mainly used as a classification algorithm. Just remember that KNN can also be used for regression, but in this video we will talk only about classification, because that's the primary area where KNN is used. So let's go ahead and understand it. To understand this algorithm, let's come up with a problem. Here is the problem statement: let's assume that a group of people are traveling to London for a marathon.
Now let's also assume that we have the data of the participants up to last year. The problem is to guess, based on the distance they traveled, the continent they are arriving from. So the problem is simple: we get the number of kilometers (or miles) a person traveled, and we need to guess the continent they are coming from. Here is what the data we collected up to last year looks like: it covers distances from 2,000 kilometers to more than 10,000 kilometers, in the east-west and north-south directions. This graph is self-explanatory.
Now let's move further and try to classify the people based on their continents. We can safely assume that people who traveled 2,000 to 5,000 kilometers from the east side are coming from Europe. One important thing: when I say east of London, I mean that if you open Google Maps, the right side is east and the left side is west. That's how we can relate this. Again, one disclaimer: this is totally imaginary data, it is only for learning purposes, and it may not accurately represent the continents. So, 2,000 to 5,000 kilometers on the east side is Europe, beyond that is Asia, and beyond that is Australia. On the west side it will be either North America or South America. To the north and south, within 2,000 kilometers, it will still be Europe, and to the south, from 8,000 to 10,000 kilometers and beyond, is Africa. So this is the data we have. Now let's try to solve our problem: what happens if somebody comes and asks me, "Please tell me my continent"? That is the problem we need to solve.
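In code, the problem above comes down to this: store last year's participants as labeled points and, for a newcomer, find the K closest of them. This is a minimal sketch under assumptions that are not in the video: each participant is a made-up (east_km, north_km) offset from London, with east and north as positive (matching the map convention above), and "closest" means straight-line (Euclidean) distance computed with Python's math.dist. The names PARTICIPANTS and k_nearest are hypothetical.

```python
import math

# Hypothetical, made-up training data: (east_km, north_km) offset from London
# and the continent label. Positive east = right on the map, positive north = up.
PARTICIPANTS = [
    ((3000, 0), "Europe"),
    ((4500, 500), "Europe"),
    ((7000, 1000), "Asia"),
    ((-6000, 0), "North/South America"),
    ((-7500, -2000), "North/South America"),
    ((0, -8500), "Africa"),
    ((500, -9500), "Africa"),
]


def k_nearest(query, data, k):
    """Return the k labeled points closest to query (Euclidean distance)."""
    # Sort every stored point by its distance to the query and keep the first k.
    return sorted(data, key=lambda item: math.dist(query, item[0]))[:k]


newcomer = (200, -8000)  # hypothetical traveler arriving from the south
for point, label in k_nearest(newcomer, PARTICIPANTS, k=1):
    print(point, label)
```

Sorting the whole list is the simplest approach; heapq.nsmallest(k, data, key=...) would return the same k items without fully sorting.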
So let's first try K equal to one. Remember, in the beginning I told you that K is the number of neighbors. K equal to one means: find the single nearest data point, that is, the past participant whose distance traveled matches most closely. From the visual reference you can see that the nearest person is in Africa, so things look simple: the result is that this person is traveling from Africa. Now, what happens when we change K to two? In this case we need to consider the two nearest data points. The first nearest data point is in Africa, and visually you can see that the second nearest person is also in Africa. No issue: both neighbors are from Africa, which means this person is also traveling from Africa. That's the result. Now, still with K equal to two, consider this person.
This person wants you to tell which continent he is traveling from, but there is a catch: he is equidistant from his two nearest neighbors, and they are in different continents. The person in Africa is at the same distance as the person in North/South America. So what will be your choice?
What will your algorithm say: is this person traveling from Africa or from North/South America? Well, there is no clear answer. You could guess, for example by generating a random number and using it to pick either Africa or North/South America. The problem is that even values of K are bad when there are two options, because the vote can end in a tie and then there is no clear decision. So you might think, okay, let's not use even numbers, and that's a good assumption: with even numbers you can always run into this problem, whether the vote is split one-one, two-two, three-three, or four-four. What would you decide then? So let's not use even numbers for now.
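The tie argument can be checked mechanically. In this sketch, can_tie is a hypothetical helper named for this example: it tries every possible combination of K neighbor labels and reports whether any combination leaves two labels sharing the top vote count.

```python
from collections import Counter
from itertools import product


def can_tie(k, classes=("Africa", "North/South America")):
    """True if some assignment of k neighbor labels gives a tied top vote."""
    for labels in product(classes, repeat=k):
        ranked = Counter(labels).most_common()
        # A tie means the runner-up label has as many votes as the leader.
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return True
    return False


for k in range(1, 7):
    print(k, can_tie(k))

# With three continents instead of two, an odd K can still tie (one-one-one).
print(can_tie(3, ("Africa", "North/South America", "Europe")))
```

With two classes it reports a possible tie only for even K (2, 4, 6). With three classes, K equal to 3 can still tie, so an odd K alone does not guarantee a decision once there are more than two classes.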
Now let's try to solve this problem using K equal to 3, which means we have to consider the three nearest data points. From this picture, if you find the third nearest person, that person also resides in North or South America. So in this case the vote is two to one and the majority wins, and that's how KNN works: when the neighbors disagree, the vote decides which class the point belongs to. In this case the person is classified as traveling from North or South America. So now you understand how KNN works and why you should not use an even K; use an odd number instead. Is that clear?
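The voting step itself is short. In this sketch, classify and regress are hypothetical helper names: the majority label wins, and a shared top count is reported as None rather than guessed. For contrast, the regression use of KNN mentioned at the start of the video replaces the vote with an average of the neighbors' numeric values; the values below are made up.

```python
from collections import Counter
from statistics import mean


def classify(neighbor_labels):
    """Majority vote over neighbor labels; returns None if the top count is tied."""
    ranked = Counter(neighbor_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def regress(neighbor_values):
    """KNN regression variant: average the neighbors' numeric targets."""
    return mean(neighbor_values)


# K=3 as in the walkthrough: two of the three nearest are in North/South America.
print(classify(["North/South America", "Africa", "North/South America"]))
# K=2 with a one-one split: no majority.
print(classify(["Africa", "North/South America"]))
# Hypothetical numeric targets of three neighbors, for the regression variant.
print(regress([10.0, 12.0, 14.0]))
```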
Let's see one more example. In the previous case the answer was North/South America, but suppose a person like this one asks you to find his or her continent. The problem is that this person is equidistant from three neighbors, and all three of them are in different continents: one is in Africa, one is in North/South America, and one is in Europe. Now we are back to square one: we got into the same issue even though we used an odd K. If you are getting into this kind of issue, your K value is not well chosen, and you have to tune K for your data set. This is what you need to remember when you decide your K value. People often pick values like 1, 2, 5, 10, 15, or 20 arbitrarily, but you have to understand that if any of your data points gets into this kind of situation, then your K value is not right for your data.
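The video says to tune K for your data set without naming a procedure. One common approach, assumed here rather than taken from the video, is leave-one-out validation: hold out each known point, classify it from the remaining points, and count correct answers and tied votes for each candidate K. The points and the predict and leave_one_out helpers are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled points (east_km, north_km) -> continent; illustration only.
DATA = [
    ((3000, 0), "Europe"), ((4000, 800), "Europe"), ((4500, -500), "Europe"),
    ((-6000, 0), "North/South America"), ((-7000, 1500), "North/South America"),
    ((-6500, -2500), "North/South America"),
    ((0, -8500), "Africa"), ((600, -9200), "Africa"), ((-400, -9800), "Africa"),
]


def predict(query, data, k):
    """Majority label among the k nearest points, or None on a tied vote."""
    nearest = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    ranked = Counter(label for _, label in nearest).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def leave_one_out(data, k):
    """Hold out each point in turn; return (accuracy, number of tied votes)."""
    correct = ties = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # every point except the held-out one
        guess = predict(point, rest, k)
        if guess is None:
            ties += 1
        elif guess == label:
            correct += 1
    return correct / len(data), ties


for k in (1, 3, 5, 7):
    acc, ties = leave_one_out(DATA, k)
    print(f"k={k}: accuracy={acc:.2f}, ties={ties}")
```

On this toy data, K of 1 and 3 classify every held-out point correctly, while K of 7 is so large that the two remaining same-continent points are outvoted every time. On real data you would keep the K with the best accuracy and no (or few) ties.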
Now, KNN is easy to implement. If you really want to get valuable information out of your data, I would not recommend implementing it yourself; use a machine learning library such as scikit-learn, or any other library. But if you just want to try out your programming skills, this is the best algorithm to start with, if you are just learning and want to brush up your skills and see how far your programming knowledge takes you.
Just think about implementing this algorithm yourself. Another thing is that KNN takes more memory, because the whole training data set has to be loaded into memory, so there is a data size beyond which KNN will not work well. And, you know, we talked about the distance between two data points, which is how we decide which one is the nearest neighbor.
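If you want to try implementing KNN yourself, a minimal sketch could look like the one below. KNNClassifier, fit, predict_one, and predict are names chosen here, loosely following the fit/predict style that libraries such as scikit-learn use, not copied from any library; the points are made up. It also makes the memory point concrete: fit only stores the training data, and every prediction computes the distance from the query to all stored points.

```python
import heapq
import math
from collections import Counter


class KNNClassifier:
    """Minimal from-scratch KNN classifier; an illustrative sketch, not a library API."""

    def __init__(self, k=3, distance=math.dist):
        self.k = k
        # Any function taking two points and returning a non-negative number.
        self.distance = distance

    def fit(self, points, labels):
        # "Training" only memorizes the data: every point stays in memory,
        # which is where KNN's memory cost comes from.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict_one(self, query):
        # Compute the distance to every stored point and keep the indices
        # of the k closest ones, ordered nearest first.
        nearest = heapq.nsmallest(
            self.k,
            range(len(self.points)),
            key=lambda i: self.distance(query, self.points[i]),
        )
        votes = Counter(self.labels[i] for i in nearest)
        # most_common keeps first-seen order on ties, so the tied label
        # that appears first among the nearest-first neighbors wins.
        return votes.most_common(1)[0][0]

    def predict(self, queries):
        return [self.predict_one(q) for q in queries]


# Usage with made-up (east_km, north_km) points.
model = KNNClassifier(k=3).fit(
    [(3000, 0), (4200, 600), (-6500, 0), (-7000, -1500), (0, -8800)],
    ["Europe", "Europe", "North/South America", "North/South America", "Africa"],
)
print(model.predict([(3500, 200), (-6000, -800)]))
```

One design choice differs from the video: instead of guessing randomly on a tie, this sketch lets the tied label that shows up first among the nearest-first neighbors win.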
In KNN we can use multiple ways of calculating the distance. Some of them are Euclidean distance, Hamming distance, Manhattan distance, and Minkowski distance. In general people use Euclidean distance, but you should try each of them to see which one gives you the best results.
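Each of the distance measures listed above is only a line or two of code. These are plain-Python sketches with function names chosen here: the first three are for numeric points, while Hamming distance counts mismatched positions and so suits categorical features. Euclidean and Manhattan distance are the special cases p = 2 and p = 1 of Minkowski distance.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences ("city block" distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """Generalizes both: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


a, b = (0, 0), (3, 4)
print(euclidean(a, b), manhattan(a, b), minkowski(a, b, 3))
print(hamming(("red", "small", "yes"), ("red", "large", "no")))
```

To try several metrics, swap the function used when measuring neighbors; for Minkowski, fix p first, for example with functools.partial(minkowski, p=3).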
That's all about KNN. I hope I was able to explain the idea behind the KNN algorithm in the best possible way.
Thanks a lot for watching, and have a good day.
