Hello, I will explain how the SVM algorithm works.
This video explains the support vector machine for linearly separable binary data sets.
Suppose we have these two features, x1 and x2, and we want to classify all these elements.
You can see that we have the class circle and the class rectangle.
So the goal of the SVM is to design a hyperplane (here we show this green line as the hyperplane) that classifies all the training vectors into two classes.
Here we show two different hyperplanes, both of which correctly classify all the instances in this feature set.
But the best choice is the hyperplane that leaves the maximum margin from both classes.
The margin is the distance between the hyperplane and the closest elements to it.
In the case of the red hyperplane we have this distance, so this is the margin, which we represent by z1, and in the case of the green hyperplane we have the margin that we call z2.
We can clearly see that the value of z2 is greater than z1.
So the margin is larger in the case of the green hyperplane, and in this case the best choice will be the green hyperplane.
Suppose we have this hyperplane; this hyperplane is defined by one equation, the decision function g(x) written out below.
We have a vector of weights omega plus a bias omega_0, and this equation will deliver values greater than or equal to 1 for all the input vectors which belong to class 1, in this case the circles.
And also, we scale this hyperplane so that it will deliver values smaller than or equal to -1 for all input vectors which belong to class number 2, the rectangles.
We can say that for the closest elements to the hyperplane the modulus of g will be exactly 1, and for all other elements it will be at least 1.
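Written out (using the standard SVM notation the narration describes, with weight vector omega and bias omega_0), the decision function and the scaling convention are:

```latex
g(\mathbf{x}) = \boldsymbol{\omega}^{\top}\mathbf{x} + \omega_0,
\qquad
\begin{cases}
g(\mathbf{x}_i) \ge +1 & \text{for } \mathbf{x}_i \text{ in class 1 (circles)} \\
g(\mathbf{x}_i) \le -1 & \text{for } \mathbf{x}_i \text{ in class 2 (rectangles)}
\end{cases}
```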
From geometry we know that the distance between a point and a hyperplane is computed by dividing the modulus of g by the norm of the weight vector.
So the total margin, which is composed of this distance on both sides of the hyperplane, is computed by the equation below.
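In symbols, the distance from a point x to the hyperplane, and the resulting total margin between the two classes, are:

```latex
d(\mathbf{x}) = \frac{|g(\mathbf{x})|}{\lVert\boldsymbol{\omega}\rVert},
\qquad
\text{margin} = \frac{1}{\lVert\boldsymbol{\omega}\rVert} + \frac{1}{\lVert\boldsymbol{\omega}\rVert} = \frac{2}{\lVert\boldsymbol{\omega}\rVert}
```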
And the aim is that minimizing this term, the norm of the weight vector, will maximize the separability.
When we minimize this weight vector we will have the biggest margin splitting these two classes.
Minimizing this weight vector is a nonlinear optimization task, which can be solved by the Karush-Kuhn-Tucker (KKT) conditions, which use Lagrange multipliers.
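Concretely, the task being described is the standard hard-margin formulation, with class labels y_i = +1 or -1:

```latex
\min_{\boldsymbol{\omega},\,\omega_0}\; \tfrac{1}{2}\lVert\boldsymbol{\omega}\rVert^{2}
\quad\text{subject to}\quad
y_i\left(\boldsymbol{\omega}^{\top}\mathbf{x}_i + \omega_0\right) \ge 1,\quad i = 1,\dots,N
```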
The main equations state that the value of omega will be the solution of a sum over the training vectors, and we also have a second condition on the multipliers; both are shown below.
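These two KKT conditions, with Lagrange multipliers lambda_i >= 0, are:

```latex
\boldsymbol{\omega} = \sum_{i=1}^{N} \lambda_i\, y_i\, \mathbf{x}_i,
\qquad
\sum_{i=1}^{N} \lambda_i\, y_i = 0
```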
So when we solve these equations, minimizing this omega vector, we will maximize the margin between the two classes, which will maximize the separability of the two classes.
Here we show a simple example
Suppose we have these 2 features, x1 and x2, and we have these 3 points.
We want to design, or to find, the best hyperplane that will divide these 2 classes.
We can see clearly from this graph that the best division line will be a line parallel to the line that connects these 2 points here.
So we can define this weight vector as one point minus the other point, giving us components of a constant a and 2 times this constant a, that is, omega = (a, 2a).
Now we can solve for this weight vector and create the hyperplane equation considering this weight vector.
We must discover the value of this a.
Since we have this weight vector omega, we can substitute the values of this point, and also, using this other point, we can substitute its 2 values.
When we evaluate the equation g at the input vector (1,1), we know that we must obtain the value -1, because this point belongs to the class circle.
So we will have this first equation here; when we use the second point, we apply the function and we know that it must deliver the value 1, so we substitute it into the equation as well.
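Writing the two constraints out explicitly: the narration gives the circle point as (1,1); the rectangle point is not read out, but the relation omega_0 = 1 - 8a quoted below implies it is (2,3). Under that assumption:

```latex
\begin{aligned}
g(1,1) &= a(1) + 2a(1) + \omega_0 = 3a + \omega_0 = -1\\
g(2,3) &= a(2) + 2a(3) + \omega_0 = 8a + \omega_0 = +1
\end{aligned}
```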
Well, given these 2 equations, we can isolate the value of omega_0 in the second equation, and we will have omega_0 equal to 1 minus 8 times a.
So, using this value, we put omega_0 into the first equation and we will reach the value of a, which is 2 divided by 5.
Now that we have discovered the value of a, we substitute it back and also discover the value of omega_0.
We come to the conclusion that omega_0 is minus 11 divided by 5, and since we know that the weight vector is (a, 2a), we can substitute the value of a and obtain the values of the weight vector.
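Step by step, the algebra is:

```latex
\begin{aligned}
\omega_0 &= 1 - 8a\\
3a + (1 - 8a) &= -1 \;\Rightarrow\; -5a = -2 \;\Rightarrow\; a = \tfrac{2}{5}\\
\omega_0 &= 1 - 8\cdot\tfrac{2}{5} = -\tfrac{11}{5}\\
\boldsymbol{\omega} &= (a,\, 2a) = \left(\tfrac{2}{5},\, \tfrac{4}{5}\right)
\end{aligned}
```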
So in this case these points are called the support vectors, because they determine the omega values, 2 divided by 5 and 4 divided by 5.
And when we substitute the values of omega (2 divided by 5 and 4 divided by 5) and also the omega_0 value, and scale the result by 5/2, we deliver the final equation which defines this green hyperplane: x1 + 2·x2 - 5.5 = 0.
And this hyperplane is the one the support vector machine uses to classify the elements.
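To make the worked example concrete, here is a minimal sketch in Python (NumPy) that reproduces the computation, assuming the two support vectors worked out above: (1,1) with label -1 and (2,3) with label +1 (the second point is inferred from the algebra, as noted earlier). An optional scikit-learn cross-check is included as a comment.

```python
import numpy as np

# Assumed support vectors from the worked example:
# circle (class -1) at (1, 1), rectangle (class +1) at (2, 3).
# With omega = (a, 2a), the margin constraints g(x) = -1 and g(x) = +1 give:
#   3a + w0 = -1   (from g(1,1) = a*1 + 2a*1 + w0)
#   8a + w0 = +1   (from g(2,3) = a*2 + 2a*3 + w0)
A = np.array([[3.0, 1.0],
              [8.0, 1.0]])
b = np.array([-1.0, 1.0])
a, w0 = np.linalg.solve(A, b)

omega = np.array([a, 2 * a])
print("a      =", a)        # 0.4   (= 2/5)
print("omega  =", omega)    # [0.4 0.8]
print("omega0 =", w0)       # -2.2  (= -11/5)

# Scaling the hyperplane (2/5)x1 + (4/5)x2 - 11/5 = 0 by 5/2
# gives the final form x1 + 2*x2 - 5.5 = 0.
print("scaled:", omega * 2.5, w0 * 2.5)  # [1. 2.] -5.5

def g(x):
    """Decision function g(x) = omega . x + omega_0."""
    return omega @ x + w0

print(g(np.array([1, 1])))  # -1.0 (on the class -1 margin)
print(g(np.array([2, 3])))  # +1.0 (on the class +1 margin)

# Optional cross-check, if scikit-learn is installed: a near-hard-margin
# linear SVC (large C) fit on just these two points recovers the same plane.
# from sklearn.svm import SVC
# clf = SVC(kernel="linear", C=1e6).fit([[1, 1], [2, 3]], [-1, 1])
# print(clf.coef_, clf.intercept_)   # ~[[0.4 0.8]], ~[-2.2]
```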
These are some references that we have used
So this is how the SVM algorithm works.
