Since we're generating an immeasurable amount of data, it has become necessary to develop more advanced and complex machine learning techniques. Boosting is one such technique that can be used to solve complex, data-driven real-world problems. Hi everyone, I'm Zulaikha from Edureka, and I welcome you to this session on boosting machine learning. Let me quickly run you through today's agenda. We're going to begin the session by understanding why boosting is used. After that, we'll understand what exactly boosting means in machine learning. We'll then move on and discuss how the boosting algorithm works, and we'll discuss the different types of boosting, which include adaptive boosting, gradient boosting and XGBoost. We'll finally end the session by looking at a practical implementation in Python, where we'll understand how boosting algorithms can be used to improve the accuracy of a model. So before I move any further, make sure that you subscribe to the Edureka YouTube channel in order to stay updated on the most trending technologies. Now let's take a look at
our first topic. So why exactly are we using boosting machine learning techniques? Before I tell you what boosting is, let's understand what led to the need for it. In order to solve complex and convoluted problems, we require more advanced techniques. Now let's suppose that, given a data set of images containing cats and dogs, you are asked to build a machine learning model that can classify these images into two separate classes. Like every other person, you will start by identifying the images using some rules. Let's say one rule is: if the image has pointy ears, then it's a cat. Similarly, let's say you've created another rule: if the image has cat-shaped eyes, then it's a cat again. If the image has bigger limbs, then it's a dog, and if the image has sharpened claws, then it's a cat. Similarly, if the image has a wider mouth structure, then it's a dog.
Now these are some rules that we define in order to identify whether an image is a cat or a dog. Using just one of these rules to classify the image does not make sense. Let's say the cat is of a different breed and it has bigger limbs; if you give an input image, the rule "the image has bigger limbs" will classify it as a dog. So each of these rules, if applied individually on an image, will not give you an accurate result. You have to apply all of these rules, make sure the image goes through all of them, and only then predict the output. Each of these rules individually is called a weak learner, because on its own it is not strong enough to classify an image as a cat or a dog. What I'm saying is, if you use just one rule to classify an image, then your prediction will mostly be wrong; you cannot take a single feature into consideration and classify the image as either cat or dog. So to make sure that our prediction is more accurate, we can combine the predictions from each of these weak learners by using a majority vote or a weighted average, and this is exactly what a strong learner is. In the above example, we've defined five weak learners, and the majority of these rules predict that the image is a cat; that's why our final output is a cat. Here you can see that three of these rules classify the image as a cat and two of them classify it as a dog, so the majority says it's a cat, and we're going to go with cat. This is what a strong learner model is: it combines all the weak learners in order to give you a more precise and more accurate prediction. Now this brings us to the question: what exactly is boosting? Boosting is an ensemble learning technique that uses a set of machine learning algorithms in order to convert or combine weak learners into strong learners, so as to increase the accuracy of the model.
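The majority-vote idea above can be sketched in a few lines of Python. This is a toy illustration only; the rule functions and the image argument are hypothetical stand-ins, since a real model would compute these features from pixels.

```python
from collections import Counter

# Hypothetical weak rules: each looks at one feature and alone is unreliable.
def has_pointy_ears(img):     return "cat"
def has_cat_shaped_eyes(img): return "cat"
def has_sharp_claws(img):     return "cat"
def has_bigger_limbs(img):    return "dog"
def has_wider_mouth(img):     return "dog"

def strong_learner(img, weak_rules):
    # Combine the weak learners' predictions with a majority vote.
    votes = Counter(rule(img) for rule in weak_rules)
    return votes.most_common(1)[0][0]

rules = [has_pointy_ears, has_cat_shaped_eyes, has_sharp_claws,
         has_bigger_limbs, has_wider_mouth]
print(strong_learner("image.png", rules))  # cat: three votes against two
```

A weighted average works the same way, except that each rule's vote is scaled by how reliable that rule has proven to be.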
So guys, boosting is actually a very effective method for increasing the efficiency of your model. In most of the competitions that you see on Kaggle, or any machine learning competition, the majority of the winners usually implement boosting, bagging or some other ensemble learning technique. Now, for those of you who don't know what ensemble learning is, don't worry, I'll be covering that in the next slide. As you can see from the figure, we combine the outputs or the predictions that we get from all our weak learners, or our rules, in order to get a strong learner. So this is the basic principle behind boosting. Now let's understand what
ensemble learning is. Ensemble learning is basically a technique that is used to enhance your model's performance and accuracy. This is exactly why ensemble methods are used to win market-leading competitions such as the Netflix recommendation competition and other Kaggle competitions; most of the winners will be implementing ensemble learning models. Under ensemble learning we have two types: sequential ensembles and parallel ensembles. So guys, before you get confused, let me tell you that boosting is a type of ensemble learning; boosting and bagging are the two different ways in which you can perform ensemble learning. The first type of model is the sequential ensemble model, which is popularly known as boosting. Here the weak learners are sequentially produced during the training phase, and the performance of the model is improved by assigning a higher weight to the previously misclassified samples. An example of boosting is the adaptive boosting algorithm. Now, in boosting ensemble learning, what happens is you feed your entire data set to your algorithm and the algorithm makes some predictions. Let's say the algorithm misclassified some of your data. What happens in boosting is that you pay more attention to the misclassified data points: you increase their weightage, and you therefore make it a point that a lot more importance is given to these misclassified values. You keep doing this until all your wrongfully predicted, or misclassified, samples are correctly predicted. That's how you increase the efficiency of your model. Then we
have something known as parallel ensemble learning, also known as bagging. Here the weak learners are produced in parallel during the training phase. The performance of the model can be increased by training a number of weak learners in parallel on bootstrapped data sets; an example of bagging is the random forest algorithm. So the principle behind bagging is to divide your data set into different bootstrapped data sets and run a weak learner, or an algorithm, on each of these data sets, all in parallel, whereas in boosting you do this sequentially, updating the weights depending on the misclassified samples. So this is exactly what ensemble learning is, and I've just told you what exactly bagging and boosting are. There is a clear distinction between the two, and this is actually one of the most frequently asked questions; if you go for an interview on machine learning, they will always make sure to ask you what exactly bagging and boosting are and how they are different.
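The distinction can be seen concretely with scikit-learn (an assumed library choice here, not something shown on the slides): BaggingClassifier trains its weak learners independently on bootstrap samples, while AdaBoostClassifier trains them one after another, reweighting the misclassified points.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

# A synthetic stand-in dataset.
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

bagging = BaggingClassifier(n_estimators=50, random_state=42)    # parallel ensemble
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)  # sequential ensemble

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```

Both classes default to decision trees as the weak learners, which matches the base learners discussed later in this session.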
So make sure you understand the difference between the two. Now let's move on and understand how the boosting algorithm works. Like I mentioned, the basic principle here is to generate multiple weak learners and combine their predictions to form one strong rule. These weak learners are generated by applying base machine learning algorithms on different distributions of the data set; the base learning algorithms are usually decision trees by default in a boosting algorithm. What these base learners do is generate weak rules in each iteration, and after multiple iterations the weak learners are combined to form a strong learner that will predict a more accurate outcome. So let me explain this stepwise. Consider the data set over here: you have two different types of data, squares and circles, and your end goal is to classify them into two different classes. This is exactly how you do it. To start with, the base algorithm will read the data and assign an equal weight to each of the data points. After that, it will try to analyze the data and draw a decision stump. A decision stump is basically a single-level decision tree that tries to classify the data points. So after it assigns equal weights to all the points, it will draw a decision stump; in the first image you can see the decision stump.
Now after that, it will check for any false predictions; the next step is that the base learner will identify all the false predictions it has made. In the next iteration, what you do is assign a higher weightage to these misclassified samples. In the first image we have successfully separated two of the squares, but there are three other squares on the other side, meaning that we've misclassified those three squares. So if you take a look at the next image, the three squares have a higher weightage, which is shown by increasing their size; in the next iteration you increase the weightage of your misclassified samples. Similarly, you keep doing this until you separate your class A from your class B. Basically, you are going to pay more attention to your misclassified samples, increase their weightage, and make sure that those samples are correctly classified in the next iteration. So, like I said, you'll keep repeating step two: you keep increasing the weightage of misclassified samples until all of the samples are correctly classified. Look at the fourth diagram here: everything is classified correctly, and we have a set of squares and a set of circles.
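The reweighting loop just described can be sketched as a simplified AdaBoost-style procedure, assuming scikit-learn decision stumps as the weak learners and a synthetic dataset in place of the squares and circles; this is an illustration of the steps, not a production implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
weights = np.full(len(y), 1 / len(y))      # step 1: equal weight for every point
stumps, alphas = [], []

for _ in range(10):                        # repeat for several iterations
    stump = DecisionTreeClassifier(max_depth=1)        # a single-level tree
    stump.fit(X, y, sample_weight=weights)
    wrong = stump.predict(X) != y          # step 2: find the misclassified points
    err = weights[wrong].sum()
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # this stump's say in the vote
    weights[wrong] *= np.exp(alpha)        # step 3: boost the weight of mistakes
    weights /= weights.sum()               # renormalize
    stumps.append(stump)
    alphas.append(alpha)

# Strong learner: a weighted vote over all the stumps.
scores = sum(a * np.where(s.predict(X) == 1, 1, -1)
             for s, a in zip(stumps, alphas))
pred = (scores > 0).astype(int)
print("training accuracy:", (pred == y).mean())
```

Each round the misclassified points gain weight, so the next stump is forced to pay more attention to them, exactly as in the squares-and-circles figures.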
So that's exactly how the boosting algorithm works. Now let's understand the different types of boosting. Mainly there are three classes of boosting: adaptive boosting, gradient boosting and XGBoost; we'll discuss each of these in brief. Adaptive boosting is what I explained to you in the previous slides: it is implemented by combining several weak learners into a single strong learner. These are the steps that the adaptive boosting algorithm follows. Adaptive boosting starts by assigning an equal weightage to all of your data points, and you draw out a decision stump for a single input feature. The next step is that the results you get from the first decision stump are analyzed, and if any observations are misclassified, then they are assigned higher weights; this is exactly what I explained in the previous slide. After that, a new decision stump is drawn by considering the observations with the higher weights as more significant. So whichever data points were misclassified are given a higher weightage, and in the next step you'll draw another decision stump that tries to classify the data points by giving more importance to the points with higher weightage. Again, if there are any observations that are misclassified, they're given a higher weight, and this process will keep looping until all the observations fall into the right class. The end goal here is to make sure that all your data points are classified into the correct classes. Adaptive boosting, or AdaBoost, can also be used for regression problems; it's not restricted to classification, it can be used for both, but it's more commonly seen in classification problems. So that was a
brief overview of adaptive boosting. Now let's understand gradient boosting. Gradient boosting is also based on the sequential ensemble learning model. Here the base learners are generated sequentially, in such a way that the present base learner is always more effective than the previous one; basically, the overall model improves sequentially with each iteration. The difference in this type of boosting is that the weights for misclassified outcomes are not incremented. Instead, in gradient boosting, what you do is try to optimize the loss function of the previous learner by adding a new model that adds weak learners, in order to reduce the loss. The main idea here is to overcome the errors of the previous learner's predictions. This type of boosting has three main components. The first is the loss function, which is the one that needs to be optimized, meaning that you need to reduce the error. The second component is the weak learners, which are needed for computing predictions and forming strong learners. Then you need an additive model that will regularize the loss function, meaning that it will try to fix the loss, or the error, from the previous weak learner.
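This loss-optimization idea can be sketched for squared-error regression, where fitting each new weak learner to the residuals is exactly fitting it to the negative gradient of the loss; the sine-curve data here is a made-up stand-in, and scikit-learn regression stumps are an assumed choice of weak learner.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

prediction = np.zeros(len(y))    # start from a trivial model
learning_rate = 0.1
for _ in range(100):
    residual = y - prediction    # negative gradient of the squared loss
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    prediction += learning_rate * stump.predict(X)   # add the new weak learner

print("final training MSE:", np.mean((y - prediction) ** 2))
```

Each stage reduces the loss a little, so the ensemble improves sequentially, without ever reweighting the samples the way AdaBoost does.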
So you keep adding a model that regularizes the loss from the previous learner. Just like adaptive boosting, gradient boosting can also be used for both classification and regression problems. Now let's discuss
the last type of boosting, which is XGBoost. XGBoost is basically an advanced version of gradient boosting; it literally means extreme gradient boosting. XGBoost actually comes from the Distributed Machine Learning Community (DMLC), and it's a more advanced version of the gradient boosting method. The main aim of this algorithm is to increase the speed and efficiency of your computations and of the model's performance. The reason this model was introduced is that the gradient boosting algorithm was computing its output at a very slow rate, because of the sequential analysis of the data set; it takes a longer time. That's why XGBoost was introduced: it basically boosts, or extremely boosts, the performance of the model. So XGBoost mainly focuses on your speed and your model's efficiency, and in order to do this it has a couple of features: it supports parallelization by creating decision trees in parallel, with no sequential modeling; it implements distributed computing methods for evaluating large and complex models; it uses out-of-core computing in order to analyze huge and varied datasets; and it implements cache optimization to make the best use of your hardware and your resources.
So guys, these were the basics of the different types of boosting algorithms. Now, to make things a little more interesting, let's run a practical implementation. A short disclaimer before I get started with the demo: I'll be using Python to run it, so if you don't know Python, I'll leave a couple of links in the description box; you can go through those links and maybe then come back and watch this video.
now let's understand what exactly we're
going to do in this demo so your problem
statement is to study a mushroom data
and build a machine learning model that
in classify a mushroom as either
poisonous or edible by analyzing the
features of the mushroom so you're going
to be given a mushroom dataset what you
have to do is you have to understand
which of these mushrooms are edible and
which is poisonous so this data set
basically has mushrooms of 23 different
species and a species is either
classified as edible mushrooms or non
edible ones right so guys the logic
again over here is to build a machine
learning model by using one of the
boosting algorithms in order to predict
whether or not a mushroom is edible so
let me quickly open up the code for you; I hope all of you can see the console. We'll wait for this to run; until then, let me just run you through the entire code. Like in any other demo, you start by importing the required packages. The best thing about Python is that there are inbuilt packages and libraries that let you implement complex processes; all you have to do is import these libraries, and that's exactly what I'm doing over here. After that, I'm loading the data set into a variable called dataset; the data set is stored in this location, and all I'm doing is reading it and storing it in this variable. After that, we perform data processing. Here we define the column names, since in our data set the column names are not defined; I'm just defining all the column names here and then assigning them to our data set. Next, we print the data set info to look at all our features. These are our data columns: in total we have 23 variables, out of which one is the target variable. The target variable is the output variable we're trying to predict, and the rest of the variables, like bruises, cap color and cap surface, are predictor variables. Next, we're going to drop the target variable from our data set, because that's what we're trying to predict: our Y will contain the target variable, and our X will contain everything except the target variable. Y is basically created for evaluating your
model. So guys, I hope you all know what all of this is; I'm not going in-depth into it, because this is basic machine learning, and I'm hoping that you have a good idea about machine learning if you're studying boosting. Next, you perform something known as data splicing, which is basically splitting your data set into a training and a testing data set; this variable here defines the size of your testing data set, so 30% is assigned for testing and 70% for training. After that, we create a model by using the decision tree classifier as our base estimator. The base estimator is basically your weak learner, and here we're using the entropy method in order to find the best attribute for the root node; this is a part of decision trees. Next, we're calling this AdaBoost classifier function. This is an inbuilt function that does exactly what an adaptive boosting classifier is supposed to do, and there are three important parameters that you pass to this function: base estimator, n_estimators and learning rate. Your base estimator is basically your weak learner, and by default the weak learner is always a decision tree; so what we're doing is passing the variable model over here, and in model we've stored the decision tree classifier. Next we have n_estimators: this field specifies the number of base learners that we are going to use, i.e. the number of weak learners in our model, and we've assigned it a value of 400. Then we have the learning rate, which specifies the learning rate, which we have set to the default value of 1.
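Since the demo's CSV path and console output aren't reproduced here, this is a condensed, runnable sketch of the same pipeline with a synthetic stand-in for the 22 predictor columns; with the real mushroom data you would first label-encode the categorical features. The entropy criterion, 400 estimators and learning rate of 1 mirror the demo, while capping the tree depth at 1 is an assumption to keep the base estimator a weak learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 22 mushroom predictor variables.
X, y = make_classification(n_samples=800, n_features=22, random_state=1)

# Data splicing: 70% train / 30% test, as in the demo.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Weak learner: an entropy-based decision stump as the base estimator.
weak_learner = DecisionTreeClassifier(criterion="entropy", max_depth=1)

# AdaBoost with 400 weak learners and a learning rate of 1.
booster = AdaBoostClassifier(weak_learner, n_estimators=400, learning_rate=1.0)
booster.fit(X_train, y_train)

predictions = booster.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))
```

The base estimator is passed positionally here, which works across scikit-learn versions where the keyword was renamed from base_estimator to estimator.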
Let me just clear that line; that's not needed. Next, we're fitting our training data set into our model. Here we're evaluating the model and seeing how it predicts the values when you give it the testing data set. Next, we're comparing our predicted values with the actual values, and when you do that we get an accuracy of 100%; here you can see the accuracy is 100%, which is perfect, because this is expected when you use boosting machine learning algorithms. Now, if instead of using boosting you try this with just weak learner models like decision trees, then your accuracy will not be a hundred percent; there's always some problem or the other, especially with decision trees, where overfitting can occur. So the best way to increase your model's accuracy is by using boosting machine learning algorithms, and that's exactly what I wanted to show you: the boosting technique will help you increase the accuracy of your model. So guys, with that we come to the end of today's session. If you have any doubts regarding this session, you can leave them in the comment section. I hope you enjoyed the class,
and until next time, happy learning. I hope you have enjoyed listening to this video. Please be kind enough to like it, and you can comment any of your doubts and queries; we will reply to them at the earliest. Do look out for more videos in our playlist, and subscribe to the Edureka channel to learn more. Happy learning.
