hey guys welcome to another session by
Intellipaat driving a data-driven
business using machine learning is
considered an important aspect in
today's world top companies such as
Amazon Facebook Apple and many more use
machine learning to perform advanced
analytics and drive their business to
success in today's session we're gonna
have a quick look into the world of
machine learning algorithms now before
we begin do subscribe to the Intellipaat
YouTube channel so that you never miss
out on any of our upcoming videos now let's
have a look at the agenda for today's
video
first we'll understand why we require
machine learning algorithms then we'll
further understand what these algorithms
actually are after that we'll take a
quick dive into the world of machine
learning algorithms and finally we'll do
a couple of demos using these algorithms
also guys if you're looking to get
certified in data science Intellipaat
provides data science certification
training courses for more details you
can check out the description without
much further delay let's get started
what do you think is the need for
something called as an algorithm well
consider this situation right so let's
say you're either baking a cake or
you're driving your car or you're even
walking or singing well your body is
continuously you know executing a
set of steps that you have already
trained it to do and this is what
we call an algorithm so basically
when you're driving your car your
brain is already programmed to do all
the tasks that are required to pretty
much help you drive your
car and then when you're walking as well
how do you maintain balance well
you'd realize that
maintaining your balance as a toddler
was very difficult but then you trained
yourself every day and now you can
walk very easily right so this
repetitive process which involves
learning can again pretty much
be termed as an algorithm as well
guys well if you've been wondering if
algorithms are new concepts well they're
not algorithms have been used for
decades well look at this
person on the screen called Alan Turing
here's a good fact his work was
probably one of the reasons World
War Two ended when it did he was one of
the people who decoded
the very famous encrypted Enigma
messages from Germany along with
all the other
code breakers and so much more right so
the entire point here is to tell you
that algorithms are an
age-old tradition and
these days we've been pushing them into
our computer science field as well and
making sure that we make full use of
them guys and then again why would we
require them well think of the huge amount
of data that's being generated these
days and then think of the methods that
we'd need to process it to understand
the data or to process the data and then
to you know pretty much clean up the
data and work with it right so for all
of these we have something called as
algorithms guys so on that note what are
algorithms what is the formal definition
of an algorithm
well guys algorithms are as simple as
this they are just a set of rules or you
can call them processes as well to be
followed in calculations
or any other problem-solving operations
especially by a computer well how
simple is that well this is exactly what
an algorithm means well you have
a symbol on the left hand side that is
pretty much what a flowchart looks like
as well don't worry you'll be
checking out the flowchart sections in
the next set of slides but
right now I want to tell you guys that
you guys are using algorithms as well
step one you're looking at your screen
well you've programmed yourself to look
at the screen that's an algorithm and
YouTube is running recommendation
algorithms where let's say
you search for something Python
tutorials or anything for that matter
right Intellipaat videos are up there
so how does YouTube know that you know
it should recommend Intellipaat's videos
to its learners
well again an algorithm is being used
there and every time you check your mail
your mails are filtered into your inbox
or into your spam folder and so much
more so how does Google or Gmail know
what mail is a spam mail what mail is
not a spam mail right so that again is
an algorithm right there and no matter
what operating system you're on Windows
right now or let's say iOS let's say
macOS Android whatever right so all these
operating systems are using algorithms
right now and on that note let's
quickly break it down into simple terms
and check out the relationship between a
pseudocode and a flowchart guys
so here it is a
very simple piece of code for you guys
this is what we call pseudocode
pseudocode is almost like high level
language code it just looks
very literal and you can figure out
what the code is doing even though you
might not be a programmer so one is
the code part of it and the other is
what we call the algorithm which is the
flowchart alongside it so pretty much
we're inputting a single variable a and
putting the value 10 into it we're
inputting our variable b and putting a
value of 20 into it we're adding them
pretty much so c will have the value 30
right now right
a plus b is 10 plus 20 and then we're
outputting that the words start and stop
again are a part of the pseudocode
flowchart relationship and on the right
side if you can just take a look this is
what the flowchart of this exact
pseudocode will look like guys well this
was very simple so let me quickly step
it up one single notch you know where we
can go about checking another pseudocode
flowchart relationship guys so here
again we're inputting a inputting b and
then we're making sure that until a
becomes equal to b we'll be printing all
the values from a to b and then we're
gonna be increasing a by one so right
now a is 10 it's gonna check if you know
10 is equal to 20 it's not so until 10
becomes 20 we're gonna keep printing
out everything so the answer is going to
be 10 11 12 13 14 all the way until 20
and this is going around as an iteration
in a loop if you can see the
diamond box is called the decision
box where it has two tracks it
can be a true/false track or a
yes/no track and in this particular case
we have the yes/no track here guys
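
As a rough sketch, the two pseudocode examples just described could be written in Python like this. The function names are made up for illustration, and the loop condition `a <= b` is an assumption chosen to reproduce the 10 through 20 output mentioned above.

```python
def add_two_numbers(a, b):
    """First pseudocode: input a, input b, add them, output c."""
    c = a + b
    return c


def count_up(a, b):
    """Second pseudocode: output a, then increase a by one, until a reaches b."""
    printed = []
    while a <= b:          # the diamond-shaped decision box in the flowchart
        printed.append(a)  # "print a"
        a += 1             # "increase a by one"
    return printed


print(add_two_numbers(10, 20))
print(count_up(10, 20))
```

The `while` condition plays the role of the decision box: the yes track runs the loop body again and the no track leaves the loop.
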
so on that note we need to understand
why we would require all these
algorithms in machine learning right so
before that why would we even require
machine learning well guys again
machine learning can pretty
much be defined you know
as the ability of a machine to learn
something without being explicitly
programmed for that particular thing
well how cool
is that it is again basically the field
of study where computers use a massive
amount of data and apply all of
these algorithms for training
themselves here's the keyword
training themselves and again making
predictions on that right so again
training in machine learning entails
feeding a lot of data into the algorithm
and allowing the machine itself to learn
more from the processed information well
you're gonna just tell the machine a lot
of basics probably or just show it one
iteration and the machine pretty much
goes on to figure out say 9 or 10 more
iterations on its own it's gonna learn
on its own it's gonna process on its
own and pretty much you know you can
work with that data later on right so
again we can call this a process of
converting just raw data into useful
information as well
but then we're doing it with the help of
these algorithms that we're about to
learn guys so on that note we need to
check out what the types of machine
learning are so we have three main types
of learning which happens when we talk
about machine learning guys it's
supervised learning
it's unsupervised learning and it's
reinforcement learning guys so if I were
you guys I would
just suggest you take a minute and
pause on the slide to note these three
types of machine learning guys
supervised learning unsupervised
learning and reinforcement learning if
you're already familiar with the
concepts or if you think that you've got
it in the bag well let's move on to check out
what supervised learning actually means
well supervised learning as the name
suggests requires some sort of
supervision right let us talk in terms
of variables so we can understand it
easily again in supervised machine
learning algorithms let's say we have
input variables and output variables
these input variables are denoted by X
and the output variables are denoted by
Y so X is input Y is output the goal of
any supervised learning system is to
understand how your output variable Y
changes with respect to the change made
in terms of X guys so how does the
output variable Y vary when we go about
playing with our input variable X is
pretty much the goal of a supervised
learning system guys and then here we'll
also be approximating the mapping
function to a point where when we have
new input data coming in which we
haven't seen which the machine hasn't
seen we can predict new output
variables Y with respect to all the new
data the new X data that the machine
just saw so we have trained it on a
particular amount of X's with known Y's
and then it sees a new amount of data
a new amount of input variables and then
it uses what it has learned
to pretty much give us a new Y
output value guys so how cool is that
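
To make the idea of approximating the mapping from X to Y concrete, here is a minimal sketch in plain Python. It assumes the mapping is a straight line and fits it with ordinary least squares; the data points and the function name `fit_line` are invented for illustration and are not from the video.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: inputs X with known outputs Y (here Y = 2 * X + 1).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(train_x, train_y)

# Predict Y for an input the model has not seen before.
new_x = 6.0
predicted_y = slope * new_x + intercept
print(slope, intercept, predicted_y)
```

The training pairs are the supervision: the fitted line is only as good as the known Y values it was shown.
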
right and then we need to also know that
we have dependent variables and the
concept of independent variables right
and our aim here is to pretty much
understand how our dependent variable
will
change with respect to an independent
variable so we have a
dependent variable which you know
goes hand-in-hand with the
variable we call the independent
variable and then we need to understand
what changes go into
this dependent variable when it is
mapped across and compared with
respect to our independent variable so
just to make sure that you guys are
getting the concept out here here's a
very simple example showing you the same
so again here our independent variable
in our particular case is let's say the
gender of the student we have a girl and
a boy here the dependent variable can be
the outcome of the
examination of these students so
let's say whether the student either
passed an examination or failed it
this becomes our dependent variable so
the independent variable is our gender
the dependent variable becomes the
outcome of what the student is trying to
do and at the end of it what we're
trying to do is basically
determine whether the student would pass
the exam or not based on the person's
gender let's say we're doing a survey
where we need to find out how many girls
have passed or how many boys have passed
here again the gender becomes the
independent variable and what
depends on it in our particular case
the outcome the pass or the fail
becomes dependent right so here again
we're trying to find out if the student
would pass or not based on the gender so
the dependent variable would pretty much
here be again as I've already been
mentioning the outcome
and the independent variable is going to
be the gender guys so do we have
anything more in terms of supervised
learning well yes guys there are more
subdivisions with respect to
supervised learning we have
something called classification and
something called regression let
us quickly check out what regression is
and then we can come and talk about
classification guys well regression is a
type of supervised learning where the
output variable is a continuous numeric
value so what do we mean by a continuous
numeric value right so let me again take
another quick example to make sure you
guys understand this better
I have images of two apples for you guys
one apple costs four dollars the other
apple costs three dollars here the
output variable is the cost of the apple
it is a numeric value which is a
price value you can predict right is
the apple ripe if yes then it's
costly if it's not yet ripe then it's
cheap well is it a Shimla apple is it a
Kashmiri apple is it a Washington
apple well you can pretty much
go on adding so many factors around this
apple and then come up with one
particular outcome out of it which would
be the price right so the price depends
on all of these factors and in our case
the price is the output variable so
we're trying to predict the cost of the
apple with respect to all these other
factors right so again doing this in a
real-world or in a mathematical
situation and in this situation pretty
much we call it regression guys
so with respect to
regression again there is another type
of regression which we call
logistic regression and this is
basically just a technique you know
where our dependent variable instead of
being a continuous numerical value
is a categorical value guys so again
what do we mean by this take for
example if you can take a look at the
example on your screen right now what
we're trying to do is we're trying to
predict whether it's gonna rain on
that particular day or not and this is
being done with respect to two
independent variables right so how do we
check for rain again pretty much it's
usually done by checking the temperature
or checking the humidity and if all of
this is good we probably just go out
take a look at the sky to check for
clouds and so much more right and
coming back to logistic regression the
dependent variable is a categorical
variable right and here it can have only
two values the categorical variable in
this case is binary guys
so it is going to be either zero or it's
gonna be one and in this logistic
regression model as we call it
depending on all of these attributes
we get a probability and our final answer
is going to be either yes or no right so
if you ask someone a question is it
gonna rain their answer might be either
a yes or a no right so it's a binary
answer again here it's the same as well
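
The step from a probability to a yes or no answer can be sketched as follows. This is only an illustration: the weights, the bias and the 0.5 threshold are made-up assumptions, not values from the video, and a real model would learn the weights from data.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_rain(temperature, humidity, weights=(-0.1, 0.08), bias=-2.0, threshold=0.5):
    """Return (probability, label) where label is 1 for rain and 0 for no rain."""
    z = weights[0] * temperature + weights[1] * humidity + bias
    probability = sigmoid(z)
    label = 1 if probability >= threshold else 0
    return probability, label


print(predict_rain(temperature=20, humidity=90))  # cool and humid
print(predict_rain(temperature=35, humidity=20))  # hot and dry
```

The model itself only produces a number between 0 and 1; the binary answer comes from comparing that number with the chosen threshold.
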
again so to pretty much graph out what
it would look like we have an s-shaped
curve out of this model which
we call logistic regression guys
so on the left we have a linear
relationship between our dependent
variables and the independent variables
and it's just a straight line on the
right since it's a binary value for the
outcome that we are looking at the
curve looks like an S so again guys take
a moment pretty much pause on this slide
to understand what a linear regression
graph looks like versus what a logistic
regression graph looks like so on that
note let us quickly come back to check
out the next subdivision under
supervised learning which is called
classification guys well pretty much
as the name suggests you might already
know what classification means in
literal terms well again in
classification the output variable is
categorical in nature and in this
example it's going to be a binary value
so you can just have a look at the
picture on your screen and then we can
categorically analyze if that person is
a male or a female right so here the
binary outcome is again the gender
of the person whether the person's
either a man or a woman and again the
output variable is the gender of the
person which is a categorical value and
we are trying to classify this person
into a specific gender based on all the
other factors as well well how do we
know it well we could see the beard on
the face it looks like a man so our
brain pretty much told us it is a man
right simple as that so on that note
we've pretty much checked out what
supervised learning is so what is
unsupervised learning
well guys in unsupervised learning for
all of the algorithms that we have right
we have input data which has no labels
so when we say that the data does
not have any labels it means there is
nothing that the machine can map to
to understand the data offhand very
easily so if we take a look at the raw
data ourselves right we can probably
tell that there's a couple of fishes in
there there's a couple of birds in there
well we know it because we have trained
ourselves for that when the machine sees
this there's not gonna be any label
which is going to tell it that this is a
fish or this is a bird
so our unsupervised learning algorithm
is pretty much going to run through this
and at the end of it through
what we call the
process of clustering it's going to
group all the fishes for us group all
the birds for us on its
own so here the input data
has no class labels and it
doesn't know what's a fish what's a
bird right so again building an
unsupervised model on top
of this input data is again very
interesting and very fun guys
so here again it is going to pretty much
be giving out two clusters the first
consists of all the fishes and the
second consists of
all the birds guys so coming to
clustering which is again a major part
of unsupervised learning the most
important clustering algorithm and the
simplest one is k-means clustering
guys well k-means clustering again is an
unsupervised machine learning algorithm
where the aim is to pretty much go about
grouping all the similar data points
just like fishes and birds and putting
them into one cluster each so again
there must be you know high intra
cluster similarity and low inter cluster
similarity out here right so what do we
mean by that well all the data points
you know within a cluster should be as
similar as possible and the data
points in two different clusters
must be as different as possible so all
the data in one cluster is
similar and the data when you compare
two different clusters is very
different from each other right so this
is pretty much k-means clustering in
just a sentence guys well what does the K
stand for in k-means clustering
right well K is the number of clusters
that you want the outcome to be in in
our particular case we have cluster A
cluster B and cluster C so the K value
here is three because we have three
different clusters right very very
simple as that guys so on that note the
next type of learning that happens is
what we call reinforcement
learning guys again in reinforcement
learning there is something called
an agent and this agent pretty much
comes up with the most effective actions
for us by mapping its state at every
single moment guys so to give you
better clarity I hope you guys
have played Pac-Man in your
olden days guys so in this particular
video game the space around
the figure is what we call a 2D
game space again you have
something called pac-dots you have
enemies you have walls and so much more
right so the action here is to again
just pretty much move around and make
sure you don't hit the
bad guys and just finish your entire
goal here how do you know who
the good guys are and where you need to
move and how you're not supposed to
you know get out every single time right
so for that particular thing you've been
playing this game for a while or let's
say you've been playing this game for a
couple of hours couple of days in your
childhood and then you realize how the
game actually works well that exactly is
reinforcement learning guys again to
give you another example reinforcement
learning is pretty much how a dog or a
cat is trained in real life as well
if the dog does something right
let's say
we're training a dog to give a handshake
and then if the dog has given a handshake
you might see that the trainer just
feeds it a biscuit that instant right so
the dog knows that giving
a handshake is pretty much the right
thing to do because there is a biscuit
at the end of it so the reward is being
hunted by the animal right so again to
put it all in one single picture this is
what a reinforcement learning
environment would look like guys so
we have an agent who performs an action
in an environment and then here we can
actually have two tracks where if the
agent does it right if the task is being
performed right there is a reward with
respect to it and everyone's happy
else if you do not have that particular
reward then it means that something went
wrong and this will have a state because
something went wrong you're eventually
not getting the reward let's say the dog
did not give you a handshake if you
pretty much give it a biscuit at that
moment it will not realize if it's doing
the right thing or the wrong thing right
so that we can have a state of let's say
the dog did not give a handshake and
that's pretty much what S t means guys
the reward is R t and this keeps on going
in an iteration where you're just training
your model better and better and better
to hunt more rewards the more the
rewards the more the machine is doing the
right thing it's as simple as that guys
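
The reward loop from the dog example can be sketched in a few lines of Python. This is a deliberately simplified, single-state illustration: the epsilon-greedy rule, the learning rate and the action names are assumptions picked for the sketch, since the video does not name a specific algorithm.

```python
import random


def train_dog(episodes=200, epsilon=0.2, learning_rate=0.1, seed=0):
    """Learn the value of each action from rewards alone (epsilon-greedy)."""
    rng = random.Random(seed)
    actions = ["ignore", "handshake"]
    value = {action: 0.0 for action in actions}  # estimated reward per action

    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try something at random
        else:
            action = max(actions, key=value.get)  # exploit: best action so far
        reward = 1.0 if action == "handshake" else 0.0  # the biscuit
        # Nudge the estimate for this action toward the reward just received.
        value[action] += learning_rate * (reward - value[action])
    return value


learned = train_dog()
print(learned)
```

The agent is never told that the handshake is correct; it only sees that the handshake is followed by a reward, so its estimate for that action grows while the other stays at zero.
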
so on that note I have two very simple
demos which are in Python that I just
quickly want to run by you guys to
show you the use of machine learning
algorithms so on that note let
me quickly jump into Google Colab
Google Colab
is basically a Python Jupyter
notebook hosted on the Google cloud and
I use this for most of my Python coding
as well
so anyway coming back to it
here's the first example that we'd like
to discuss with you guys well just give
me a second the runtime is being
connected so it's almost connected now
it's initializing and then it's gonna
say connected any minute and there
it is so first let us check out a k-means
clustering demo right so pretty much
we're gonna import a couple of packages
such as NumPy pandas we have matplotlib
to pretty much give us the output in
terms of graphs we have sklearn to
pretty much import what we have as the
sub library called the k-means
library and then go on working with it
so let me quickly import all of these
libraries that we'll be making use of
and then go ahead with that so to
generate data of our own instead of
just picking it up from any data set for
this particular case we'll be making our
own data using something called make
underscore blobs guys so we'll have 300
samples here and then we'll have four
clusters so this is what we mean
n underscore samples is 300 we have
300 dots on your screen right now and
these dots are divided pretty much into
four clusters for us so let us use
something called the elbow method
which pretty much relies on what's
called WCSS the within cluster sum of
squares I would recommend you guys
pretty much google it if you want to
know more about WCSS it is again a very
complex part of the k-means algorithm
and I would just suggest you guys
check it out on your own because it is
not in the scope of this particular
tutorial so we'll be using that
particular method and we're gonna train
the entire model for us to make it
understand what's going on so look at
this right so the optimal
number of clusters again for us is
somewhere around say 3 or 4
so we have 4 clusters and we have the
WCSS going all the way from 2500 down
to 0 right
this is just a graph to tell us what the
data might look like right so we need to
find out what we call
the centroid in our k-means clustering
algorithm of each different cluster and
then we need to mark that center
right so this is exactly the red dot
that you see that is again exactly
what's going on there
so we've pretty much found out that there
are four clusters that exist and
then we've pretty much marked the
centroids of the four different clusters
that you see using k-means
clustering guys it's as simple as that
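
The demo in the video uses scikit-learn for this. As a rough stand-in that runs with only the standard library, the sketch below generates 300 points around four made-up centers, runs a plain k-means, and prints the WCSS for several values of k so the elbow can be seen. The helper names (`make_toy_blobs`, `k_means`), the centers, the spread and the restart count are all assumptions for illustration and are not the code shown on screen.

```python
import random


def make_toy_blobs(centers, points_per_center, spread, rng):
    """Generate 2D points scattered around each given center."""
    points = []
    for cx, cy in centers:
        for _ in range(points_per_center):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points, k, rng, iterations=20):
    """Plain k-means (Lloyd's algorithm). Returns (centroids, wcss)."""
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    # WCSS: total squared distance from every point to its nearest centroid.
    wcss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, wcss


rng = random.Random(0)
true_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]  # made-up cluster centers
data = make_toy_blobs(true_centers, points_per_center=75, spread=0.6, rng=rng)

# Elbow method: compute WCSS for several values of k and look for the bend.
for k in range(1, 7):
    # Keep the best of a few random restarts, since k-means can get stuck.
    best_wcss = min(k_means(data, k, rng)[1] for _ in range(5))
    print(k, round(best_wcss, 1))
```

WCSS always shrinks as k grows, so the useful signal is the point where the drop flattens out; with four well-separated blobs that bend would be expected around k equal to 4.
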
so that was a very simple first demo
right for the second scenario I will be
checking out logistic regression and
in this particular case we'll be
working with a heart disease prediction
data set and we'll be
using machine learning here to predict
if a person is gonna have
heart disease or not and we're gonna be
doing this entirely using the process of
logistic regression guys again we're
importing a couple of libraries here
pandas to handle the data NumPy to do
mathematical operations SciPy to
go on to do our computations then we
have matplotlib and seaborn to pretty
much give us visualizations and we
have sklearn which is scikit-learn
which is again a very important machine
learning library of Python and we're
gonna import all of these guys so just
before that we need the data
set file which is called the
Framingham data set well the data set
is from the town of Framingham in
Massachusetts so let me just quickly you
know import the file which is called
the Framingham dataset and then we can
pretty much go on to working with that
guys so you know it's gonna take a
second to pretty much get uploaded it's
a small file and as you can see it's
been uploaded so now I can go on to
pretty much run this code and this is
what our dataset would look like
it has a binary value for male it means
if male is equal to one then the person's
male if male is equal to zero it means
the person's female then it has the age
it has whether the person
is a current smoker or not and how many
cigarettes per day they have BP meds
that is medications for their blood
pressure basically and then have you had
a stroke in your life are you diabetic
what is your total cholesterol what is
your systolic blood pressure what is your
diastolic blood pressure what is your
body mass index what's your heart rate
what's the glucose level and then it has
got the ten year CHD as well and so
much more so this is an amazing
data set to work with and pretty
much we're gonna be just renaming the
column male to sex underscore male
that's about it what we're doing here
and then we need to find out how many
missing values we have in this
particular data set and there are so
many columns with zeros in them right
but we have about 388 missing values when
it comes to glucose 50 missing values
when it comes to cholesterol and so much
more so let us go on to you know remove
all of these missing values and hey look
it found pretty much about 500 as the
total number of rows with missing values
right and it's fine in our particular
case because it's only 12% of the entire
data set so we can exclude those rows
and we can pretty much drop them and you
know it wouldn't hurt our analysis at the
end of it so to begin with you have to perform
some exploratory analysis where we need
to show how the data is
distributed I mean we just dig
into our data to find out what the data
is telling us right so here's a couple
of quick charts which pretty much
give us all of our numerical data
as graphs so we have again the
sex distribution we have the age
distribution current smokers BP
medications distribution cigarettes per
day again diabetes total
cholesterol BMI systolic blood
pressure then diastolic blood
pressure and so much more right so we're
just pretty much performing some quick
exploratory analytics on it and
then we're gonna be going about to
find out this is just the
ten year CHD that I'm printing out
and then we need to go about finding out
if the person you know has a
chance of getting heart disease or
not well here we can check out the count
right so there are about 500 let's say
600 people who are at risk of
getting heart disease while there are
about 3,500 or let's say 4,000 people
who are healthy well this is
what exploratory analysis you know
pretty much helps us to do it gives us a
sort of an analytics number from which
we can find out if a person might you
know suffer from heart disease in
the near future and so much more right
so let us quickly you know go about
plotting that and we can go on from
that
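
The missing-value check, the row drop and the class count described above are done with pandas in the video. A minimal plain-Python sketch of the same three steps is shown here; the rows and the column names (`glucose`, `cholesterol`, `ten_year_chd`) are hypothetical stand-ins, not the real Framingham data.

```python
from collections import Counter

# Made-up rows for illustration only; None marks a missing value.
rows = [
    {"age": 39, "glucose": 77, "cholesterol": 195, "ten_year_chd": 0},
    {"age": 46, "glucose": None, "cholesterol": 250, "ten_year_chd": 0},
    {"age": 48, "glucose": 70, "cholesterol": 245, "ten_year_chd": 0},
    {"age": 61, "glucose": 103, "cholesterol": None, "ten_year_chd": 1},
    {"age": 43, "glucose": 85, "cholesterol": 285, "ten_year_chd": 0},
    {"age": 63, "glucose": 99, "cholesterol": 228, "ten_year_chd": 1},
    {"age": 45, "glucose": 85, "cholesterol": 313, "ten_year_chd": 0},
    {"age": 52, "glucose": 78, "cholesterol": 260, "ten_year_chd": 0},
]

# Count the missing values in each column.
missing_per_column = Counter(
    column for row in rows for column, value in row.items() if value is None
)

# Keep only the rows that have no missing values at all.
complete_rows = [row for row in rows if None not in row.values()]
dropped_fraction = 1 - len(complete_rows) / len(rows)

# Count how many remaining people fall into each outcome class.
class_counts = Counter(row["ten_year_chd"] for row in complete_rows)

print(dict(missing_per_column))
print(f"dropped {dropped_fraction:.0%} of rows")
print(dict(class_counts))
```

Checking the dropped fraction before discarding rows is the same judgment call made in the video, where roughly 12% was considered acceptable to lose.
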
well as you guys could see that pretty
much took about a minute of processing
because it has to plot so many values
for us right I'm sorry let me quickly
scroll down so we can get a better view
again this is a
seaborn axis grid plot and then you
can see the concentration of all the
values at every particular instant right
this is for every single aspect that we
are using to compare so let us quickly
use describe to pretty much tell us what
we're just looking at and yeah so we
have a count of about three thousand
seven fifty one for the male column it's
gonna give you the age of so many people
it's gonna give you the cigarettes per
day BP meds prevalent stroke and so much
more right so coming to the process of
logistic regression out here from all
this data we need
to have an inference at the end of it
right so to do that we'll pretty much be
running a couple of functions one of
those functions is a lambda function and
then we can have this very nicely
optimized output printed for us and then
as you can check out as it already says
the ten year CHD is pretty much our
dependent variable we'll be using
logistic regression and so much more right
so it's going to give you all the
standard errors all the values of what we
call the z statistic
it's gonna show the P value that is
the probability of being
greater than the value of z with
respect to every single one of the
variables that we're checking
and then when it comes to backward
elimination we'll pretty much be using
feature selection to go about
doing it and at the end of it we can have
a summary a very nice looking summary
printed for us well again the
summary looks nice right but we need to
make more sense out of it such that okay
this is the odds ratio
so here we have something called
the p-values we have the odds ratio
and the 95% CI values out here so here
we can pretty much go on to analyze what
actually drives you know the
outcome of let's say heart disease
and so we can make sense out of it
to use our model to make sense out of
this let's quickly split our one single
dataset into a training data set and a
testing dataset and let us make our
model give us the answer right so
checking out model accuracy using the
scikit-learn library again
you can pretty much find out that our
model is accurate for almost 90
percent right so eighty-eight point one
four percent is a big number and it
hasn't been trained for many iterations
right so the number of iterations
again is very low so here's our subplot
it's what we call an axes subplot and
here as well you can pretty much check
out the actual and predicted outcome
values which is predicted one predicted zero
the actual outcome values is this color
while the actual values blue color right
so the color distribution here again
will let you know if what's going on
there as well well here is another step
to pretty much print out you know
what's the true positive rate of
the data the true negative rate of the
data and so much more to put it all into
one single print statement to make sure
it looks very nice the accuracy of our
entire model is about 88% the
misclassification is pretty much 1 minus
whatever the accuracy is right so we've
missed about 11 percent of accuracy for
the true positive rate we are somewhere
about 4 percent for the true negative
rate we have somewhere around 99 percent
the positive prediction rate is 80
percent the negative prediction rate is
somewhere around 88
percent and so much more right so look
at this amount of data look at this
amount of data that our machine learning
algorithm is pretty much giving us
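
The rates listed above all come from the four cells of a confusion matrix. The sketch below shows the arithmetic under stated assumptions: the counts are hypothetical and chosen only to illustrate an imbalanced data set, and the function name `classification_rates` is made up for this example.

```python
def classification_rates(tp, fp, tn, fn):
    """Compute common rates from the four confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
        "true_positive_rate": tp / (tp + fn),  # sensitivity / recall
        "true_negative_rate": tn / (tn + fp),  # specificity
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
    }


# Hypothetical counts, not the ones from the video.
rates = classification_rates(tp=5, fp=10, tn=890, fn=95)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

With counts like these the accuracy looks high while the true positive rate is very low, which is worth keeping in mind when reading a headline accuracy figure next to a true positive rate of only a few percent.
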
right so if you put it literally you
know in terms of use cases in terms
of medicine then this is going to help a
lot of people right so that was a quick
walk through you know pretty much on how
you can go about using the k-means
clustering and logistic regression
algorithms guys all right guys I hope
this video was helpful to you if you have
any further queries do let us know in
the comment section below we'll reach
out to you immediately so guys thank you
so much for watching this video and
giving us your precious time
