Undoubtedly, Data Science is
the most revolutionary
technology of the era.
It's all about deriving
useful insights from data
in order to solve
real-world complex problems.
Hi all, I welcome you
to this session
on Data Science full course
that contains everything
that you need to know
in order to master data science.
Now before we get started,
let's take a look
at the agenda.
The first module is
an introduction to data science
that covers all
the basic fundamentals of
data science.
Followed by this, we have
the statistics
and probability module
where you'll understand
the statistics and math
behind data science
and machine learning algorithms.
The next module is the basics
of machine learning
where we'll understand what
exactly machine learning is,
the different types
of machine learning,
the different machine
learning algorithms,
and so on.
The next module
is the supervised learning
algorithms module
where we'll start
by understanding the most
basic algorithm,
which is linear regression.
The next module is
the logistic regression module
where we will see
how logistic regression
can be used to solve
classification problems.
After this, we'll discuss
decision trees
and we'll see
how decision trees
can be used to solve
complex data-driven problems.
The next module is random forest;
here we'll understand
how random forest can be used
to solve classification problems
and regression problems
with the help
of use cases and examples.
The next module
we'll be discussing is
the k-nearest neighbor module.
We will understand how KNN
can be used to solve
complex classification problems.
Followed by this,
we'll look at the
Naive Bayes module,
which is one of the most
important algorithms
behind Gmail spam detection.
The next algorithm
is Support Vector Machine,
where we will understand
how SVMs can be used
to draw a hyperplane between
different classes of data.
Finally,
we move on to the unsupervised
learning module where we
will understand how k-means
can be used for clustering,
and how you can perform
market basket analysis by using
association rule mining.
The next module
is reinforcement learning
where we will understand
the different concepts
of reinforcement learning
along with a couple
of demonstrations
Followed by this, we'll
look at the deep learning module
where we will understand what
exactly deep learning is, what
neural networks are,
the different types
of neural networks,
and so on.
The last module is
the data science
interview questions module
where we will cover
the important concepts of data science
along with a few tips
to ace the interview.
Now, before we get started,
make sure you subscribe
to our YouTube channel
in order to stay updated
about the most trending technologies.
Data science is one
of the most in-demand
technologies right now.
Now this is probably
because we're generating data
at an unstoppable pace.
And obviously we need to process
and make sense out
of this much data.
This is exactly
where data science comes in.
In today's session,
we'll be talking
about data science in depth.
So let's move ahead and take
a look at today's agenda.
We're going to begin
with discussing the various
sources of data and
how the evolution of technology
and the introduction of IoT
and social media have led
to the need for data science.
Next, we'll discuss how Walmart
is using insightful patterns
from their database to increase
the potential of their business.
After that,
we will see what
exactly data science is,
then we'll move on and discuss
who a data scientist is,
where we will also discuss
the various skill sets
needed to become
a data scientist.
Next, we'll move on to see
the various data
science job roles
such as data analyst, data
architect, data engineer,
and so on. After this we
will cover the data life cycle,
where we will discuss
how data is extracted, processed,
and finally used as a solution.
Once we're done with that,
we'll cover the basics
of machine learning
where we'll see what
exactly machine learning is
and the different types
of machine learning.
Next, we will move on to
the k-means algorithm
and we'll discuss a use case
of k-means clustering,
after which we'll discuss
the various steps involved
in the k-means algorithm,
and then we will finally move on
to the hands-on part
where we use the k-means
algorithm to cluster movies
based on their popularity
on social media platforms
like Facebook.
At the end
of today's session
we'll also discuss what
a data science certification is
and why you should take it up.
So guys, there's a lot to cover
in today's session.
Let's jump into the first topic.
Do you guys remember the times
when we had telephones and we
had to go to PCO booths
in order to make a phone call?
Now, those things
were very simple
because we didn't generate
a lot of data.
We didn't even store
the contacts on our phones
or our telephones.
We used to memorize phone
numbers back then, or, you know,
we used to have a diary
of all our contacts,
but these days
we have smartphones
which store a lot of data.
So there's everything
about us in our mobile phones.
We have images we have contacts.
We have various apps.
We have games.
Everything is stored
on our mobile phones these days.
Similarly, the PCs that we used
in the earlier times
used to process
very little data.
All right, there wasn't a lot
of data processing needed
because technology wasn't
evolved that much.
So if you guys remember
we use floppy disk
back then and floppy.
This was used to store
small amounts of data,
but later on hard disks
were created and those
used to store GBS of data.
But now if you look
around there's data
everywhere around us.
All right, we have data
stored in the cloud.
We have data in each and every
appliance at our houses.
Similarly,
if you look at smart cars
these days, they're connected
to the internet, they're connected
to our mobile phones,
and this also generates
a lot of data.
What we don't realize is
that the evolution of technology
has generated a lot of data.
All right.
Now initially there
was very little data
and most of it was
structured; only a small part
of the data was unstructured
or semi-structured.
And in those days you could use
simple BI tools in order
to process all of this data
and make sense out of it.
But now we have way
too much data, and in order
to process this much data
we need more complex algorithms.
We need a better process.
All right, and this is
where data science
comes in. Now guys,
I'm not going to get
into the depth of data science yet.
I'm sure all
of you have heard of IoT
or Internet of Things.
Now,
did you guys know
that we produce
2.5 quintillion bytes
of data each day?
And this is only accelerating
with the growth of IoT.
Now IoT, or Internet
of Things is just a fancy term
that we use for network
of tools or devices
that communicate and transfer
data through the internet.
So various devices
are connected to each other
through the internet
and they communicate
with each other.
Now, the communication happens
by exchange of data or by
generation of data.
Now these devices include the vehicles
we drive, they include our TVs,
our coffee machines,
refrigerators, washing machines,
and almost everything else
that we use on a daily basis.
Now, these interconnected
devices produce an unimaginable
amount of data. Guys, IoT data
is measured in zettabytes,
and one zettabyte is equal
to a trillion gigabytes.
So according to a recent
survey by Cisco,
it's estimated that by
the end of 2019,
which is almost here,
IoT will generate more
than five hundred zettabytes
of data per year.
And this number will only
increase through time.
It's hard to imagine data
in that much volume,
imagine processing, analyzing,
and managing this much data.
It's only going
to cause us a migraine.
So guys, having to deal
with this much data
is not something that
traditional BI tools can do.
Okay.
We no longer can rely
on traditional data
processing methods.
That's exactly why
we need data science.
It's our only hope right now.
Now let's not get
into the details here yet.
Moving on,
let's see how social
media is adding on
to the generation of data.
Now the fact
that we are all in love
with social media.
It's actually generating
a lot of data for us.
Okay.
It's certainly one of the fuels
for data creation.
Now all these numbers
that you see on the screen
are generated every minute
of the day.
Okay, and this number
is just going to increase.
So for Instagram it says
that approximately
1.7 million pictures are uploaded
in a minute, and similarly
on Twitter approximately
a hundred and forty eight
thousand tweets are published
every minute of the day.
So guys, imagine in one hour
how much that would be,
and then imagine in 24 hours.
So guys, this is
the amount of data
that is generated
through social media.
It's unimaginable.
Imagine processing this much
data, analyzing it, and then
trying to figure out, you know,
the important insights
from this much data.
Analyzing this much data
is going to be
very hard with traditional tools
or traditional methods.
That's why data science
was introduced. Data science
is a simple process
that will just extract the
useful information from data.
All right, it's just
going to process
and analyze the entire data
and then it's just
going to extract
what is needed.
Now guys, apart
from social media and IoT,
there are other factors as well
which contribute to
data generation. These days
all our transactions
are done online, right?
We pay bills online.
We shop online.
We even buy homes online.
These days you can even sell
your pets on OLX.
Not only that,
when we stream music
and watch videos on YouTube, all
of this is generating a lot
of data. Not to forget,
we've also brought healthcare
into the internet world.
Now there are various
watches like Fitbit
which basically track
our heart rate
and generate data about
our health conditions. Education is
also an online thing right now.
That's exactly what you
are doing right now.
So with the emergence
of the internet,
we now perform all
our activities online.
Okay, obviously, this
is helping us,
but we are unaware of
how much data we are generating,
what can be done with all
of this data, and what
if we could use the data
that we generated
to our benefit?
Well, that's exactly
what data science does.
Data science is all
about extracting the useful
insights from data and using
it to grow your business.
Now before we get into
the details of data science,
let's see how Walmart uses data
science to grow their business.
So guys, Walmart is
the world's biggest retailer
with over 20,000 stores
in just 28 countries.
Okay.
Now, it's currently building
the world's biggest
private cloud,
which will be able to process
two point five petabytes
of data every hour.
Now,
the reason behind
Walmart's success is
how they use customer data
to get useful insights about
customers' shopping patterns.
Now, the data analysts and
the data scientists at Walmart,
they know every detail
about their customers.
They know that
if a customer buys Pop-Tarts,
they might also buy cookies.
How do they know all of this?
Like how do they generate
information like this?
Now, they use the data that they get
from their customers
and they analyze it
to see what a particular
customer is looking for.
Now,
let's look at a few cases
where Walmart actually
analyzed the data
and figured out
the customer needs.
So let's consider the Halloween
and the cookie sales example.
Now, during Halloween, a sales analyst
at Walmart took
a look at the data,
okay, and he found out
that a specific
cookie was popular
across all Walmart stores.
So every Walmart store was
selling these cookies very well,
but he found out
that there were two stores
which were not selling
it at all.
Okay.
So the situation was immediately
investigated and it was found
that there was
a simple stocking oversight.
Okay, because of which
the cookies were not put
on the shelves for sale.
So because this issue
was immediately identified,
they prevented any further loss
of sales.
Now, another such example
is that through association
rule mining, Walmart found out
that strawberry Pop-Tart sales
increased by seven times
before a hurricane.
So a data analyst at Walmart
identified the association
between a hurricane
and strawberry Pop-Tarts
through data mining.
Now guys,
don't ask me the relationship
between Pop-Tarts
and hurricanes,
but for some reason, whenever
there was a hurricane
approaching, people really wanted
to eat strawberry Pop-Tarts.
So what Walmart did
was they placed all
the strawberry Pop-Tarts
at the checkouts
before a hurricane would occur.
So this way they increased the sales
of the Pop-Tarts.
Now,
guys, this is an actual thing.
I'm not making it up.
You can look it up
on the internet.
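To make the idea of association rule mining a bit more concrete, here is a minimal Python sketch. The transactions, item names, and numbers below are hypothetical illustrations, not Walmart's actual data or method:

```python
# A hand-rolled sketch of association rule mining on hypothetical
# toy transactions (each transaction is a set of purchased items).
transactions = [
    {"flashlight", "batteries", "strawberry pop-tarts"},
    {"batteries", "strawberry pop-tarts"},
    {"bread", "milk"},
    {"flashlight", "strawberry pop-tarts"},
    {"bread", "batteries"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Hypothetical rule: people buying hurricane supplies also buy Pop-Tarts.
antecedent = {"batteries"}
consequent = {"strawberry pop-tarts"}

conf = support(antecedent | consequent) / support(antecedent)
lift = conf / support(consequent)

print(f"support    = {support(antecedent | consequent):.2f}")
print(f"confidence = {conf:.2f}")
print(f"lift       = {lift:.2f}")  # lift > 1 suggests an association
```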
Not only that, Walmart
is analyzing the data generated
by social media to find out
all the trending products.
So through social media,
you can find out the likes
and dislikes of a person, right?
So what Walmart did is,
they are quite smart,
they used the data generated
by social media to find out
what products are trending
or what products
are liked by customers.
Okay, an example
of this is, Walmart analyzed
social media data to find out
that Facebook users were crazy
about cake pops.
Okay, so Walmart
immediately took a decision
and they introduced cake pops
into the Walmart stores.
So guys, the only reason
Walmart is so successful is
because of the huge amount of data
that they get. They don't see
it as a burden; instead,
they process this data, analyze
it, and then try to draw
useful insights from it.
Okay, so they invest a lot
of money, a lot of effort,
and a lot of time
in data analysis.
Okay, they spend a lot
of time analyzing data in order
to find any hidden patterns.
So as soon as they find a
hidden pattern or association
between any two products,
they start giving out offers
or having discounts
or something along that line.
So basically, Walmart uses data
in a very effective manner.
They analyze it very well.
They process the data very well
and they find out
the useful insights
that they need in order to get
more customers or in order
to improve their business.
So guys, this was all
about how Walmart uses
data science. Now,
let's move ahead and look
at what is data science.
Now guys, data science is all about
uncovering findings from data.
It's all about surfacing
the hidden insights
that can help
companies to make
smart business decisions.
So all these hidden insights
or these hidden patterns can
be used to make better decisions
in a business now an example
of this is also Netflix.
So Netflix, basically analyzes
the movie viewing patterns
of users to understand
what drives user interest
and to see what users want
to watch and then
once they find out
they give people
what they want.
So guys actually data
has a lot of power.
You should just know
how to process this data
and how to extract
the useful information
from data.
Okay.
That's what data
science is all about.
So guys a big question
over here is
how do data scientists get
useful insights from data?
So it all starts
with data exploration.
Whenever a data scientist comes
across any challenging question
or any sort
of challenging situation,
they become detectives.
They investigate leads
and they try to understand
the different patterns
or the different
characteristics of the data.
Okay.
They try to get
all the information
that they can from the data,
and then they use it
for the betterment
of the organization
or the business.
Now, let's look at
who is a data scientist.
So guys, a data scientist
has to be able to view data
through a quantitative lens.
So guys knowing math is one
of the very important skills
of data scientists.
Okay.
So mathematics is important
because in order to find
a solution you're going to build
a lot of predictive models
and these predictive models are
going to be based on hard math.
So you have to be able
to understand all
the underlying mechanics
of these models. Most
of the predictive models, most
of the algorithms,
require mathematics.
Now, there's a
major misconception
that data science is
all about statistics.
Now, I'm not saying
that statistics is not important.
It is very important,
but it's not the only type
of math that is utilized
in data science.
There are actually
many machine learning algorithms
which are based
on linear algebra.
So guys overall you need
to have a good understanding
of math. And apart
from that, data scientists
utilize technology,
so data scientists have to be
really good with technology.
Okay.
So their main work is
they utilize all the technology
so that they can analyze
these enormous data sets and
work with complex algorithms.
So all of this requires tools,
which are much more
sophisticated than Excel.
So data scientists need
to be very efficient
with coding languages,
and a few of the core languages
associated with data science
include SQL, Python, R, and SAS.
It is also important
for a data scientist to
be a tactical
business consultant.
So guys, business problems can be
answered by data scientists;
since data scientists
work so closely with data,
they know everything
about the business.
If you have a business
and you give the entire data set
of your business
to a data scientist,
they'll know each and every aspect
of your business.
Okay?
That's how data scientists work.
They get the entire data set.
They study the data set,
they analyze it, and then they see
where things are going wrong,
or what needs to be
done more, or what
needs to be excluded.
So guys, having this business
acumen is just as important
as having skills
in algorithms or being good
with math and technology.
So guys, business is
also as important as
these other fields.
Now, you know who
a data scientist is.
Let's look at the skill sets
that a data scientist needs.
Okay, it always starts
with statistics. Statistics
will give you the numbers
from the data.
So a good understanding
of statistics is very important
for becoming a data scientist.
You have to be familiar
with statistical tests,
distributions, maximum
likelihood estimators, and all
of that. Apart
from that you should also
have a good understanding
of probability Theory
and descriptive statistics.
These concepts will help you
make better business decisions.
So no matter what type
of company or role
you're interviewing for,
you're going to be
expected to know
how to use the tools
of the trade.
Okay.
This means that you have
to know a statistical
programming language like R
or Python, and also you'll need
to know a database
querying language like SQL.
Now, the main reason why
people prefer R
and Python is because of
the number of packages
that these languages have
and these predefined
packages have most
of the algorithms in them.
So you don't have
to actually sit down
and code the algorithms; instead,
you can just load one
of these packages
from their libraries and run it.
So programming languages
are a must. At the minimum,
you should know R
or Python and a database
query language.
Now, let's move on to data
extraction and processing.
So guys, say that you have
multiple data sources, like
a MySQL database or a MongoDB database.
Okay.
So what you have to do
is you have to extract the data
from such sources,
and then in order to analyze
and query this database you have
to store it in a proper format
or a proper structure.
Okay, finally, then you can load
the data in the data warehouse
and you can analyze
the data over here.
Okay.
So this entire process is called
extraction and processing.
So guys, extraction
and processing is all
about getting data
from these different
data sources and then
putting it in a format
so that you can analyze it.
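As a rough illustration of that extract-and-consolidate step, here is a small Python sketch. It uses an in-memory SQLite table and a hypothetical document-store record as stand-ins for real sources like MySQL and MongoDB:

```python
# A minimal sketch of "extract and consolidate", assuming SQLite and an
# in-memory list of records as stand-in sources (real pipelines would pull
# from MySQL, MongoDB, etc. and load into a proper warehouse).
import sqlite3
import pandas as pd

# Hypothetical source 1: a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("cookies", 3.5), ("pop-tarts", 2.0)])
conn.commit()
db_df = pd.read_sql_query("SELECT * FROM sales", conn)

# Hypothetical source 2: records from a document store or API.
doc_df = pd.DataFrame([{"item": "cake pops", "amount": 4.0}])

# Consolidate both sources into one uniform structure for analysis.
warehouse_df = pd.concat([db_df, doc_df], ignore_index=True)
print(warehouse_df)
```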
Now, next is data wrangling
and exploration.
Now guys, data wrangling is one
of the most difficult tasks
in data science.
This is the most
time-consuming task
because data wrangling is all
about cleaning the data.
There are a lot of instances
where the data sets
have missing values
or they have null values
or they have inconsistent
formats or inconsistent values
and you need to understand
what to do with such values.
This is where data wrangling
or data cleaning comes
into the picture. Then,
after you're done with that,
you are going
to analyze the data.
So guys, after data wrangling
and cleaning is done,
you're going to start exploring.
This is where you try to make
sense out of the data.
Okay, so you can do this
by looking at the different
patterns in the data,
the different trends, outliers,
and various unexpected results,
and all of that.
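Here is a small pandas sketch of typical wrangling steps on hypothetical data: imputing a missing value, dropping an implausible entry, and normalizing inconsistent formats:

```python
# A wrangling sketch on hypothetical data (values are made up).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, np.nan, 32, 200],       # a missing value and an outlier
    "gender": ["Male", "male", "F", None]  # inconsistent formats
})

df["age"] = df["age"].fillna(df["age"].median())  # impute missing age
df = df[df["age"] < 120]                          # drop implausible ages
df["gender"] = (df["gender"].str.strip().str.lower()
                .map({"male": "M", "m": "M", "f": "F", "female": "F"}))
print(df)
```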
Next.
We have machine learning.
So guys, if you're at
a large company
with huge amounts of data, or
if you're working at a company
where the product
is data driven,
like if you're working
at Netflix or on Google Maps,
then you have to be
familiar with machine
learning methods, right?
You cannot process
large amount of data
with traditional methods.
So that's why you need
a machine learning algorithms.
So there are a few algorithms,
like the k-nearest neighbor,
the random forest,
the k-means algorithm,
the support vector machines.
You have to be aware of all
of these algorithms,
and let me tell you
that most of these algorithms
can be implemented
using R or Python libraries.
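For instance, here is a minimal scikit-learn sketch showing how algorithms like k-nearest neighbor and random forest are loaded from a library rather than coded from scratch; the bundled iris data set is used purely for illustration:

```python
# Loading ready-made algorithms from a library (scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

for model in (KNeighborsClassifier(),
              RandomForestClassifier(random_state=42)):
    model.fit(X_train, y_train)  # train on the training set
    print(type(model).__name__, model.score(X_test, y_test))
```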
Okay, you need to
have an understanding
of machine learning
if you have large amounts
of data in front of you,
which is going to be the case
for most of the people right now,
because data is being generated
at an unstoppable pace.
Earlier
in the session we discussed
how much data is generated.
So for now, knowing
machine learning algorithms
and machine learning concepts
is a very required skill
if you want to become
a data scientist.
So if you're sitting
for an interview as
a data scientist,
you will be asked
machine learning questions.
You will be asked
how good you are
with these algorithms
and how well you
can implement them.
Next we have big
data processing Frameworks.
So guys, we know
that we've been generating
a lot of data and most
of this data can be structured
or unstructured as well.
So on such data,
you cannot use traditional
data processing systems.
So that's why you need
to know frameworks
like Hadoop and Spark.
Okay.
These frameworks can be used
to handle big data.
Lastly, we have data visualization.
So guys, data visualization is
one of the most important parts
of data analysis.
It is always very important
to present the data
in an understandable
and visually appealing format.
So data visualization
is one of the skills
that data scientists
have to master.
Okay, if you want to communicate
the data with the end users
in a better way, then
data visualization is a must.
So guys, there are a lot of tools
which can be used for data
visualization; tools like Tableau
and Power BI are a few of the most
popular visualization tools.
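As a simple illustration (with made-up numbers), here is how a basic chart could be produced in Python with matplotlib; tools like Tableau and Power BI provide richer, interactive versions of the same idea:

```python
# A tiny bar chart sketch; the products and sales figures are hypothetical.
import matplotlib.pyplot as plt

products = ["cookies", "pop-tarts", "cake pops"]
sales = [120, 95, 60]  # illustrative values only

plt.bar(products, sales)
plt.title("Sales by product (illustrative)")
plt.ylabel("Units sold")
plt.show()
```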
So with this we sum up
the entire skill set
that is needed to become
a data scientist. Apart from this,
you should also have a data-driven
problem-solving approach.
You should also be
very creative with data.
So now that we know the skills
that are needed to become
a data scientist.
Let's look at the different
job roles. Data science is
a very vast field.
There are many job roles
under data science.
So let's take a look
at each role.
Let's start off
with a data scientist.
So, data scientists
have to understand
the challenges of a business
and they have to offer the best
solution using data analysis
and data processing.
So for instance
if they are expected
to perform predictive analysis,
they should also be able
to identify trends and patterns
that can help the companies
in making better decisions.
To become a data scientist,
you have to be an expert in
R, MATLAB, SQL, Python, and other
complementary technologies.
It can also help
if you have a higher
degree in mathematics
or computer engineering.
Next, we have the data
analyst. So a data
analyst is responsible
for a variety of tasks,
including visualization
and processing of massive amounts
of data. Among them,
they also have to perform
queries on databases.
So they should be aware
of the different query languages
and guys one of the most
important skills of
a data analyst is optimization.
This is because they have
to create and modify algorithms
that can be used to pull
information from some
of the biggest databases
without corrupting the data.
So to become a data analyst,
you must know technologies
such as SQL, R, SAS, and Python.
So certification in any
of these Technologies
can boost your job application.
You should also have
a good problem solving quality.
Next.
We have a data architect.
So a data architect
creates the blueprints
for data management
so that the databases
can be easily integrated,
centralized, and protected
with the best security measures.
Okay.
They also ensure
that the data Engineers
have the best tools
and systems to work with.
So to become a data architect,
you have to have expertise
in data warehousing,
data modeling, extraction,
transformation, and load.
Okay.
You should also be
well versed in Hive, Pig,
and Spark. Now, apart from this,
there are data engineers.
So guys,
the main responsibility of
a data engineer is to build
and test scalable
Big Data ecosystems.
Okay, they are also needed
to update the existing systems
with newer or upgraded versions
and they are also responsible
for improving the efficiency
of databases.
Now, if you are interested
in a career as a data engineer,
then technologies
that require hands-on
experience include Hive, NoSQL,
R, Ruby, Java, C++, and MATLAB.
It would also help
if you can work
with popular data APIs
and ETL tools. Next,
and ETL tools next.
We have a statistician.
So as the name suggests you have
to have a sound understanding
of statistical theories
and data organization.
Not only do they extract
and offer valuable insights,
they also create new
methodologies for engineers
to apply.
Now, if you want to become
a statistician, then you have
to have a passion for logic.
You should also know a good variety
of database systems
such as SQL, data mining,
and various other machine
learning technologies. By that
I mean, you should be good
with math and you should also
have a good knowledge
about the various
database systems such as SQL
and also the various
machine learning concepts
and algorithms.
Next, we have
the database administrator.
So guys, the job profile of
a database administrator
is pretty much self-explanatory.
They are basically responsible
for the proper functioning
of all the databases,
and they are also responsible
for granting or revoking
services to
the employees of the company.
They also have to take care
of the database backups
and recoveries.
So some of the skills
that are needed to become
a database administrator include
database backup and recovery,
data security, data modeling,
and design. Next,
we have the business analyst.
Now, the role of a business analyst
is a little different
from all of the other
data science jobs. Now,
Don't get me wrong.
They have a very
good understanding of the data
oriented technologies.
They know how to handle a lot
of data and process it,
but they are also very focused
on how this data can be linked
to actionable business insights.
So they mainly focus
on business growth.
Okay.
Now a business analyst
acts like a link
between the data engineers
and the management Executives.
So in order to become
a business analyst, you have
to have an understanding
of business finances,
business intelligence,
and also IT technologies
like data modeling, data
visualization tools, and etc.
At last, we have a data
and analytics manager.
A data and analytics
manager is responsible
for the data science operations.
Now the main responsibilities
of a data and analytics
manager is to oversee
the data science operation.
Okay, he's responsible
for assigning the duties
to the team according
to their skills
and expertise. Now, their strengths
should include technologies
like SAS, R, SQL,
and of course,
they should have good management
skills. Apart from that,
they must have excellent social
skills, leadership qualities,
and an out-of-the-box
thinking attitude.
And like I said earlier,
you need to have a good
understanding of technologies
like Python, SAS,
R, Java, and etc.
So Guys, these were
the different job roles
in data science.
I hope you all found
this informative.
Now, let's move ahead
and look at the data lifecycle.
So guys, there are basically six steps
in the data life cycle.
It starts with
a business requirement.
Next is data acquisition,
after that you
would process the data,
which is called data processing.
Then there is
data exploration, modeling,
and finally deployment.
So guys, before you even start
on a data science project,
it is important
that you understand the problem
you're trying to solve.
So in this stage,
you're just going to focus
on identifying the central
objectives of the project
and you will do this
by identifying the variables
that need to be
predicted next up.
We have data acquisition.
Okay.
So now that you have
your objectives defined, it's time
for you to start
gathering the data.
So data mining is the process
of gathering your data
from different sources.
At this stage, some
of the questions you
can ask yourself are:
what data do I need
for my project?
Where does it live?
How can I obtain it?
And what is the most
efficient way to store
and access all of it?
Next up, there is data processing.
Now, usually all the data
that you collected
is a huge mess.
Okay.
It's not formatted.
It's not structured.
It's not cleaned.
So if you find any data set
that is cleaned
and packaged well for you,
then you've actually
won the lottery
because finding the right data
takes a lot of time
and it takes a lot of effort
and one of the major
time-consuming task
in the data science process
is data cleaning.
Okay, this requires
a lot of time.
It requires a lot of effort
because you have to go
through the entire data set
to find out any missing values
or if there are
any inconsistent values
or corrupted data,
and you also find
the unnecessary data
over here and you
remove that data.
So this was all
about data processing. Next,
we have data exploration.
So now that you have a sparkling
clean set of data,
you are finally ready to get
started with your analysis.
Okay, the data exploration stage
is basically the brainstorming
of data analysis.
So in order to understand
the patterns in your data,
you can use histograms.
You can just pull up
a random subset of data
and plot a histogram.
You can even create
interactive visualizations.
This is the point
where you dive deep
into the data
and you try to explore
the different models
that can be applied
to your data.
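Here is a quick sketch of the "pull up a random subset and plot a histogram" idea, using synthetic data purely for illustration:

```python
# Exploring the distribution of a column via a random subset (synthetic data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=10_000)    # stand-in for a real column
subset = rng.choice(data, size=500, replace=False)  # random subset

plt.hist(subset, bins=30)
plt.title("Distribution of a sampled column (synthetic)")
plt.show()
```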
Next up, we have data modeling.
So after processing the data,
what you're going to do
is you're going to carry
out model training.
Okay.
Now model training is basically
about finding a model
that answers the
questions more accurately.
So the process of model training
involves a lot of steps.
So firstly you'll start
by splitting the input data
into the training data set
and the testing data set.
Okay, you're going to take
the entire data set
and you're going to separate it
into two parts: one is
the training and one
is the testing data.
After that, you'll build a model
by using the training data set,
and once you're done with that,
you'll evaluate it on the training
and the test data sets.
Now, to evaluate the training
and testing data,
you'll be using a series
of machine learning
algorithms. After that,
you'll find out the model
which is the most suitable
for your business requirement.
So this was
mainly data modeling.
Okay.
This is where you build a model
out of your training data set
and then you evaluate this model
by using the testing data set.
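A minimal sketch of this modeling step might look like the following, assuming scikit-learn and one of its bundled toy data sets; the candidate models here are just placeholders:

```python
# Split the data, train candidate models on the training set, and compare
# them on the held-out test set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)  # the two-part split

candidates = [LogisticRegression(max_iter=5000),
              DecisionTreeClassifier(random_state=1)]
best = max(candidates,
           key=lambda m: m.fit(X_train, y_train).score(X_test, y_test))
print("most suitable model:", type(best).__name__)
```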
Finally, we have deployment.
So guys, the goal of this stage
is to deploy the model
into a production or maybe
a production like environment.
So this is basically done
for final user acceptance
and the users have to validate
the performance of the models
and if there are any issues
with the model or any issues
with the algorithm,
then they have to be
fixed in this stage.
So guys with this
we come to the end
of the data lifecycle.
I hope this was clear.
Statistics
and probability are essential
because these disciplines
form the basic foundation
of all machine
learning algorithms, deep
learning, artificial intelligence,
and data science.
In fact, mathematics
and probability are
behind everything around us,
from shapes, patterns,
and colors to the count
of petals in a flower.
Mathematics is embedded
in each and every aspect
of our lives. With this in mind,
I welcome you all
to today's session.
So I'm going to go ahead
and discuss the agenda for today
with you all. We're going to begin
the session by understanding
what is data. After that,
We'll move on and look at the
different categories of data,
like quantitative
and qualitative data,
then we'll discuss what
exactly statistics is
the basic terminologies in
statistics and a couple
of sampling techniques.
Once we're done with that.
We'll discuss the different
types of Statistics
which involve descriptive
and inferential statistics.
Then in the next session
we'll mainly be focusing
on descriptive statistics.
Here we'll understand
the different measures
of center, measures
of spread, information gain,
and entropy. We'll also
understand all of these measures
with the help of a use case,
and finally we'll discuss
what exactly a
confusion matrix is.
Once we've covered the entire
descriptive statistics module,
we'll discuss the probability
module. Here we'll understand what
exactly probability is,
the different terminologies
in probability. We'll also
study the different
probability distributions,
then we'll discuss the types
of probability, which include
marginal probability, joint,
and conditional probability.
Then we move on
and discuss a use case
where we'll see
examples that show us
how the different types
of probability work,
and to better
understand Bayes' theorem,
we'll look at a small example.
Also, I forgot to mention
that at the end of the
descriptive statistics module
we'll be running a small demo
in the R language.
So for those of you
who don't know much
about R, I'll be explaining
every line in depth,
but if you want to have
a more in-depth understanding
about R, I'll leave
a couple of blogs
and a couple of videos
in the description box.
You all can definitely
check out that content.
Now, after we've completed the
probability module, we'll discuss
the inferential statistics
module. We'll start this module
by understanding
what is point estimation.
We will discuss
what is confidence interval
and how you can estimate
the confidence interval.
We will also discuss margin
of error, and we'll understand all
of these concepts by looking
at a small use case.
We'll finally end the inferential
statistics module by looking
at what hypothesis
testing is. Hypothesis
testing is a very important part
of inferential statistics.
So we'll end the session
by looking at a use case
that discusses how
hypothesis testing works
and to sum everything up,
we'll look at a demo
that explains how
inferential statistics works.
Alright, so guys,
there's a lot to cover today.
So let's move ahead
and take a look
at our first topic
which is what is data.
Now, this is
quite a simple question.
If I ask any of you,
what is data?
You'll say that it's
a set of numbers
or some sort of documents
that are stored in my computer.
Now, data is actually everything.
All right, look around you, there
is data everywhere. Each click
on your phone generates
more data than you know.
Now, this generated data
provides insights for analysis
and helps us make
better business decisions.
This is why data is
so important. To give you
a formal definition, data refers
to facts and statistics
collected together
for reference or analysis.
All right.
This is the definition
of data in terms
of statistics and probability.
So as we know, data
can be collected, it
can be measured and analyzed,
and it can be visualized by
using statistical models
and graphs. Now, data is divided
into two major subcategories.
Alright, so first we
have qualitative data
and quantitative data.
These are the two
different types of data
under qualitative data.
We have nominal and ordinal data
and under quantitative data.
We have discrete
and continuous data.
Now, let's focus
on qualitative data.
Now this type of data deals with
characteristics and descriptors
that can't be easily measured
but can be observed subjectively.
Now, qualitative data
is further divided
into nominal and ordinal data.
So nominal data is
any sort of data
that doesn't have
any order or ranking.
Okay.
An example of nominal
data is gender.
Now.
There is no ranking in gender.
There's only male, female,
or other, right?
There is no one, two,
three, four, or any sort
of ordering in gender. Race is
another example of nominal data.
Now ordinal data is basically an
ordered series of information.
Okay, let's say
that you went to a restaurant.
Okay.
Your information is stored
in the form of customer ID.
All right.
So basically you are represented
with a customer ID.
Now you would have rated
their service as
either good or average.
All right, that's
how ordinal data is,
and similarly they'll have
a record of other customers
who visit the restaurant
along with their ratings.
All right.
So any data which has
some sort of sequence
or some sort of order
to it is known as ordinal data.
All right, so guys,
this is pretty simple
to understand now,
let's move on and look
at quantitative data.
So quantitative data
basically deals
with numbers and things.
Okay, you can understand
that by the word quantitative
itself; quantitative is
basically quantity,
right? So this deals with numbers,
it deals with anything that you
can measure objectively.
All right, so
there are two types
of quantitative data: there is
discrete and continuous data.
Now, discrete data is also
known as categorical data
and it can hold a finite number
of possible values.
Now, the number of students
in a class is a finite number.
All right, you can't
have an infinite number
of students in a class.
Let's say in your fifth grade
there were a hundred students
in your class.
All right, there wasn't an
infinite number, but there
was a definite, finite number
of students in your class.
Okay, that's discrete data.
Next.
We have continuous data.
Now this type of data
can hold infinite number
of possible values.
Okay.
So when I say the weight
of a person is an example
of continuous data,
what I mean to say is my weight
can be 50 kgs, or it can be 50.1 kgs,
or it can be 50.001 kgs,
or 50.0001, or
50.023, and so
on, right? There
are an infinite number
of possible values, right?
So this is what I mean
by a continuous data.
All right.
This is the difference between
discrete and continuous data.
And also I'd like to mention
a few other things over here.
Now, there are a couple
of types of variables as well.
We have a discrete variable
and we have a continuous
variable. A discrete variable
is also known as
a categorical variable,
and it can hold values
of different categories.
Let's say that you have
a variable called message
and there are two types
of values that this variable
can hold. Let's say
that your message
can either be a spam message
or a non-spam message.
Okay, that's when you call
a variable a discrete
or categorical variable.
All right, because it
can hold values
that represent different
categories of data.
Now, continuous variables
are basically variables
that can store an infinite
number of values.
So the weight of a person
can be denoted as
a continuous variable.
All right, let's say there is
a variable called weight
and it can store infinite number
of possible values.
That's why we will call
it a continuous variable.
So guys, basically a
variable is anything
that can store a value, right?
So if you associate any sort
of data with a variable,
then it will become
either a discrete variable
or a continuous variable.
There are also dependent and
independent types of variables.
Now, we won't discuss all
of that in depth because
that's pretty understandable.
I'm sure all of you know
what an independent variable
and a dependent variable are, right?
A dependent variable is
any variable whose value
depends on some other
independent variable.
So guys, that much
knowledge I expect
all of you to have. All right,
so now let's move on and look
at our next topic, which is
what is statistics.
Now, coming
to the formal definition
of statistics: statistics is
an area of applied mathematics
which is concerned
with data collection,
analysis, interpretation,
and presentation.
Now, usually
when I speak about statistics,
people think statistics is
all about analysis,
but statistics has other parts
to it. Data collection is
also a part of statistics; data
interpretation, presentation,
all of this comes
into statistics. All right, you're
going to use statistical methods
to visualize data, to collect
data, to interpret data.
All right, so this area
of mathematics deals
with understanding
how data can be used
to solve complex problems.
Okay.
Now I'll give you
a couple of examples
that can be solved
by using statistics.
Okay, let's say
that your company
has created a new drug
that may cure cancer.
How would you conduct
a test to confirm
the drug's effectiveness? Now,
even though this sounds
like a biology problem,
this can be solved
with statistics. All right,
you'll have to create a test
which can confirm
the effectiveness of the drug.
All right, this is a common problem
that can be solved
using statistics.
Let me give you
another example. You
and a friend are at a baseball
game, and out of the blue
he offers you a bet
that neither team will hit
a home run in that game.
Should you take the bet?
All right, here you'd
discuss the probability
of whether you'll win or lose.
All right, this
is another problem
that comes under statistics.
Let's look at another example.
The latest sales data
has just come in
and your boss wants
you to prepare a report
for management on places
where the company
could improve its business.
What should you look for?
And what should you
not look for? Now,
this problem involves a lot
of data analysis. You'll have to
look at the different variables
that are causing
your business to go down,
or you have to look
at a few variables
that are increasing
the performance of your models
and thus growing your business.
Alright, so this involves
a lot of data analysis
and the basic idea
behind data analysis is
to use statistical techniques
in order to figure
out the relationship
between different variables
or different components
in your business.
Okay.
So now let's move on
and look at our next topic
which is basic
terminologies in statistics.
Now before you dive
deep into statistics,
it is important that you
understand basic terminologies
used in statistics.
The two most important
terminologies in statistics
are population and Sample.
So throughout the statistics
course, or throughout any problem
that you're trying
to solve with statistics,
you will come
across these two words,
which are population and sample.
Now, population is a collection
or a set of individuals
or objects or events
whose properties
are to be analyzed.
Okay.
So basically you can refer
to population as a subject
that you're trying to analyze
now a sample is just
like the word suggests.
It's a subset of the population.
So you have to make sure
that you choose the sample
in such a way
that it represents
the entire population.
All right.
It shouldn't focus on one part
of the population; instead,
it should represent
the entire population.
That's how your sample
should be chosen.
So a well-chosen sample
will contain most
of the information about a
particular population parameter.
Now, you must be wondering
how can one choose a sample
that best represents
the entire population now
sampling is a statistical method
that deals with the selection
of individual observations
within a population.
So sampling is performed
in order to infer statistical
knowledge about a population.
All right, if you
want to understand
the different statistics
of a population,
like the mean,
the median, the mode,
or the standard deviation
or the variance of a population,
Then you're going
to perform sampling.
All right, because it's
not reasonable for you to study
a large population
and find out the mean median
and everything else.
So why is sampling
performed you might ask?
What is the point of sampling?
We can just study
the entire population. Now guys,
think of a scenario
wherein you're asked
to perform a survey
about the eating habits
of teenagers in the US.
So at present there are
over 42 million teens in the US
and this number is growing
as we are speaking
right now, correct.
Is it possible to survey each
of these 42 million individuals
about their health?
Is it possible?
Well, it might be possible,
but this will take
forever to do.
Now, obviously it's
not reasonable to go around
knocking on each door
and asking, what does
your teenage son eat,
and all of that, right?
This is not very reasonable.
That's why sampling is used.
It's a method wherein a sample
of the population is studied
in order to draw inferences
about the entire population.
So it's basically
a shortcut to studying
the entire population. Instead
of taking the entire population
and finding out
all the solutions,
you're just going to take
a part of the population
that represents the
entire population,
and you're going to perform
all your statistical analysis,
your inferential statistics,
on that small sample.
All right,
and that sample basically
represents the entire population.
All right, so I hope I've
made this clear
to you all: what is sample
and what is population. Now,
There are two main types
of sampling techniques
that are discussed today.
We have probability sampling
and non-probability
sampling. Now, in this video
we'll only be focusing on
probability sampling techniques,
because non-probability sampling
is not within the scope
of this video.
All right, we'll only discuss
the probability part
because we're focusing
on statistics and
probability, correct?
Now, again, under
probability sampling
we have three different types:
we have random
sampling, systematic,
and stratified sampling.
All right, and just
to mention the different types
of non-probability sampling,
we have snowball, quota, judgment,
and convenience sampling.
All right now guys
in this session.
I'll only be
focusing on probability.
So let's move on
and look at the different types
of probability sampling.
So what is probability sampling?
It is a sampling technique
in which samples
from a large population
are chosen by using
the theory of probability.
All right, so there
are three types
of probability sampling.
All right, first we have
random sampling. Now,
in this method each member
of the population
has an equal chance
of being selected in the sample.
All right, so each
and every individual, or each
and every object,
in the population
has an equal chance
of being a part of the sample.
That's what random
sampling is all about.
Okay, you are randomly going
to select any individual
or any object,
so this way each individual has
an equal chance
of being selected.
Correct?
Next.
We have systematic sampling. Now,
in systematic sampling
every nth record is chosen
from the population to be
a part of the sample.
All right.
Now refer to this image
that I've shown over
here: out of these six
groups, every second group
is chosen as a sample.
Okay.
So every second record
is chosen here, and this is
how systematic sampling works.
Okay, you're
selecting every nth record
and you're going to add
that to your sample.
Next.
We have stratified
sampling. Now, in this type
of technique a stratum
is used to form samples
from a large population.
So what is a stratum?
A stratum is basically a subset
of the population that shares
at least one common characteristic.
So let's say
that your population has a mix
of both male and female,
so you can create two stratums
out of this: one will have
only the male subset
and the other will have
the female subset.
All right, this is
what a stratum is.
It is basically a subset
of the population
that shares at least
one common characteristic.
All right in our example,
it is gender.
So after you've created
a stratum, you're going
to use random sampling
on these stratums
and you're going to
choose a final sample.
So random sampling meaning
that all of the individuals
in each of the stratums
will have an equal chance
of being selected in the sample.
Correct.
So guys, these were
the three different types
of sampling techniques.
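As a rough illustration, here is how the three techniques might look in plain Python on a hypothetical population of 12 people:

```python
# The three probability sampling techniques on a hypothetical population.
import random

random.seed(0)
population = [{"id": i, "gender": "M" if i % 2 else "F"} for i in range(12)]

# 1) Random sampling: every member has an equal chance of selection.
random_sample = random.sample(population, k=4)

# 2) Systematic sampling: take every nth record (n = 3 here).
systematic_sample = population[::3]

# 3) Stratified sampling: split into stratums by a shared characteristic
#    (gender), then sample randomly within each stratum.
males = [p for p in population if p["gender"] == "M"]
females = [p for p in population if p["gender"] == "F"]
stratified_sample = random.sample(males, k=2) + random.sample(females, k=2)

print([p["id"] for p in random_sample])
print([p["id"] for p in systematic_sample])
print([p["id"] for p in stratified_sample])
```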
Now, let's move on and look
at our next topic
which is the different
types of Statistics.
So after this,
we'll be looking at the more
advanced concepts of statistics.
Right, so far we discussed
the basics of statistics,
which is basically
what is statistics, the different
sampling techniques, and the
terminologies in statistics.
All right.
Now we look at the different
types of Statistics.
So there are two major
types of statistics:
descriptive statistics
and inferential statistics.
In today's session
we will be discussing
both of these types
of statistics in depth.
All right, we'll also
be looking at a demo
which I'll be running
in the R language
in order to make
you understand what exactly
descriptive and inferential
statistics is.
So guys, we're going
to look at the basics,
so don't worry
if you don't
have much knowledge,
I'm explaining everything
from the basic level.
All right, so guys descriptive
statistics is a method
which is used to describe
and understand the features
of a specific data set by giving
a short summary of the data.
Okay, so it is mainly
focused upon the
characteristics of data.
It also provides a graphical
summary of the data.
Now, in order to make you understand
what descriptive statistics is,
let's suppose that
you want to gift all
your classmates a t-shirt,
so you need to study the average
shirt size of a student
in the classroom.
So if you were to use
descriptive statistics to study
the average shirt size
of students in your classroom,
then what you would do is you
would record the shirt size
of all students in the class,
and then you would find out
the maximum, minimum, and average
shirt size of the class.
Okay.
So coming to inferential
statistics: inferential statistics
makes inferences
and predictions about
a population based
on the sample of data taken
from the population.
Okay.
So in simple words,
it generalizes a large data set
and it applies probability
to draw a conclusion.
Okay.
So it allows you
to infer data parameters
based on a statistical model
by using sample data.
So if we consider
the same example of finding
the average shirt size
of students in a class,
in inferential statistics
we'll take a sample set
of the class,
which is basically a few people
from the entire class.
All right, here you'd
have grouped the class
into large, medium, and small.
All right, in this method
you basically build
a statistical model
and expand it for the entire
population in the class.
So guys, that was a brief
understanding of descriptive
and inferential statistics.
So that's the difference
between descriptive
and inferential now
in the next section,
we will go in depth
about descriptive statistics.
Right.
So let's discuss more
about descriptive statistics.
So like I mentioned
earlier descriptive
statistics is a method
that is used to describe
and understand the features
of a specific data set by giving
short summaries about the sample
and measures of the data.
There are two important measures
in descriptive statistics.
We have measure
of central tendency,
which is also known as measure
of center and we have
measures of variability.
This is also known
as measures of spread.
So measures of center include
mean, median, and mode.
Now, what are measures
of center? Measures of center
are statistical measures
that represent the summary
of a data set.
Okay, the three main measures
of center are mean, median,
and mode. Coming
to measures of variability,
or measures of spread,
we have range,
interquartile range, variance,
and standard deviation.
All right.
So now let's discuss each
of these measures
in a little
more depth, starting
with the measures of center.
Now, I'm sure all of you know
what the mean is. The mean is
basically the measure
of the average of all
the values in a sample.
Okay, so it's basically
the average of all
the values in a sample.
How do you measure the mean? I
hope all of you know
how the mean is measured.
If there are 10 numbers
and you want to find the mean
of these 10 numbers,
all you have to do is
add up all the 10 numbers
and divide
them by 10, where 10
represents the number
of samples in your data set.
All right, since we
have 10 numbers,
we're going to
divide this by 10.
All right, this will
give us the average
or the mean. So to better
understand the measures
of central tendency.
Let's look at an example.
Now the data set over here is
basically the cars data set
and it contains a few variables.
All right, it has
something known as cars.
It has mileage per gallon
cylinder type displacement
horsepower and relax.
Silver ratio.
All right, all of these measures
are related to cars.
Okay.
So what you're going
to do is you're going
to use descriptive analysis
and you're going to analyze
each of the variables
in the sample data set
for the mean, standard deviation,
median, mode, and so on.
So let's say that you want
to find out the mean
or the average horsepower
of the cars among
the population of cars.
Like I mentioned earlier
what you'll do is you'll check
the average of all the values.
So in this case we will take
the sum of the horsepower
of each car and we'll divide
that by the total
number of cars.
Okay, that's exactly
what I've done here
in the calculation part.
So this hundred
and ten basically
represents the horsepower
of the first car.
All right.
Similarly,
I've just added up all
the values of horsepower
for each of the cars
and I've divided it by 8. Now,
8 is basically the number
of cars in our data set.
All right, so hundred and three
point six two five is
what our mean is, or what the average
horsepower is. All right.
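As a quick sketch, here is that calculation in Python. The transcript only gives the first horsepower value (110), the count of 8 cars, and the resulting mean (103.625), so the remaining values below are illustrative ones chosen to reproduce that mean:

```python
# Mean = sum of values / number of samples.
# Hypothetical horsepower values for 8 cars (first value 110, as quoted).
horsepower = [110, 110, 93, 96, 105, 90, 120, 105]

mean_hp = sum(horsepower) / len(horsepower)
print(mean_hp)  # 103.625, matching the figure quoted above
```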
Now, let's understand
what median is with an example.
Okay.
So to define median: the measure
of the central value
of the sample set
is called the median.
All right, you can say
that it is the middle value.
So if we want to find
out the center value
of the mileage per gallon
among the population
of cars, first
what we'll do is we'll arrange
the MPG values in ascending
or descending order
and choose the middle value.
Right, in this case,
we have eight values,
which is an even count.
So whenever you have an even
number of data points
or samples in your data set,
then you're going
to take the average
of the two middle values.
If we had nine values over here,
we could easily figure
out the middle value
and, you know, choose
that as the median.
But since there's an even number
of values, we are going
to take the average
of the two middle values.
All right.
So 22.8 and 23 are
my two middle values
and I'm taking the mean
of those 2 and hence I
get twenty two point nine,
which is my median.
All right, lastly,
let's look at
how mode is calculated.
So what is mode? The value
that is most recurrent
in the sample set is known as
the mode, or basically the value
that occurs most often.
Okay, that is known as mode.
So let's say
that we want to find out
the most common type of cylinder
among the population of cars.
What we have to do is
we will check the value
which is repeated
the most number of times. Here
we can see that the cylinders
come in two types:
we have cylinders of type
4 and cylinders of type 6, right?
So take a look at the data set.
You can see that the most
recurring value is 6, right?
If you count them up,
we have three
4-type cylinders
and we have five
6-type cylinders.
All right.
So our mode is going
to be 6, since 6 is more
recurrent than 4.
So guys, those were the measures
of center, or the measures
of central tendency.
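Here is a short Python sketch of the median and mode calculations; the values are illustrative ones chosen to match the figures quoted above (median 22.9, mode 6):

```python
# Median and mode on illustrative data.
from statistics import median, mode

mpg = [14.3, 18.1, 21.0, 22.8, 23.0, 24.4, 30.4, 33.9]  # even count of 8
cylinders = [6, 6, 4, 6, 6, 4, 4, 6]                    # five 6s, three 4s

print(median(mpg))      # average of the two middle values: 22.9
print(mode(cylinders))  # most recurrent value: 6
```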
Now, let's move on and look
at the measures of the spread.
All right.
Now, what is a measure
of spread? A measure of spread,
sometimes also called
a measure of dispersion, is used
to describe the variability
in a sample or population.
Okay, you can think
of it as some sort
of deviation in the sample.
All right, so you measure
this with the help
of the different
measures of spread.
We have range,
interquartile range, variance,
and standard deviation.
Now range is pretty
self-explanatory, right?
It is a measure of
how spread apart the values
in a data set are.
The range can be calculated
as shown in this formula:
you're basically going
to subtract the minimum value
in your data set
from the maximum value
in your data set.
That's how you calculate
the range of the data.
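In code, that is a one-liner (with illustrative marks):

```python
# Range = maximum value minus minimum value.
marks = [45, 52, 58, 64, 71, 89]  # illustrative values
print(max(marks) - min(marks))    # 89 - 45 = 44
```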
Alright, next we
have interquartile range.
So before we discuss
interquartile range,
let's understand
what a quartile is, right?
So quartiles basically tell us
about the spread of a data set
by breaking the data set
into different quarters.
Okay, just like how the median
breaks the data into two parts,
the quartiles will break it
into different quarters.
So to better understand
how quartile and
interquartile range are calculated,
let's look at a small example.
Now, this data set basically represents the marks of a hundred students, ordered from the lowest to the highest scores, right? So the quartiles lie in the following ranges: the first quartile, which is also known as Q1, lies between the 25th and 26th observations.
All right.
So if you look at this, I've highlighted the 25th and the 26th observations. The way you calculate Q1, or the first quartile, is by taking the average of these two values.
Alright, since both
the values are 45
when you add them up
and divide them by two
you'll still get 45. Now, the second quartile, or Q2, is between the 50th and the 51st observations, so you're going to take the average of 58 and 59 and you will get a value of 58.5. Now, this is my second quartile. The third quartile, or Q3, is between the 75th and the 76th observations; here again we'll take the average of the two values, which are the 75th and the 76th values, and you'll get a value of 71.
Alright, so guys, this is exactly how you calculate the different quartiles.
Now, let's look at what the interquartile range is. So the IQR, or interquartile range, is a measure of variability based on dividing a data set into quartiles. The interquartile range is calculated by subtracting Q1 from Q3; so basically, your IQR is Q3 minus Q1. Alright. And this is how the quartiles divide the data: each quartile represents a quarter, which is 25% of the data set. All right.
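As a quick sketch in R (the marks below are hypothetical stand-ins for the hundred student scores), you can get the quartiles and the IQR directly; note that R's default quantile method interpolates slightly differently from the average-of-two-observations approach described above, but the idea is the same:

marks <- c(40, 42, 45, 45, 50, 55, 58, 59, 62, 65, 68, 71, 71, 75, 80, 88)  # hypothetical scores
quantile(marks, probs = c(0.25, 0.50, 0.75))   # Q1, Q2 (the median), Q3
IQR(marks)                                     # interquartile range, Q3 - Q1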
So guys, I hope all of you are clear on the interquartile range and what quartiles are. Now, let's look at variance. Variance is basically a measure that shows how much a random variable differs from its expected value.
Okay.
It's basically the variance in any variable. Now, variance can be calculated by using this formula: s² = Σ(x - x̄)² / n. Here x basically represents any data point in your data set, n is the total number of data points in your data set, and x̄ is the mean of the data points. Alright, this is how you calculate variance; variance is basically computed from the squares of the deviations. Okay, that's why it is written as s squared there.
Now, let's look at what deviation is. Deviation is just the difference of each element from the mean. Okay, so it can be calculated by using this simple formula: deviation = xᵢ - μ, where xᵢ basically represents a data point and μ is the mean of the population. Alright, this is exactly how you calculate the deviation.
Now, population variance and sample variance are specific to whether you're calculating the variance of your population data set or of your sample data set; that's the main difference between population and sample variance. So the formula for population variance is pretty self-explanatory: x is each data point, μ is the mean of the population, and n is the number of data points in your population. All right.
Now, let's look at sample variance. Sample variance is the average of the squared differences from the mean. Alright, here xᵢ is any data point, or any sample, in your data set, and x̄ is the mean of your sample; it's not the mean of your population, it's the mean of your sample. And if you notice, the n here is a lowercase n: it is the number of data points in your sample. And this is basically the difference between sample and population variance.
I hope that is clear. Coming to standard deviation: it is the measure of the dispersion of a set of data from its mean. Alright, so it's basically the deviation from your mean; that's what standard deviation is. Now, to better understand how the measures of spread are calculated, let's look at a small use case.
So let's say Daenerys has 20 dragons. They have the numbers 9, 2, 5, 4, and so on, as shown on the screen. What you have to do is work out the standard deviation. Alright, in order to calculate the standard deviation, you need to know the mean, right? So first you're going to find the mean of your sample set. How do you calculate the mean? You add all the numbers in your data set and divide by the total number of samples in your data set, so you get a value of 7.
Then you calculate the RHS of your standard deviation formula. Alright, so from each data point you're going to subtract the mean, and then you're going to square the result. When you do that, you will get the following values: 4, 25, 4, 9, 25, and so on. Finally, you find the mean of the squared differences, and once you take the square root, your standard deviation comes out to 2.983.
So guys, it's pretty simple; it's a simple arithmetic technique. All you have to do is substitute the values in the formula. Alright, I hope this was clear to all of you.
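Here is a minimal R sketch of that dragon calculation. Only the first few numbers (9, 2, 5, 4) are read out, so the rest of the vector is an assumption chosen to reproduce the quoted result of 2.983; also note the example divides by n (the population standard deviation), while R's built-in sd() divides by n - 1:

x <- c(9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4)  # values after the first four are assumed
m <- mean(x)           # 7
sq_dev <- (x - m)^2    # 4, 25, 4, 9, 25, ...
sqrt(mean(sq_dev))     # population standard deviation: 2.983
sd(x)                  # sample standard deviation (divides by n - 1): slightly larger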
Now let's move on and discuss the next topic, which is Information Gain and entropy.
This is one of my favorite
topics in statistics.
It's very interesting and
this topic is mainly involved
in machine learning algorithms,
like decision trees
and random forest.
All right, it's very important
for you to know
how Information Gain and entropy
really work and why they are
so essential in building
machine learning models.
We'll focus on the statistics part of information gain and entropy, and after that we'll discuss a use case and see how information gain and entropy are used in decision trees.
So for those of you who don't know what a decision tree is: it is basically a machine learning algorithm.
You don't have to know
anything about this.
I'll explain
everything in depth.
So don't worry.
Now.
Let's look at what exactly entropy and information gain are. Now guys, entropy is basically the measure of any sort of uncertainty that is present in the data. Alright, so it can be measured by using this formula: H(S) = -Σᵢ pᵢ log₂(pᵢ). So here S is the set of all instances in the data set, or all the data items in the data set; n is the number of different classes in your data set; and pᵢ is the event probability of class i. Now this might seem a little confusing to y'all, but when we go through the use case, you'll understand all of these terms even better.
Alright, coming to information gain: as the word suggests, information gain indicates how much information a particular feature, or a particular variable, gives us about the final outcome. Okay, it can be measured by using this formula: Gain(A, S) = H(S) - Σⱼ (|Sⱼ| / |S|) · H(Sⱼ). So here H(S) is the entropy of the whole data set S; Sⱼ is the set of instances with the j-th value of attribute A; |S| is the total number of instances in the data set; V is the set of distinct values of attribute A, which the sum runs over; H(Sⱼ) is the entropy of each subset of instances; and H(A, S), the summation term, is the weighted entropy of attribute A. Even though this seems confusing,
I'll clear out the confusion.
Alright, let's discuss a small problem statement where we will understand how information gain and entropy are used to study the significance of a model. So like I said, information gain and entropy are very important statistical measures that let us understand the significance of a predictive model. Okay, to get a clearer understanding, let's look at a use case. Now suppose we are given a problem statement: you have to predict whether a match can be played or not by studying the weather conditions.
So the predictor variables here are outlook, humidity, and wind; day is also a predictor variable. The target variable is basically Play; the target variable is the variable that you're trying to predict. Okay. Now, the value of the target variable will decide whether or not a game can be played. Alright, so that's why Play has two values, No and Yes. No means that the weather conditions are not good,
And therefore you
cannot play the game.
Yes, meaning that the weather
conditions are good and suitable
for you to play the game.
Alright, so that was
our problem statement.
I hope the problem statement
is clear to all of you now
to solve such a problem.
We make use of something
known as decision trees.
So guys, think of an inverted tree, where each branch of the tree denotes some decision. Alright, each branch is known as a branch node, and at each branch node you're going to take a decision in such a manner that you will get an outcome at the end of the branch.
All right.
Now this figure
here basically shows
that out of 14 observations
9 observations result in a yes, meaning that out of 14 days, the match can be played on only nine days.
Alright, so here, if you see, on day 1, day 2, day 8, day 9, and day 11, the Outlook has been sunny. So basically, we try to split the data set depending on the Outlook: when the Outlook is sunny, this is our data set; when the Outlook is overcast, this is what we have; and when the Outlook is rainy, this is what we have. Alright, so when it is sunny, we have two yeses and three nos. When the Outlook is overcast, we have all four as yes, meaning that on the four days when the Outlook was overcast, we can play the game. Alright. Now, when it comes to rain, we have three yeses and two nos.
All right.
So if you notice here,
the decision is being made by
choosing the Outlook variable
as the root node.
Okay.
So the root node is
basically the topmost node
in a decision tree.
Now, what we've done here is
we've created a decision tree
that starts with
the Outlook node.
All right, then you're splitting
the decision tree further
depending on other parameters
like Sunny overcast and rain.
All right now like we know
that Outlook has three values.
Sunny overcast and brain
so let me explain this
in a more in-depth manner.
Okay.
So what you're doing
here is you're making
the decision Tree by choosing
the Outlook variable
at the root node.
The root node is basically the topmost node in a decision tree.
Now, the Outlook node has three branches coming out from it: sunny, overcast, and rain. So basically, Outlook can have three values: either it can be sunny, it can be overcast, or it can be rainy. Okay, now these three values are assigned to the immediate branch nodes, and for each of these values the possibility of Play being equal to Yes is calculated.
So the sunny and the rain branches will give you an impure output, meaning that there is a mix of yes and no, right? There are two yeses and three nos here, and there are three yeses and two nos over here. But when it comes to the overcast value, it results in a hundred percent pure subset. Alright, this shows that the overcast value will result in a definite and certain output.
This is exactly what entropy
is used to measure.
All right, it calculates
the impurity or the uncertainty.
Alright, so the lesser the uncertainty, or the entropy, of a variable, the more significant that variable is. When it comes to overcast, there's literally no impurity in the data set; it is a hundred percent pure subset, right? So we want variables like these in order to build a model. Now, we don't always get lucky, and we don't always find variables that result in pure subsets; that's why we have the measure of entropy. So the lesser the entropy of a particular variable, the more significant that variable will be. So, in a decision tree,
The root node is assigned
the best attribute
so that the decision tree
can predict the most
precise outcome, meaning that at the root node you should have the most significant variable. Alright, that's why we've chosen Outlook. Now, some of you might ask me: why haven't you chosen overcast? Okay, overcast is not a variable; it is a value of the Outlook variable. That's why we've chosen Outlook here: because it has a hundred percent pure subset, which is overcast. All right.
All right.
Now, the question in your head is: how do I decide which variable or attribute best splits the data? Right now, I looked at the data and I told you that here we have a hundred percent pure subset, but what if it's a more complex problem and you're not able to tell which variable will best split the data? So guys, when it comes to decision trees, information gain and entropy will help you understand which variable will best split the data set; alright, or in other words, which variable you have to assign to the root node, because whichever variable is assigned to the root node will best split the data set, and it has to be the most significant variable. Alright, so the way we do this is by using information gain and entropy.
So, from the total of the 14 instances that we saw, nine of them said yes, and five of the instances said no, meaning that you cannot play on those particular days.
All right.
So how do you
calculate the entropy?
So this is the formula; you just substitute the values into it.
So when you substitute the values in the formula, you will get a value of 0.940. Alright, this is the entropy, or the uncertainty, of the data present in the sample.
Now, in order to ensure that we choose the best variable for the root node, let us look at all the possible combinations that you can use on the root node.
Okay, so these are all the possible combinations: you can either have Outlook, windy, humidity, or temperature. These are four variables, and you can have any one of these variables as your root node.
But how do you select
which variable best
fits the root node?
That's what we are going
to see by using
Information Gain and entropy.
So guys now the task at hand
is to find the information gain
for each of these attributes.
All right.
So for Outlook, windy, humidity, and temperature, we're going to find out the information gain. Alright.
Now, a point to remember is that the variable that results in the highest information gain must be chosen, because it will give us the most precise output information. All right.
So let's calculate the information gain for the attribute windy first. Here we have six instances of true and eight instances of false. Okay, so when you substitute all the values in the formula, you will get a value of 0.048. Now, this is a very low value for information gain.
Alright, so the information that you're going to get from the windy attribute is pretty low. So let's calculate the information gain of the attribute Outlook. Alright, from the total of 14 instances, we have five instances which are sunny, four instances which are overcast, and five instances which are rainy.
Alright: for sunny, we have two yeses and three nos; for overcast, we have all four as yes; and for rainy, we have three yeses and two nos.
Okay.
So when you calculate the information gain of the Outlook variable, you'll get a value of 0.247. Now compare this to the information gain of the windy attribute: this value is actually pretty good, right? We have 0.247, which is a pretty good value for information gain.
Now, let's look at the information gain of the attribute humidity. Over here, we have seven instances that say high and seven instances that say normal, right? And under the high branch node, we have three instances that say yes and the remaining four instances that say no. Similarly, under the normal branch, we have six instances that say yes and one instance that says no.
All right.
So when you calculate the information gain for the humidity variable, you're going to get a value of 0.151.
Now.
This is also
a pretty decent value,
but when you compare it to the information gain of the attribute Outlook, it is less, right? Now, let's look at the information gain of the attribute temperature. Alright, so the temperature attribute can hold three values: hot, mild, and cool.
Okay: under hot, we have two instances of yes and two instances of no; under mild, we have four instances of yes and two instances of no; and under cool, we have three instances of yes and one instance of no.
All right.
When you calculate the information gain for this attribute, you will get a value of 0.029, which is again very low.
So what you can summarize
from here is if we look
at the information gain for each
of these variable will see
that for Outlook.
We have the maximum gain.
All right, we have
zero point two four seven,
which is the highest
Information Gain value
and you must always
choose a variable
with the highest Information
Gain to split the data
at the root node.
So that's why we assign
The Outlook variable
at the root node.
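To tie the numbers together, here is a small R sketch, my own illustration built from the counts we just discussed, that reproduces the entropy of 0.940 and the Outlook gain of 0.247:

# entropy of a vector of class counts, e.g. c(yes, no)
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]              # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

H_S <- entropy(c(9, 5))      # whole data set: 9 yes, 5 no -> about 0.940

# Outlook subsets: sunny (2 yes, 3 no), overcast (4 yes, 0 no), rain (3 yes, 2 no)
subsets <- list(c(2, 3), c(4, 0), c(3, 2))
sizes   <- sapply(subsets, sum)                        # 5, 4, 5
H_A     <- sum(sizes / 14 * sapply(subsets, entropy))  # weighted entropy, about 0.694
H_S - H_A                                              # information gain, about 0.247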
Alright, so guys, I hope this use case was clear. If any of you have doubts, please leave them in the comments. Now,
let's move on and look at what exactly a confusion matrix is. The confusion matrix is the last topic for descriptive statistics, right? After this, I'll be running a short demo where I'll show you how you can calculate mean, median, mode, standard deviation, variance, and all of those values by using R. Okay.
So let's talk about the confusion matrix. Now guys, what is a confusion matrix? Don't get confused; this is not a complex topic. A confusion matrix is a matrix that is often used to describe the performance of a model, right? It is specifically used for classification models, or classifiers, and what it does is calculate the accuracy, or the performance, of your classifier by comparing your actual results with your predicted results. Alright. So this is what it looks like: true positive, true negative, and all of that.
Now, this is a little confusing; I'll get back to what exactly true positive, true negative, and all of that stand for. For now, let's look at an example, and let's try and understand what exactly a confusion matrix is.
So guys, I made sure that I put examples after each and every topic, because it's important that you understand the practical part of statistics. Alright, statistics is not just theory; you need to understand how the calculations are done in statistics. Okay, so let's look at a small use case.
Let's consider that you're given data about 165 patients, out of which 105 patients have a disease and the remaining 60 patients don't have a disease.
Okay.
So what you're going to do is build a classifier using these 165 observations: you feed all 165 observations to your classifier, and it will predict an output every time a new patient's details are fed in. Now, out of these 165 cases, let's say that the classifier predicted yes 110 times and no 55 times. Alright, so yes basically stands for "yes, the person has the disease", and no stands for "no, the person does not have the disease". That's pretty self-explanatory. So it predicted 110 times that the patient has a disease, and 55 times that the patient doesn't have a disease.
However, in reality only 105 patients in the sample have the disease, and 60 patients do not have the disease, right?
So how do you calculate the accuracy of your model? You basically build the confusion matrix. Alright, this is how the matrix looks: n basically denotes the total number of observations you have, which is 165 in our case; actual denotes the actual values in the data set, and predicted denotes the values predicted by the classifier.
So where the actual value is no and the predicted value is also no, your classifier was correctly able to classify 50 cases as no. But 10 of these cases it classified incorrectly, meaning that the actual value is no but your classifier predicted yes; that's why this cell shows 10 over here. Similarly, it wrongly predicted that five patients do not have the disease whereas they actually did have it, and it correctly predicted 100 patients who have the disease. Alright, I know this is a little bit confusing, but if you look at these values: no/no is 50, meaning that it correctly predicted those 50 values, while no/yes means that it wrongly predicted yes for values where it was supposed to predict no. All right.
Now, what exactly are these true positives, true negatives, and all of that? I'll tell you exactly what they are. True positives are the cases in which we predicted a yes and they actually do have the disease; that's the 100 over here. True negatives are the cases where we predicted no and they don't have the disease, meaning that this is correct; that's the 50. False positives are the cases where we predicted yes but they do not actually have the disease; that's the 10, and this is also known as a type 1 error. False negatives are the cases where we predicted no but they actually do have the disease; that's the 5, also known as a type 2 error. So guys, basically the true positives and the true negatives are the correct classifications.
All right.
So this was confusion Matrix
and I hope this concept
is clear again guys.
If you have doubts,
please comment your doubt
in the comment section.
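Just to make the arithmetic concrete, here is a small R sketch of the same matrix and the accuracy you'd read off it; the counts are the ones from our example:

# rows = actual, columns = predicted
cm <- matrix(c(50, 10,     # actual No:  50 true negatives, 10 false positives
               5, 100),    # actual Yes:  5 false negatives, 100 true positives
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("No", "Yes"), predicted = c("No", "Yes")))
cm
(cm["No", "No"] + cm["Yes", "Yes"]) / sum(cm)   # accuracy: 150 / 165, about 0.909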
So guys, that was descriptive statistics. Now, before we go to probability, I promised you all that we'd run a small demo in R; alright, we'll try and understand how mean, median, and mode work in R. Okay, so let's do that first.
So guys again
what we just discussed so far
was descriptive statistics.
Alright, next we're going to discuss probability, and then we'll move on to inferential statistics; inferential statistics is basically the second type of statistics. Okay, now to make things more clear for you, let me just zoom in.
So guys it's always best
to perform practical
implementations in order
to understand the concepts
in a better way.
Okay, so here we'll be executing a small demo that will show you how to calculate the mean, median, mode, variance, and standard deviation, and how to study a variable by plotting a histogram. Okay, don't worry if you don't know what a histogram is; it's basically a frequency plot. There's no big science behind it. Alright, this is a very simple demo, but it also forms the foundation that every machine learning algorithm is built upon.
Okay, you can say that most of the machine learning algorithms, actually all the machine learning and deep learning algorithms, have this basic concept behind them. Okay, you need to know how mean, median, mode, and all of that are calculated.
So guys, I'm using the R language to perform this, and I'm running it on RStudio. For those of you who don't know the R language, I will leave a couple of links in the description box; you can go through those videos. So what we're doing is randomly generating numbers and storing them in a variable called data. So if you want to see the generated numbers, just run the line "data"; this variable basically stores all our numbers.
All right.
Now, what we're going to do is calculate the mean. All you have to do in R is call the function mean on the data that you're calculating the mean of, and I've assigned this whole thing to a variable called mean, which will just hold the mean value of this data. So now let's look at the mean; for that we use a function called print. Alright, so our mean is around 5.99.
Okay.
Next is calculating the median. It's very simple, guys: all you have to do is use the function median and pass the data as a parameter to this function. That's all you have to do; R provides functions for each and everything. Alright, statistics is very easy when it comes to R, because R is basically a statistical language. Okay, so all you have to do is name the function, and that function is already built into R. Okay, so your median is around 6.4.
Similarly, we will calculate the mode. Alright, let's run this function: I basically created a small function for calculating the mode. So guys, this is our mode, meaning that this is the most recurrent value. Right, now we're going to calculate the variance and the standard deviation. For that, again, we have a function in R called var; all you have to do is pass the data to that function.
Okay, similarly we'll calculate the standard deviation, which is basically the square root of your variance, and now we'll print the standard deviation, right? This is our standard deviation value. Now, finally, we will just plot a small histogram. A histogram is nothing but a frequency plot; it'll show you how frequently a data point occurs.
So this is the histogram that we've just created. It's quite simple in R, because R has a lot of packages and a lot of inbuilt functions that support statistics. Alright, it is a statistical language that is mainly used by data scientists, data analysts, and machine learning engineers, because they don't have to sit and code these functions; all they have to do is mention the name of the function and pass the corresponding parameters.
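Since the screen isn't visible here, below is a minimal sketch of what that demo looks like in R. The seed, the distribution, and the mode helper are my assumptions (the video only says the numbers are randomly generated and that a small mode function was written), so your exact outputs will differ:

set.seed(42)                           # any seed, just for reproducibility
data <- rnorm(100, mean = 6, sd = 2)   # assumed: 100 randomly generated numbers

print(mean(data))      # mean
print(median(data))    # median

# base R has no mode function for data, so define a small helper
get_mode <- function(v) {
  v <- round(v, 1)                         # bucket continuous values before counting
  as.numeric(names(which.max(table(v))))   # most frequent value
}
print(get_mode(data))

print(var(data))   # variance
print(sd(data))    # standard deviation, the square root of the variance
hist(data)         # histogram: a simple frequency plot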
So guys, that was the entire descriptive statistics module, and now we will discuss probability.
Okay.
So before we understand
what exactly probability is,
let me clear out a very common misconception. People often tend to ask me this question: what is the relationship between statistics and probability? Probability and statistics are related fields: probability is a mathematical method used for statistical analysis, so we can say that probability and statistics are interconnected branches of mathematics that deal with analyzing the relative frequency of events. Probability makes use of statistics, and statistics makes use of probability; they're very interconnected fields. So that is the relationship between statistics and probability.
Now let's understand what exactly probability is. Probability is the measure of how likely an event is to occur; to be more precise, it is the ratio of the desired outcomes to the total outcomes.
Now, the probabilities of all outcomes always sum up to 1, and a probability cannot go beyond one. Okay, so your probability can be 0, it can be 1, or it can be in the form of a decimal like 0.52, 0.55, 0.7, or 0.9, but its value will always stay within the range 0 to 1. Okay, another famous example of probability is the rolling of a die.
So when you roll a die, you get six possible outcomes: the six faces of the die show one, two, three, four, five, and six, and each face is one possible outcome. So what is the probability that on rolling the die you will get a 3? The probability is 1/6, right? Because there's only one face with the number 3 on it, out of six faces. So the probability of getting a 3 when you roll a die is 1/6. Similarly, if you want to find the probability of getting the number 5, again the probability is going to be 1/6. Alright, and all of these probabilities will sum up to 1. So guys, this is exactly what probability is; it's a very simple concept we all learnt from 8th standard onwards. Now,
let's understand the different terminologies related to probability. There are three terminologies that you often come across when we talk about probability.
We have something known
as the random experiment.
Okay.
It's basically an experiment
or a process for which
the outcomes cannot be
predicted with certainty.
All right.
That's why you use probability.
You're going to use probability
in order to predict the outcome
with Some sort
of certainty sample space is the
entire possible set of outcomes
of a random experiment
and event is one or more
outcomes of an experiment.
So consider the example of rolling a die: let's say that you want to find out the probability of getting a 2 when you roll the die. Finding this probability is the random experiment. The sample space is your entire set of possibilities: one, two, three, four, five, and six are all the possible outcomes, and out of those you need to find the probability of getting a 2, right? So all the possible outcomes together represent your sample space; 1 to 6 are all your possible outcomes, and this represents your sample space. Now, an event is one or more outcomes of an experiment; so in this case, my event is getting a 2 when I roll the die. So guys, this is basically what random experiment, sample space, and event really mean. Alright, now
let's discuss the different
types of events.
There are two types of events that you should know about: disjoint and non-disjoint events. Disjoint events are events that do not have any common outcome; for example, if you draw a single card from a deck of cards, it cannot be both a king and a queen, correct? It can either be a king or it can be a queen. Now, non-disjoint events are events that have common outcomes; for example, a student can get a hundred marks in statistics and a hundred marks in probability, and also, the outcome of a ball delivered can be a no ball and it can also be a six, right? So this is what non-disjoint events are. Alright, these are very simple to understand.
Let's move on and look
at the different types
of probability distribution.
Alright, I'll be discussing the three main probability distribution functions: the probability density function, the normal distribution, and the central limit theorem.
Okay, the probability density function, also known as the PDF, is concerned with the relative likelihood of a continuous random variable taking on a given value. Alright, so the PDF gives the probability of a variable lying between a range a and b. So basically, what you're trying to do is find the probability of a continuous random variable over a specified range. Okay.
Now, this graph denotes the PDF of a continuous variable. This graph is also known as the bell curve; it's famously called the bell curve because of its shape. There are three important properties that you need to know about a probability density function.
First, the graph of a PDF is continuous over a range; this is because you're finding the probability that a continuous variable lies between the ranges a and b. The second property is that the area bounded by the curve of a density function and the x-axis is equal to 1; basically, the area below the curve is equal to 1, because it denotes probability, and again, probability cannot be more than one: it has to be between 0 and 1. Property number three is that the probability that a random variable assumes a value between a and b is equal to the area under the PDF bounded by a and b. Okay.
Now what this means is that the probability is denoted by the area under the graph: whatever value you get for this area is the probability that the random variable will lie between the range a and b.
All right.
So I hope all
of you have understood the
probability density function.
It's basically the probability
of finding the value
of a continuous random variable
between the range A and B.
All right.
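As a quick sketch, assuming a standard normal variable purely for illustration, you can check both ideas in R:

# P(a < X < b) is the area under the density curve between a and b
a <- -1; b <- 1
pnorm(b) - pnorm(a)                          # about 0.683 for a standard normal

# the total area under the curve is 1
integrate(dnorm, lower = -Inf, upper = Inf)  # 1, up to a tiny numerical error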
Now, let's look at our next distribution, which is the normal distribution.
The normal distribution, which is also known as the Gaussian distribution, is a probability distribution that denotes the symmetric property of the mean, right? Meaning that the idea behind this function is that data near the mean occurs more frequently than data away from the mean. So what it means to say is that the data around the mean represents the entire data set.
Okay.
So if you just take a sample of data around the mean, it can represent the entire data set. Now, similar to the probability density function, the normal distribution appears as a bell curve. Now, when it comes to the normal distribution,
There are two important factors.
Alright, we have the mean of the population and the standard deviation. Okay, so the mean determines the location of the center of the graph, and the standard deviation determines the height and spread of the graph. Okay, so if the standard deviation is large, the curve is going to look something like this: short and wide. And if the standard deviation is small, the curve is tall and narrow. All right.
All right.
So this was it
about normal distribution.
Now, let's look
at the central limit theorem.
Now, the central limit theorem states that the sampling distribution of the mean of any independent random variable will be normal or nearly normal if the sample size is large enough. Now, that's a little confusing; okay, let me break it down for you. In simple terms, if we had a large population and we divided it into many samples, then the mean of all the samples from the population will be almost equal to the mean of the entire population, right?
Meaning that each of the samples is normally distributed, right? So if you compare the mean of each of the samples, it will be almost equal to the mean of the population, right? So this graph gives a clearer picture of the central limit theorem: you can see each sample here, and the mean of each sample is almost along the same line, right?
Okay, so this is exactly what the central limit theorem states. Now, the accuracy, or the resemblance to the normal distribution, depends on two main factors. The first is the number of sample points that you consider, and the second is the shape of the underlying population; now, the shape obviously depends on the standard deviation and the mean of a sample, correct? So guys, the central limit theorem basically states that each sample will be normally distributed in such a way that the mean of each sample will coincide with the mean of the actual population.
Alright, in short, that's what the central limit theorem states. And this holds true mainly for a large data set; with a small data set there are more deviations than with a large data set, because of the scaling factor, right? The smallest deviation in a small data set will change the value very drastically, but in a large data set a small deviation will not matter at all.
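A small simulation makes this easy to see. Here is a hedged R sketch; the skewed exponential population and the sample sizes are my own choices for illustration:

set.seed(1)
population <- rexp(100000, rate = 1)   # a skewed population with mean 1

# draw 1,000 samples of size 50 and record each sample mean
sample_means <- replicate(1000, mean(sample(population, size = 50)))

mean(sample_means)   # very close to the population mean of 1
hist(sample_means)   # roughly bell-shaped, even though the population is skewed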
Now, let's move on and look at our next topic, which is the different types of probability. This is an important topic, because most problems can be solved by understanding which type of probability to use, right?
So we have three important
types of probability.
We have marginal, joint, and conditional probability. So let's discuss each of these. Now, the probability of an event occurring unconditioned on any other event is known as marginal, or unconditional, probability. So let's say that you want to find the probability that a card drawn is a heart: the probability will be 13/52, since there are 13 hearts in a deck of cards and 52 cards in the total deck. So your marginal probability will be 13/52, which is 1/4. That's marginal probability.
Now, let's understand what joint probability is. Joint probability is a measure of two events happening at the same time. Okay, let's say that the two events are A and B; the probability of events A and B occurring together is the intersection of A and B.
So, for example, if you want to find the probability that a card is a four and red, that would be joint probability, because you're finding a card that is a 4 and that also has to be red in color. The answer would be 2/52, because we have one 4 in hearts and one 4 in diamonds, and both of these are red in color; therefore our probability is 2/52, and if you reduce it further, it is 1/26, right? So this is what joint probability is all about.
Let's look at what exactly
conditional probability is.
So if the probability
of an event or an outcome
is based on the occurrence
of a previous event
or an outcome.
then you call it a conditional probability. Okay. So the conditional probability of an event B is the probability that the event will occur given that an event A has already occurred, right? So if A and B are dependent events, then the expression for conditional probability is given by P(B|A) = P(A and B) / P(A). The first term on the left-hand side, P(B|A), is basically the probability of event B occurring given that event A has already occurred. So, like I said, if A and B are dependent events then this is the expression; but if A and B are independent events, then the expression simplifies to P(B|A) = P(B), right? And here P(A) and P(B) are obviously the probability of A and the probability of B. Right now,
let's move on. Now, in order to understand conditional probability, joint probability, and marginal probability, let's look at a small use case. Okay, basically we're going to take a data set which examines the salary package and the training undergone by candidates. Now, in this there are 60 candidates without training and 45 candidates who have enrolled for Edureka's training, right? Now, the task here is to assess the training against the salary package. Okay, let's look at this in a little more depth. So in total we have 105 candidates, out of which 60 of them have not enrolled for Edureka's training and 45 of them have enrolled for it.
Alright. This is the small survey that was conducted, and this is the rating of the package, or the salary, that the candidates got, right? So if you read through the data, you can see that there were five candidates without Edureka training who got a very poor salary package. Similarly, there are 30 candidates with Edureka training who got a good package, right? So guys, basically you're comparing the salary package of a person depending on whether or not they've enrolled for Edureka training, right? This is our data set.
Now.
Let's look at our problem statement: find the probability that a candidate has undergone Edureka's training. Quite simple; which type of probability is this? This is marginal probability, right? So the probability that a candidate has undergone Edureka's training is obviously 45 divided by 105, since 45 is the number of candidates with Edureka training and 105 is the total number of candidates. So you get a value of approximately 0.42; alright, that's the probability of a candidate having undergone Edureka's training. Next question:
find the probability that a candidate has attended Edureka's training and also has a good package. Now, this is obviously a joint probability problem, right? So how do you calculate this? Since our table is quite well formatted, we can directly see that the people who have gotten a good package along with Edureka training number 30, right? So out of 105 people, 30 people have Edureka training and a good package. They're specifically asking for people with Edureka training; remember that, right?
The question is: find the probability that a candidate has attended Edureka's training and also has a good package. Alright, so we need to consider two factors: a candidate who has attended Edureka's training and who has a good package. So clearly that number is 30, and we divide 30 by the total number of candidates, which is 105, right? So here you get the answer clearly.
Next we have: find the probability that a candidate has a good package, given that he has not undergone training. Okay, now this is clearly conditional probability, because here you're defining a condition: you're saying that you want to find the probability of a candidate having a good package, given that he has not undergone any training, right? The condition is that he has not undergone any training.
Alright. So the number of people who have not undergone training is 60, and out of those, five of them have got a good package, right? So that's why this is 5 over 60 and not 5 over 105, because they have clearly said "has a good package given that he has not undergone training"; you have to only consider people who have not undergone training. So only five people who have not undergone training have gotten a good package, and 5 divided by 60 gives you a probability of around 0.083, which is pretty low, right?
Okay.
So this was all
about the different
types of probability.
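Here is a small R sketch that recomputes all three answers straight from the table; the counts are the ones we just read off:

total              <- 105
trained            <- 45   # candidates with Edureka training
untrained          <- 60   # candidates without training
good_and_trained   <- 30   # good package and Edureka training
good_and_untrained <- 5    # good package and no training

trained / total                  # marginal probability, about 0.429
good_and_trained / total         # joint probability, about 0.286
good_and_untrained / untrained   # conditional probability, about 0.083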
Now, let's move on and look at our last topic in probability, which is Bayes' theorem. Now guys, Bayes' theorem is a very important concept when it comes to statistics and probability, and it is majorly used in the Naive Bayes algorithm. For those of you who aren't aware, Naive Bayes is a supervised learning classification algorithm, and it is mainly used in Gmail spam filtering. A lot of you might have noticed that if you open up Gmail, you'll see that you have a folder called spam, right? That is carried out through machine learning, and the algorithm used there is Naive Bayes, right?
So now let's discuss what exactly the Bayes theorem is and what it denotes. The Bayes theorem is used to show the relation between one conditional probability and its inverse. Alright, it's basically nothing but the probability of an event occurring based on prior knowledge of conditions that might be related to the same event.
Okay. So mathematically, the Bayes theorem is represented as shown in this equation: P(A|B) = P(B|A) · P(A) / P(B). The term on the left-hand side, P(A|B), is what is known as the posterior, which means the probability of occurrence of A given an event B. The term P(B|A) is referred to as the likelihood ratio; this measures the probability of occurrence of B given an event A. Now, P(A) is also known as the prior, which refers to the actual probability distribution of A, and P(B) is again the probability of B, right?
This is the Bayes theorem. In order to better understand the Bayes theorem, let's look at a small example. Let's say that we have three bowls: bowl A, bowl B, and bowl C. Bowl A contains two blue balls and four red balls, bowl B contains eight blue balls and four red balls, and bowl C contains one blue ball and three red balls. Now, if we draw one ball from each bowl, what is the probability of drawing a blue ball from bowl A, given that we know we drew exactly two blue balls in total? If you didn't understand the question, please read it again; I shall pause for a second or two.
Right.
So I hope all of you
have understood the question.
Okay.
Now what I'm going to do
is I'm going to draw
a blueprint for you
and tell you how exactly
to solve the problem.
But I want you all to give
me the solution
to this problem, right?
I'll draw a blueprint.
I'll tell you what exactly the steps are, but I want you to come up with a solution on your own, right? The formula is also given to you.
All you have to do is come up
with the final answer.
Right?
Let's look at how you
can solve this problem.
So first of all, let A be the event of picking a blue ball from bowl A, and let X be the event of picking exactly two blue balls, because these are the two events whose probabilities we need to calculate. Now, there are two probabilities that you need to consider here: one is the event of picking a blue ball from bowl A, and the other is the event of picking exactly two blue balls.
Okay.
So these two are represented by A and X respectively. What we want is the probability of occurrence of event A given X, which means: given that we're picking exactly two blue balls, what is the probability that we are picking a blue ball from bowl A? So by the definition of conditional probability, this is exactly what our equation will look like: P(A|X) = P(A and X) / P(X). The numerator is the probability of A and X occurring together, and the denominator is the probability of X alone, correct? And what we need to do is find these two probabilities: the probability of A and X occurring together, and the probability of X. Okay.
This is the entire solution. So how do you find the probability of X? X represents the event of picking exactly two blue balls, and there are three ways in which this can happen: you pick one blue ball from bowl A and one from bowl B, or you pick one blue ball from bowl A and another from bowl C, or you pick a blue ball from bowl B and a blue ball from bowl C. Right, these are the three ways in which it is possible, so you need to find the probability of each of these. Step two is
that you need to find
the probability of a
and X occurring together.
This is the sum of terms 1 and 2, because both of these events involve picking a blue ball from bowl A, correct? So go ahead, find out this probability, and let me know your answer in the comment section.
All right.
We'll see if you get the answer right. I gave you the entire solution to this; all you have to do is substitute the values. If you want a second or two, I'm going to pause on the screen so that you can go through this more clearly.
Remember that you need to calculate two probabilities. The first probability you need to calculate is that of picking a blue ball from bowl A, given that you're picking exactly two blue balls. The second probability you need to calculate is that of picking exactly two blue balls. Alright, these are the two probabilities you need to calculate, so remember that; and this is the solution.
Alright, so guys, make sure you mention your answers in the comment section.
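And if you want to check your arithmetic afterwards, here is a minimal R sketch that follows exactly the blueprint above; the bowl contents are the ones stated in the question:

pA <- 2 / 6    # blue from bowl A (2 blue, 4 red)
pB <- 8 / 12   # blue from bowl B (8 blue, 4 red)
pC <- 1 / 4    # blue from bowl C (1 blue, 3 red)

# step 1: P(X), exactly two blue balls, in the three possible ways
p_AB <- pA * pB * (1 - pC)   # blue from A and B, red from C
p_AC <- pA * (1 - pB) * pC   # blue from A and C, red from B
p_BC <- (1 - pA) * pB * pC   # blue from B and C, red from A
pX   <- p_AB + p_AC + p_BC

# step 2: P(A and X) is the sum of the two terms that involve bowl A
pAX <- p_AB + p_AC

pAX / pX   # P(A | X), the answer to check against your own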
For now, let's move on and look at our next topic, which is inferential statistics. So guys, we just completed the probability module; now we will discuss inferential statistics, which is the second type of statistics.
We discussed descriptive statistics earlier. Alright, so like I mentioned earlier, inferential statistics, also known as statistical inference, is a branch of statistics that deals with forming inferences and predictions about a population based on a sample of data taken from the population. Alright, and the question you should ask is: how does one form inferences or predictions from a sample? The answer is: you use point estimation.
Okay.
Now, you must be wondering what point estimation is. Point estimation is concerned with the use of sample data to measure a single value which serves as an approximate value, or the best estimate, of an unknown population parameter. That's a little confusing, so let me break it down for you. For example, in order to calculate the mean of a huge population, what we do is first draw out a sample of the population, and then we find the sample mean; the sample mean is then used to estimate the population mean. This is basically a point estimate: you're estimating the value of one of the parameters of the population, basically the mean; you're trying to estimate the value of the mean.
This is what point estimation is. The two main terms in point estimation are the estimator and the estimate. The estimator is a function of the sample that is used to find out the estimate; in this example, it's basically the sample mean. So a function that calculates the sample mean is known as the estimator, and the realized value of the estimator is the estimate, right? So I hope point estimation is clear.
Now, how do you find the estimates? There are four common ways in which you can do this. The first one is the method of moments: what you do is form an equation using the sample data set, and then you analyze the similar equation for the population data set as well, like the population mean, population variance, and so on. So in simple terms, you take some known facts about the population and extend those ideas to the sample. Alright, once you do that, you can analyze the sample and estimate more essential, or more complex, values. Right, next,
we have maximum likelihood: this method basically uses a model to estimate a value. Alright, maximum likelihood is majorly based on probability, so there's a lot of probability involved in this method. Next, we have the Bayes estimator: this works by minimizing the error, or the average risk. Okay, the Bayes estimator has a lot to do with the Bayes theorem.
All right, let's
not get into the depth
of these estimation methods.
Finally, we have the best unbiased estimators: in this method, there are several unbiased estimators that can be used to approximate a parameter. Okay.
So guys, these were a couple of methods that are used to find an estimate, but the most well-known method for finding an estimate is interval estimation. Okay, this is one of the most important estimation methods, and this is where the confidence interval also comes into the picture. Apart from interval estimation, we also have something known as the margin of error, so I'll be discussing all of this in the upcoming slides.
So first, let's understand what an interval estimate is. Okay: an interval, or range of values, which is used to estimate a population parameter is known as an interval estimate, right? That's very understandable. Basically, what this is saying is that you're going to estimate the value of a parameter; let's say you're trying to find the mean of a population. What you're going to do is build a range, and your value will lie within that range, or within that interval.
Alright, so this way your output is going to be more accurate, because you've not predicted a single point estimate; instead, you have estimated an interval within which your value might occur, right? Okay, now this image clearly shows how a point estimate and an interval estimate are different. An interval estimate is obviously more accurate, because you're not just focusing on a particular value, or a particular point, in order to predict the probability; instead, you're saying that the value might lie within this range, between the lower confidence limit and the upper confidence limit. Alright, this denotes the range, or the interval.
Okay, if you're still confused about interval estimation, let me give you a small example. If I stated that I will take 30 minutes to reach the theater, that is point estimation. But if I stated that I will take between 45 minutes and an hour to reach the theater, that is an example of interval estimation. Alright, I hope it's clear now.
Now, interval estimation gives rise to two important statistical terminologies: one is known as the confidence interval, and the other is known as the margin of error. Alright, so it's important that you pay attention to both of these terminologies. The confidence interval is one of the most significant measures used to check how significant a machine learning model is.
So what is a confidence interval? The confidence interval is the measure of your confidence that the estimated interval contains the population parameter, or the population mean, or any of those parameters. Now, statisticians use the confidence interval to describe the amount of uncertainty associated with a sample estimate of a population parameter. Now guys, this is a lot of definition, so let me just make you understand the confidence interval with a small example.
Okay.
Let's say that you perform a survey, and you survey a group of cat owners to see how many cans of cat food they purchase in one year. Okay, you test your statistic at the 99 percent confidence level and you get a confidence interval of (100, 200). This means that you think the cat owners buy between 100 and 200 cans in a year; and also, since the confidence level is 99%, it shows that you're very confident that the results are correct. Okay, I hope all of you are clear with that.
Alright, so your confidence interval here will be 100 to 200, and your confidence level will be 99%, right? That's the difference between the confidence interval and the confidence level: within your confidence interval your value is going to lie, and your confidence level shows how confident you are about your estimation, right? I hope that was clear.
Let's look at the margin of error. Now, the margin of error for a given level of confidence is the greatest possible distance between the point estimate and the value of the parameter that it is estimating; you can say that it is a deviation from the actual point estimate. Now, the margin of error can be calculated using this formula: E = z_c · σ / √n. Here z_c denotes the critical value for the confidence level, it is multiplied by the standard deviation σ, and that is divided by the square root of the sample size; n is basically the sample size. Now,
let's understand how
you can estimate
the confidence intervals.
So guys, the level of confidence, which is denoted by c, is the probability that the interval estimate contains the population parameter; let's say that you're trying to estimate the mean. So this interval between -z and z, or the area beneath this curve, is nothing but the probability that the interval estimate contains the population parameter, right? It should basically contain the value that you are predicting.
Now, these are known as critical values: this is basically your lower limit and your upper limit for the confidence level. Also, there's something known as the z-score; this score can be calculated by using the standard normal table, right? If you look it up anywhere on Google, you'll find the z-score table, or the standard normal table. To understand how this is done, let's look at a small example.
Okay, let's say that the level of confidence is 90%. This means that you are 90% confident that the interval contains the population mean. Okay, so the remaining 10% out of a hundred percent is equally distributed over the tail regions: you have 0.05 here and 0.05 over here, right? So on either side of c you distribute the leftover percentage. Now, these z-scores are calculated from the table, as I mentioned before; alright, 1.645 is obtained from the standard normal table. Okay.
So guys, that's how you estimate the level of confidence. To sum it up, let me tell you the steps that are involved in constructing a confidence interval. First, you start by identifying a sample statistic; okay, this is the statistic that you will use to estimate a population parameter, and it can be anything, like the mean of the sample. Next, you select a confidence level; the confidence level describes the uncertainty of the sampling method. Right after that, you find something known as the margin of error (we discussed the margin of error earlier); you find this based on the equation that I explained in the previous slide. Then you finally specify the confidence interval.
All right.
Now, let's look at a problem statement to better understand this concept. A random sample of 32 textbook prices is taken from a local college bookstore; the mean of the sample is given, and the sample standard deviation is 23.44. Use a 95% confidence level and find the margin of error for the mean price of all textbooks in the bookstore.
Okay.
Now, this is a very straightforward question; if you want, you can read the question again. All you have to do is substitute the values into the equation. Alright, so guys, we know the formula for the margin of error: you take the z-score from the table, after that we have the standard deviation, which is 23.44, and n stands for the number of samples; here the number of samples is 32, basically 32 textbooks. So approximately, your margin of error is going to be around 8.12. This is a pretty simple question.
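Here is the same calculation as a quick R sketch; the z-score for a 95% confidence level comes straight from the standard normal distribution:

zc <- qnorm(0.975)   # about 1.96; 95% confidence leaves 2.5% in each tail
s  <- 23.44          # sample standard deviation from the question
n  <- 32             # number of textbooks
zc * s / sqrt(n)     # margin of error, about 8.12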
Alright, I hope all of you understood this. Now that you know the idea behind confidence intervals, let's move ahead to one of the most important topics in statistical inference, which is hypothesis testing, right?
So Sigelei statisticians
use hypothesis testing
to formally check
whether the hypothesis
is accepted or rejected.
Okay, hypothesis.
Testing is an inferential
statistical technique
used to determine
whether there is enough evidence
in a data sample to infer
that a certain condition holds
true for an entire population.
So to understand
the characteristics
of a general population,
we take a random sample,
and we analyze the properties
of the sample, right? We test whether or not the identified conclusion represents the population accurately, and finally we interpret the results. Now,
whether or not to accept
the hypothesis depends
upon the percentage value
that we get from the hypothesis.
Okay, so to
better understand this,
let's look at a small
example before that.
There are a few steps that are followed in hypothesis testing. You begin by stating the null and the alternative hypothesis. All right, I'll tell you what exactly these terms are. Then you formulate an analysis plan. Right after that
you analyze the sample data
and finally you can
interpret the results
right now to understand
the entire hypothesis testing.
We look at a good example.
Okay, now consider four boys: Nick, John, Bob, and Harry. These boys
were caught bunking a class
and they were asked
to stay back at school
and clean the classroom
as a punishment, right?
So what John did is he decided that the four of them would take turns to clean the classroom. He came up with a plan of writing each of their names on chits and putting them in a bowl. Now, every day they had to pick a name from the bowl, and that person had to clean the class, right?
That sounds fair enough. Now, it has been three days and everybody's name has come up except John's. Assuming that this event is completely random and free of bias, what is the probability of John not cheating? Or what is the probability that he's not actually rigging the draw? This can be solved by using hypothesis testing.
Okay.
So we'll Begin by calculating
the probability of John
not being picked for a day.
Alright, so we're
going to assume
that the event is free of bias.
So we need to find
out the probability
of John not cheating right
first we'll find the probability
that John is not picked
for a day, right?
We get 3 out of 4, which is basically 75%. Now, 75% is fairly high, so if John is not picked for three days in a row, the probability drops down to approximately 42%, since 0.75 cubed is about 0.42. Okay?
Now, let's consider a situation
where John is not picked
for 12 days in a row. The probability drops down to about three point two percent, since 0.75 to the power of 12 is roughly 0.032. Okay, so the probability of John cheating becomes fairly high, right?
So in order
for statisticians to come
to a conclusion,
they Define what is known
as the threshold value.
Right considering
the above situation
if the threshold value
is set to 5 percent.
It would indicate
that if the probability lies
below 5% then John is cheating
his way out of detention.
But if the probability is above the threshold value, then John is just lucky and his name isn't getting picked.
So the probability
and hypothesis testing give rise
to two important components
of hypothesis testing,
which is null hypothesis
and alternative hypothesis.
The null hypothesis is basically when you approve the assumption; the alternate hypothesis is when your result disproves the assumption. Right? Therefore, in our example, if the probability of an event occurring is less than 5%, which it is, then the event is biased. Hence, it proves the alternate hypothesis.
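If you want to verify these numbers yourself, here's a small sketch in Python that reproduces the probabilities from the example and applies the 5% threshold; the numbers are exactly the ones worked out above.

```python
# Probability that John is not picked on any single day: 3 names out of 4.
p_not_picked = 3 / 4

p_3_days = p_not_picked ** 3    # ~0.42 -> plausibly just luck
p_12_days = p_not_picked ** 12  # ~0.032 -> suspiciously unlikely

threshold = 0.05  # the threshold value chosen by the statistician
print(f"3 days in a row:  {p_3_days:.2f}")
print(f"12 days in a row: {p_12_days:.3f}")

# Below the threshold -> the event is biased -> alternate hypothesis holds.
if p_12_days < threshold:
    print("John is cheating his way out of detention.")
else:
    print("John is just lucky.")
```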
Undoubtedly machine learning is
the most in-demand technology
in today's market.
Its applications range from self-driving cars to predicting deadly diseases such as ALS. The high demand
for machine learning skills
is the motivation
behind today's session.
So let me discuss
the agenda with you first.
Now, we're going
to begin the session
by understanding the need
for machine learning and why
it is important after that.
We look at what exactly
machine learning is
and then we'll discuss a couple
of machine learning definitions.
Once we're done with that.
We'll look at the
machine learning process
and how you can solve
a problem by using
the machine learning process
next we will discuss the types
of machine learning
which includes
supervised unsupervised
and reinforcement learning.
Once we're done with that.
We'll discuss the different
types of problems
that can be solved by
using machine learning.
Finally.
We will end this session
by looking at a demo
where we'll see how you
can perform weather forecasting
by using machine learning.
All right, so guys,
let's get started
with our first topic.
So what is the importance
or what is the need
for machine learning now?
Since the technical Revolution,
we've been generating
an immeasurable amount
of data. As per research, we're generating around
2.5 quintillion bytes
of data every single day
and it is estimated
that by 2020 1.7 MB of data
will be created every second
for every person on earth.
Now that is a lot
of data right now.
This data comes
from sources such as the cloud, IoT devices, social media, and all of that. Since all of us are very invested in the internet right now, we're generating a lot of data.
All right, you have no idea
how much data we generate
through social media
all the chatting
that we do and all the images
that we post
on Instagram the videos
that we watch all of this
generates a lot of data.
Now how does machine
learning fit into all of this
since we're producing
this much data,
we need to find a method
that can analyze, process, and interpret this much data. All right, we need to find a method that can make sense out of this data.
And that method
is machine learning.
Now, a lot of top-tier, data-driven companies such as Netflix and Amazon build machine learning models by using tons of data in order to identify profitable opportunities, and if they want to avoid any unwanted risk, they make use of machine learning.
Alright, so through machine learning you can predict risk, you can predict profits, and you can identify opportunities, which will help you grow your business. So now I'll show you a couple of examples of where machine learning is used. All right, so I'm sure all of you have binge-watched on Netflix.
Now the most important thing
about Netflix is
its recommendation engine.
All right.
Most of Netflix's Revenue comes
from its recommendation engine.
So the recommendation engine
basically studies the movie
viewing patterns of its users
and then recommends
relevant movies to them.
All right, it recommends movies
depending on users interests.
Depending on the type
of movies the user
watches and all of that.
Alright, so that is
how Netflix uses
machine learning.
Next.
We have Facebook's
Auto tagging feature.
Now the logic behind Facebook's
Auto tagging feature
is machine learning
and neural networks.
I'm not sure how many
of you know this, but Facebook makes use of the DeepFace face verification system, which is based on machine learning and neural networks. So DeepFace basically studies the facial features in an image and it tags your friends and family.
Another such example is
Amazon's Alexa now Alexa
is basically an advanced
level virtual assistant
that is based
on natural language processing
and machine learning.
Now, it can do more
than just play music for you.
All right, it can book your Uber, it can connect with other IoT devices at your house, it can track your health, it can order food online, and all of that. So data and machine learning are basically the main factors behind Alexa's power.
another such example is
the Google spam filter.
So guys Gmail basically
makes use of machine learning
to filter out spam messages.
If any of you just
open your Gmail inbox,
you'll see that there
are separate sections.
There's one for primary, there's social, the spam, and the general mail. Now, basically Gmail makes use of machine learning algorithms and natural language processing to analyze emails in real time and then classify them as either spam or non-spam. Now,
this is another famous
application of machine learning.
So to sum this up,
let's look at a few reasons.
Why machine learning
is so important.
So the first reason
is obviously increase
in data generation.
So because of excessive
production of data,
we need a method that can be used to structure, analyze, and draw useful insights from data. This is where machine learning comes in; it uses data
to solve problems
and find solutions
to the most complex tasks
faced by organizations.
Another important reason is that
it improves decision-making.
So by making use of various
algorithms machine learning
can be used to make
Better Business decisions.
For example machine learning
is used to forecast sales.
It is used to predict any
downfalls in the stock market.
It is used to identify risks, anomalies, and so on. Now, the next reason is that it uncovers patterns and trends in data. Finding hidden patterns and extracting key insights from data is the most essential part of machine learning.
So by building predictive models
and using statistical
techniques machine learning
allows you to dig
beneath the surface
and explore the data
at a minute scale. Now, understanding data and extracting patterns manually would take many days, but if you do this through machine learning algorithms, you can perform such computations in less than a second.
Another reason is that it solves complex problems. From detecting genes that are linked to the deadly ALS disease, to building self-driving cars and face detection systems, machine learning can be used to solve the most complex problems.
So guys now that you know,
why machine learning
is so important.
Let's look at what exactly
machine learning is.
The term machine learning
was first coined by
Arthur Samuel in the year
1959. Now, looking back, that year was probably the most significant in terms of technological advancements. If you browse through the net about
what is machine learning
you'll get at least
a hundred different definitions.
Now, the first and very formal definition was given by Tom M. Mitchell. The definition says that a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. All right.
Now I know this is
a little confusing.
So let's break it down
into simple words.
Now in simple terms
machine learning is a subset
of artificial intelligence
which provides machines the
ability to learn automatically
and improve from experience
without being explicitly
programmed to do
so in the sense.
It is the practice of getting
machines to solve problems
by gaining the ability
to think but wait now
how can a machine think
or make decisions?
Well, if you feed a machine a good amount of data,
it will learn
how to interpret process
and analyze this data by using
machine learning algorithm.
Okay.
Now guys, look
at this figure on top.
Now this figure basically shows
how a machine learning algorithm
or how the machine learning
process really works.
So the machine learning Begins
by feeding the machine lots
and lots of data okay
by using this data.
The machine is trained to detect
hidden insights and Trends.
Now these insights
are then used to build
a machine learning model
by using an algorithm
in order to solve a problem.
Okay.
So basically you're
going to feed a lot
of data to the machine.
The machine is going to get
trained by using this data.
It's going to use this data
and it's going to
draw useful insights
and patterns from it,
and then it's going
to build a model by Using
machine learning algorithms.
Now this model will help
you predict the outcome
or help you solve
any complex problem
or any business problem.
So that's a simple explanation
of how machine learning works.
Now, let's move on and look
at some of the most commonly
used machine learning terms.
So first of all,
we have algorithm.
Now, this is
quite self-explanatory.
Basically algorithm
is a set of rules
or statistical techniques,
which are used to learn
patterns from data now
an algorithm is The logic
behind a machine learning model.
All right, an example
of a machine learning
algorithm is linear regression.
I'm not sure how many of you
have heard of linear regression.
It's the most simple and basic
machine learning algorithm.
All right.
Next we have model now
model is the main component
of machine learning.
All right.
So model will basically map
the input to your output
by using the machine learning
algorithm and by using the data
that you're feeding the machine.
So basically the model is
a representation of the entire
machine learning process.
So the model is
basically fed input
which has a lot of data
and then it will output
a particular result
or a particular outcome by using
machine learning algorithms.
Next we have something
known as predictor variable.
Now predictor variable
is a feature of the data
that can be used
to predict the output.
So for example, let's say
that you're trying to predict
the weight of a person depending
on the person's height
and their age.
All right.
So over here the predictor
variables are your height
and your age
because you're using
height and age of a person
to predict the person's weight.
Alright, so the height and the age are the predictor variables. Now, weight, on the other hand, is the response or the target variable.
So response variable is
a feature or the output variable
that needs to be predicted by
using the predictor variables.
All right,
after that we have something
known as training data.
So guys the data
that is fed to a machine
learning model is always split
into two parts first.
We have the training data
and then we have
the testing data now training
data is basically used to build
the machine learning model.
So usually training data is much larger than the testing data
because obviously
if you're trying to train
the machine then you're going
to feed it a lot more data.
Testing data is just used
to validate and evaluate
the efficiency of the model.
Alright, so that was training
data and testing data.
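As a quick illustration of that split, here's a minimal sketch using scikit-learn's train_test_split; the data here is made up purely for demonstration, and the 80/20 ratio is just the common convention.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 5 predictor variables, binary response.
X = np.random.rand(100, 5)             # predictor variables
y = np.random.randint(0, 2, size=100)  # response (target) variable

# Keep 80% for training the model and 20% for testing/validating it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (80, 5) (20, 5)
```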
So Guys, these were a few terms
that I thought you should know
before we move any further.
Okay.
Now, let's move on and discuss
the machine learning process.
Now, this is going
to get very interesting
because I'm going
to give you an example
and make you understand
how the machine learning process works. So first of all,
let's define
the different stages
or the different steps involved
in the machine learning process.
So machine learning
process always begins
with defining the objective
or defining the problem
that you're trying to solve
next is data gathering
or data collection.
Now the data that you
need to solve this problem
is collected at this stage.
This is followed
by data preparation
or data processing after that.
You have data exploration and analysis, and the next stage is building
a machine learning model.
This is followed
by model evaluation.
And finally you have
prediction or your output.
Now, let's try to understand
this entire process
with an example.
So our problem statement here
is to predict the possibility
of rain by studying
the weather conditions.
So let's say
that you're given
a problem statement
and you're asked to use
a machine learning process
to solve this problem statement.
So let's get started.
Alright, so the first step
is to define the objective
of the problem statement.
Our objective here is
to predict the possibility
of rain by studying
the weather conditions.
Now in the first stage
of a machine learning process.
You must understand
what exactly needs
to be predicted.
Now in our case the objective
is to predict the possibility
of rain by studying
weather conditions, right?
So at this stage,
it is also essential to take
mental notes on what kind
of data can be used
to solve this problem
or the type of approach
that you can follow to get to the solution.
All right, a few questions
that are worth asking
during this stage is
what are we trying to predict?
What are the Target features
or what are
the predictor variables?
What kind of input
data do we need?
And what kind
of problem are we facing?
Is it a binary classification
problem or is it
a clustering problem
now, don't worry.
If you don't know
what classification
and clustering is
I'll be explaining this
in the upcoming slides.
So guys this was the first step
of a machine learning process,
which is defining the objective of the problem.
All right.
Now, let's move on and look
at step number two.
So step number two is
basically data collection
or data Gathering
now at this stage.
You must be asking questions
such as what kind of data
is needed to solve the problem
is the data available and
if it is available,
how can I get the data?
Okay.
So once you know the type
of data that is required,
you must understand
how you can derive
this data data collection
can be done manually
or by web scraping,
but if you're a beginner and you're just looking to learn
machine learning you don't have
to worry about getting the data.
OK there are thousands
of data resources on the web.
You can just go ahead
and download the datasets
from websites such as kaggle.
Okay, now coming
back to the problem
at hand the data needed
for weather forecasting includes
measures such as humidity level, temperature, pressure, locality, whether or not you live in a hill station, and so on. So guys,
such data must be collected
and stored for analysis.
Now the next stage
in machine learning
is preparing your data
the data you collected is almost
never in the right format.
So basically you'll encounter
a lot of inconsistencies
in the data set.
Okay, this includes
missing values redundant
variables duplicate values
and so on removing
such values is very important
because they might lead
to wrongful computations
and predictions.
So that's why, at this stage, you must scan the entire data set for any inconsistencies.
You have to fix them
at this stage.
Now.
The next step is
exploratory data analysis.
Now data analysis is
all about diving deep
into data and finding all
the hidden data Mysteries.
Okay.
This is where you
become a detective.
So EDA, or exploratory data analysis, is like the brainstorming stage of machine learning. Data exploration involves
understanding the patterns
and the trends in your data.
So at this stage all
the useful insights are drawn
and all the correlations and patterns between the variables are understood.
So you might ask what sort
of correlations are
you talking about?
For example in the case
of predicting rain fall.
We know that there is
a strong possibility of rain
if the temperature
has fallen low.
Okay.
So such correlations
have to be understood
and mapped at this stage.
Now.
This stage is followed
by stage number 5,
which is building
a machine learning model.
So all the insights
and the patterns
that you derive
during data exploration are used to build the machine learning model.
So this stage always Begins
by splitting the data set
into two parts training data
and the testing data.
So earlier in the session.
I already told you what training
and testing data is
now the training data
will be used to build
and analyze the model
and the logic of the model
will be based on the machine
learning algorithm
that is being implemented.
Okay.
Now in the case
of predicting rainfall
since the output will be
in the form of true
or false, we can use a classification algorithm like logistic regression. Now, choosing
the right algorithm depends
on the type of problem.
You're trying to solve
the data set you have
and the level of complexity
of the problem.
So in the upcoming sections will
be discussing different types
of problems that can be solved
by using machine learning.
So don't worry.
If you don't know
what a classification algorithm is and what logistic regression is.
Okay.
So all you need to know
is at this stage,
you'll be building
a machine learning model
by using machine
learning algorithm
and by using the training
data set. The next step in the machine learning process is model evaluation
and optimization.
So after building a model
by using the training data set
it is finally time to put
the model to a test.
Okay.
So the testing data set
is used to check the efficiency
of the model and how accurately
it can predict the outcome.
So once you calculate
the accuracy any improvements
in the model have
to be implemented in this stage.
Okay, so methods like parameter
tuning and cross-validation
can be used to improve the performance of the model. This is followed
by the last stage,
which is predictions.
So once the model is evaluated
and improved it is finally
used to make predictions.
The final output can be
a categorical variable
or it can be a continuous
quantity in our case
for predicting the occurrence
of rainfall the output
will be a categorical variable
in the sense.
Our output will be
in the form of true or false.
Yes or no.
Yes basically represents that it is going to rain, and no will represent that it won't rain. Okay, as simple as that,
so guys that was the entire
machine learning process.
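Just to tie the stages together, here's a toy end-to-end sketch in Python. The weather data here is randomly generated and the "high humidity means rain" rule is an assumption made up for the demo, so treat it as an illustration of the process, not a real forecaster.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stages 2-3: gather and prepare data (here: 200 fake days of
# humidity, temperature, and pressure, scaled to 0..1).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] > 0.6).astype(int)  # toy rule: high humidity -> rain (1)

# Stage 5: split into training and testing data, then build the model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression().fit(X_train, y_train)

# Stage 6: evaluate on the testing data.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Stage 7: predict -- 1 means rain (yes), 0 means no rain.
print("will it rain?", model.predict([[0.9, 0.4, 0.5]]))
```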
Linear regression is one of the easiest algorithms in machine learning. It is a statistical model that attempts to show the relationship between two variables with a linear equation.
but before we drill down
to linear regression
algorithm in depth,
I'll give you a quick overview
of today's agenda.
So we'll start a session
with a quick overview
of what is regression
as linear regression
is one of a type
of regression algorithm.
Once we learn about regression,
its use case the various
types of it next.
We'll learn about the algorithm from scratch, where we'll dive into its mathematical implementation first,
then we'll drill down
to the coding part
and Implement linear
regression using python
In today's session we'll deal with the linear regression algorithm using the least square method, check its goodness of fit, or how close the data is to the fitted regression line, using the R-square method, and then finally we'll optimize it using the gradient descent method. In the last part, the coding session, I'll teach you to implement linear regression using Python, and the coding session would be divided into two parts:
the first part would consist
of linear regression
using python from scratch
where you will use
the mathematical algorithm
that you have learned
in this session.
And in the next part
of the coding session
will be using scikit-learn
for direct implementation
of linear regression.
All right.
I hope the agenda is clear
to you guys are like
so let's begin our session
with what is regression.
Well regression analysis
is a form of predictive
modeling technique
which investigates
the relationship between
a dependent and an independent variable. A regression analysis
involves graphing a line
over a set of data points
that most closely fits
the overall shape of the data
or regression shows the changes
in a dependent variable
on the y-axis
to the changes
in the explanatory variable
on the x-axis fine.
Now you would ask
what are the uses of regression?
Well, there are three major uses of regression analysis, the first being determining the strength of predictors. The regression might be used to identify the strength of the effect that the independent variables have on the dependent variable.
For example, you can ask a question like: what is the strength of the relationship between sales
and marketing spending or what
is the relationship between age
and income second is forecasting
an effect in this the regression
can be used to forecast effects
or impact of changes.
That is the regression analysis
help us to understand
how much the dependent variable
changes with the change
in one or more
independent variable fine.
For example, you can ask a question like: how much additional sales income will I get for each thousand dollars spent on marketing? Third is trend forecasting.
in this the regression analysis
to predict Trends
and future values.
The regression analysis
can be used to get
point estimates. In this you can ask questions like: what will be the price of Bitcoin in the next six months, right?
So next topic is linear versus
logistic regression by now.
I hope that you know,
what a regression is.
So let's move on
and understand its type.
So there are various kinds
of regression, like linear regression, logistic regression, polynomial regression, and others.
All right, but for this session
will be focusing on linear
and logistic regression.
So let's move on and let me tell
you what is linear regression.
And what is logistic regression
then what we'll do
we'll compare both of them.
All right.
So starting with
linear regression
in simple linear regression.
We are interested in things like y = mx + c. So what we are trying to find is the correlation between the x and y variables. This means that every value of x has a corresponding value of y if it is continuous. All right. However,
in logistic regression we
are not fitting our data
to a straight line
like linear regression instead
what we are doing.
We are mapping Y versus X
to a sigmoid function
in logistic regression.
What we find out is whether y is 1 or 0 for this particular value of x. Thus we are essentially deciding a true or false value for a given value of x. Fine.
So as a core concept
of linear regression You can say
that the data is modeled
using a straight line
where in the case
of logistic regression, the data is modeled using a sigmoid function.
The linear regression is used
with continuous variables
on the other hand
the logistic regression.
It is used with categorical
variable the output
or the prediction
of a linear regression
is the value of the variable
on the other hand
the output or prediction
of a logistic regression
is the probability
of occurrence of the event.
Now, how will you
check the accuracy
and goodness of fit in case
of linear regression?
We have various methods like measures of loss, R-squared, adjusted R-squared, etc.,
while in the case
of logistic regression you
have accuracy precision
recall F1 score,
which is nothing but
the harmonic mean of precision
and recall next is Roc curve
for determining the probability
threshold for classification
or the confusion Matrix Etc.
There are many all right.
So summarizing the difference
between linear and
logistic regression.
You can say that the type
of function you are mapping
to is the main point
of difference between linear and logistic regression. A linear regression maps a continuous x to a continuous y; on the other hand, a logistic regression maps a continuous x to a binary y. So we can use logistic regression to make categorical or true/false decisions from the data. Fine, so let's move on ahead.
Next is linear
regression selection criteria,
or you can say when will
you use linear regression?
So the first is classification and regression capabilities. Regression models predict a continuous variable, such as the sales made on a day or the temperature of a city. Their reliance on a polynomial like a straight line to fit a data set poses a real challenge when it comes to building a classification capability.
Let's imagine that you fit
a line with the training points
that you have now imagine you
add some more data points to it.
But in order to fit it,
what do you have to do?
You have to change
your existing model
that is maybe you have
to change the threshold itself.
So this will happen
with each new data point you add
to the model, hence.
The linear regression is
not good for classification.
All right, fine.
Next is data quality
each missing value removes
one data point that could
optimize the regression
in simple linear regression.
The outliers can significantly
disrupt the outcome
just for now.
You can know that if you
remove the outliers your model
will become very good.
All right.
So this is about data quality.
Next is computational complexity. A linear regression is often not computationally expensive as compared to the decision tree or the clustering algorithm. The order of complexity for N training examples and X features usually falls in either O(X²) or O(XN). Next is comprehensibility and transparency. Linear regressions are easily comprehensible and transparent in nature.
They can be represented by
a simple mathematical notation
to anyone and can be
understood very easily.
So these are some
of the criteria based
on which you will select
the linear regression algorithm.
All right.
Next is where is linear
regression used first
is evaluating Trends
and sales estimate.
Well linear regression
can be used in Business
to evaluate Trends
and make estimates
or focused for example,
if a company sales have
increased steadily every month
for past few years then
conducting a linear analysis
on the sales data
with monthly sales on the y axis
and time on the x axis.
This will give you a line that predicts the upward trend in sales. After creating the trendline, the company could use the slope of the line to forecast sales in future months.
Next is analyzing the impact of price changes. Well, linear regression can be used to analyze the effect of pricing on consumer behavior.
For instance.
If a company changes
the price on a certain
product several times,
then it can record the quantity sold for each price level
and then perform
a linear regression
with sold quantity as
a dependent variable and price
as the independent variable.
This would result in a line
that depicts the extent
to which the customer reduce
their consumption of the product
as the prices increasing.
So this result would help us
in future pricing decisions.
Next is assessment of risk in the financial services and insurance domain. Well, linear regression can be used to analyze risk. For example, a health insurance company might conduct a linear regression analysis by plotting the number of claims per customer against their age, and they might discover that older customers tend to make more health insurance claims.
Well the result
of such analysis might guide
important business decisions.
All right, so by now you have a rough idea of what the linear regression algorithm is like: what it does, where it is used, and when you should use it. Now, let's move on and understand the algorithm in depth.
So suppose you have independent
variable on the x-axis
and dependent variable
on the y-axis.
All right, suppose these are the data points. The independent variable is increasing on the x-axis, and so does the dependent variable on the y-axis.
So what kind of linear
regression line you would get
you would get a positive
linear regression line.
All right as the slope
would be positive.
Next is suppose.
You have an independent
variable on the x-axis
which is increasing
and on the other hand the
dependent variable on the y-axis
that is decreasing.
So what kind of line
will you get in that case?
You will get
a negative regression line.
In this case as the slope
of the line is negative.
And this particular line, y = mx + c, which shows the relationship between the independent and dependent variables, is known as the line of linear regression.
Okay?
So let's add some data
points to our graph.
So these are some observation
or data points on our graphs.
Let's plot some more.
Okay.
Now all our data points
are plotted now our task is
to create a regression line
or the best fit line.
All right, now once our regression line is drawn, it's time for prediction. Now suppose
This is our estimated value
or the predicted value
and this is our actual value.
Okay.
So what we have to do our main
goal is to reduce this error.
That is to reduce the distance
between the estimated
or the predicted value
and the actual value.
The best fit line would be the
one which had the least error
or the least difference
in estimated value
and the actual value.
All right, in other words, we have to minimize the error.
This was a brief
understanding of linear
regression algorithm. Soon we'll jump towards its mathematical implementation.
All right, but before that, let me tell you this: suppose you draw a graph with speed on the x-axis and distance covered on the y-axis, with the time remaining constant. If you plot a graph between the speed of the vehicle
and the distance traveled
in a fixed unit of time,
then you will get
a positive relationship.
All right.
So suppose the equation
of line as y equal MX plus C.
Then in this case Y is
the distance traveled
in a fixed duration of time x
is the speed of vehicle m
is the positive slope
of the line and see is
the y-intercept of the line.
All right, suppose, the distance remaining constant, you have to plot a graph between the speed of the vehicle and the time taken to travel a fixed distance. Then
in that case you will get a line
with a negative relationship.
All right, the slope of the line
is negative here the equation
of line changes to y
equal minus of MX plus C
where Y is the time
taken to travel
a fixed distance X is the speed
of vehicle m is
the negative slope
of the line and see is
the y-intercept of the line.
All right.
Now, let's get back
to our independent
and dependent variable.
So in those terms, y is our dependent variable and x is our independent variable.
Now, let's move on and see
the mathematical implementation
of the things.
Alright, so we have x equal to 1, 2, 3, 4, 5; let's plot them on the x-axis. And we have y as 3, 4, 2, 4, 5, so let's plot 1 to 5 on the y-axis. Now, let's plot our coordinates one by one. For x equal 1 we have y equal 3, so this is the point (1, 3). Similarly, we have (1, 3), (2, 4), (3, 2), (4, 4), and (5, 5). All right.
So moving on ahead.
Let's calculate the mean of X
and Y and plot it on the graph.
All right, so mean of X is 1
plus 2 plus 3 plus 4
plus 5 divided by 5.
That is 3.
All right, similarly, the mean of y is 3 plus 4 plus 2 plus 4 plus 5, that is 18, divided by 5, which is nothing but 3.6. All right, so next what we'll do is plot our mean, that is (3, 3.6), on the graph.
Okay.
So there's the point (3, 3.6). See, our goal is to find or predict the best fit line using the least square method. All right, so in order to find that, we first need to find the equation of the line, so let's find the equation of our regression line. So let's suppose this is our regression line, y = mx + c. Now we have an equation of the line, so all we need to do is find the values of m and c, where m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)². Don't get confused.
Let me resolve it for you.
All right.
So moving on ahead, as a part of the formula, what we are going to do is calculate x − x̄. So we have x as 1 and x̄ as 3, so 1 minus 3 is −2. Next we have x equal to 2 minus its mean 3, that is −1. Similarly, we have 3 minus 3 is 0, 4 minus 3 is 1, and 5 minus 3 is 2. All right, so x − x̄ is nothing but the distance of all the points from the line x = 3, and what does this y − ȳ imply? It implies the distance of all the points from the line y = 3.6. Fine.
So let's calculate the value
of y minus y bar.
Starting with y equal 3 minus the value of ȳ, that is 3.6: 3 minus 3.6 is −0.6. Next is 4 minus 3.6, that is 0.4. Next, 2 minus 3.6, that is −1.6. Next is 4 minus 3.6, that is 0.4 again, and 5 minus 3.6, that is 1.4.
Alright, so now we are done with y − ȳ. Fine, now next we will calculate (x − x̄)². So (−2) squared is 4, (−1) squared is 1, 0 squared is 0, 1 squared is 1, and 2 squared is 4. Fine. So now in our table we have x − x̄, y − ȳ, and (x − x̄)².
Now what we need is the product (x − x̄)(y − ȳ). Alright, so let's see: (−2) × (−0.6), that is 1.2; (−1) × 0.4, that is −0.4; 0 × (−1.6), that is 0; 1 × 0.4, that is 0.4; and next, 2 × 1.4, that is 2.8.
All right.
Now almost all the parts
of our formula is done.
So now what we need
to do is get the summation
of last two columns.
All right, so the summation of (x − x̄)² is 10 and the summation of (x − x̄)(y − ȳ) is 4, so the value of m will be equal to 4/10, which is 0.4. Fine. So let's put this value of m = 0.4 into our line y = mx + c.
So let's fit the mean point into the equation and find the value of c. We have y as 3.6 (remember, the mean of y), m as 0.4, which we calculated just now, and x as the mean value of x, that is 3. So we have 3.6 = 0.4 × 3 + c. Alright, that is 3.6 = 1.2 + c. So what is the value of c? That is 3.6 minus 1.2, that is 2.4. All right, so we had m equal to 0.4 and c as 2.4, and finally, when we calculate the equation of the regression line, what we get is y = 0.4x + 2.4.
So that is the regression line. And that's how you plot your points; these are your actual points. All right.
Now, for the given m = 0.4 and c = 2.4, let's predict the value of y for x equal to 1, 2, 3, 4, and 5. So when x equals 1, the predicted value of y will be 0.4 × 1 + 2.4, that is 2.8. Similarly, when x equals 2, the predicted value of y will be 0.4 × 2 + 2.4, that equals 3.2. Similarly, for x equal 3, y will be 3.6; for x equal 4, y will be 4.0; and for x equal 5, y will be 4.4.
So let's plot them on the graph; the line passing through all these predicted points and cutting the y-axis at 2.4 is the line of regression.
Now your task is to calculate
the distance between the actual
and the predicted value
and your job is
to reduce the distance.
All right, or in other words,
you have to reduce the error
between the actual and the predicted values.
The line with the least
error will be the line
of linear regression
or regression line and it
will also be the best fit line.
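If you'd like to check the hand calculation above, here's a small Python sketch that redoes the least square method on the same five points; it prints the same m = 0.4, c = 2.4, and predicted values.

```python
# The same toy data used in the worked example above.
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 2, 4, 5]

x_bar = sum(xs) / len(xs)  # 3.0
y_bar = sum(ys) / len(ys)  # 3.6

# m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 4.0
den = sum((x - x_bar) ** 2 for x in xs)                       # 10.0
m = num / den              # 0.4
c = y_bar - m * x_bar      # 2.4

print(f"y = {m}x + {c}")
print([round(m * x + c, 1) for x in xs])  # [2.8, 3.2, 3.6, 4.0, 4.4]
```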
Alright, so this is how things work in a computer. What it does is perform a number of iterations: for different values of m, it will calculate the equation of the line y = mx + c, right? So as the value of m changes, the line changes. The iteration will start from one, and it will perform a number of iterations. So after every iteration,
what it will do it will
calculate the predicted value
according to the line
and compare the distance
of actual value
to the predicted value
and the value of M
for which the distance
between the actual
and the predicted value is
minimum will be selected
as the best fit line.
All right.
Now that we have calculated
the best fit line now,
it's time to check the goodness
of fit or to check
how good a model is performing.
So in order to do that,
we have a method
called R square method.
So what is this R square?
Well r-squared value is
a statistical measure of
how close the data are
to the fitted regression
line in general.
It is considered
that a high r-squared
value model is a good model,
but you can also have a low R-squared value for a good model, or a high R-squared value for a model that does not fit at all.
All right.
It is also known as
coefficient of determination
or the coefficient
of multiple determination.
Let's move on and see
how a square is calculated.
So these are our actual values
plotted on the graph.
We had calculated
the predicted values
of y as 2.8, 3.2, 3.6, 4.0, 4.4. Remember, we calculated the predicted values of y from the equation y-predicted = 0.4x + 2.4 for every x equal to 1, 2, 3, 4, and 5; from there we got the predicted values of y.
All right.
So let's plot it on the graph.
So these are point
and the line passing
through these points are nothing
but the regression line.
All right.
Now, what you need to do is check and compare the distance of actual minus mean versus the distance of predicted minus mean. Alright, so basically what you are doing is calculating the distance of the actual values to the mean and the distance of the predicted values to the mean. That is nothing but R-squared. Mathematically, you can represent R-squared as the summation of (y-predicted − ȳ)² divided by the summation of (y − ȳ)², where y is the actual value, yp is the predicted value, and ȳ is the mean value of y, which is nothing but 3.6.
Remember, this is our formula.
So next what we'll do is calculate y − ȳ. We have y as 3 and ȳ as 3.6, so we calculate 3 minus 3.6, which is −0.6. Similarly, for y equal 4 and ȳ equal 3.6, we have y − ȳ as 0.4; then 2 minus 3.6, that is −1.6; 4 minus 3.6, again 0.4; and 5 minus 3.6, that is 1.4.
So we got the values of y − ȳ. Now what we have to do is square them. So we have (−0.6)² as 0.36, 0.4² as 0.16, (−1.6)² as 2.56, 0.4² as 0.16, and 1.4² as 1.96. Now, as a part of the formula, we need the yp − ȳ values. So these are the yp values, and we have to subtract the mean from them, right?
So 2.8 minus 3.6, that is −0.8. Similarly, we get 3.2 minus 3.6, that is −0.4; 3.6 minus 3.6, that is 0; 4.0 minus 3.6, that is 0.4; then 4.4 minus 3.6, that is 0.8.
So we calculated the values of yp − ȳ; now it's our turn to calculate (yp − ȳ)². We have (−0.8)² as 0.64, (−0.4)² as 0.16, 0² as 0, 0.4² as again 0.16, and 0.8² as 0.64.
All right.
Now, as a part of the formula, it suggests we take the summation of (yp − ȳ)² and the summation of (y − ȳ)². All right, let's see. On summing (y − ȳ)², what you get is 5.2, and on summing (yp − ȳ)², you get 1.6. So the value of R-squared can be calculated as 1.6 upon 5.2. Fine, so the result we get is approximately equal to 0.3.
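Here's the same R-squared arithmetic as a short Python sketch, using the actual and predicted values from the example above.

```python
y_actual    = [3, 4, 2, 4, 5]
y_predicted = [2.8, 3.2, 3.6, 4.0, 4.4]
y_bar = sum(y_actual) / len(y_actual)  # 3.6

ss_pred  = sum((yp - y_bar) ** 2 for yp in y_predicted)  # 1.6
ss_total = sum((y - y_bar) ** 2 for y in y_actual)       # 5.2

print(round(ss_pred / ss_total, 2))  # ~0.31, the R-squared value
```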
Well, this is not a good fit.
All right, so it suggests
that the data points are far
away from the regression line.
Alright, so this is
how your graph will look
like when R square is 0.3
when you increase the value
of R square to 0.7.
So you'll see that the actual values lie closer to the regression line. When it reaches 0.9, they come even closer, and when the value is approximately equal to 1, the actual values lie on the regression line itself, for example, in this case.
If you get a very low value
of R square suppose 0.02.
So in that case what you'll see
that the actual values are
very far away from
the regression line,
or you can say
that there are too
many outliers in your data.
You cannot conclude anything from the data.
All right.
So this was all about
the calculation of R square now,
you might get a question like: are low values of R-squared always bad? Well, in some fields it is entirely expected that the R-squared value will be low. For example, any field that attempts to predict human behavior, such as psychology, typically has R-squared values lower than around 50%, through which you can conclude that humans are simply harder to predict than physical processes. Furthermore,
if your R-squared value is low but you have statistically significant predictors, then you can still draw important conclusions about how changes in the predictor values are associated with the changes in the response value. Regardless of the R-squared, the significant coefficients still represent the mean change in the response for one unit of change in the predictor while holding other predictors in the model constant;
obviously this type
of information can be
extremely valuable.
All right.
So this was all about
the theoretical concept now,
let's move on to the coding
part and understand
the code in depth.
So for implementing
linear regression using python,
I will be using Anaconda
with jupyter notebook
installed on it.
So there's a Jupyter notebook, and we are using Python 3. Alright, so we are going to use a data set consisting of the head sizes and brain weights of different people. All right, so let's import our libraries with %matplotlib inline: we are importing numpy as np, pandas as pd, and from matplotlib we are importing pyplot as plt.
Alright, next we will import our data, headbrain.csv, and store it in the data variable. Let's execute the Run button and see the output. This asterisk symbol symbolizes that it is still executing. So here's the output: our dataset consists of 237 rows and four columns. We have columns for gender, age range, head size in centimeters cubed, and brain weight in grams. Fine.
So there's our sample data set
that is how it looks it consists
of all these data set.
So now that we
have imported our data,
as you can see, there are 237 values in the training set, so we can find a linear relationship between the head size and the brain weight. So now what we'll do is collect X and Y: X will consist of the head size values and Y will consist of the brain weight values. So, collecting X and Y. Let's execute the Run button.
Done. Next, we need to find the values of b1 and b0, or you can say m and c. So we'll need the means of the X and Y values. First of all, we'll calculate the mean of X and Y: mean_x = np.mean(X). So mean is a predefined function of NumPy. Similarly, mean_y = np.mean(Y) will return the mean value of Y. Next we'll check the total number of values, so m = len(X). Alright, then we'll use the formula to calculate the values of b1 and b0, or m and c.
All right, let's execute
the Run button and see
what is the result.
So as you can see here on the screen, we have got b1 as 0.263 and b0 as 325.57. Alright, so now we have our coefficients. Comparing with the equation y = mx + c, you can say that brain weight = 0.263 × head size + 325.57, so the value of m here is 0.263 and the value of c here is 325.57.
All right, so there's
our linear model now,
let's plot it
and see graphically.
Let's execute it.
So this is how our plot looks. This model is not so bad, but we need to find out how good our model is. In order to find that, there are many methods, like the root mean square method and the coefficient of determination, or the R-square method. In this tutorial, I have told you about the R-square method.
So let's focus on that and see
how good our model is.
So let's calculate
the R square value.
All right, here ss_t is the total sum of squares, ss_r is the sum of squares of residuals, and the formula for R-squared is 1 minus the sum of squares of residuals upon the total sum of squares. Next, when you execute it, you will get the value of R-squared as 0.63, which is pretty good.
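For reference, here's a hedged reconstruction of the notebook steps just described. The file name headbrain.csv and the exact column labels are assumptions based on the dataset as described, so adjust them to match your copy of the data.

```python
import numpy as np
import pandas as pd

# Assumed file and column names for the head size / brain weight dataset.
data = pd.read_csv('headbrain.csv')
X = data['Head Size(cm^3)'].values
Y = data['Brain Weight(grams)'].values

# Least square method, exactly as in the toy example earlier.
mean_x, mean_y = np.mean(X), np.mean(Y)
b1 = np.sum((X - mean_x) * (Y - mean_y)) / np.sum((X - mean_x) ** 2)
b0 = mean_y - b1 * mean_x
print(b1, b0)  # ~0.263 and ~325.57, per the session

# R-squared = 1 - (sum of squared residuals / total sum of squares).
Y_pred = b0 + b1 * X
ss_r = np.sum((Y - Y_pred) ** 2)
ss_t = np.sum((Y - mean_y) ** 2)
print(1 - ss_r / ss_t)  # ~0.63, per the session
```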
Now that you have implemented
simple linear regression model
using least Square method,
let's move on and see
how will you implement the model
using machine learning library
called scikit-learn.
All right. So scikit-learn is a simple machine learning library in Python; building machine learning models is very easy using scikit-learn. So suppose there's a Python code: using the scikit-learn library, your code shortens to just a few lines. Let's execute the Run button, and you will get the same R-squared score.
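A sketch of that shortened scikit-learn version might look like this, reusing the X and Y arrays from the reconstruction above; LinearRegression's score method returns the same R-squared.

```python
from sklearn.linear_model import LinearRegression

# scikit-learn expects a 2-D feature matrix, so reshape the head sizes.
X_2d = X.reshape(-1, 1)

reg = LinearRegression().fit(X_2d, Y)
print(reg.score(X_2d, Y))  # same R-squared, ~0.63
```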
Well, this was all for today's discussion.
Most of the entities
in this world are
related in one way or another. At times, finding relationships between entities can help you take valuable business decisions. Today,
I'm going to talk
about logistic regression,
which is one
such approach towards
predicting relationships.
Now, let us see
what all we are going to cover
in today's training.
So we'll start off the session
by getting a quick introduction
to what is regression.
Then we'll see the different
types of regression
and we'll be discussing the what and why of logistic regression. So in this part, we'll discuss what exactly it is, where it is used, why it is used, and all those things. Moving ahead, we'll compare
linear regression
versus logistic regression
along with the various
real-life use cases
and finally towards the end.
I will be practically
implementing logistic
regression algorithm.
So let's quickly start off
with the very first topic
what is regression.
The regression analysis is
a predictive modeling technique.
So it always
involves predictions.
So in this session,
we'll just talk
about predictive analysis
and not prescriptive analysis.
Now why? Because for prescriptive analysis, you need to have a good base and a stronghold on the predictive part first. Now, it estimates the relationship
between the dependent variable
and an independent variable.
So for those of you
who are not aware
of these terminologies,
let me give you
a quick summary of it.
So dependent variable is
nothing but a variable
which you want to predict now,
let's say I want to know
what will be the sales
on 26th of this month.
So sales becomes
a dependent variable
or you can see
the target variable.
Now, this dependent variable or target variable is going to depend on a lot of factors: the number of products you sold till date, what the season is out there, the availability of the product, the product quality, and all these things.
So these are the never-ending factors, which are nothing but the different features that lead to sales. These variables are called independent variables, or you can say predictors. Now, if you look at the graph over here,
we have some values of X
and we have values of Y now
as you can see over here, if x increases, the value of y also increases. So let me explain this to you with an example.
Let's say we have data only until the value of x equal to 6.75, and somebody asks you: what will be the value of y when the value of x is 7? The way that you can do it, or how regression comes into the picture, is by fitting a straight line through all these points and getting the values of m and c. So this is a straight line, guys,
and the formula for the straight
line is y is equal to MX plus C.
So using this we can try to
predict the value of y so here
if you notice, the x variable can increase as much as it can, but the y variable will increase according to x. So y is basically dependent on your x variable.
So for any arbitrary value
of x You can predict the value
of y and this is always
done through regression.
So that is
how regression is useful.
Now regression is basically
classified into three types
your linear regression,
then your logistic regression
and polynomial regression.
So today we will be discussing
logistic regression.
So let's move forward
and understand the what and by
of logistic regression.
Now this algorithm
is most widely used
when the dependent variable, or you can say the output, is in a binary format.
So here you need
to predict the outcome
of a categorical
dependent variable.
So the outcome should be
always discrete or categorical in nature. Now, by discrete I mean the value should be binary, or you can say you just have two values: it can either be 0 or 1, either yes or no, either true or false, or high or low. So only these can be the outcomes, so the value which you need to predict should be discrete, or you can say categorical, in nature.
Whereas in linear regression, we have the value of y, or you can say the value you need to predict, within a range. That is the difference between linear regression and logistic regression.
You must be having a question: why not linear regression? Now guys, in linear regression the value of y, or the value which you need to predict, is in a range, but in our case, as in logistic regression, we just have two values: it can be either 0 or it can be 1. It should not entertain values which are below zero or above one.
But in linear regression,
we have the value of y
in the range so here
in order to implement
logic regression we
need To clip this part
so we don't need the value
that is below zero
or we don't need the value
which is above 1
so since the value of y will be
between only 0 and 1
that is the main rule
of logistic regression.
The linear line has
to be clipped at 0 and 1 now.
Once we clip this graph it
would look somewhat like this.
So here you're getting the curve
which is nothing but
three different straight lines.
So here we need to find a new way to solve this problem: this has to be formulated into an equation, and hence we come up with logistic regression. So here the outcome is either 0 or 1, which is the main rule of logistic regression, but with this our resulting curve cannot be formulated as a single straight line. Hence, our main aim of bringing the values to 0 and 1 is fulfilled by a new formulation, and that is how we came up with logistic regression. Now, here, once it gets formulated into an equation,
It looks somewhat like this.
So guys, this is
nothing but an S curve
or you can say the sigmoid curve
a sigmoid function curve.
So this sigmoid function
basically converts any value
from minus infinity to Infinity
to your discrete values, which a logistic regression wants, or you can say the values which are in binary format, either 0 or 1.
So if you see here, the values are either 0 or 1, and this is nothing but just a transition between them,
but guys there's
a catch over here.
So let's say I have
a data point that is 0.8.
Now, how can you decide
whether your value is 0
or 1 now here you
have the concept
of threshold which basically
divides your line.
So here threshold value
basically indicates the
probability of either winning
or losing. So here, by winning I mean the value is equal to 1, and by losing I mean the value is equal to 0. But how does it do that?
Let's have a data point
which is over here.
Let's say my cursor is at 0.8.
So here I check
whether this value is less
than the threshold value or not.
Let's say if it is more
than the threshold value.
It should give me the result
as 1 if it is less than that,
then should give me
the result is zero.
So here my threshold
value is 0.5.
I need to define that if my value, let's say 0.8, is more than 0.5, then the value shall be rounded up to 1, and let's say if it is less than 0.5, say I have a value of 0.2, then it should be reduced to 0. So here you can use the concept of a threshold value to find your output. It should be discrete: either 0 or it should be 1.
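Here's a tiny sketch of the sigmoid and the threshold rule just described; the 0.5 threshold and the 0.8/0.2 examples come straight from the explanation above.

```python
import math

def sigmoid(z):
    # Squashes any value from minus infinity to infinity into (0, 1).
    return 1 / (1 + math.exp(-z))

def classify(probability, threshold=0.5):
    # Round the probability to a discrete outcome: 1 at or above the
    # threshold, 0 below it.
    return 1 if probability >= threshold else 0

print(classify(0.8))  # 1 -- above the 0.5 threshold
print(classify(0.2))  # 0 -- below the 0.5 threshold
print(sigmoid(0))     # 0.5 -- the midpoint of the S curve
```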
So I hope you caught this curve
of logistic regression.
So guys, this is
the sigmoid S curve.
So to make this curve
we need to make an equation.
So let me address
that part as well.
So let's see how an equation
is formed to imitate
this functionality so over here,
we have the equation of a straight line, which is y = mx + c. In this case, I have only one independent variable, but let's say if we have many independent variables, then the equation becomes m1x1 + m2x2 + m3x3 and so on, till mnxn. Now, let us put it in terms of b and x. So here the equation becomes y = b1x1 + b2x2 + b3x3 and so on, till bnxn, plus c. So guys, the equation of the straight line has a range from minus infinity to infinity.
Yeah, but in our case, or you can say in the logistic equation, the value which we need to predict, or you can say the y value, can have a range only from 0 to 1. So in that case we need to transform this equation. To do that, what we have done is take y over 1 minus y. So now, for y equal to 0, we get 0 over 1 minus 0, and 1 minus 0 is 1, so 0 over 1 is again 0; and if we take y equal to 1, then we get 1 over 1 minus 1, and 1 minus 1 is 0, so 1 over 0 is infinity. So here my range is now between 0 and infinity, but again, we want the range from minus infinity to infinity. For that, we take the log of this equation. So let's go ahead and take the logarithm of this equation to transform it further and get the range between minus infinity and infinity. So over here we have log of y over (1 minus y) equal to the linear equation, and this is your final logistic regression equation.
So guys, don't worry.
You don't have to write
this formula or memorize
this formula in Python.
You just need to call the logistic regression function and everything will be done automatically for you, as in the sketch below.
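For instance, a minimal sketch with scikit-learn could look like the following; the tiny dataset here is made up just to show the calls.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one feature, binary (0/1) outcome.
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5]]))        # discrete class: 0 or 1
print(clf.predict_proba([[2.5]]))  # probability of each class
```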
So I don't want to scare
you with the maths and the formulas behind it.
But it is always good to know
how this formula was generated.
So I hope you guys are clear
with how logistic regression
comes into the picture next.
Let us see what the major differences between linear regression and logistic regression are. First of all, in linear regression we have the value of y as a continuous variable, or the variables we need to predict are continuous in nature.
Whereas in logistic regression, we have categorical variables, so here the value which you need to predict should be discrete in nature: it should be either 0 or 1, or should have just two values to it.
For example,
whether it is raining
or it is not raining
is it humid outside
or it is not humid outside.
Now, how's it going to snow
and it's not going to snow.
So these are the few example,
we need to predict
where the values are discrete
or you can just predict
where this is happening or not.
Next, linear regression solves regression problems. Here you have the concept of an independent variable and a dependent variable: you can calculate the value of y, which you need to predict, using the value of x. So your y variable, or you can say the value that you need to predict, lies in a range. Whereas in logistic regression you have discrete values, so logistic regression basically solves classification problems: it can classify the data and just give you the result of whether an event is happening or not.
So I hope it is pretty much clear till now. Next, in linear regression the graph that you have seen is a straight-line graph, so over here you can calculate the value of y with respect to the value of x. Whereas in logistic regression the graph that we got was an S curve, or you can say the sigmoid curve, so using the sigmoid function you can predict your y values. I hope you guys are clear with the differences between linear regression and logistic regression. Moving ahead, let us see the various use cases where logistic regression is implemented in real life.
The very first is weather prediction. Logistic regression helps you predict the weather: for example, it is used to predict whether it is raining or not, whether it is sunny or not, whether it is cloudy or not. All these things can be predicted using logistic regression. You need to keep in mind that both linear regression and logistic regression can be used in predicting the weather. In that case, linear regression helps you predict what the temperature will be tomorrow, whereas logistic regression will only tell you whether it is going to rain or not, whether it is cloudy or not, whether it is going to snow or not; these values are discrete. If you apply linear regression, you will be predicting things like what the temperature is tomorrow, or the temperature the day after tomorrow, and all those things. So these are the slight differences between linear regression and logistic regression.
Moving ahead, we have classification problems. Logistic regression can perform multi-class classification, so it can help you tell whether it's a bird or it's not a bird; then you can classify different kinds of mammals, let's say whether it's a dog or it's not a dog; similarly, you can check for reptiles, whether it's a reptile or not a reptile. So logistic regression can perform multi-class classification, and this point I've already discussed: it is used in classification problems.
Next, it also helps you determine the illness of a patient. Let me take an example: say a patient goes for a routine checkup in a hospital. What the doctor will do is perform various tests on the patient and check whether the patient is actually ill or not. So what will be the features? The doctor can check the sugar level and the blood pressure; then, what is the age of the patient, whether they are very young or an old person; then, what is the previous medical history of the patient. All of these features will be recorded by the doctor, and finally the doctor checks the patient data and determines the outcome of the illness and its severity. So using all this data, a doctor can identify whether a patient is ill or not. These are the various use cases in which you can use logistic regression.
Now I guess that's enough of the theory part, so let's move ahead and see some of the practical implementation of logistic regression. Over here I'll be implementing two projects. First, I have the dataset of the Titanic: we will predict what factors made people more likely to survive the sinking of the Titanic ship. In my second project we'll see data analysis on SUV cars: we have the data of SUV cars, who can purchase them, and what factors made people more interested in buying an SUV. These will be the major questions as to why you should implement logistic regression and what output you will get from it. So let's start with the very first project, that is Titanic data analysis.
Some of you might know that there was a ship called the Titanic, which basically hit an iceberg and sank to the bottom of the ocean. It was a big disaster at that time, because it was the first voyage of the ship, and it was supposed to be really strongly built and one of the best ships of its time. And of course there is a movie about this as well, so many of you might have watched it. So what do we have? We have the data of the passengers, those who survived and those who did not survive in this particular tragedy. What you have to do is look at this data and analyze which factors would have contributed the most to the chances of a person's survival on the ship. So using logistic regression, we can predict whether a person survived or died.
Apart from this, we'll also have a look at the various features. First, let us explore the dataset. Over here we have the index value; then the first column is the passenger ID; then my next column is Survived, where we have two values, a 0 and a 1: 0 stands for did not survive and 1 stands for survived. So this column is categorical, and the values are discrete. Next we have the passenger class, with three values, 1, 2 and 3; this basically tells you whether a passenger was traveling in the first class, second class or third class. Then we have the name of the passenger; we have the sex, or you can say the gender of the passenger, whether the passenger is male or female; and then we have the age. Next we have SibSp, which means the number of siblings or spouses aboard the Titanic, so over here we have values such as 1, 0 and so on. Then we have Parch, which is the number of parents or children aboard the Titanic, so over here we also have some values. Then we have the ticket number, the fare, the cabin number, and the Embarked column. In the Embarked column we have three values, S, C and Q: S stands for Southampton, C stands for Cherbourg and Q stands for Queenstown.
So these are the features that we'll be applying our model on. Here we'll perform various steps and then implement logistic regression. Now, these are the steps which are required to implement any algorithm, and in our case we are implementing logistic regression. The very first step is to collect your data, or to import the libraries that are used for collecting your data and taking it forward. My second step is to analyze the data: I can go through the various fields and analyze them. I can check whether the females or children survived better than the males, or whether the rich passengers survived more than the poor passengers, or whether money mattered, as in whether those who paid more to get onto the ship were evacuated first. And what about the workers: did the workers survive, or what was the survival rate if you were a worker on the ship and not just a traveling passenger? All of these are very interesting questions, and we will be going through them one by one.
So in this stage you need to analyze your data and explore it as much as you can. The third step is to wrangle your data. Data wrangling basically means cleaning your data: you can simply remove the unnecessary items, or if you have null values in the dataset, you can clean those out and then take the data forward. In the fourth step you build your model using the train data and then test it using the test data: you will perform a split, which divides your dataset into a training and a testing dataset. And finally you will check the accuracy, so as to know how accurate your predictions are. I hope you guys got these five steps that we're going to implement in logistic regression. Now let's go into all these steps in detail.
So number one: we have to collect the data, or you can say import the libraries. Let me show you the implementation part as well: I'll just open my Jupyter Notebook and implement all of these steps side by side. So guys, this is my Jupyter Notebook. First, let me just rename the notebook to, let's say, Titanic Data Analysis.
Now our first step was to import all the libraries and collect the data, so let me just import all the libraries first. First of all I'll import pandas; pandas is used for data analysis, so I'll say import pandas as pd. Then I'll import NumPy, so I'll say import numpy as np; NumPy is a library in Python which basically stands for Numerical Python and is widely used to perform any scientific computation. Next we'll import Seaborn: Seaborn is a library for statistical plotting, so I'll say import seaborn as sns. I'll also import Matplotlib; the Matplotlib library is again for plotting, so I'll say import matplotlib.pyplot as plt. Now to make the plots render in a Jupyter Notebook, all I have to write is %matplotlib inline. Next I'll import one more module for basic mathematical functions, so I'll say import math. So these are the libraries that I'll be needing in this Titanic data analysis.
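Put together, the import cell would look like this (the %matplotlib inline line is Jupyter notebook magic, not plain Python):

import pandas as pd              # data analysis
import numpy as np               # numerical Python, scientific computation
import seaborn as sns            # statistical plotting
import matplotlib.pyplot as plt  # plotting
%matplotlib inline
import math                      # basic mathematical functions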
Now let me just import my dataset. I'll take a variable, let's say titanic_data, and using pandas I will just read my CSV, or you can say the dataset; I'll write the name of my dataset, that is titanic.csv. I have already shown you the dataset, so over here let me just print the top 10 rows. For that I will just take the variable titanic_data, call head, and ask for the top ten rows. Now I'll just run this; to run the cell I have to press Shift + Enter, or else you can just directly run the cell from the toolbar. Over here I have the index; we have the passenger ID, which is nothing but the index again, starting from 1; then we have the Survived column, which has categorical, or you can say discrete, values in the form of 0 or 1.
Then we have the passenger class, the name of the passenger, sex, age and so on. So this is the dataset that I will be going forward with. Next let us print the number of passengers in this original dataset. For that I'll simply type in print, I'll say 'number of passengers', and using the length function I can calculate the total length: I'll say len, and inside it I'll pass the variable titanic_data with .index, and then print it. So the number of passengers in the original dataset is 891; around that many people were traveling on the Titanic. Over here my first step is done: we have collected the data, imported all the libraries and found out the total number of passengers traveling on the Titanic.
which are Titanic so
now let me just go back
to presentation and let's see.
What is my next step.
So we're done with
the collecting data.
Next step is to analyze
your data so over here,
we will be creating different
plots to check the relationship
between variables as
in how one variable
is affecting the other
so you can simply explore
your data set by making use
of various columns
and then you can plot
a graph between them.
So you can either plot
a correlation graph.
You can plot
a distribution curve.
It's up to you guys.
So let me just go back
to my jupyter notebook and let
me analyze some of the data.
Over here, my second part is to analyze the data. I'll just put this in heading 2; to do that I just have to select the cell, change its type from Code to Markdown, and run it. First let us plot a count plot where we compare the passengers who survived and who did not survive. For that I will be using the Seaborn library. Over here I have imported seaborn as sns, so I don't have to write the whole name: I'll simply say sns.countplot, I'll set x to Survived, and the data that I'll be using is titanic_data, or you can say the name of the variable in which you have stored your dataset. Now let me just run this. As you can see, I have the Survived column on my x-axis and the count on the y-axis; 0 stands for did not survive and 1 stands for the passengers who did survive. Over here you can see that around 550 of the passengers did not survive and around 350 passengers survived, so you can basically conclude that there are far fewer survivors than non-survivors. So this was the very first plot.
Now let us plot another plot to compare the sexes: out of all the passengers who survived and who did not survive, how many were men and how many were women. To do that, I'll simply say sns.countplot, I'll add the hue as Sex, since I want to know how many females and how many males survived, and then I'll specify the data, so I'm using titanic_data. Let me just run this; I had made a small mistake over here, and now you can see I have the Survived column on the x-axis and the count on the y-axis. Here the blue color stands for your male passengers and orange stands for your female passengers. As you can see, among the passengers who did not survive, that is the value 0, the majority were males; and if we look at the people who survived, the majority were females. So this basically concludes the gender aspect of the survival rate: it appears that, on average, women were more than three times more likely to survive than men.
Next, let us plot another plot where we have the hue as the passenger class, so we can see which class the passenger was traveling in, whether class one, two or three. For that I'll write the same command: I'll say sns.countplot, I'll keep my x-axis as Survived, and I'll change my hue to the passenger class, so the variable is named Pclass, and the dataset that I'll be using is titanic_data. So this is my result: over here you can see blue for first class, orange for second class and green for third class. The passengers who did not survive were majorly of the third class, or you can say the lowest class, or the cheapest class to get onto the Titanic, and the people who did survive majorly belonged to the higher classes: classes 1 and 2 have more survivors than the passengers who were traveling in the third class. So here we have concluded that the passengers who did not survive were majorly of the third class, or you can say the lowest class, and the passengers who were traveling in first and second class tended to survive more.
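Collected together, the three count plots would look like this (assuming the standard Titanic column names Survived, Sex and Pclass):

sns.countplot(x='Survived', data=titanic_data)                # survivors vs. non-survivors
plt.show()
sns.countplot(x='Survived', hue='Sex', data=titanic_data)     # broken down by gender
plt.show()
sns.countplot(x='Survived', hue='Pclass', data=titanic_data)  # broken down by class
plt.show()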
Next, let me plot a graph for the age distribution. Over here I can simply use my data, so we'll be using the pandas library for this: I'll select the Age column, and since I want a histogram, I'll say plot.hist. You can notice over here that we have more young passengers, or you can say children, between the ages of 0 and 10, then a lot of average-aged people, and as you go further up in age the population gets smaller. So this is the analysis on the Age column: we saw that we have more young and more middle-aged passengers traveling on the Titanic. Next let me plot a graph of the fare as well: I'll take titanic_data, select Fare, and again I want a histogram, so I'll say hist. Here you can see the fare is mostly between zero and a hundred. Now let me add the bin size to make it clearer: I'll say bins equals, let's say, 20, and I'll increase the figure size as well, so I'll say figsize and give the dimensions as 10 by 5. With these bins, it is much clearer now.
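A sketch of those two histograms, assuming the Age and Fare column names:

titanic_data['Age'].plot.hist()                           # age distribution
plt.show()
titanic_data['Fare'].plot.hist(bins=20, figsize=(10, 5))  # fare, finer bins, bigger figure
plt.show()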
Next, let us analyze the other columns as well. I'll just type in titanic_data, and I want the information as to what columns are left. Here we have the passenger ID, which I guess is of no use. Then, we have already seen how many passengers survived and how many did not; we have also done the analysis on a gender basis, whether the females or the males tended to survive more; then we saw the passenger class, whether the passenger was traveling in first, second or third class. Then we have the name, and on the name we cannot do any analysis; we saw the sex and the age as well. Then we have SibSp, which stands for the number of siblings or spouses aboard the Titanic.
So let us analyze this as well: I'll say sns.countplot, I'll mention x as SibSp, and I'll be using titanic_data. You can see the plot over here, and you can conclude that it has its maximum value at zero: most passengers had neither siblings nor spouses aboard the Titanic. The second highest value is 1, and then we have various small values for 2, 3, 4 and so on. So that explores this column as well. Similarly, we can do it for Parch, which is the number of parents or children aboard the Titanic. Then we have the ticket number; I don't think any analysis is required for the ticket. Then we have the fare, which we have already discussed: the people traveling in the first class would pay the highest fares. Then we have the cabin number, and we have Embarked.
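As a sketch, these remaining exploration steps would be:

titanic_data.info()                          # which columns are left to analyze
sns.countplot(x='SibSp', data=titanic_data)  # siblings / spouses aboard
plt.show()
sns.countplot(x='Parch', data=titanic_data)  # parents / children aboard
plt.show()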
So these are the columns that we'll be doing data wrangling on. We have analyzed the data and seen quite a few graphs, from which we can conclude which variables matter and what the relationships are. The third step is data wrangling, and data wrangling basically means cleaning your data. If you have a large dataset, you might have some null values, or you can say NaN values, so it's very important that you remove all the unnecessary items present in your dataset. These null values directly affect your accuracy, so I'll just go ahead and clean my data by removing all the NaN values and the unnecessary columns which have null values. Now, to perform data wrangling, first of all I check whether my dataset has null values or not.
I'll take titanic_data, which is the name of my dataset, and say isnull. This will tell me which values are null and will return a Boolean result: it checks for missing data, and the result will be True or False, where False means the value is not null and True means the value is null. So let me just run this. Over here you can see the values as False or True: False is where the value is not null and True is where the value is null. In the Cabin column you can see the very first value is null, so we have to do something about this. Now, we have a large dataset, so the listing does not stop and we cannot eyeball all of it; instead we can print the number of passengers who have NaN values in each column. I'll say titanic_data.isnull, and I want the sum of it, so I'll add .sum. This prints the number of NaN values in each column: we have 177 missing values in the Age column, the most missing values in the Cabin column, and very few, that is 2, in the Embarked column.
Now, if you don't want to look at these numbers, you can also plot a heat map and analyze it visually; let me do that as well. I'll say sns.heatmap, and I'll set the y tick labels to False; let's run this. As we have already seen, there were three columns with missing values. The first one is Age: almost 20% of the Age column has missing values. Then we have the Cabin column, where quite a large share is missing, and then we have the two missing values in the Embarked column. Let me add a cmap for color coding, so I'll say cmap. If I do this, the graph becomes more attractive: over here yellow stands for True, or you can say the null values. So here we have computed that we have missing values in Age, a lot of missing values in the Cabin column, and very few, barely even visible, in the Embarked column.
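In code, the null check and the heat map would look something like this (the viridis colormap is just one possible choice for the color coding):

print(titanic_data.isnull().sum())  # per-column NaN counts: Age 177, Embarked 2, Cabin the most
sns.heatmap(titanic_data.isnull(), yticklabels=False, cmap='viridis')
plt.show()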
To remove these missing values, you can either replace them by filling in some dummy values, or you can simply drop the column. Let us first pick the Age column: let me plot a box plot to analyze Age against the passenger class. I'll say sns.boxplot, I'll say x equals the passenger class, that is Pclass, y equals Age, and the dataset that I'll be using is titanic_data. You can see that the ages in first class and second class tend to be higher than what we have in the third class; that could depend on experience, on how much you earn, or any number of reasons. So here we concluded that the passengers traveling in class 1 and class 2 tend to be older than those in class 3, and we have found that we have some missing values in Age.
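The box plot just described is a one-liner:

sns.boxplot(x='Pclass', y='Age', data=titanic_data)  # age spread within each class
plt.show()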
Now, one way is to just drop the column, or you can simply fill in some values for the missing entries; that method is called imputation. To perform the data wrangling, or cleaning, let us first print the head of the dataset: I'll say titanic_data.head, and let's say I just want the first five rows. Here we have Survived, which is again categorical, and on this particular column I can apply logistic regression, so this can be my y value, or the value that we need to predict. Then we have the passenger class, the name, the ticket number, and Cabin. We have seen that Cabin has a lot of null values, or you can say NaN values, which is quite visible as well. So first of all we'll just drop this column. For dropping it, I'll say titanic_data, I'll type in drop along with the column which I need to drop, that is Cabin, I'll mention axis equals 1, and I'll set inplace to True. Now I'll print the head again and let us see whether this column has been removed from the dataset or not. So I'll say titanic_data.head, and as you can see here, we don't have the Cabin column anymore.
Now, you can also drop the NA values. I'll say titanic_data.dropna, which drops all the NA values, or you can say NaN, which means "not a number", and I'll say inplace equals True. Over here, let me again plot the heat map and check whether the values which were previously showing up as null have been removed or not. So I'll say sns.heatmap, I'll pass in the dataset with isnull, I'll say y tick labels equals False, and I don't want the color bar, so again I'll say False. This will help me check whether the null values have been removed from the dataset or not. As you can see here, I don't have any null values, so it's entirely black. You can also check the sum: I'll just go above, copy that part, and use the sum function to calculate the sum. So here it tells me that the dataset is clean, as in the dataset does not contain any null value or any NaN value.
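A sketch of this whole cleaning step:

titanic_data.drop('Cabin', axis=1, inplace=True)  # drop the mostly-empty Cabin column
titanic_data.dropna(inplace=True)                 # drop the rows still holding NaN values
sns.heatmap(titanic_data.isnull(), yticklabels=False, cbar=False)  # should be all one color
plt.show()
print(titanic_data.isnull().sum())                # every column should now read 0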
So now we have wrangled the data, or you can say cleaned the data. Here we have done just one step of data wrangling, that is removing one column. You can do a lot more: you can fill in the missing values with some other values, or you can calculate the mean and then fill in the null values with it. Now if I look at my dataset, so I'll say titanic_data.head, I see that I have a lot of string values. These have to be converted to categorical variables in order to implement logistic regression. So what we'll do is convert these categorical variables into some dummy variables, and this can be done using pandas, because logistic regression takes only two values. Whenever you apply machine learning, you need to make sure that there are no string values present, because the model won't take those as input variables; using a string you cannot predict anything. In my case I have the Survived column to tell how many people survived and how many did not, so 0 stands for did not survive and 1 stands for survived.
So now let me just convert these variables into dummy variables. I'll use pandas and say pd.get_dummies; you can simply press Tab to autocomplete. I'll say titanic_data and pass in the Sex column; you can press Shift + Tab to get more information on this, and here we have the type DataFrame with the passenger ID, Survived and passenger class. If I run this, you'll see that in the female column 0 stands for not a female and 1 stands for female, and similarly in the male column 0 stands for not male and 1 stands for male. Now, we don't require both of these columns, because one column by itself is enough to tell us whether the passenger is male or, you can say, female. Let's say I want to keep only the male column: if the value of male is 1, the passenger is definitely a male and not a female, so that's why you don't need both of these values. For that I'll just remove the first column, that is female, so I'll say drop_first equals True. And with that it has given me just one column, which is male, with values 0 and 1. Let me just store this in a variable, let's say sex, so over here I can say sex.head to see the first five rows. So this is how my data looks now.
Here we have done it for sex; then we have numerical values in Age, numerical values in SibSp, then we have the ticket number, the fare, and we have Embarked as well. In Embarked the values are S, C and Q, so here also we can apply the get_dummies function. Let's say I'll take a variable called embark; I'll use the pandas library and enter the column name, that is Embarked. Let me just print the head of it, so I'll say embark.head. Over here we have C, Q and S. Now here also we can drop the first column, because two values are enough: whether the passenger is traveling from Q, that is Queenstown, or from S, that is Southampton; and if both values are 0, then the passenger is definitely from Cherbourg, the third value. So you can again drop the first value: I'll say drop_first equals True. Let me just run this. So this is how my output looks now.
Similarly, you can do it for the passenger class as well. Here also we have three classes, one, two and three, so I'll just copy the whole statement. Let's say I want the variable name to be pcl; I'll pass in the column name, that is Pclass, and I'll just drop the first column. The values here will be 1, 2 or 3, and after removing the first column we are left with just 2 and 3, so if both values are 0, the passenger is definitely traveling in the first class. Now that we have made the values categorical, my next step is to concatenate all these new columns into the dataset, you can say into titanic_data.
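A sketch of the three get_dummies calls, assuming the column names Sex, Embarked and Pclass:

# Convert string/categorical columns into 0/1 dummy columns, dropping
# the first level of each so that no redundant column remains.
sex = pd.get_dummies(titanic_data['Sex'], drop_first=True)          # male
embark = pd.get_dummies(titanic_data['Embarked'], drop_first=True)  # Q, S
pcl = pd.get_dummies(titanic_data['Pclass'], drop_first=True)       # 2, 3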
Using pandas, we'll just concatenate all these columns: I'll say pd.concat, we have to concatenate sex, embark and pcl, and then I'll mention axis equals 1. I'll just run this and then print the head. Over here you can see that these columns have been added: we have the male column, which tells whether the person is male or female; then we have the Embarked columns, which are Q and S, so if the passenger is traveling from Queenstown the Q value will be 1, else it will be 0, and if both of these values are zero, the passenger is definitely traveling from Cherbourg. Then we have the passenger classes 2 and 3, so if the value of both of these is 0, the passenger is traveling in class one.
So I hope you've got this till now. Now, these original columns are irrelevant, so we can just drop them: we'll drop Pclass, the Embarked column and the Sex column. I'll type in titanic_data.drop and mention the columns that I want to drop. I'll even remove the passenger ID, because it's nothing but an index starting from one, so I'll drop that as well; I don't want the name either, so I'll delete name as well; and then we can drop the ticket too. Then I'll just mention the axis, and I'll say inplace equals True. Okay, it turns out my column name starts with an uppercase letter; with that fixed, these have been dropped. Now let me just print my dataset again. So this is my final dataset, guys: we have the Survived column, which has the values 0 and 1, and then we have the passenger class, oh, we forgot to drop this as well. No worries, I'll drop this too and run it again. So over here we have Survived, Age, SibSp, Parch, Fare, male, and the columns we have just converted. So here we have performed data wrangling, or you can say cleaned the data: we converted the values of gender to male, Embarked to Q and S, and the passenger class to 2 and 3.
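Putting the concatenation and the column drops together:

titanic_data = pd.concat([titanic_data, sex, embark, pcl], axis=1)
titanic_data.drop(['Sex', 'Embarked', 'Pclass', 'PassengerId', 'Name', 'Ticket'],
                  axis=1, inplace=True)
print(titanic_data.head())  # Survived, Age, SibSp, Parch, Fare, male, Q, S, 2, 3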
So this was all about data wrangling, or just cleaning the data. My next step is training and testing the data: we will split the dataset into a train subset and a test subset, then build a model on the train data and predict the output on the test dataset. So let me just go back to Jupyter and implement this as well; I'll put this in heading 3. Now you need to define your dependent variable and your independent variables. Here my y is the output, or you can say the value that we need to predict: I'll write titanic_data and take the Survived column, since that is the column I have to predict, whether the passenger survived or not, and as you have seen it has a discrete outcome in the form of 0 and 1. The rest of the columns we can take as features, or you can say independent variables: I'll say titanic_data.drop and simply drop Survived, so all the other columns become my independent variables. Everything else serves as the features which lead to the survival outcome. Once we have defined the dependent variable and the independent variables, the next step is to split your data into training and testing subsets.
For that we'll be using scikit-learn: I'll just type in "from sklearn.cross_validation import train_test_split". Now here, if you press Shift and Tab, you can go to the documentation and see the examples. I'll just open it, go to the examples, and see how you can split your data: over here you have X_train, X_test, y_train and y_test, and using this train_test_split you just pass in your independent and dependent variables and define a test size and a random state. So let me just copy this and paste it over here. We get the independent-variable train and test sets and the dependent-variable train and test sets, and using the split function we pass in the independent and dependent variables and set a split size; let's say I'll put it at 0.3, which means that your dataset is divided in a 70/30 ratio. Then I can add a random state to it, let's say 1. This is not necessary, but if you want the same result as mine, you can add this random state: it will basically take exactly the same sample every time.
every Next I have to train
and predict by creating a model.
So here logistic
regression will graph
from the linear regression.
So next I'll just type in
from SK loan dot linear model
import logistic regression.
Next I'll just create
the instance of this
logistic regression model.
So I'll say log model is equals
to largest aggression now.
I just need to fit my model.
So I'll say log model dot fit
and I'll just pass
in my ex train.
and white rain Alright,
so here it gives
me all the details
of logistic regression.
So here it gives me the class
way dual fit intercept
and all those things then
what I need to do,
I need to make prediction.
So I'll take a variable
and checked addictions
and I'll pass
on the model to it.
So I'll say log model dot
predict and I'll pass
in the value that is X test.
So here we have just
created a model fit
that model and then we
had made predictions.
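In code, this model-building step is just:

from sklearn.linear_model import LogisticRegression

logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)          # train on the 70% subset
predictions = logmodel.predict(X_test)  # predict on the held-out 30%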
Now, to evaluate how my model has been performing, you can simply calculate the accuracy, or you can also compute a classification report. Don't worry, guys, I'll be showing both of these methods. I'll say "from sklearn.metrics import classification_report", then I'll use classification_report, and inside it I'll pass in y_test and the predictions. So guys, this is my classification report: over here I have the precision, the recall, the f1-score, and then the support. Here we have precision values of 75, 72 and 73, which is not that bad. Now, in order to calculate the accuracy as well,
you can also use the concept of a confusion matrix. If you want to print the confusion matrix, I'll first say "from sklearn.metrics import confusion_matrix", and then we just print it. Now that my function has been imported successfully, I'll say confusion_matrix and again pass in the same variables, which are y_test and predictions. I hope you guys already know the concept of a confusion matrix, but let me briefly tell you what it is all about. A confusion matrix is nothing but a 2 by 2 matrix which has four outcomes; it tells us how accurate your values are. Here we have the columns as predicted no and predicted yes, and the rows as actual no and actual yes. So this is the concept of the confusion matrix. Let me just feed in the values which we have just calculated: here we have 105, 21, 25 and 63.
As you can see, we have four outcomes. 105 is the value where the model predicted no and in reality it was also a no, so we predicted no and the actual was no. Similarly, we have 63 as a predicted yes: the model predicted yes and the actual was also a yes. In order to calculate the accuracy, you just need to add these two values and divide the whole by the total sum; these two values tell you where the model has actually predicted the correct output. The 105 is also called a true negative, the 21 is called a false positive, the 63 is called a true positive, and the 25 is called a false negative.
Now, to calculate the accuracy you don't have to do it manually: in Python you can just import the accuracy_score function and get the result from that. I'll say "from sklearn.metrics import accuracy_score", and I'll simply print the accuracy, passing in the same variables, that is y_test and predictions. Here it tells me the accuracy is 78 percent, which is quite good. If you want to do it manually, you add those two numbers, 105 plus 63, which comes out to 168, and then you divide by the sum of all four numbers, 105 plus 63 plus 21 plus 25, which gives a result of 214. If you divide these two numbers, you get the same accuracy, that is 78 percent, or you can say 0.78. So that is how you can calculate the accuracy.
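The three evaluation calls side by side (the numbers in the comments are the ones quoted in this session):

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(classification_report(y_test, predictions))  # precision, recall, f1-score, support
print(confusion_matrix(y_test, predictions))       # e.g. [[105, 21], [25, 63]]
print(accuracy_score(y_test, predictions))         # (105 + 63) / 214, about 0.78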
Now let me just go back to my presentation and see what we have covered till now. Here we first split our data into train and test subsets, then we built a model on the train data and predicted the output on the test dataset, and my fifth step was to check the accuracy. We calculated the accuracy to be almost 78 percent, which is quite good; you cannot say that this accuracy is bad. So the accuracy score tells me how accurate the results are, and here we got a good accuracy. So now, moving ahead,
let us see the second project, that is the SUV data analysis. In this one, a car company has released a new SUV in the market, and using the previous data about the sales of their SUVs, they want to predict the category of people who might be interested in buying this one. So using logistic regression, you need to find what factors made people more interested in buying this SUV. Let us see the dataset: I have a user ID, I have gender as male and female, then we have the age, we have the estimated salary, and then we have the Purchased column. This is my discrete column, or you can say the categorical column: here we just have the values 0 and 1, and this is the column we need to predict, whether a person can actually purchase an SUV or not. So based on these factors we will be deciding whether a person can actually purchase an SUV: we know the salary of a person, we know the age, and using these we can predict whether the person can actually purchase an SUV or not.
So let me just go to my Jupyter Notebook and implement logistic regression. Guys, I will not be going through all the details of the data cleaning and analysis part; I'll just leave that to you, so go ahead and practice it as much as you can. Alright, so the second project is SUV predictions. First of all I have to import all the libraries, so I'll say import numpy as np, and similarly I'll do the rest of them.
Alright, now let me just print the head of this dataset. As we have already seen, we have columns for user ID, gender, age and salary, and then we have to calculate whether the person can actually purchase an SUV or not. Now let us go straight to the algorithm part, so we'll directly start off with logistic regression and how you can train a model. Before doing all those things, we first need to define the independent variables and the dependent variable. In this case I want my X, that is the independent variables, to be dataset.iloc: here I'll specify a colon, which basically stands for all the rows, and in the columns I want only two and three, followed by .values. This should fetch me all the rows and only the second and third columns, which are age and estimated salary. These are the factors which will be used to predict the dependent variable, that is Purchased. So here my dependent variable is Purchased and my independent variables are age and salary: I'll say dataset.iloc, I'll take all the rows and just the fourth column, that is my Purchased column, followed by .values. Alright, I just forgot one square bracket over here.
Alright, so over here I have defined my independent variables and my dependent variable: the independent variables are age and salary, and the dependent variable is the Purchased column. Now, you must be wondering what this iloc function is. The iloc function is basically an indexer for a pandas DataFrame and is used for integer-based indexing, or you can say selection by position. Now let me just print these variables. If I print the independent variables, I have the age as well as the salary; next let me print the dependent variable as well. Over here you can see I just have the values 0 and 1, where 0 stands for did not purchase. Next, let me just divide my dataset into training and test subsets.
I'll simply write "from sklearn.cross_validation import train_test_split". Next I'll just press Shift + Tab, go to the examples, and copy the same line, so I'll just copy this and paste it. Now, I want the test size to be, let's say, 0.25, so I have divided the train and test in a 75/25 ratio. Let's say I'll take a random state of 0; random_state basically ensures that the same result, or you can say the same samples, are taken whenever you run the code. So let me just run this.
You can also scale your input values for better performance, and this can be done using StandardScaler. Let me do that as well: I'll say "from sklearn.preprocessing import StandardScaler". Now, why do we scale the values? If you look at the dataset, we are dealing with large numbers. Although we are using a very small dataset here, whenever you're working in a production environment you'll be working with large datasets, using thousands and hundreds of thousands of tuples, and there scaling will definitely affect the performance by a large extent. The preprocessing module contains all the methods and functionality required to transform your data. So let me show you how we can scale down these input values for the test as well as the training dataset. First we'll make an instance of it, so I'll say sc equals StandardScaler. Then I have X_train equals sc.fit_transform, where I'll pass in my X_train, and similarly I can do it for the test, wherein I'll pass in X_test.
Alright, now my next step is to import logistic regression, so I'll apply logistic regression by first importing it: I'll say "from sklearn.linear_model import LogisticRegression". Over here I'll be using a classifier, so I'll say classifier equals LogisticRegression: I just make an instance of it and pass in the random state, which is 0. Now I simply fit the model, passing in X_train and y_train, and here it tells me all the details of the logistic regression. Then I have to predict the values, so I'll say y_pred equals classifier, then the predict function, and I'll just pass in X_test.
So now we have created the model, we have scaled down our input values, we have applied logistic regression, and we have predicted the values; now we want to know the accuracy. For the accuracy we first need to import accuracy_score, so I'll say "from sklearn.metrics import accuracy_score". Using this function we can calculate the accuracy, or you can do it manually by creating a confusion matrix. So I'll just pass in my y_test and my y_pred. Alright, over here I get the accuracy as 0.89, and since we want the accuracy as a percentage, I just have to multiply it by a hundred; if I run this, it gives me 89%.
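For reference, here is the whole SUV pipeline as one sketch. The file name suv_data.csv is hypothetical, the deprecated sklearn.cross_validation import is replaced by model_selection, and X_test is transformed with the scaler fitted on the training data (the usual practice) rather than refitted:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

dataset = pd.read_csv('suv_data.csv')  # hypothetical file name for the SUV data
X = dataset.iloc[:, [2, 3]].values     # all rows; the Age and EstimatedSalary columns
y = dataset.iloc[:, 4].values          # the Purchased column (0 or 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # 75/25 split

sc = StandardScaler()                  # scale inputs for better performance
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)          # reuse the training fit on the test set

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(accuracy_score(y_test, y_pred) * 100)  # about 89 (percent) in this session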
So I hope you guys are clear with whatever I have taught you today. Here I took my independent variables as age and salary, we calculated how many people could purchase an SUV, and then we evaluated our model by checking the accuracy: over here we got an accuracy of 89%, which is great. Alright guys, that is it for today. Let me just recap what all we have covered in today's training. First of all we had a quick introduction to what regression is and where regression is actually used; then we understood the types of regression, got into the details of the what and why of logistic regression, and compared linear versus logistic regression. We also saw the various use cases where you can implement logistic regression in real life, and then we picked up two projects, the Titanic data analysis and the SUV prediction. Over here we have seen how you can collect your data, analyze your data, perform modeling on that data, train the data, test the data, and finally calculate the accuracy. In the SUV prediction you can also analyze and clean your data and do a lot more, so just go ahead, pick up any dataset, and explore it as much as you can. Open your eyes and look around: you will find dozens of applications of machine learning which you are using and interacting with in your daily life, be it the face detection on Facebook or getting recommendations for similar products from Amazon. Machine learning is applied almost everywhere.
So hello and welcome, all, to this session where we'll learn about how to build a decision tree. This session is designed in a way that you get the most out of it. Alright, so the decision tree is a type of classification algorithm which comes under the supervised learning technique. Before learning about the decision tree, I'll give you a short introduction to classification, where we'll learn what classification is, its various types, and where it is used, or what its use cases are. Once you get your fundamentals clear, we'll jump to the decision tree part. Under this, first of all I'll teach you how to mathematically create a decision tree from scratch; then, once you get your concepts clear, we'll see how you can write a decision tree classifier from scratch in Python using the CART algorithm. Alright, I hope the agenda is clear to you guys. So, what is classification?
I hope every one of you must have used Gmail. How do you think the mail is getting classified as spam or not-spam mail? Well, that's nothing but classification. So what is it? Classification is the process of dividing the dataset into different categories or groups by adding labels. In another way, you can say that it is a technique of categorizing observations into different categories. Basically, what you are doing is taking the data, analyzing it, and, on the basis of some condition, finally dividing it into various categories. Now, why do we classify? Well, we classify to perform predictive analysis on the data: like when you get a mail, the machine predicts it to be spam or not-spam mail, and on the basis of that prediction it adds the irrelevant or spam mail to the respective folder. In general, a classification algorithm handles questions like: does this data belong to category A or category B? Is this a male or is this a female? Something like that.
Now the question arises: where will you use it? Well, you can use it in fraud detection, to check whether a transaction is genuine or not. Suppose I am using a credit card here in India, and due to some reason I have to fly to Dubai. If I use the credit card over there, I will get a notification alert regarding my transaction, asking me to confirm it. This is also a kind of predictive analysis: the machine predicts that something fishy is going on in the transaction, because just a few hours ago a transaction was made using the same credit card in India, and 24 hours later the same credit card is being used for a payment in Dubai. So the machine predicts that something fishy is going on, and in order to confirm it, it sends you a notification alert. Alright.
Well, this is one of the use cases of classification. You can even use it to classify different items like fruits on the basis of taste, color, size or weight. A machine well trained using a classification algorithm can easily predict the class or the type of fruit whenever new data is given to it. And not just fruit: it can be any item, a car, a house, anything. Have you noticed that when you visit some sites, or try to log in to some of them, you get an image captcha, where you have to identify whether the given image is of a car or of a pole? You have to select it: for example, there are 10 images and you're selecting three images out of them. So in a way you are training the machine, right? You are telling it that these three are pictures of a car and the rest are not. So who knows, you might be training it for something big, right?
So moving on ahead, let's discuss the types of classification. Well, there are several different ways to perform the same task: for example, in order to predict whether a given person is a male or a female, the machine has to be trained first. But there are multiple ways to train the machine, and you can choose any one of them. For predictive analytics there are many different techniques, but the most common of them all is the decision tree, which we'll cover in depth in today's session. As part of the classification algorithms we have the decision tree, random forest, naive Bayes, k-nearest neighbor, logistic regression, linear regression, support vector machines and so on; there are many.
Alright, so let me give you an idea about a few of them, starting with the decision tree. Well, a decision tree is a graphical representation of all the possible solutions to a decision, and the decisions which are made can be explained very easily. For example, here is a task which says: should I go to a restaurant or should I buy a hamburger? You are confused about that, so what you will do is create a decision tree for it, starting with the root node. First of all, you will check whether you are hungry or not. If you're not hungry, then just go back to sleep, right? If you are hungry and you have $25, then you will decide to go to a restaurant; and if you're hungry and you don't have $25, then you will just go and buy a hamburger. That's it. Alright, so that's about the decision tree. Now, moving on ahead, let's see what a random forest is.
Well, a random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction. Most of the time, a random forest is trained with a bagging method. The bagging method is based on the idea that a combination of learning models increases the overall result: if you are combining the learning from different models and clubbing it together, it will increase the overall result. Just one more thing: if the size of your dataset is huge, then one single decision tree would lead to an overfit model, the same way a single person might have their own perspective on the complete population, since the population is very huge. However, if we implement a voting system and ask different individuals to interpret the data, then we would be able to cover the patterns in a much more meticulous way. Even from the diagram you can see that in section A we have a large training dataset: we first divide our training dataset into n sub-samples and create a decision tree for each sub-sample. In part B, we take the vote out of every decision made by every decision tree, and finally we club the votes to get the random forest decision. Fine, let's move on ahead. Next,
we have naive Bayes. Naive Bayes is a classification technique which is based on Bayes' theorem. It assumes that the presence of any particular feature in a class is completely unrelated to the presence of any other feature. Naive Bayes is a simple and easy-to-implement algorithm, and due to its simplicity this algorithm might outperform more complex models when the size of the dataset is not large enough. A classical use case of naive Bayes is document classification, in which you determine whether a given text corresponds to one or more categories; in the text case, the features used might be the presence or absence of keywords. So this was about naive Bayes. From the diagram you can see that, using naive Bayes, we have to decide whether we have a disease or not. First we check the probability of having the disease and of not having the disease: the probability of having the disease is 0.1, while on the other hand the probability of not having the disease is 0.9.
Okay, first let's look at the case when we have the disease and we go to the doctor. When we visit the doctor and the test comes out positive, the probability of a positive test when you have the disease is 0.80, and the probability of a negative test when you actually have the disease is 0.20. That second one is a false negative statement, as the test is reporting negative but you still have the disease, right? So it's a false negative. Now let's move ahead to when you don't have the disease at all; the probability of not having the disease is 0.9. If you visit the doctor and the doctor says, yes, you have the disease, but you know that you don't have the disease, then it's a false positive statement: the probability of a positive test when there is actually no disease is 0.10. And the probability of a negative test when there is actually no disease is around 0.90; here the test result matches reality, so it is a true negative statement, and it is 0.9. Alright.
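Just to show what those numbers buy us, here is a small sketch that plugs them into Bayes' theorem to get the probability of actually having the disease given a positive test (interpreting the diagram's numbers as test rates is my reading of it):

p_disease = 0.1                # P(disease)
p_no_disease = 0.9             # P(no disease)
p_pos_given_disease = 0.80     # true positive rate
p_pos_given_no_disease = 0.10  # false positive rate

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * p_no_disease)  # P(positive) = 0.17
print(p_pos_given_disease * p_disease / p_pos)     # ~0.47

So even with a positive test, the chance of disease stays under 50%, simply because the disease itself is rare.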
So let's move on ahead and discuss the KNN algorithm. This KNN algorithm, or k-nearest neighbor, stores all the available cases and classifies new cases based on a similarity measure. The K in the KNN algorithm is the number of nearest neighbors we wish to take a vote from. For example, if k equals 1, then the object is simply assigned to the class of its single nearest neighbor. From the diagram you can see the difference in the image when k equals 1, k equals 3 and k equals 5, right?
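As a tiny sketch of the voting idea, using scikit-learn and made-up 2-D points:

from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1], [6, 5], [6, 6], [7, 7]]  # made-up points
y = [0, 0, 0, 1, 1, 1]                                # two classes

for k in (1, 3, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, knn.predict([[5, 5]]))  # class chosen by majority vote of the k nearest points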
Well, systems are now able to use the k-nearest neighbor for visual pattern recognition, to scan and detect hidden packages in the bottom bin of a shopping cart at checkout. If an object is detected which exactly matches an object listed in the database, then the price of the spotted product could even automatically be added to the customer's bill. While this automated billing practice is not used extensively at this time, the technology has been developed and is available for use if you want it. And yeah, one more thing: k-nearest neighbor is also used in retail to detect patterns in credit card usage. Many new transaction-scrutinizing software applications use KNN algorithms to analyze register data and spot unusual patterns that indicate suspicious activity. For example, if register data indicates that a lot of customer information is being entered manually rather than through automated scanning and swiping, this could indicate that the employees using the register are in fact stealing customers' personal information. Or, if register data indicates that a particular good is being returned or exchanged multiple times, this could indicate that employees are misusing the return policy or trying to make money from fake returns, right? So this was about the KNN algorithm. Since our main focus for this session will be on the decision tree,
let's start with what a decision tree is. But first, let me tell you why we chose the decision tree to start with. Well, decision trees are really very easy to read and understand; they belong to one of the few models that are interpretable, where you can understand exactly why the classifier has made that particular decision. Let me tell you a fact: for a given dataset, you cannot say that this algorithm performs better than that one. You cannot say the decision tree is better than naive Bayes, or that naive Bayes performs better than the decision tree; it depends on the dataset. You have to apply the hit-and-trial method with all the algorithms one by one and then compare the results: the model which gives the best result is the model you can use for better accuracy on your dataset.
Alright, so let's start with what a decision tree is. Well, a decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions. Now, you might be wondering why this thing is called a decision tree. Well, it is called so because it starts with a root and then branches off to a number of solutions, just like a tree, right? Even a tree starts from a root and keeps growing its branches once it gets bigger and bigger; similarly, a decision tree has a root which keeps on growing with an increasing number of decisions and conditions. Now let me tell you a real-life scenario. I won't say all of you, but most of you must have used it.
Remember, whenever you dial the toll-free number of your credit card company, it redirects you to an intelligent computerized assistant, where it asks you questions like: press 1 for English or press 2 for Hindi, press 3 for this, press 4 for that. Once you select one, it again redirects you to a certain set of questions, like press 1 for this, press 2 for that, and so on, right? This keeps on repeating until you finally get to the right person. You might think that you are caught in voicemail hell, but what the company was actually doing was just using a decision tree to get you to the right person.
I lied.
I'd like you to focus
on this particular image
for a moment on
this particular slide.
You can see I image
where the task is.
Should I accept a new job offer?
Or not.
Alright, so you have to decide that. For that, what you did: you created a decision tree, starting with the base condition, or the root node, which was that the basic salary, or the minimum salary, should be $50,000. If it is not $50,000, then you are not accepting the offer at all. Alright. So if your salary is greater than $50,000, then you will further check whether the commute is more than one hour or not. If it is more than one hour, you will just decline the offer; if it is less than one hour, then you are getting closer to accepting the job offer. Further, what you will do: you will check whether the company is offering free coffee or not. Right, if the company is not offering the free coffee, then you will just decline the offer, and if it is offering the free coffee, then yeah, you will happily accept the offer. Right, this was just an example of a decision tree.
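Just to make that concrete, the same job-offer tree can be written as three nested checks in Python. This is only a sketch of the example above; the accept_offer name and the exact values are illustrative.

def accept_offer(salary, commute_hours, free_coffee):
    # Root node: the base condition on minimum salary
    if salary < 50000:
        return "decline offer"
    # Second split: commute time
    if commute_hours > 1:
        return "decline offer"
    # Third split: free coffee
    if not free_coffee:
        return "decline offer"
    return "accept offer"  # leaf: all conditions satisfied

print(accept_offer(60000, 0.5, True))   # accept offer
print(accept_offer(60000, 2.0, True))   # decline offer (commute too long)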
Now, let's move ahead and understand a decision tree in depth. Well, here is a sample data set that I will be using to explain the decision tree. Alright, in this data set, each row is an example; the first two columns provide features, or attributes, that describe the data, and the last column gives the label, or the class, we want to predict. And if you like, you can just modify this data by adding additional features and more examples, and our program will work in exactly the same way, fine.
Now, this data set is pretty straightforward except for one thing. I hope you have noticed that it is not perfectly separable. Let me tell you something more about that: the second and fifth examples have the same features but different labels. Both have yellow as the colour and a diameter of three, but the labels are mango and lemon, right? Let's move on and see how our decision tree handles this case.
Alright, in order to build the tree, we'll use a decision tree algorithm called CART. This CART algorithm stands for Classification And Regression Tree. Alright, let's see a preview of how it works.
Alright, to begin with, we'll add a root node for the tree. All the nodes receive a list of rows as input, and the root will receive the entire training data set. Now, each node will ask a true-or-false question about one of the features, and in response to that question we will split, or partition, the data set into two different subsets. These subsets then become the input to the two child nodes we add to the tree, and the goal of the questions is to finally unmix the labels as we proceed down, or in other words, to produce the purest possible distribution of the labels at each node. For example, the input of this node contains only one single type of label, so we can say that it's perfectly unmixed; there is no uncertainty about the type of label, as it consists of only grapes, right? On the other hand, the labels in this node are still mixed up, so we would ask another question to further drill it down. Right, but before that, we need to understand which question to ask and when, and to do that we need to quantify by how much a question helps to unmix the labels. We can quantify the amount of uncertainty at a single node using a metric called Gini impurity, and we can quantify how much a question reduces that uncertainty using a concept called Information Gain. We'll use these to select the best question to ask at each point. And then, what we'll do, we'll iterate the steps: we'll recursively build the tree on each of the new nodes and continue dividing the data until there are no further questions to ask, and finally we reach a leaf.
Alright, alright.
So this was about decision tree.
So, in order to create a decision tree, first of all, what you have to do is identify the different set of questions that you can ask of the tree, like 'is this color green?'. And what will these questions be? These questions will be decided by your data set, like: is the color green, is the diameter greater than or equal to 3, is the color yellow. Right, the questions resemble your data set, remember that? Alright. So if my question is 'is the color green?', then what it will do, it will divide the data into two parts: the green mango will be on the true side, while on the false side we have the rest, like the lemon and the grapes. Alright, so the questions are: is the color green, is the diameter greater than or equal to 3, is the color yellow. Now, let's look at some decision tree terminologies.
So, starting with the root node: the root node is the base node of a tree; the entire tree starts from the root node. In other words, it is the first node of the tree. It represents the entire population or sample, and this entire population is further segregated, or divided, into two or more homogeneous sets, fine.
Next is the leaf node. Well, a leaf node is the one you reach at the end of the tree, right? That is, it cannot be further segregated down to any other level; that is the leaf node.
Next is splitting. Splitting is dividing your root node, or a node, into different sub-parts on the basis of some condition. Alright, then comes the branch, or the sub-tree. Well, this branch or sub-tree gets formed when you split the tree. Suppose when you split a root node: it gets divided into two branches, or two sub-trees.
Right?
Next is the concept of pruning. Well, you can say that pruning is just the opposite of splitting: what we are doing here, we are just removing the sub-nodes of a decision tree. We'll see more about pruning later in this session.
All right, let's move on ahead.
Next is the parent or child node. Well, first of all, the root node is always a parent node, and all other nodes associated with it are known as child nodes. You can understand it in a way that every top node is a parent node, and all the bottom nodes which are derived from a top node are its child nodes: a node derived from another node is a child node, and the node which produces it is the parent node. Simple concept, right?
Now let's use the CART algorithm and design a tree manually. So, first of all, what you will do: you decide which question to ask and when. So how will you do that? Let's first visualize the decision tree which we will be creating manually. But first of all, let's have a look at the data set.
You have Outlook, temperature, humidity and windy as your different attributes, and on the basis of these you have to predict whether you can play or not. So which one among them should you pick first? Answer: determine the best attribute that classifies the training data. Alright. So how will you choose the best attribute, or how does a tree decide where to split, or how will the tree decide its root node?
Well, before we move on and split the tree, there are some terminologies that you should know. Alright, the first being the Gini index. So what is this Gini index? The Gini index is the measure of impurity (or purity) used in building a decision tree in the CART algorithm.
All right.
Next is Information Gain. This Information Gain is the decrease in entropy after a data set is split on the basis of an attribute. Constructing a decision tree is all about finding the attribute that returns the highest Information Gain. Alright, so you will be selecting the node that gives you the highest Information Gain.
Alright, next is reduction in variance. This reduction in variance is an algorithm which is used for continuous target variables, or regression problems: the split with lower variance is selected as the criterion to split the population. See, in general terms, what do you mean by variance? Variance is how much your data is varying, right? So if your data is less impure, or is more pure, then in that case the variation would be less, as all the data is almost similar. So this is also a way of splitting a tree: the split with lower variance is selected as the criterion to split the population.
Alright.
Next is chi-square. It is an algorithm which is used to find out the statistical significance of the differences between the sub-nodes and the parent node. Fine, let's move ahead.
Now.
The main question is: how will you decide the best attribute? For now, just understand that you need to calculate something known as Information Gain. The attribute with the highest Information Gain is considered the best. Yeah, I know your next question might be: what is this Information Gain? But before we move on and see what exactly Information Gain is, let me first introduce you to a term called entropy, because this term will be used in calculating the Information Gain.
Well, entropy is just a metric which measures the impurity of something, or, in other words, you can say it is the first thing to compute before you solve the problem of a decision tree. As I mentioned, it is something about impurity, so let's move on and understand what impurity is. Suppose you have a basket full of apples, and another bowl which is full of labels which all say 'apple'. Now, if you are asked to pick one item from the basket and one from the bowl, then the probability of getting an apple and its correct label is 1, so in this case you can say that the impurity is zero.
All right.
Now, what if there are four different fruits in the basket and four different labels in the bowl? Then the probability of matching the fruit to a label is obviously not one; it's something less than that. Well, it could be possible that I picked a banana from the basket, and when I randomly picked a label from the bowl, it said cherry; any random permutation and combination is possible. So in this case I'd say that the impurity is non-zero. I hope the concept of impurity is clear.
So, coming back to entropy: as I said, entropy is the measure of impurity. From the graph on your left, you can see that when the probability is zero or one, that is, when the data is either highly impure or highly pure, the value of entropy is zero; and when the probability is 0.5, the value of entropy is maximum. Well, what is impurity? Impurity is the degree of randomness: how random the data is. So if the data is completely pure, the randomness equals zero, and if the data is completely of the other class, even in that case the value of entropy will be zero. A question like 'why is the value of entropy maximum at 0.5?' might arise in your mind, right?
So let me discuss that; let me derive it mathematically. As you can see here on the slide, the mathematical formula of entropy is:

Entropy(S) = -P(yes) × log2(P(yes)) - P(no) × log2(P(no))

Let's move on and see what this graph has to say mathematically. Suppose S is our total sample space, and it's divided into two parts: yes and no.
Like in our data set, the result for playing was divided into two parts, yes or no, which we have to predict: either we play or we don't, right? So for that particular case, you can define the formula of entropy as the entropy of the total sample space S equals minus the probability of yes times the log of the probability of yes with base 2, minus the probability of no times the log of the probability of no with base 2, where S is your total sample space, P(yes) is the probability of yes and P(no) is the probability of no. Well, if the number of yes equals the number of no, that is, the probability of yes equals 0.5 (since you have an equal number of yes and no), then in that case the value of entropy will be one; just put the values in.
All right.
Let me just move
to the next slide.
I'll show you this.
Alright, next: if the set contains all yes or all no, that is, the probability for the sample space is either 1 or 0, then in that case the entropy will be equal to 0. Let's see this mathematically, one case at a time.
So let's start with the first condition, where the probability is 0.5. This is our formula for entropy, right? So here's our first case, which we discussed already: when the probability of yes equals the probability of no, that is, in our data set we have an equal number of yes and no. Alright. So the probability of yes equals the probability of no, and that equals 0.5; or, in other words, you can say that yes plus no equals the total sample space. Alright, since the probability is 0.5, when you put the values in the formula you get:

Entropy(S) = -(0.5 × log2(0.5)) - (0.5 × log2(0.5)) = 0.5 + 0.5 = 1

So when you calculate it, you get the entropy of the total sample space as one.
Alright, let's see the next case. What is the next case? Either you have all yes or you have all no. So if you have all yes, let's see the formula: you have all yes and zero no, fine. So the probability of yes equals 1, as yes covers the total sample space, obviously. So when you put that into the formula, you get:

Entropy(S) = -(1 × log2(1)) = 0

since the value of log 1 equals 0, the whole thing results in 0. Similarly for the all-no case: even in that case, you will get the entropy of the total sample space as 0. So this was all about entropy.
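As a quick sanity check, here is a minimal Python sketch of this two-class entropy formula; the entropy function is my own illustration, and it just reproduces the cases worked out above.

import math

def entropy(p_yes):
    # Two-class entropy: -p*log2(p) - (1-p)*log2(1-p), taken as 0 at p = 0 or 1
    result = 0.0
    for p in (p_yes, 1 - p_yes):
        if p > 0:  # a p = 0 term contributes nothing (log2(0) is undefined)
            result -= p * math.log2(p)
    return result

print(entropy(0.5))   # 1.0 -> equal yes and no, maximum uncertainty
print(entropy(1.0))   # 0.0 -> all yes, no uncertainty
print(entropy(0.0))   # 0.0 -> all no, no uncertainty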
All right.
Next: what is Information Gain? Well, Information Gain measures the reduction in entropy; it decides which attribute should be selected as the decision node. If S is our total collection, then:

Information Gain = Entropy(S) - (weighted average) × entropy of each feature

Don't worry, we'll see how to calculate it with an example.
Let's manually build a decision tree for our data set. So here's our data set, which consists of 14 different instances, out of which we have nine yes and five no. Alright, so we have the formula for entropy; just put the values in. Since there are 9 yes, the total probability of yes equals 9/14 and the total probability of no equals 5/14, and when you put in the values and calculate the result, you get the value of entropy as 0.94.
All right.
So this was your first step: compute the entropy for the entire data set. Now, you have to select which of Outlook, temperature, humidity and windy should be the root node. Big question, right? How will I decide that a particular node should be chosen as the base node, on the basis of which I will be creating the entire tree? Let's see.
So you have to do it one by one: you have to calculate the entropy and Information Gain for each of the different nodes. So, starting with Outlook. Outlook has three different parameters: sunny, overcast and rainy. So, first of all, count the number of yes and no in the case of sunny, that is, when it is sunny, how many yes and how many no are there. In total, we have two yes and three no in the case of sunny. In the case of overcast, we have all yes; so if it is overcast, then we will surely go to play, it's like that. Alright, and next, if it is rainy, then the total number of yes equals 3 and the total number of no equals 2, fine. Next, what we do: we calculate the entropy for each feature value. Here we are calculating the entropy when Outlook equals sunny. First of all, we are assuming that Outlook is our root node, and for that we are calculating the Information Gain for it.
All right.
So, in order to calculate the Information Gain, remember the formula: it was the entropy of the total sample space minus the weighted average times the entropy of each feature. Alright. So what we are doing here: we are calculating the entropy of Outlook when it is sunny. The total number of yes when it is sunny is two, and the total number of no is three, fine. So let's put those into the formula: since the probability of yes is 2/5 and the probability of no is 3/5, you get

Entropy(sunny) = -(2/5) × log2(2/5) - (3/5) × log2(3/5) = 0.971

Alright, so you get the entropy of sunny as 0.971, fine.
Next, we will calculate the entropy for overcast. When it is overcast, remember, it was all yes, right? So the probability of yes equals 1, and when you put that in, you get the value of entropy as 0, fine. And when it is rainy: rainy has 3 yes and 2 no, so the probability of yes in the case of rainy is 3/5 and the probability of no in the case of rainy is 2/5, and when you put the probability of yes and the probability of no into the formula, you get the entropy of rainy as 0.971.
Now, you have to calculate how much information you are getting from Outlook, and that uses the weighted average. Alright, so what is this weighted average? It is based on the total number of yes and no. So the information from Outlook starts with 5/14. Where does this 5 come from? We are counting the total number of samples within that particular Outlook value: when it was sunny, there were two yes and three no, alright, so the weight for sunny would be equal to 5/14. Alright, since the formula is the weight times the entropy of each feature value, and as calculated, the entropy for sunny is 0.971, what we'll do is multiply 5/14 by 0.971, right?
Well, this was the calculation for the information when Outlook equals sunny, but Outlook also equals overcast and rainy. In that case, what we'll do: similarly, we'll calculate everything for overcast and rainy. For overcast, the weighted term is 4/14 times its entropy, that is 0; and for rainy it is, in the same way, 5/14 (3 yes and 2 no) times its entropy, that is 0.971. And finally, we'll take the sum of all of them:

Information from Outlook = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.693

Right. Next, we will calculate the Information Gain; what we computed just now was the information taken from Outlook.
Now.
We are calculating what information we gain from Outlook, right? Now, this Information Gain equals the total entropy minus the information that is taken from Outlook. Alright. So the total entropy we had was 0.94, minus the information we took from Outlook, 0.693, so the value of the Information Gain from Outlook comes to 0.247.
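The same arithmetic can be verified in a few lines of Python. This is just a sketch reusing the two-class entropy from above, with the counts (9 yes and 5 no overall; sunny 2/3, overcast 4/0, rainy 3/2) taken from the slides.

import math

def entropy(p):
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

total_entropy = entropy(9 / 14)   # entropy of the whole data set, ~0.94

# (yes, no) counts for each Outlook value: sunny, overcast, rainy
outlook = [(2, 3), (4, 0), (3, 2)]

info = sum((y + n) / 14 * entropy(y / (y + n)) for y, n in outlook)
print(info)                  # ~0.69 -> information taken from Outlook
print(total_entropy - info)  # ~0.247 -> Information Gain for Outlook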
All right.
So next, what do we have to do? Let's assume that windy is our root node. Windy takes two values: false and true. Let's see how many yes and how many no there are in the case of true and false. So when windy is false, it has six yes and two no, and when it is true, it has 3 yes and 3 no. Alright, so let's move ahead and similarly calculate the information taken from windy, and finally calculate the Information Gain from windy.
Alright, so first of all, what we'll do: we'll calculate the entropy of each feature value, starting with windy equals true. In the case of true, we had an equal number of yes and no. Remember the graph: when the probability is 0.5, that is, when the total number of yes equals the total number of no, the entropy equals 1. So we can directly write that the entropy of true for windy is one; as we have already proved, when the probability equals 0.5, the entropy is at its maximum, which equals 1. Alright. Next is the entropy of false for windy. Similarly, just put the probability of yes and no into the formula and calculate the result. Since you have six yes and two no, in total you get the probability of yes as 6/8 and the probability of no as 2/8. Alright, so when you calculate it, you get the entropy of false as 0.811.
Alright, now let's calculate the information from windy. The total information collected from windy equals the information taken when windy equals true plus the information taken when windy equals false. So we'll calculate the weighted average for each of them and then sum them up to finally get the total information taken from windy. In this case, it equals 8/14 multiplied by 0.811, plus 6/14 multiplied by 1. What is this 8? It is the total number of yes and no in the case when windy equals false, right? So when it was false, the total number of yes was 6 and the total number of no was 2, which sums up to 8; alright, that is why the weight comes to 8/14. Similarly, for the information taken when windy equals true: 3 plus 3, that is 3 yes and 3 no, equals 6, divided by the total number of samples, that is 14, times 1, the entropy of true. Alright, so it is:

Information from windy = (8/14) × 0.811 + (6/14) × 1 = 0.892

This is the information taken from windy.
Alright. Now, how much information are you gaining from windy? For that, the total Information Gain from windy equals the total entropy minus the information taken from windy. Alright, that is 0.94 - 0.892, which equals 0.048. So 0.048 is the Information Gain from windy.
Similarly, we calculated it for the rest too. So for Outlook, as you can see, the information was 0.693 and its Information Gain was 0.247. In the case of temperature, the information was around 0.911 and the Information Gain equaled 0.029. In the case of humidity, the Information Gain was 0.152, and in the case of windy, the Information Gain was 0.048.
So what we'll do: we'll select the attribute with the maximum Information Gain, fine. Now we have selected Outlook as our root node, and it is further subdivided into three different parts: sunny, overcast and rainy. In the case of overcast, we have seen that it consists of all yes, so we can consider it a leaf node. But in the case of sunny and rainy, it's doubtful, as they consist of both yes and no, so you need to recalculate things, right? For these nodes, you have to recalculate again and again select the attribute which has the maximum Information Gain. Alright, so this is how your complete tree will look.
All right.
So, let's see when you can play. You can play when Outlook is overcast; alright, in that case you can always play. If the Outlook is sunny, you will drill further down to check the humidity condition. Alright, if the humidity is normal, then you will play; if the humidity is high, then you won't play, right? When the Outlook predicts that it's raining, then you will further check whether it's windy or not. If it is a weak wind, then you will go and play, but if it is a strong wind, then you won't play, right? So this is how your entire decision tree would look at the end.
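Written out as code, that final tree is just three nested checks. A minimal sketch; the play function and the string values are my own naming for the slide's attributes.

def play(outlook, humidity, wind):
    # Root node: Outlook, chosen for its highest Information Gain
    if outlook == "overcast":
        return "play"                                        # all-yes leaf
    if outlook == "sunny":
        return "play" if humidity == "normal" else "don't play"
    if outlook == "rainy":
        return "play" if wind == "weak" else "don't play"

print(play("overcast", "high", "strong"))  # play
print(play("sunny", "high", "weak"))       # don't play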
Now comes the concept of pruning. Say you ask: what should I do to play? Well, you can do pruning; pruning will show you when you will play. So what is this pruning? Well, pruning is nothing but cutting down nodes in order to get the optimal solution. Alright. So what pruning does is reduce complexity. Alright, as you can see on the screen, it is showing only the results for yes, that is, it is showing all the results which say that you can play.
Before we drill down to a practical session, a common question might come to your mind. You might think: are tree-based models better than linear models, right? You can think, if I can use logistic regression for a classification problem and linear regression for a regression problem, then why is there a need to use a tree? Well, many of us have this question in mind, and it's a valid question too.
Well, actually, as I said earlier, you can use any algorithm; it depends on the type of problem you're solving. Let's look at some key factors which will help you decide which algorithm to use and when. So, the first point: if the relationship between the dependent and independent variables is well approximated by a linear model, then linear regression will outperform a tree-based model. Second case: if there is high non-linearity and a complex relationship between the dependent and independent variables, a tree model will outperform a classical regression model. Third case: if you need to build a model which is easy to explain to people, a decision tree model will always do better than a linear model, as decision tree models are simpler to interpret than linear regression.
Alright. Now let's move on ahead and see how you can write a decision tree classifier from scratch in Python using the CART algorithm. Alright, for this I will be using a Jupyter Notebook with Python 3.0 installed on it.
Alright, so let's open Anaconda and the Jupyter Notebook. Where is that? So this is our Anaconda Navigator, and I will directly jump over to Jupyter Notebook and hit the launch button. I guess everyone knows that Jupyter Notebook is a web-based interactive computing notebook environment where you can run your Python code. So my Jupyter Notebook opens on my localhost, and I will be using this Jupyter Notebook in order to write my decision tree classifier using Python. For this decision tree classifier, I have already written the set of codes; let me explain it to you one piece at a time.
So we'll start with initializing our training data set. Here's our sample data set, for which each row is an example; the last column is the label and the first two columns are the features. If you want, you can add some more features and examples for your practice. An interesting fact is that this data set is designed in a way that the second and fifth examples have almost the same features, but they have different labels. Alright, so let's move on and see how the tree handles this case. As you can see here, both of them, the second and the fifth row, have the same features; what differs is just their label, right?
So let's move ahead. This is our training data set. Next, what we are doing: we are adding some column labels; they are used only to print the tree, fine. So what we'll do: we'll add headers to the columns, like the first column is for colour, the second is for diameter and the third is the label column.
Alright, next what we'll do: we'll define a function, unique values, to which we'll pass the rows and a column. What this function will do: it will find the unique values for a column in the data set. There's an example of that: here we are passing the training data as the rows and the column number as 0, so we are finding the unique values in terms of colour. And in this one, since the rows are the training data and the column is 1, we are finding the unique values in terms of diameter, fine.
So this was just an example. Next, what we'll do: we'll define a function, class counts, and we'll pass the rows into it. What it does: it counts the number of examples of each type within the data set; or, in other words, we are counting the unique values for the label in the data set. As a sample, you can see here that we can pass the entire training data set to this particular function, class counts, and what it will do: it will find all the different types of label within the training data set. As you can see here, the unique labels consist of mango, grape and lemon.
So next, what we'll do: we'll define a function, is numeric, and we'll pass a value into it. What it will do: it will just test whether the value is numeric or not, returning true if the value is an integer or a float. For example, you can see is numeric: if we are passing 7, it is an integer, so it will return true, and if we are passing 'Red', it's not a numeric value, right?
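As a rough sketch, the three helpers just described could look like this in Python; this is my own reconstruction of what the walkthrough names, not necessarily the exact code from the description.

# Toy training data in the format described: color, diameter, label
training_data = [
    ["Green", 3, "Mango"],
    ["Yellow", 3, "Mango"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]

def unique_vals(rows, col):
    """Find the unique values for a column in the data set."""
    return set(row[col] for row in rows)

def class_counts(rows):
    """Count how many examples of each label appear in the rows."""
    counts = {}
    for row in rows:
        label = row[-1]
        counts[label] = counts.get(label, 0) + 1
    return counts

def is_numeric(value):
    """True if the value is an int or a float."""
    return isinstance(value, (int, float))

print(unique_vals(training_data, 0))     # {'Green', 'Yellow', 'Red'}, in some order
print(class_counts(training_data))       # {'Mango': 2, 'Grape': 2, 'Lemon': 1}
print(is_numeric(7), is_numeric("Red"))  # True False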
So, moving on ahead, we define a class named Question. What does this Question do? This Question is used to partition the data set. What this class does is just record a column number, for example 0 for colour, alright, and a column value, for example green. Next, what we are doing: we are defining a match method, which is used to compare the feature value in an example to the feature value stored in the question. Let's see how. First of all, we are defining an init function, and inside that we are passing the self, the column and the value as parameters. Next, we define a function, match; what it does is compare the feature value in an example to the feature value in this question. Then we'll define a __repr__ function, which is just a helper method to print the question in a readable format. Next, what we are doing: we are defining a function, partition. Well, this function is used to partition the data set: for each row in the data set, it checks if it matches the question or not; if it does, it adds it to the true rows, and if not, it adds it to the false rows.
Alright, for example, as you can see, it partitions the training data based on whether the rows are red or not. Here we are calling Question and passing a value of zero and 'Red' to it. So what it will do: it will assign all the red rows to true_rows, and everything else will be assigned to false_rows, fine.
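Here is a rough, self-contained sketch of that Question class and the partition function as described; again, my own condensed reconstruction.

class Question:
    """Records a column number (e.g. 0 for color) and a value (e.g. 'Green')."""
    def __init__(self, column, value):
        self.column = column
        self.value = value

    def match(self, example):
        # Compare the feature value in the example to the value in this question
        val = example[self.column]
        if isinstance(val, (int, float)):
            return val >= self.value
        return val == self.value

    def __repr__(self):
        # Helper to print the question in a readable format
        op = ">=" if isinstance(self.value, (int, float)) else "=="
        return f"Is column {self.column} {op} {self.value}?"

def partition(rows, question):
    """Add each row to true_rows if it matches the question, else to false_rows."""
    true_rows, false_rows = [], []
    for row in rows:
        (true_rows if question.match(row) else false_rows).append(row)
    return true_rows, false_rows

rows = [["Green", 3, "Mango"], ["Red", 1, "Grape"], ["Yellow", 3, "Lemon"]]
true_rows, false_rows = partition(rows, Question(0, "Red"))
print(true_rows)    # the red rows
print(false_rows)   # everything else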
Next, what we'll do: we'll define a Gini impurity function, and into that we'll pass a list of rows. What it will do: it will just calculate the Gini impurity for that list of rows. Next, we define a function, Information Gain. What this Information Gain function does is calculate the information gain, using the uncertainty of the starting node minus the weighted impurity of the two child nodes. The next function is find the best split. Well, this function is used to find the best question to ask, by iterating over every feature and value and then calculating the Information Gain.
For a detailed explanation, you can find the code in the description given below.
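And a short sketch of the two measures just mentioned; gini and info_gain here follow the description above (uncertainty of the starting node minus the weighted impurity of the children), while find_best_split is only outlined in a comment.

def gini(rows):
    """Gini impurity for a list of rows."""
    counts = {}
    for row in rows:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    impurity = 1.0
    for label in counts:
        impurity -= (counts[label] / len(rows)) ** 2
    return impurity

def info_gain(left, right, current_uncertainty):
    """Uncertainty of the starting node minus the weighted impurity of the children."""
    p = len(left) / (len(left) + len(right))
    return current_uncertainty - p * gini(left) - (1 - p) * gini(right)

# find_best_split would loop over every feature/value pair, partition the rows
# on that question, and keep the question with the highest information gain.
left = [["Red", 1, "Grape"], ["Red", 1, "Grape"]]
right = [["Green", 3, "Mango"], ["Yellow", 3, "Mango"], ["Yellow", 3, "Lemon"]]
print(info_gain(left, right, gini(left + right)))  # ~0.37 for the "is it red?" split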
Alright, next we'll define a class, Leaf, for classifying the data. It holds a dictionary mapping a class, like mango, to how many times it appears in the rows from the training data that reach this leaf.
Alright, next is the decision node. This decision node is the node that asks a question; it holds a reference to the question and to the two child nodes, and on the basis of it you decide which node to add further to which branch.
Alright, so next, what we are doing: we are defining a function, build tree, and into that we are passing our rows. This is the function that is used to build the tree. Initially, we defined all the various functions that we'll be using in order to build the tree. So we start by partitioning the data set for each unique attribute, then we calculate the information gain, and then we return the question that produces the highest gain, and on the basis of that we split the tree. So what we are doing here: we are partitioning the data set, calculating the Information Gain, and this returns the question that produces the highest gain.
Alright. Now, if the gain equals 0, we return a leaf of the rows. So what does that mean? If you are getting no gain, that is, gain equals 0, then in that case, since no further question could be asked, it will return a leaf, fine. Next, true_rows and false_rows equal the partition of the rows on the question. So if we have reached this position, then we have already found a feature and a value which can be used to partition the data set. Then, what you will do: you will recursively build the true branch, and similarly recursively build the false branch. So we return a decision node, and into that we'll be passing the question, the true branch and the false branch. So what it will do: it will return a question node. This question node records the best feature, or value, to ask about at this point, fine.
Now that we have built our tree, next what we'll do: we'll define a print tree function, which will be used to print the tree, fine. So, finally, what we are doing in this particular function is printing our tree. Next is the classify function, which we'll use to decide whether to follow the true branch or the false branch, comparing the feature values stored in the node to the example we are considering. And last, what we'll do: we'll finally print the prediction at the leaf.
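Putting all the pieces together, here is one condensed, self-contained sketch of the whole classifier as it was just described: best split by Gini impurity and information gain, recursive build, then classification down to a leaf. It is my own reconstruction for illustration, not the exact code from the description.

training_data = [
    ["Green", 3, "Mango"],
    ["Yellow", 3, "Mango"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]

def class_counts(rows):
    counts = {}
    for row in rows:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return counts

class Question:
    def __init__(self, column, value):
        self.column, self.value = column, value
    def match(self, example):
        val = example[self.column]
        return val >= self.value if isinstance(val, (int, float)) else val == self.value

def partition(rows, question):
    true_rows = [r for r in rows if question.match(r)]
    false_rows = [r for r in rows if not question.match(r)]
    return true_rows, false_rows

def gini(rows):
    return 1.0 - sum((c / len(rows)) ** 2 for c in class_counts(rows).values())

def find_best_split(rows):
    # Try every feature/value question and keep the one with the highest gain
    best_gain, best_question = 0.0, None
    current = gini(rows)
    for col in range(len(rows[0]) - 1):           # skip the label column
        for val in set(row[col] for row in rows):
            q = Question(col, val)
            true_rows, false_rows = partition(rows, q)
            if not true_rows or not false_rows:
                continue
            p = len(true_rows) / len(rows)
            gain = current - p * gini(true_rows) - (1 - p) * gini(false_rows)
            if gain > best_gain:
                best_gain, best_question = gain, q
    return best_gain, best_question

class Leaf:
    def __init__(self, rows):
        self.predictions = class_counts(rows)     # e.g. {'Mango': 1, 'Lemon': 1}

class DecisionNode:
    def __init__(self, question, true_branch, false_branch):
        self.question, self.true_branch, self.false_branch = question, true_branch, false_branch

def build_tree(rows):
    gain, question = find_best_split(rows)
    if gain == 0:                                 # no useful question left -> leaf
        return Leaf(rows)
    true_rows, false_rows = partition(rows, question)
    return DecisionNode(question, build_tree(true_rows), build_tree(false_rows))

def classify(row, node):
    if isinstance(node, Leaf):
        return node.predictions
    branch = node.true_branch if node.question.match(row) else node.false_branch
    return classify(row, branch)

tree = build_tree(training_data)
print(classify(["Red", 1], tree))     # {'Grape': 2}
print(classify(["Yellow", 3], tree))  # {'Mango': 1, 'Lemon': 1} -> the unseparable rows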
So let's execute it and see. Okay, so the tree is printed, all the way to the leaves. Now that we have trained our algorithm with our training data set, it's time to test it. So there's our testing data set; let's finally execute it and see what the result is.
So this is the result you will get. The first question asked by the algorithm is: is the diameter greater than or equal to 3? If it is true, then it will further ask if the colour is yellow. Again, if it is true, then it will predict mango with a count of one and lemon with one; and in case it is false, then it will just predict mango. Now, this was the true part. Next, coming to the case where the diameter is not greater than or equal to 3: in that case it's false, and what it will do is just predict grape, fine.
Okay, so this was all about the coding part. Now let's conclude this session, but before concluding, let me just show you one more thing. There's a scikit-learn algorithm cheat sheet which explains which algorithm you should use and when; alright, it's built in a decision tree format. Let's see how it is built.
So, the first condition: it will check whether you have 50 samples or not. If your samples are greater than 50, then we'll move ahead; if it is less than 50, then you need to collect more data. If your sample count is greater than 50, then you have to decide whether you want to predict a category or not. If you want to predict a category, then further you will check whether you have labeled data or not. If you have labeled data, then it would be a classification problem; if you don't have labeled data, then it would be a clustering problem. Now, if you don't want to predict a category, then what? Do you want to predict a quantity? Well, if you want to predict a quantity, then in that case it would be a regression problem. If you don't want to predict a quantity and you just want to keep looking at your data, then in that case you should go for dimensionality reduction; and still, if you don't want to just look and predicting structure is not working, then you have tough luck for that.
I hope this decision tree session clarifies all your doubts about the decision tree algorithm.
Let's begin this tutorial by looking at the topics that we'll be covering today. So, first of all, we'll start straight away by getting a brief introduction to random forest, and then we'll go on to see why we actually need random forest, right? Why not anything else, but random forest specifically. So once we understand its need in the first place, then we'll go on to learn more about what random forest is, and we'll also look at various examples of random forest, so that we get a very clear understanding of it.
Further, we will also delve inside to understand the working of random forest, as to how exactly random forest works. We'll also walk through the random forest algorithm step by step, right, so that you are able to write any piece of code, any domain-specific algorithm, on your own. Now, I personally believe that any learning is really incomplete if it's not put into application, so for its completion we'll also implement random forest in R with a very simple use case, that is, diabetes prevention. So let's get started with the introduction then.
Now, random forest is actually one of the classifiers which is used for solving classification problems. Since some of you might not be really aware of what classification is, let's quickly understand classification first, and then we'll try to relate it to random forest.
So, basically, classification is a machine learning technique in which you already have predefined categories under which you can classify your data. So it's nothing but a supervised learning model, where you already have data on which you can train your machine, right? So your machine actually learns from this data; all that predefined data that you already have actually works as fuel for your machine, right?
So let's take an example: ever wondered how your Gmail gets to know about spam emails and filters them out from the rest of the genuine emails? Any guesses? Alright, I'll give you a hint: try to think along the lines of what it would actually look for. What can be the possible parameters based on which you can decide whether this is a genuine email or this is a spam email? So there are certain parameters that your classifier will actually look for, like the subject line, or the text, or the HTML tags, and also the IP address of the source this mail is coming from. So it will analyze all these variables, and then it will classify the email into the spam or the genuine folder. So let's say, for example, your subject line states keywords like 'mad' or 'cute' or 'pretty' and some other absurd keywords: your classifier is smart enough, and trained in such a manner, that it will get to know, alright, this is a spam email, and it will automatically filter it out from your genuine emails. So that is how your classifier works, basically. So that's pretty much about classification. Now, let's move forward and see what the ways are through which you can actually perform classification.
So we have three classifiers, namely decision tree, random forest and naive Bayes, right? So, speaking briefly about decision tree first: a decision tree actually splits your entire data set into the structure of a tree, and it makes a decision at every node, hence the name decision tree. So no big bang theory, right? You have a certain data set; there are certain nodes, and at each node it will further split into the child nodes, and at each node it will make a decision. So the final decision will be in the form of positive or negative, right?
So let's say, for example, you want to purchase a car, right? So what all will be the parameters? Let's say I want to purchase a car, and I will keep certain parameters in my mind: what exactly is my income, what is my budget, what is the particular brand that I want to go for, what is the mileage of the car, what is the cylinder capacity of the car, and so on and so forth, right? So I'll make my decision based on all these parameters, right? And that is how you make decisions. Further, if you really want to know more about decision tree, as to how it exactly works, you can also check out our decision tree tutorial as well.
So let's come now to random forest. Random forest is an ensemble classifier, actually. Now, let's understand what this word 'ensemble' means. Ensemble methods actually use multiple machine learning algorithms to obtain better predictive performance. So, particularly talking about random forest: random forest uses multiple decision trees for prediction, right? So you are ensembling a lot of decision trees to come up with your final outcome.
As you can also see here in the image, your entire data set is actually split into three subsets, right, and each subset further leads to a particular decision tree. So here you have three decision trees, and each decision tree will lead to a certain outcome. Now, what random forest will do is compile the results from all the decision trees, and then it will lead to a final outcome, right? So it compiles the decisions of all the multiple decision trees. That's all about random forest for now; next, let's see what lies there in naive Bayes, right?
So naive Bayes is a very famous classifier, which is built on a very famous rule called Bayes' theorem. You might have studied Bayes' theorem in your 10th standard as well. So let's just see what Bayes' theorem describes. Bayes' theorem actually describes the probability of an event based on certain prior knowledge of conditions that might be related to the event, right? So, for example, if cancer is related to age, then a person's age can be used to more accurately assess the probability of having cancer than without having the knowledge of age. So if you know the age, then it will come in handy in predicting the occurrence of cancer for a particular person, right? So the outcome of the first event here is actually affecting your final outcome, isn't it? Yeah.
So this is how naive
Bayes classifier actually works.
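For reference, the rule itself is just one line; this is the standard statement of Bayes' theorem, not anything specific to this session:

P(A|B) = P(B|A) × P(A) / P(B)

Here, A could stand for 'the person has cancer' and B for what we know about their age, matching the example above.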
So that was all to give an overview of the naive Bayes classifier, and this was pretty much about the types of classifiers. Now we'll try to find out the answer to this particular question: why do we need random forest? Fine.
So, human beings learn from past experiences; but unlike human beings, a computer does not have experiences. Then how does a machine take decisions? Where does it learn from? Well, a computer system actually learns from data, which represents some past experiences of an application domain.
how random Forest helps
in building up in learning model
with a very simple use case
of credit risk detection.
Now needless to say
that credit card companies
have a very nested
interest in identifying
Financial transactions
that are illegitimate
and criminal in nature.
And also I would like
to mention this point
that according to the Federal
Reserve payment study Americans
used credit cards to pay
for twenty six point
two million purchases in 2012,
and the estimated loss
due to unauthorized transactions
that here was us six point
1 billion dollars now
in the banking industry
measuring risk is very critical
because the stakes are too high.
So the overall goal is
actually to figure out Out
who all can be fraudulent
before too much Financial
damage has been done.
For this, a credit card company receives thousands of applications for new cards, and each application contains information about an applicant, right? So here, as you can see, from all those applications what we can actually figure out are the predictor variables, like: what is the marital status of the person, what is the gender of the person, the age of the person, and the status, which is whether the person is a default payer or a non-default payer. Default payments are basically when payments are not made in time and according to the agreement signed by the cardholder; that account is then said to be in default. So you can easily figure out the history of a particular cardholder from this. Then we can also look at the timing of payments, whether he has been a regular payer or not a regular one, what the source of income for that particular person is, and so on and so forth. So, to minimize loss, the bank actually needs certain decision rules to predict whether to approve a particular person's loan or not.
Now, here is where random forest actually comes into the picture. Let's see how random forest can actually help us in this particular scenario. We have randomly taken two parameters out of all the predictor variables that we saw previously: the first one is the income and the second one is the age, right? And similarly, in parallel, two decision trees have been implemented upon those predictor variables. Let's first take the case of the income variable, right? So here we have divided income into three categories: the first one being a person earning over 35,000 dollars, the second from 15 to 35 thousand dollars, and the third one lying in the range of 0 to 15 thousand dollars. Now, if a person is earning over $35,000, which is a pretty good, pretty decent income, then we'll check the credit history. Here, the probability is that if a person is earning a good amount, then there is very low risk that he won't be able to pay back, since he is already earning well. So the chances are that his application for a loan will get approved, right? So there is actually low risk or moderate risk, but there's no real issue of high risk as such; we can approve the applicant's request here.
Now, let's move on and look at the second category, where the person is actually earning from 15 to 35 thousand dollars. Here, the person may or may not pay back, so in such scenarios we'll look at the credit history, as to what his previous history has been. Now, if his previous history has been bad, like he has been a defaulter in his previous transactions, we will definitely not consider approving his request, as he will be in the high-risk category, which is not good for the bank. If the previous history of that particular applicant is really good, then, just to clarify our doubt, we will consider another parameter: debt. Well, if he is already in really high debt, then the risk again increases, and there are chances that he might not repay in the future, so here we will not accept the request of the person having high debt. If the person is in low debt and he has been a good payer in his past history, then there are chances that he will pay back, and we can consider approving the request of this particular applicant.
And let's look at the third category, which is a person earning from 0 to 15 thousand dollars. Now, this is something which actually raises eyebrows, and this person will actually lie in the category of high risk. Alright, so the probability is that his application for a loan would probably get rejected. Now, we'll get one final outcome from this income parameter, right?
Now let us look at our second variable, that is, age, which will lead to the second decision tree. Let us say the person is young, right? So now we will look at whether he is a student. Now, if he is a student, then the chances are high that he won't be able to repay, because he has no earning source, right? So here the risks are too high, and the probability is that his application for a loan will get rejected, fine. Now, if the person is young and he's not a student, then we'll probably go on and look at another variable, that is, bank balance. Now, if the bank balance is less than 5 lakhs, then again the risk rises, and the probability is that his application for a loan will get rejected. If the person is young, is not a student, and his bank balance is greater than 5 lakhs, he's got a pretty good and stable bank balance, so then the probability is that his loan application will get approved.
Now, let us take another scenario: if he's a senior, right? So if he is a senior, we'll probably go and check his credit history. How well has he been in his previous transactions? What kind of a person is he, whether he's a defaulter or a non-defaulter? Now, if he has been an unfair kind of person in his previous transactions, then again the risk rises, and the probability of his application getting rejected actually increases. If he has been an excellent person as per his transactions in the previous history, then here there is the least risk, and the probability is that his application for a loan will get approved.
So now, these two variables, income and age, have led to two different decision trees, right? And these two different decision trees actually led to two different results. Now, what random forest does is actually compile these two different results from these two different decision trees, and then, finally, it will lead to a final outcome. That is how random forest actually works, right? So that is actually the motive of random forest.
Now let us move forward and see what random forest is, right? You can get an idea of the mechanism from the name itself: random forest. A collection of trees is a forest; that's probably why it's called a forest. And here the trees are actually being trained on subsets which are selected at random, and therefore they are called random forests. So a random forest is a collection, or an ensemble, of decision trees, right. Here, a single decision tree is normally built using the whole data set, considering all features; but in a random forest, only a fraction of the number of rows is selected, and that too at random, and a particular number of features, selected at random, are trained upon, and that is how the decision trees are built.
Right. So, similarly, a number of decision trees will be grown, and each decision tree will result in a certain final outcome, and random forest will do nothing but compile the results of all those decision trees to bring up the final result. As you can see in this particular figure, a particular instance has actually gone through three different decision trees, right? So tree one results in a final outcome called class A, and tree two results in class B; similarly, tree three results in class B. So random forest will compile the results of all these decision trees, and it will go by the rule of majority voting. Now, since here two decision trees have actually voted in favor of class B, that is, decision trees two and three, the final outcome will be in favor of class B. And that is how random forest actually works.
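If you want to see this majority voting in action before the R use case later, here is a minimal Python sketch using scikit-learn's RandomForestClassifier on a made-up approval data set; all the numbers and parameters below are only illustrative.

from sklearn.ensemble import RandomForestClassifier

# Toy data: [income in $1000s, age]; label 1 = approve, 0 = reject (made up)
X = [[40, 25], [50, 45], [10, 22], [12, 60], [38, 33], [8, 19]]
y = [1, 1, 0, 0, 1, 0]

# Three trees, each trained on a random subset of rows and features
forest = RandomForestClassifier(n_estimators=3, max_features=1, random_state=42)
forest.fit(X, y)

# Each tree votes on its own; the forest reports the majority class
print([tree.predict([[45, 30]])[0] for tree in forest.estimators_])  # individual votes
print(forest.predict([[45, 30]]))                                    # majority outcome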
Now, one really beautiful thing about this particular algorithm is that it is one of the versatile algorithms, capable of performing both regression as well as classification.
Now, let's try to understand random forest further with a very beautiful example; this is my favorite one. Let's say you want to decide whether to watch Edge of Tomorrow or not, right? In this particular scenario, you have two different ways to act. Either you can just straight away go to your best friend and ask him, hey, should I go for Edge of Tomorrow, will I like this movie? Or you can ask a bunch of your friends, take their opinions into consideration, and then, based on the final results, you can go out and watch Edge of Tomorrow, right? So let's take the first scenario, where you go to your best friend and ask whether you should go out to watch Edge of Tomorrow or not.
So your friend will probably ask you certain questions, the first one here being about the genre. Let's say your friend asks you if you really like adventurous kinds of movies or not. If you say, yes, definitely, I would love to watch an adventure kind of movie, then the probability is that you will like Edge of Tomorrow as well, since Edge of Tomorrow is also a movie of the adventure and sci-fi kind of genre, right? And let's say you do not like the adventure genre; then again, the probability reduces that you would really like Edge of Tomorrow, right? So from here you can come to a certain conclusion, right?
Let's say your best friend puts you in another situation, where he asks you, hey, do you like Emily Blunt? And you say, definitely, I like Emily Blunt. And then he puts another question to you: do you like Emily Blunt being in the main lead? And you say yes. Then again, the probability rises that you will definitely like Edge of Tomorrow as well, because Edge of Tomorrow has Emily Blunt in the main lead cast. And if you say, oh, I do not like Emily Blunt, then again the probability reduces that you would like Edge of Tomorrow, right? So this is one way, where you have one decision tree, and your final outcome, your final decision, will be based on your one decision tree; or, you can say, your final outcome will be based on just one friend.
Now, if you're not really convinced and you want to consider the opinions of your other friends as well, so that you can make a very precise and crisp decision, you go out and approach some other bunch of friends of yours. So let's say you go to three of your friends, and you ask them the same question: would I like to watch Edge of Tomorrow or not? So you go out and approach three of your friends: friend one, friend two, friend three. Now, you will consider each of their answers, and your decision now will depend on the compiled results of all of your three friends, right?
Now here, let's say you go to your first friend and ask him whether you would like to watch Edge of Tomorrow or not, and your first friend puts one question to you: did you like Top Gun? If you say, yes, definitely, I did like the movie Top Gun, then the probability is that you would like Edge of Tomorrow as well, because Top Gun is actually a military action drama, which also casts Tom Cruise. So again the probability rises that, yes, you will like Edge of Tomorrow as well. And if you say, no, I didn't like Top Gun, then again the chances are that you wouldn't like Edge of Tomorrow, right? And then another question that he puts across to you is: do you really like to watch action movies? And you say, yes, I would love to watch them. Then again, the chances are that you would like to watch Edge of Tomorrow. So from friend one you can come to one conclusion; here, since the ratio of liking the movie to not liking it is actually 2 to 1, the final result is that you would like Edge of Tomorrow.
Now you go to your second friend, and you ask the same question. Your second friend asks you: did you like Far and Away, the last time we went out and watched it? And you say, no, I really didn't like Far and Away. Then he would say, then you are definitely going to like Edge of Tomorrow. Why so? Because Far and Away, since most of you might not know it, is of the romance genre; it revolves around a girl and a guy falling in love with each other, and so on. So, since you didn't like that, the probability is that you would like Edge of Tomorrow. Then he asks you another question: did you like Oblivion, and do you really like to watch Tom Cruise? And you say yes. Again, the probability is that you would like to watch Edge of Tomorrow. Why? Because Oblivion, again, is a science fiction movie casting Tom Cruise, full of strange experiences, where Tom Cruise is the savior of mankind. Well, that is the same kind of plot as in Edge of Tomorrow. So here it is a pure yes, that you would like to watch Edge of Tomorrow.
So you get a second decision from your second friend. Now you go to your third friend and ask him. Probably your third friend is not really interested in having any long conversation with you; he just simply asks you, did you like Godzilla? And you say, no, I didn't like Godzilla. So he says, definitely, you wouldn't like Edge of Tomorrow. Why so? Because Godzilla is also actually a science fiction movie from the adventure genre. So now you have got three results from three different decision trees, from three different friends. Now you compile the results of all those friends, and then you make a final call: yes or no, would you like to watch Edge of Tomorrow or not?
So this is a very real-time and very interesting example, where you can actually see random forest in ground reality. Now let us look at various domains where random forest is actually used.
So, because of its diversity, random forest is actually used in various diverse domains: be it banking, be it medicine, be it land use, be it marketing, you name it, and random forest is there. In banking, particularly, random forest is actually being used to make out whether the applicant will be a default payer or a non-default one, so that the bank can accordingly approve or reject the applications for loans, right? So that is how random forest is being used in banking. Talking about medicine: random forest is widely used in the medicine field to predict beforehand what the probability is that a person will actually have a particular disease or not, right? So it's actually used to look at the various disease trends. Let's say you want to figure out what the probability is that a person will have diabetes or not. What would you do? You'd probably look at the medical history of the patient, and then you will see: alright, this has been the glucose concentration, what was the BMI, what were the insulin levels in the patient in the previous three months, what is the age of this particular person. Then you'll make different decision trees based on each one of these predictor variables, finally compile the results of all those variables, and then make a final decision as to whether the person will have diabetes in the near future or not. That is how random forest can be used in the medicine sector. Now, moving on:
random forest is also actually used to figure out land use. For example, I want to set up a particular industry in a certain area. So what would I probably look for? I'd look for: what is the vegetation over there, what is the urban population over there, right, and how much is the distance from the nearest modes of transport, like from the bus station or the railway station. Accordingly, I will split my parameters, I will make a decision on each one of these parameters, and finally I'll compile my decisions on all these parameters, and that will be my final outcome. So that is how I am finally going to predict whether I should put my industry at this particular location or not, right?
So these three examples have actually been of the majorly classification kind of problem, because we are trying to classify; we are actually trying to answer the question 'whether or not', right? Now, let's move forward and look at how marketing is revolving around random forest. Particularly in marketing, we try to identify the customer churn. This is particularly the regression kind of problem. Now how? Let's see. So customer churn is nothing but actually the number of customers who are dropping out, who are going out of your market.
Now you want to identify
what will be your customer churn
in near future.
So you'll most of them
e-commerce Industries are
actually using this
like Amazon Flipkart Etc.
So they particularly look at each behavior of yours, as to what has been your past history, what has been your purchasing history, what you like based on your activity around certain things, around certain ads, around certain discounts, and around certain kinds of material, right? If you like a particular top, your activity will be more around that particular top.
So that is how they track each
and every particular move
of yours and then
they try to predict
whether you will be
moving out or not.
So that is how they identify
the customer churn.
So these all are various domains
where random Forest is used.
And this is not the only list; there are numerous other examples which are actually using random forest, and that is what makes it so special.
Now, let's move
forward and see how random
Forest actually works.
Right.
So let us start with the random
Forest algorithm first.
Let's just see it step
by step as to how random
Forest algorithm works.
So the first step is
to actually select
certain M features from T.
Where m is less than T.
So here T is the total number
of the predictor variables
that you have
in your data set and out of
those total predictor variables.
You will randomly select a few features out of those. Now why are we actually selecting only a few features?
The reason is that if you select all the predictor variables, then each of your decision trees will be the same, so the model is not actually learning something new; it is learning the same previous thing because all those decision trees will be similar, right? But if you actually split your predictor variables and randomly select a few of them, let's say there are 14 variables in total and out of those you randomly pick just three, right?
So every time you will get
a new decision tree,
so there will be
a variety right?
So the classification model
will be actually
much more intelligent
than the previous one.
Now it has got varied experiences, so definitely it will make different decisions each time. And then when you compile all those different decisions, it will be a much more accurate and efficient result, right?
So the first important step
is to select certain number
of features out of all
the features now,
let's move on to
the second step.
Let's say for any node d, the task now is to calculate the best split at that point. So you know how decision trees are actually implemented: you pick up the most significant variable, right? And then you will split that particular node to form the child nodes; that is how the split takes place, right?
So you will do it
for M number of variables
that you've selected.
Let's say you
have selected three
so you will implement the split on all those three nodes in one particular decision tree,
right? The third step is to split the node into two daughter nodes. So you can split your root node into as many nodes as you want to, but here we'll split our node into two daughter nodes, as to this or that, so it will be an answer in terms of this or that. Right,
the fourth step will be to repeat all these three steps that we've done previously, and we'll repeat all this splitting until we have reached the n number of nodes, right? So we need to repeat until we have reached the leaf nodes of the decision tree; that is how we will do it. Right, now after these four steps,
We will have
our one decision tree.
But random forest is actually about multiple decision trees. So here our fifth step will come into the picture, which will actually repeat all these previous steps D number of times, where D is the number of decision trees. Let's say I want to implement five decision trees; so my fifth step will be to implement all the previous steps five times, so here the iteration runs five times. Right, now once I have created these five decision trees, still my task is not complete yet.
Now.
My final task will be
to compile the results
of all these five
different decision trees
and I will make a call based on the majority voting. Right, here, as you can see in this picture, I had n different instances, then I created n different decision trees.
And finally,
I will compile the result of all
these n different decision trees
and I will take my call
on the majority voting right.
So whatever my majority vote
says It will be my final result.
So this is basically an overview of the random forest algorithm and how it actually works.
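To make those five steps concrete, here is a minimal sketch in Python, assuming scikit-learn and NumPy are installed; the data set (Iris) and the counts m = 2 and D = 5 are illustrative choices, not taken from the session.

```python
# A minimal sketch of the five steps, not a production implementation.
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

D, m = 5, 2                      # D decision trees, m features picked out of T
T = X.shape[1]                   # T = total number of predictor variables
trees, feature_sets = [], []

for _ in range(D):               # step 5: repeat the tree-building D times
    feats = rng.choice(T, size=m, replace=False)   # step 1: m random features
    rows = rng.integers(0, len(X), size=len(X))    # a bootstrap sample of rows
    tree = DecisionTreeClassifier().fit(X[rows][:, feats], y[rows])  # steps 2-4
    trees.append(tree)
    feature_sets.append(feats)

def predict(x):
    # final call: compile the D votes and take the majority
    votes = [t.predict(x[None, f])[0] for t, f in zip(trees, feature_sets)]
    return Counter(votes).most_common(1)[0][0]

print(predict(X[0]))             # predicted class for the first sample
```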
Let's just have a look
at this example to get
much better understanding
of what we have learnt.
So let's say I have this data set, which consists of four different attributes, right? So basically it consists of the weather information of the previous 14 days, right from D1 till D14. The Outlook, humidity and wind basically give me the weather condition of those 14 days. And finally I have play, which is my target variable: whether a match did take place on that particular day or not, right?
Now.
My main goal is to find out whether the match will actually take place if I have the following weather conditions with me on any particular day. Let's say the outlook is rainy that day, the humidity is high and the wind is weak. So now I need to predict whether I will be able to play the match that day or not. All right.
So this is the problem statement, fine. Now, let's see how random forest is used to sort this out. Now here, the first step is to actually split my entire data set into subsets. Here I have split my entire 14 records into further smaller subsets. Right, now these subsets may or may not overlap, like there is a certain overlap between D1 till D3 and D3 till D6; fine, so there is an overlapping of D3.
So it might happen
that there might be overlapping
so you need not really worry
about the overlapping
but you have to make sure
that all those subsets are
actually different right?
So here I have taken three different subsets: my first subset consists of D1 till D3, my second subset consists of D3 till D6, and my third subset consists of D7 till D9.
Now I will first be focusing on my first subset. Now here, let's say that particular day the outlook was overcast, fine.
If yes, it was overcast, then the probability is that the match will take place.
So overcast is basically
when your weather is too cloudy.
So if that is the condition then
definitely the match will take
place and let's say
it wasn't overcast.
Then you will consider the second most probable option, which will be the wind, and we will make a decision based on this, on whether the wind was weak or strong. If the wind was weak, then you will definitely go out and play the match, else you would not.
So now the final outcome out of this decision tree will be play, because here the ratio between play and no play is 2 is to 1. So we get to a certain decision from our first decision tree.
Now, let us look at the second subset. Since the second subset has different variables, this decision tree is absolutely different from the one we saw in our first subset.
So let's say if it was overcast
then you will play the match.
If it isn't overcast, then you would go and look out for humidity. Now further, it will get split into two: whether it was high or normal. Now we'll take the first case: if the humidity was high and the wind was weak, then you will play the match; else, if the humidity was high but the wind was too strong, then you would not go out and play the match. Right, now
let us look at the second daughter node of humidity: if the humidity was normal and the wind was weak, then you will definitely go out and play the match, else you won't go out and play the match.
So here, if you look at the final result, the ratio of play to no play is 3 is to 2, so again the final outcome is actually play, right? So from the second subset we get the final decision of play. Now,
let us look at our third subset, which consists of D7 till D9. Here, if again the overcast is yes, then you will play the match, else you will go and check out for humidity. And if the humidity is really high, then you won't play the match, else you will play the match. Again the probability of playing the match is yes, because the ratio of play to no play is 2 is to 1, right?
So three different subsets
three different decision trees
three different outcomes
and one final outcome
after compiling all the results
from these three different decision trees. So I hope this gives you a better perspective and understanding of random forest and how it really works.
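If you want to try this subset-and-vote idea in code, here is a hedged sketch using scikit-learn's RandomForestClassifier on the classic 14-day play-tennis table; the rows below are the textbook version and may differ slightly from the slide.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                 "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                 "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":     ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                 "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "Play":     ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                 "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

X = pd.get_dummies(data[["Outlook", "Humidity", "Wind"]]).astype(int)  # one-hot
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, data["Play"])

# the day from the problem statement: rainy, high humidity, weak wind
query = pd.get_dummies(pd.DataFrame(
    [{"Outlook": "Rain", "Humidity": "High", "Wind": "Weak"}]
)).reindex(columns=X.columns, fill_value=0)
print(model.predict(query))      # the majority vote across all the trees
```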
All right.
So now let's just have a look at the various features of random forest.
So the first
and the foremost feature is
that it is one
of the most accurate
learning algorithms, right?
So why is it so? Because single decision trees are actually prone to having high variance or high bias, and on the contrary, random forest averages the entire variance across the decision trees. So let's say the variance is X for a decision tree, but for random forest, let's say we have implemented n number of decision trees in parallel; so my entire variance gets averaged out, and my final variance actually becomes X upon n. That is how the entire variance actually goes down as compared to other algorithms. Right, now the second most important feature is that it works well for both classification and regression problems, and by far this is the one and only algorithm I have come across which works equally well for both of them, be it a classification kind of problem or a regression kind of problem, right?
Then, it really runs efficiently on large databases, so basically it's really scalable, whether you work with a smaller amount of data or with a really huge volume of data, right?
So that's a very
good part about it.
Then the fourth most
important point is
that it requires almost
no input preparation.
Now, why am I saying this? Because it has got certain implicit methods which actually take care of and remove all the outliers and all the missing data, and you really don't have to worry about all that while you are in the stages of input preparation. So random forest is all here to take care of everything. And next,
it performs implicit feature selection, right? So while we are implementing multiple decision trees, it has got an implicit method which will automatically pick up some random features out of all your parameters, and then it will go on implementing different decision trees. So for example, if you just give one simple command that, all right, I want to implement 500 decision trees, no matter what, random forest will automatically take care of it and it will implement all those 500 decision trees, and all those 500 decision trees will be different from each other. And this is because it has got implicit methods which will automatically collect different parameters by itself out of all the variables that you have. Right,
then, it can be easily grown in parallel. Why is it so? Because we are actually implementing multiple decision trees, and all those decision trees are getting implemented in parallel. So if you say you want a thousand trees to be implemented, all those thousand trees are getting implemented in parallel; that is how the computation time reduces.
Right, and the last point is that it has got methods for balancing errors in unbalanced data sets. Now, what exactly unbalanced data sets are, let me just give you an example of that.
So let's say you're working
on a data set fine
and you create a random
forest model and get
90% accuracy immediately.
Fantastic, you think, right? So now you start diving deep, you go a little deeper, and you discover that ninety percent of that data actually belongs to just one class; then your entire decision is actually biased towards just one particular class. So random forest actually takes care of this thing, and it is really not biased towards any particular decision tree or any particular variable or any class. So it has got methods which look after it and balance the errors in your data set. So that's pretty much about the features of random forest.
K-nearest neighbor is a simple algorithm which uses the entire data set in its training phase. When a prediction is required for unseen data, it searches through the entire training data set for the K most similar instances, and the data with the most similar instances is finally returned as the prediction. So hello and welcome all to this YouTube session; in today's session we'll be dealing with the KNN algorithm. So without any further ado, let's move on and discuss the agenda for today's session.
So we'll start our session with what is KNN, where I'll brief you about the topic, and we'll move ahead to see what its popular use cases are, or how the industry is using KNN for their benefit. Once we are done with it, we will drill down to the working of the algorithm, and while learning the algorithm you will also understand the significance of K, or what this K stands for in the k-nearest neighbor algorithm. Then we'll see how the prediction is made using the KNN algorithm, manually or mathematically.
All right.
Now once we are done with the theoretical concepts, we'll start the practical or the demo session, where we'll learn how to implement the KNN algorithm using Python.
So let's start our session.
So starting with what is the KNN algorithm. Well, k-nearest neighbor is a simple algorithm that stores all the available cases and classifies the new data or case based on a similarity measure. It suggests that if you are similar to your neighbors, then you are one of them, right? For example, if an apple looks more similar to a banana, orange or melon rather than a monkey, rat or cat, then most likely the apple belongs to the group of fruits.
All right.
Well, in general KNN is used in search applications where you are looking for similar items, that is, when your task is some form of "find items similar to this one"; then you call this search a KNN search.
But what is this K in KNN? Well, this K denotes the number of nearest neighbors which are voting on the class of the new data, or the testing data. For example, if K equals 1, then the testing data is given the same label as the closest example in the training set. Similarly, if K equals 3, the labels of the three closest classes are checked and the most common label is assigned to the testing data. So this is what the K in the KNN algorithm means. So moving on ahead,
Let's see some
of the example of scenarios
where KN is used
in the industry.
So, let's see
the industrial application
of KNN algorithm starting
with recommender system.
Well, the biggest use case of KNN search is a recommender system. This recommender system is like an automated form of a shop counter guy: when you ask him for a product, he not only shows you the product but also suggests or displays a relevant set of products which are related to the item you're already interested in buying. This KNN algorithm applies to recommending products, like on Amazon, or for recommending media, like in the case of Netflix, or even for recommending advertisements to display to a user. If I'm not wrong, almost all of you must have used Amazon for shopping, right?
So just to tell you
more than 35% of
amazon.com revenue is generated
by its recommendation engine.
So what's the strategy? Amazon uses recommendation as a targeted marketing tool in both its email campaigns and on most of its website pages. Amazon will recommend many products from different categories based on what you have browsed, and it will pull those products in front of you which you are likely to buy, like the "frequently bought together" option that comes at the bottom of the product page to tempt you into buying the combo. Well, this recommendation has just one main goal: to increase the average order value, or to upsell and cross-sell customers by providing product suggestions based on the items in the shopping cart or based on the product they're currently looking at on site.
So next industrial
application of KNN
algorithm is concept search
or searching semantically
similar documents
and classifying documents
containing similar topics.
So as you know,
the data on the Internet
is increasing exponentially
every single second.
There are billions and billions of documents on the internet, and each document on the internet contains multiple words that could be a potential concept. Now, this is a situation where the main problem is to extract concepts from a set of documents, as each page could have thousands of combinations that could be potential concepts; an average document could have millions of concepts, combined with the vast amount of data on the web. Well, we are talking about an enormous amount of data sets and samples. So what we need is to find a concept from this enormous amount of data sets and samples, right? So for this purpose, we will be using the KNN algorithm. More advanced examples could include handwriting detection like OCR, or image recognition, or even video recognition.
So now that you know
various use cases
of KNN algorithm.
Let's proceed and see
how does it work.
So how does
a KNN algorithm work?
Let's start by plotting these blue and orange points on our graph. These blue points belong to class A, and the orange ones belong to class B. Now you get a star as a new point, and your task is to predict whether this new point belongs to class A or to class B. So to start the prediction, the very first thing that you have to do is select the value of K. Just as I told you, the K in the KNN algorithm refers to the number of nearest neighbors that you want to select.
For example, in this case K equals 3. So what does it mean? It means that I am selecting the three points which are at the least distance from the new point, or you can say I am selecting three different points which are closest to the star. Well, at this point of time you can ask, how will you calculate the least distance? So once you calculate the distance, you will get one blue and two orange points which are closest to this star. Now, since in this case we have a majority of orange points,
you can say that for K equal 3 the star belongs to class B, or you can say that the star is more similar to the orange points. Moving on ahead, well, what if K equals 6? Well, for this case,
you have to look
for six different points
which are closest to this star.
So in this case
after calculating the distance,
we find that we have
four blue points
and two orange points which are closest to the star. Now,
as you can see, the blue points are in the majority, so you can say that for K equals 6 this star belongs to class A, or the star is more similar to the blue points.
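That counting step is just a majority vote; here is a tiny sketch of it, where the six labels simply mirror the picture for K = 6.

```python
from collections import Counter

# labels of the 6 nearest points: 4 blue (class A) vs 2 orange (class B)
nearest_labels = ["A", "A", "A", "A", "B", "B"]
prediction = Counter(nearest_labels).most_common(1)[0][0]
print(prediction)    # -> 'A'
```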
So by now, I guess you know how the KNN algorithm works and what the significance of K in the KNN algorithm is.
So how will you choose the value of K? Keep in mind that K is the most important parameter in the KNN algorithm. So let's see: when you build a k-nearest neighbor classifier, how will you choose a value of K? Well, you might have a specific value of K in mind, or you could divide up your data and use something like the cross-validation technique to test several values of K in order to determine which works best for your data. For example, if n equals 2,000 cases, then the optimal value of K may lie somewhere in between 1 to 19. But yes, unless you try it, you cannot be sure of it.
So, you know how the algorithm
is working on a higher level.
Let's move on and see
how things are predicted
using KNN algorithm.
Remember I told you
the KNN algorithm uses
the least distance measure
in order to find
its nearest neighbors.
So, let's see how
these distance is calculated.
Well, there are
several distance measure
which can be used.
So to start with Will mainly
focus on euclidean distance
and Manhattan distance
in this session.
So what is this Euclidean distance? Well, the Euclidean distance is defined as the square root of the sum of the squared differences between a new point X and an existing point Y. So for example, here we have points P1 and P2: point P1 is (1, 1) and point P2 is (5, 4). So what is the Euclidean distance between both of them? You can see that the Euclidean distance is the direct distance between two points. So the distance between points P1 and P2 can be calculated as (5 minus 1) whole square plus (4 minus 1) whole square, and we take the square root of it, which results in 5.
So next is the Manhattan distance. Well, the Manhattan distance is used to calculate the distance between real vectors using the sum of their absolute differences. In this case, the Manhattan distance between the points P1 and P2 is mod of (5 minus 1) plus mod of (4 minus 1), which results in 4 plus 3, that is 7. So this slide shows the difference between the Euclidean and Manhattan distance from point A to point B.
So the Euclidean distance is nothing but the direct or the least possible distance between A and B, whereas the Manhattan distance is the distance between A and B measured along the axes at right angles.
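A quick check of both distances on the points from the example, P1 = (1, 1) and P2 = (5, 4):

```python
import math

def euclidean(p, q):
    # square root of the sum of squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # sum of absolute differences, measured along the axes
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((1, 1), (5, 4)))   # 5.0
print(manhattan((1, 1), (5, 4)))   # 7
```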
Let's take an example and see how things are predicted using the KNN algorithm, or how the KNN algorithm is working.
Suppose we have a data set
which consists of height weight
and T-shirt size
of some customers.
Now when a new customer comes, we only have his height and weight as the information. Our task is to predict the T-shirt size of that particular customer, so for this we'll be using the KNN algorithm.
So the very first thing
what we need to do,
we need to calculate
the euclidean distance.
So now we have a new data point of height 161 cm and weight 61 kg. So the very first thing that we'll do is calculate the Euclidean distance, which is nothing but the square root of (161 minus 158) whole square plus (61 minus 58) whole square, and the square root of that is 4.24.
Let's drag and drop it.
So these are the various Euclidean distances of the other points.
Now let's suppose K equals 5. Then what the algorithm does is search for the five customers closest to the new customer, that is, most similar to the new data in terms of its attributes. For K equal 5, let's find the top five minimum Euclidean distances. So these are the distances which we are going to use: one, two, three, four and five. So let's rank them in order: this is first, this is second, this is third, then this one is fourth, and again this one is fifth. So there is our order. So for K equal 5 we have four t-shirts which come under size M and one t-shirt which comes under size L, so obviously the best guess, or the best prediction, for the T-shirt size of height 161 cm and weight 61 kg is M. Or you can say that the new customer fits into size M.
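For reference, the same kind of prediction in a few lines of scikit-learn; the handful of (height, weight) rows here are made up to stand in for the customer table on the slide.

```python
from sklearn.neighbors import KNeighborsClassifier

# illustrative (height in cm, weight in kg) -> T-shirt size rows
X = [[158, 58], [158, 59], [160, 60], [163, 61], [165, 62],
     [168, 63], [170, 64], [173, 66], [175, 68], [178, 70]]
y = ["M", "M", "M", "M", "L", "L", "L", "L", "L", "L"]

model = KNeighborsClassifier(n_neighbors=5)   # k = 5, Euclidean by default
model.fit(X, y)
print(model.predict([[161, 61]]))             # size for the new customer -> 'M'
```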
Well, this was all about the theoretical session, but before we drill down to the coding part, let me just tell you why people call KNN a lazy learner.
Well, KNN for classification is a very simple algorithm, but that's not why it is called lazy. KNN is a lazy learner because it doesn't learn a discriminative function from the training data. What it does is memorize the training data; there is no learning phase of the model, and all of the work happens at the time your prediction is requested. So as such, that's the reason why KNN is often referred to as a lazy learning algorithm.
So this was all about our theoretical session; now let's move on to the coding part. For the practical implementation, or the hands-on part, I'll be using the Iris data set.
So this data set consists of 150 observations. We have four features and one class label; the four features include the sepal length, sepal width, petal length and the petal width, whereas the class label decides which flower belongs to which category.
So this was the description of the data set which we are using. Now let's move on and see the step-by-step solution to perform the KNN algorithm.
So first we'll start by handling the data: we have to open the data set from the CSV format and split the data set into train and test parts. Next we'll take the similarity, where we have to calculate the distance between two data instances. Once we calculate the distance, next we'll look for the neighbors and select the K neighbors which have the least distance from the new point. Now once we get our neighbors, then we'll generate a response from a set of data instances; this will decide whether the new point belongs to class A or class B. Finally, we'll create the accuracy function, and in the end we'll tie it all together in the main function.
So let's start with our code for implementing the KNN algorithm using Python. I'll be using a Jupyter notebook with Python 3.0 installed on it. Now, let's move on and see how the KNN algorithm can be implemented using Python.
So there's my Jupyter notebook, which is a web-based interactive computing notebook environment with Python 3.0 installed on it. So it's launching; there's our Jupyter notebook, and we'll be writing our Python code on it.
So the first thing that we need to do is load our file. Our data is in CSV format without a header line or any quotes; we can open the file with the open function and read the data lines using the reader function in the CSV module. So let's write the code to load our data file.
Let's execute the Run button.
So once you execute the Run button, you can see the entire training data set as the output. Next, we need to split the data into a training data set that KNN can use to make predictions, and a test data set that we can use to evaluate the accuracy of the model. So we first need to convert the flower measures that were loaded as strings into numbers that we can work with. Next, we need to split the data set randomly in a train-to-test ratio of 67 to 33, which is a standard ratio used for this purpose.
So let's define a function called loadDataset that loads a CSV with the provided filename and splits it randomly into training and test data sets using the provided split ratio. So this is our function loadDataset, which takes the filename, the split ratio, the training data set and the testing data set as its inputs. All right.
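A sketch of that loadDataset function, assuming the Iris CSV layout of four numeric features plus a class label per row:

```python
import csv
import random

def loadDataset(filename, split, trainingSet, testSet):
    with open(filename, 'r') as csvfile:
        dataset = list(csv.reader(csvfile))
    for row in dataset:
        for i in range(4):
            row[i] = float(row[i])        # convert the measures to numbers
        if random.random() < split:       # randomly assign each row
            trainingSet.append(row)
        else:
            testSet.append(row)
```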
So let's execute the Run button
and check for any errors.
So it's executed
with zero errors.
Let's test this function. So there are our training set, testing set and loadDataset. So this is our function loadDataset, and into it we are passing our iris data file with a split ratio of 0.66, plus the training data set and the test data set. Let's see what it divides our training data set and test data set into. So it's giving a count of the training data set and the testing data set: the total number of training records it has split into is 97, and the total number of test records we have is 53.
All right.
Okay.
So our function loadDataset is performing well. So let's move on to step two, which is similarity. In order to make a prediction, we need to calculate the similarity between any two given data instances. This is needed so that we can locate the K most similar data instances in the training data set and in turn make a prediction. Given that all four flower measurements are numeric and have the same unit, we can directly use the Euclidean distance measure: this is nothing but the square root of the sum of the squared differences between two arrays of numbers. Additionally, we want to control which fields to include in the distance calculation; specifically, we only want to include the first four attributes. So our approach will be to limit the Euclidean distance to a fixed length.
All right.
So let's define our Euclidean function. This is our euclideanDistance function, which takes instance1, instance2 and length as parameters. instance1 and instance2 are the two points between which you want to calculate the Euclidean distance, whereas this length denotes how many attributes you want to include. Okay, so there's our Euclidean function.
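A sketch of that function, under the same names used in the session:

```python
import math

def euclideanDistance(instance1, instance2, length):
    # sum the squared differences over the first `length` attributes only
    distance = 0
    for x in range(length):
        distance += (instance1[x] - instance2[x]) ** 2
    return math.sqrt(distance)
```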
Let's execute it.
It's executing fine
without any errors.
Let's test the function. Suppose data1, or the first instance, consists of the data points 2, 2, 2 and belongs to class a, and data2 consists of 4, 4, 4 and belongs to class b. So when we calculate the Euclidean distance from data1 to data2, we have to consider only their first three features. All right, so let's print the distance: as you can see here, the distance comes out to be 3.464. This distance is nothing but the Euclidean distance, and it is calculated as the square root of (4 minus 2) whole square plus (4 minus 2) whole square plus (4 minus 2) whole square, that is nothing but 3 times (4 minus 2) whole square, which is 12, and the square root of 12 is nothing but 3.464. All right.
So now that we have calculated the distance, we need to look for the K nearest neighbors. Now that we have a similarity measure, we can use it to collect the K most similar instances for a given unseen instance. Well, this is a straightforward process of calculating the distance for all the instances and selecting a subset with the smallest distance values. So for that we will be defining a function called getNeighbors; what it will do is return the K most similar neighbors from the training set for a given test instance.
All right.
So this is how our getNeighbors function looks: it takes the training data set, the test instance and K as its inputs. Here the K is nothing but the number of nearest neighbors you want to check for. All right, so basically what you'll be getting from this getNeighbors function is K different points having the least Euclidean distance from the test instance.
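A sketch of getNeighbors along those lines, reusing the euclideanDistance function from above:

```python
import operator

def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance) - 1                 # skip the class label
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))     # closest first
    return [distances[x][0] for x in range(k)]     # the k nearest rows
```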
All right, let's execute it.
So the function executed
without any errors.
So let's test our function.
Suppose the training data set includes data like 2, 2, 2, which belongs to class a, and other data includes 4, 4, 4, which belongs to class b, and our testing instance is 5, 5, 5. Now we have to predict whether this test instance belongs to class a or to class b. All right, for K equal 1 we have to find its nearest neighbor and predict whether this test instance will belong to class a or class b. All right, so let's execute the Run button. On executing the Run button you can see that our output is 4, 4, 4 and b, because the new instance 5, 5, 5 is closest to the point 4, 4, 4, which belongs to class b.
All right.
Now once you have located the most similar neighbors for a test instance, the next task is to predict a response based on those neighbors. So how can we do that? Well, we can do this by allowing each neighbor to vote for its class attribute and taking the majority vote as the prediction. Let's see how we can do that. So we have a function called getResponse which takes neighbors as the input. Well, this neighbors argument is nothing but the output of the getNeighbors function; the output of the getNeighbors function will be fed to getResponse.
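A sketch of getResponse, where each neighbor votes with its class label (the last value in the row) and the majority wins:

```python
import operator

def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]                # the neighbor's class label
        classVotes[response] = classVotes.get(response, 0) + 1
    sortedVotes = sorted(classVotes.items(),
                         key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]                       # the majority vote
```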
All right, let's execute
the Run button.
It's executed.
Let's move ahead and test our function getResponse. So we have neighbors as 1, 1, 1, which belongs to class a; 2, 2, 2, which belongs to class a; and 3, 3, 3, which belongs to class b. So this response variable will store the value of getResponse when we pass in these neighbor values. All right, so what we want to check is whether that test instance 5, 5, 5 belongs to class a or class b when the neighbors are 1, 1, 1 (a), 2, 2, 2 (a) and 3, 3, 3 (b). So let's check our response. Now
that we have created all the different functions which are required for the KNN algorithm, an important concern is how you evaluate the accuracy of the prediction. An easy way to evaluate the accuracy of the model is to calculate the ratio of the total correct predictions to all the predictions made. So for this I will be defining a function called getAccuracy, and inside it I'll be passing my test data set and the predictions.
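A sketch of getAccuracy, assuming the class label is the last value of each test row:

```python
def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:   # compare label vs prediction
            correct += 1
    return (correct / float(len(testSet))) * 100.0
```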
Let's check it: it executed without any error. Let's check it for a sample data set. So we have our test data set as 1, 1, 1, which belongs to class a; 2, 2, 2, which again belongs to class a; and 3, 3, 3, which belongs to class b. And my predictions are: for the first test data it predicted that it belongs to class a, which is true; for the next it predicted that it belongs to class a, which is again true; and for the next it again predicted that it belongs to class a, which is false in this case, because the test data belongs to class b. So in total we have two correct predictions out of three. All right, so the ratio will be 2 by 3, which is nothing but 66.6 percent. So our accuracy rate is 66.6 percent.
So now that you have created all the functions that are required for the KNN algorithm, let's compile them into one single main function. Alright, so this is our main function, and we are using the Iris data set with a split of 0.67 and the value of K as 3.
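Tying it together, the main function might look like this sketch, relying on the helper functions sketched above (same 0.67 split and K = 3):

```python
def main():
    trainingSet, testSet = [], []
    loadDataset('iris.data', 0.67, trainingSet, testSet)
    print('Train set:', len(trainingSet))
    print('Test set:', len(testSet))
    predictions = []
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], 3)
        result = getResponse(neighbors)
        predictions.append(result)
        print('predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))
    print('Accuracy:', getAccuracy(testSet, predictions), '%')

main()
```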
Let's see what the accuracy score of this is and check how accurate our model is. So in the training data set we have 113 values, and in the test data set we have 37 values. These are the predicted and the actual values of the output. Okay, so in total we got an accuracy of ninety seven point two nine percent, which is really very good.
Alright, so I hope the concept of this KNN algorithm is clear. In a world full of machine learning and artificial intelligence surrounding almost everything around us, classification and prediction is one of the most important aspects of machine learning. So before moving forward, let's have a look at the agenda.
I'll start off this video by explaining to you guys what exactly naive Bayes is; then we'll understand what Bayes theorem is, which serves as the logic behind the naive Bayes algorithm. Moving forward, I'll explain the steps involved in the naive Bayes algorithm one by one, and finally I'll finish off this video with a demo on naive Bayes using the sklearn package. Now, naive Bayes is a simple but surprisingly powerful algorithm for predictive analysis. It is a classification technique based on Bayes theorem with an assumption
of independence among predictors. It comprises two parts, which are naive and Bayes. In simple terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability of whether a fruit is an apple or an orange or a banana, and that is why it is known as naive. Now, a naive Bayes model is easy to build and particularly useful for very large data sets. In probability theory and statistics, Bayes theorem, which is also known as Bayes law or Bayes rule, describes the probability of an event based on prior knowledge of the conditions that might be related to the event. Now, Bayes theorem is a way to figure out conditional probability.
The conditional probability is the probability of an event happening given that it has some relationship with one or more other events. For example, your probability of getting a parking space is connected to the time of day you park, where you park, and what conventions are going on at that time. Bayes theorem is slightly more nuanced; in a nutshell, it gives you the actual probability of an event given information about the tests.
Now, if you look at the definition of Bayes theorem, we can see that given a hypothesis H and the evidence e, Bayes theorem states that the relationship between the probability of the hypothesis before getting the evidence, which is P(H), and the probability of the hypothesis after getting the evidence, which is P(H given e), is defined as the probability of e given H into the probability of H, divided by the probability of e. It's rather confusing, right?
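Written out as a formula, that relationship is:

$$P(H \mid e) = \frac{P(e \mid H)\, P(H)}{P(e)}$$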
So let's take an example
to understand this theorem.
So suppose I have
a deck of cards and
if a single card is drawn
from the deck of playing cards,
the probability that the card
is a king is for by 52
since there are four Kings
in a standard deck of 52 cards.
Now if King is an event,
this card is a king.
The probability of King
is given as 4 by 52
that is equal to 1 by 13.
Now if the evidence is provided
for instance someone
looks Such as the card
that the single card is
a face card the probability
of King given
that it's a face
can be calculated
using the base theorem
by this formula.
Now since every King
is also a face card
the probability of face given
that it's a king is equal to 1
and since there are
three phase cards in each suit.
That is the chat king and queen.
The probability of the face card
is equal to 12 by 52.
That is 3 by 30.
Now using Bayes theorem, we can find out the probability of king given that it's a face card. So our final answer comes to 1 by 3, which is also true: if you have a deck of cards which has only face cards, there are three types of faces, which are the jack, king and queen, so the probability that it's the king is 1 by 3. Now this is a simple example of how Bayes theorem works. Now let's look at the proof, as in how this Bayes theorem evolved.
So here we have the probability of A given B and the probability of B given A. Now for a joint probability distribution over the sets A and B, the conditional probability of A given B is defined as the probability of A intersection B divided by the probability of B, and similarly the probability of B given A is defined as the probability of B intersection A divided by the probability of A. Now we equate the probability of A intersection B and the probability of B intersection A, as both are the same thing. From this, as you can see, we get our final Bayes theorem proof, which is: the probability of A given B equals the probability of B given A into the probability of A, divided by the probability of B. Now while this is the equation that applies to any probability distribution over the events A and B,
it has a particularly nice interpretation in the case where A is represented as the hypothesis H and B is represented as some observed evidence e. In that case the formula is: P of H given e is equal to P of e given H into the probability of H, divided by the probability of e. Now this relates the probability of the hypothesis before getting the evidence, which is P(H), to the probability of the hypothesis after getting the evidence, which is P(H given e). For this reason, P(H) is known as the prior probability, while P(H given e) is known as the posterior probability, and the factor that relates the two is known as the likelihood ratio. Now using these terms, Bayes theorem can be rephrased as: the posterior probability equals the prior probability times the likelihood ratio.
So now that we know the maths involved behind the Bayes theorem, let's see how we can implement it in a real-life scenario. So suppose we have a data set in which we have the outlook, the humidity and the wind, and we need to find out whether we should play or not on that day. So the outlook can be sunny, overcast or rain; the humidity can be high or normal; and the wind is categorized into two phases, which are the weak and the strong winds.
First of all, we will create a frequency table using each attribute of the data set. So the frequency table for the outlook looks like this: we have sunny, overcast and rainy. The frequency table of humidity looks like this, and the frequency table of wind looks like this: we have strong and weak for wind, and high and normal ranges for humidity. So for each frequency table we will generate a likelihood table. Now the likelihood table contains the probabilities; suppose we take sunny, and we take play as yes and no. So the probability of sunny given that we play yes is 3 by 10, which is 0.3, and the probability of the evidence, which is the probability of sunny, is equal to 5 by 14.
Now, these are all the terms which are just generated from the data we have here, and finally the probability of yes is 10 out of 14. So if we look at the likelihood of yes given that it's sunny, we can see using Bayes theorem that it's the probability of sunny given yes into the probability of yes, divided by the probability of sunny. So we have all the values here calculated, and if you put them in our Bayes theorem equation, we get the likelihood of yes as 0.59; similarly the likelihood of no can also be calculated, which here is 0.40. Now similarly,
we are going to create the likelihood tables for both the humidity and the wind. So for humidity, the likelihood of yes given that the humidity is high is equal to 0.42, and the probability of playing no given that the humidity is high is 0.58. Similarly for the wind table, the probability of yes given that the wind is weak is 0.75, and the probability of no given that the wind is weak is 0.25. Now suppose we have a day which has rain, which has high humidity, and the wind is weak.
That's all for that.
We use the base theorem
here again the likelihood
of yes on that day is equal
to the probability
of Outlook rain given
that it's a yes
into probability.
Of humidity given that say yes,
and the probability of
when that is we given
that it's we are playing yes
into the probability of yes,
which equals to zero
point zero one nine
and similarly the likelihood
of know on that day is equal
to zero point zero one six.
Now if we look
at the probability
of yes for that day
of playing we just
need to divide it
with the likelihood
some of both the yes
and no so the probability
of playing tomorrow,
which is yes is .5.
Whereas the probability
of not playing is equal to 0.45.
Now.
This is based upon the data
which we already have with us.
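That last normalization step is easy to verify in a couple of lines, plugging in the two likelihoods computed above:

```python
likelihood_yes = 0.019   # P(rain|yes) * P(high|yes) * P(weak|yes) * P(yes)
likelihood_no = 0.016    # P(rain|no)  * P(high|no)  * P(weak|no)  * P(no)

p_yes = likelihood_yes / (likelihood_yes + likelihood_no)
p_no = likelihood_no / (likelihood_yes + likelihood_no)
print(round(p_yes, 2), round(p_no, 2))   # -> 0.54 and 0.46, i.e. play
```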
So now that you have an idea of what exactly naive Bayes is and how it works, and we have seen how it can be implemented on a particular data set, let's see where it is used in the industry. Let's get started with our first industrial use case, which is news categorization.
Or we can use the term text classification to broaden the spectrum of this algorithm. News on the web is rapidly growing in the era of the information age, where each news site has its own different layout and categorization for grouping news. Now this heterogeneity of layout and categorization cannot always satisfy an individual user's needs; removing this heterogeneity and classifying the news articles according to the user's preference is a formidable task. Companies use web crawlers to extract useful text from the HTML pages of the news articles, and each of these news articles is then tokenized. Now these tokens are nothing but the categories of the news. In order to achieve better classification results, we remove the less significant words, which are the stop words, from the documents or the articles, and then we apply the naive Bayes classifier for classifying the news contents based on the news categories.
Now this is by far one of the best examples of the naive Bayes classifier, which is spam filtering. The naive Bayes classifier is a popular statistical technique for email filtering. It typically uses bag-of-words features to identify the spam emails, an approach commonly used in text classification as well. Now it works by correlating the use of tokens with the spam and non-spam emails, and then the Bayes theorem, which I explained earlier, is used to calculate the probability that an email is or is not spam. So naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to the email needs of an individual user and give low false positive spam detection rates that are generally acceptable to users.
It is one of the oldest ways of doing spam filtering, with its roots in the 1990s. Particular words have particular probabilities of occurring in spam and in legitimate email as well. For instance, most email users will frequently encounter the words lottery or lucky draw in a spam email, but will seldom see them in other emails. The filter doesn't know these probabilities in advance and must first be trained so it can build them up. To train the filter, the user must manually indicate whether a new email is spam or not. For all the words in each training email, the filter will adjust the probability that each word will appear in spam or legitimate email in its database. Now,
after training, the word probabilities, also known as the likelihood functions, are used to compute the probability that an email with a particular set of words in it belongs to either category. Each word in the email contributes to the email's spam probability; this contribution is called the posterior probability and is computed again using Bayes theorem. Then the email's spam probability is computed over all the words in the email, and if the total exceeds a certain threshold, say 95%, the filter will mark the email as spam.
Now, object detection is the process of finding instances of real-world objects, such as faces, bicycles and buildings, in images or video. Object detection algorithms typically use extracted features and learning algorithms to recognize instances of an object category. Here again, naive Bayes plays an important role in the categorization and classification of objects. Now, the medical area has an increasingly voluminous amount of electronic data, which is becoming more and more complicated. The produced medical data has certain characteristics that make its analysis very challenging and attractive as well. Among all the different approaches, naive Bayes is used; it is one of the most effective and efficient classification algorithms and has been successfully applied to many medical problems. An empirical comparison of naive Bayes versus five popular classifiers on medical data sets shows that naive Bayes is well suited for medical applications and has high performance in most of the examined medical problems.
Now, in the past, various statistical methods have been used for modeling in the area of disease diagnosis. These methods require prior assumptions and are less capable of dealing with massive, complicated, nonlinear and dependent data. One of the main advantages of the naive Bayes approach, which is appealing to physicians, is that all the available information is used to explain the decision. This explanation seems to be natural for medical diagnosis and prognosis; that is, it is very close to the way physicians diagnose patients.
Now, weather is one of the most influential factors in our daily life, to an extent that it may affect the economy of a country that depends on occupations like agriculture. Therefore, as a countermeasure to reduce the damage caused by uncertainty in weather behavior, there should be an efficient way to predict the weather. Weather prediction has been a challenging problem in the meteorological department for years; even after the technological and scientific advancements, the accuracy of weather prediction has never been sufficient. Even in the current day, this domain remains a research topic in which scientists and mathematicians are working to produce a model or an algorithm that will accurately predict the weather. Now, a Bayesian approach based model is created here, where posterior probabilities are used to calculate the likelihood of each class label for an input data instance, and the one with the maximum likelihood is considered as the resulting output. Now, earlier
we saw a small implementation of this algorithm as well, where we predicted whether we should play or not based on the data which we had collected earlier. Now, there is a Python library known as scikit-learn; it helps to build a naive Bayes model in Python. There are three types of naive Bayes models under the scikit-learn library. The first one is the Gaussian: it is used in classification and it assumes that the features follow a normal distribution. The next we have is multinomial.
It is used for discrete counts. For example, let's say we have a text classification problem; here we go one step further than Bernoulli trials, and instead of "word occurring in the document", we count how often a word occurs in the document. You can think of it as the number of times an outcome is observed in a given number of trials. And finally we have the Bernoulli type of naive Bayes. The Bernoulli model is useful if your feature vectors are binary, as in a bag-of-words model, where the ones are the words which occur in the document and the zeros are the words which do not occur in the document, respectively. Based on your data set, you can choose any of the discussed models here, which are the Gaussian, the multinomial or the Bernoulli.
So let's understand how this algorithm works and what the different steps are that one can take to create a Bayesian model and use naive Bayes to predict the output. So here, to understand better, we are going to predict the onset of diabetes. Now this problem comprises 768 observations of medical details for Pima Indian patients. The records describe instantaneous measurements taken from the patient, such as the age, the number of times pregnant and the blood workgroup. Now all the patients are women aged 21 and older, all the attributes are numeric, and the units vary from attribute to attribute. Each record has a class value that indicates whether the patient suffered an onset of diabetes within five years of the measurements; otherwise these are classified as zero.
Now, I've broken the whole process down into the following steps. The first step is handling the data, in which we load the data from the CSV file and split it into training and test data sets. The second step is summarizing the data, in which we summarize the properties of the training data set so that we can calculate probabilities and make predictions. The third step is making a particular prediction: we use the summaries of the data set to generate a single prediction, and after that we generate predictions given a test data set and a summarized training data set. Then we evaluate the accuracy of the predictions made for the test data set as the percentage correct out of all the predictions made. And finally, we tie it all together and form our own model of the naive Bayes classifier.
Now, the first thing we need to do is load our data. The data is in the CSV format without a header line or any quotes. We can open the file with the open function and read the data lines using the reader function in the CSV module.
Now, we also need
to convert the attributes
that were loaded as
strings into numbers
so that we can work with them.
So let me show you how this can be implemented. Now for that, you need to install Python on your system and use the Jupyter notebook or the Python shell. Here, I'm using the Anaconda Navigator, which has all the things required to do programming in Python: we have the JupyterLab, we have the notebook, we have the Qt console, and we even have RStudio as well. So what you need to do is just install the Anaconda Navigator; it comes with Python pre-installed also. So the moment you click launch on the Jupyter notebook, it will take you to the Jupyter homepage on your local system, and here you can do programming in Python. So let me just rename it as Pima Indian diabetes.
So first, we need to load the data set. So I'm creating here a function loadCsv; now before that, we need to import the csv, math and random modules. So as you can see, I've created a loadCsv function which will read the Pima Indian diabetes data CSV file using the csv.reader method, and then we are converting every element of that data set into float. Originally all the elements are strings, but we need to convert them into floats for our calculation purposes.
Now next, we need to split the data into a training data set that naive Bayes can use to make predictions and a test data set that we can use to evaluate the accuracy of the model. We need to split the data set randomly into training and testing data sets, usually in the ratio of 70 to 30, but for this example I am going to use 67 and 33. Now, 70 to 30 is a common ratio for testing algorithms, so you can play around with this number.
So this is our split
data set function.
Now the naive Bayes model is comprised of a summary of the data in the training data set; this summary is then used while making predictions. The summary of the training data collected involves the mean and the standard deviation of each attribute, by class value. For example, if there are two class values and seven numerical attributes, then we need a mean and a standard deviation for each of these seven attributes for each class value, which makes 14 attribute summaries. So we can break the preparation of this summary down into the following sub-tasks: separating data by class, calculating the mean, calculating the standard deviation, summarizing the data set, and summarizing attributes by class.
So the first task is to separate the training data set instances by class value so that we can calculate statistics for each class. We can do that by creating a map of each class value to a list of instances that belong to that class, and sorting the entire dataset of instances into the appropriate lists.
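A sketch of separateByClass, assuming the class value (0 or 1 in the Pima Indians data) is the last attribute of each row:

```python
def separateByClass(dataset):
    separated = {}
    for row in dataset:
        classValue = row[-1]                       # last attribute = class
        separated.setdefault(classValue, []).append(row)
    return separated                               # class value -> instances
```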
The separateByClass function does just the same. As you can see, the function assumes that the last attribute is the class value; the function returns a map of class values to lists of data instances. Next,
we need to calculate the mean of each attribute for a class value. Now, the mean is the middle or the central tendency of the data, and we use it as the middle of our Gaussian distribution when calculating the probabilities. So this is our function for the mean. We also need to calculate the standard deviation of each attribute for a class value. The standard deviation is calculated as the square root of the variance, and the variance is calculated as the average of the squared differences of each attribute value from the mean. Now one thing to note here is that we are using the n minus 1 method, which subtracts one from the number of attribute values when calculating the variance.
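Sketches of both statistics; note the n minus 1 in the variance, as just mentioned:

```python
import math

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / float(len(numbers) - 1)
    return math.sqrt(variance)
```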
Now that we have the tools to summarize the data, for a given list of instances we can calculate the mean and the standard deviation for each attribute. The summarize function groups the values for each attribute across our data instances into their own lists so that we can compute the mean and standard deviation values for each attribute. Next comes summarizing attributes by class: we can pull it all together by first separating our training data set into instances grouped by class, then calculating the summaries for each attribute. Now
we are ready to make predictions using the summaries prepared from our training data. Making predictions involves calculating the probability that a given data instance belongs to each class, then selecting the class with the largest probability as the prediction. Now we can divide this whole method into four tasks, which are: calculating the Gaussian probability density function, calculating class probabilities, making a prediction, and then estimating the accuracy. Now to calculate the Gaussian probability density function, we use the Gaussian function to estimate the probability of a given attribute value, given the known mean and standard deviation of the attribute estimated from the training data.
As you can see, the parameters are x, the mean and the standard deviation. Now in the calculateProbability function, we calculate the exponent first, then calculate the main division; this lets us fit the equation nicely into two lines.
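A sketch of calculateProbability, the Gaussian probability density function, exponent first and then the main division:

```python
import math

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent
```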
Now, the next task is calculating the class probabilities. Now that we can calculate the probability of an attribute belonging to a class, we can combine the probabilities of all the attribute values for a data instance and come up with a probability of the entire data instance belonging to the class. So now that we have calculated the class probabilities, it's time to finally make our first prediction. We can calculate the probability of the data instance belonging to each class value, look for the largest probability, and return the associated class. For that we are going to use the predict function, which uses the summaries and the input vector, which is basically all the attribute values being input for a particular label.
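Sketches of those two steps, multiplying the per-attribute Gaussian probabilities per class and then returning the class with the largest product:

```python
def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            m, s = classSummaries[i]    # mean and stdev of attribute i
            probabilities[classValue] *= calculateProbability(inputVector[i], m, s)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    return max(probabilities, key=probabilities.get)   # largest probability wins
```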
Now, finally, we can estimate the accuracy of the model by making predictions for each data instance in our test data. For that we use the getPredictions method. Now, this method is used to calculate the predictions based upon the test data set and the summaries of the training data set.
Now, the predictions
can be compared
to the class values
in our test data set
and classification accuracy
can be calculated as an accuracy ratio between 0 and 100 percent.
Now the get accuracy method will
calculate this accuracy ratio.
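A sketch of those last three pieces, with snake_case names mirroring the narration (predict, getPredictions, getAccuracy):

    def predict(summaries, input_vector):
        probabilities = calculate_class_probabilities(summaries, input_vector)
        # return the class with the largest probability
        return max(probabilities, key=probabilities.get)

    def get_predictions(summaries, test_set):
        return [predict(summaries, row) for row in test_set]

    def get_accuracy(test_set, predictions):
        correct = sum(1 for row, pred in zip(test_set, predictions) if row[-1] == pred)
        return correct / float(len(test_set)) * 100.0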
Now, finally, to sum it all up, we define our main function and call all these methods which we defined earlier, one by one, to get the accuracy of the model which we have created.
So as you can see,
this is our main function
in which we have the file name.
We have defined the split ratio.
We have the data set.
We have the training
and test data set.
We are using the splitDataset method. Next, we are using the summarizeByClass function, and the getPredictions and getAccuracy methods as well.
So guys, as you can see, the output of this one shows that we are splitting the 768 rows into 514 training rows and 254 test rows, and the accuracy of this model is 68%. Now we can play with the amount of training and test data to be used, so we can change the split ratio, say to 70:30 or 80:20, to get a different accuracy. So suppose I change the split ratio from 0.67 to 0.8: as you can see, we get an accuracy of 62 percent. So splitting at 0.67 gave us a better result, which was 68 percent.
So this is how you can implement the Gaussian Naive Bayes classifier. These are the step-by-step methods which you need to follow when implementing the Naive Bayes classifier from scratch. But don't worry, we do not need to write this many lines of code to build a model; this is where scikit-learn comes into the picture. The scikit-learn library has a predefined function for Naive Bayes which condenses all of these lines of code into merely two or three lines.
So, let me just open
another jupyter notebook.
So let me name it sklearn Naive Bayes.
Now here we are going to use
the most famous data set
which is the iris dataset.
Now, the iris flower data
set is a multivariate
data set introduced by
the British statistician
and biologist Ronald Fisher, and based on Fisher's linear discriminant model this data set became a typical test case for many statistical classification techniques in machine learning.
So here we are going to use the GaussianNB model, which is already available in sklearn. As I mentioned earlier, there are three types of Naive Bayes, which are the Gaussian, the multinomial, and the Bernoulli. So here we are going to use the GaussianNB model, which is already present in the sklearn library, that is, the scikit-learn library.
So first of all,
what we need to do is
import the sklearn data sets
and the metrics
and we also need to import GaussianNB. Now, once all these libraries are loaded, we need to load the data set, which is the iris dataset.
Next, what we need to do is fit a Naive Bayes model to this data set. So as you can see, we have very easily defined the model, which is GaussianNB. It contains all the programming which I just showed you earlier: all the methods which take the input, calculate the mean and the standard deviation, separate the data by class, and finally make predictions and calculate the prediction accuracy. All of this comes under the GaussianNB method, which is already present in the sklearn library.
We just need to fit it to the data set which we have. Next, if we print the model, we see that it is the GaussianNB model. The next thing we need to do is make the predictions. So the expected output is dataset.target, and the predicted output is obtained with the model's predict method, where the model we are using is GaussianNB.
Now, to summarize the model we created, we calculate the confusion matrix and the classification report. So guys, as you can see in the classification report, we have a precision of 0.96, we have a recall of 0.96, we have the F1 score and the support, and finally, if we print our confusion matrix, as you can see, it gives us this output.
So as you can see, using the GaussianNB method, just defining the model, fitting it to a particular data set, and getting the desired output is so easy with the scikit-learn library.
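Condensed into one runnable sketch (a plausible reconstruction of the notebook, not a verbatim copy):

    from sklearn import datasets, metrics
    from sklearn.naive_bayes import GaussianNB

    dataset = datasets.load_iris()              # load the iris dataset
    model = GaussianNB()
    model.fit(dataset.data, dataset.target)     # fit the model to the data

    expected = dataset.target
    predicted = model.predict(dataset.data)     # make the predictions

    print(model)                                # shows it is a GaussianNB model
    print(metrics.classification_report(expected, predicted))
    print(metrics.confusion_matrix(expected, predicted))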
So guys, this is it.
I hope you understood a lot
about the Naive Bayes classifier,
how it is used
where it is used and what are
the different steps involved
in the classification technique
and how the scikit-learn
makes all of those techniques
very easy to implement
in any data set which we have.
SVM, or support vector machine, is one of the most effective machine learning classifiers, and it has been used in various fields such as face recognition, cancer classification, and so on. Today's session
is dedicated to how svm works
the various features of svm
and how it is used
in the real world.
So without any further ado,
let's take a look
at the agenda for today.
We're going to begin the session
with an introduction
to machine learning
and the different types
of machine learning.
Next we'll discuss
what exactly support
Vector machines are
and then we'll move on and see
how svm works
and how it can be used
to classify linearly
separable data. We'll also briefly discuss how nonlinear SVMs work,
and then we'll move on
and look at the use case of svm
in colon cancer classification
and finally we'll end
the session by running a demo
where we'll use svm to predict
whether a patient is suffering
from a heart disease or not.
Okay, so that was the agenda.
Let's get started with our first topic.
So what is machine learning
machine learning is a science
of getting computers to act
by feeding them data
and letting them learn
a few tricks on their own.
Okay, we're not going
to explicitly program
the machine instead.
We're going to feed it
data and let it learn
the key to machine learning is
the data machines learn just
like us humans.
We humans need
to collect information
and data to learn similarly
machines must also be fed data
in order to learn
and make decisions.
Let's say that you want
a machine to predict
the value of a stock.
All right in such situations.
You just feed the machine
with relevant data
after which you develop a model
which is used to predict
the value of the stock.
Now, one thing to keep
in mind is the more data
you feed the machine the
better it will learn
and make more accurate
predictions obviously machine
learning is not so simple
in order for a machine
to analyze and get
useful insights from data.
It must process and study the data by running different algorithms on it.
All right.
And today we'll be discussing
about one of the most widely
used algorithm called
the support Vector machine.
Okay.
Now that you have a brief idea
about what machine learning is,
let's look at the different ways in which machines learn. First,
We have supervised
learning in this type
of learning the machine
learns under guidance.
All right, that's why
it's called supervised learning
now at school.
Our teachers guided us
and taught us similarly
in supervised learning machines
learn by feeding
them labeled data, explicitly telling them: hey, this is the input and this is how the output must look.
Okay.
So guys the teacher in this case
is the training data.
Next we have
unsupervised learning here.
The data is not labeled
and there is no guide
of any sort.
Okay, the machine must figure
out the data set given
and must find hidden patterns
in order to make predictions
about the output an example
of unsupervised learning is an adult like you and me. We don't need a guide to help us with our daily activities; we figure things out on our own without any supervision. All right, that's exactly how unsupervised learning works.
Finally.
We have reinforcement learning.
Let's say you were dropped off
at an isolated island.
What would you do? Now, initially you would panic and be unsure of what to do, where to get food from, how to live, and all of that. But after a while you will have to adapt; you must learn how to live on the island, adapt to the changing climate, and learn what to eat and what not to eat.
You're basically following a hit-and-trial approach, because you're new to the surroundings, and the only way to learn is to experience and then learn from your experience.
This is exactly what
reinforcement learning is.
It is a learning method
wherein an agent interacts
with its environment
by producing actions
and discovers errors or rewards.
Alright, and once it gets
trained it gets ready to predict
the new data presented to it.
Now in our case the agent
was you basically stuck
on the island
and the environment
was the island.
All right?
Okay, now let's
move on and see
what svm algorithm is all about.
So guys svm
or support Vector machine is
a supervised learning algorithm,
which is mainly used to classify
data into different classes. Now, unlike most algorithms, SVM
makes use of a hyperplane
which acts like
a decision boundary
between the various classes
in general svm can
be used to generate
multiple separating hyperplanes
so that the data
is divided into segments.
Okay, and each of these segments will contain only one kind of data.
It's mainly used for classification purposes, wherein you want to classify your data into two different segments depending on the features of the data.
Now before moving any further,
let's discuss a few
features of svm.
Like I mentioned earlier svm is
a supervised learning algorithm.
This means that SVM trains on a set of labeled data; SVM studies the labeled training data
and then classifies
any new input data depending on
what it learned in the training phase. A main advantage of support vector machines is
that it can be used
for both classification
and regression problems.
All right.
Now even though svm is mainly
known for classification the svr
which is the support
Vector regressor is used
for regression problems.
All right, so svm can be used
both for classification.
And for regression.
Now, this is one of the reasons
why a lot of people prefer svm
because it's a very good
classifier and along with that.
It is also used for regression.
Another feature is the svm
kernel functions svm can be used
for classifying nonlinear data
by using the kernel trick
the kernel trick basically means
to transform your data
into another dimension
so that you can easily
draw a hyperplane
between the different
classes of the data.
Alright, nonlinear data
is basically data
which cannot be separated
with a straight line.
Alright, so svm can even be used
on nonlinear data sets.
You just have to use
a kernel functions to do this.
All right, so Guys,
I hope you all are clear
with the basic concepts of svm.
Now.
Let's move on and look at how SVM works. So guys, in order to understand how SVM works, let's consider a small scenario. Now, for a second, pretend that you own a farm.
Okay, and let's say
that you have a problem
and you want to set up a fence
to protect your rabbits
from the pack of wolves.
Okay, but where do you build your fence? One way to get around the problem is to build a classifier based on the positions of the rabbits and the wolves in your pasture.
So what I'm telling you is
you can classify the group
of rabbits as one group
and draw a decision
boundary between the rabbits
and the wolves.
All right.
So if I do that and if I try
to draw a decision boundary
between the rabbits
and the Wolves,
it looks something like this.
Okay.
Now you can clearly build
a fence along this line
in simple terms.
This is exactly
how SPM work it draws
a decision boundary,
which is a hyperplane
between any two classes in order
to separate them or class.
Asif I them now,
I know you're thinking
how do you know
where to draw a hyperplane
the basic principle behind
svm is to draw a hyperplane
that best separates
the two classes
in our case the two classes
of the rabbits and the Wolves.
So you start off by drawing
a random hyperplane
and then you check the distance
between the hyperplane
and the closest data points from each class. These closest data points to the hyperplane are known as support vectors, and that's where the name support vector machine comes from.
So basically the
hyperplane is drawn
based on these support vectors.
So guys an Optimum
hyperplane will have
a maximum distance from each
of these support vectors.
All right.
So basically the hyper plane
which has the maximum distance
from the support vectors is
the most optimal hyperplane
and this distance
between the hyperplane
and the support vectors
is known as the margin.
All right.
So to sum it up svm
is used to classify data
by using a hyperplane such that the distance between the hyperplane
and the support
vectors is maximum.
So basically your margin
has to be maximum.
All right, that way,
you know that you're actually
separating your classes, alright,
because the distance between
the two classes is maximum.
Okay.
Now, let's try
to solve a problem.
Okay.
So let's say that I input
a new data point.
Okay.
This is a new data point
and now I want to draw
a hyper plane such
that it best separates
the two classes.
Okay, so I start off by drawing
a hyperplane like this
and then I check the distance
between the hyperplane
and the support vectors.
Okay, so I'm trying to check
if the margin is maximum
for this hyperplane,
but what if I draw a hyper plane
which is like this?
All right.
Now I'm going to check
the support vectors over here.
Then I'm going
to check the distance
from the support vectors
and with this hyperplane,
it's clear that the
margin is more right
when you compare the margin
of the previous one
to this hyperplane.
It is more.
So the reason why I'm choosing
this hyperplane is
because the distance
between the support vectors
and the hyperplane
is maximum in this scenario.
Okay, so guys this is
how you choose a hyperplane.
You basically have to make sure
that the hyperplane has a maximum margin.
All right, it has to best separate the two classes.
All right.
Okay so far it was quite easy.
Our data was linearly separable
which means that you
could draw a straight line
to separate the two classes.
All right, but what will you do?
If the data set is like this
you possibly can't draw
a hyper plane like this.
All right.
It doesn't separate the two classes at all. So what do you do in such situations? Now, earlier
in the session I mentioned
how a kernel can be used
to transform data
into another dimension
that has a clear dividing margin
between the classes of data.
Alright, so kernel functions
offer the user this option
of transforming nonlinear spaces
into linear ones.
Nonlinear data set is the one
that you can't separate
using a straight line.
All right, in order to deal
with such data sets you're going
to transform them
into linear data sets
and then use svm on them.
Okay.
So simple trick would be
to transform the two variables
X and Y into a new
feature space involving
a new variable called Z.
All right, so guys so far
we were plotting our data
on two dimensional space.
Correct?
We were only using the x and the y axes, so we had only those two variables, x and y. Now, in order to deal with this kind of data, a simple trick would be to transform the two variables x and y into a new feature space involving a new variable called z. Okay,
so we're basically
visualizing the data
on a three-dimensional space.
Now when you transform
the 2D space into a 3D space,
you can clearly see
a dividing margin
between the two classes
of data right now.
You can go ahead
and separate the two classes
by drawing the best
hyperplane between them.
Okay, that's exactly
what we discussed
in the previous slides.
So guys, why don't you try this yourself: try drawing a hyperplane which is the most optimum for these two classes.
All right, so guys, I hope you now have a good understanding of nonlinear SVMs.
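To make the kernel idea concrete, here is a small illustrative sketch (not part of this session's demo) of how scikit-learn's SVC switches between a linear and an RBF kernel:

    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = datasets.load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # linear kernel: a straight hyperplane in the original feature space
    linear_svm = SVC(kernel='linear').fit(X_train, y_train)

    # RBF kernel: the kernel trick implicitly maps data to a higher dimension
    rbf_svm = SVC(kernel='rbf').fit(X_train, y_train)

    print(linear_svm.score(X_test, y_test), rbf_svm.score(X_test, y_test))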
Let's look at a real world use
case of support Vector machines.
So guys, SVM
as a classifier has been used
in cancer classification
since the early 2000s.
So there was an experiment held
by a group of professionals
who applied svm in a colon
cancer tissue classification.
So the data set consisted
of about 2,000
transmembrane protein samples, and only about 50 to 200 gene samples were input into the SVM classifier.
Now this sample
which was input
into the svm classifier had
both colon cancer tissue samples
and normal colon tissue
samples right now.
The main objective of this study
was to classify Gene samples
based on whether they
are cancerous or not.
Okay, so svm was trained
using the 50 to 200 samples
in order to discriminate
between non-tumor
from tumor specimens.
So the performance
of The svm classifier
was very accurate
for even a small data set.
All right, we had only
50 to 200 samples.
And even for the small data
set svm was pretty accurate
with its results.
Not only that its
performance was compared
to other classification algorithms like Naive Bayes, and in each case SVM outperformed Naive Bayes.
So after this experiment
it was clear
that SVM classified the data more effectively, and it worked exceptionally well with small data sets.
Let's go ahead
and understand what exactly
is unsupervised learning.
So sometimes the given data
is unstructured and unlabeled
so it becomes difficult
to classify the data
into different categories.
So unsupervised learning
helps to solve this problem.
This learning is used to cluster the input data into classes on the basis of their statistical properties.
For example, we can cluster different bikes based upon their speed limit, their acceleration, or the average mileage they are giving. So unsupervised learning is a type of machine learning algorithm used to draw inferences from data sets consisting of input data without labeled responses.
So if you have a look
at the workflow
or the process flow
of unsupervised learning,
the training data is a collection of information without any labels. We have the machine learning algorithm, and then we have the clustering model. What it does is distribute the data into different clusters, and if you provide any unlabeled new data, it will make a prediction and find out to which cluster that particular data point belongs. So one of the most important algorithms in unsupervised learning is clustering.
So let's understand exactly
what is clustering.
So a clustering
basically is the process
of dividing the data sets
into groups consisting
of similar data points.
It means grouping
of objects based
on the information found in
the data describing the objects
or their relationships. So clustering models focus on identifying groups of similar records and labeling the records according to the group to which they belong. Now, this is done without the benefit of prior knowledge about the groups and their characteristics.
So and in fact,
we may not even know exactly
how many groups are
there to look for.
Now.
These models are often
referred to as
unsupervised learning models,
since there's no external standard by which to judge the model's classification performance. There are no right or wrong answers for these models. And if we talk about why clustering is used: the goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. Sometimes the partitioning is the goal, or the purpose of the clustering algorithm is to make sense of and extract value from large sets of structured and unstructured data.
So that is why clustering
is used in the industry.
And if you have a look
at the various use cases
of clustering in Industry
so first of all,
it's being used in marketing: discovering distinct groups in customer databases, such as customers who make a lot of long-distance calls or customers who use the internet more than calls. It's also used by insurance companies, for example to identify groups of insurance policy holders with a high average claim rate, or farmers' cash crops which are profitable. It's used in seismic studies to define probable areas of oil or gas exploration based on seismic data. And it's also used in recommendations of movies, in grouping Flickr photos, and by Amazon for recommending products based on which category an item lies in.
So basically if we talk
about clustering there are
three types of clustering.
So first of all,
we have the exclusive clustering
which is hard clustering. Here an item belongs exclusively to one cluster, not several clusters, and each data point belongs exclusively to one cluster. An example of this is k-means clustering, so k-means does this exclusive kind of clustering. Secondly, we have overlapping clustering, which is also known as soft clustering. In this, an item can belong to multiple clusters, as its degree of association with each cluster is shown. For example, we have fuzzy or c-means clustering, which is used for overlapping clustering. And finally we have hierarchical clustering: when two clusters have a parent-child relationship or a tree-like structure, then it is known as hierarchical clustering.
So as you can see here
from the example,
we have a parent-child kind
of relationship in
the cluster given here.
So let's understand
what exactly is
K means clustering.
So k-means clustering is an algorithm whose main goal is to group similar data points into a cluster. It is a process by which objects are classified into a predefined number of groups, so that they are as dissimilar as possible from one group to another, but as similar as possible within each group. Now, have a look at how the algorithm works here.
So first of all, it starts with identifying the number of clusters, which is K. Then we find the centroids, and we compute the distance of each object to each centroid. Then we find the grouping based on the minimum distance. If the centroids have converged, then we have our clusters; if not, we find the centroids again and repeat all of these steps again and again. So let me show you how exactly clustering works with an example here.
So first we need
to decide the number
of clusters to be made now
another important question here is how to decide the number of clusters; we'll get into that later. So first, let's assume that the number of clusters we have decided on is three.
So after that, we provide initial centroids for all the clusters, which is a guess, and the algorithm calculates the Euclidean distance of each point from each centroid and assigns the data point to the closest cluster. Now the Euclidean distance, as all of you know, is the square root of the sum of the squared differences between the coordinates.
Next, when the centroids are calculated again, we have new clusters. For each data point the distance to the new centroids is calculated, and the points are again assigned to the closest cluster; then again we have new centroids calculated. These steps are repeated until the centroids repeat, that is, the new centroids are very close to the previous ones. So unless our output gets repeated, or the outputs are very, very close, we do not stop this process.
We keep on calculating
the euclidean distance
of all the points
to the centroid.
Then we calculate the new centroids, and that is basically how k-means clustering works. So an important part here is understanding how to decide the value of K, or the number of clusters, because it does not make any sense if you do not know how many clusters you are going to make. So to decide the number of clusters, we have the elbow method.
So, first of all, compute the sum of squared errors, which is the SSE, for some values of K, for example 2, 4, 6, and 8. Now the SSE, the sum of squared errors, is defined as the sum of the squared distances between each member of a cluster and its centroid; mathematically, it is given by the equation provided here.
And if you plot K against the SSE, you will see that the error decreases as K gets larger; this is because as the number of clusters increases, the clusters become smaller, so the distortion is also smaller. Now, the idea of the elbow method is to choose the K at which the SSE decreases abruptly. For example, if we have a look at the figure given here, we see that the best number of clusters is at the elbow: as you can see, the graph changes abruptly after the number 4. So for this particular example, we're going to use 4 as the number of clusters.
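A small sketch of the elbow method with scikit-learn (illustrative; inertia_ is KMeans' name for the SSE):

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

    sse = []
    ks = range(1, 11)
    for k in ks:
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        sse.append(km.inertia_)    # sum of squared distances to closest centroid

    plt.plot(ks, sse, marker='o')  # look for the elbow in this curve
    plt.xlabel('K')
    plt.ylabel('SSE')
    plt.show()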
So while working with k-means clustering, there are two key points to know. First of all, be careful about where you start: choose the first center at random, then choose the second center far away from the first center, and similarly choose the nth center as far away as possible from the closest of the other centers. The second idea is to do many runs of k-means, each with different random starting points, so that you get an idea of how many clusters you need to make, where exactly the centroids lie, and how the data is getting divided. Now, k-means is not exactly a perfect method.
So let's understand the pros
and cons of k-means clustering.
We know that k-means is simple and understandable; everyone learns it in the first go, and the items are automatically assigned to clusters. Now, if we have a look at the cons: first of all, one needs to define the number of clusters, which is a very heavy task, since whether we have three, four, or ten categories, if you do not know what the number of clusters is going to be, it's very difficult to guess. Next, all the items are forced into clusters: whether or not they actually belong to another cluster or category, they are forced into the cluster they are closest to. This again happens because of not defining, or not being able to guess, the correct number of clusters.
And most of all, it's unable to handle noisy data and outliers. Machine learning engineers and data scientists have to clean the data anyway, but then again it comes down to the analysis they're doing and the method they are using; typically people do not clean the data for k-means clustering, or even if they clean it, there is sometimes noisy and outlier data left which affects the whole model. So that was all for k-means clustering.
So what we're going to do now is use k-means clustering on a movie dataset; we have to find out the number of clusters and divide the data accordingly. So the use case is that, first of all, we have a data set of five thousand movies, and what we want to do is group the movies into clusters based on their Facebook likes. So guys, let's have a look at the demo here.
So first of all, what we're going to do is import deepcopy, numpy, pandas, and seaborn, the various libraries which we're going to use, and from matplotlib we import pyplot and use the ggplot style. Next, what we're going to do is import the data set and look at the shape of the data set.
So if you have a look at the shape of the data set, we can see that it has 5043 rows with 28 columns, and if you have a look at the head of the data set, we can see the 5043 data points. So what we're going to do is place the data points in a plot. We have a look at the data columns, such as facenumber_in_poster, cast_total_facebook_likes, and director_facebook_likes. What we have done here is take the director_facebook_likes and the actor_3_facebook_likes, right? So we have five thousand and forty-three rows and two columns. Now, using
the k-means from sklearn
what we're going
to do is import it.
First we're going to import KMeans from sklearn.cluster. Remember guys, scikit-learn is a very important library in Python for machine learning. And the number of clusters we're going to provide is five; again, the number of clusters depends upon the SSE, which is the sum of squared errors, or we can use the elbow method, so I'm not going to go into the details of that again. We're going to fit the data with the KMeans fit method, find the cluster centers, and print them. What we find is an array of five cluster centers, and we also print the labels from the k-means clustering.
Next, what we're going to do is plot the data with the new clusters which we have found, and for this we're going to use seaborn. As you can see here, we have plotted the data into the grid, and you can see here we have five clusters.
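A condensed sketch of the demo just described (the column names are from the IMDB 5000 movie dataset; the file name is an assumption):

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.cluster import KMeans

    plt.style.use('ggplot')

    data = pd.read_csv('movie_metadata.csv')          # assumed file name
    X = data[['director_facebook_likes',
              'actor_3_facebook_likes']].dropna().values

    kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
    print(kmeans.cluster_centers_)                    # array of 5 centers
    print(kmeans.labels_)                             # a cluster label per movie

    sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=kmeans.labels_, palette='deep')
    plt.show()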
So what I would say is that cluster 3 and cluster 0 are very, very close, and that's exactly the point: the main challenge in k-means clustering is to define the number of centers, which is K. As you can see here, the third cluster and the zeroth cluster are very close to each other, so they probably could have been one single cluster. And another disadvantage is that we do not know exactly how the points are to be arranged, so it's very difficult when data is forced into another cluster, which makes our analysis a little different. It works fine, but sometimes it might be difficult to justify the k-means clustering. Now,
let's understand what exactly is
c means clustering.
So fuzzy c-means is an extension of k-means, the popular simple clustering technique. Fuzzy clustering, also referred to as soft clustering, is a form of clustering in which each data point can belong to more than one cluster. So k-means tries to find hard clusters, where each point belongs to one cluster, whereas fuzzy c-means discovers soft clusters. In a soft cluster, any point can belong to more than one cluster at a time, with a certain affinity value towards each. Fuzzy c-means assigns a degree of membership, which ranges from 0 to 1, of an object to a given cluster.
So there is a stipulation that the sum of the memberships of an object to all the clusters it belongs to must be equal to 1. For example, the degrees of membership of this particular point to both of these clusters are 0.6 and 0.4, and if you add them up we get 1; that is one of the ideas behind fuzzy c-means. And this affinity is based on the distance from the point to the center of a cluster. Now, then again, we have the pros and cons of fuzzy c-means.
So first of all, it allows a data point to be in multiple clusters; that's a pro. It's a more natural representation of the behavior of genes, since genes are usually involved in multiple functions, so it is a very good type of clustering when we're talking about genes. And again, if we talk about the cons: we have to define c, which is the number of clusters, same as K. Next, we also need to determine the membership cutoff value, which takes a lot of time and is time-consuming. And the clusters are sensitive to the initial assignment of centroids: a slight change or deviation in the centers is going to result in a very different kind of output from fuzzy c-means. And one of the major disadvantages of c-means clustering is that it is a non-deterministic algorithm, so it does not always give you one particular output. A minimal sketch of fuzzy c-means is shown below.
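A minimal sketch using the scikit-fuzzy package (an assumption: this library is not named in the session; its cmeans function expects features in rows and samples in columns):

    import numpy as np
    import skfuzzy as fuzz

    data = np.random.rand(2, 200)   # 200 random 2-D points, shape (features, samples)

    # c=3 clusters, fuzzifier m=2, stop when change < 0.005 or after 1000 iterations
    cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
        data, c=3, m=2, error=0.005, maxiter=1000)

    print(u[:, 0])        # memberships of the first point towards each cluster
    print(u[:, 0].sum())  # they sum to 1, as the stipulation requires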
Now let's have a look at the third type of clustering, which is hierarchical clustering.
So hierarchical clustering
is an alternative approach
which builds a hierarchy
from the bottom up
or the top to bottom
and does not require
to specify the number
of clusters beforehand.
Now, the algorithm works as follows: first of all, we put each data point in its own cluster, then we identify the two closest clusters and combine them into one cluster, and we repeat the above step till all the data points are in a single cluster.
Now, there are two types of hierarchical clustering: one is agglomerative clustering and the other one is divisive clustering. Agglomerative clustering builds the dendrogram from the bottom level, while divisive clustering starts with all the data points in one cluster, the root cluster. Now, again,
has some sort of pros and cons.
So in the pros: no assumption of a particular number of clusters is required, and it may correspond to meaningful taxonomies.
Whereas if we talk about the cons: once a decision is made to combine two clusters, it cannot be undone. And one of the major disadvantages of hierarchical clustering is that it becomes very slow with very, very large data sets, and nowadays I think every industry is using large data sets and collecting large amounts of data, so hierarchical clustering is not always the best method to go for. A minimal sketch of it is shown below.
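A minimal sketch of agglomerative clustering and its dendrogram with SciPy (illustrative, not from the session):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    X = np.random.rand(12, 2)        # a dozen random 2-D points

    Z = linkage(X, method='ward')    # bottom-up (agglomerative) merges
    dendrogram(Z)                    # the tree-like structure of merges
    plt.show()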
Hello everyone, and welcome to this interesting session on the Apriori algorithm.
Now many of us have visited
retails shops such as
Walmart or Target
for our household needs.
Well, let's say
that we are planning to buy
a new iPhone from Target.
What we would typically do is
search for the model by visiting
the mobile section of the store
and then select the product
and head towards
the billing counter.
But in today's world the goal
of the organization is
to increase the revenue.
Can this be done by just pitching one product at a time to the customer? Now, the answer to this is clearly no. Hence, organizations began mining data relating to frequently bought items.
So a Market Basket analysis
is one of the key techniques
used by large retailers
to uncover associations
between items. Now, examples could be: customers who purchase bread have a 60 percent likelihood of also purchasing jam, and customers who purchase laptops are more likely to purchase laptop bags as well. Retailers try to find associations between different items and products that can be sold together, which assists in the right product placement.
Typically, it figures out
what products are
being bought together
and organizations can place
products in a similar manner,
for example, people
who buy bread also
tend to buy butter,
right? And the marketing team at retail stores should target customers who buy bread and butter and provide them an offer on a third item, suppose eggs, so that if a customer buys bread and butter and sees a discount offer on eggs, he will be encouraged to spend more and buy the eggs. And this is what Market Basket analysis is all about.
This is what we are going to talk about in this session, which is association rule mining and the Apriori algorithm. Now, an association rule can be thought of as an if-then relationship. Just to elaborate on that:
we have come up with a rule: suppose an item A is bought by a customer; then the chance of item B being picked by the same customer, under the same transaction ID, is found out. You need to understand here that it's not a causality; rather, it's a co-occurrence pattern that comes to the fore.
Now, there are two elements
to this rule first if
and second is the then now
if is also known as antecedent.
This is an item
or a group of items
that are typically
found in the item set
and the later one.
Is called the consequent
this comes along as an item
with an antecedent group
or the group
of antecedents a purchase.
Now if we look
at the image here, A arrow B, it means that if a person buys an item A, then he will also, or most probably, buy an item B.
Now, the simple example that I gave you about the bread and butter and the eggs is just a small example. But what if you have thousands and thousands of items? If you go to any professional data scientist with that data, you can just imagine how much profit you can make if the data scientist provides you with the right rules and the right placement of the items; you can get a lot of insights.
That is why Association
rule mining is a very
good algorithm which helps
the business make profit.
So, let's see
how this algorithm works.
So association rule mining is all about building the rules, and we have just seen one rule: if you buy A, then there is a chance that you might buy B also. This type of relationship, in which we find a relationship between two items, is known as single cardinality. But what if the customer who bought A and B also wants to buy C, or if a customer who bought A, B, and C also wants to buy D? In these cases the cardinality increases, and we can have a lot of combinations around
these items. And if you have around 10,000 or more items, just imagine how many rules you're going to create for each product. That is why association rule mining has measures, so that we do not end up creating tens of thousands of rules.
Now that is where the a priori
algorithm comes in.
But before we get
into the Apriori algorithm, let's understand the maths behind it. Now, there are three types of metrics which help to measure the association.
We have support, confidence, and lift. Support is the frequency of item A, or of the combination of items A and B; it's basically how frequently an item, or a combination of items, has been bought. With this, what we can do is filter out the items which have been bought less frequently. That is one of the measures, which is support. Now, what does confidence tell us? Confidence gives us how often the items A and B occur together, given the number of times A occurs.
Now, this also helps us solve a lot of other problems, because if somebody is buying A and B together and not buying C, we can just rule out C at that point of time. This solves another problem: we obviously do not need to analyze products which people buy rarely. So what we can do, according to the usage, is define our minimum support and confidence, and when we have set these values we can put them into the algorithm, filter out the data, and create different rules. But suppose even after filtering you have five thousand rules, and for every item we create these 5,000 rules; that's practically impossible.
So for that we need the third measure, which is lift. Lift is basically the strength of any rule. Now, let's have a look at the denominator of the formula given here: we have the independent support values of A and B, which gives us the probability of A and B occurring independently. And obviously there's a lot of difference between random co-occurrence and association: if the denominator of the lift is larger, it means that the co-occurrence is more due to randomness rather than due to any association. So lift is the final verdict that tells us whether we should spend time on a particular rule or not.
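A tiny sketch of the three measures on the toy transactions used in the next example (support(A) = freq(A)/N, confidence(A then B) = support(A and B)/support(A), lift(A then B) = support(A and B)/(support(A) * support(B))):

    transactions = [{'A', 'B', 'C'}, {'A', 'C', 'D'}, {'B', 'C', 'D'},
                    {'A', 'D', 'E'}, {'B', 'C', 'E'}]

    def support(items):
        return sum(items <= t for t in transactions) / len(transactions)

    def confidence(antecedent, consequent):
        return support(antecedent | consequent) / support(antecedent)

    def lift(antecedent, consequent):
        return (support(antecedent | consequent)
                / (support(antecedent) * support(consequent)))

    print(support({'A'}), confidence({'A'}, {'D'}), lift({'A'}, {'D'}))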
Now, let's have a look
at a simple example
of Association rule mining.
So suppose we have a set of items A, B, C, D, and E, and a set of transactions T1, T2, T3, T4, and T5. As you can see here, we have the transactions T1 with A, B, C; T2 with A, C, D; T3 with B, C, D; T4 with A, D, E; and T5 with B, C, E.
do is create.
At some rules or Association
rules such as a gives T
or C gives a a gift C B
and C gives a what this
basically means is
that if a person buys a then
he's most likely to buy D.
And if a person by C,
then he's most likely
to buy a and
if you have a look
at the last one,
if a person buys B and C is
most likely to buy the item
a as well now if we calculate
the support confidence
and lift using these rules
as you can see here
in the table,
we have the rule.
And the support confidence
handle lift values.
Let's discuss Apriori. So the Apriori algorithm uses frequent itemsets to generate the association rules, and it is based on the concept that a subset of a frequent itemset must also be a frequent itemset itself. Now, this raises the question: what exactly is a frequent itemset? A frequent itemset is an itemset whose support value is greater than a threshold value. Just now we discussed that the marketing team, according to the usage, has a minimum threshold value for the confidence as well as the support. So a frequent itemset is an itemset whose support value is greater than the threshold value already specified. For example, if {A, B} is a frequent itemset, then A and B should also be frequent itemsets individually.
Now, let's consider the following transactions to make things easier. Suppose we have transactions 1, 2, 3, 4, 5 and these items. So T1 has 1, 3, and 4; T2 has 2, 3, and 5; T3 has 1, 2, 3, and 5; T4 has 2 and 5; and T5 has 1, 3, and 5. Now, the first step is to build a list of itemsets of size 1 by using this transactional data. And one thing to note here is that the minimum support count given here is 2. So the first step is to create itemsets of size 1 and calculate their support values.
So as you can see here, we have the table C1, in which we have the itemsets 1, 2, 3, 4, 5 and their support values; if you remember the formula of support, it was the frequency divided by the total number of transactions, and here we track the frequency counts against the minimum support count. So as you can see, for the itemset {1} the support count is 3, because itemset 1 appears in T1, T3, and T5: its frequency is 3. Now, as you can see, the itemset {4} has a support of 1, as it occurs only once, in transaction 1; but the minimum support value is 2, and that's why it's going to be eliminated. So we have the final table, which is the table F1, in which we have the itemsets 1, 2, 3, and 5 with the support values 3, 3, 4, and 4. Now, the next step is
to create itemsets of size 2 and calculate their support values. All the combinations of the itemsets in F1, the final table in which we discarded the 4, are going to be used for this iteration. So we get the table C2.
So as you can see here, we have {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, and {3,5}. Now, if we calculate the support here again, we can see that the itemset {1,2} has a support of 1, which is again less than the specified threshold, so we're going to discard that. So if we have a look at the table F2, we have {1,3}, {1,5}, {2,3}, {2,5}, and {3,5}. Again, we're going to move forward and create the itemsets of size 3 and calculate their support values.
Now, all the combinations from the table F2 are going to be used for this particular iteration. Before calculating support values, let's perform pruning on the data set. Now, what is pruning? After the combinations are made, we divide the C3 itemsets to check if there is any subset whose support is less than the minimum support value; that is what the frequent itemset property requires.
So if you have a look here at the itemsets: for the first one, {1,2,3}, its subsets include {1,2}, which was discarded, so we are going to discard this whole itemset. The same goes for the second one, {1,2,5}: it contains {1,2}, which was discarded in the previous step, so we're going to discard that also. This leaves us with only two itemsets, which are {1,3,5} and {2,3,5}, and the support for these is 2 and 2.
Now, if we create the table C4 using four elements, we're going to have only one itemset, which is {1,2,3,5}, and if you have a look at the transaction table, 1, 2, 3, and 5 appear together only once. So the support is 1, and since the support of the whole table C4 is less than 2, we're going to stop here and return to the previous itemsets, that is, F3. So the frequent itemsets are {1,3,5} and {2,3,5}. Now, let's assume
our minimum confidence value is 60 percent. For that, we're going to generate all the non-empty subsets of each frequent itemset. Now, for I = {1,3,5}, we get the subsets {1,3}, {1,5}, {3,5}, {1}, {3}, and {5}; similarly, for {2,3,5} we get {2,3}, {2,5}, {3,5}, {2}, {3}, and {5}. Now, this rule states
and five now this rule states
that for every subset s
of I the output of the rule
gives something like s gives i2s
that implies s recommends I of s
and this is only possible
if the support of I divided
by the support of s is greater
than equal to the minimum
confidence value now applying
these rules to the item set
of F3 we get rule 1 which is 1 3
gives 1 comma 3 comma 5 and 1/3
3 it means 1 and 3 gives 5
so the confidence is equal
to the support of 1 comma
3 comma fire driver support
of 1 comma 3 that equals 2 by 3
which is 66% and
which is greater
than the 60 percent.
So the rule 1 is selected now
if we come to rule 2, which is {1,5} gives {1,3,5} minus {1,5}, it means if we have 1 and 5, it implies we are also going to have 3. To calculate the confidence of this one, we take the support of {1,3,5} divided by the support of {1,5}, which gives us a hundred percent, which means rule 2 is selected as well.
But again, have a look at rules 5 and 6 over here. Similarly, if we take {3} gives {1,3,5} minus {3}, it means if we have 3, we also get 1 and 5. The confidence for this comes to 50%, which is less than the given 60 percent target, so we're going to reject this rule, and the same goes for rule number six.
Now, one thing to keep in mind here is that although rule 1 and rule 5 look a lot alike, they are not the same; it really depends on what's on the left-hand side of the arrow and what's on the right-hand side of the arrow: it's the if-then relationship. I'm sure you guys can understand what exactly these rules are and how to proceed with these rules.
So, let's see
how we can implement
the same in Python, right?
So for that, what I'm going to do is create a new Python notebook, and I'm going to use the Jupyter Notebook; you're free to use any sort of IDE. I'm going to name it apriori.
So the first thing: we will be using the online transactional data of a retail store for generating association rules. Firstly, what we need to do is get the pandas and mlxtend libraries imported and read the file. As you can see here, we are using the Online Retail .xlsx file, and from mlxtend we're going to import apriori and association_rules; they all come under mlxtend.
So as you can see here, we have the invoice number, the stock code, the description, the quantity, the invoice date, the unit price, the customer ID, and the country. Next, in this step, what we're going to do is data cleanup, which includes removing spaces from some of the descriptions, dropping the rows that do not have invoice numbers, and removing the credit transactions, because those are of no use to us. So as you can see here in the output, we have about five hundred and thirty-two thousand rows with eight columns.
So after the cleanup,
we need to consolidate the items
into one transaction per row
with each product for the sake
of keeping the data set small.
We are only looking
at the sales for France.
So as you can see here, we have excluded all the other sales; we're just looking at the sales for France.
Now.
There are a lot
of zeros in the data.
But we also need to make sure
any positive values
are converted to 1
and anything less
than zero is set to 0. So as you can see here, we still have 392 rows; we're going to encode it and check again.
Now that we have structured the data properly, in this step what we're going to do is generate frequent itemsets that have a support of at least seven percent; this number is chosen so that we get close enough, and then we generate the rules with their corresponding support, confidence, and lift. So as you can see here, the minimum support is 0.07. What if we add another constraint on the rules, such as the lift being greater than 6 and the confidence greater than 0.8?
So as you can see here,
we have the left-hand side
and the right-hand side
of the association rule,
which are the antecedents and the consequents. We have the support, the confidence, the lift, the leverage, and the conviction.
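A condensed sketch of the steps just narrated, following mlxtend's documented API (a plausible reconstruction; the exact notebook may differ):

    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    df = pd.read_excel('Online Retail.xlsx')
    df['Description'] = df['Description'].str.strip()
    df = df.dropna(subset=['InvoiceNo'])
    df['InvoiceNo'] = df['InvoiceNo'].astype(str)
    df = df[~df['InvoiceNo'].str.contains('C')]        # drop credit transactions

    # one transaction per row, one product per column, France only
    basket = (df[df['Country'] == 'France']
              .groupby(['InvoiceNo', 'Description'])['Quantity'].sum()
              .unstack().fillna(0))
    basket = basket > 0                                # encode to True/False

    frequent = apriori(basket, min_support=0.07, use_colnames=True)
    rules = association_rules(frequent, metric='lift', min_threshold=1)
    print(rules[(rules['lift'] > 6) & (rules['confidence'] > 0.8)])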
So guys, that's it
for this session.
That is how you create association rules using the Apriori algorithm, which helps a lot in the marketing business. It runs on the principle of Market Basket analysis, which is exactly what big companies like Walmart, Reliance, Target, and even IKEA do. I hope you got to know what exactly association rule mining is, what lift, confidence, and support are, and how to create association rules.
So guys, reinforcement learning is a part of machine learning where an agent is put in an environment, and it learns to behave in this environment by performing certain actions. Okay, so it basically performs actions, and it either gets rewards for the actions or it gets a punishment, and it observes the rewards it gets from those actions. Reinforcement learning is all about taking an appropriate
about taking an appropriate
action in order
to maximize the reward
in a particular situation.
So guys, in supervised learning the training data comprises the input and the expected output, and so the model is trained with the expected output itself,
but when it comes
to reinforcement learning,
there is no
expected output here.
The reinforcement agent
decides what actions
to take in order to perform
a given task in the absence
of a training data set.
It is bound to learn
from its experience itself.
Alright.
So reinforcement learning
is all about an agent
who's put in
an unknown environment
and he's going to use
a hit and trial method
in order to figure out
the environment and then come up
with an outcome.
Okay.
Now, let's look at reinforcement learning with an analogy.
So consider a scenario wherein a baby is learning how to walk. The scenario can go about in two ways. Now, in the first case, the baby starts walking and makes it to the candy. Here, the candy is basically the reward it's going to get. So since the candy is the end goal, the baby is happy; it's positive. Okay, so the baby is happy and it gets rewarded with a set of candies. Now, another way in which this could go is that the baby starts walking but falls due to some hurdle in between. The baby gets hurt and it doesn't get any candy, and obviously the baby is sad.
So this is a negative reward.
Okay, or you can say
this is a setback.
So just like how we humans learn
from our mistakes by trial
and error reinforcement
learning is also similar.
Okay, so we have an agent
which is basically
the baby and a reward
which is the candy over here.
Okay, and with many hurdles in between, the agent is supposed to find the best possible path to reach the reward. So guys, I hope you all are clear with reinforcement learning. Now, let's look at the
reinforcement learning process.
So generally a reinforcement
learning system has
two main components, right?
The first is an agent
and the second one
is an environment.
Now in the previous case,
we saw that the agent was
the baby and the environment
was the living room
where in the baby was crawling.
Okay.
The environment is the setting
that the agent is acting
on and the agent over here
represents the reinforcement
learning algorithm.
So guys the reinforcement
learning process starts
when the environment
sends a state to the
And then the agent
will take some actions based
on the observations
in turn the environment
will send the next state
and the respective reward
back to the agent.
The agent will
update its knowledge
with the reward returned by
the environment and it uses
that to evaluate
its previous action.
So guys this
Loop keeps continuing
until the environment sends
a terminal state which means
that the agent has
accomplished all his tasks
and he finally gets the reward.
Okay.
This is exactly
what was depicted
in this scenario.
So the agent keeps
climbing up ladders
until he reaches his reward
to understand this better.
Let's suppose that our agent is
learning to play Counter Strike.
Okay.
So let's break it down
now initially the RL agent
which is basically
the player player 1.
Let's say it's a player one
who is trying to learn
how to play the game.
Okay.
He collects some state
from the environment.
Okay.
This could be the first state of Counter-Strike. Now, based on the state, the agent will take some action. Okay, and this action can be anything that causes a result. So if you move left or right, it's also considered as an action.
Okay, so initially the action
is going to be random
because obviously the first time
you pick up Counter-Strike,
you're not going
to be a master at it.
So you're going to try
with different actions
and you just want to pick up a
random action in the beginning.
Now the environment is going
to give a new state.
So after clearing
that the environment
is now going to give a new state
to the agent or to the player.
So maybe he's across stage 1 now; he's in stage 2. So now the player will get a reward R1 from the environment, because he cleared stage 1.
So this reward can be anything.
It can be additional points
or coins or anything like that.
Okay.
So basically this Loop
keeps going on
until the player is dead
or reaches the destination.
Okay, and it continuously
outputs a sequence
of States actions and rewards.
So guys, this was
a small example to show you
how reinforcement
learning process works.
So you start with an initial state, and once the player clears that state, he gets a reward. After that, the environment will give another stage to the player, and after he clears that state, he's going to get another reward, and it's going to keep happening until the player reaches his destination.
All right, so guys,
I hope this is clear now,
let's move on and look
at the reinforcement
learning definitions.
So there are a few Concepts
that you should be aware
of while studying
reinforcement learning.
Let's look at those
definitions over here.
So first we have the agent
now an agent is basically
the reinforcement learning
algorithm that learns
from trial and error.
Okay, so an agent takes actions
like For example a soldier
in Counter-Strike navigating
through the game.
That's also an action.
Okay, if he moves left right
or if he shoots at somebody
that's also an action.
Okay.
So the agent is responsible
for taking actions
in the environment.
Now the environment is
the whole Counter-Strike game.
Okay.
It's basically the world through which the agent moves. The environment takes the agent's current state and action as input, and it returns the agent's reward and its next state as output.
and its next state as output.
Alright next we have action
now all the possible.
Steps that an agent
can take are called actions.
So like I said,
it can be moving right left
or shooting or any of that.
Alright, then we have
state now state is
basically the current condition
returned by the environment.
So whichever State you are in
if you are in state 1 or
if you're in state
to that represents
your current condition.
All right.
Next we have reward a reward
is basically an instant return
from the environment
to appraise Your Last Action.
Okay, so it can be anything like coins, or it can be additional points.
So basically a reward
is given to an agent
after it clears
the specific stages.
Next we have policy. Policy is basically the strategy that the agent uses to find out his next action based on his current state; policy is just the strategy with which you approach the game.
Now, value is the expected long-term return with discount. So value and action value can be a little confusing for you right now, but as we move further, you'll understand what I'm talking about. Okay.
So value is basically
the long-term return
that you get with discount.
Okay, discounting I'll explain in the further slides.
Then we have action value
now action value
is also known as Q value.
Okay.
It's very similar to Value
except that it takes
an extra parameter,
which is the current action.
So basically here you'll find
out the Q value depending
on the particular action
that you took.
All right.
So guys don't get confused
with value and action value.
We look at examples
in the further slides and you
will understand this better.
Okay.
So guys make sure that you're
familiar with these terms
because you'll be seeing
a lot of these terms
in the further slides.
All right.
Now before we move any further,
I'd like to discuss
a few more Concepts.
Okay.
So first we will discuss
the reward maximization.
So if you haven't already
realized it the basic aim
of the RL agent is
to maximize the reward now,
how does that happen?
Let's try to understand
this in depth.
So the agent must be trained in such a way that he takes the best action, so that the reward is maximized, because the end goal of reinforcement learning is to maximize your reward based on a set of actions.
So let me explain this with a small game. In the figure you can see there is a fox, there's some meat, and there's a tiger. Our agent is basically the fox, and his end goal is to eat the maximum amount of meat before being eaten by the tiger. Now, since the fox is a clever fellow, he eats the meat that is closer to him rather than the meat which is closer to the tiger.
Now this is because the closer he is to the tiger, the higher are his chances of getting killed.
So because of this the rewards
which are near the tiger,
even if they are
bigger meat chunks,
they will be discounted.
So this is exactly
what discounting means
so our agent is not going
to eat the meat chunks
which are closer to the tiger
because of the risk.
All right now,
even though the meat chunks
might be larger.
He does not want to take
the chances of getting killed.
Okay, this is called discounting. This is where you discount the rewards because of the risk: you just eat the meat which is closer to you instead of taking risks and eating the meat which is closer to your opponent.
All right.
Now the discounting
of reward Works based
on a value called gamma
will be discussing gamma
in our further slides
but in short the value
of gamma is between 0 and 1.
Okay.
So the smaller the gamma the
larger is the discount value.
Okay.
So if the gamma value is lesser,
it means that the agent
is not going to explore
and he's not going
to try and eat the meat chunks
which are closer to the tiger.
Okay, but if the gamma value is closer to 1, it means that our agent is actually going to explore, and it's going to try and eat the meat chunks which are closer to the tiger.
All right, now,
I'll be explaining this
in depth in the further slides.
So don't worry
if you haven't got
a clear concept yet,
but just understand
that reward maximization is
a very important step
when it comes
to reinforcement learning
because the agent has
to collect maximum rewards
by the end of the game.
All right.
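To make the discounting idea concrete, here's a tiny illustration with made-up numbers of my own: a reward of 100 points that is 5 steps away is worth gamma to the power 5 times 100 today.

```python
# Illustrative numbers only: a meat chunk worth 100 points, 5 steps away.
reward, steps = 100, 5
for gamma in (0.1, 0.5, 0.9):
    print(gamma, round(gamma ** steps * reward, 3))
# 0.1 -> 0.001   (a small gamma almost ignores far-away rewards)
# 0.5 -> 3.125
# 0.9 -> 59.049  (a gamma close to 1 keeps far rewards attractive)
```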
Now, let's look
at another concept
which is called exploration
and exploitation.
So exploration, like the name suggests, is about exploring and capturing more information about an environment. On the other hand, exploitation is about using the already known, exploited information to heighten the rewards.
So guys, consider the fox and tiger example that we discussed. Now, here the fox eats only the meat chunks which are close to him, but he does not eat the meat chunks which are closer to the tiger.
Okay, even though they might give him more rewards, he does not eat them. If the fox only focuses on the closest rewards, he will never reach the big chunks of meat.
Okay, this is what exploitation is about: you're just going to use the currently known information and you're going to try and get rewards based on that information.
But if the fox decides to explore a bit, it can find the bigger reward, which is the big chunks of meat.
This is exactly
what exploration is.
So the agent is not going
to stick to one corner instead.
He's going to explore
the entire environment and try
and collect bigger rewards.
All right, so guys,
I hope you all are clear with
exploration and exploitation.
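One common way to balance the two, which the session doesn't name but which matches this idea, is an epsilon-greedy rule: with a small probability you explore a random action, otherwise you exploit the best known one. A minimal sketch, assuming a hypothetical q_values mapping from actions to estimated values:

```python
import random

def choose_action(q_values, actions, epsilon=0.1):
    """Epsilon-greedy: explore with probability epsilon, else exploit.

    q_values: dict mapping action -> estimated value (assumed given).
    """
    if random.random() < epsilon:
        return random.choice(actions)                    # exploration
    return max(actions, key=lambda a: q_values[a])       # exploitation

# e.g. choose_action({'near_meat': 5, 'far_meat': 20}, ['near_meat', 'far_meat'])
```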
Now, let's look at the Markov decision process. So guys, this is basically a mathematical approach for mapping a solution in reinforcement learning. In a way, the purpose of reinforcement learning is to solve a Markov decision process.
Okay.
So there are a few parameters
that are used to get
to the solution.
So the parameters include the set of actions, the set of states, the rewards, the policy that you're taking to approach the problem, and the value that you get.
Okay, so to sum it up, the agent must take an action A to transition from the start state to the end state S. While doing so, the agent will receive a reward R for each action that he takes.
So guys, a series of actions taken by the agent defines the policy, or the approach, and the rewards that are collected define the value.
So the main goal here is
to maximize the rewards
by choosing the optimum policy.
All right.
Now, let's try to understand
this with the help
of the shortest path problem.
I'm sure a lot of you might
have gone through this problem
when you are in college.
So guys look
at the graph over here.
So our aim here is to find the shortest path between A and D with minimum possible cost. The value that you see on each of these edges basically denotes the cost; so if I want to go from A to C, it's going to cost me 15 points.
Okay.
So let's look at
how this is done.
Now, before we move on and look at the problem: in this problem, the set of states is denoted by the nodes, which are A, B, C, and D, and the action is to traverse from one node to the other. So if I'm going from A to B, that's an action; similarly A to C, that's an action.
Okay, the reward is
basically the cost
which is represented
by each Edge over here.
All right.
Now the policy is
basically the path
that I choose to
reach the destination.
So let's say I choose A to C to D; okay, that's one policy. In order to get to D, I'm choosing A-C-D, which is a policy. It's basically how I'm approaching the problem.
So guys, here you can start off at node A and take baby steps to your destination. Now, initially you're clueless, so you can just take the next possible node which is visible to you.
So guys, if you're smart enough, you're going to choose A to C instead of A-B-C-D or A-B-D. So now, if you're at node C and you want to traverse to node D, you must again choose a wise path. Or rather, you just have to calculate which path will give you the maximum reward.
So guys, this is
a simple problem.
We just tried to calculate the best path between A and D by traversing through these nodes. So if I traverse A-C-D, it gives me the maximum reward: it gives me 65, which is more than any other policy would give me. If I go A-B-D, it would be 40; when you compare this to A-C-D, A-C-D gives me more reward. So obviously I'm going to go with A-C-D.
Okay, so guys, this was a simple problem in order to understand how the Markov decision process works.
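Here's a small sketch of that comparison in Python. The session only quotes the A-to-C cost of 15 and the policy totals of 65 and 40, so the other edge values below are my assumptions, chosen only to match those totals:

```python
# A->C = 15 is quoted above; the other edge values are assumptions
# picked so the policy totals match the quoted 65 and 40.
edges = {('A', 'B'): 10, ('B', 'D'): 30,   # A->B->D totals 40
         ('A', 'C'): 15, ('C', 'D'): 50}   # A->C->D totals 65

def policy_reward(path):
    # Sum the edge values along a chosen policy (path).
    return sum(edges[(a, b)] for a, b in zip(path, path[1:]))

for path in (['A', 'B', 'D'], ['A', 'C', 'D']):
    print('->'.join(path), policy_reward(path))
# A->C->D gives 65, the maximum reward, so that's the optimum policy.
```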
All right, so guys,
I want to ask you a question.
What do you think I did here: did I perform exploration, or did I perform exploitation? Now, the policy for the above example is exploitation, because we didn't explore the other nodes. We just selected three nodes and we traversed through them.
So that's why this
is called exploitation.
We must always explore the different nodes so that we can find a more optimal policy. But in this case, obviously A-C-D has the highest reward and we're going with A-C-D. Generally, though, it's not so simple: there are a lot of nodes, hundreds of nodes to traverse, and there are like 50-60 different policies. So make sure you explore all the policies and then decide on an optimum policy which will give you the maximum reward.
So guys, before we perform the hands-on part, let's try to understand the math behind our demo.
Okay.
So in our demo we'll be using the Q-learning algorithm, which is a type of reinforcement learning algorithm.
Okay, it's simple: it just means that you take the best possible actions to reach your goal, or to get the most rewards.
All right, let's try to
understand this with an example.
So guys, this is exactly what we'll be running in our demo, so make sure you understand this properly.
Okay.
So our goal here is
we're going to place an agent
in any one of the rooms.
Okay.
So basically these squares you see here are rooms. Okay: 0 is a room, 4 is a room, 3 is a room, 1 is a room, and 2 is also a room. 5 is basically the area outside the building.
All right.
So what we're going to do is
we're going to place an agent
in any one of these rooms
and the goal is to reach
outside the building.
Okay, outside the building is room number 5.
Okay, so these spaces are basically doors, which means that you can go from 0 to 4, from 4 to 3, 3 to 1, 1 to 5, and similarly 3 to 2. But you can't go from 5 to 2 directly. All right, so there are certain rooms that aren't connected directly.
Okay.
So like I mentioned here, each room is numbered from 0 to 4, and the outside of the building is numbered as 5. One thing to note here is that room 1 and room 4 directly lead to room number 5.
All right.
So room number one and four
will directly lead out
to room number five.
So basically our goal over here is to get to room number 5. Okay, to set this room as the goal, we'll associate a reward value with each door.
Okay.
Don't worry.
I'll explain what I'm saying.
So if you represent these rooms in a graph, this is how the graph is going to look. For example, from 2 you can go to 3, then 3 to 1, and 1 to 5, which will lead us to our goal. These arrows represent the links between the doors. Now, this is quite understandable.
Our next step is
to associate a reward value
to each of these doors.
Okay, so the rooms that are directly connected to our end room, which is room number 5, will get a reward of 100. So basically our room number 1 will have a reward of 100.
This is obviously because it's directly connected to 5. Similarly, 4 will also be associated with a reward of 100, because it's directly connected to 5: if you go out from 4, it will lead to 5. Now, the other nodes are not directly connected to 5, so you can't directly go from 0 to 5.
Okay.
So for these we'll be assigning a reward of zero: basically, other doors not directly connected to the target room have a zero reward.
Okay, now because the doors are two-way, two arrows are assigned to each pair of rooms; you can see two arrows assigned to each room. So basically 0 leads to 4 and 4 leads back to 0.
We have assigned a 0 over here because 0 does not directly lead to 5, but 1 directly leads to 5, and that's why you can see a 100 over here. Similarly, 4 directly leads to our goal state, and that's why we've assigned a 100 over here, and obviously 5 to 5 is 100 as well.
So here all the direct
connections to room number
five are rewarded hundred
and all the indirect connections
are awarded zero.
So guys in q-learning the end
goal is to reach the state
with the highest reward
so that the agent
arrives at the goal.
Okay.
So let me just explain this graph to you in detail. These rooms over here, labeled 0 through 5, represent the state an agent is in. So if I say state 1, it means that the agent is in room number 1. Similarly, the agent's movement from one room to the other represents the action. So if I say 1 to 3, that represents an action.
All right.
So basically the state
is represented as node
and the action is represented
by these arrows.
Okay.
So this is what this graph is
about these nodes represent
the rooms and these Arrows
represent the actions.
Okay.
Let's look at a small example. Let's set the initial state to 2: my agent is placed in room number 2, and he has to travel all the way to room number 5. So if I set the initial state to 2, he can travel to state 3. From 3 he can either go to 1, go back to 2, or go to 4. If he chooses to go to 4, it will directly take him to room number 5, which is our end goal, and even if he goes from room 3 to 1, it will still take him to room number 5. So this is how our algorithm works: it's going to traverse different rooms in order to reach the goal room, which is room number 5.
Now, let's try and depict these rewards in the form of a matrix, because we'll be using this R matrix, or the reward matrix, to calculate the Q-value, or the Q matrix.
Okay.
We'll see what the Q value is
in the next step.
But for now,
let's see how this reward
Matrix is calculated.
Now, the -1s that you see in the table represent the null values. These -1s basically mean that wherever there is no link between nodes, it's represented as -1. So 0 to 0 is -1; from 0 to 1 there is no direct link, so it's represented as -1; similarly, from 0 to 2 there is no link, you can see there's no line over here, so this is also -1.
But when it comes to 0 to 4, there is a connection, and we have put a 0 because the reward for a state which is not directly connected to the goal is zero. But if you look at this (1, 5), which is basically traversing from node 1 to node 5, you can see the reward is 100.
Okay, that's basically because 1 and 5 are directly connected and 5 is our end goal. So any node which is directly connected to our goal state will get a reward of 100. Okay.
That's why I've put a 100 over here. Similarly, if you look at the fourth row over here, I've assigned a 100, because from 4 to 5 there is a direct connection, which gives a 100 reward.
Okay, you can see from 4 to 5 there is a direct link: from room number 4 to room number 5 you can go directly, and that's why there's a 100 reward over here.
So guys, this is
how the reward Matrix is made.
Alright, I hope this
is clear to you all.
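As a sketch, the reward matrix described here would look like this in numpy. I've reconstructed it from the description (-1 for no door, 0 for a door that doesn't lead to the goal, 100 for a door leading to room 5), so double-check it against the slide:

```python
import numpy as np

# Rows = current room, columns = next room.
R = np.array([[-1, -1, -1, -1,  0,  -1],   # room 0: door only to 4
              [-1, -1, -1,  0, -1, 100],   # room 1: doors to 3 and 5
              [-1, -1, -1,  0, -1,  -1],   # room 2: door only to 3
              [-1,  0,  0, -1,  0,  -1],   # room 3: doors to 1, 2, 4
              [ 0, -1, -1,  0, -1, 100],   # room 4: doors to 0, 3, 5
              [-1,  0, -1, -1,  0, 100]])  # room 5: doors to 1, 4, itself
```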
Okay.
Now that we have
the reward Matrix.
We need to create another matrix called the Q matrix. Here you'll store all the Q-values that we'll calculate. Now, this Q matrix basically represents the memory of what the agent has learned through experience. So once he traverses from one room to the final room, whatever he's learned is stored in this Q matrix, so that he remembers it the next time he travels; that's why we use this matrix. It's basically like a memory.
So guys, the rows of the Q matrix represent the current state of the agent, the columns represent the possible actions, and to calculate the Q-value we use this formula.
All right, I'll show you
what the Q Matrix looks like,
but first, let's
understand this formula.
Now, this Q-value we'll be calculating because we want to fill in the Q matrix. So this is basically a matrix over here; initially it's all 0, but as the agent traverses from different nodes to the destination node, this matrix gets filled up. So basically it will be like a memory to the agent: he'll know that when he traversed a particular path, he found that his value, or his reward, was maximum over there, so next time he'll choose that path. This is exactly what the Q matrix is.
Okay.
Let's go back now guys,
don't worry about
this formula for now
because we'll be implementing
this formula in an example.
In the next slide.
Okay, so don't worry
about this formula for now,
but here, just remember that this Q basically represents the Q matrix, the R represents the reward matrix, and the gamma is the gamma value, which I'll talk about shortly; and here you're just finding out the maximum from the Q matrix.
So basically the gamma parameter
has a range from 0 to 1
so you can have a value of
0.1 0.3 0.5 0.8 and all of that.
So if the gamma is closer to zero, it means that the agent will consider only the immediate rewards, which means that the agent will not explore the surroundings. Basically, it won't explore different rooms; it will just choose a particular room and then try sticking to it.
But if the value of gamma is high, meaning that it's closer to one, the agent will consider future rewards with greater weight.
This means that the agent
will explore all
the possible approaches
or all the possible policies
in order to get to the end goal.
So guys, this is what I was talking about when I mentioned exploitation and exploration.
All right.
So if the gamma value is closer
to 1 it basically means
that you're actually exploring
the entire environment
and then choosing
an Optimum policy.
But if your gamma value
is closer to zero,
it means that the agent
will only stick
to a certain set of policies
and it will calculate
the maximum reward based
on those policies.
Now next, we have the Q-learning algorithm that we're going to use to solve this problem.
So guys, now this is going to look very confusing to y'all, so let me just explain this with an example.
Okay.
We'll see what we're actually
going to run in our demo.
We will do the math behind it.
And then I'll tell you what
this Q learning algorithm is.
Okay, you'll understand it
as I'm showing you the example.
So guys, in the Q-learning algorithm the agent learns from his experience. Each episode, which is basically the agent traversing from an initial room to the end goal, is equivalent to one training session, and in every training session the agent explores the environment and receives some reward until it reaches the goal state, which is 5.
So the purpose of training is to enhance the brain of our agent. Only if he knows the environment very well will he know which action to take, and this is why we calculate the Q matrix. In the Q matrix, we're going to calculate the value of traversing from every state to the end state, from every initial room to the end room. So when we calculate all the values, or how much reward we're getting from each policy, then we know the optimum policy that will give us the maximum reward.
Okay, that's why
we have the Q Matrix.
This is very important, because the more you train the agent, the more optimal your output will be. So basically here the agent will not perform exploitation; instead, he'll explore around, go back and forth through the different rooms, and find the fastest route to the goal.
All right.
Now, let's look at an example.
Okay.
Let's see how
the algorithm works.
Okay.
Let's go back to the previous slide. Here it says that the first step is to set the gamma parameter, so let's do that. The first step is to set the value of the learning parameter, which is gamma, and we have randomly set it to 0.8. The next step is to initialize the matrix Q to 0, so we've set matrix Q to 0 over here, and then we'll select the initial state. The third step is to select a random initial state, and here we've selected the initial state as room number 1.
Okay.
So after you initialize the matrix Q as a zero matrix: from room number 1, you can either go to room number 3 or room number 5. If you look at the reward matrix, you can see that from room number 1 you can only go to room number 3 or room number 5. The other values are -1 here, which means that there is no link from 1 to 0, 1 to 1, 1 to 2, and 1 to 4. So the only possible actions from room number 1 are to go to room number 3 and to go to room number 5.
All right.
Okay.
So let's select room number 5. From room number 1, you can go to 3 and 5, and we have randomly selected 5; you could also select 3, but for this example let's select 5 over here.
Now from room 5, you're going to calculate the maximum Q-value for the next state based on all possible actions. From room 5, the next state can be room number 1, 4, or 5, so you're going to calculate the Q-values for traversing 5 to 1, 5 to 4, and 5 to 5, find out which has the maximum Q-value, and that's how you're going to compute the Q-value.
So let's implement our formula; this is the Q-learning formula. Right now we're traversing from room number 1 to room number 5; this is our state. So here I've written Q(1, 5): 1 represents our current state, which is room number 1. Our initial state was room number 1, and we are traversing to room number 5, as shown in this figure. Now for this we need to calculate the Q-value. Next in our formula is the reward matrix at (state, action). So for the reward matrix, let's look at (1, 5): 1 comma 5 corresponds to 100.
Okay, so our reward over here will be 100, so R(1, 5) is basically 100. Then you're going to add the gamma value; the gamma value we have initialized to 0.8, so that's what we've written over here. And we're going to multiply it with the maximum value that we're going to get for the next state based on all possible actions.
Okay.
So from 5, the next state is 1, 4, or 5. If I traverse from 5 to 1, that's what I've written over here, and similarly you're going to calculate the Q-values of 5 to 4 and 5 to 5; that's what I've mentioned over here. So Q(5, 1), Q(5, 4), and Q(5, 5) are the next possible actions that you can take from state 5.
So R(1, 5) is 100, because from the reward matrix you can see that (1, 5) is 100, and 0.8 is the value of gamma. After that, we calculate Q(5, 1), Q(5, 4), and Q(5, 5). Like I mentioned earlier, we're going to initialize matrix Q as a zero matrix, so we're setting these values to 0, because initially the agent obviously doesn't have any memory of what is happening; he's just starting from scratch. That's why all these values are 0: Q(5, 1) will obviously be 0, Q(5, 4) will be 0, and Q(5, 5) will also be 0, and the maximum between these is obviously 0.
So when you compute this equation, you will get 100; the Q-value of (1, 5) is 100. So if the agent goes from room number 1 to room number 5, he's going to have a maximum reward, or Q-value, of 100. All right, now in the next slide you can see that I've updated the value of Q(1, 5); it's set to 100.
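Written out on one line, the calculation we just did is: Q(1, 5) = R(1, 5) + 0.8 * max(Q(5, 1), Q(5, 4), Q(5, 5)) = 100 + 0.8 * max(0, 0, 0) = 100.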
All right now similarly,
let's look at another example so
that you understand this better.
So guys, this is exactly
what we're going
to do in our demo.
It's only going to be coded.
Okay.
I'm just explaining
our code right now.
I'm just telling you
the math behind it.
Alright, now let's look at this example. This time we'll start with a randomly chosen initial state; let's say that we've chosen state 3. From room 3, you can go to room number 1, 2, or 4. Randomly, we'll select room number 1, and from room number 1 you're going to calculate the maximum Q-value for the next state based on all possible actions.
So the possible actions from 1 are to go to 3 and to go to 5. Now let's calculate the Q-value using this formula; let me explain this to you once again. (3, 1) basically represents that we're in room number 3 and we are going to room number 1; so this represents our action. We're going from 3 to 1, which is our action, and 3 is our current state. Next, we'll look at the reward of going from 3 to 1: if you go to the reward matrix, (3, 1) is 0. This is because going from 3 to 1 doesn't lead directly to 5, so the reward here is zero. After that we have the gamma value, which is 0.8,
and then we're going to calculate the max of Q(1, 3) and Q(1, 5); out of these, whichever has the maximum value, we're going to use. So Q(1, 3) is 0, you can see here (1, 3) is 0, and Q(1, 5), if you remember, we just calculated in the previous slide: (1, 5) is 100. So here I'm going to put a 100, and the maximum here is 100. So 0.8 into 100 will give us 80, and that's the Q-value you're going to get if you traverse from 3 to 1. Okay.
I hope that was clear.
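On one line, that's: Q(3, 1) = R(3, 1) + 0.8 * max(Q(1, 3), Q(1, 5)) = 0 + 0.8 * max(0, 100) = 80.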
So now we have traversed from room number 3 to room number 1 with a reward of 80. But we still haven't reached the end goal, which is room number 5, so for our next episode the state will be room number 1.
So guys, like I said,
we'll repeat this in a loop
because room number
one is not our end goal.
Okay, our end goal
is room number 5.
So now we need to figure out how to get from room number 1 to room number 5. From room number 1, you can either go to 3 or 5; that's what I've drawn over here. So if we select 5, we know that it's our end goal.
Okay.
So from room number 5, you then have to calculate the maximum Q-value for the next possible actions. The next possible actions from 5 are to go to room number 1, room number 4, or room number 5.
So you're going to calculate the Q-values of 5 to 1, 5 to 4, and 5 to 5, find out which is the maximum Q-value, and use that value.
All right.
So let's look at the formula now. Again, we're in room number 1 and want to go to room number 5, so that's exactly what I've written here: Q(1, 5). Next is the reward matrix, so the reward of (1, 5), which is 100. Then we have added the gamma value, which is 0.8, and then we're going to find the maximum Q-value from 5 to 1, 5 to 4, and 5 to 5; this is what we're performing over here. So Q(5, 1), Q(5, 4), and Q(5, 5) are all 0, because we initially set all the values of the Q matrix to 0, so you get 100 over here,
and the matrix remains the same, because we had already calculated Q(1, 5); the value of (1, 5) is already fed to the agent. So when he comes back here, he knows, okay, he's already done this before. Now he's going to try and implement another method: he's going to try to take another route, or another policy. So he's going to try to go through different rooms and finally land up in room number 5. So guys, this is exactly how our code runs: we're going to traverse through each and every node because we want an optimum policy.
An optimum policy is attained only when you traverse through all possible actions. Only if you go through all the possible actions that you can perform will you understand which is the best action, the one that leads us to the reward.
I hope this is clear now,
let's move on
and look at our code.
So guys, this is our code
and this is executed in Python
and I'm assuming
that all of you have
a good background in Python.
Okay, if you don't understand
python very well.
I'm going to leave a link
in the description.
You can check out
that video on Python
and then maybe come
back to this later.
Okay, but I'll be explaining
the code to you anyway,
but I'm not going to spend a lot
of time explaining each
and every line of code
because I'm assuming
that you know python.
Okay.
So let's look at the first line of code over here. What we're going to do is import numpy. NumPy is basically a Python library that adds support for large multi-dimensional arrays and matrices, and for computing mathematical functions on them.
Okay, so first we want to import that. After that, we're going to create the R matrix; so this is the R matrix. Next, we're going to create a Q matrix, and it's a 6 by 6 matrix, because obviously we have six states, starting from 0 to 5, and we are going to initialize the values to zero. So basically the Q matrix is going to be initialized to zero over here.
All right,
After that, we're setting the gamma parameter to 0.8. So guys, you can play with this parameter: move it to 0.9, or lower it, and see what happens. Then we'll set an initial state; the initial state is set as 1. After that, we're defining a function called available_actions. Basically, what we're doing here is: since our initial state is 1, we're going to check row number 1. This is row number 0, this is row number 1, and so on. So we're going to check row number 1 and find the values which are greater than or equal to 0, because these values basically represent the nodes that we can travel to. The -1s, like I explained earlier, represent all the nodes that we can't travel to. So basically over here we're checking all the values which are equal to 0 or greater than 0; these will be our available actions. So if our initial state is 1, we can travel to other states whose value is equal to 0 or greater than 0, and this is stored in a variable called available_act.
Right now, this will basically get the available actions in the current state; we're just storing the possible actions in this available_act variable. So since our initial state is 1, we're going to find out the next possible states we can go to, and that is stored in the available_act variable.
Next, this function chooses at random which action is to be performed, within the range. So if you remember, initially we are in state number 1, and our available actions are to go to room number 3 or room number 5. Now, randomly, we need to choose one room, and for that we're using this line of code: here we are randomly going to choose one of the actions from available_act, which, like I said earlier, stores all our possible actions from the initial state. So once it chooses an action, it's going to store it in next_action; this will represent the next available action to take. Now, next is our Q matrix.
Remember this formula that we used? This formula is what we are going to calculate in the next few lines of code. So this block of code is computing the value of Q; this is our formula for computing the value of Q: Q(current state, action) = R(current state, action) + gamma into the maximum value. So here, basically, we're going to calculate the maximum index, meaning we're going to check which of the possible actions will give us the maximum Q-value. If you remember, in our explanation over here, this max Q value of (5, 1), (5, 4), and (5, 5): we had to choose the maximum Q-value that we get from these three. So basically that's exactly what we're doing in this line of code: calculating the index which gives us the maximum value. After we finish computing the value of Q, we just have to update our matrix.
After that, we'll be updating the Q-value and choosing a new initial state. So this is the update function that is defined over here, and I've just called the function over here. So guys, this whole set of code will just calculate the Q-value; this is exactly what we did in our examples.
After that, we have the training phase. So guys, remember: the more you train an algorithm, the better it's going to learn. So over here I have provided around 10,000 iterations; my range is 10,000 iterations, meaning that my agent will take 10,000 possible scenarios and go through 10,000 iterations to find out the best policy. So here, what I'm doing is choosing the current state randomly; after that, I'm choosing the available action from the current state, so either I can go to state 3 or state 5; then I'm calculating the next action; and then I'm finally updating the value in the Q matrix. And next, we just normalize the Q matrix.
So sometimes in our Q matrix the values might exceed, let's say, 500 or 600, and at that time you want to normalize the matrix; we want to bring it down a little bit, because larger numbers are harder to make sense of, and computation would be very hard on larger numbers. That's why we perform normalization: you take your calculated value, divide it by the maximum Q-value, and multiply by 100. That's how you normalize it over here.
So guys, this is the testing phase. Here you will just randomly set a current state, and you won't give any other data, because you've already trained the model. You're going to give a current state, and then you're going to tell your agent: listen, you're in room number 1, now you need to go to room number 5. So he has to figure out how to go to room number 5, because we have trained him now. So here we have set the current state to 1, and we need to make sure that it's not equal to 5, because 5 is the end goal.
So guys, this is the same loop that we executed earlier; we're going to do the same iterations again. Now, if I run this entire code, let's look at the result.
So our current state here we've chosen as 1, and if we go back to our matrix, you can see that there is a direct link from 1 to 5, which means that the route the agent should take is 1 to 5 directly. He should go from 1 to 5, because he'll get the maximum reward like that.
Let's see if that's happening.
So if I run this it should give
me a direct path from 1 to 5.
Okay, that's exactly what happened. So this is the selected path: it went directly from 1 to 5, and it calculated the entire Q matrix for me. So guys, this is exactly how it works.
Now, let's try to set the initial state as, let's say, 2. So if I set the initial state as 2 and run the code, let's see the path that it gives: the selected path is 2, 3, 4, 5. It chose this path because it's giving us the maximum reward. This is the Q matrix that was calculated, and this is the selected path.
All right, so guys with this we
come to the end of this demo.
So basically, what we did was we just placed an agent in a random room, and we asked it to traverse through and reach the end room, which is room number 5.
So basically we trained our agent and we made sure that it went through all the possible paths to calculate the best path.
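Putting that walkthrough together, here's a minimal, self-contained sketch in Python of the demo as described. The variable names and exact structure are my own reconstruction, so the session's actual file may differ in detail:

```python
import numpy as np

# Reward matrix for the 6-room example:
# -1 = no door, 0 = a door not reaching the goal, 100 = a door to room 5.
R = np.array([[-1, -1, -1, -1,  0,  -1],
              [-1, -1, -1,  0, -1, 100],
              [-1, -1, -1,  0, -1,  -1],
              [-1,  0,  0, -1,  0,  -1],
              [ 0, -1, -1,  0, -1, 100],
              [-1,  0, -1, -1,  0, 100]], dtype=float)

Q = np.zeros((6, 6))   # the agent's "memory", initialised to zero
gamma = 0.8            # the gamma parameter from the session

def available_actions(state):
    # Doors out of a room are the entries >= 0 in that room's row of R.
    return np.where(R[state] >= 0)[0]

def update(state, action):
    # Q(state, action) = R(state, action) + gamma * max Q(next state, ...);
    # in this example the action taken *is* the next room.
    Q[state, action] = R[state, action] + gamma * Q[action].max()

# Training phase: 10,000 iterations from random starting rooms.
for _ in range(10000):
    state = np.random.randint(0, 6)
    action = np.random.choice(available_actions(state))
    update(state, action)

Q = Q / Q.max() * 100   # normalise so the largest Q-value is 100

# Testing phase: start in room 1 and greedily follow the best Q-values.
state, path = 1, [1]
while state != 5:
    state = int(np.argmax(Q[state]))
    path.append(state)
print("Selected path:", path)   # expected: [1, 5]
```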
Now, for a robot, an environment is a place where it has been put to use. And remember, the robot is itself the agent: for example, in an automobile factory, a robot is used to move materials from one place to another. Now, the tasks we discussed just now have a property in common: these tasks involve an environment and expect the agent to learn from that environment. This is where traditional machine learning fails, and hence the need for reinforcement learning.
Now, it is good to have an established overview of the problem that is to be solved using Q-learning, or reinforcement learning, as it helps to define the main components of a reinforcement learning solution: the agent, environment, actions, rewards, and states.
So let's suppose we are to build
a few autonomous robots for
an automobile building Factory.
Now, these robots will help the factory personnel by conveying to them the necessary parts that they would need in order to build the car.
Now, these different parts are located at nine different positions within the factory warehouse. The car parts include the chassis, wheels, dashboard, the engine, and so on, and the factory workers have prioritized the location that contains the body, or the chassis, to be the topmost. But they have provided the priorities for other locations as well, which we'll look into in a moment.
Now these locations
within the factory look
somewhat like this.
So as you can see here,
we have L1 L2 L3
all of these stations.
Now, one thing you might notice here is that there are little obstacles present in between the locations.
So L6 is the top
priority location
that contains the chassis
for preparing the car bodies.
Now the task is
to enable the robots
so that they can find
the shortest route
from any given location to
another location on their own.
Now, the agents in this case are the robots; the environment is the automobile factory warehouse. Let's talk about the states: the states are the locations in which a particular robot is present at a particular instance of time, which we'll denote as states. Machines understand numbers rather than letters, so let's map the location codes to numbers. As you can see here, we have mapped location L1 to the state 0, L2 to 1, and so on; we have L8 as state 7 and L9 as state 8.
So next, what we're going to talk about are the actions. In our example, the action will be the direct location that a robot can go to from a particular location. Consider a robot that is at location L2: the direct locations to which it can move are L5, L1, and L3.
Now, the figure here may come in handy to visualize this. As you might have already guessed, the set of actions here is nothing but the set of all possible states of the robot. For each location, the set of actions that a robot can take will be different.
For example, the set of actions will change if the robot is in L1 rather than L2. If the robot is in L1, it can only go to L4 and L2 directly. Now that we are done with the states and the actions,
Let's talk about the rewards.
So the states are basically 0, 1, 2, 3, 4, up till 8, and the actions are also 0, 1, 2, 3, 4, up till 8.
Now, the rewards will be given to a robot if a location, which is the state, is directly reachable from a particular location. Let's take an example: suppose L9 is directly reachable from L8. So if a robot goes from L8 to L9, and vice versa, it will be rewarded with 1; and if a location is not directly reachable from a particular location, we do not give any reward, i.e., a reward of 0. Now, the reward is just a number and nothing else. It enables the robots to make sense of their movements, helping them decide which locations are directly reachable and which are not. Now, with this, we can construct a reward table which contains all the required reward values mapping between all possible states.
So as you can see here
in the table the positions
which are marked green
have a positive reward.
And as you can see here,
we have all the possible rewards
that a robot can get by moving
in between the different states.
Now comes an
interesting decision.
Now remember that the factory
administrator prioritized L6
to be the topmost.
So how do we incorporate this
fact in the above table now,
this is done by associating
the topmost priority location
with a very high reward.
The usual ones so let's put 999
in the cell L 6 comma
and six now the table
of rewards with a higher reward
for the topmost location
looks something like this.
Now we have formally defined all the vital components for the solution we are aiming for, for the problem discussed. Now we will shift gears a bit and study some of the fundamental concepts that prevail in the world of reinforcement learning and Q-learning. First of all, we'll start with the Bellman equation. Now consider the following square of rooms, which is analogous to the actual environment from our original problem, but without the barriers.
in the green
from its current position a
using the specified Direction.
Now, how can we enable the robot
to do this programmatically
one idea would be introduced
some kind of a footprint
which the robot will be able
to follow now here
a constant value is specified
in each of the rooms,
which will come
along the robots way
if it follows the directions
by Fight about now in this way
if it starts at location
a it will be able to scan
through this constant value
and will move accordingly
but this will only work
if the direction is prefix
and the robot always starts
at the location a now
consider the robot starts
at this location rather
than its previous one.
Now the robot sees footprints in two different directions. It is therefore unable to decide which way to go in order to get to the destination, which is the green room. This happens primarily because the robot does not have a way to remember the directions to proceed. So our job now is to equip the robot with a memory.
Now, this is where the Bellman
equation comes into play.
So as you can see here, the main purpose of the Bellman equation is to equip the robot with a memory; that's the thing we're going to use. The equation goes something like this: V(s) = max over a of [R(s, a) + gamma * V(s')], where s is a particular state (a room), a is the action of moving between the rooms, s' is the state to which the robot goes from s, and gamma is the discount factor; we'll get into it in a moment. And obviously, R(s, a) is a reward function which takes a state s and an action a and outputs the reward. Now, V(s) is the value of being in a particular state, which is the footprint: we consider all the possible actions and take the one that yields the maximum value.
There is one constraint, however, regarding the value footprint: the room marked in yellow, just below the green room, will always have the value of 1, to denote that it is one of the nearest rooms adjacent to the green room. This is also to ensure that a robot gets a reward when it goes from the yellow room to the green room.
Let's see how to make sense of the equation we have here. Let's assume a discount factor of 0.9; remember, gamma is the discount value, or the discount factor, so let's take 0.9. Now, for the room which is marked just below the yellow room: what will be V(s), the value of being in that state? For this room, V(s) would be something like max over a of [R(s, a) + gamma * V(s')] = 0 + 0.9 * 1, which gives us 0.9. Here the robot will not get any reward for going to the state marked in yellow, hence R(s, a) is 0, but the robot knows the value of being in the yellow room, hence V(s') is 1. Following this for the other states, we should get 0.9; then again, if we put 0.9 in this equation, we get 0.81, then 0.729, and then we again reach the starting point.
So this is how the table looks, with some value footprints computed from the Bellman equation.
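You can verify that chain of footprints in a couple of lines (with gamma = 0.9, as above):

```python
gamma, v = 0.9, 1.0            # the room next to the goal has value 1
for step in range(1, 4):
    v = gamma * v              # V(s) = 0 + gamma * V(s') along the chain
    print(step, round(v, 3))   # prints 0.9, 0.81, 0.729
```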
Now, a couple of things to notice here: the max function makes the robot always choose the state that gives it the maximum value of being in that state, and the discount factor gamma notifies the robot about how far it is from the destination. This is typically specified by the developer of the algorithm that would be installed in the robot.
Now, the other states can also be given their respective values in a similar way. As you can see here, the boxes adjacent to the green one have 1, and as we move away from 1 we get 0.9, 0.81, 0.729, and finally we reach 0.66. The robot can now proceed on its way to the green room utilizing these value footprints, even if it's dropped at any arbitrary room in the given location. Now, if a robot lands up in the highlighted sky-blue area, it will still find two options to choose from, but eventually either of the paths will be good enough for the robot to take, because of the value footprints laid out.
Now one thing to note is
that the Bellman equation is one
of the key equations
in the world of reinforcement
learning and Q learning.
So if we think realistically, our surroundings do not always work the way we expect; there is always a bit of stochasticity involved. This applies to the robot as well: sometimes its machinery might get corrupted, sometimes it may come across some hindrance on its way which may not be known to it beforehand, and sometimes, even if the robot knows that it needs to take the right turn, it will not. So how do we introduce this stochasticity in our case? Here comes the Markov decision process.
Now consider the robot is currently in the red room and it needs to go to the green room. Let's now consider that the robot has a slight chance of dysfunctioning and might take the left, the right, or the bottom turn instead of taking the upper turn in order to get to the green room from where it is now, which is the red room.
Now the question is, how do we enable the robot to handle this when it is out in the given environment?
Now, this is a situation where the decision-making regarding which turn is to be taken is partly random and partly under the control of the robot: partly random because we are not sure when exactly the robot might dysfunction, and partly under the control of the robot because it is still making the decision of taking a turn on its own, with the help of the program embedded into it.
So a Markov decision process
is a discrete time
stochastic Control process.
It provides a mathematical
framework for modeling
decision-making in situations
where the outcomes
are partly random
and partly under control
of the decision maker.
Now we need to give this concept a mathematical shape, most likely an equation, which then can be taken further. You might be surprised that we can do this with the help of the Bellman equation with a few minor tweaks.
So if we have a look at the original Bellman equation, V(s) = max over a of [R(s, a) + gamma * V(s')], what needs to be changed in this equation so that we can introduce some amount of randomness? As long as we are not sure when the robot might not take the expected turn, we are then also not sure which room it might end up in, which is nothing but the room it moves to from its current room. At this point, according to the equation, we are not sure of s', which is the next state or room, but we do know all the probable turns the robot might take. Now, in order to incorporate each of these probabilities into the above equation, we need to associate a probability with each of the turns, to quantify the robot's chance of taking that turn. Now if we do so,
we get V(s) = max over a of [R(s, a) + gamma * sum over s' of (P(s, a, s') * V(s'))]. Now, P(s, a, s') is the probability of moving from room s to room s' with the action a, and the summation here is the expectation over the randomness that the robot incurs. Now, let's take a look at this example here.
So when we associate the probabilities to each of these turns, we essentially mean that there is an 80% chance that the robot will take the upper turn. Now, if we put all the required values in our equation, we get V(s) = max over a of [R(s, a) + gamma * (0.8 * V(room up) + 0.1 * V(room down) + 0.05 * V(room left) + 0.05 * V(room right))]. Note that the value footprints will not change, due to the fact that we are incorporating stochasticity here.
But this time we will not calculate those value footprints; instead, we will let the robot figure them out.
Up until this point, we have not considered rewarding the robot for its action of going into a particular room; we are only rewarding the robot when it gets to the destination. Ideally, there should be a reward for each action the robot takes, to help it better assess the quality of its actions. The rewards need not always be the same, but it is much better to have some amount of reward for the actions than to have no rewards at all.
Right, and this idea is known as the living penalty. In reality, the reward system can be very complex, and particularly modeling sparse rewards is an active area of research in the domain of reinforcement learning.
So by now we have got the equation; what we'll do now is transition to Q-learning. This equation gives us the value of going to a particular state, taking the stochasticity of the environment into account. We have also learned very briefly about the idea of the living penalty, which deals with associating each move of the robot with a reward. So, Q-learning poses the idea of assessing the quality of an action that is taken to move to a state, rather than determining the possible value of the state which is being moved to. So earlier we had 0.8 * V(s1) + 0.1 * V(s2) + 0.05 * V(s3), and so on. Now, if we incorporate the idea of assessing the quality of the action for moving to a certain state, the environment with the agent and the quality of the actions will look something like this: instead of V(s1), V(s2), and so on, we'll have Q(s1, a1), Q(s2, a2), and Q(s3, a3). Now the robot has four different states to choose from, and along with that there are four different actions for the current state it is in. So how do we calculate Q(s, a), that is, the cumulative quality of the possible actions the robot might take? Let's break it down.
Now, from the equation V(s) = max over a of [R(s, a) + gamma * sum over s' of (P(s, a, s') * V(s'))], if we discard the maximum function, we have R(s, a) + gamma * sum over s' of (P(s, a, s') * V(s')). Essentially, in the equation that produces V(s), we are considering all possible actions and all possible states from the current state that the robot is in, and then we are taking the maximum value caused by taking a certain action. Without the max, the equation produces a value footprint for just one possible action; in fact, we can think of it as the quality of the action: Q(s, a) = R(s, a) + gamma * sum over s' of (P(s, a, s') * V(s')).
Now that we have got an equation to quantify the quality of a particular action, we are going to make a little adjustment to it. We can now say that V(s) is the maximum of all the possible values of Q(s, a). Let's utilize this fact and replace V(s') with a function of Q, so Q(s, a) becomes R(s, a) + gamma * sum over s' of [P(s, a, s') * max over a' of Q(s', a')]. So the equation of V is now turned into an equation of Q, which is the quality.
But why would we do that? This is done to ease our calculations, because now we have only one function Q, which is also the core of the algorithm. We have only one function Q to calculate, and R(s, a) is a quantified metric which produces the reward of moving to a certain state.
Now, the qualities of the actions are called the Q-values, and from now on we will refer to the value footprints as the Q-values. An important piece of the puzzle is the temporal difference. Temporal difference is the component that will help the robot calculate the Q-values with respect to the changes in the environment over time.
So consider our robot is currently in the marked state and it wants to move to the upper state. One thing to note here is that the robot already knows the Q-value of making the action, that is, moving to the upper state, and we know that the environment is stochastic in nature, so the reward that the robot gets after moving to the upper state might be different from an earlier observation. So how do we capture this change, the temporal difference? We calculate the new Q(s, a) with the same formula and subtract the previously known Q(s, a) from it.
This will in turn give us the temporal difference: TD(s, a) = R(s, a) + gamma * max over a' of Q(s', a') - Q_t-1(s, a). The equation that we just derived gives the temporal difference in the Q-values, which further helps to capture the random changes that the environment may impose. Now, the new Q(s, a) is updated as follows: Q_t(s, a) = Q_t-1(s, a) + alpha * TD_t(s, a). Here, alpha is the learning rate, which controls how quickly the robot adapts to the random changes imposed by the environment; Q_t(s, a) is the current Q-value and Q_t-1(s, a) is the previously recorded Q-value.
So if we replace TD(s, a) with its full-form equation, we get Q_t(s, a) = Q_t-1(s, a) + alpha * [R(s, a) + gamma * max over a' of Q(s', a') - Q_t-1(s, a)]. Now that we have all the little pieces of Q-learning together, let's move forward to the implementation part.
Now, this is the final equation of Q-learning, right? So let's see how we can implement this and obtain the best path for any robot to take. Now, to implement the algorithm, we need to understand the warehouse layout and how it can be mapped to different states.
So let's start by reconstructing the sample environment. As you can see here, we have L1, L2, L3, up to L9, and we have certain borders also. First of all, let's map each of the above locations in the warehouse to numbers, or the states, so that it eases our calculations. What I'm going to do is create a new Python 3 file in the Jupyter notebook, and I'll name it Q-learning. Okay, so let's define the states.
But before that, we need to import numpy, because we're going to use numpy for this purpose, and let's initialize the parameters, that is, the gamma and alpha parameters. So gamma is 0.75, which is the discount factor, whereas alpha is 0.9, which is the learning rate. Next, we're going to define the states and map them to numbers. As I mentioned earlier, L1 is 0, and so on; we have defined the states in numerical form.
Now, the next step is to define the actions, which, as mentioned above, represent the transitions to the next state. As you can see here, we have an array of actions from 0 to 8.
Now, what we're going to do is define the reward table. As you can see, it's the same matrix that I showed you just now. If you understood it correctly, there isn't any real barrier limitation as depicted in the image; for example, a transition like L4 to L1 is allowed, but the reward will be 0 to discourage that path, or in a tough situation, what we do is add a -1 there so that it gets a negative reward.
So in the above code snippet, as you can see, we took each of the states and put ones in the respective states that are directly reachable from a certain state. If you refer to that reward table once again, which we created above, the reconstruction will be easy to understand. One thing to note here is that we did not consider the top-priority location L6 yet. We would also need an inverse mapping from the states back to their original locations; it will be cleaner when we reach the other depths of the algorithm.
So for that, what we're going to do is have the inverse map, state_to_location: we take the states and locations from the dictionary and convert it back.
Now, what we'll do is define a function, get_optimal_route, which will have a start location and an end location. Don't worry, the code is long, but I'll explain each and every bit of it. The get_optimal_route function will take two arguments, the starting location in the warehouse and the end location in the warehouse, respectively, and it will return the optimal route for reaching the end location from the starting location in the form of an ordered list containing the letters.
We'll start by defining the function and initializing the Q-values to be all zeros. But before that, what we need to do is copy the reward matrix to a new one: this is rewards_new. Next, we need to get the ending state corresponding to the ending location; with this information, we'll automatically set the priority of the given ending state to the highest one. We are not defining it beforehand: we automatically set the priority of the given ending state as 999. So we're going to initialize the Q-values to be 0,
and in the learning process, as you can see here, we are taking i in range(1000) and picking up a state randomly; for that we use np.random.randint. For traversing through the neighbouring locations in the same maze, we're going to iterate through the new reward matrix and get the actions which are greater than 0. After that, we pick an action randomly from the list of playable actions, leading us to the next state. We then compute the temporal difference, TD, which is the reward plus gamma into the Q-value of the next state (taking np.argmax of Q of the next state) minus Q of the current state.
We're then going to update the Q-values using the Bellman equation, as you can see here. After that, we're going to initialize the optimal route with the starting location. Now, here we do not know the next location yet, so we initialize it with the value of the starting location. We also do not know the exact number of iterations needed to reach the final location; hence, a while loop is a good choice for the iteration.
So we're going to fetch the starting state, fetch the highest Q-value pertaining to the starting state, and get the index of the next state. But we need the corresponding letter, so we're going to use the state_to_location mapping we just mentioned. After that, we're going to update the starting location for the next iteration, and finally we'll return the route.
So let's take a starting location of L9 and an end location of L1, and see what path we actually get. As you can see here, we get L9, L8, L5, L2, and L1. If you have a look at the image here, starting from L9 towards L1 we got L9 to L8 to L5 to L2 to L1, which produces the maximum reward for the robot.
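Putting the whole walkthrough together, here's a minimal sketch of the implementation as described. The gamma and alpha values, the 999 priority trick, and the TD update come straight from the session; the exact adjacency in the rewards matrix is my own reconstruction of the warehouse layout, so treat that part as an assumption:

```python
import numpy as np

gamma, alpha = 0.75, 0.9   # discount factor and learning rate, as above

location_to_state = {'L1': 0, 'L2': 1, 'L3': 2,
                     'L4': 3, 'L5': 4, 'L6': 5,
                     'L7': 6, 'L8': 7, 'L9': 8}
state_to_location = {state: loc for loc, state in location_to_state.items()}

# 1 where a location is directly reachable, 0 otherwise (assumed layout).
rewards = np.array([[0, 1, 0, 1, 0, 0, 0, 0, 0],   # L1
                    [1, 0, 1, 0, 1, 0, 0, 0, 0],   # L2
                    [0, 1, 0, 0, 0, 1, 0, 0, 0],   # L3
                    [1, 0, 0, 0, 0, 0, 1, 0, 0],   # L4
                    [0, 1, 0, 0, 0, 0, 0, 1, 0],   # L5
                    [0, 0, 1, 0, 0, 0, 0, 0, 0],   # L6
                    [0, 0, 0, 1, 0, 0, 0, 1, 0],   # L7
                    [0, 0, 0, 0, 1, 0, 1, 0, 1],   # L8
                    [0, 0, 0, 0, 0, 0, 0, 1, 0]])  # L9

def get_optimal_route(start_location, end_location):
    rewards_new = np.copy(rewards)
    ending_state = location_to_state[end_location]
    rewards_new[ending_state, ending_state] = 999   # top priority for the goal

    Q = np.zeros((9, 9))
    for _ in range(1000):                            # the learning process
        current_state = np.random.randint(0, 9)
        playable = np.where(rewards_new[current_state] > 0)[0]
        next_state = np.random.choice(playable)
        # Temporal difference, then the Q-learning update:
        TD = (rewards_new[current_state, next_state]
              + gamma * Q[next_state].max()
              - Q[current_state, next_state])
        Q[current_state, next_state] += alpha * TD

    route = [start_location]
    next_location = start_location
    while next_location != end_location:
        state = location_to_state[next_location]
        next_location = state_to_location[int(np.argmax(Q[state]))]
        route.append(next_location)
    return route

print(get_optimal_route('L9', 'L1'))   # e.g. ['L9', 'L8', 'L5', 'L2', 'L1']
```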
So now we have come to the end of this Q-learning session, and I hope you got to know what exactly Q-learning is with the analogy, all the way starting from the number of rooms. I hope the example and the analogy I took were good enough for you to understand Q-learning: the Bellman equation, how to make small changes to the Bellman equation, how to create the reward table and the Q-table, how to update the Q-values using the Bellman equation, what alpha does, and what gamma does.
