Hi guys. In the last decision tree video that
I made, the splitting criterion was based on
entropy, so I thought of making a new video
wherein the splitting criterion is the Gini Index.
 
Here is the formula for the Gini Index. Considering two
classes of samples, positive and negative,
the value of the Gini Index at a particular
node is 1 minus the square
of the probability of positive samples in that
node, minus the square of the probability of negative
samples in that node. Based on the individual Gini Indexes
that we calculate, we then go on to
calculate the weighted Gini Index of a split, which measures
the impurity that remains after we make that split.
I'll be considering the same example as last time.
I have 14 samples and four different factors which
contribute to whether I play tennis or not.
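
The two-class formula described above can be sketched in a few lines of Python. This is only an illustration; the function name `gini_index` and its arguments are made up for this sketch and do not come from the video.

```python
def gini_index(positive, negative):
    """Gini Index of a node holding `positive` and `negative` samples."""
    total = positive + negative
    if total == 0:
        raise ValueError("node has no samples")
    p_pos = positive / total  # probability of a positive sample in the node
    p_neg = negative / total  # probability of a negative sample in the node
    return 1 - p_pos ** 2 - p_neg ** 2


print(gini_index(5, 5))   # evenly mixed node, the most impure two-class case
print(gini_index(10, 0))  # pure node, no impurity
```

An evenly mixed node gives the largest value a two-class node can have, and a pure node gives zero, so a lower Gini Index means a purer node.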
 
Considering the factor wind, there are two cases: when there is wind blowing and when there is no wind blowing.
When there is wind blowing, I have 6 cases:
3 times I play tennis and
3 times I don't play tennis.
When there is no wind blowing, I have 8 cases out of the 14:
6 times I play tennis and 2 times I don't play tennis.

Now, as I have already mentioned, the Gini Index at a particular
node is 1 minus the square of the probability of positive samples
in that node, minus the square of the probability
of negative samples in that node. So if I consider the windy node, the Gini Index
is 1 minus (3/6) squared, where 3 is the number of positive samples
and 6 is the total number of samples, minus (3/6) squared again
for the 3 negative samples, which is 0.5.
Similarly, for the no-wind node I have 1 minus (6/8) squared
for the 6 positive samples out of 8, minus (2/8) squared
for the 2 negative samples, which is 0.375.
So I get 0.5 and 0.375 as the individual Gini Indexes of these two nodes.

Now I want to calculate the weighted Gini Index of the split,
which is the impurity left after splitting. That is 6/14,
the 6 samples in the windy node upon the total of 14 samples,
into the Gini Index at that node, which is 0.5, plus 8/14,
the 8 samples in the no-wind node upon 14, into 0.375,
the Gini Index that I calculated there. So the weighted Gini Index
when we split based on wind is 0.4286.
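
The wind calculation can be checked with a short sketch. It uses the counts given in the narration (3 play / 3 don't play when windy, 6 play / 2 don't play when not windy). The helper `gini_index` and the variable names are illustrative assumptions, not something shown in the video.

```python
def gini_index(positive, negative):
    """Gini Index of a node from its positive and negative sample counts."""
    total = positive + negative
    return 1 - (positive / total) ** 2 - (negative / total) ** 2


# (play, don't play) counts for each value of wind, as stated in the example
wind = {"windy": (3, 3), "not windy": (6, 2)}
total_samples = sum(pos + neg for pos, neg in wind.values())  # 14 samples

weighted = 0.0
for name, (pos, neg) in wind.items():
    node_gini = gini_index(pos, neg)
    weight = (pos + neg) / total_samples  # share of all samples in this node
    weighted += weight * node_gini
    print(f"{name}: gini={node_gini:.4f}, weight={pos + neg}/{total_samples}")

print(f"weighted Gini for wind: {weighted:.4f}")
```

Each node's Gini Index is weighted by the fraction of the 14 samples that land in that node, so larger nodes count for more in the final figure.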
Similarly, if you do it for humidity, the weighted Gini Index
when we split based on humidity is
0.3673.
Similarly, when you split based on temperature,
you get a weighted Gini Index of 0.4405.
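
Temperature has three values rather than two, and the same weighting carries over to a three-way split. The sketch below is a rough check of the 0.4405 figure. The per-value counts are not stated in the narration; they are assumed from the commonly used 14-row play-tennis dataset, and the names are illustrative.

```python
def gini_index(counts):
    """Gini Index of a node given its per-class sample counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


# (play, don't play) counts per temperature value.
# Assumption: taken from the standard play-tennis dataset, not from the video.
temperature = {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)}
total_samples = sum(sum(counts) for counts in temperature.values())

# Weight each branch by its share of the samples and add the results up
weighted = sum(
    sum(counts) / total_samples * gini_index(counts)
    for counts in temperature.values()
)
print(f"weighted Gini for temperature: {weighted:.4f}")
```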
Similarly, outlook, which was our first split
when we used entropy, gives
0.3429,
which is the lowest of the four values.
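
All four figures can be reproduced side by side with the sketch below. The per-value counts for outlook, temperature and humidity are not given in the narration; they are assumed from the commonly used 14-row play-tennis dataset (9 play, 5 don't play overall), and every name here is illustrative. The weighted Gini Index is the impurity that remains after a split, so the sketch also subtracts it from the Gini Index of the unsplit data to get the reduction in impurity.

```python
def gini_index(counts):
    """Gini Index of a node given its per-class sample counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def weighted_gini(branches):
    """Sample-weighted average of the Gini Index over the branches of a split."""
    total = sum(sum(counts) for counts in branches.values())
    return sum(sum(counts) / total * gini_index(counts)
               for counts in branches.values())


# (play, don't play) counts per attribute value.
# Assumption: standard play-tennis dataset; only the wind counts are in the video.
splits = {
    "outlook": {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity": {"high": (3, 4), "normal": (6, 1)},
    "wind": {"windy": (3, 3), "not windy": (6, 2)},
}

parent = gini_index((9, 5))  # Gini Index of all 14 samples before any split
for attribute, branches in splits.items():
    w = weighted_gini(branches)
    print(f"{attribute:12s} weighted Gini={w:.4f}  reduction={parent - w:.4f}")

# The lowest weighted Gini is the same as the largest reduction in impurity
best = min(splits, key=lambda a: weighted_gini(splits[a]))
print("best first split:", best)
```

Ranking by lowest weighted Gini Index and ranking by largest reduction in impurity give the same order, because the Gini Index of the unsplit data is the same for every attribute.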
So after calculating all the attributes, we get weighted Gini
Indexes of 0.3429 for outlook, 0.4405 for temperature,
0.3673 for humidity and 0.4286 for wind.
These values measure the impurity that remains after each split,
so the best split is the one with the lowest value, which gives
the largest reduction in impurity. Outlook has the lowest weighted
Gini Index, so we will be considering outlook as the feature for our
first split, the same choice that entropy gave us.

I hope you understood how the Gini Index functions. Hope you all enjoyed the video.
For more such informative videos, do subscribe
to my channel. Thanks for watching.
