>> This video is going to be
about the correlation
coefficient,
the small r. Now
let's do an example
of the correlation coefficient.
An agricultural research
organization tested a particular
chemical fertilizer to try to
find out whether an increase
in the amount of
fertilizer used would lead
to a corresponding
increase in the food supply.
Let's look at the data;
fertilizer in pounds and bushels
of beans, and that's the data.
Now we need to think about who's
going to be x and who's going
to be y. In this case the
fertilizer is going to be x,
the independent variable,
because it doesn't depend on
how many beans will be produced.
So this will be x and this will
be y. Y is called the dependent
variable because it
depends on the fertilizer.
Okay let's continue.
And what we want to do is
write this vertically instead
of horizontally. Now that we
know which one's x and which
one's y, we're going
to write the data vertically.
Now for the correlation
coefficient we're going
to need more columns; we're
going to need x times y,
we're going to need x
squared, and also y squared.
We're going to need
those as well.
So let's write down
the data here,
put in the vertical setup here.
Alright, now let's multiply
x times y. So 2 times 4 is 8,
1 times 3 is 3, 3 times 4 is 12,
2 times 3 is 6, 4 times 6 is 24,
and 5 times 5 is 25,
and 3 times 5 is 15.
So we filled that column up.
Let's also now do x squared.
Make sure you use
the x column.
So 2 squared is 4, 1 squared is
1, 3 squared is 9, 2 squared is 4,
4 squared is 16, 5 squared
is 25, and 3 squared is 9.
Alright now let's do
the y squared column.
Don't forget we need to use
the y column, so let's do that.
So 4 squared is 16, 3
squared is 9, 4 squared is 16,
3 squared is 9, 6 squared
is 36, 5 squared is 25,
and 5 squared again is 25.
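The three extra columns can be reproduced with a few lines of Python. This is only a sketch: the x and y values are read off the multiplications spoken above (2 times 4, 1 times 3, and so on), and the names x, y, xy, x_sq and y_sq are made up for illustration.

```python
# Data as read off the spoken multiplications (assumed order).
x = [2, 1, 3, 2, 4, 5, 3]  # fertilizer, in pounds
y = [4, 3, 4, 3, 6, 5, 5]  # beans, in bushels

# Build the three extra columns row by row.
xy = [a * b for a, b in zip(x, y)]
x_sq = [a * a for a in x]
y_sq = [b * b for b in y]

# Print the table vertically, one data point per row.
print(f"{'x':>4}{'y':>4}{'xy':>4}{'x^2':>5}{'y^2':>5}")
for a, b, ab, a2, b2 in zip(x, y, xy, x_sq, y_sq):
    print(f"{a:>4}{b:>4}{ab:>4}{a2:>5}{b2:>5}")
```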
Great, so let's complete
the table by drawing
some vertical lines here
so I can see the columns better,
and let's find the sums; we'll
add columns x, y, xy, x squared
and y squared so we
can find the sums.
So the sum of x, we add
that column, gives you 20.
The sum of y, we do that,
we add it
all up, gives you 30.
The sum of xy, when
we sum that, we get 93.
The sum of x squared, we
sum that, we get 68.
And the sum of y squared,
we add that up, we get 136.
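The five column sums can be checked quickly in Python. This is a minimal sketch, assuming the data values implied by the multiplications in the table; the variable names are illustrative only.

```python
# Data as implied by the table (assumed order).
x = [2, 1, 3, 2, 4, 5, 3]  # fertilizer, in pounds
y = [4, 3, 4, 3, 6, 5, 5]  # beans, in bushels

n = len(x)                                   # number of data points
sum_x = sum(x)                               # sum of the x column
sum_y = sum(y)                               # sum of the y column
sum_xy = sum(a * b for a, b in zip(x, y))    # sum of the xy column
sum_x_sq = sum(a * a for a in x)             # sum of the x squared column
sum_y_sq = sum(b * b for b in y)             # sum of the y squared column

print(n, sum_x, sum_y, sum_xy, sum_x_sq, sum_y_sq)
```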
Okay now let's write
down the formula for r,
the correlation coefficient.
r is n times the sum of x
times y, take away the sum
of x times the sum of y.
We're going to divide all that,
and the denominator will be
n times the sum of x squared,
take away the sum of x, squared.
That is, you find the sum
of x first and then square it.
All this is going to be one
operation, so we're going to
put brackets on it, and then
another bracket:
n times the sum of y squared,
and I'm going to subtract
the sum of y, squared. Again,
we find the sum of y first and
then square it. We close the
bracket, and we take the square
root of the whole denominator.
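Written in symbols, the formula just described looks like the following. The summation notation is assumed here, since the transcript only gives the formula in words; n is the number of data points.

```latex
\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
\]
```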
Now here are a couple of
important points to realize.
First of all, the best way to do
this problem is to do the math
step by step.
So after you substitute,
multiply these two out,
multiply these two out,
and here's a subtraction,
right, write it out.
Another thing you'll
notice is that there's
a bracket here and
then there's a bracket there.
So it's best, after you
place the numbers in here
when you substitute, to
multiply, square,
and then subtract; it's very
important to do it stepwise.
Also don't forget to take
the square root at the end;
sometimes we forget that.
Now let's substitute the values.
n is seven, that's how
many data points we have.
The sum of xy, where's that at?
It's right here, it's 93, right.
And then we're going
to have a subtraction,
the sum of x times the sum
of y, and those you can find,
right here's the sum of
x, here's the sum of y,
making sure also that we don't
forget the subtraction sign,
right, and we keep the
parentheses, right;
very important so
we can remember
that we need to multiply.
Alright we're going
to divide that,
we're going to keep
substituting.
In the denominator we have
the bracket, then n,
which is seven, times the sum
of x squared, which is 68,
take away the sum of x. We look
in the column, and that's 20,
and then we square that, and
then we close the bracket;
very important to keep
everything intact.
Then the new bracket
and a seven,
the sum of y squared is
136, and we subtract:
we find the sum of y, which
is 30, and then we square it
and we close the bracket;
don't forget the square root.
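With the sums from the table substituted in, the expression at this point should look roughly like this (a sketch of what is presumably written on screen, using n = 7 and the sums 20, 30, 93, 68 and 136 from the table):

```latex
\[
r = \frac{7(93) - (20)(30)}
         {\sqrt{\left[\,7(68) - (20)^{2}\right]\left[\,7(136) - (30)^{2}\right]}}
\]
```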
Now the next step is going
to be to start multiplying
and simplifying this
a little bit.
So 7 times 93 is
going to give you 651;
don't forget the subtraction.
20 times 30 is going to be 600.
Divided by: 7 times
68 will give you 476.
Subtract 20 squared, which
is going to be 400,
and then make sure
you put the brackets.
Then the next bracket, 7
times 136 will give you
952, and subtract from that
30 squared, which is 900.
Close the bracket; don't
forget the square root.
So the next step will be 651
take away 600; that's going
to give you 51, divided by 476
take away 400; that's going
to give you 76; don't
forget the bracket.
952 take away 900; that's
going to give you 52.
Close the bracket; don't
forget the square root.
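Laid out line by line, the multiply, square, then subtract steps spoken above amount to the following chain (a sketch using only the numbers stated in the walkthrough):

```latex
\[
r = \frac{651 - 600}{\sqrt{\left[\,476 - 400\right]\left[\,952 - 900\right]}}
  = \frac{51}{\sqrt{(76)(52)}}
\]
```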
Alright let's rewrite; 51
divided by, and 76 times 52 is
going to give you 3952; don't
forget the square root.
Then again 51 divided by the
square root of 3952.
That square root is going
to be 62.8649.
And then we do the division and
that gives you 0.811, which
is a very strong correlation,
a strong relationship
between x and y.
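The whole calculation can be double-checked with a short Python function. This is a sketch under stated assumptions: the data values are the ones implied by the table, and the function name pearson_r is made up for illustration. It follows the same sum-based formula step by step, and the result comes out near 0.811 (a correlation coefficient always lies between -1 and 1).

```python
import math


def pearson_r(x, y):
    """Correlation coefficient r from the sum-based formula in the walkthrough."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x_sq = sum(a * a for a in x)
    sum_y_sq = sum(b * b for b in y)

    # Numerator: n * sum(xy) - sum(x) * sum(y)
    numerator = n * sum_xy - sum_x * sum_y
    # Each bracket: multiply, square, then subtract.
    x_bracket = n * sum_x_sq - sum_x ** 2
    y_bracket = n * sum_y_sq - sum_y ** 2
    # The square root covers the product of both brackets.
    return numerator / math.sqrt(x_bracket * y_bracket)


# Data as implied by the table (assumed order).
x = [2, 1, 3, 2, 4, 5, 3]  # fertilizer, in pounds
y = [4, 3, 4, 3, 6, 5, 5]  # beans, in bushels

r = pearson_r(x, y)
print(round(r, 3))
```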
