Another very common one, that
I'm sure Ron in particular
is very familiar with, as a
statistician, is correlation.
So correlation
measures, essentially,
the linear relationship
between the objects.
It tells us if object
p and q move together,
is kind of the way
to think about it.
So what we do with this
is we standardize each
of the objects' attributes.
And then we take
their dot product, scaled by
the number of attributes.
And it gives us a value
between negative 1 and 1,
so it's not exactly a standard
similarity measurement.
But we can square it, and then
it falls between 0 and 1
and becomes a standard
similarity measurement.
That squared value is called the
coefficient of determination.
So r is the correlation,
and r squared is the
coefficient of determination.
The two tend to get used in data
science fairly interchangeably.
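
As a rough illustration of the standardize-then-dot-product idea described above, here is a minimal sketch in plain Python. The function names (`standardize`, `correlation`) and the two example objects `p` and `q` are made up for this sketch and are not from the lecture. It assumes the population standard deviation, so the dot product is divided by n, and it assumes neither object is constant (a constant object would have zero standard deviation).

```python
import math

def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]  # assumes std is not zero

def correlation(p, q):
    """Pearson correlation: dot product of the standardized values, divided by n."""
    p_std, q_std = standardize(p), standardize(q)
    return sum(a * b for a, b in zip(p_std, q_std)) / len(p)

# Hypothetical objects with five attributes each.
p = [1.0, 2.0, 3.0, 4.0, 5.0]
q = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(p, q)   # falls between -1 and 1
r_squared = r ** 2      # falls between 0 and 1
print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

Squaring throws away the sign, so a strong negative relationship and a strong positive one end up with the same value between 0 and 1.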
So here, for those
of you who haven't
had that much statistics
or who don't remember,
is a visual example
of our correlations.
So when correlation
is negative 1,
which is the lowest
possible value,
we have a perfectly linear,
inverse relationship.
As one object goes up,
the other comes down,
whatever up and down happen
to mean in this context.
And with a correlation
of 1, the objects
go up together or
come down together.
And as we get to correlations
that are closer to 0,
we can see that
the data clearly has very
little linear relationship.
Whereas if we get closer
to 1 and negative 1,
we see a sharper and
sharper linear relationship
between the two.
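
To make those extremes concrete without the plot, here is a small sketch with made-up numbers. The `pearson` helper and the three example series are hypothetical. It uses the covariance-over-standard-deviations form of the same correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance over the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither series is constant

x = [1, 2, 3, 4, 5, 6]
falling = [12 - 2 * v for v in x]   # exact line, negative slope
rising = [3 * v + 1 for v in x]     # exact line, positive slope
no_trend = [3, 1, 2, 2, 1, 3]       # hand-picked so the linear trend cancels out

for name, y in [("falling", falling), ("rising", rising), ("no_trend", no_trend)]:
    print(name, round(pearson(x, y), 3))
```

The two exact lines sit at the ends of the range, and the hand-picked series lands at 0. Real data usually falls somewhere in between.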
Correlation is
one of the metrics
that we use to evaluate
regression models.
So we'll talk about it
more in that context.
But I just wanted to make
sure we introduced it
so you've heard
the term, if you
haven't had much of a
statistics background
or it's been a while.
