All right, so within
these two sort
of big categories
of attributes, we
have some subsets that are
also important to think about.
And one of the most
important of these
is the distinction between
categorical attributes
and non-categorical attributes.
So categorical attributes
are discrete attributes
that specifically have
a finite set of values
that they are allowed to take.
There are several
examples here.
And within categorical,
there are two useful subsets.
So, again, categorical attributes
are any attributes
that have only a
finite set of values.
If that finite set of values
has a natural ordering,
so this is something like
rankings or grades or clothing
sizes, we call that
an ordinal attribute.
So ordinal means that it has an
order, pretty straightforward
linguistics there.
And ordinal attributes
are nice, because we
can code them as
integers and maintain
the ordering between them.
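
As a minimal sketch of coding an ordinal attribute as integers, the Python below uses clothing sizes. The size list and the names `SIZE_ORDER`, `SIZE_CODE`, and `encode_sizes` are made up for illustration and are not from the lecture.

```python
# Hypothetical ordering of clothing sizes, smallest to largest.
SIZE_ORDER = ["S", "M", "L", "XL"]

# Map each size to its rank in the ordering: S -> 0, M -> 1, and so on.
SIZE_CODE = {size: rank for rank, size in enumerate(SIZE_ORDER)}


def encode_sizes(sizes):
    """Replace each size label with its integer rank."""
    return [SIZE_CODE[s] for s in sizes]


observed = ["M", "XL", "S", "L"]
codes = encode_sizes(observed)

# Sorting by the integer codes recovers the natural order of the labels.
sorted_sizes = [label for _, label in sorted(zip(codes, observed))]
print(codes)
print(sorted_sizes)
```

Because the integers follow the ordering, comparisons like "smaller than" carry over from the labels to the codes.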
So we don't have
to treat them particularly
specially. But most
categorical variables
are what we call nominal
categorical variables
or attributes.
So nominal attributes have
no inherent ordering to them.
So eye color, zip codes,
ID numbers, hair color,
whether someone is married
or not, or divorced,
or living with a partner.
There's no way you
can say oh yes, blue
should have a value of 5, and
green should have a value of 2
because I don't like green eyes.
There's no ordering that you
can put into those variables.
So nominal attributes in
particular we have to be
careful about handling.
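
One common way to handle a nominal attribute without inventing an order is one-hot encoding. The lecture does not name this technique at this point, so treat it as an assumed illustration. The function `one_hot` and the sample eye colors are hypothetical.

```python
def one_hot(values):
    """Turn nominal labels into 0/1 indicator rows, one column per category."""
    # Sorting only fixes a column layout; the order carries no meaning.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


eye_colors = ["blue", "green", "brown", "blue"]  # made-up sample data
categories, rows = one_hot(eye_colors)

print(categories)
for color, row in zip(eye_colors, rows):
    print(color, row)
```

Each category gets its own column, so no color ends up numerically "larger" than another.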
There are other useful
variable types to
think about, types that
allow us to treat them
specially in ways that
are useful and easier.
On the continuous side are
interval and ratio variables.
You can certainly have intervals
or ratios that are discrete,
but for the most part, you see
them as real, or as continuous.
Interval variables
are variables
where the difference
between two values
is constant and meaningful.
So for instance, with
temperature, say,
temperature in Celsius, a
temperature of 100 degrees
and a temperature of 90 degrees
have the same difference
between them
as a temperature of 90 degrees
and a temperature of 80 degrees.
So interval variables are
basically continuous variables
that have a nice metric
we can assign them
that gives us some
nice handling.
Something like the decibel
scale, on the other hand,
is much harder to
handle as an interval,
because if you're
thinking about the actual
intensity of the sound,
the decibel scale is
a logarithmic scale.
So the difference in intensity
between three decibels
and four decibels
is smaller than the difference
between 13 and 14 decibels.
So that's an example of a
continuous variable that
isn't an interval variable.
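
To make the decibel point concrete, here is a small sketch that converts decibel levels back to relative sound intensity. It uses the standard definition, where a level in dB is 10 times the base-10 logarithm of the intensity ratio. The function name `relative_intensity` is made up for illustration.

```python
def relative_intensity(decibels):
    """Sound intensity relative to the reference level, given a level in dB.

    Inverts dB = 10 * log10(I / I0), so I / I0 = 10 ** (dB / 10).
    """
    return 10 ** (decibels / 10)


# The same one-decibel step, taken at two different points on the scale.
gap_low = relative_intensity(4) - relative_intensity(3)
gap_high = relative_intensity(14) - relative_intensity(13)

print(f"3 -> 4 dB:   intensity gap {gap_low:.3f}")
print(f"13 -> 14 dB: intensity gap {gap_high:.3f}")
print(f"ratio of gaps: {gap_high / gap_low:.1f}")
```

The one-decibel step from 13 to 14 covers about ten times as much intensity as the step from 3 to 4, so equal differences in decibels do not mean equal differences in intensity.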
