And in addition to doing something very complicated, like a Fourier transform, you can apply a lot of simpler, more straightforward transformations to your data.
So very common transformations are taking the exponential of a data value, taking the logarithm of a data value, and taking the absolute value of a data value.
All of these types of transformations allow us to bring out different dependencies in our data, to try to correlate our data attributes better with whatever our target is.
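As a rough sketch of what those simple transformations look like in code, assuming NumPy and a made-up array of attribute values:

```python
import numpy as np

# Hypothetical attribute values; the numbers are purely illustrative.
x = np.array([0.5, 2.0, -3.0, 10.0])

x_exp = np.exp(x)      # exponential of each value
x_abs = np.abs(x)      # absolute value of each value
x_log = np.log(x_abs)  # logarithm, taken on the magnitudes so it is defined for negatives
```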
The other two things
here I'm going
to take special
time to talk about
because they show up a lot.
So standardization
and normalization
are probably the
most common kinds
of transformations that are
applied to data, to attributes,
in data science.
Standardization is where we take our numeric data and, for each numeric value, we subtract the mean and divide by the standard deviation of our dataset.
So what this does is it forces
our data to have a mean of 0
and a standard deviation of 1.
So that's why it's
standardization.
The reason we do this is that a lot of the time it's a way of scaling our data down. If you have, for instance, age and annual income, the majority of algorithms will overweight your annual incomes, simply because the incomes are so much larger numbers than the ages.
But if we standardize
both of those,
then age and annual
income are going
to be weighted in
exactly the same way.
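Here is a minimal sketch of standardization on hypothetical age and income columns; the numbers are invented for illustration:

```python
import numpy as np

age = np.array([23.0, 35.0, 47.0, 62.0])
income = np.array([28_000.0, 54_000.0, 95_000.0, 210_000.0])

def standardize(x):
    # Subtract the mean and divide by the standard deviation,
    # so the result has mean 0 and standard deviation 1.
    return (x - x.mean()) / x.std()

age_std = standardize(age)
income_std = standardize(income)
```

In practice, a library routine such as scikit-learn's StandardScaler performs this same computation for you.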
A somewhat less extreme version of the same thing is normalization, where we simply subtract the minimum from every data value and then divide by the range, the maximum minus the minimum. And that maps the entire dataset onto the range from 0 to 1.
It distorts the separation
between the values
to a certain extent.
But it does scale the data very nicely so that, again taking the age versus annual income distinction, age and annual income will end up on the same 0 to 1 scale. They'll be weighted the same way by our algorithms.
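And a corresponding sketch of min-max normalization on the same hypothetical columns:

```python
import numpy as np

age = np.array([23.0, 35.0, 47.0, 62.0])
income = np.array([28_000.0, 54_000.0, 95_000.0, 210_000.0])

def normalize(x):
    # Subtract the minimum and divide by the range (max - min),
    # which maps every value onto the interval [0, 1].
    return (x - x.min()) / (x.max() - x.min())

age_norm = normalize(age)
income_norm = normalize(income)
```

Scikit-learn offers the same transformation as MinMaxScaler.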
