Hello!
Welcome to Dimensionality Reduction using Feature Extraction and Feature Selection.
Dimensionality Reduction is the process of
reducing the number of variables or features
under consideration.
Dimensionality Reduction can be divided into
two subcategories: Feature Selection,
which includes Wrappers, Filters, and Embedded methods,
and Feature Extraction, which includes Principal
Component Analysis.
So how exactly does Dimensionality Reduction
improve performance?
It does so by reducing the number of features
that are to be considered.
To see how this works, think of a simple algebraic
equation.
a + b + c + d = e.
If you can define ab = a + b, combining
two variables into one, you're using Feature
Extraction to reduce the number of variables.
Now, if c were equal to 0 or an arbitrarily
small number, it wouldn't really be relevant,
so it could be taken out of the equation.
By doing so, you'd be using Feature Selection
because you'd be selecting only the relevant
variables and leaving out the irrelevant ones.
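To make the equation example concrete, here is a tiny Python sketch (the variable names mirror the equation; the data itself is made up for illustration):

```python
# Toy data: each row has four features, a through d.
# Feature extraction: replace a and b with one combined feature, ab = a + b.
# Feature selection: drop c, since its values are (near) zero and add nothing.
rows = [
    {"a": 1.0, "b": 2.0, "c": 0.0, "d": 4.0},
    {"a": 2.0, "b": 1.0, "c": 0.0, "d": 3.0},
]

reduced = [{"ab": r["a"] + r["b"], "d": r["d"]} for r in rows]
print(reduced)  # each row now carries 2 features instead of 4
```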
Feature Selection is the process of selecting
a subset of relevant features or variables.
There are three main types:
* Wrappers,
* Filters, and
* Embedded.
To help you visualize how feature selection
works, imagine a set of variables, let's use
a series of shapes, for example, with each
shape representing a different dimension or
feature.
By ignoring the irrelevant variables, or selecting
the ones that improve accuracy, we reduce
the amount of strain on the system and produce
better results.
Wrappers use a predictive model that scores
feature subsets based on the error rate of
the model.
While they're computationally intensive, they
usually produce the best selection of features.
A popular technique is called stepwise regression.
It's an algorithm that adds the best feature,
or deletes the worst feature at each iteration.
Filters use a proxy measure, which is less
computationally intensive but slightly less
accurate.
So a filter might find a good feature set, but it
still may not be the best one.
Filters do capture the general character of the
dataset but, compared to measuring the model's
error directly, the feature set that's selected will
be more general than if a Wrapper was used.
An interesting fact about filters is that
they produce a feature set that doesn't contain
assumptions based on a predictive model,
making them a useful tool for exposing relationships
between features, such as which variables
are 'bad' together and, as a result, drop
the accuracy, or 'good' together and therefore
raise the accuracy.
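Here is a minimal filter-style selector in numpy: every feature is scored independently by its absolute correlation with the target (a proxy measure; no model is trained), and the top-k survive. The data is synthetic; only columns 0 and 2 carry signal.

```python
import numpy as np

# Synthetic data: 4 candidate features, but y depends only on columns 0 and 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=200)

# Proxy measure: absolute Pearson correlation of each feature with the target.
scores = np.array(
    [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
)

# Keep the two best-scoring features (no predictive model involved).
top2 = sorted(np.argsort(scores)[-2:].tolist())
print(top2)  # the signal-carrying columns, 0 and 2
```

Because the score is computed per feature, this is fast, but it can't see interactions between features, which is why a wrapper can find a better (if less general) subset.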
Embedded algorithms learn about which features
best contribute to an accurate model during
the model building process.
The most common type is called a regularization
model.
In our shape example, it would be similar
to picking the shapes, or good features, at
each step of the model-building process.
It might be picking the Triangle feature in
step one, picking the Cross feature in step
two, or picking the Lightning feature in step three
to obtain our accurate model.
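A minimal sketch of an embedded method, L1 (lasso) regularization, implemented here with a few hundred iterations of ISTA (proximal gradient descent) in plain numpy. The L1 penalty drives the coefficients of unhelpful features to exactly zero, so selection happens during model fitting rather than before or around it. All data and constants are illustrative.

```python
import numpy as np

# Synthetic data: 5 candidate features, but y depends only on columns 0 and 4.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

lam = 0.1                                   # L1 strength (per sample)
step = 1.0 / np.linalg.norm(X, 2) ** 2      # safe gradient step size
w = np.zeros(X.shape[1])

for _ in range(500):
    grad = X.T @ (X @ w - y)                # gradient of the squared error
    w = w - step * grad
    # Soft-thresholding: the proximal operator of the L1 penalty.
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam * len(y), 0.0)

kept = np.nonzero(np.abs(w) > 1e-6)[0]
print(kept.tolist())  # the surviving features, columns 0 and 4
```

The noise features end up at exactly zero, not merely small, which is what makes L1 regularization a selection mechanism and not just a shrinkage one.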
Feature Extraction is the process of transforming
or projecting a space composed of many dimensions
into a space of fewer dimensions.
In other words, data represented in many
dimensions is re-expressed in fewer of them.
This is useful for when you need to keep your
information but want to reduce the resources
that it may consume during processing.
The main linear technique is called Principal
Component Analysis.
There are other linear and non-linear techniques
but reviewing them here is out of scope for
this course.
Principal Component Analysis is the reduction
of a higher-dimensional vector space to a
lower-dimensional one through projection.
It can be used to visualize the dataset through
a compact representation and compression of
dimensions.
An easy way to picture this is the
projection from a 3-dimensional space onto a
2-dimensional plane.
A plane is first found which captures most
(if not all) of the variation in the data.
Then the data is projected onto new axes and
a reduction in dimensions occurs.
When the projection happens,
new axes are created to describe the relationship.
These are called the principal axes, and the
new data is called the principal components.
This becomes a more compact visualization
for the data and thus, is easier to work with.
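The 3-D-to-2-D projection described above can be sketched with numpy alone: centre the data, find the principal axes with an SVD, and project onto the first two. The data here is synthetic and deliberately lies close to a 2-D plane inside 3-D space.

```python
import numpy as np

# Synthetic 3-D data that secretly lives near a 2-D plane, plus small noise.
rng = np.random.default_rng(3)
plane = rng.normal(size=(100, 2))                     # hidden 2-D coordinates
basis = np.array([[1.0, 0.0, 1.0],                    # embed the plane in 3-D
                  [0.0, 1.0, 1.0]])
X = plane @ basis + 0.01 * rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                  # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                      # the principal axes (rows)
projected = Xc @ components.T            # the principal components: 100 x 2

explained = S**2 / np.sum(S**2)          # variance captured by each axis
print(projected.shape)                   # (100, 2)
print(explained[:2].sum())               # close to 1: the plane holds nearly
                                         # all the information
```

Keeping only the first two components gives the compact 2-D representation while discarding almost none of the variation, which is exactly the trade PCA is making.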
Thanks for watching!
