Hi, this is Sumit.
In this video we will learn the mathematical
approach behind PCA, which stands for principal component analysis.
The first question is: why should I use PCA?
When we have large data sets, that is, data with many variables, we run into
three problems.
First, if we simply drop many of the variables or features, information is lost.
Second, when many variables or features are present, we cannot easily plot
the data in its raw format.
Third, it is difficult to understand the trends present between the groups.
Now let us understand what PCA is.
PCA is an unsupervised statistical technique that is used to reduce the
dimensionality of the data.
Unsupervised means that all the data is unlabeled; that is, we do not define
classes for the samples.
This method is used to examine the relationships among a large number of
variables and to explain these variables in terms of a small number of new
variables, called principal components.
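To make the idea of "explaining many variables with a few" concrete, here is a minimal NumPy sketch. The function name `pca_reduce` and the randomly generated data (5 variables driven by 2 hidden factors plus a little noise) are hypothetical, chosen only for illustration; the sketch assumes the covariance-and-eigenvector route that this video walks through step by step.

```python
import numpy as np

def pca_reduce(data, k):
    """Hypothetical helper: project rows of `data` (samples x variables)
    onto the k principal components with the largest eigenvalues."""
    centered = data - data.mean(axis=0)                 # subtract each variable's mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1][:k]               # indices of the k largest eigenvalues
    scores = centered @ eigvecs[:, order]               # samples x k matrix of PC scores
    explained = eigvals[order].sum() / eigvals.sum()    # fraction of total variance kept
    return scores, explained

# Hypothetical data: 50 samples of 5 variables generated from 2 hidden factors plus noise.
rng = np.random.default_rng(1)
factors = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 5))
data = factors @ loadings + 0.05 * rng.normal(size=(50, 5))

scores, explained = pca_reduce(data, k=2)
print(scores.shape)                          # (50, 2)
print(f"variance retained: {explained:.1%}")
```

Because the five made-up variables really depend on only two factors, two principal components keep almost all of the variance.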
Let us take a look at this figure.
PCA lets us visualize the data.
The first principal component, PC1, represents the direction of maximum
variance in the data, while the second principal component, PC2, shows the
second-largest source of variation in the data.
PC2 is orthogonal to PC1.
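The two properties just described (PC1 is the direction of maximum variance, PC2 is orthogonal to it) can be checked numerically. This is a sketch assuming NumPy and made-up correlated 2-D data, not the data in the figure: it scans many unit directions and confirms that none gives more projected variance than PC1.

```python
import numpy as np

# Hypothetical correlated 2-D data (made-up values, not the data in the figure).
rng = np.random.default_rng(42)
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.2], [1.2, 1.0]], size=500)

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh returns eigenvalues in ascending order; reverse so PC1 comes first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
pc1, pc2 = eigvecs[:, 0], eigvecs[:, 1]

def projected_variance(direction):
    """Sample variance of the centered data projected onto a unit vector."""
    unit = direction / np.linalg.norm(direction)
    return np.var(centered @ unit, ddof=1)

# Scan 361 directions between 0 and 180 degrees: none should beat PC1.
angles = np.linspace(0.0, np.pi, 361)
best = max(projected_variance(np.array([np.cos(a), np.sin(a)])) for a in angles)

print("variance along PC1:", projected_variance(pc1))
print("best variance over scanned directions:", best)
print("PC1 . PC2 =", pc1 @ pc2)  # approximately 0, i.e. orthogonal
```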
Now, why is PCA important?
PCA allows you to see which samples or groups are similar to one another.
It is also used to identify outliers.
This PCA score plot of the pericarp, seed, and skin of bitter melon, with
principal component 1 explaining 70.1% of the variation and principal
component 2 explaining 16.6%, shows a clear separation between these three groups.
Let us look at an example to see how PCA works.
This data set has two dimensions, X1 and X2, with 10 samples.
We use this reduced data set to simplify the calculations.
The first step is to subtract the mean from each data dimension.
Every X1 value has X1 bar (the average of the X1 values over all data points)
subtracted from it.
Similarly, every X2 value has X2 bar subtracted from it.
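Here is a minimal NumPy sketch of this first step. The ten X1 and X2 values below are hypothetical stand-ins, since the video's table is not reproduced in this transcript; after the subtraction, each adjusted column sums to (numerically) zero.

```python
import numpy as np

# Hypothetical X1, X2 values for 10 samples (made up; not the video's table).
x1 = np.array([2.1, 1.4, 3.0, 2.6, 1.8, 2.9, 1.2, 2.4, 2.0, 1.6])
x2 = np.array([1.9, 1.1, 2.7, 2.5, 1.5, 2.4, 1.3, 2.2, 1.7, 1.2])

x1_bar = x1.mean()  # X1 bar: average of all X1 values
x2_bar = x2.mean()  # X2 bar: average of all X2 values

# Step 1: subtract each dimension's mean from every value in that dimension.
x1_adj = x1 - x1_bar
x2_adj = x2 - x2_bar

print(x1_bar, x2_bar)
print(np.column_stack([x1_adj, x2_adj]))
```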
The second step is to compute the covariance matrix.
For this two-dimensional data, the covariance matrix can be expressed as

$$A = \begin{pmatrix} \operatorname{var}(X_1) & \operatorname{cov}(X_1, X_2) \\ \operatorname{cov}(X_1, X_2) & \operatorname{var}(X_2) \end{pmatrix}$$

where the variance of X1 and the covariance of X1 and X2 are given by these formulas.
Carrying out all the calculations, we get the covariance matrix.
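The sketch below builds the covariance matrix entry by entry and cross-checks it against `np.cov`. It assumes the sample convention (dividing by n − 1), which is what `np.cov` uses by default; the on-screen formulas may use a different denominator. The data are the same hypothetical values used as a stand-in above.

```python
import numpy as np

# Hypothetical 10-sample data (stand-in for the video's table).
x1 = np.array([2.1, 1.4, 3.0, 2.6, 1.8, 2.9, 1.2, 2.4, 2.0, 1.6])
x2 = np.array([1.9, 1.1, 2.7, 2.5, 1.5, 2.4, 1.3, 2.2, 1.7, 1.2])
n = len(x1)

x1_adj = x1 - x1.mean()
x2_adj = x2 - x2.mean()

# Assumption: sample (n - 1) denominator.
var_x1 = np.sum(x1_adj ** 2) / (n - 1)
var_x2 = np.sum(x2_adj ** 2) / (n - 1)
cov_x1x2 = np.sum(x1_adj * x2_adj) / (n - 1)

# Covariance matrix: [[var(X1), cov(X1, X2)], [cov(X1, X2), var(X2)]]
A = np.array([[var_x1, cov_x1x2],
              [cov_x1x2, var_x2]])

print(A)
print(np.allclose(A, np.cov(x1, x2)))  # True: matches NumPy's built-in
```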
The third step is to compute the eigenvalues of the covariance matrix.
We solve the characteristic equation $\det(A - \lambda I) = 0$, where A is our
covariance matrix, $\lambda$ is an eigenvalue, and I is the identity matrix.
This gives the eigenvalues $\lambda_1 = 0.3425$ and $\lambda_2 = 0.033$.
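For a 2×2 matrix, $\det(A - \lambda I) = 0$ expands to the quadratic $\lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A) = 0$, which can be solved directly. This sketch uses a hypothetical covariance matrix (illustrative values, not the video's) and compares the quadratic-formula roots with `np.linalg.eigvalsh`.

```python
import numpy as np

# Hypothetical symmetric 2x2 covariance matrix (illustrative values, not the video's).
A = np.array([[0.20, 0.15],
              [0.15, 0.18]])

# det(A - lambda*I) = 0  <=>  lambda^2 - trace(A)*lambda + det(A) = 0
trace = A[0, 0] + A[1, 1]
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
disc = np.sqrt(trace ** 2 - 4 * det)
lam1 = (trace + disc) / 2  # larger eigenvalue
lam2 = (trace - disc) / 2  # smaller eigenvalue

print(lam1, lam2)
print(np.linalg.eigvalsh(A)[::-1])  # NumPy's eigenvalues, largest first
```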
The next step is to compute the eigenvectors of the covariance matrix
using $(A - \lambda I)X = 0$, where X is an eigenvector.
We solve this equation once for each eigenvalue, $\lambda_1$ and $\lambda_2$,
which gives the eigenvector corresponding to each of them.
This is our final eigenvector matrix.
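For a symmetric 2×2 matrix with a non-zero off-diagonal entry b, the first row of $(A - \lambda I)X = 0$ reads $(a - \lambda)x_1 + b\,x_2 = 0$, so $X = (b, \lambda - a)$ is a solution. The helper `eigenvector_2x2` below is a hypothetical name for this sketch, and the matrix is the same illustrative one as before, not the video's. Note that an eigenvector's sign is arbitrary, so a hand calculation may differ from NumPy by a factor of −1.

```python
import numpy as np

A = np.array([[0.20, 0.15],
              [0.15, 0.18]])  # hypothetical covariance matrix
a, b, d = A[0, 0], A[0, 1], A[1, 1]

def eigenvector_2x2(lam):
    """Solve (A - lam*I) x = 0 for a symmetric 2x2 A with b != 0.

    The first row gives (a - lam) x1 + b x2 = 0, so x = (b, lam - a)
    is a solution; normalise it to unit length.
    """
    x = np.array([b, lam - a])
    return x / np.linalg.norm(x)

lams = np.linalg.eigvalsh(A)[::-1]  # eigenvalues, largest first
V = np.column_stack([eigenvector_2x2(lam) for lam in lams])  # columns are eigenvectors

for lam, v in zip(lams, V.T):
    print(lam, v, np.allclose(A @ v, lam * v))  # last value: True
```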
The final step is to compute the principal components.
For this, we multiply two matrices.
The first matrix is the mean-adjusted data, obtained by subtracting X1 bar and
X2 bar from X1 and X2, and the second matrix is our eigenvector matrix.
After the multiplication, the first column gives principal component one (PC1),
which shows the maximum variance compared with principal component two (PC2).
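A minimal NumPy sketch of this multiplication, again on the hypothetical 10-sample stand-in data: the columns of `scores` are the PC1 and PC2 values, and the variance of each column equals the corresponding eigenvalue, which is why PC1 carries the most variance.

```python
import numpy as np

# Hypothetical 10-sample data (not the video's values).
x1 = np.array([2.1, 1.4, 3.0, 2.6, 1.8, 2.9, 1.2, 2.4, 2.0, 1.6])
x2 = np.array([1.9, 1.1, 2.7, 2.5, 1.5, 2.4, 1.3, 2.2, 1.7, 1.2])

# First matrix: mean-adjusted data (X1 - X1 bar, X2 - X2 bar).
data_adj = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])

# Second matrix: eigenvectors of the covariance matrix, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(np.cov(data_adj, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = data_adj @ eigvecs  # column 0 = PC1 scores, column 1 = PC2 scores

print(scores)
print(scores.var(axis=0, ddof=1))  # equals eigvals, so PC1 variance > PC2 variance
```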
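We have already calculated the eigenvalues and eigenvectors.
This table shows that the largest eigenvalue corresponds to the first principal
component, PC1, which explains the maximum variance: 91.21%.
The second eigenvalue corresponds to PC2, which explains 8.79% of the variance.
Each percentage is that eigenvalue divided by the sum of all eigenvalues; for
example, 0.3425 / (0.3425 + 0.033) ≈ 0.9121, or 91.21%.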
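Using the two eigenvalues reported in the video, the explained-variance percentages can be reproduced in a few lines of plain Python:

```python
# Eigenvalues reported in the video.
eigenvalues = [0.3425, 0.033]
total = sum(eigenvalues)

# Percentage of variance explained by each component = eigenvalue / sum of eigenvalues.
explained = [100 * lam / total for lam in eigenvalues]
for i, pct in enumerate(explained, start=1):
    print(f"PC{i}: {pct:.2f}%")
# PC1: 91.21%
# PC2: 8.79%
```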
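The limitations of PCA are as follows.
First, there is no guarantee that different classes will be separated in a PCA model.
Second, sample-to-sample variation can obscure the differences between groups.
Third, if there are fewer samples than dimensions, PCA can extract only a
limited number of components: with n samples, the covariance matrix has rank
at most n − 1, so at most n − 1 principal components carry any variance.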
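The effect of having fewer samples than dimensions can be seen directly. In this sketch (random hypothetical data with 4 samples and 6 variables, assuming NumPy), the 6×6 covariance matrix has only 3 = n − 1 eigenvalues that are meaningfully above zero; the rest are zero up to floating-point error.

```python
import numpy as np

# Hypothetical data: 4 samples, 6 variables (random, for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 6))

cov = np.cov(data, rowvar=False)         # 6 x 6 covariance matrix
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, largest first

print(np.round(eigvals, 6))
print("components with non-zero variance:", np.sum(eigvals > 1e-10))  # 3 = n - 1
```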
To overcome these limitations, we can use supervised models such as PLS-DA
(partial least squares discriminant analysis) or OPLS-DA (orthogonal PLS-DA).
Supervised means that all the samples in the data set are labeled.
If this is your first time watching my videos and you have not yet subscribed
to my channel, kindly subscribe and turn on notifications.
Thank you.
