• PCA, or Principal Component Analysis, is a method that converts data from a higher-dimensional space to a lower-dimensional one while preserving the essential characteristics of the data.
  • This algorithm operates in an unsupervised manner.
  • PCA identifies an optimal lower-dimensional representation of the data.
  • Recognized as a feature extraction technique, PCA is proficient at capturing key patterns within datasets.

It’s crucial to acknowledge that PCA’s effectiveness is influenced by the data’s scale, and it is advisable to standardize the data before conducting PCA.

Benefits of PCA:

  1. Improved Algorithm Execution Speed: PCA facilitates faster execution of algorithms, especially in scenarios where the original data’s dimensionality is high. By transforming the data into a lower-dimensional space, computational efficiency is enhanced.
  2. Visualization: PCA is a powerful tool for data visualization. It reduces complex datasets to a few principal components, allowing for easier visualization and interpretation of patterns, trends, and relationships within the data.

These advantages make PCA a valuable technique in various fields, ranging from machine learning to exploratory data analysis.

Geometric Intuition of PCA:

To grasp the geometric intuition of Principal Component Analysis (PCA), it’s essential to understand several key concepts: feature selection, variance, covariance, covariance matrix, and the Eigen decomposition of the covariance matrix.

  1. Feature Selection: This process involves choosing a subset of relevant features from a larger set. PCA is related to this idea, but rather than selecting original features it extracts new, informative components — which is why it is classed as feature extraction.
  2. Variance: In statistics, variance measures the spread or dispersion of a set of values. In PCA, maximizing variance is a crucial objective, as it helps retain essential information during dimensionality reduction.
  3. Covariance: Covariance indicates how two variables change together. In the context of PCA, understanding the relationships between different features in terms of covariance is fundamental.
  4. Covariance Matrix: The covariance matrix summarizes the covariances between all pairs of features in a dataset. PCA operates by computing and analyzing this covariance matrix.
  5. Eigen Decomposition of Covariance Matrix: PCA involves finding the eigenvectors and eigenvalues of the covariance matrix. Eigenvectors represent the directions of maximum variance, while eigenvalues indicate the magnitude of this variance.

Now, let’s delve into the geometric intuition of PCA. Imagine the dataset as a cloud of points in a high-dimensional space, where each dimension corresponds to a feature. PCA seeks to identify the directions (principal components) along which the data varies the most. These directions are the eigenvectors of the covariance matrix.

The first principal component corresponds to the direction of maximum variance. Subsequent components capture orthogonal directions of decreasing variance. By projecting the data onto these principal components, PCA effectively transforms the dataset into a lower-dimensional space while preserving as much variance as possible.

In essence, PCA simplifies the dataset’s geometry by focusing on the directions of maximum variability, providing a more compact representation that retains essential information.
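To make this geometric picture concrete, here is a minimal sketch using scikit-learn's PCA on a synthetic, correlated 2D cloud (the data and the shear matrix are made up for illustration): `components_` holds the principal directions and `explained_variance_` the variance along each, sorted in decreasing order.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Correlated 2D cloud: most of the variance lies along one diagonal direction.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

pca = PCA(n_components=2)
pca.fit(X)

print(pca.components_)          # rows: principal directions (unit vectors)
print(pca.explained_variance_)  # variance captured along each direction
```

The first row of `components_` is the direction of maximum variance; the second is orthogonal to it with less variance, exactly as described above.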

Feature Selection

Feature selection is the process of reducing the number of input variables when developing a predictive model.

In simple terms, it is the selection of a subset of the original features.

Feature Selection:

Consider a dataset with two columns, labeled x and y, where the objective is to choose the more informative feature for analysis. The selection process involves plotting the data in 2D space and projecting it onto both the x and y axes.

The decision-making criterion is based on the spread of the data along each axis. By visually inspecting the projection, it becomes apparent whether one feature exhibits a more significant spread than the other. In mathematical terms, if the spread along the x-axis (denoted as d) is greater than the spread along the y-axis (denoted as d’), we opt to select the x feature.

Mathematically, this is represented as:

[ d > d’ ]

Essentially, the chosen feature is the one that demonstrates a larger spread when visualized in the chosen dimensional space. This method of feature selection allows us to focus on the feature that carries more variability, aiding in capturing essential patterns and information within the dataset.
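As a toy illustration of this selection rule (the x and y values below are hypothetical), we can measure the spread along each axis with variance and keep the feature with the larger value:

```python
import numpy as np

x = np.array([1.0, 4.0, 7.0, 10.0, 13.0])  # widely spread along its axis
y = np.array([5.0, 5.2, 4.9, 5.1, 5.0])    # tightly clustered

d, d_prime = x.var(), y.var()   # spread along each axis, measured as variance
selected = 'x' if d > d_prime else 'y'
print(selected)  # prints "x": the feature with the greater spread
```

Here d > d', so feature x would be selected, matching the criterion above.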


The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean. Variance tells you the degree of spread in your data set. The more spread the data, the larger the variance is about the mean.

– In simple terms, it is the spread of your data set.
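The definition above can be checked directly on a small made-up sample: variance is the average of squared deviations from the mean (population form, dividing by n, which matches NumPy's default):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Mean of squared deviations from the mean, per the definition.
manual = ((data - data.mean()) ** 2).mean()
print(manual, np.var(data))  # both give 4.0
```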

We could measure spread with the modulus (absolute deviation) instead of variance, but the modulus function is not differentiable at zero. We need differentiability so that we can optimize.


However, feature selection runs into a problem when the data has the same amount of spread along both the x-axis and the y-axis.

In scenarios where the spread of data is equal, with (d) representing the spread along the x-axis and (d’) representing the spread along the y-axis:

Now, the question arises: which column should you select? In such situations of uncertainty, Feature Extraction comes to the rescue, specifically using Principal Component Analysis (PCA).

We rotate the axes to solve these kinds of problems.

Why Variance is Important?

The concept of explained variance is useful in assessing how important each component is. In general, the larger the variance explained by a principal component, the more important that component is.

The amount of variance explained by each direction is called the “explained variance.” Explained variance can be used to choose the number of dimensions to keep in a reduced dataset.
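A short sketch, on synthetic data, of how explained variance falls out of the eigenvalues of the covariance matrix: each eigenvalue divided by the sum of all eigenvalues gives the fraction of the total variance that component explains.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)  # make one feature redundant

cov = np.cov(X, rowvar=False)                # 3x3 covariance matrix
eigvals = np.linalg.eigvalsh(cov)[::-1]      # eigenvalues, descending order
explained_ratio = eigvals / eigvals.sum()    # fraction of variance explained
print(explained_ratio)  # ratios sum to 1; keep enough components to cover, say, 95%
```

Because the third feature is nearly a copy of the first, most of the variance concentrates in the leading components, and the smallest ratio is close to zero.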

Covariance and Covariance Matrix


(i) (x – – x – – x) — the mean alone cannot convey information about the spread of the data. This is where variance plays a crucial role.

(ii) (x – – – – – x – – – – – – – – x) — variance clarifies that figure (ii) exhibits a greater spread of data than figure (i).

However, variance only describes each variable in isolation. Two datasets can have identical variances yet differ in how x and y move together. We therefore need another quantity that describes the relationship between x and y, and that is covariance.

Covariance is computed by multiplying each corresponding pair of deviations from the mean (one for x, one for y), summing these products, and dividing by the number of data points.
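That recipe, with the mean-centering step made explicit, on a small made-up pair of variables (note that np.cov divides by n − 1 by default; bias=True gives the divide-by-n form described above):

```python
import numpy as np

x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.0, 12.0, 14.0])

# Mean of the products of paired deviations from each mean.
manual = ((x - x.mean()) * (y - y.mean())).mean()
library = np.cov(x, y, bias=True)[0, 1]  # off-diagonal entry is cov(x, y)
print(manual, library)  # both 1.7: x and y tend to increase together
```

A positive value means x and y tend to rise together; a negative value means one tends to fall as the other rises.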

Covariance Matrix

The covariance matrix describes both the spread (variance) and the orientation (covariance) of the data.

Eigendecomposition of a covariance matrix

An eigenvector is a vector whose direction is unchanged after a linear transformation is applied.

Eigenvector -> linear transformation -> no change in direction

Eigenvalue -> the factor by which the eigenvector's magnitude changes under the linear transformation

Eigenvector equation:

A * v = lambda * v

where A is the matrix, v is the eigenvector, and lambda is the eigenvalue.

The eigenvector of the covariance matrix with the largest eigenvalue always points in the direction of the largest variance of the data.
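Both facts can be verified numerically on a synthetic cloud that is deliberately stretched along the x-axis (the shapes and seed here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
# Wide spread along x (scale 5), narrow along y (scale 1).
X = rng.normal(size=(500, 2)) * np.array([5.0, 1.0])

A = np.cov(X, rowvar=False)              # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(A)     # eigenvectors are the *columns*

v = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
print(np.allclose(A @ v, eigvals[-1] * v))  # True: A v = lambda v holds
print(np.abs(v))  # dominated by the first axis, the direction of max spread
```

The leading eigenvector lines up (up to sign) with the x-axis, the direction along which the cloud was stretched.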

Step by Step for PCA

1. Mean centering (standardization)

2. Compute the covariance matrix

3. Compute the eigenvalues and eigenvectors of the covariance matrix

How to Transform Points?

Points are transformed using the following steps in PCA:

  1. Mean Centering: Subtract the mean of each dimension from the corresponding data points. This ensures that the transformed points are centred around the origin.
  2. Multiply by Eigenvectors: Multiply the mean-centred data points by the eigenvectors obtained from the eigendecomposition of the covariance matrix. This step projects the data onto a new set of axes defined by the eigenvectors.

The transformed points represent the original data in a new coordinate system where the dimensions are aligned with the directions of maximum variance. This helps in reducing the dimensionality of the data while preserving its essential features.

This Python code demonstrates the application of Principal Component Analysis (PCA) on a synthetic dataset using the NumPy, Pandas, and Plotly libraries. Here is a step-by-step explanation of the code:

Step 1: Generate Synthetic Dataset

  • Two classes are created, each with three features (‘feature1’, ‘feature2’, ‘feature3’).
import numpy as np
import pandas as pd


mu_vec1 = np.array([0,0,0])
cov_mat1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, 20)

df = pd.DataFrame(class1_sample,columns=['feature1','feature2','feature3'])
df['target'] = 1

mu_vec2 = np.array([1,1,1])
cov_mat2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, 20)

df1 = pd.DataFrame(class2_sample,columns=['feature1','feature2','feature3'])

df1['target'] = 0

df = pd.concat([df, df1], ignore_index=True)  # DataFrame.append was removed in pandas 2.0

df = df.sample(40)
  • The dataset is combined into a Pandas DataFrame and visualized using a 3D scatter plot.

import plotly.express as px

fig = px.scatter_3d(df, x='feature1', y='feature2', z='feature3',
                    color=df['target'].astype(str))
fig.show()

Step 2: Standardize the Data

  • Standard scaling is applied to the features to ensure they have zero mean and unit variance.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df.iloc[:, 0:3] = scaler.fit_transform(df.iloc[:, 0:3])

Step 3: Calculate Covariance Matrix

  • The covariance matrix is computed for the standardized features.
covariance_matrix = np.cov([df.iloc[:, 0], df.iloc[:, 1], df.iloc[:, 2]])
print('Covariance Matrix:\n', covariance_matrix)

Step 4: Perform Eigendecomposition

  • Eigenvalues and eigenvectors are obtained from the covariance matrix.
eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)
print('Eigenvalues:\n', eigen_values)
print('Eigenvectors:\n', eigen_vectors)

Step 5: Visualization of Eigenvectors

  • The eigenvectors are visualized in a 3D plot along with the original dataset.
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7, 7))
ax = fig.add_subplot(111, projection='3d')
# plot the data points, then draw each eigenvector as an arrow from the mean

Step 6: Project Data onto Principal Components

  • The data is projected onto the first two principal components.
# np.linalg.eig returns eigenvectors as *columns*, so select the two columns
# with the largest eigenvalues rather than the first two rows
order = np.argsort(eigen_values)[::-1]
pc = eigen_vectors[:, order[:2]]
transformed_df = np.dot(df.iloc[:, 0:3], pc)
new_df = pd.DataFrame(transformed_df, columns=['PC1', 'PC2'])
new_df['target'] = df['target'].values

Step 7: Visualize Transformed Data

  • The transformed data is visualized in a 2D scatter plot.
new_df['target'] = new_df['target'].astype('str')
fig = px.scatter(x=new_df['PC1'], y=new_df['PC2'], color=new_df['target'], color_discrete_sequence=px.colors.qualitative.G10)
fig.update_traces(marker=dict(size=12, line=dict(width=2, color='DarkSlateGrey')), selector=dict(mode='markers'))

This code showcases the key steps of PCA, including standardization, covariance matrix computation, eigendecomposition, and data transformation for dimensionality reduction. The resulting 2D scatter plot illustrates the separation of classes along the principal components.
