Recovering Feature Names of explained_variance_ratio_ in PCA with sklearn

Last Updated : 23 Jul, 2025

Principal Component Analysis (PCA) is a powerful technique used in data science for dimensionality reduction, feature extraction, and data visualization. One of the key outputs of PCA is the explained_variance_ratio_, which indicates the proportion of the dataset's variance that each principal component accounts for. However, understanding which original features contribute to these principal components can be crucial for interpreting the results. This article will guide you through the process of recovering feature names associated with the explained_variance_ratio_ in PCA using Scikit-Learn.

Table of Content

Introduction to Principal Component Analysis

PCA is an unsupervised learning algorithm that transforms the original features into a new set of features called principal components. These components are orthogonal to each other and are ordered by the amount of variance they explain in the data. The primary goals of PCA include:

Dimensionality Reduction: Reducing the number of features while retaining most of the variance.
Feature Extraction: Identifying new features that capture the essential information.
Data Visualization: Simplifying the visualization of high-dimensional data.

Understanding Explained Variance Ratio

Explained variance ratio is a measure of the percentage of the total variance in a given dataset for each principal component. The explained variance ratio of a principal component is measured as the ratio of its eigenvalue to the sum of the eigenvalues of all the principal components.

Using the explained_variance_ratio attribute (from Sklearn PCA), one can access the explained variance ratio for each principal component. The code is as follows:

The steps for the above code as follows:

Create synthetic classification data with 10 features.
Apply PCA with n_components as 5.
Access the explained variance ratio using the explained_variance_ratio attribute from Sklearn PCA.
Print the explained variance ratio for each principal component.

Output:

PC1: 0.31 
PC2: 0.15 
PC3: 0.11 
PC4: 0.09 
PC5: 0.09

Visualization Plot for Explained Variance Ratio

Let's create a visualisation plot for explained variance ratio for each principal component.

Output:

👁 explained_var_ratio_plot

Explained variance ratio for each principal component

The above code plots the chart of variance proportions for each principal components. This is useful especially when trying to identify how much variation is explained by each of the principal components as well as when seeking to visualize the components within the data samples.

Recovering Feature Names in PCA

PCA is an orthogonal linear transformation that transforms the data into a new coordinate system. So, the features retrieved from PCA are not the original features. Information about the original feature is available in the pca attribute: components. To understand which original features contribute to the principal components, we need to examine the pca.components_ attribute.

Let's follow the below steps to recover feature names.

Step 1: Retraining and Applying PCA

First of all, to balance them, you need to fit a PCA model to your data.

Step 2: Access Component Weights

After applying PCA to the given data and finding the values of the first few principal components, it is possible to get the component weights which give the information about contribution of each original feature to the principal components.

Output:

Component Weights: 
[[ 0.57735027 0.57735027 0.57735027] 
 [-0.81649658 0.40824829 0.40824829]]

Step 3: Create Mapping Between Feature Names and Component Weights

Construct a list where each element is a mapping of the initial feature name and its corresponding weight for each of the principal components.

Step 4: Display Feature Contributions

They should show the name of each feature and its corresponding weight meaning how much it contributes to each of the principal component.

Output:

Feature names contributing to Component 1: 
Feature1: 0.577350269189626 
Feature2: 0.5773502691896255 
Feature3: 0.5773502691896255 
Feature names contributing to Component 2: 
Feature1: -0.8164965809277258 
Feature2: 0.40824829046386313 
Feature3: 0.40824829046386313

Example: Recovering Feature Names After PCA with Scikit-Learn

Here's an example code snippet demonstrating how to recover feature names after performing PCA using scikit-learn. Here we make use of iris dataset. It consists of 3 different types of irises (Setosa, Versicolour, and Virginica) and has 4 features: sepal length, sepal width, petal length, and petal width. The code is as follows:

Output:

Feature names contributing to Principal Components: 
Component 1: [('petal length (cm)', 0.8566706059498349), ('sepal length (cm)', 0.36138659178536864), ('petal width (cm)', 0.3582891971515507), ('sepal width (cm)', -0.08452251406456868)] 
Component 2: [('sepal width (cm)', 0.7301614347850266), ('sepal length (cm)', 0.6565887712868423), ('petal length (cm)', -0.17337266279585706), ('petal width (cm)', -0.07548101991746337)] 
Component 3: [('sepal width (cm)', 0.5979108301000855), ('sepal length (cm)', -0.5820298513060651), ('petal width (cm)', 0.5458314320200757), ('petal length (cm)', 0.0762360758209632)]

In the above code, n_components is set to 3. Hence, PCA will create 3 new features that are a linear combination of the 4 original features. Following the PCA data analysis and fitting the model, the next step is to get hold of the component weights by using the command: pca.components_. Finally, for each principal components, we generate the mapping that assigns names to the matrix columns and weights to the features. Last for all, we want to know the features associated to the principal components so we print them out.

Visualizing - Explained Variance ratio and Cumulative Explained Variance Ratio

Let's plot the explained variance ratio and cumulative explained variance for each principal component.

Output:

👁 iris_explained_variance_ratio

Explained Variance Ratio and Cumulative for each principal component

Here we get the explained variance ratio from the pca object using the explained_variance_ratio attribute and calculate the cumulative sum of the explained variance ratio using the cumsum() method provided by Numpy. Finally, we plot the explained variance ratio and cumulative explained variance ratio for each principal component using Matplotlib.

Practical Applications

Understanding the feature contributions to principal components can be useful in various scenarios:

Feature Selection: Identifying the most important features for model building.
Data Interpretation: Gaining insights into the underlying structure of the data.
Dimensionality Reduction: Reducing the number of features while retaining essential information.

Conclusion

Recovering the feature names associated with the explained_variance_ratio_ in PCA is a crucial step in interpreting the results of PCA. By examining the pca.components_ attribute, we can understand the contribution of each original feature to the principal components. This process enhances our ability to make informed decisions based on the PCA results.

Comment

Article Tags:

Explore

Machine Learning Basics

Python for Machine Learning

Feature Engineering

Supervised Learning

Unsupervised Learning

Model Evaluation and Tuning

Advanced Techniques

Machine Learning Practice

Courses

URL: https://www.geeksforgeeks.org/machine-learning/recovering-feature-names-of-explainedvarianceratio-in-pca-with-sklearn/