URL: https://www.geeksforgeeks.org/data-analysis/exploring-correlation-in-python/

Exploring Correlation in Python

Last Updated : 6 Feb, 2026

Correlation is one of the most commonly used statistical measures to understand how variables are related to each other. In Python, correlation helps identify whether two variables move together, move in opposite directions or have no relationship at all.

  • Helps understand data relationships.
  • Useful in feature selection for ML models.
  • Detects multicollinearity.
  • Supports better decision-making.

Correlation

Correlation measures the strength and direction of the relationship between two numerical variables. Its value ranges from -1 to +1:

  • +1: Perfect positive relationship, meaning both variables increase or decrease together. Example of a positive (though not perfect) correlation: height and weight
  • -1: Perfect negative relationship, meaning one variable increases while the other decreases. Example of a negative correlation: price and demand
  • 0: No linear relationship between the variables. Example: shoe size and exam marks
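The three reference values can be reproduced with a small sketch. The arrays below are illustrative and not taken from the article, and `np.corrcoef` is used here as one convenient way to get the Pearson coefficient.

```python
import numpy as np

# Illustrative data, chosen so the three cases are exact
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

y_positive = 2 * x + 1     # rises in lockstep with x
y_negative = -3 * x + 10   # falls in lockstep with x
y_curved = x ** 2          # depends on x, but not linearly

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
r_positive = np.corrcoef(x, y_positive)[0, 1]
r_negative = np.corrcoef(x, y_negative)[0, 1]
r_curved = np.corrcoef(x, y_curved)[0, 1]

print(r_positive, r_negative, r_curved)
```

The last case is worth noting: `y_curved` is fully determined by `x`, yet its coefficient is zero, because the coefficient only captures linear association.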

Common Correlation Methods in Python

1. Pearson Correlation

Pearson Correlation measures the linear relationship between two continuous variables.

  • Range: -1 to +1
  • Significance tests on it assume approximately normally distributed data
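To make the definition concrete, here is a minimal sketch that computes the Pearson coefficient from its textbook formula (covariance divided by the product of the standard deviations) and checks it against NumPy. The function name `pearson_r` and the sample values are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r: sum of centered products over the root of the summed squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    numerator = (x_centered * y_centered).sum()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return numerator / denominator

# Illustrative values, not taken from the article
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 74]

r_manual = pearson_r(hours_studied, exam_score)
r_numpy = np.corrcoef(hours_studied, exam_score)[0, 1]
print(r_manual, r_numpy)  # the two values should agree
```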

2. Spearman Correlation

Spearman Correlation measures the monotonic relationship between two variables using their ranks.

  • Captures non-linear relationships as long as they are monotonic
  • Suitable for ordinal data
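One way to see what "using ranks" means: Spearman's coefficient is the Pearson coefficient of the ranked values. The sketch below uses illustrative data where `y` grows as the cube of `x`, so the relationship is monotonic but not linear.

```python
import pandas as pd

# Illustrative data: y always increases with x, but along a curve
df_demo = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6]})
df_demo["y"] = df_demo["x"] ** 3

pearson = df_demo.corr(method="pearson").loc["x", "y"]
spearman = df_demo.corr(method="spearman").loc["x", "y"]

# Equivalent route: rank both columns, then apply Pearson to the ranks
spearman_via_ranks = df_demo["x"].rank().corr(df_demo["y"].rank())

print(pearson, spearman, spearman_via_ranks)
```

Pearson stays below 1 because the points do not lie on a straight line, while Spearman reaches 1 because the ordering of the two columns matches exactly.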

3. Kendall Correlation

Kendall Correlation measures how consistently two variables rank the same observations.

  • Often preferred for small datasets
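Rank consistency can be sketched by counting pairs of observations. A pair is concordant when both variables order it the same way and discordant when they disagree. The function below is a hypothetical, simplified tau-a that assumes no tied values; library implementations typically use a tie-corrected variant.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs. Assumes no ties."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order this pair the same way
        elif direction < 0:
            discordant += 1   # the variables disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Illustrative rankings of five items by two judges
rank_judge_a = [1, 2, 3, 4, 5]
rank_judge_b = [2, 1, 3, 5, 4]

tau = kendall_tau_a(rank_judge_a, rank_judge_b)
print(tau)  # 8 concordant and 2 discordant pairs out of 10 give 0.6
```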

Correlation Using Python

Python's pandas library, together with visualization libraries such as seaborn and matplotlib, provides tools to compute and analyze correlation efficiently. Understanding correlation helps build better models and gain deeper insights from data.

1. Sample Dataset

Here we create a sample dataset and load it into a pandas DataFrame. We will use seaborn and matplotlib to visualize the relationships.
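The article's original code for this step is not reproduced here, so the following is a stand-in sketch. The column names maths, science and english come from the text of this article; the marks themselves are hypothetical values chosen for illustration, so any output will differ from the article's screenshots.

```python
import pandas as pd

# Hypothetical marks for eight students; the values are illustrative only
data = {
    "maths":   [78, 85, 62, 90, 55, 70, 88, 95],
    "science": [74, 88, 65, 92, 58, 68, 85, 97],
    "english": [82, 70, 75, 68, 80, 72, 66, 71],
}
df = pd.DataFrame(data)

print(df)
```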

Output: [Figure: Data]

2. Pearson Correlation

  • Computes pairwise correlation between columns
  • Default method is Pearson
  • Values closer to +1 or -1 indicate stronger correlation; the sign gives the direction
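A minimal sketch of this step, assuming a DataFrame with the maths, science and english columns mentioned in this article. The marks are hypothetical, so the numbers will not match the article's screenshot.

```python
import pandas as pd

# Hypothetical marks; only the column names come from the article
df = pd.DataFrame({
    "maths":   [78, 85, 62, 90, 55, 70, 88, 95],
    "science": [74, 88, 65, 92, 58, 68, 85, 97],
    "english": [82, 70, 75, 68, 80, 72, 66, 71],
})

# DataFrame.corr() computes pairwise correlations; method="pearson" is the default
corr_pearson = df.corr()

print(corr_pearson.round(3))
```

The result is a symmetric matrix with 1.0 on the diagonal, since every column correlates perfectly with itself.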

Output: [Figure: Pearson Correlation]

The output above shows the pairwise correlations between the maths, science and english columns.

3. Spearman Correlation

  • Converts values to ranks before correlation
  • Suitable for non-linear but monotonic relationships
  • Useful when data is not normally distributed
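As a sketch of this step, the same `DataFrame.corr` call accepts `method="spearman"`. The marks below are hypothetical; only the column names come from this article.

```python
import pandas as pd

# Hypothetical marks; only the column names come from the article
df = pd.DataFrame({
    "maths":   [78, 85, 62, 90, 55, 70, 88, 95],
    "science": [74, 88, 65, 92, 58, 68, 85, 97],
    "english": [82, 70, 75, 68, 80, 72, 66, 71],
})

# Values are converted to ranks internally before the correlation is computed
corr_spearman = df.corr(method="spearman")

print(corr_spearman.round(3))
```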

Output: [Figure: Spearman Correlation]

4. Kendall Correlation

  • Measures agreement between rankings
  • Works well for small datasets
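A sketch of this step using `method="kendall"`. The marks are hypothetical. As far as I know, pandas relies on SciPy for this method, so this example assumes SciPy is installed alongside pandas.

```python
import pandas as pd

# Hypothetical marks; only the column names come from the article
df = pd.DataFrame({
    "maths":   [78, 85, 62, 90, 55, 70, 88, 95],
    "science": [74, 88, 65, 92, 58, 68, 85, 97],
    "english": [82, 70, 75, 68, 80, 72, 66, 71],
})

# Kendall's tau compares how each pair of rows is ordered in the two columns
corr_kendall = df.corr(method="kendall")

print(corr_kendall.round(3))
```

Kendall's tau is usually smaller in magnitude than Spearman's coefficient on the same data, so the two should not be compared against the same thresholds.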

Output: [Figure: Kendall Correlation]

5. Correlation Between Two Columns

  • Returns correlation value between two specific columns
  • Heatmap gives a visual understanding of relationships
  • Color intensity indicates correlation strength; which shades mean stronger depends on the chosen colormap
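Both ideas can be sketched together: `Series.corr` for a single pair of columns, and seaborn's `heatmap` for the full matrix. The marks are hypothetical, and the helper name `plot_heatmap` and the `coolwarm` colormap are choices made for this example, not taken from the article.

```python
import pandas as pd

# Hypothetical marks; only the column names come from the article
df = pd.DataFrame({
    "maths":   [78, 85, 62, 90, 55, 70, 88, 95],
    "science": [74, 88, 65, 92, 58, 68, 85, 97],
    "english": [82, 70, 75, 68, 80, 72, 66, 71],
})

# Correlation between two specific columns (Pearson by default)
r_maths_science = df["maths"].corr(df["science"])
print(round(r_maths_science, 3))

def plot_heatmap(corr_matrix):
    """Draw an annotated heatmap of a correlation matrix."""
    # Imported here so the numeric part above runs without plotting libraries
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Fixing vmin/vmax keeps the color scale comparable across datasets
    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.title("Correlation heatmap")
    plt.show()
```

Calling `plot_heatmap(df.corr())` draws the figure. With a diverging colormap such as `coolwarm`, strong positive and strong negative values sit at opposite ends of the color scale and values near zero appear pale.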

Output: [Figure: Correlation Between Two Columns]

Interpreting Correlation Values

Correlation Value    Meaning
0.8 to 1.0           Strong positive
0.5 to 0.8           Moderate positive
0.0 to 0.5           Weak positive
0                    No correlation
-0.5 to 0            Weak negative
-0.8 to -0.5         Moderate negative
-1.0 to -0.8         Strong negative
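The table can be turned into a small helper for labeling coefficients. This is a sketch: the function name `describe_correlation` is hypothetical, and because the table's ranges share endpoints, the code assumes that a boundary value such as 0.8 or 0.5 belongs to the stronger category.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the labels used in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie between -1 and +1")
    if r == 0:
        return "No correlation"

    magnitude = abs(r)
    # Assumption: boundary values (0.8, 0.5) fall into the stronger category
    if magnitude >= 0.8:
        strength = "Strong"
    elif magnitude >= 0.5:
        strength = "Moderate"
    else:
        strength = "Weak"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for value in (0.93, 0.5, 0.2, 0, -0.6, -0.85):
    print(value, "->", describe_correlation(value))
```

These cut-offs are rules of thumb; what counts as a strong correlation varies by field.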

Limitations of Correlation

  • Only measures association; it does not show that one variable causes the other
  • Sensitive to outliers, especially the Pearson coefficient
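The outlier sensitivity can be illustrated with a small sketch on made-up data: a perfectly linear series in which the last value is replaced by an extreme one. The rank-based comparison is included only as a contrast.

```python
import pandas as pd

# Illustrative data: x runs from 1 to 10
x = pd.Series(range(1, 11), dtype=float)

y_clean = x.copy()            # perfectly linear in x
y_outlier = x.copy()
y_outlier.iloc[-1] = 1000.0   # one extreme value; the ordering is unchanged

pearson_clean = x.corr(y_clean)
pearson_outlier = x.corr(y_outlier)

# Rank-based (Spearman-style) correlation: Pearson applied to the ranks
spearman_outlier = x.rank().corr(y_outlier.rank())

print(pearson_clean, pearson_outlier, spearman_outlier)
```

A single extreme point pulls the Pearson coefficient well below 1, while the rank-based coefficient is unaffected because the ordering of the values has not changed.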

Applications of Correlation

  • Feature selection in machine learning
  • Financial market analysis
  • Medical research
  • Recommendation systems