The DataFrame.corr() method in pandas calculates the pairwise correlation between the numeric columns of a DataFrame. Correlation shows how strongly two columns are related. The result is returned as a new DataFrame called a correlation matrix, where each value ranges from -1 to 1.
- 1: perfect positive correlation
- -1: perfect negative correlation
- 0: no correlation (no linear relationship for Pearson, no monotonic relationship for the rank-based methods)
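Non-numeric columns are ignored only when numeric_only=True is passed. With the default numeric_only=False (pandas 2.0 and later), a column that cannot be converted to numbers raises an error; older versions dropped such columns automatically.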
Example: This code creates a simple DataFrame and finds the correlation between its columns.
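A minimal sketch of such a DataFrame, assuming illustrative values for column A (they are not taken from the original listing) and defining B as A + 5:

```python
import pandas as pd

# Column A holds illustrative values (an assumption); B is A shifted by 5
df = pd.DataFrame({"A": [1, 2, 3, 4, 5]})
df["B"] = df["A"] + 5

# Pairwise Pearson correlation of all numeric columns
corr_matrix = df.corr()
print(corr_matrix)
```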
Output
     A    B
A  1.0  1.0
B  1.0  1.0
Explanation:
- df.corr() calculates correlation between all numeric columns.
- Column B = Column A + 5 is a perfect increasing linear relationship, so the correlation is 1.
Syntax
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)
Parameters:
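- method: Correlation method to use: 'pearson' (the default), 'spearman', or 'kendall'. A callable that takes two 1-D arrays and returns a float is also accepted.
- min_periods: Minimum number of observations required per pair of columns to produce a result. Pairs with fewer observations yield NaN.
- numeric_only: If True, includes only float, int, or boolean columns. The default is False.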
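The three parameters can be tried together in a short sketch. The DataFrame below is hypothetical (the column names `name`, `x`, and `y` are assumptions chosen for illustration), and it relies on `numeric_only`, which is available in recent pandas versions:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column and two numeric columns with gaps
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "x": [1.0, 2.0, 3.0, 4.0, np.nan],
    "y": [2.0, 4.0, np.nan, np.nan, 10.0],
})

# numeric_only=True drops the text column before correlating
default_corr = df.corr(numeric_only=True)

# x and y share only two complete rows, so requiring three yields NaN
strict_corr = df.corr(numeric_only=True, min_periods=3)

# method selects the coefficient; here the rank-based Spearman method
rank_corr = df.corr(method="spearman", numeric_only=True)

print(default_corr)
print(strict_corr)
print(rank_corr)
```

In `strict_corr`, only the x/y entry becomes NaN: each column still has at least three values of its own, so the diagonal stays at 1.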
Examples
Example 1: This code finds the correlation between the Height and Weight columns, which have a strong positive relationship.
Output
          Height    Weight
Height  1.000000  0.973035
Weight  0.973035  1.000000
Explanation:
- The correlation is 0.97, which is close to +1: Height and Weight increase together and have a strong positive relationship, though not a perfectly linear one.
- Pearson correlation divides the covariance of the two columns by the product of their standard deviations. Since Height and Weight mostly increase together, the covariance is large and positive, giving a value close to +1.
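The relationship between covariance and the Pearson coefficient can be checked by hand. This is a sketch under stated assumptions: the measurements below are illustrative values, not the data behind the output above, so the coefficient will differ from 0.973035.

```python
import pandas as pd

# Illustrative measurements (assumed values, not the article's data)
df = pd.DataFrame({
    "Height": [150, 160, 170, 180, 190],
    "Weight": [50, 60, 65, 80, 85],
})

# Pearson coefficient as reported by corr()
r_pandas = df.corr().loc["Height", "Weight"]

# Same coefficient by hand: covariance / (std of Height * std of Weight)
cov = df["Height"].cov(df["Weight"])
r_manual = cov / (df["Height"].std() * df["Weight"].std())

print(r_pandas, r_manual)
```

Both values should agree up to floating-point rounding, since Series.cov() and Series.std() use the same sample normalization.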
Example 2: This code shows a negative correlation between two columns using the Kendall rank method.
Output
             StudyHours  StressLevel
StudyHours     1.000000    -0.666667
StressLevel   -0.666667     1.000000
Explanation:
- The correlation is about -0.67, which shows a moderate negative relationship: as StudyHours increases, StressLevel generally decreases, but not always.
- Kendall correlation counts concordant and discordant pairs of rows. More pairs are in opposite order than in the same order, so the result is negative; because not all pairs are opposite, it is -0.67 rather than -1.
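The pair counting can be made visible with a small hand-written function passed to corr() as a callable. This is a sketch, not the built-in implementation: `kendall_tau` is a hypothetical helper that ignores ties, and the data is assumed, chosen so that 1 of the 6 row pairs is concordant and 5 are discordant.

```python
from itertools import combinations

import pandas as pd


def kendall_tau(a, b):
    """Kendall tau for tie-free data: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        direction = (a[i] - a[j]) * (b[i] - b[j])
        if direction > 0:
            concordant += 1   # both columns move the same way
        elif direction < 0:
            discordant += 1   # columns move in opposite ways
    total_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical data with 1 concordant and 5 discordant pairs
df = pd.DataFrame({
    "StudyHours": [1, 2, 3, 4],
    "StressLevel": [4, 3, 1, 2],
})

# corr() accepts a callable that takes two 1-D arrays and returns a float
tau_matrix = df.corr(method=kendall_tau)
print(tau_matrix)
```

For this data the off-diagonal value is (1 - 5) / 6, roughly -0.667. The built-in method="kendall" option also accounts for ties, which this sketch does not.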
Example 3: This code calculates the correlation using the Spearman rank-based method.
Output
             MathMarks  SportsScore
MathMarks          1.0         -1.0
SportsScore       -1.0          1.0
Explanation:
- The correlation is -1.0, which shows a perfect negative relationship between the two columns.
- Spearman correlation is the Pearson correlation of the column ranks. Here the rank order of one column is exactly the reverse of the other, resulting in a perfect negative correlation (-1).
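A short sketch of this behavior, assuming hypothetical scores in which one column falls whenever the other rises, but not by a constant amount:

```python
import pandas as pd

# Hypothetical scores: the order is exactly reversed, but not linearly
df = pd.DataFrame({
    "MathMarks": [90, 80, 70, 60, 50],
    "SportsScore": [55, 65, 70, 85, 95],
})

# Rank-based Spearman coefficient
spearman = df.corr(method="spearman").loc["MathMarks", "SportsScore"]

# Spearman is the Pearson correlation of the ranks
pearson_of_ranks = df.rank().corr().loc["MathMarks", "SportsScore"]

# Pearson on the raw values, for comparison
pearson_raw = df.corr().loc["MathMarks", "SportsScore"]

print(spearman, pearson_of_ranks, pearson_raw)
```

The two rank-based values should both come out at -1, while Pearson on the raw values stays slightly above -1 because the relationship is monotonic but not exactly linear.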