Correlation summarizes the strength and direction of the linear association between two quantitative variables. It is denoted by r and takes values between -1 and +1. A positive value of r indicates a positive association and a negative value of r indicates a negative association. Let's explore several methods to calculate correlation between columns in a pandas DataFrame.
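Before turning to the library methods, it may help to see what r measures. The following is a minimal sketch that computes the Pearson coefficient directly from its definition (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the sample values are illustrative assumptions, not part of any library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences, from the definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (numerator)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (used in the denominator)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: two exactly linear relationships
hours = [1, 2, 3, 4, 5]
rising = [2 * h + 1 for h in hours]    # increases with hours
falling = [10 - 3 * h for h in hours]  # decreases with hours

r_up = pearson_r(hours, rising)
r_down = pearson_r(hours, falling)
print(r_up, r_down)  # 1.0 -1.0
```

Exactly linear data sits at the two extremes of the range; real data usually falls somewhere in between.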
Series.corr() calculates the correlation coefficient (Pearson by default) between two individual columns (Series) in a pandas DataFrame. It’s simple and quick when you want to check the correlation between just two variables.
0.9931532689569343
Explanation: This code computes the correlation coefficient between math and science scores, a value between -1 and +1 that measures the strength and direction of their linear relationship. +1 indicates a perfect positive correlation, -1 a perfect negative correlation and 0 no linear correlation. The result here, about 0.993, indicates a very strong positive linear relationship.
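A call of this kind might look like the sketch below. The column names and scores are hypothetical stand-ins, not the data behind the figure quoted above, so the printed value will differ.

```python
import pandas as pd

# Hypothetical scores for five students (illustrative only)
df = pd.DataFrame({
    "math": [85, 90, 78, 92, 88],
    "science": [80, 89, 75, 95, 86],
})

# Series.corr uses method="pearson" unless told otherwise
r = df["math"].corr(df["science"])
print(r)
```

Passing `method="spearman"` or `method="kendall"` to the same call gives a rank-based coefficient instead, which can be useful when the relationship is monotonic but not linear.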
DataFrame.corr() computes the correlation matrix for all numeric columns in the DataFrame. It returns pairwise correlation coefficients between all columns, making it easy to see relationships across multiple variables at once.
Explanation: This code calculates the correlation matrix for all numeric columns in the dataframe, showing the pairwise correlation coefficients between each subject's scores. Each value ranges from -1 to +1, indicating the strength and direction of the linear relationships among the columns.
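As a rough illustration, the sketch below builds a small DataFrame with one text column and three score columns, then requests the full matrix. The data is hypothetical, and the example assumes pandas 1.5 or later, where `numeric_only` is available as a keyword; from pandas 2.0 it defaults to `False`, so a text column has to be excluded explicitly.

```python
import pandas as pd

# Hypothetical data with one non-numeric column
df = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "math": [85, 90, 78, 92, 88],
    "science": [80, 89, 75, 95, 86],
    "english": [70, 85, 72, 88, 80],
})

# numeric_only=True skips the "student" column
corr_matrix = df.corr(numeric_only=True)
print(corr_matrix.round(3))

# A single pair can be read from the matrix by label
math_vs_english = corr_matrix.loc["math", "english"]
```

The matrix is symmetric with ones on the diagonal, since every column correlates perfectly with itself.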
corrcoef() from the NumPy library calculates the Pearson correlation coefficient matrix between two arrays. It is useful when working directly with NumPy arrays or when pandas is not required.
0.976632340152094
Explanation: This code calculates the Pearson correlation coefficient between math and English scores using NumPy’s corrcoef function. For two input arrays, corrcoef returns a 2×2 matrix; the off-diagonal entry is the coefficient between the two columns, a value between -1 and +1 that measures the strength and direction of their linear relationship.
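The sketch below shows one way to pull a single coefficient out of the matrix that `np.corrcoef` returns. The arrays are hypothetical and will not reproduce the figure quoted above.

```python
import numpy as np

# Hypothetical scores (illustrative only)
math_scores = np.array([85, 90, 78, 92, 88])
english_scores = np.array([70, 85, 72, 88, 80])

# For two 1-D inputs the result is a 2x2 matrix:
# [[corr(x, x), corr(x, y)],
#  [corr(y, x), corr(y, y)]]
matrix = np.corrcoef(math_scores, english_scores)

# Either off-diagonal entry is the coefficient between the two arrays
r = matrix[0, 1]
print(matrix.shape, r)
```

When the data already lives in a DataFrame, passing `df["math"]` and `df["english"]` works as well, since `np.corrcoef` accepts array-like inputs.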
pearsonr() from SciPy's stats module calculates the Pearson correlation coefficient along with a p-value that tests the null hypothesis of no correlation. It is helpful if you want to know both the strength of the correlation and its statistical significance.
Output
0.9045939369328619
0.03486446724084317
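Explanation: This code calculates the Pearson correlation and p-value between science and history scores. The first value (about 0.90) is the correlation coefficient, showing a strong positive linear relationship. The second (about 0.035) is the p-value; because it is below the conventional 0.05 level, the correlation is statistically significant at that level.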
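A minimal sketch of such a test follows, assuming the function in question is `scipy.stats.pearsonr` and that SciPy is installed. The scores are hypothetical, so the two printed numbers will not match the output shown above.

```python
from scipy.stats import pearsonr

# Hypothetical scores (illustrative only)
science = [80, 89, 75, 95, 86]
history = [72, 80, 70, 91, 78]

# The result unpacks into the coefficient and the two-sided p-value
r, p_value = pearsonr(science, history)
print(r)
print(p_value)

# 0.05 is a common convention for the significance level, not a fixed rule
alpha = 0.05
is_significant = p_value < alpha
```

With only a handful of observations, even a large r can come with a wide margin of uncertainty, so the p-value is worth reading alongside the coefficient rather than in place of it.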