In this article, we discuss what Cohen's Kappa is and how to calculate it in the R programming language.
Cohen's Kappa is a statistical measure used to assess inter-rater reliability or agreement between two raters when dealing with categorical data. It quantifies the level of agreement between the raters by taking into account the agreement that could be expected by chance alone. It's particularly useful when assessing agreement on subjective judgments or classifications.
Cohen's Kappa is important because it helps ensure that different people making subjective judgments agree consistently. This is crucial in fields where opinions or classifications vary. By using Cohen's Kappa, researchers or professionals can check if the agreement between raters is real or just by chance. It helps ensure that the data collected is reliable and trustworthy.
Categorical agreement refers to the degree to which two raters assign the same category or label to a given set of data. In Cohen's Kappa calculation, categorical agreement serves as the foundation for evaluating the level of agreement between the raters. The Kappa statistic compares the observed level of agreement with the level of agreement expected by chance alone.
$$k = \frac{P_o - P_e}{1 - P_e}$$
Where:
$P_o$ (Po) is the observed proportion of agreement between the raters.
$P_e$ (Pe) is the expected proportion of agreement by chance.
Observed agreement refers to the proportion of cases in which two raters or methods agree on the categorization or classification of items. It represents the actual, observed instances where both raters provide the same classification.
Example: Suppose two medical professionals independently examine a set of X-ray images and categorize each image as either showing signs of a specific condition or not. If they both agree on the classification for 80 out of 100 X-ray images, then the observed agreement is 80%.
Expected agreement is what we would expect to happen by chance. It considers how often they would agree just by guessing, based on the overall probability of each choice.
Example: In the X-ray example, suppose the prevalence of the condition in the dataset is 30% and both raters assign categories at random according to this prevalence. Each rater then labels an image as showing the condition with probability 0.3 and as not showing it with probability 0.7. The chance that both say "condition" is 0.3 × 0.3 = 0.09, and the chance that both say "no condition" is 0.7 × 0.7 = 0.49, so the expected agreement is 0.09 + 0.49 = 0.58. In general, this product is computed for each category, using each rater's own proportion for that category, and the results are summed.
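The X-ray example can be checked with a few lines of base R. This is a minimal sketch that assumes, as the example does, that both raters label images at random with the same 30% rate; the variable names are illustrative only.

```r
# Values taken from the X-ray example in the text
p_o <- 80 / 100          # observed agreement: 80 of 100 images
prevalence <- 0.30       # assumed rate at which each rater says "condition present"

# Chance agreement: both say "present" or both say "absent"
p_e <- prevalence * prevalence + (1 - prevalence) * (1 - prevalence)

# Cohen's Kappa from observed and expected agreement
kappa_value <- (p_o - p_e) / (1 - p_e)

print(p_e)
print(round(kappa_value, 3))
```

Under these assumptions the expected agreement is 0.58, so the 80% raw agreement corresponds to a kappa of roughly 0.52.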
Cohen's Kappa ranges from -1 to 1:
- k=1: Perfect agreement beyond chance.
- k=0: Agreement equal to that expected by chance alone.
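- k=−1: Perfect disagreement beyond chance. Any negative value means the raters agree less often than chance would predict; −1 itself is reached only in special cases, such as two raters who each use two categories equally often and never agree.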
By comparing observed and expected agreement, Cohen's Kappa provides a normalized measure of agreement that accounts for the possibility of chance agreement. Categorical agreement is crucial in this context because it forms the basis for understanding the level of agreement between raters, which is then used to calculate Kappa. The Kappa coefficient helps researchers assess the reliability and validity of categorical assignments, taking into account what could be expected due to random chance.
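The three reference points of the scale can be illustrated with a small base R function. This is a sketch only: `cohen_kappa` is a hypothetical helper written for this illustration, not a function from any package, and the rating vectors are made up.

```r
# Hypothetical helper: unweighted Cohen's Kappa for two rating vectors
cohen_kappa <- function(r1, r2) {
  lv <- sort(unique(c(r1, r2)))                        # shared category set
  tab <- table(factor(r1, levels = lv), factor(r2, levels = lv))
  n <- sum(tab)
  p_o <- sum(diag(tab)) / n                            # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2        # chance agreement
  (p_o - p_e) / (1 - p_e)
}

# Made-up ratings for the three cases
perfect  <- cohen_kappa(c(0, 1, 0, 1), c(0, 1, 0, 1))  # always agree
chance   <- cohen_kappa(c(0, 0, 1, 1), c(0, 1, 0, 1))  # agree half the time
opposite <- cohen_kappa(c(0, 1, 0, 1), c(1, 0, 1, 0))  # never agree

print(c(perfect = perfect, chance = chance, opposite = opposite))
```

With these inputs the three values are 1, 0, and -1. The last case reaches -1 only because both raters use the two categories equally often.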
Let's consider a scenario where two doctors are assessing the presence or absence of a specific medical condition (Condition X) in a set of patients. Each patient is either diagnosed as having the condition (Positive) or not having the condition (Negative). The two doctors independently review a sample of 100 patients, and we want to assess the agreement between their diagnoses using Cohen's Kappa.
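Here, the 100 diagnoses are summarized in a 2x2 table, with cell counts a, b, c and d as used in the calculations that follow (which doctor is placed in the rows does not change the result):

| | Doctor 2: Positive | Doctor 2: Negative |
|---|---|---|
| Doctor 1: Positive | a = 60 | b = 10 |
| Doctor 1: Negative | c = 15 | d = 15 |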
1. Calculate Observed Agreement (Po):
$$P_o = \frac{a+d}{a+b+c+d} = \frac{60+15}{60+10+15+15} = \frac{75}{100} = 0.75$$
2. Calculate Agreement Expected by Chance (Pe):
$$P_e = \frac{(a+b)(a+c) + (c+d)(b+d)}{(a+b+c+d)^2} = \frac{(60+10)(60+15) + (15+15)(10+15)}{(60+10+15+15)^2}$$
$$P_e = \frac{70 \times 75 + 30 \times 25}{100^2} = \frac{5250 + 750}{10000} = \frac{6000}{10000} = 0.6$$
3. Calculate Cohen's Kappa:
$$k = \frac{P_o - P_e}{1 - P_e} = \frac{0.75 - 0.6}{1 - 0.6} = \frac{0.15}{0.4} = 0.375$$
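The same hand calculation can be reproduced in base R from the four cell counts. This is a minimal sketch; the row and column labels (`doctor1`, `doctor2`) are assumptions added for readability, and swapping the two doctors would not change the result.

```r
# Cell counts from the worked example: a = 60, b = 10, c = 15, d = 15
tab <- matrix(c(60, 10,
                15, 15),
              nrow = 2, byrow = TRUE,
              dimnames = list(doctor1 = c("Positive", "Negative"),
                              doctor2 = c("Positive", "Negative")))

n   <- sum(tab)                                   # 100 patients
p_o <- sum(diag(tab)) / n                         # (a + d) / n
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2     # sum of row total * column total

kappa_value <- (p_o - p_e) / (1 - p_e)

print(c(p_o = p_o, p_e = p_e, kappa = kappa_value))
```

This gives Po = 0.75, Pe = 0.6, and kappa = 0.375, matching the steps above.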
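Therefore, Cohen's Kappa for the two doctors' diagnoses of Condition X is 0.375. The interpretation of this value depends on the context, but on the commonly used Landis and Koch scale, values from 0.21 to 0.40 indicate fair agreement, values from 0.41 to 0.60 moderate agreement, and values above 0.60 substantial agreement. In this case, agreement between the two doctors is only fair: they give the same diagnosis for 75% of patients, but 60% agreement would already be expected by chance.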
We can calculate Cohen's Kappa in R using functions from packages such as irr (inter-rater reliability), psych (psychological statistics), or vcd (Visualizing Categorical Data).
Install and load the 'irr' package, then call its kappa2 function on the two raters' ratings:
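A minimal sketch of this step follows. The rating vectors are hypothetical, since the original data are not shown; they were chosen so that the unweighted kappa (0.6875) is consistent with the 5 subjects and the value of 0.688 reported in the output. The `irr::kappa2` call is guarded so that the script still runs when the package is not installed.

```r
# install.packages("irr")   # run once if the package is not installed

# Hypothetical ratings of 5 subjects by 2 raters (not the article's original data)
rater1 <- c("A", "A", "B", "B", "C")
rater2 <- c("A", "B", "B", "B", "C")
ratings <- data.frame(rater1, rater2)   # one row per subject, one column per rater

# Base R cross-check of the unweighted kappa
lv  <- sort(unique(c(rater1, rater2)))
tab <- table(factor(rater1, levels = lv), factor(rater2, levels = lv))
n   <- sum(tab)
p_o <- sum(diag(tab)) / n
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2
kappa_manual <- (p_o - p_e) / (1 - p_e)
print(kappa_manual)

# kappa2 from the irr package, if available
if (requireNamespace("irr", quietly = TRUE)) {
  print(irr::kappa2(ratings, weight = "unweighted"))
}
```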
Output:
Cohen's Kappa for 2 Raters (Weights: unweighted)
Subjects = 5
Raters = 2
Kappa = 0.688
z = 2.28
p-value = 0.0224
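The code first installs and loads the irr package and then passes the ratings to kappa2. The output reports the number of subjects and raters, the kappa estimate, and a z statistic with its p-value for testing whether kappa differs from 0.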
Install and load the 'vcd' package, then cross-tabulate the two raters' ratings:
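A sketch of the cross-tabulation step follows. The rating vectors are hypothetical and were chosen only so that the resulting counts match the 2x2 table shown in the output (3, 1, 1, 5); `table` is base R, so this part needs no package.

```r
# install.packages("vcd")   # run once if the package is not installed

# Hypothetical binary ratings of 10 items (not the article's original data)
rater1 <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
rater2 <- c(0, 0, 0, 1, 0, 1, 1, 1, 1, 1)

# Rows are rater1's ratings, columns are rater2's ratings
rating_table <- table(rater1, rater2)
print(rating_table)
```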
Output:
0 1
0 3 1
1 1 5
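Use the kappa2 function to calculate Cohen's Kappa. Note that kappa2 expects one row per rated subject and one column per rater, not a contingency table. The output below reports Subjects = 2, which indicates that the 2x2 table itself was passed in and each of its rows was treated as a subject, so the value of -0.333 is not the kappa of the 10 ratings summarized in the table.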
Output:
Cohen's Kappa for 2 Raters (Weights: unweighted)
Subjects = 2
Raters = 2
Kappa = -0.333
z = -1.41
p-value = 0.157
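The 2x2 table shown earlier displays the count of observations for each combination of ratings: both raters chose 0 for 3 items, both chose 1 for 5 items, and they disagreed on 2 items.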
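Kappa for the 10 ratings summarized in the 2x2 table can be computed directly from the cell counts. The sketch below does this in base R and, if the vcd package is installed, also calls `vcd::Kappa`, which accepts a contingency table. The dimension names are assumptions added for readability.

```r
# Counts from the 2x2 table: rows = rater1, columns = rater2
rating_table <- matrix(c(3, 1,
                         1, 5),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(rater1 = c("0", "1"),
                                       rater2 = c("0", "1")))

n   <- sum(rating_table)                                         # 10 items
p_o <- sum(diag(rating_table)) / n                               # 8 / 10
p_e <- sum(rowSums(rating_table) * colSums(rating_table)) / n^2  # (4*4 + 6*6) / 100

kappa_from_table <- (p_o - p_e) / (1 - p_e)
print(round(kappa_from_table, 3))

# Kappa() from vcd works on the table itself, if the package is available
if (requireNamespace("vcd", quietly = TRUE)) {
  print(vcd::Kappa(as.table(rating_table)))
}
```

For these counts Po = 0.8 and Pe = 0.52, so the unweighted kappa is about 0.583.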
Output:
Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels)
Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries
lower estimate upper
unweighted kappa -0.089 0.25 0.59
weighted kappa 0.128 0.57 1.00
Number of subjects = 5
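Install and load the "psych" package in R. Its cohen.kappa function reports both the unweighted and the weighted kappa, each with lower and upper confidence bounds, as in the output above.
For unweighted kappa, the estimate is 0.25, which suggests fair agreement. The confidence interval (-0.089 to 0.59) includes 0, so with only 5 subjects agreement beyond chance is not established.
For weighted kappa, the estimate is 0.57, which indicates moderate agreement. Weighted kappa gives partial credit when the two ratings are close on an ordered scale, so it can be higher than unweighted kappa for the same data.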
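The two estimates can be sketched in base R, with an optional call to `psych::cohen.kappa`. The ordinal ratings below are hypothetical, since the original data are not shown; they were chosen so that the estimates agree with the reported 0.25 and 0.57. The weighted version assumes quadratic disagreement weights, which is an assumption about how the reported weighted kappa was obtained.

```r
# install.packages("psych")   # run once if the package is not installed

# Hypothetical ordinal ratings of 5 subjects on a 1-5 scale
rater1 <- c(1, 2, 3, 4, 5)
rater2 <- c(2, 2, 3, 2, 4)

lv  <- 1:5
tab <- table(factor(rater1, levels = lv), factor(rater2, levels = lv))
p   <- tab / sum(tab)                           # observed cell proportions
expected <- outer(rowSums(p), colSums(p))       # proportions expected by chance

# Unweighted kappa: only exact matches count as agreement
p_o <- sum(diag(p))
p_e <- sum(diag(expected))
kappa_unweighted <- (p_o - p_e) / (1 - p_e)

# Weighted kappa with quadratic disagreement weights (i - j)^2
w <- outer(lv, lv, function(i, j) (i - j)^2)
kappa_weighted <- 1 - sum(w * p) / sum(w * expected)

print(round(c(unweighted = kappa_unweighted, weighted = kappa_weighted), 2))

# cohen.kappa from psych, if available; x has one column per rater
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::cohen.kappa(cbind(rater1, rater2)))
}
```

The confidence bounds come only from the package call; the base R part computes the point estimates.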
Cohen's Kappa is an important tool for assessing inter-rater agreement in various fields. By accounting for chance agreement, it provides a more accurate measure of reliability than simple agreement percentages. Applying Cohen's Kappa can enhance the quality and validity of research findings by checking that categorical judgments are made consistently.