Many estimators of the measure of agreement
between two dichotomous ratings of a person have
been proposed. The results of Fleiss (1975) are
extended, and it is shown that four estimators,
namely Scott’s (1955) π coefficient, Cohen’s (1960) κ̂,
Maxwell & Pilliner’s (1968) r₁₁, and Mak’s (1988)
ρ̃, are interpretable both as chance-corrected
measures of agreement and as intraclass correlation
coefficients for different ANOVA models. Relationships
among these estimators are established
for finite samples. Under Kraemer’s (1979) model,
it is shown that these estimators are equivalent in
large samples, and that the equations for their
large-sample variances are equivalent.
Index terms: index of agreement, interrater reliability, intraclass correlation, kappa statistic.
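To make the idea of a chance-corrected measure concrete, the following is a minimal sketch of two of the four estimators, Scott's π and Cohen's κ̂, computed from the cell counts of a 2×2 table. It uses the common textbook form (observed agreement minus chance agreement, divided by one minus chance agreement) and is not taken from the article itself. The function name `agreement_2x2` and the cell labels `a`, `b`, `c`, `d` are assumptions introduced for illustration.

```python
def agreement_2x2(a, b, c, d):
    """Return (scott_pi, cohen_kappa) for a 2x2 table of counts.

    Hypothetical cell layout (an assumption of this sketch):
      a = both ratings positive
      b = rating 1 positive, rating 2 negative
      c = rating 1 negative, rating 2 positive
      d = both ratings negative
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")

    p_obs = (a + d) / n        # observed proportion of agreement
    p1 = (a + b) / n           # proportion rated positive by rating 1
    p2 = (a + c) / n           # proportion rated positive by rating 2

    # Cohen: chance agreement from the product of the separate marginals.
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)

    # Scott: chance agreement from the pooled (averaged) marginal.
    p_bar = (p1 + p2) / 2
    pe_pi = p_bar ** 2 + (1 - p_bar) ** 2

    # Both denominators are zero if every subject falls in one diagonal
    # cell; this sketch does not handle that degenerate case.
    cohen_kappa = (p_obs - pe_kappa) / (1 - pe_kappa)
    scott_pi = (p_obs - pe_pi) / (1 - pe_pi)
    return scott_pi, cohen_kappa


if __name__ == "__main__":
    pi_hat, kappa_hat = agreement_2x2(40, 10, 5, 45)
    print(f"Scott's pi = {pi_hat:.4f}, Cohen's kappa = {kappa_hat:.4f}")
```

When the two sets of marginal proportions are equal (`b == c`), the two chance terms coincide and the estimators return the same value; otherwise Scott's π is slightly smaller than Cohen's κ̂.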
Blackman, Nicole J.-M., & Koval, John J. (1993). Estimating rater agreement in 2x2 tables: Correction for chance and intraclass correlation. Applied Psychological Measurement, 17, 211-223. doi:10.1177/014662169301700302
Retrieved from the University of Minnesota Digital Conservancy,