Scale-dependent procedures are presented for
assessing the reliability of ratings for multiple
judges using intraclass correlation. Scale type is
defined in terms of admissible transformations,
and standardizing transformations for ratio and interval
scales are presented to solve the problem of
adjusting ratings for "arbitrary scale factors" (unit
and/or origin of the scale). The theory of meaningfulness
of numerical statements is introduced
and the coefficient of relational agreement (Stine,
1989b) is defined as the degree of agreement
among judges, with respect to (scale-dependent)
empirically meaningful relationships. Other topics
discussed include the treatment of variability due
to judges in relation to scale type, and the reliability
of magnitude estimates in psychophysics.
Index terms: coefficient of agreement, intraclass
correlation, meaningfulness, metric scales, reliability
of rating scales.
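The standardizing transformations the abstract refers to can be sketched in code. This is an illustrative reading, not the paper's own notation: for an interval scale the admissible transformations are x → ax + b (a > 0), so both origin and unit are arbitrary and a z-score removes them; for a ratio scale only x → ax (a > 0) is admissible, so only the unit is arbitrary and dividing by the mean removes it. The judges and data below are hypothetical.

```python
from statistics import mean, stdev

def standardize_interval(ratings):
    """Interval scale: admissible transforms are x -> a*x + b (a > 0),
    so remove both the arbitrary origin and unit via z-scores."""
    m, s = mean(ratings), stdev(ratings)
    return [(x - m) / s for x in ratings]

def standardize_ratio(ratings):
    """Ratio scale: admissible transforms are x -> a*x (a > 0),
    so remove only the arbitrary unit, e.g. by dividing by the mean."""
    m = mean(ratings)
    return [x / m for x in ratings]

# Two hypothetical judges rating the same four targets on an interval
# scale, differing only in unit and origin (judge B = 2 * judge A + 10):
judge_a = [1.0, 2.0, 3.0, 4.0]
judge_b = [12.0, 14.0, 16.0, 18.0]

# After standardizing, the arbitrary scale factors vanish and the two
# judges' ratings coincide, so agreement indices are not depressed
# by between-judge differences in unit or origin.
za = standardize_interval(judge_a)
zb = standardize_interval(judge_b)
print(all(abs(a - b) < 1e-9 for a, b in zip(za, zb)))
```

An intraclass correlation computed on such standardized ratings then reflects agreement up to the transformations the scale type deems admissible, which is the sense in which the procedures above are scale-dependent.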
Fagot, Robert F. (1991). Reliability of ratings for multiple judges: Intraclass correlation and metric scales. Applied Psychological Measurement, 15, 1-11. doi:10.1177/014662169101500101
Retrieved from the University of Minnesota Digital Conservancy.