Reliability of ratings for multiple judges: Intraclass correlation and metric scales

Author: Fagot, Robert F.
Date available: 2011-08-26
Date issued: 1991

Citation: Fagot, Robert F. (1991). Reliability of ratings for multiple judges: Intraclass correlation and metric scales. Applied Psychological Measurement, 15, 1-11. doi:10.1177/014662169101500101
URI: https://hdl.handle.net/11299/113942

Abstract: Scale-dependent procedures are presented for assessing the reliability of ratings for multiple judges using intraclass correlation. Scale type is defined in terms of admissible transformations, and standardizing transformations for ratio and interval scales are presented to solve the problem of adjusting ratings for "arbitrary scale factors" (unit and/or origin of the scale). The theory of meaningfulness of numerical statements is introduced and the coefficient of relational agreement (Stine, 1989b) is defined as the degree of agreement among judges, with respect to (scale-dependent) empirically meaningful relationships. Other topics discussed include the treatment of variability due to judges in relation to scale type, and the reliability of magnitude estimates in psychophysics.

Index terms: coefficient of agreement, intraclass correlation, meaningfulness, metric scales, reliability of rating scales.

Language: en
Type: Article