Browsing by Author "Fleiss, Joseph L."
Now showing 1 - 4 of 4
Balanced incomplete block designs for inter-rater reliability studies (1981)
Fleiss, Joseph L.
Occasionally, an inter-rater reliability study must be designed so that each subject is rated by fewer than all the participating raters. If there is interest in comparing the raters’ mean levels of rating, and if it is desired that each mean be estimated with the same precision, then a balanced incomplete block design for the reliability study is indicated. Methods for executing the design and for analyzing the resulting data are presented, using data from an actual study for illustration.

Comparison of the null distributions of weighted kappa and the C ordinal statistic (1977)
Cicchetti, Domenic V.; Fleiss, Joseph L.
It frequently occurs in psychological research that an investigator is interested in assessing the extent of interrater agreement when the data are measured on an ordinal scale. This Monte Carlo study demonstrates that the appropriate statistic to apply is weighted kappa with its revised standard error. The study also demonstrates that the minimal number of cases required for the valid application of weighted kappa varies between 20 and 100, depending upon the size of the ordinal scale. This contrasts with a previously cited large-sample estimate of 200. Given the difficulty of obtaining sample sizes this large, the latter finding should be of some comfort to investigators who use weighted kappa to measure interrater consensus.

Inference about weighted kappa in the non-null case (1978)
Fleiss, Joseph L.; Cicchetti, Domenic V.
The accuracy of the large-sample standard error of weighted kappa appropriate to the non-null case was studied by computer simulation. Results indicate that only moderate sample sizes are required to test the hypothesis that two independently derived estimates of weighted kappa are equal. However, in most instances the minimal sample sizes required for setting confidence limits around a single value of weighted kappa are inordinately large. An alternative, but as yet untested, procedure for setting confidence limits is suggested as being potentially more accurate.

The reliability of dichotomous judgments: Unequal numbers of judges per subject (1979)
Fleiss, Joseph L.; Cuzick, Jack
Consider a reliability study in which different subjects are judged on a dichotomous trait by different sets of judges, possibly unequal in number. A kappa-like measure of reliability is proposed, its correspondence to an intraclass correlation coefficient is pointed out, and a test for its statistical significance is presented. A numerical example is given.
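Three of the items above concern weighted kappa. As a companion to those abstracts, here is a minimal sketch of the weighted-kappa point estimate for two raters on an ordinal scale; the linear agreement weights, the function name, and the example table are illustrative assumptions, not taken from the papers.

```python
import numpy as np

def weighted_kappa(counts):
    """Point estimate of weighted kappa for two raters.

    counts : k x k array; counts[i, j] = number of subjects placed in
    category i by rater 1 and category j by rater 2.
    Linear agreement weights w_ij = 1 - |i - j| / (k - 1) are assumed
    here; the papers above do not prescribe this particular scheme.
    """
    counts = np.asarray(counts, dtype=float)
    k = counts.shape[0]
    p = counts / counts.sum()                # joint proportions p_ij
    i, j = np.indices((k, k))
    w = 1.0 - np.abs(i - j) / (k - 1)        # linear agreement weights
    row, col = p.sum(axis=1), p.sum(axis=0)  # marginal proportions
    po = (w * p).sum()                       # weighted observed agreement
    pe = (w * np.outer(row, col)).sum()      # weighted chance agreement
    return (po - pe) / (1.0 - pe)

# Hypothetical example: two raters, 4 ordinal categories, 100 subjects.
table = [[11,  3,  1,  0],
         [ 4, 16,  4,  1],
         [ 2,  5, 21,  5],
         [ 0,  1,  6, 20]]
print(round(weighted_kappa(table), 3))
```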
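The 1978 abstract reports that only moderate sample sizes are needed to test whether two independently derived weighted kappas are equal. The sketch below shows one such two-sample z-test under the assumption that each estimate's non-null large-sample standard error has already been computed (the standard-error formula itself is derived in the paper and is not reproduced here); the numeric inputs are made up for illustration.

```python
import math

def z_compare_kappas(kw1, se1, kw2, se2):
    """Two-sided z-test that two independently estimated weighted
    kappas are equal, given each estimate and its non-null
    large-sample standard error (assumed already computed)."""
    z = (kw1 - kw2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

# Hypothetical estimates and standard errors from two studies.
z, p = z_compare_kappas(0.68, 0.07, 0.52, 0.08)
print(f"z = {z:.2f}, p = {p:.3f}")
```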
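For the 1979 item, the sketch below implements a commonly cited form of a kappa-like reliability measure for a dichotomous trait when subjects are judged by unequal numbers of judges; the notation and the example data are illustrative, and the paper should be consulted for the exact definition and the accompanying significance test.

```python
import numpy as np

def kappa_unequal_judges(x, n):
    """Kappa-like reliability for a dichotomous trait when subject i
    is judged by n[i] judges, x[i] of whom make a positive judgment.
    This follows a commonly cited form of the Fleiss-Cuzick (1979)
    estimator; treat it as a sketch, not the paper's exact method.
    """
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    N = len(x)
    n_bar = n.mean()                  # mean number of judges per subject
    p_bar = x.sum() / n.sum()         # overall proportion of positives
    q_bar = 1.0 - p_bar
    within = (x * (n - x) / n).sum()  # within-subject disagreement
    return 1.0 - within / (N * (n_bar - 1.0) * p_bar * q_bar)

# Hypothetical example: 10 subjects, 2 to 5 judges each.
n = [3, 4, 2, 5, 3, 4, 3, 2, 5, 4]   # judges per subject
x = [3, 4, 0, 5, 1, 4, 3, 0, 4, 4]   # positive judgments per subject
print(round(kappa_unequal_judges(x, n), 3))
```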