Browsing by Author "Cohen, Allan S."
Item: A comparison of Lord's χ² and Raju's area measures in detection of DIF (1993)
Cohen, Allan S.; Kim, Seock-Ho

The area between item response functions estimated in different samples is often used as a measure of differential item functioning (DIF). Under item response theory, this area should be 0, except for errors of measurement. This study examined the effectiveness of two statistical tests of this area (a Z test for exact signed area and a Z test for exact unsigned area) for different test lengths, sample sizes, proportions of DIF items on the test, and item parameter estimation conditions using the two-parameter model. Errors in detection made using these two statistics were compared with errors made using Lord's χ². Differences among all three statistics were relatively small; however, the χ² statistic was more effective than either of the two Z tests at detecting simulated DIF. The Z test for the exact signed area was the least effective and was the most likely to result in false negative errors. Index terms: area measures, differential item functioning, item response theory, item bias, Lord's χ².

Item: A comparison of partial and complete paired comparisons in sociometric measurement of preschool groups (1978)
Cohen, Allan S.; Van Tassel, Elizabeth

Low test-retest reliabilities over periods from ten days to five months have been obtained on a partial rank-order sociometric, the PSI, of preschool-age children's peer preferences. These results have been interpreted to mean that preschool-age children do not have stable and enduring friendships with their peers. An alternative possibility is that the reliabilities of partially ranked data are so low as to obscure the existence of stable individual friendships in this age group.
A full rank-order sociometric instrument, the PCST, utilizing color photographs of the children in a preschool group as aids in eliciting friendship choices, was developed and tested on a group of three-year-olds and a group of four-year-olds. The sociometric measurements from both the PSI and the PCST were most reliable for the four-year-old group. Correlations between the PSI and the PCST, when corrected for attenuation, revealed that the two measures were probably assessing the same peer choice behavior, although the PCST was markedly superior in reliability. Administration time for the PCST was higher than for the PSI but substantially less than for previous paired-comparison procedures.

Item: A comparison of two area measures for detecting differential item functioning (1991)
Kim, Seock-Ho; Cohen, Allan S.

The area between two item response functions is often used as a measure of differential item functioning under item response theory. This area can be measured over either an open interval (i.e., exact) or a closed interval. Formulas are presented for computing the closed-interval signed and unsigned areas. Exact and closed-interval measures were estimated on data from a test with embedded items intentionally constructed to favor one group over another. No real differences in detection of these items were found between the exact and closed-interval methods. Index terms: BILOG, closed interval, differential item functioning, item response functions, open interval, signed area, unsigned area.

Item: Detection of differential item functioning in the graded response model (1993)
Cohen, Allan S.; Kim, Seock-Ho; Baker, Frank B.

Methods for detecting differential item functioning (DIF) have been proposed primarily for the dichotomous response model of item response theory.
Three measures of DIF for the dichotomous response model are extended to Samejima's graded response model: two measures based on area differences between item true score functions, and a χ² statistic for comparing differences in item parameters. An illustrative example is presented. Index terms: differential item functioning, graded response model, item response theory.

Item: An investigation of Lord's procedure for the detection of differential item functioning (1994)
Kim, Seock-Ho; Cohen, Allan S.; Kim, Hae-Ok

Type I error rates of Lord's χ² test for differential item functioning were investigated using Monte Carlo simulations. Two- and three-parameter item response theory (IRT) models were used to generate 50-item tests for samples of 250 and 1,000 simulated examinees. Item parameters were estimated using two algorithms (marginal maximum likelihood estimation and marginal Bayesian estimation) for three IRT models (the three-parameter model, the three-parameter model with a fixed guessing parameter, and the two-parameter model). Proportions of significant χ²s at selected nominal α levels were compared to those from joint maximum likelihood estimation as reported by McLaughlin and Drasgow (1987). Type I error rates for the three-parameter model consistently exceeded theoretically expected values. Results for the three-parameter model with a fixed guessing parameter and for the two-parameter model were consistently lower than expected values at the α levels in this study. Index terms: differential item functioning, item response theory, Lord's χ².

Item: An investigation of the likelihood ratio test for detection of differential item functioning (1996)
Cohen, Allan S.; Kim, Seock-Ho; Wollack, James A.

Type I error rates for the likelihood ratio test for detecting differential item functioning (DIF) were investigated using Monte Carlo simulations.
Two- and three-parameter item response theory (IRT) models were used to generate 100 datasets of a 50-item test for samples of 250 and 1,000 simulated examinees for each IRT model. Item parameters were estimated by marginal maximum likelihood for three IRT models: the three-parameter model, the three-parameter model with a fixed guessing parameter, and the two-parameter model. All DIF comparisons were simulated by randomly pairing two samples from each sample size and IRT model condition so that, for each condition, there were 50 pairs of reference and focal groups. Type I error rates for the two-parameter model were within theoretically expected values at each of the α levels considered. Type I error rates for the three-parameter model and the three-parameter model with a fixed guessing parameter, however, differed from the theoretically expected values at the α levels considered. Index terms: bias, differential item functioning, item bias, item response theory, likelihood ratio test for DIF.

Item: A minimum χ² method for equating tests under the graded response model (1995)
Kim, Seock-Ho; Cohen, Allan S.

The minimum χ² method for computing equating coefficients for tests with dichotomously scored items was extended to the case of Samejima's graded response items. The minimum χ² method was compared with the test response function method (also referred to as the test characteristic curve method), in which the equating coefficients are obtained by matching the test response functions of the two tests. The minimum χ² method was much less demanding computationally and yielded equating coefficients that differed little from those obtained using the test response function approach. Index terms: equating, graded response model, item response theory, minimum χ² method, test response function method.
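Several of the items above use the signed and unsigned area between reference- and focal-group item response functions as a DIF measure. As a rough numerical sketch (not the authors' code: the two-parameter logistic form, the D = 1.7 scaling constant, the item parameters, and the integration bounds are all illustrative assumptions), the exact open-interval areas can be approximated on a wide finite grid:

```python
import numpy as np

def icc_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

def area_measures(a_ref, b_ref, a_foc, b_foc, lo=-4.0, hi=4.0, n=10001):
    """Trapezoidal-rule approximation of the signed and unsigned area
    between reference- and focal-group ICCs over [lo, hi]; a wide
    finite interval stands in for the full theta scale."""
    theta = np.linspace(lo, hi, n)
    diff = icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc)
    dx = np.diff(theta)
    signed = float(np.sum(0.5 * (diff[1:] + diff[:-1]) * dx))
    absdiff = np.abs(diff)
    unsigned = float(np.sum(0.5 * (absdiff[1:] + absdiff[:-1]) * dx))
    return signed, unsigned

# Hypothetical item parameters: equal discriminations and difficulties
# shifted by 0.5, i.e. uniform DIF.  For the 2PL the exact signed area
# then equals the difficulty difference (about 0.5 here).
signed, unsigned = area_measures(1.0, 0.0, 1.0, 0.5)
```

When one curve dominates the other everywhere, as in this uniform-DIF example, the signed and unsigned areas coincide; for crossing (nonuniform) DIF the signed differences cancel and the unsigned area is the larger of the two, which is one reason the abstracts treat the two measures separately.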