Two issues related to differential item functioning (DIF) in the context of computerized adaptive testing (CAT) were addressed in this study: 1) the effect of DIF in operational items on the accuracy of the ability estimate ($\hat{\theta}_{\text{CAT}}$) and 2) the accuracy of detecting DIF in pretest items when DIF occurred in operational items and examinees were matched on the number-correct score (NCS), the ability estimate obtained from nonadaptive computer-based testing ($\hat{\theta}_{\text{CBT}}$), and $\hat{\theta}_{\text{CAT}}$. To investigate the first issue, a series of simulations was conducted by varying the level of DIF magnitude (0, .4, 1, and 1.6); DIF type (uniform and nonuniform); DIF contamination, or the number of DIF items (6, 15, and 24 items out of the 30-item test); and DIF occurrence (first, middle, last, and across stages of CAT). For the second issue, test impact ($\mu_R - \mu_F$ = 0 and 1) and sample size ratio ($N_R:N_F$ = 1:1 and 9:1) were added to the simulation. The first simulation showed that CAT could adjust for the effect of DIF in operational items if DIF occurred in the early stages of CAT, although with some restrictions. Specifically, CAT successfully adjusted for the effect of DIF at the earlier stages if the number of DIF items and the magnitude of DIF were moderate. In other situations, CAT seemed to reduce the effect of DIF, as seen in the trend of the standard errors (SEs), which increased when DIF items were delivered and decreased after CAT administered a new DIF-free item. However, the self-adjustment of CAT was not enough to recover $\hat{\theta}_{\text{CAT}}$ from DIF effects. The results from the second simulation suggested that matching examinees on $\hat{\theta}_{\text{CAT}}$ did not provide a substantial advantage over the NCS and $\hat{\theta}_{\text{CBT}}$ in most of the simulation conditions. Overall, when operational items were contaminated with DIF of moderate magnitude, the three matching variables yielded comparable DIF detection results for pretest items.
However, when the level of DIF contamination in operational items increased, matching examinees on $\hat{\theta}_{\text{CAT}}$ yielded the poorest detection of DIF in pretest items, especially when items with large uniform DIF were used in the operational test. It was also evident that DIF in operational items, especially CAT items, led to false identification of DIF type. Specifically, pretest items exhibiting uniform DIF were mistakenly identified as exhibiting nonuniform DIF if the matching variable was obtained from operational items with nonuniform DIF.
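The distinction between uniform and nonuniform DIF that the abstract relies on can be illustrated with a minimal sketch. The abstract does not state which item response model was used, so the two-parameter logistic (2PL) model below is an assumption, as are the function names and the parameter values. The shift of 0.4 is borrowed from the abstract's DIF magnitude levels for illustration only; how magnitude maps onto item parameters in the study itself is not specified here.

```python
import math


def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def dif_gap(theta, a_ref, b_ref, a_foc, b_foc):
    """Reference-minus-focal probability of success at a given ability."""
    return p_correct(theta, a_ref, b_ref) - p_correct(theta, a_foc, b_foc)


# Ability levels at which the two groups are compared (illustrative values).
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Uniform DIF: equal discrimination, focal-group difficulty shifted by 0.4.
# The gap keeps the same sign across the whole ability range.
uniform_gaps = [dif_gap(t, 1.0, 0.0, 1.0, 0.4) for t in thetas]

# Nonuniform DIF: equal difficulty, lower focal-group discrimination.
# The gap changes sign where the two item characteristic curves cross.
nonuniform_gaps = [dif_gap(t, 1.0, 0.0, 0.6, 0.0) for t in thetas]

for t, u, n in zip(thetas, uniform_gaps, nonuniform_gaps):
    print(f"theta={t:+.1f}  uniform gap={u:+.3f}  nonuniform gap={n:+.3f}")
```

Under these assumptions, the uniform gap favors the reference group at every ability level, whereas the nonuniform gap reverses direction across the ability range. This is the pattern a DIF procedure must distinguish, and it is the distinction that the abstract reports as being blurred when the matching variable is itself contaminated.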
University of Minnesota Ph.D. dissertation. April 2014. Major: Educational Psychology. Advisor: Ernest Davenport. 1 computer file (PDF); ix, 192 pages, appendices A-H.
Differential item functioning in computerized adaptive testing: can CAT self-adjust enough?
Retrieved from the University of Minnesota Digital Conservancy,