A comparison of item- and person-fit methods of assessing model-data fit in IRT

Reise, Steven P.
2011-06-21
1990

Reise, Steven P. (1990). A comparison of item- and person-fit methods of assessing model-data fit in IRT. Applied Psychological Measurement, 14, 127-137. doi:10.1177/014662169001400202
https://hdl.handle.net/11299/107793

Many item-fit statistics have been proposed for assessing whether the responses to test items aggregated across examinees conform to IRT test models. Conversely, person-fit statistics have been proposed for assessing whether an examinee’s responses aggregated across items are congruent with a specified IRT model. Statistical procedures to assess item fit have differed from those to assess person fit. This research compared a χ² item-fit index with a likelihood-based person-fit index. Eight 0,1 data matrices were simulated under the three-parameter logistic test model. Both the likelihood-based and χ² fit statistics were then computed for examinees and items, and Type I and Type II error rates were analyzed. With data simulated to fit the IRT model, the χ² test overidentified examinees and items as being misfitting, while the likelihood-based fit index held closer to the specified α levels. The two fit indices gave consistent (mis)fit-to-model results in 94 and 97 percent of cases for items and examinees, respectively, across simulations. Under simulated conditions of data misfit, the χ² statistic detected misfit at a higher rate than the likelihood-based statistic, indicating that the χ² statistic was slightly more sensitive to response pattern aberrancy. However, other considerations led to a recommendation for employing the likelihood-based index in applied fit analyses to evaluate both examinee and item model-data (mis)fit.

Index terms: chi-square index, item fit, item response theory, model fit, person fit, response aberrancy.
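
The simulation design summarized in the abstract (0,1 data generated under the three-parameter logistic model, then scored with a likelihood-based person-fit index) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the article: the abstract does not name the likelihood-based index, so the sketch uses the standardized log-likelihood commonly written l_z; the sample size, test length, parameter distributions, the one-tailed cutoff of -1.645, and the function names `p_3pl`, `simulate_responses`, and `lz_person_fit` are all hypothetical. The fit statistic is evaluated at the generating parameters, not at estimates.

```python
import numpy as np


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response, shape (persons, items)."""
    z = a[None, :] * (theta[:, None] - b[None, :])
    return c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))


def simulate_responses(theta, a, b, c, rng):
    """Draw a 0,1 response matrix that fits the 3PL model."""
    p = p_3pl(theta, a, b, c)
    return (rng.random(p.shape) < p).astype(int)


def lz_person_fit(u, p):
    """Standardized log-likelihood (l_z) for each examinee.

    u: 0,1 responses; p: model probabilities of a correct response.
    Large negative values suggest an aberrant response pattern.
    """
    log_lik = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p), axis=1)
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p), axis=1)
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2, axis=1)
    return (log_lik - expected) / np.sqrt(variance)


# Illustrative settings only (not the article's design).
rng = np.random.default_rng(0)
n_persons, n_items = 1000, 40
theta = rng.normal(size=n_persons)          # examinee trait levels
a = rng.uniform(0.8, 2.0, size=n_items)     # discrimination
b = rng.normal(size=n_items)                # difficulty
c = np.full(n_items, 0.2)                   # lower asymptote (guessing)

u = simulate_responses(theta, a, b, c, rng)
lz = lz_person_fit(u, p_3pl(theta, a, b, c))

# Proportion of model-fitting examinees flagged at a nominal one-tailed 5% level;
# this is an empirical Type I error rate for the sketch's settings.
flag_rate = np.mean(lz < -1.645)
print(f"mean l_z = {lz.mean():.3f}, sd = {lz.std():.3f}, flagged = {flag_rate:.3f}")
```

Because the data are generated to fit the model, the flagged proportion plays the role of a Type I error rate. Summing the same standardized quantity over examinees within an item, instead of over items within an examinee, would give an item-level analogue, which is one way a single likelihood-based index could serve both purposes.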