A comparative study of item-level fit Indices in item response theory.

Thesis or Dissertation


Item-level fit indices (IFIs) in item response theory (IRT) are designed to assess the degree to which an estimated item response function approximates an observed item response pattern. There are numerous IFIs whose theoretical sampling distributions are specified; however, in some cases little is known regarding the degree to which these indices follow their theoretical distributions in practice. If an IFI departs substantially from its theoretical distribution, the degree of misfit will be misestimated, and test developers will have very little idea of whether their models provide accurate depictions of true item response behavior. Therefore, a Monte Carlo simulation study was conducted to assess the degree to which many available IFIs follow their theoretical distributions. The IFIs examined in this study were (1) Infit (VI) and Outfit (VO), two IFIs commonly used for the Rasch model; (2) Yen’s (1981) χ² (Q1) and Orlando and Thissen’s (2000) χ² (QO); (3) three Lagrange multiplier statistics [LM(a), LM(b), and LM(ab)] proposed by Glas (1999); and (4) Drasgow, Levine, and Williams’ (1985) person-fit Lz, modified by Reise (1990) to assess item fit. The primary research objective of this study was to determine how a number of factors (listed below) affect Type I error rates and empirical sampling distributions of IFIs. The relationship between IFIs and item parameters was also examined. The crossed between-subjects conditions were: IRT model (1-, 2-, and 3-parameter); data noise, operationalized as strictly unidimensional vs. essentially unidimensional data; item discrimination (high and low); test length (n = 15 and n = 75); and sample size (N = 500 and N = 1,500). There were also two crossed within-subjects factors to capture the impact of item and person parameter estimation error. The dependent variables in this study were IFI Type I error rates and empirical sampling distribution moments across 18,750 replicated items.
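To make the Infit and Outfit indices concrete, the following is a minimal sketch of one simulated replication under the Rasch (1-parameter) model, using the textbook mean-square forms of the two statistics. It is an illustration under stated assumptions, not the dissertation's code: the function names, the seed, and the parameter ranges are hypothetical, and the study itself may have used standardized versions of VI and VO. Fit is computed here with the generating parameters, which corresponds to the condition with no parameter estimation error.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_responses(thetas, bs, rng):
    """Draw a persons-by-items matrix of 0/1 responses from the Rasch model."""
    return [[1 if rng.random() < rasch_prob(t, b) else 0 for b in bs]
            for t in thetas]


def infit_outfit(responses, thetas, bs):
    """Return a list of (infit, outfit) mean squares, one pair per item."""
    results = []
    for j, b in enumerate(bs):
        sum_sq_std_resid = 0.0  # sum of squared standardized residuals
        sum_sq_resid = 0.0      # sum of squared raw residuals
        sum_var = 0.0           # sum of model response variances
        for i, theta in enumerate(thetas):
            p = rasch_prob(theta, b)
            var = p * (1.0 - p)
            resid = responses[i][j] - p
            sum_sq_std_resid += resid * resid / var
            sum_sq_resid += resid * resid
            sum_var += var
        outfit = sum_sq_std_resid / len(thetas)  # unweighted mean square
        infit = sum_sq_resid / sum_var           # information-weighted mean square
        results.append((infit, outfit))
    return results


# Hypothetical replication: N = 500 persons, n = 15 items (assumed values).
rng = random.Random(2009)  # arbitrary seed, chosen only for reproducibility
thetas = [rng.gauss(0.0, 1.0) for _ in range(500)]
bs = [-1.5 + 3.0 * j / 14 for j in range(15)]  # evenly spaced difficulties
data = simulate_responses(thetas, bs, rng)
fit = infit_outfit(data, thetas, bs)

for j, (infit, outfit) in enumerate(fit):
    print(f"item {j + 1:2d}: infit = {infit:.3f}, outfit = {outfit:.3f}")
```

Because the data are generated from the same model and parameters used to compute fit, both mean squares should scatter around 1. Repeating such replications while substituting estimated parameters is one way to see how estimation error moves the empirical distribution away from the theoretical one.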
Data were analyzed and summarized using ANOVA, Pearson correlations, and graphical procedures. The Kolmogorov-Smirnov test was used to directly assess distributional assumptions. The results of the study indicated that QO was the only statistic to adhere closely to its theoretical sampling distribution across all study conditions. For VI, VO, Lz, and Q1 statistics, sampling distributions were strongly influenced by test length, parameter estimation error, and, to a lesser degree, sample size. In the absence of parameter estimation error, all statistics more closely approximated their theoretical sampling distributions and were affected little by other study conditions. The presence of person parameter estimation error tended to have an inflationary effect on sampling distribution means whereas the presence of item parameter estimation error tended to have a deflationary effect on sampling distribution variances. VI, VO, and Lz functioned very similarly to one another, with Type I error rates tending to be grossly inflated for n = 15 and deflated for n = 75 when both person and item parameter error were present. Q1 Type I error rates were also grossly inflated for n = 15, but were near nominal levels for n = 75. Finally, the LM statistics generally exhibited inflated Type I error rates and were moderately influenced by IRT model and discrimination; only for LM(b) did empirical sampling distributions tend to approach theoretical distributions, primarily when discrimination was lower or for the 3-parameter model at both levels of discrimination.
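The two outcome measures described above, the empirical Type I error rate and the Kolmogorov-Smirnov distance from a theoretical distribution, can be sketched as follows. This is an illustrative sketch only: it assumes a standard normal reference distribution (reasonable for a standardized statistic such as Lz, whereas chi-square-based indices would need their own CDF), the function names are hypothetical, and the values are synthetic draws rather than results from the study. The second sample is shifted upward to mimic an inflated sampling distribution mean.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(values, cdf):
    """One-sample Kolmogorov-Smirnov distance between values and a reference CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the reference CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def type1_error_rate(values, critical=1.959964):
    """Share of null replications flagged by a two-sided test at alpha = .05."""
    return sum(1 for v in values if abs(v) > critical) / len(values)


# Synthetic stand-ins for fit statistics from null (fitting) items.
rng = random.Random(53401)  # arbitrary seed
n_reps = 2000
well_behaved = [rng.gauss(0.0, 1.0) for _ in range(n_reps)]  # follows N(0, 1)
mean_shifted = [rng.gauss(1.0, 1.0) for _ in range(n_reps)]  # mean inflated to 1

# Large-sample approximation to the KS critical value at alpha = .05.
ks_critical = 1.358 / math.sqrt(n_reps)

for label, sample in [("well-behaved", well_behaved), ("mean-shifted", mean_shifted)]:
    rate = type1_error_rate(sample)
    d = ks_statistic(sample, normal_cdf)
    print(f"{label}: Type I rate = {rate:.3f}, KS D = {d:.3f} "
          f"(approx. critical D = {ks_critical:.3f})")
```

A statistic that follows its theoretical distribution should show a rejection rate near the nominal .05 and a small KS distance; a shifted distribution shows both an inflated rejection rate and a large distance.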


University of Minnesota Ph.D. dissertation. July 2009. Major: Psychology. Advisor: Professor David J. Weiss. 1 computer file (PDF); ix, 423 pages, appendices A-H.

Suggested citation

Davis, Jennifer Paige. (2009). A comparative study of item-level fit Indices in item response theory. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/53401.
