Davis, Jennifer Paige
2009-09-10
2009-07
http://purl.umn.edu/53401
University of Minnesota Ph.D. dissertation. July 2009. Major: Psychology. Advisor: Professor David J. Weiss. 1 computer file (PDF); ix, 423 pages, appendices A-H.

Item-level fit indices (IFIs) in item response theory (IRT) are designed to assess the degree to which an estimated item response function approximates an observed item response pattern. Numerous IFIs have specified theoretical sampling distributions; however, in some cases little is known about how closely these indices follow their theoretical distributions in practice. If an IFI departs substantially from its theoretical distribution, the degree of misfit will be misestimated, and test developers will have little idea whether their models provide accurate depictions of true item response behavior. Therefore, a Monte Carlo simulation study was conducted to assess the degree to which many available IFIs follow their theoretical distributions.

The IFIs examined in this study were (1) Infit (VI) and Outfit (VO), two IFIs commonly used with the Rasch model; (2) Yen's (1981) χ² (Q1) and Orlando and Thissen's (2000) χ² (QO); (3) three Lagrange multiplier statistics [LM(a), LM(b), and LM(ab)] proposed by Glas (1999); and (4) Drasgow, Levine, and Williams' (1985) person-fit statistic Lz, modified by Reise (1990) to assess item fit.

The primary research objective was to determine how a number of factors (listed below) affect Type I error rates and empirical sampling distributions of IFIs. The relationship between IFIs and item parameters was also examined. The crossed between-subjects conditions were: IRT model (1-, 2-, and 3-parameter); data noise, operationalized as strictly unidimensional vs. essentially unidimensional data; item discrimination (high and low); test length (n = 15 and n = 75); and sample size (N = 500 and N = 1,500). There were also two crossed within-subjects factors to capture the impact of item and person parameter estimation error. The dependent variables were IFI Type I error rates and empirical sampling distribution moments across 18,750 replicated items. Data were analyzed and summarized using ANOVA, Pearson correlations, and graphical procedures. The Kolmogorov-Smirnov test was used to assess distributional assumptions directly.

The results indicated that QO was the only statistic to adhere closely to its theoretical sampling distribution across all study conditions. For the VI, VO, Lz, and Q1 statistics, sampling distributions were strongly influenced by test length, parameter estimation error, and, to a lesser degree, sample size. In the absence of parameter estimation error, all statistics more closely approximated their theoretical sampling distributions and were little affected by the other study conditions. The presence of person parameter estimation error tended to inflate sampling distribution means, whereas the presence of item parameter estimation error tended to deflate sampling distribution variances. VI, VO, and Lz functioned very similarly to one another, with Type I error rates tending to be grossly inflated for n = 15 and deflated for n = 75 when both person and item parameter error were present. Q1 Type I error rates were also grossly inflated for n = 15 but were near nominal levels for n = 75.
Finally, the LM statistics generally exhibited inflated Type I error rates and were moderately influenced by IRT model and item discrimination; only for LM(b) did empirical sampling distributions tend to approach their theoretical distributions, primarily when discrimination was lower or, at both levels of discrimination, for the 3-parameter model.

en-US
Fit Indices; IRT; Model Fit; Simulation; Psychology
A comparative study of item-level fit indices in item response theory.
Thesis or Dissertation
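The abstract describes a general Monte Carlo procedure: generate data that fit the model by construction, compute an item-fit statistic on each replication, and compare its empirical rejection rate and sampling distribution against the nominal theoretical reference. The sketch below illustrates that machinery for one simplified case only: Rasch data generated from true item and person parameters (analogous to the no-estimation-error condition) and a Q1-style chi-square formed over ability groups. The grouping into G = 10 strata, the df = G reference distribution, the parameter ranges, and all function names are illustrative assumptions, not the dissertation's actual specification or code.

```python
# Minimal sketch (not the dissertation's code): Monte Carlo Type I error
# estimation for a Q1-style item-fit chi-square under the Rasch model,
# using TRUE item and person parameters (the "no estimation error" case).
# Group count, df choice, and constants are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

N, n_items, n_reps, G = 1500, 15, 500, 10   # examinees, items, replications, ability groups
b = rng.uniform(-2, 2, n_items)             # true Rasch item difficulties
alpha = 0.05

def rasch_prob(theta, b):
    # P(X = 1 | theta, b) under the Rasch (1PL) model
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def q1_like(x, theta, b_i):
    # Chi-square comparing observed and expected proportions correct
    # within G equal-size groups formed on (known) theta.
    order = np.argsort(theta)
    groups = np.array_split(order, G)
    q = 0.0
    for g in groups:
        p = 1.0 / (1.0 + np.exp(-(theta[g] - b_i)))
        e = p.mean()                        # expected proportion correct in group
        o = x[g].mean()                     # observed proportion correct in group
        q += len(g) * (o - e) ** 2 / (e * (1.0 - e))
    return q

stat = np.empty((n_reps, n_items))
for r in range(n_reps):
    theta = rng.standard_normal(N)          # true abilities
    p = rasch_prob(theta, b)
    x = (rng.random((N, n_items)) < p).astype(int)   # model-fitting data: all items "fit"
    for i in range(n_items):
        stat[r, i] = q1_like(x[:, i], theta, b[i])

df = G                                      # nominal reference distribution (assumption)
crit = stats.chi2.ppf(1 - alpha, df)
type1 = (stat > crit).mean()                # empirical rejection rate under the null
ks = stats.kstest(stat.ravel(), stats.chi2(df).cdf)

print(f"empirical Type I error at alpha={alpha}: {type1:.3f}")
print(f"empirical mean {stat.mean():.2f} vs nominal df {df}")
print(f"KS test vs chi-square({df}): D={ks.statistic:.3f}, p={ks.pvalue:.3g}")
```

Because every item fits by construction, the reported rejection rate estimates the Type I error rate and the Kolmogorov-Smirnov statistic quantifies departure from the nominal reference distribution, mirroring the two dependent variables summarized in the abstract. The study's full design adds estimated (rather than true) item and person parameters, the 2- and 3-parameter models, essentially unidimensional data, and the remaining statistics (Infit, Outfit, QO, the LM family, and Lz).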