Item-level fit indices (IFIs) in item response theory (IRT) are designed to assess
the degree to which an estimated item response function approximates an observed item
response pattern. There are numerous IFIs whose theoretical sampling distributions are
specified; however, in some cases little is known regarding the degree to which these
indices follow their theoretical distributions in practice. If an IFI departs substantially
from its theoretical distribution, the degree of misfit will be misestimated, and test developers
will have little idea whether their models accurately depict true item
response behavior. Therefore, a Monte Carlo simulation study was conducted to assess
the degree to which many available IFIs follow their theoretical distributions. The IFIs
examined in this study were (1) Infit (VI) and Outfit (VO), two IFIs commonly used for
the Rasch model; (2) Yen’s (1981) χ² (Q1) and Orlando and Thissen’s (2000) χ² (QO);
(3) three Lagrange multiplier statistics [LM(a), LM(b), and LM(ab)] proposed by Glas
(1999); and (4) Drasgow, Levine, and Williams’ (1985) person-fit statistic Lz, modified by Reise
(1990) to assess item fit.
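As a rough illustration of how three of these statistics are built from item residuals, the sketch below computes the Infit (VI) and Outfit (VO) mean squares and an item-level Lz for a single dichotomous item. It is a minimal sketch under stated assumptions, not the dissertation's code: it assumes Rasch (1-parameter) response probabilities, the names `rasch_prob` and `item_fit` are hypothetical, and it uses the standard definitions (Outfit as the mean squared standardized residual, Infit as the information-weighted mean square, and Lz as the standardized log-likelihood of Drasgow et al., summed over examinees rather than over items as in Reise's item-fit adaptation).

```python
import math


def rasch_prob(theta, b):
    """Rasch (1PL) probability of a correct response (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (Infit, Outfit, Lz) for one item.

    responses: 0/1 scores on the item, one per examinee
    thetas:    ability values for those examinees (true or estimated)
    b:         item difficulty (true or estimated)
    Note: Lz is undefined if every theta equals b, because Var(l0) is then zero.
    """
    sq_std_resid = 0.0  # sum of z^2 = (x - P)^2 / [P(1 - P)], for Outfit
    sq_resid = 0.0      # sum of (x - P)^2, Infit numerator
    info = 0.0          # sum of P(1 - P), Infit denominator
    l0 = 0.0            # observed log-likelihood of the item's responses
    expected_l = 0.0    # E(l0) under the model
    var_l = 0.0         # Var(l0) under the model
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        w = p * q
        resid = x - p
        sq_std_resid += resid * resid / w
        sq_resid += resid * resid
        info += w
        l0 += x * math.log(p) + (1 - x) * math.log(q)
        expected_l += p * math.log(p) + q * math.log(q)
        var_l += w * math.log(p / q) ** 2
    outfit = sq_std_resid / len(responses)
    infit = sq_resid / info
    lz = (l0 - expected_l) / math.sqrt(var_l)
    return infit, outfit, lz


if __name__ == "__main__":
    # Hypothetical responses of six examinees to one item with difficulty b = 0.2.
    thetas = [-1.5, -0.5, 0.0, 0.4, 1.1, 2.0]
    scores = [0, 1, 0, 1, 1, 1]
    vi, vo, lz = item_fit(scores, thetas, b=0.2)
    print(f"Infit = {vi:.3f}, Outfit = {vo:.3f}, Lz = {lz:.3f}")
```

Passing generating (true) θ and b values corresponds to the condition without parameter estimation error; passing estimates instead is one way to mimic the person and item parameter estimation error examined as within-subjects factors.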
The primary research objective of this study was to determine how a number of
factors (listed below) affect Type I error rates and empirical sampling distributions of
IFIs. The relationship between IFIs and item parameters was also examined. The crossed
between-subjects conditions were: IRT model (1-, 2-, and 3-parameter); data noise,
operationalized as strictly unidimensional vs. essentially unidimensional data; item
discrimination (high and low); test length (n = 15 and n = 75); and sample size (N = 500
and N = 1,500). There were also two crossed within-subjects factors to capture the impact
of item and person parameter estimation error. The dependent variables in this study were
IFI Type I error rates and empirical sampling distribution moments across 18,750
replicated items. Data were analyzed and summarized using ANOVA, Pearson
correlations, and graphical procedures. The Kolmogorov-Smirnov test was used to
directly assess distributional assumptions.
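The two quantities at the center of this design, the empirical Type I error rate and the Kolmogorov-Smirnov distance between an empirical and a theoretical sampling distribution, can be sketched as follows. This is a hedged illustration under assumptions, not the study's procedure: it uses Lz as the example statistic (its theoretical reference distribution is the standard normal), treats large negative values as misfit with a one-tailed α = .05 cutoff of -1.645, and uses the large-sample approximation 1.36/√N for the 5% KS critical value. The function names are hypothetical, and the demo values are simulated N(0, 1) draws, not results from the study.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, the theoretical reference distribution assumed here for Lz."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov D: the largest vertical gap between the
    empirical CDF of `sample` and the theoretical CDF `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def type_i_error_rate(null_stats, is_flagged):
    """Proportion of items simulated to fit the model that are nonetheless
    flagged as misfitting by the decision rule `is_flagged`."""
    return sum(1 for s in null_stats if is_flagged(s)) / len(null_stats)


if __name__ == "__main__":
    random.seed(2009)
    # Stand-in for Lz values from replicated, correctly specified items (hypothetical data).
    lz_values = [random.gauss(0.0, 1.0) for _ in range(2000)]

    d = ks_statistic(lz_values, normal_cdf)
    d_crit = 1.36 / math.sqrt(len(lz_values))  # large-sample 5% critical value
    rate = type_i_error_rate(lz_values, lambda z: z < -1.645)  # nominal alpha = .05

    print(f"KS D = {d:.4f} (reject the theoretical distribution if D > {d_crit:.4f})")
    print(f"empirical Type I error rate = {rate:.3f} (nominal .05)")
```

An empirical rate well above the nominal α indicates inflation and one well below it indicates deflation; for χ²-type statistics such as Q1 or QO, the same functions apply with a χ² CDF and an upper-tail rejection rule.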
The results of the study indicated that QO was the only statistic to adhere closely
to its theoretical sampling distribution across all study conditions. For VI, VO, Lz, and Q1 statistics, sampling distributions were strongly influenced by test length, parameter
estimation error, and, to a lesser degree, sample size. In the absence of parameter
estimation error, all statistics more closely approximated their theoretical sampling
distributions and were affected little by other study conditions. The presence of person
parameter estimation error tended to have an inflationary effect on sampling distribution
means whereas the presence of item parameter estimation error tended to have a
deflationary effect on sampling distribution variances. VI, VO, and Lz functioned very
similarly to one another, with Type I error rates tending to be grossly inflated for n = 15
and deflated for n = 75 when both person and item parameter error were present. Q1
Type I error rates were also grossly inflated for n = 15 but were near nominal levels for
n = 75. Finally, the LM statistics generally exhibited inflated Type I error rates and were
moderately influenced by IRT model and discrimination; only for LM(b) did empirical
sampling distributions tend to approach theoretical distributions, primarily when
discrimination was lower or for the 3-parameter model at both levels of discrimination.
University of Minnesota Ph.D. dissertation. July 2009. Major: Psychology. Advisor: Professor David J. Weiss. 1 computer file (PDF); ix, 423 pages, appendices A-H.
Davis, Jennifer Paige.
A comparative study of item-level fit indices in item response theory.
Retrieved from the University of Minnesota Digital Conservancy,