Two psychometric models with very different
parametric formulas and item response functions
can make virtually the same predictions in all
applications. By applying some basic results from
the theory of hypothesis testing and from signal
detection theory, the power of the most powerful
test for distinguishing the models can be computed.
It is proposed that model misspecification be
measured by the power of the most powerful
test. If the power of the most powerful test
is low, then the two models will make nearly the
same prediction in every application. If the power
is high, there will be applications in which the
models will make different predictions. This
measure, that is, the power of the most powerful
test, places various types of model misspecification
(item parameter estimation error, multidimensionality,
local independence failure, learning
and/or fatigue during testing) on a common scale.
The theory supporting the method is presented and
illustrated with a systematic study of misspecification
due to item response function estimation error.
In these studies, two joint maximum likelihood
estimation methods (LOGIST 2B and LOGIST 5) and two
marginal maximum likelihood estimation methods
(BILOG and ForScore) were contrasted by measuring
the difference between a simulation model and a
model obtained by applying an estimation method
to simulation data. Marginal estimation was found
generally to be superior to joint estimation. The
parametric marginal method (BILOG) was superior
to the nonparametric method only for three-parameter
logistic models. The nonparametric marginal
method (ForScore) excelled for more general
models. Of the two joint maximum likelihood
methods studied, LOGIST 5 appeared to be more
accurate than LOGIST 2B. Index terms: BILOG;
forced-choice experiment; ForScore; ideal observer
method; item response theory, estimation, models;
LOGIST; multilinear formula score theory.
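
The idea of measuring the difference between two models by the power of the most powerful test can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure: it assumes three-parameter logistic models, a standard normal ability distribution approximated on a fixed grid, a single examinee per test, and a Neyman-Pearson likelihood ratio test at a fixed significance level. All function names, parameter values, and the choice of alpha are hypothetical. Because response patterns are discrete, the test's size is only approximately alpha.

```python
import numpy as np

def irf_3pl(theta, a, b, c):
    """3PL item response function; returns an array of shape (n_theta, n_items)."""
    theta = np.asarray(theta, dtype=float)[:, None]
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def log_marginal_likelihood(responses, a, b, c, nodes, weights):
    """Log marginal probability of each response pattern, with ability
    integrated out over a quadrature grid (assumed N(0, 1) ability)."""
    p = np.clip(irf_3pl(nodes, a, b, c), 1e-12, 1.0 - 1e-12)   # (Q, n_items)
    # log P(pattern | theta_q) for every pattern and node: (N, Q)
    ll = responses @ np.log(p).T + (1.0 - responses) @ np.log(1.0 - p).T
    m = ll.max(axis=1, keepdims=True)                           # log-sum-exp shift
    return (m + np.log(np.exp(ll - m) @ weights[:, None])).ravel()

def simulate(n, a, b, c, rng):
    """Draw n examinees from N(0, 1) and simulate their 0/1 item responses."""
    theta = rng.standard_normal(n)
    return (rng.random((n, len(a))) < irf_3pl(theta, a, b, c)).astype(float)

def most_powerful_test_power(model0, model1, n=20000, alpha=0.05, seed=0):
    """Monte Carlo estimate of the power of the likelihood ratio test of
    model0 (null) against model1 (alternative) for one examinee."""
    rng = np.random.default_rng(seed)
    nodes = np.linspace(-4.0, 4.0, 41)
    weights = np.exp(-0.5 * nodes**2)
    weights /= weights.sum()

    def log_lr(x):
        return (log_marginal_likelihood(x, *model1, nodes, weights)
                - log_marginal_likelihood(x, *model0, nodes, weights))

    lr_null = log_lr(simulate(n, *model0, rng))   # statistic under model0
    lr_alt = log_lr(simulate(n, *model1, rng))    # statistic under model1
    critical = np.quantile(lr_null, 1.0 - alpha)  # approximate size-alpha cutoff
    return float(np.mean(lr_alt > critical))

# Hypothetical 20-item test; the alternatives shift every item difficulty.
a = np.ones(20)
b = np.linspace(-2.0, 2.0, 20)
c = np.full(20, 0.2)
power_near = most_powerful_test_power((a, b, c), (a, b + 0.05, c))
power_far = most_powerful_test_power((a, b, c), (a, b + 1.5, c))
print(power_near, power_far)
```

Under these assumptions, a small difficulty shift should yield power close to alpha (the models are nearly indistinguishable), while a large shift should yield clearly higher power.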
Levine, Michael V., Drasgow, Fritz, Williams, Bruce, McCusker, Christopher, & Thomasson, Gary L. (1992). Measuring the difference between two models. Applied Psychological Measurement, 16, 261-278. doi:10.1177/014662169201600307
Levine, Michael V.; Drasgow, Fritz; Williams, Bruce; McCusker, Christopher; Thomasson, Gary L.
Measuring the difference between two models.
Retrieved from the University of Minnesota Digital Conservancy,