Levine, Michael V.Drasgow, Drasgow, Fritz FritzWilliams, BruceMcCusker, ChristopherThomasson, Gary L.2011-09-212011-09-211992Levine, Michael V, Drasgow, Fritz, Williams, Bruce, McCusker, Christopher & et al. (1992). Measuring the difference between two models. Applied Psychological Measurement, 16, 261-278. doi:10.1177/014662169201600307doi:10.1177/014662169201600307https://hdl.handle.net/11299/115716Two psychometric models with very different parametric formulas and item response functions can make virtually the same predictions in all applications. By applying some basic results from the theory of hypothesis testing and from signal detection theory, the power of the most powerful test for distinguishing the models can be computed. Measuring model misspecification by computing the power of the most powerful test is proposed. If the power of the most powerful test is low, then the two models will make nearly the same prediction in every application. If the power is high, there will be applications in which the models will make different predictions. This measure, that is, the power of the most powerful test, places various types of model misspecification- item parameter estimation error, multidimensionality, local independence failure, learning and/or fatigue during testing-on a common scale. The theory supporting the method is presented and illustrated with a systematic study of misspecification due to item response function estimation error. In these studies, two joint maximum likelihood estimation methods (LOGIST 2B and LOGIST 5) and two marginal maximum likelihood estimation methods (BILOG and ForScore) were contrasted by measuring the difference between a simulation model and a model obtained by applying an estimation method to simulation data. Marginal estimation was found generally to be superior to joint estimation. The parametric marginal method (BILOG) was superior to the nonparametric method only for three-parameter logistic models. The nonparametric marginal method (ForScore) excelled for more general models. Of the two joint maximum likelihood methods studied, LOGIST s appeared to be more accurate than LOGIST 2B. Index terms: BILOG; forced-choice experiment; ForScore; ideal observer method; item response theory, estimation, models; LOGIST; multilinear formula score theory.enMeasuring the difference between two modelsArticle