This study examined how well current software
implementations of four polytomous item response
theory models fit several multiple-choice tests. The
models were Bock’s (1972) nominal model,
Samejima’s (1979) multiple-choice Model C, Thissen
& Steinberg’s (1984) multiple-choice model, and
Levine’s (1993) maximum-likelihood formula scoring
model. The parameters of the first three of these models
were estimated with Thissen’s (1986) MULTILOG
computer program; Williams & Levine’s (1993)
FORSCORE program was used for Levine’s model. Tests
from the Armed Services Vocational Aptitude Battery, the Scholastic Aptitude Test, and the American College
Test Assessment were analyzed. The models were fit in
estimation samples of approximately 3,000; cross-validation
samples of approximately 3,000 were used to
evaluate goodness of fit. Both fit plots and X² statistics
were used to determine the adequacy of fit. Bock’s
model provided surprisingly good fit; adding parameters
to the nominal model did not yield improvements
in fit. FORSCORE provided generally good fit for
Levine’s nonparametric model across all tests.
Index terms: Bock’s nominal model, FORSCORE, maximum likelihood formula scoring, MULTILOG, polytomous IRT.
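The abstract says that adding parameters to the nominal model did not improve fit. The sketch below shows what those extra parameters are. Bock's nominal model gives the probability of choosing option k as P(k | θ) = exp(a_k θ + c_k) / Σ_h exp(a_h θ + c_h). The Thissen & Steinberg multiple-choice model adds a latent "don't know" category, indexed 0. Its probability mass is spread over the observed options by proportions d_k that sum to 1: P(k | θ) = [exp(a_k θ + c_k) + d_k exp(a_0 θ + c_0)] / Σ_{h=0..m} exp(a_h θ + c_h).

This is a minimal illustration under several assumptions:
- The function names and the parameter values in the usage note are hypothetical.
- The sum-to-zero parameters are only one common identification convention, not necessarily MULTILOG's internal parameterization.
- Nothing here reproduces MULTILOG or FORSCORE estimation, Samejima's Model C, Levine's model, or the study's fit statistics.

```python
import math
from typing import Sequence


def _scaled_exp_terms(theta: float, a: Sequence[float], c: Sequence[float]) -> list[float]:
    """Return exp(a_k*theta + c_k) for every category, all divided by a common
    factor (exp of the largest exponent) to avoid overflow; ratios are unchanged."""
    z = [ak * theta + ck for ak, ck in zip(a, c)]
    zmax = max(z)
    return [math.exp(zk - zmax) for zk in z]


def nominal_probs(theta: float, a: Sequence[float], c: Sequence[float]) -> list[float]:
    """Bock's nominal model: P(k | theta) = exp(a_k theta + c_k) / sum_h exp(a_h theta + c_h)."""
    if len(a) != len(c):
        raise ValueError("a and c need one entry per response option")
    terms = _scaled_exp_terms(theta, a, c)
    total = sum(terms)
    return [t / total for t in terms]


def multiple_choice_probs(
    theta: float, a: Sequence[float], c: Sequence[float], d: Sequence[float]
) -> list[float]:
    """Thissen-Steinberg-style multiple-choice model (sketch).

    a[0], c[0] belong to the latent "don't know" category; a[1:], c[1:] belong to
    the observed options. d[k-1] is the share of the "don't know" mass given to
    observed option k, and the d values must sum to 1.
    """
    if len(a) != len(c) or len(d) != len(a) - 1:
        raise ValueError("need len(a) == len(c) == len(d) + 1")
    if not math.isclose(sum(d), 1.0):
        raise ValueError("d must sum to 1")
    terms = _scaled_exp_terms(theta, a, c)
    total = sum(terms)  # the denominator includes the latent category
    dont_know = terms[0]
    # Only the observed options are returned; their probabilities sum to 1.
    return [(terms[k] + d[k - 1] * dont_know) / total for k in range(1, len(terms))]
```

Usage with hypothetical four-option parameters: with a = [1.2, -0.4, -0.3, -0.5] and c = [0.8, 0.1, -0.2, -0.7], calling nominal_probs(2.0, a, c) makes the first option (largest slope) the most likely. At θ = -2.0, the second option is the most likely instead. If c_0 is made very negative, the multiple-choice model collapses to the nominal model on the observed options. This is one way to see it as the nominal model plus extra parameters.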
Drasgow, F., Levine, M. V., Tsien, S., Williams, B., & Mead, A. D. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19, 143-165. doi:10.1177/014662169501900203
Drasgow, Fritz; Levine, Michael V.; Tsien, Sherman; Williams, Bruce; Mead, Alan D.
Fitting polytomous item response theory models to multiple-choice tests.
Retrieved from the University of Minnesota Digital Conservancy,