This study compared several item response theory (IRT) calibration procedures
to determine which procedure, if any, consistently
produced the most accurate item parameter estimates.
A new criterion of calibration efficiency was
used for evaluating the calibration procedures; this criterion
considers the joint effects of individual item parameter
errors as they relate to the accuracy of θ estimation.
Four methods of item calibration were
evaluated: (1) heuristic estimates obtained from transformations
of traditional item statistics; (2) ANCILLES,
a program that first fits the c parameter and then transforms
traditional item statistics to IRT a and b parameters;
(3) LOGIST, a joint maximum likelihood procedure;
and (4) ASCAL, a modification of LOGIST’s
algorithm which applies Bayesian priors to the abilities
and item parameters. These were compared with each
other and with a constant item parameter baseline condition.
ASCAL and LOGIST produced estimates of essentially
equivalent accuracy, although ASCAL’s estimates
of the c parameters were slightly superior. The heuristic
estimates and those from ANCILLES were generally
poor in comparison, particularly for smaller sample
sizes.
Index terms: Calibration efficiency, Item calibration, Item parameter estimation, Item response theory, Latent trait models.
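
To make the first calibration method concrete, the following is a minimal sketch of how traditional item statistics can be transformed into IRT a and b parameters. It assumes the classical normal-ogive approximations (a normally distributed ability, no guessing), which may differ from the exact heuristic used in the study. The function names heuristic_a_b and p_3pl, the arguments p_correct and r_biserial, and the scaling constant D = 1.7 are illustrative assumptions, not taken from the study.

```python
import math
from statistics import NormalDist


def heuristic_a_b(p_correct, r_biserial):
    """Approximate IRT a and b from classical item statistics.

    Assumes the normal-ogive relations with no guessing:
        a = r / sqrt(1 - r^2)
        b = -z(p) / r
    where p is the proportion correct, r is the item-ability biserial
    correlation, and z(p) is the standard normal quantile of p.
    This is an illustrative approximation, not the study's procedure.
    """
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    if not 0.0 < abs(r_biserial) < 1.0:
        raise ValueError("r_biserial must be nonzero with |r| < 1")
    a = r_biserial / math.sqrt(1.0 - r_biserial ** 2)
    b = -NormalDist().inv_cdf(p_correct) / r_biserial
    return a, b


def p_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response.

    c is the lower asymptote (pseudo-guessing); D = 1.7 is the usual
    constant that brings the logistic close to the normal ogive.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Example: an item answered correctly by half of the examinees.
a, b = heuristic_a_b(p_correct=0.5, r_biserial=0.6)
prob_at_b = p_3pl(theta=b, a=a, b=b, c=0.2)
```

Under these assumptions, an item answered correctly by half of the examinees receives b = 0, and the probability of a correct response at θ = b is midway between c and 1. Approximations of this kind ignore guessing and sampling error in the classical statistics, which is consistent with the weaker performance the study reports for the heuristic estimates at smaller sample sizes.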