Six methods of equating Test of English as a
Foreign Language (TOEFL) test scores were evaluated
in terms of scale stability. True score item response
theory (IRT) equating based on "Fixed b’s" scaling,
the current TOEFL operational scaling and equating
procedure, was found to produce the least discrepant
results when compared with two other IRT models (b parameter
estimated with a and c parameters fixed; all three parameters
reestimated) and with three conventional equating
methods (Tucker, Levine, and equipercentile). The
results for Fixed b’s scaling were limited by a poorly
fitting item. However, if such items can be identified
before calibration, or if pretest data are observed to
yield reliable estimates of total group data, then
true score IRT equating based on scaling by fixing the
b parameters of a set of pretested items may be a very
Hicks, Marilyn M. (1983). True score equating by Fixed b's scaling: A flexible and stable equating alternative. Applied Psychological Measurement, 7, 255-266. doi:10.1177/014662168300700302
Retrieved from the University of Minnesota Digital Conservancy,