When determining how many items to include on
a criterion-referenced test, practitioners must resolve
various nonstatistical issues before a particular
solution can be applied. A fundamental
problem is deciding which of three true scores
should be used. The first is based on the probability
that an examinee correctly answers a "typical"
test item. The second is the probability of having
acquired a typical skill among a domain of skills,
and the third is based on latent trait models. Once
a particular true score has been chosen, test length
can be determined from several perspectives.
This paper reviews and critiques
these solutions. Some new results are described that
apply when latent structure models are used to estimate
an examinee’s true score.