Livingston, Samuel A.
2011-02-15
1980
Livingston, Samuel A. (1980). Comments on criterion-referenced testing. Applied Psychological Measurement, 4, 575-581. doi:10.1177/014662168000400409
https://hdl.handle.net/11299/100276

The six papers in this issue summarize 10 years of theory development, empirical research, and practical experience in criterion-referenced testing. Much of the theory development has focused on questions and issues raised by Popham and Husek (1969), who pointed out that much of traditional psychometric theory did not work well when applied to criterion-referenced tests. The six papers, taken together, represent an attempt to answer four basic questions:
1. How should the reliability of a criterion-referenced test be measured?
2. How should it be decided how many items are needed in a criterion-referenced test?
3. How should criterion-referenced tests be used to make decisions about the people taking the tests?
4. What kind of evidence should be provided for the validity of a criterion-referenced test?
Attempts to answer these questions have been complicated by the lack of a universally accepted, unambiguous definition of the term "criterion-referenced test." Glaser’s (1963) article, in which the term first appeared, defined criterion-referenced measures as those that "depend on an absolute standard of quality" (p. 519). However, Glaser went on to say that "the standard against which a student’s performance is compared when measured in this manner is the behavior which defines each point along the achievement continuum" (p. 519) and that "we need to behaviorally specify minimum levels of performance..." (p. 520). These two ideas, absolute standards and behavioral test content specifications, received varying degrees of emphasis from the different individuals who attempted to develop criterion-referenced tests and to theorize about criterion-referenced testing.
As a result, there are now several different answers to some of the questions that Popham and Husek (1969) raised.

en
Comments on criterion-referenced testing
Article