Browsing by Author "Hambleton, Ronald K."
Now showing 1 - 8 of 8
Item: Advances in item response theory and applications: An introduction (1982)
Authors: Hambleton, Ronald K.; Van der Linden, Wim J.
Abstract: Test theories can be divided roughly into two categories. The first is classical test theory, which dates back to Spearman’s conception of the observed test score as a composite of true and error components, and which was introduced to psychologists at the beginning of this century. Important milestones in its long and venerable tradition are Gulliksen’s Theory of Mental Tests (1950) and Lord and Novick’s Statistical Theories of Mental Test Scores (1968). The second is item response theory, or latent trait theory, as it has been called until recently. At the present time, item response theory (IRT) is having a major impact on the field of testing. Models derived from IRT are being used to develop tests, to equate scores from nonparallel tests, to investigate item bias, and to report scores, as well as to address many other pressing measurement problems (see, e.g., Hambleton, 1983; Lord, 1980). IRT differs from classical test theory in that it assumes a different relation of the test score to the variable measured by the test. Although there are parallels between models from IRT and psychophysical models formulated around the turn of the century, only in the last 10 years has IRT had any impact on psychometricians and test users. Work by Rasch (1960/1980), Fischer (1974), Birnbaum (1968), Wright and Panchapakesan (1969), Bock (1972), and Lord (1974) has been especially influential in this turnabout; and Lazarsfeld’s pioneering work on latent structure analysis in sociology (Lazarsfeld, 1950; Lazarsfeld & Henry, 1968) has also provided impetus. One objective of this introduction is to review the conceptual differences between classical test theory and IRT. A second objective is to introduce the goals of this special issue on item response theory and the seven papers. Some basic problems with classical test theory are reviewed in the next section. Then, IRT approaches to educational and psychological measurement are presented and compared to classical test theory. The final two sections present the goals for this special issue and an outline of the seven invited papers.

Item: Application of item response models to criterion-referenced assessment (1983)
Authors: Hambleton, Ronald K.
Abstract: Of interest in this study was the use of item response models for obtaining accurate examinee domain score estimates and for increasing the probabilities with which examinees are assigned correctly to mastery states with criterion-referenced test scores. Specifically, the purpose of this investigation was to compare the one-, two-, and three-parameter logistic test models for estimating domain scores and making mastery/nonmastery decisions. Computer simulation methods were used to recover a set of true domain scores with each of the logistic test models under a variety of testing conditions. The percentage of times the use of each model led to decisions consistent with those made with the true domain scores was also studied. The one-parameter and three-parameter models produced highly comparable results for middle- and high-ability examinees, while for low-ability examinees, the more general model always performed somewhat better.
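As background for the logistic test models compared in the preceding abstract, the standard three-parameter logistic (3PL) item response function is sketched below. The notation is conventional IRT usage, assumed here for illustration rather than drawn from the abstract itself.

```latex
% Three-parameter logistic (3PL) item response function: the probability
% that an examinee with ability theta answers item i correctly, where
% a_i = discrimination, b_i = difficulty, c_i = pseudo-guessing
% (lower asymptote), and D (approximately 1.7) is a scaling constant.
\[
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-D a_i (\theta - b_i)}}
\]
% Setting c_i = 0 yields the two-parameter model; additionally fixing
% a_i = 1 yields the one-parameter (Rasch-type) model.
```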
Item: Assessing the dimensionality of a set of test items (1986)
Authors: Hambleton, Ronald K.; Rovinelli, Richard J.
Abstract: This study compared four methods of determining the dimensionality of a set of test items: linear factor analysis, nonlinear factor analysis, residual analysis, and a method developed by Bejar (1980). Five artificial test datasets (for 40 items and 1,500 examinees) were generated to be consistent with the three-parameter logistic model and the assumption of either a one- or a two-dimensional latent space. Two variables were manipulated: (1) the correlation between the traits (r = .10 or r = .60) and (2) the percent of test items measuring each trait (50% measuring each trait, or 75% measuring the first trait and 25% measuring the second trait). While linear factor analysis in all instances overestimated the number of underlying dimensions in the data, nonlinear factor analysis with linear and quadratic terms led to correct determination of the item dimensionality in the three datasets where it was used. Both the residual analysis method and Bejar’s method proved disappointing. These results suggest the need for extreme caution in using linear factor analysis, residual analysis, and Bejar’s method until more investigations of these methods can confirm their adequacy. Nonlinear factor analysis appears to be the most promising of the four methods, but more experience in applying the method seems necessary before wide-scale use can be recommended.

Item: The changing conception of measurement: A commentary (1986)
Authors: Hambleton, Ronald K.
Abstract: This paper comments on the contributions to this special issue on item banking. An historical framework for viewing the papers is provided by brief reviews of the literature in the areas of item response theory, item banking, and computerized testing. In general, the eight papers are viewed as contributing valuable technical knowledge for implementing testing programs with the aid of item banks.

Item: Contributions to criterion-referenced testing technology: An introduction (1980)
Authors: Hambleton, Ronald K.
Abstract: Glaser (1963) and Popham and Husek (1969) were the first researchers to draw attention to the need for criterion-referenced tests, which were to be tests specifically designed to provide score information in relation to sets of well-defined objectives or competencies. They felt that test score information referenced to clearly specified domains of content was needed by (1) teachers for successfully monitoring student progress and diagnosing student instructional needs in objectives-based programs and by (2) evaluators for determining program effectiveness. Norm-referenced tests were not deemed appropriate for providing the necessary test score information. Many definitions of criterion-referenced tests have been offered in the last 10 years (Gray, 1978; Nitko, 1980). In fact, Gray (1978) reported the existence of 57 different definitions. Popham’s definition, reported by Hambleton (1981) in a slightly modified form, is probably the most widely used: a criterion-referenced test is constructed to assess the performance levels of examinees in relation to a set of well-defined objectives (or competencies).

Item: Influence of the criterion variable on the identification of differentially functioning test items using the Mantel-Haenszel statistic (1991)
Authors: Clauser, Brian E.; Mazor, Kathleen; Hambleton, Ronald K.
Abstract: This study investigated the effectiveness of the Mantel-Haenszel (MH) statistic in detecting differentially functioning (DIF) test items when the internal criterion was varied. Using a dataset from a statewide administration of a life skills examination, a sample of 1,000 Anglo-American and 1,000 Native American examinee item response sets was analyzed. The MH procedure was first applied to all the items involved. The items were then categorized as belonging to one or more of four subtests based on the skills or knowledge needed to select the correct response. Each subtest was then analyzed as a separate test, using the MH procedure. Three control subtests were also established using random assignment of test items and were analyzed using the MH procedure. The results revealed that the choice of criterion, total test score versus subtest score, had a substantial influence on the classification of items as differentially functioning or not in the Anglo-American and Native American groups. Evidence for the convergence of judgmental and statistical procedures was found in the unusually high proportion of DIF items within one of the classifications and in the results of the reanalysis of this group of items. Index terms: differential item functioning, item bias, Mantel-Haenszel statistic, test bias.
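The sketch below shows how the Mantel-Haenszel DIF statistic described in the preceding abstract is commonly computed from 2 x 2 tables formed at each matching-score level. The function name, table layout, and plain-Python implementation are illustrative assumptions, not code from the study.

```python
# Minimal sketch of the Mantel-Haenszel DIF procedure. For each
# matching-score level k we assume a 2 x 2 table of counts:
#   (a, b) = reference group: correct, incorrect
#   (c, d) = focal group:     correct, incorrect
# Strata defined by total test score versus subtest score correspond
# to the "choice of criterion" manipulated in the study.

def mantel_haenszel(tables):
    """tables: list of (a, b, c, d) counts, one per score level.
    Returns (common odds ratio, MH chi-square with continuity correction)."""
    num = den = 0.0                 # accumulators for the common odds ratio
    sum_a = sum_ea = sum_va = 0.0   # accumulators for the chi-square
    for a, b, c, d in tables:
        t = a + b + c + d
        if t < 2:                   # degenerate stratum contributes nothing
            continue
        num += a * d / t
        den += b * c / t
        n_ref, n_foc = a + b, c + d        # group sizes
        m_right, m_wrong = a + c, b + d    # item right/wrong totals
        sum_a += a
        sum_ea += n_ref * m_right / t                        # E[a] under H0
        sum_va += n_ref * n_foc * m_right * m_wrong \
                  / (t * t * (t - 1))                        # Var[a] under H0
    alpha_mh = num / den if den > 0 else float("nan")
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_va if sum_va > 0 else 0.0
    return alpha_mh, chi2

# Rerunning mantel_haenszel with strata built from subtest scores rather
# than total scores mirrors the criterion manipulation in the study.
```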
Item: On problems encountered using decision theory to set cutoff scores (1984)
Authors: De Gruijter, Dato N. M.; Hambleton, Ronald K.
Abstract: In the decision-theoretic approach to determining a cutoff score, the cutoff score chosen is that which maximizes the expected utility of pass/fail decisions. This approach is not without its problems. In this paper several of these problems are considered: inaccurate parameter estimates, choice of test model and consequences, choice of subpopulations, optimal cutoff scores on various occasions, and cutoff scores as targets. It is suggested that these problems will need to be overcome and/or understood more thoroughly before the full potential of the decision-theoretic approach can be realized in practice.

Item: Reply to van der Linden's "Thoughts on the use of decision theory to set cutoff scores" (1984)
Authors: De Gruijter, Dato N. M.; Hambleton, Ronald K.
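To make the decision-theoretic approach discussed in the two preceding items concrete, here is a minimal sketch that picks the cutoff score maximizing the expected utility of pass/fail decisions. All names, inputs, and utility values are hypothetical placeholders, not figures or code from the papers.

```python
# Sketch: choose the cutoff score maximizing expected utility, assuming
#   p_score[x]  = probability of observed score x,
#   p_master[x] = P(true mastery | observed score x),
#   utility[(decision, state)] = payoff of each decision/state pair.

def best_cutoff(p_score, p_master, utility):
    def expected_utility(c):
        eu = 0.0
        for x, px in p_score.items():
            pm = p_master[x]
            decision = "pass" if x >= c else "fail"
            eu += px * (pm * utility[(decision, "master")]
                        + (1 - pm) * utility[(decision, "nonmaster")])
        return eu
    # Try every observed score as a candidate cutoff; keep the best.
    return max(sorted(p_score), key=expected_utility)

# Example with invented numbers:
# p_score  = {8: .2, 9: .5, 10: .3}
# p_master = {8: .3, 9: .6, 10: .9}
# utility  = {("pass", "master"): 1.0, ("pass", "nonmaster"): -1.0,
#             ("fail", "master"): -0.5, ("fail", "nonmaster"): 0.5}
# best_cutoff(p_score, p_master, utility)
```

Note that this sketch takes the inputs as given; the problems raised in the 1984 paper (inaccurate parameter estimates, choice of test model, choice of subpopulations) all concern how trustworthy such inputs are in practice.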