Browsing by Author "Van der Linden, Wim J."
Now showing 1 - 13 of 13
Item: Advances in item response theory and applications: An introduction (1982). Hambleton, Ronald K.; Van der Linden, Wim J.

Test theories can be divided roughly into two categories. The first is classical test theory, which dates back to Spearman's conception of the observed test score as a composite of true and error components, and which was introduced to psychologists at the beginning of this century. Important milestones in its long and venerable tradition are Gulliksen's Theory of Mental Tests (1950) and Lord and Novick's Statistical Theories of Mental Test Scores (1968). The second is item response theory, or latent trait theory, as it was called until recently. At the present time, item response theory (IRT) is having a major impact on the field of testing. Models derived from IRT are being used to develop tests, to equate scores from nonparallel tests, to investigate item bias, and to report scores, as well as to address many other pressing measurement problems (see, e.g., Hambleton, 1983; Lord, 1980). IRT differs from classical test theory in that it assumes a different relation of the test score to the variable measured by the test. Although there are parallels between models from IRT and psychophysical models formulated around the turn of the century, only in the last 10 years has IRT had any impact on psychometricians and test users. Work by Rasch (1960/1980), Fischer (1974), Birnbaum (1968), Wright and Panchapakesan (1969), Bock (1972), and Lord (1974) has been especially influential in this turnabout; and Lazarsfeld's pioneering work on latent structure analysis in sociology (Lazarsfeld, 1950; Lazarsfeld & Henry, 1968) has also provided impetus. One objective of this introduction is to review the conceptual differences between classical test theory and IRT. A second objective is to introduce the goals of this special issue on item response theory and the seven papers.
Some basic problems with classical test theory are reviewed in the next section. Then, IRT approaches to educational and psychological measurement are presented and compared with classical test theory. The final two sections present the goals for this special issue and an outline of the seven invited papers.

Item: Assembling tests for the measurement of multiple traits (1996). Van der Linden, Wim J.

For the measurement of multiple traits, this paper proposes assembling tests based on targets for the (asymptotic) variance functions of the estimators of each of the traits. A linear programming model is presented that can be used to computerize the assembly process. Several cases of test assembly dealing with multidimensional traits are distinguished, and versions of the model applicable to each of these cases are discussed. An empirical example of a test assembly problem from a two-dimensional mathematics item pool is provided. Index terms: asymptotic variance functions, linear programming, multidimensional IRT, test assembly, test design.

Item: Binomial test models and item difficulty (1979). Van der Linden, Wim J.

In choosing a binomial test model, it is important to know exactly what conditions are imposed on item difficulty. In this paper these conditions are examined for both a deterministic and a stochastic conception of item responses. It appears that they are more restrictive than is generally understood and differ between the two conceptions. When the binomial model is applied to a fixed examinee, the deterministic conception imposes no conditions on item difficulty but requires instead that all items have characteristic functions of the Guttman type. In contrast, the stochastic conception allows non-Guttman items but requires that all characteristic functions intersect at the same point, which implies equal classically defined difficulty. The beta-binomial model assumes identical characteristic functions for both conceptions, and this also implies equal difficulty.
Finally, the compound binomial model entails no restrictions on item difficulty.

Item: The changing conception of measurement in education and psychology (1986). Van der Linden, Wim J.

Since the era of Binet and Spearman, classical test theory and the ideal of the standard test have gone hand in hand, in part because both are based on the same paradigm of experimental control by manipulation and randomization. Their longevity is a consequence of this mutually beneficial symbiosis. A new type of theory and practice in testing is replacing the standard test by the test item bank, and classical test theory by item response theory. In this paper it is shown how these also reinforce and complete each other.

Item: Coefficients for tests from a decision theoretic point of view (1978). Van der Linden, Wim J.; Mellenbergh, Gideon J.

From a decision theoretic point of view a general coefficient for tests, d, is derived. The coefficient is applied to three kinds of decision situations. First, the situation is considered in which a true score is estimated by a function of the observed score of a subject on a test (point estimation). Using the squared error loss function and Kelley's formula for estimating the true score, it is shown that d equals the reliability coefficient from classical test theory. Second, the situation is considered in which the observed scores are split into more than two categories and different decisions are made for the categories (multiple decision). The general form of the coefficient is derived, and two loss functions suited to multiple decision situations are described. It is shown that, for the loss function specifying constant losses for the various combinations of categories on the true and on the observed scores, the coefficient can be computed under the assumptions of the beta-binomial model.
Third, the situation is considered in which the observed scores are split into only two categories and different decisions are made for each category (dichotomous decisions). Using a loss function that specifies constant losses for combinations of categories on the true and observed score, and the assumption of an increasing regression function of t on x, it is shown that coefficient d equals Loevinger's coefficient H between true and observed scores. The coefficient can be computed under the assumption of the beta-binomial model. Finally, it is shown that for a linear loss function and Kelley's formula for the regression of the true score on the observed score, the coefficient equals the reliability coefficient of classical test theory.

Item: Decision models for use with criterion-referenced tests (1980). Van der Linden, Wim J.

The problem of mastery decisions and optimizing cutoff scores on criterion-referenced tests is considered. This problem can be formalized as an (empirical) Bayes problem with decision rules of a monotone shape. Next, the derivation of optimal cutoff scores for threshold, linear, and normal ogive loss functions is addressed, alternately using such psychometric models as the classical model, the beta-binomial model, and the bivariate normal model. One important distinction made is between decisions with an internal and an external criterion. A natural solution to the problem of reliability and validity analysis of mastery decisions is an analysis based on a standardization of the Bayes risk (coefficient delta). It is indicated how this analysis proceeds and how, in a number of cases, it leads to coefficients already known from classical test theory. Finally, some new lines of research are suggested, along with other aspects of criterion-referenced testing that can be approached from a decision-theoretic point of view.

Item: An empirical Bayesian approach to item banking (1986). Van der Linden, Wim J.; Eggen, Theo J. H. M.

A procedure for the sequential optimization of the calibration of an item bank is given. The procedure is based on an empirical Bayesian approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items in which ties are allowed to occur. First, it is shown how a paired-comparisons design deals with the usual incompleteness of calibration data and how the item parameters can be estimated using this design. Next, the procedure for a sequential optimization of the item parameter estimators is given, both for individuals responding to pairs of items and for item and examinee groups of any size. The paper concludes with a discussion of the choice of the first priors in the procedure and the problems involved in its generalization to other item response models.

Item: Estimating the parameters of Emrick's mastery testing model (1981). Van der Linden, Wim J.

Emrick's model is a latent class or state model for mastery testing that entails a simple rule for separating masters from nonmasters with respect to a homogeneous domain of items. His method for estimating the model parameters has only restricted applicability inasmuch as it assumes a mixing parameter equal to .50 and an a priori known ratio of the two latent success probabilities. The maximum likelihood method is also available but yields an intractable system of estimation equations that can only be solved iteratively. The emphasis in this paper is on estimates that can be computed by hand but are nonetheless accurate enough for most practical situations. It is shown how the method of moments can be used to obtain such "quick and easy" estimates. In addition, an endpoint method is discussed that assumes that the parameters can be estimated from the tails of the sample distribution.
A Monte Carlo experiment demonstrated that, for a great variety of parameter values, test lengths, and sample sizes, the method of moments yields excellent results and is uniformly much better than the endpoint method.

Item: The internal and external optimality of decisions based on tests (1979). Mellenbergh, Gideon J.; Van der Linden, Wim J.

In applied measurement, test scores are usually transformed to decisions. Analogous to classical test theory, the reliability of decisions has been defined as the consistency of decisions on a test and a retest or on two parallel tests. Coefficient kappa (Cohen, 1960) is used for assessing the consistency of decisions. This coefficient was developed for assessing agreement between nominal scales. It is argued that the coefficient is not suited for assessing consistency of decisions. Moreover, it is argued that the concept of decision consistency is not appropriate for assessing the quality of a decision procedure. It is proposed that the concept of decision consistency be replaced by the concept of optimality of the decision procedure. Two types of optimality are distinguished. The internal optimality is the risk of the decision procedure with respect to the true score the test is measuring. The external optimality is the risk of the decision procedure with respect to an external criterion. For assessing the optimality of a decision procedure, coefficient delta (van der Linden & Mellenbergh, 1978), which can be considered a standardization of the Bayes risk or expected loss, can be used. Two loss functions are dealt with: the threshold and the linear loss functions. Assuming psychometric theory, coefficient delta for internal optimality can be computed from empirical data for both the threshold and the linear loss functions. The computation of coefficient delta for external optimality needs no assumption of psychometric theory.
For six tests, coefficient delta as an index of internal optimality is computed for both loss functions; the results are compared with coefficient kappa for assessing the consistency of decisions with the same tests.

Item: IRT-based internal measures of differential functioning of items and tests (1995). Raju, Nambury S.; Van der Linden, Wim J.; Fleer, Paul F.

Internal measures of differential functioning of items and tests (DFIT) based on item response theory (IRT) are proposed. Within the DFIT context, the new differential test functioning (DTF) index leads to two new measures of differential item functioning (DIF) with the following properties: (1) the compensatory DIF (CDIF) indexes for all items in a test sum to the DTF index for that test and, unlike current DIF procedures, the CDIF index for an item does not assume that the other items in the test are unbiased; (2) the noncompensatory DIF (NCDIF) index, which assumes that the other items in the test are unbiased, is comparable to some of the IRT-based DIF indexes; and (3) CDIF and NCDIF, as well as DTF, are equally valid for polytomous and multidimensional IRT models. Monte Carlo study results, comparing these indexes with Lord's χ² test, the signed area measure, and the unsigned area measure, demonstrate that the DFIT framework is accurate in assessing DTF, CDIF, and NCDIF. Index terms: area measures of DIF, compensatory DIF, differential functioning of items and tests (DFIT), differential item functioning, differential test functioning, Lord's χ², noncompensatory DIF, nonuniform DIF, uniform DIF.

Item: Optimal cutting scores using a linear loss function (1977). Van der Linden, Wim J.; Mellenbergh, Gideon J.

The situation is considered in which a total score on a test is used for classifying examinees into two categories: "accepted" (with scores above a cutting score on the test) and "not accepted" (with scores below the cutting score).
A value on the latent variable is fixed in advance; examinees above this value are "suitable" and those below are "not suitable." Using a linear loss function, a procedure is described for computing a cutting score that minimizes the risk of the decision rule. The procedure is demonstrated with a criterion-referenced achievement test of elementary statistics administered to 167 students.

Item: Some thoughts on the use of decision theory to set cutoff scores: Comment on de Gruijter and Hambleton (1984). Van der Linden, Wim J.

In response to an article by de Gruijter and Hambleton (1984), some thoughts on the use of decision theory for setting cutoff scores on mastery tests are presented. This paper argues that decision theory offers much more than suggested by de Gruijter and Hambleton and that an attempt at evaluating its potential for mastery testing should address the full range of possibilities. As for the problems de Gruijter and Hambleton have raised, some of them disappear if proper choices from decision theory are made, while others are inherent in mastery testing and will be encountered by any method of setting cutoff scores. Further, this paper points to the development of new technology to assist the mastery tester in applying decision theory. The paper concludes with an optimistic view of the potential of decision theory for mastery testing.

Item: A zero-one programming approach to Gulliksen's matched random subtests method (1988). Van der Linden, Wim J.; Boekkooi-Timminga, Ellen

Gulliksen's matched random subtests method is a graphical method for splitting a test into parallel test halves. The method has practical relevance because it maximizes coefficient α as a lower bound to the classical test reliability coefficient. In this paper the same problem is formulated as a zero-one programming problem, the advantage being that it can be solved by computer algorithms that already exist.
It is shown how the procedure can be generalized to split tests of any length. The paper concludes with an empirical example comparing Gulliksen's original hand method with the zero-one programming version. Index terms: classical test theory, Gulliksen's matched random subtests method, item matching, linear programming, parallel tests, test reliability, zero-one programming.
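To make the zero-one programming idea in the last abstract concrete, the following sketch splits a small test into two halves matched on classical item statistics. It is a minimal illustration, not the paper's model: the item difficulties and item-test correlations are hypothetical, and an exhaustive search over 0-1 assignment vectors stands in for a true integer-programming solver.

```python
from itertools import combinations

# Hypothetical item statistics: (classical difficulty p, item-test correlation r).
items = [(0.40, 0.55), (0.45, 0.50), (0.50, 0.60), (0.55, 0.52),
         (0.60, 0.58), (0.65, 0.48), (0.70, 0.62), (0.75, 0.45)]

def imbalance(half):
    """Sum of absolute differences in mean difficulty and mean
    item-test correlation between the two test halves."""
    other = [i for i in range(len(items)) if i not in half]
    p1 = sum(items[i][0] for i in half) / len(half)
    p2 = sum(items[i][0] for i in other) / len(other)
    r1 = sum(items[i][1] for i in half) / len(half)
    r2 = sum(items[i][1] for i in other) / len(other)
    return abs(p1 - p2) + abs(r1 - r2)

# Each candidate half corresponds to a 0-1 decision vector x with sum(x) = n/2;
# here exhaustive search over the candidate halves replaces the 0-1 solver.
best = min(combinations(range(len(items)), len(items) // 2), key=imbalance)
print(sorted(best), imbalance(best))
```

For realistic pool sizes the exhaustive search is infeasible, which is exactly why the paper's reformulation as a zero-one program, solvable by standard branch-and-bound codes, matters.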