Applied Psychological Measurement, Volume 14, 1990

Persistent link for this collection

https://hdl.handle.net/11299/103305

Implications of three causal models for the measurement of halo error
(1990) Fisicaro, Sebastiano A.; Lance, Charles E.
The appropriateness of a traditional correlational measure of halo error (the difference between dimensional rating intercorrelations and dimensional true score intercorrelations) is reexamined in the context of three causal models of halo error. Mathematical derivations indicate that the traditional correlational measure typically will underestimate halo error in ratings and can suggest no halo error or even "negative" halo error when positive halo error actually occurs. A corrected correlational measure is derived that avoids these problems, and the traditional and corrected measures are compared empirically. Results suggest that use of the traditional correlational measure of halo error be discontinued. Index terms: halo, halo effect, halo error, performance ratings, rating accuracy, rating errors.
Longitudinal factor score estimation using the Kalman filter
(1990) Oud, Johan H.; Van den Bercken, John H.; Essers, Raymond J.
The advantages of the Kalman filter as a factor score estimator in the presence of longitudinal data are described. Because the Kalman filter presupposes the availability of a dynamic state space model, the state space model is reviewed first, and it is shown to be translatable into the LISREL model. Several extensions of the LISREL model specification are discussed in order to enhance the applicability of the Kalman filter for behavioral research data. The Kalman filter and its main properties are summarized. Relationships are shown between the Kalman filter and two well-known cross-sectional factor score estimators: the regression estimator, and the Bartlett estimator. The indeterminacy problem of factor scores is also discussed in the context of Kalman filtering, and the differences are described between Kalman filtering on the basis of a zero-means and a structured-means LISREL model. By using a structured-means LISREL model, the Kalman filter is capable of estimating absolute latent developmental curves. An educational research example is presented. Index terms: factor score estimation, indeterminacy of factor scores, Kalman filter, LISREL, longitudinal LISREL modeling, longitudinal factor analysis, state space modeling.
A method for the age standardization of test scores
(1990) Schagen, I. P.
A procedure is presented to generate standardized scores from raw test data that are, as far as possible, age independent and normally distributed. The model is fitted to the percentile points of the raw score distribution, and assumes a linear trend of each percentile with age. The fitted slopes can be constant or can vary quadratically with the percentiles. A nonlinear transformation of the data is also possible to allow for "ceiling effects." These models are described and the methods used to fit them to test data are discussed; examples are presented of their use in standardizing tests, and the use of the diagnostic plots produced by the program is discussed. Index terms: age standardization, linear regression, nonlinear regression, nonparallel regression, parallel linear regression, percentiles, score transformation.
Using Bayesian decision theory to design a computerized mastery test
(1990) Lewis, Charles; Sheehan, Kathleen
A theoretical framework for mastery testing based on item response theory and Bayesian decision theory is described. The idea of sequential testing is developed, with the goal of providing shorter tests for individuals who have clearly mastered (or clearly not mastered) a given subject and longer tests for those individuals for whom the mastery decision is not as clear-cut. In a simulated application of the approach to a professional certification examination, it is shown that average test lengths can be reduced by half without sacrificing classification accuracy. Index terms: Bayesian decision theory, computerized mastery testing, item response theory, sequential testing, variable-length tests.
The effect of item selection procedure and stepsize on computerized adaptive attitude measurement using the rating scale model
(1990) Dodd, Barbara G.
Real and simulated datasets were used to investigate the effects of the systematic variation of two major variables on the operating characteristics of computerized adaptive testing (CAT) applied to instruments consisting of polychotomously scored rating scale items. The two variables studied were the item selection procedure and the stepsize method used until maximum likelihood trait estimates could be calculated. The findings suggested that (1) item pools that consist of as few as 25 items may be adequate for CAT; (2) the variable stepsize method of preliminary trait estimation produced fewer cases of nonconvergence than the use of a fixed stepsize procedure; and (3) the scale value item selection procedure used in conjunction with a minimum standard error stopping rule outperformed the information item selection technique used in conjunction with a minimum information stopping rule in terms of the frequencies of nonconvergent cases, the number of items administered, and the correlations of CAT θ estimates with full scale estimates and known θ values. The implications of these findings for implementing CAT with rating scale items are discussed. Index terms: adaptive testing, attitude measurement, computerized adaptive testing, item response theory, rating scale model.
A cluster-based method for test construction
(1990) Boekkooi-Timminga, Ellen
Several methods for optimal test construction from item banks have recently been proposed using information functions. The main problem with these methods is the large amount of time required to identify an optimal test. In this paper, a new method is presented for the Rasch model that considers groups of interchangeable items, instead of individual items. The process of item clustering is described, the cluster-based test construction model is outlined, and the computational procedure and results are given. Results indicate that this method produces accurate results in small amounts of time. Index terms: information functions, item banking, item response theory, linear programming, test construction.
Estimation problems in the block-diagonal model of the multitrait-multimethod matrix
(1990) Brannick, Michael T.; Spector, Paul E.
The most popular method used to analyze the multitrait-multimethod (MTMM) matrix has been confirmatory factor analysis (CFA). The block-diagonal model, in which trait effects, trait correlations, method effects, and method correlations are simultaneously estimated, is examined in detail. Analysis of published data from 18 correlation matrices showed estimation problems in all but one case. Simulations were used to show how identification and specification difficulties may account for these problems. Even trivial misspecification of a single parameter can prevent program convergence. These problems render the CFA block-diagonal approach to analyzing MTMM data less useful than has generally been thought. Index terms: construct validity, covariance structure modeling, factor analysis, multitrait-multimethod matrix, parameter estimation in confirmatory factor analysis.
Effect of scale adjustment on the comparison of item and ability parameters
(1990) Liou, Michelle
The standardized mean-squared difference (SMSD) has been used for summarizing the bias of parameter estimates in the three-parameter logistic (3PL) model. Due to the indeterminacy problem of the 3PL model, researchers must select a common scale for comparing the theoretical and estimated parameters. The use of different scales can yield noncomparable SMSD values, which in turn can affect the comparison of bias between different parameters. This research used three methods for selecting the common scale. Through a simulation, the three scaling methods were used to numerically demonstrate their effect on SMSD values. Index terms: equating, indeterminacy problem, Samejima scale, standardized mean-squared difference, Stocking and Lord scale, three-parameter logistic model.
Sensitivity of marginal maximum likelihood estimation of item and ability parameters to the characteristics of the prior ability distributions
(1990) Seong, Tae-je
The sensitivity of marginal maximum likelihood estimation of item and ability (θ) parameters was examined when the prior θ distributions are not matched to the underlying θ distributions. Thirty sets of 45-item test data were generated by specification of three types of underlying θ distributions. They were then analyzed with PC-BILOG. Appropriate specification of the prior θ distribution increased the accuracy of estimation for item and θ parameters when the sample size was large. With a small dataset, the appropriate specification of the prior increased the accuracy of θ parameter estimation, but it did not have that effect on item parameter estimation. Only with a large dataset and matched underlying and prior θ distributions did increasing the number of quadrature points improve the accuracy of estimation of the item parameters. However, the accuracy of θ estimation was increased by increasing the number of quadrature points, regardless of sample size and appropriateness of the prior θ distribution. The number of examinees had an important effect on the accuracy of item parameter estimation. Index terms: ability distribution, BILOG, item response theory, marginal maximum likelihood estimation, parameter estimation, quadrature points.
Theoretical and empirical comparison of the Mokken and the Rasch approach to IRT
(1990) Meijer, Rob R.; Sijtsma, Klaas; Smid, Nico G.
The Mokken model of monotone homogeneity, the Mokken model of double monotonicity, and the Rasch model are theoretically and empirically compared. These models are compared with respect to restrictiveness to empirical test data, properties of the scale, and accuracy of measurement. Application of goodness-of-fit procedures to empirical data largely confirmed the expected order of the models according to restrictiveness: Almost all items were in concordance with the model of monotone homogeneity, and fewer items complied with the model of double monotonicity and the Rasch model. The model of monotone homogeneity was found to be a suitable alternative to more restrictive models for basic testing applications; more sophisticated applications, such as equating and adaptive testing, appear to require the use of parametric models. Index terms: goodness-of-fit, item response theory, measurement properties, Mokken model, Rasch model.
Rasch models in latent classes: An integration of two approaches to item analysis
(1990) Rost, Jürgen
A model is proposed that combines the theoretical strength of the Rasch model with the heuristic power of latent class analysis. It assumes that the Rasch model holds for all persons within a latent class, but it allows for different sets of item parameters between the latent classes. An estimation algorithm is outlined that gives conditional maximum likelihood estimates of item parameters for each class. No a priori assumption about the item order in the latent classes or the class sizes is required. Application of the model is illustrated, both for simulated data and for real data. Index terms: conditional likelihood, EM algorithm, latent class analysis, Rasch model.
Individual differences in unfolding preference data: A restricted latent class approach
(1990) Böckenholt, Ulf; Böckenholt, Ingo
A latent class scaling approach is presented for modeling paired comparison and "pick-any/t" data obtained in a preference study. Although the latent class part of the model identifies homogeneous subgroups that are characterized by their choice probabilities for a set of alternatives, the scaling part of the model describes the single peakedness structure of the choice data. Procedures are suggested for examining the unfolding structure in an unrestricted latent class solution. Two applications are presented to illustrate the technique. In the first application, scaling solutions obtained from a latent class scaling model and a marginal maximum likelihood latent trait model are compared. Index terms: latent class analysis, paired comparison data, pick any/t data, unfolding models.
Using the circular equating paradigm for comparison of linear equating models
(1990) Gafni, Naomi; Melamed, Estela
Equating error was estimated using the same test by three linear equating methods in three paradigms: (1) single-link equating of a test to itself, in which a test was administered on two different dates and the later administration was equated to the earlier administration; (2) circular equating through a chain, starting and ending at the same test; and (3) pseudo-circular equating, in which a test was equated to itself as in the first approach through equating chains containing a different number of links as in the second approach. The mean difference between the actual scores and the equated scores, as well as the root mean square of this difference, were used as the criterion measures for equating error. The results suggested a superiority of the Tucker method for the conventional circular equating chain, and the Levine and VCI methods yielded smaller errors in about half the equating chains for the pseudo-circular chain. Unexpectedly, there was not found to be a clear relationship between the number of links in the equating chain and the resulting error. Index terms: circular equating, equating chains, equating error, equating methods, linear equating.
A generative analysis of a three-dimensional spatial task
(1990) Bejar, Isaac I.
The feasibility of incorporating research results from cognitive science into the modeling of performance on psychometric tests and the construction of test items is considered, particularly the feasibility of modeling performance on a three-dimensional rotation task within the context of item response theory (IRT). Three-dimensional items were selected because of the rich literature on the mental models that are used in their solution. An 80-item, three-dimensional rotation test was constructed. An inexpensive computer system was also developed to administer the test and record performance, including response-time data. Data were collected on high school juniors and seniors. As expected, angular disparity was a potent determinant of item difficulty. The applicability of IRT to these data was investigated by dichotomizing response time at increasing elapsed times, and applying standard item parameter estimation procedures. It is concluded that this approach to psychometric modeling, which explicitly incorporates information on the mental models examinees use in solving an item, is workable and important for future developments in psychometrics. Index terms: cognitive psychology, continuous response, item response theory, mental rotation, response latency.
A structural theory of spatial abilities
(1990) Guttman, Ruth; Epstein, Elizabeth E.; Amir, Marianne; Guttman, Louis
A cylindrical-wedge model is proposed to represent the correlational structure of a variety of spatial ability tests. The model corresponds to the design of the tests’ content, according to three facets: (1) type of rule task, (2) dimensionality of the test items, and (3) need to mentally rotate test objects in space. Additional facets are suggested to refine the theoretical and empirical structure. The model emphasizes regionality for representing interrelationships as an alternative to factor analytic models which seek meaningful reference axes. The axis approach has not supplied an unambiguous theory that unifies content classification with the empirical structure of spatial abilities; it is also technically more awkward and less parsimonious than the regional approach. This paper advances theory and data analysis in the field of spatial ability by providing a unified conceptual framework that can be refined and expanded systematically, and that serves as an actual experimental design that can be easily executed by other workers in the field. Existing data are shown to support the regional cylindrical-wedge model. Index terms: facet theory, factor analysis, intelligence, mapping sentence, Smallest Space Analysis, spatial ability.
Determining the significance of estimated signed and unsigned areas between two item response functions
(1990) Raju, Nambury S.
Asymptotic sampling distributions (means and variances) of estimated signed and unsigned areas between two item response functions (IRFs) are presented for the Rasch model, the two-parameter model, and the three-parameter model with fixed lower asymptotes. In item bias or differential item functioning research, it may be of interest to determine whether the estimated signed and unsigned areas between IRFs calibrated with two different groups are significantly different from 0. The usefulness of these sampling distributions in this context is discussed and illustrated. More empirical research with the proposed significance tests is necessary. Index terms: asymptotic mean and variance, differential item functioning, item bias, item response functions, item response theory.
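As a rough illustration of the quantities named in this abstract (not of the paper's sampling distributions or significance tests), the sketch below approximates the signed and unsigned areas between two logistic IRFs by numerical integration. The function names, the integration range, and the example parameter values are assumptions made for illustration, and the logistic form is written without a scaling constant. When the two discriminations are equal, the signed area should equal the difference in difficulties, which gives a simple check.

```python
import math

def irf_2pl(theta, a, b):
    """Two-parameter logistic IRF; a = 1 gives the Rasch form (no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def areas_between_irfs(a1, b1, a2, b2, lo=-40.0, hi=40.0, n=80000):
    """Midpoint-rule estimates of the signed and unsigned areas between two IRFs.

    The range [lo, hi] and the number of slices n are illustrative choices.
    """
    step = (hi - lo) / n
    signed = 0.0
    unsigned = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * step
        diff = irf_2pl(theta, a1, b1) - irf_2pl(theta, a2, b2)
        signed += diff * step
        unsigned += abs(diff) * step
    return signed, unsigned

# Hypothetical item calibrated in two groups, equal discrimination.
signed, unsigned = areas_between_irfs(1.0, -0.5, 1.0, 0.7)
print(round(signed, 4), round(unsigned, 4))
```

With equal discriminations the two curves never cross, so the signed and unsigned areas coincide; with unequal discriminations they cross once and the unsigned area exceeds the absolute signed area.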
Problems in the measurement of latent variables in structural equations causal models
(1990) Cohen, Patricia; Cohen, Jacob; Teresi, Jeanne; Marchi, Margaret L.; Velez, C. Noemi
Some problems in the measurement of latent variables in structural equations causal models are presented, with examples from recent empirical studies. Latent variables that are theoretically the source of correlation among the empirical indicators are differentiated from unmeasured variables that are related to the empirical indicators for other reasons. It is pointed out that these should also be represented by different analytical models, and that much published research has treated this distinction as if it had no analytic consequences. The connection between this theoretical distinction and disattenuation effects in latent variable models is shown, and problems with these estimates are discussed. Finally, recommendations are made for decisions about whether and how to measure latent variables when manifest variables are potentially available. Index terms: causal models, disattenuation, emergent variables, latent variable measurement, latent variables, structural equations modeling.
Test construction by means of linear programming
(1990) De Gruijter, Dato N. M.
The use of linear programming in the selection of test items entails setting a target information value for several ability levels, then constructing a test of minimum length that satisfies the constraints given by the target values. In the present paper the case of the uniform target is reconsidered. The dependency of item selection on item pool characteristics is demonstrated, and the relevance of uniform targets for test construction and the applicability of linear programming for test construction are discussed. Index terms: item response theory, item selection, linear programming, test length.
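The selection problem described in this abstract can be sketched on a toy scale as follows. This is an illustrative stand-in, not the paper's procedure: it uses exhaustive search over item subsets in place of a linear programming solver, assumes Rasch item information, and uses a hypothetical nine-item pool with a uniform target of 1.0 at three ability levels. All names and values are assumptions.

```python
import math
from itertools import combinations

def rasch_information(theta, b):
    """Item information under the Rasch model: P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def shortest_test(difficulties, thetas, target):
    """Return indices of a smallest item subset whose summed information
    reaches `target` at every ability level in `thetas`, or None if no
    subset of the pool does. Exhaustive search, so only for tiny pools."""
    info = [[rasch_information(t, b) for t in thetas] for b in difficulties]
    for length in range(1, len(difficulties) + 1):
        for subset in combinations(range(len(difficulties)), length):
            if all(sum(info[i][k] for i in subset) >= target
                   for k in range(len(thetas))):
                return subset
    return None

# Hypothetical item pool (difficulties) and a uniform information target.
pool = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
selected = shortest_test(pool, thetas=[-1.0, 0.0, 1.0], target=1.0)
print(selected)
```

Because subsets are tried in order of increasing length, the first feasible subset found has minimum length. Changing the pool's difficulty distribution changes which items are picked, which is the kind of pool dependency the abstract refers to.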
Improving IRT item bias detection with iterative linking and ability scale purification
(1990) Park, Dong-gun; Lautenschlager, Gary J.
The effectiveness of several iterative methods of item response theory (IRT) item bias detection was examined in a simulation study. The situations employed were based on biased items created using a two-dimensional IRT model. Previous research demonstrated that the non-iterative application of some IRT parameter linking procedures produced unsatisfactory results in a simulation study involving unidirectional item bias. A modified form of Drasgow’s iterative item parameter linking method and an adaptation of Lord’s test purification procedure were examined in conditions that simulated unidirectional and mixed-directional forms of item bias. The results illustrate that iterative linking holds promise for differentiating biased from unbiased items under several item bias conditions. In addition, a combination of methods, involving cycles of iterative linking followed by ability scale purification, was found to be even more effective than iterative linking alone. This combination of procedures totally eliminated false positive misidentifications for the most pervasive item bias condition, and false negative misidentifications were also reduced. Combining iterative linking with ability scale purification appears to be a viable method for analyzing multidimensional IRT data with unidimensional IRT item-bias methods. Index terms: ability scale purification, item bias, item response theory, iterative linking, iterative methods, metric linking, multidimensional IRT model.
The relationship of expert-system scored constrained free-response items to multiple-choice and open-ended items
(1990) Bennett, Randy Elliot; Rock, Donald A.; Braun, Henry I.; Frye, Douglas; Spohrer, James C.; Soloway, Elliot
This study examined the relationship of an expert-system scored constrained free-response item (requiring the student to debug a faulty computer program) to two other item types: (1) multiple-choice and (2) free-response (requiring production of a program). Confirmatory factor analysis was used to test the fit of a three-factor model to these data and to compare the fit of the model to three alternatives. These models were fit using two random-half samples, one given a faulty program containing one bug and the other a program with three bugs. A single-factor model best fit the data for the sample taking the one-bug constrained free response and a two-factor model fit the data somewhat better for the second sample. In addition, the factor intercorrelations showed this item type to be highly related to both the free-response and multiple-choice measures. Index terms: artificial intelligence, constructed-response items, expert-system scoring, free-response items, open-ended items.