Applied Psychological Measurement, Volume 14, 1990
Persistent link for this collection: https://hdl.handle.net/11299/103305
Browsing by issue date; now showing items 1-20 of 31.
Item: A structural theory of spatial abilities (1990). Guttman, Ruth; Epstein, Elizabeth E.; Amir, Marianne; Guttman, Louis.
A cylindrical-wedge model is proposed to represent the correlational structure of a variety of spatial ability tests. The model corresponds to the design of the tests' content, according to three facets: (1) type of rule task, (2) dimensionality of the test items, and (3) need to mentally rotate test objects in space. Additional facets are suggested to refine the theoretical and empirical structure. The model emphasizes regionality for representing interrelationships, as an alternative to factor-analytic models that seek meaningful reference axes. The axis approach has not supplied an unambiguous theory that unifies content classification with the empirical structure of spatial abilities; it is also technically more awkward and less parsimonious than the regional approach. This paper advances theory and data analysis in the field of spatial ability by providing a unified conceptual framework that can be refined and expanded systematically, and that serves as an actual experimental design that can easily be executed by other workers in the field. Existing data are shown to support the regional cylindrical-wedge model.
Index terms: facet theory, factor analysis, intelligence, mapping sentence, Smallest Space Analysis, spatial ability.

Item: Implications of three causal models for the measurement of halo error (1990). Fisicaro, Sebastiano A.; Lance, Charles E.
The appropriateness of a traditional correlational measure of halo error (the difference between dimensional rating intercorrelations and dimensional true-score intercorrelations) is reexamined in the context of three causal models of halo error. Mathematical derivations indicate that the traditional correlational measure typically underestimates halo error in ratings and can suggest no halo error, or even "negative" halo error, when positive halo error actually occurs.
A corrected correlational measure is derived that avoids these problems, and the traditional and corrected measures are compared empirically. Results suggest that use of the traditional correlational measure of halo error be discontinued.
Index terms: halo, halo effect, halo error, performance ratings, rating accuracy, rating errors.

Item: Test construction by means of linear programming (1990). De Gruijter, Dato N. M.
The use of linear programming in the selection of test items entails setting a target information value for several ability levels, then constructing a test of minimum length that satisfies the constraints given by the target values. In the present paper the case of the uniform target is reconsidered. The dependency of item selection on item pool characteristics is demonstrated, and the relevance of uniform targets for test construction and the applicability of linear programming for test construction are discussed.
Index terms: item response theory, item selection, linear programming, test length.

Item: Individual differences in unfolding preference data: A restricted latent class approach (1990). Böckenholt, Ulf; Böckenholt, Ingo.
A latent class scaling approach is presented for modeling paired comparison and "pick any/t" data obtained in a preference study. Whereas the latent class part of the model identifies homogeneous subgroups that are characterized by their choice probabilities for a set of alternatives, the scaling part of the model describes the single-peakedness structure of the choice data. Procedures are suggested for examining the unfolding structure in an unrestricted latent class solution. Two applications are presented to illustrate the technique. In the first application, scaling solutions obtained from a latent class scaling model and a marginal maximum likelihood latent trait model are compared.
Index terms: latent class analysis, paired comparison data, pick any/t data, unfolding models.

Item: Robustness of marginal maximum likelihood estimation in the Rasch model (1990). Zwinderman, Aeilko H.; Van den Wollenberg, Arnold L.
Simulation studies examined the effect of misspecification of the latent ability (θ) distribution on the accuracy and efficiency of marginal maximum likelihood (MML) item parameter estimates and on MML statistics used to test sufficiency and conditional independence. Results were compared to the conditional maximum likelihood (CML) approach. Results showed that if θ is assumed to be normally distributed when its distribution is actually skewed, MML estimators lose accuracy and efficiency compared to CML estimators. The effects are not large, though they increase as the skewness of the number-correct score distribution increases. However, statistics used to test the sufficiency and conditional independence assumptions of the Rasch model in the MML approach are very sensitive to misspecification of the θ distribution.
Index terms: ability distribution, conditional likelihood, efficiency, goodness of fit, marginal likelihood, Rasch model, robustness.

Item: A cluster-based method for test construction (1990). Boekkooi-Timminga, Ellen.
Several methods for optimal test construction from item banks have recently been proposed using information functions. The main problem with these methods is the large amount of time required to identify an optimal test. In this paper, a new method is presented for the Rasch model that considers groups of interchangeable items instead of individual items. The process of item clustering is described, the cluster-based test construction model is outlined, and the computational procedure and results are given. Results indicate that this method produces accurate results in small amounts of time.
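The cluster-based idea can be illustrated with a toy sketch. The paper formulates test construction as an optimization model over clusters of interchangeable Rasch items; the greedy selection rule, cluster width, and target values below are illustrative assumptions of this sketch, not the author's algorithm.

```python
import math
from collections import defaultdict

def rasch_info(theta, b):
    """Fisher information of a Rasch item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def cluster_items(difficulties, width=0.5):
    """Group items whose difficulties fall in the same interval of `width`;
    items within a cluster are treated as interchangeable."""
    clusters = defaultdict(list)
    for i, b in enumerate(difficulties):
        clusters[round(b / width)].append(i)
    return clusters

def build_test(difficulties, theta_targets, target_info, width=0.5):
    """Greedy sketch: repeatedly draw one item from the cluster whose centre
    adds the most information at the still-unmet target ability points."""
    clusters = cluster_items(difficulties, width)
    selected = []
    info = {t: 0.0 for t in theta_targets}
    while any(info[t] < target_info for t in theta_targets):
        unmet = [t for t in theta_targets if info[t] < target_info]
        best = max((k for k in clusters if clusters[k]),
                   key=lambda k: sum(rasch_info(t, k * width) for t in unmet))
        item = clusters[best].pop()
        selected.append(item)
        for t in theta_targets:
            info[t] += rasch_info(t, difficulties[item])
    return selected
```

Because any item in a cluster is as good as any other, the search space shrinks from individual items to cluster counts, which is where the reported speed gain comes from.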
Index terms: information functions, item banking, item response theory, linear programming, test construction.

Item: Standard errors of correlations adjusted for incidental selection (1990). Allen, Nancy L.; Dunbar, Stephen B.
The standard error of correlations that have been adjusted for selection with commonly used formulas developed by Pearson (1903) was investigated. The major purposes of the study were (1) to provide large-sample approximations of the standard error of a correlation adjusted using the Pearson-Lawley three-variable correction formula; (2) to examine the standard errors of adjusted correlations under specific conditions; and (3) to compare various estimates of the standard errors under direct and indirect selection. Two theory-based large-sample estimates of the standard error of a correlation adjusted for indirect selection were developed using the delta method. These two estimates were compared to one another, to a bootstrap estimate, and to an empirical standard deviation of a series of adjusted correlations generated in a simulation study. The simulation study manipulated factors defined by sample size, selection ratio, underlying population distribution, and population correlations in situations that satisfied the basic assumptions of the Pearson-Lawley procedures. The results indicated that the large-sample and bootstrap estimates were very similar when the sample size was 500; in most cases, the simpler of the two large-sample approximations appears to offer a reasonable estimate of the standard error of an adjusted correlation without resorting to complex, computer-intensive approaches.
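For reference, the Pearson-Lawley three-variable correction underlying these adjusted correlations can be sketched as below. This is a standard textbook form of the formula for incidental (indirect) selection on a third variable z; the variable names are mine, and the paper's delta-method standard errors are not reproduced here.

```python
import math

def pearson_lawley_3var(r_xy, r_xz, r_zy, S_z, s_z):
    """Adjust the restricted correlation r_xy for incidental selection on a
    third variable z, given the unrestricted (S_z) and restricted (s_z)
    standard deviations of z."""
    u = (S_z / s_z) ** 2  # ratio of unrestricted to restricted variance of z
    num = r_xy + r_xz * r_zy * (u - 1.0)
    den = math.sqrt((1.0 + r_xz ** 2 * (u - 1.0)) *
                    (1.0 + r_zy ** 2 * (u - 1.0)))
    return num / den
```

When S_z equals s_z (no restriction), the formula returns r_xy unchanged; as the restriction grows, the adjusted value moves toward the unrestricted population correlation.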
Index terms: correlation coefficients, missing data, Pearson-Lawley corrections, selection, standard errors of correlations, validity studies.

Item: A comparison of item- and person-fit methods of assessing model-data fit in IRT (1990). Reise, Steven P.
Many item-fit statistics have been proposed for assessing whether the responses to test items, aggregated across examinees, conform to IRT test models. Conversely, person-fit statistics have been proposed for assessing whether an examinee's responses, aggregated across items, are congruent with a specified IRT model. Statistical procedures to assess item fit have differed from those to assess person fit. This research compared a χ² item-fit index with a likelihood-based person-fit index. Eight dichotomous (0,1) data matrices were simulated under the three-parameter logistic test model. Both the likelihood-based and χ² fit statistics were then computed for examinees and items, and Type I and Type II error rates were analyzed. With data simulated to fit the IRT model, the χ² test overidentified examinees and items as misfitting, whereas the likelihood-based fit index held closer to the specified α levels. The two fit indices gave consistent (mis)fit-to-model results in 94 and 97 percent of cases for items and examinees, respectively, across simulations. Under simulated conditions of data misfit, the χ² statistic detected misfit at a higher rate than the likelihood-based statistic, indicating that the χ² statistic was slightly more sensitive to response-pattern aberrancy. However, other considerations led to a recommendation to employ the likelihood-based index in applied fit analyses to evaluate both examinee and item model-data (mis)fit.
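A widely used likelihood-based person-fit index of the kind compared here is the standardized log-likelihood statistic lz. The sketch below computes it under a three-parameter logistic model; the exact index and scaling used in the study may differ, so treat this as an illustration of the family, not the paper's statistic.

```python
import math

def p3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def lz(responses, theta, items):
    """Standardized log-likelihood person-fit index: the observed
    response-pattern log-likelihood, centred and scaled by its expectation
    and variance under the model. Values near 0 indicate fit; large
    negative values flag aberrant response patterns."""
    l0 = exp_l = var_l = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p3pl(theta, a, b, c)
        l0 += u * math.log(p) + (1 - u) * math.log(1.0 - p)
        exp_l += p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
        var_l += p * (1.0 - p) * math.log(p / (1.0 - p)) ** 2
    return (l0 - exp_l) / math.sqrt(var_l)
```

A Guttman-consistent pattern (correct on easy items, incorrect on hard ones) scores near or above zero, while the reversed pattern is pushed strongly negative.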
Index terms: chi-square index, item fit, item response theory, model fit, person fit, response aberrancy.

Item: Using the circular equating paradigm for comparison of linear equating models (1990). Gafni, Naomi; Melamed, Estela.
Equating error was estimated for the same test by three linear equating methods in three paradigms: (1) single-link equating of a test to itself, in which a test was administered on two different dates and the later administration was equated to the earlier one; (2) circular equating through a chain, starting and ending at the same test; and (3) pseudo-circular equating, in which a test was equated to itself as in the first approach, through equating chains containing different numbers of links as in the second approach. The mean difference between the actual scores and the equated scores, as well as the root mean square of this difference, served as the criterion measures for equating error. The results suggested that the Tucker method was superior for the conventional circular equating chain, whereas the Levine and VCI methods yielded smaller errors in about half the equating chains in the pseudo-circular paradigm. Unexpectedly, no clear relationship was found between the number of links in the equating chain and the resulting error.
Index terms: circular equating, equating chains, equating error, equating methods, linear equating.

Item: Theoretical and empirical comparison of the Mokken and the Rasch approach to IRT (1990). Meijer, Rob R.; Sijtsma, Klaas; Smid, Nico G.
The Mokken model of monotone homogeneity, the Mokken model of double monotonicity, and the Rasch model are compared theoretically and empirically, with respect to restrictiveness to empirical test data, properties of the scale, and accuracy of measurement.
Application of goodness-of-fit procedures to empirical data largely confirmed the expected ordering of the models by restrictiveness: Almost all items were in concordance with the model of monotone homogeneity, and fewer items complied with the model of double monotonicity and the Rasch model. The model of monotone homogeneity was found to be a suitable alternative to more restrictive models for basic testing applications; more sophisticated applications, such as equating and adaptive testing, appear to require the use of parametric models.
Index terms: goodness of fit, item response theory, measurement properties, Mokken model, Rasch model.

Item: Some contrasts between maximum likelihood factor analysis and alpha factor analysis (1990). Kaiser, Henry F.; Derflinger, Gerhard.
The fundamental mathematical model of Thurstone's common factor analysis is reviewed, and the basic covariance matrices of maximum likelihood factor analysis (MLFA) and alpha factor analysis (AFA) are presented. Putting aside the principles on which they are based, the two methods are compared in terms of a number of computational and scaling contrasts that follow from their respective developments. The paper concludes with a discussion of the number-of-factors problem, the weighting problem in MLFA and AFA, and possible bases for a choice between the two.
Index terms: alpha factor analysis, common factor analysis, maximum likelihood factor analysis, number of common factors, scaling and weighting in common factor analysis.

Item: Estimation problems in the block-diagonal model of the multitrait-multimethod matrix (1990). Brannick, Michael T.; Spector, Paul E.
The most popular method for analyzing the multitrait-multimethod (MTMM) matrix has been confirmatory factor analysis (CFA). The block-diagonal model, in which trait effects, trait correlations, method effects, and method correlations are simultaneously estimated, is examined in detail.
Analysis of published data from 18 correlation matrices showed estimation problems in all but one case. Simulations were used to show how identification and specification difficulties may account for these problems. Even trivial misspecification of a single parameter can prevent program convergence. These problems render the CFA block-diagonal approach to analyzing MTMM data less useful than has generally been thought.
Index terms: construct validity, covariance structure modeling, factor analysis, multitrait-multimethod matrix, parameter estimation in confirmatory factor analysis.

Item: A method for the age standardization of test scores (1990). Schagen, I. P.
A procedure is presented to generate standardized scores from raw test data that are, as far as possible, age-independent and normally distributed. The model is fitted to the percentile points of the raw score distribution and assumes a linear trend of each percentile with age. The fitted slopes can be constant or can vary quadratically with the percentiles. A nonlinear transformation of the data is also possible, to allow for "ceiling effects." These models are described, the methods used to fit them to test data are discussed, examples of their use in standardizing tests are presented, and the use of the diagnostic plots produced by the program is discussed.
Index terms: age standardization, linear regression, nonlinear regression, nonparallel regression, parallel linear regression, percentiles, score transformation.

Item: A generative analysis of a three-dimensional spatial task (1990). Bejar, Isaac I.
The feasibility of incorporating research results from cognitive science into the modeling of performance on psychometric tests and the construction of test items is considered, particularly the feasibility of modeling performance on a three-dimensional rotation task within the context of item response theory (IRT).
Three-dimensional items were selected because of the rich literature on the mental models that are used in their solution. An 80-item three-dimensional rotation test was constructed, and an inexpensive computer system was developed to administer the test and record performance, including response-time data. Data were collected on high school juniors and seniors. As expected, angular disparity was a potent determinant of item difficulty. The applicability of IRT to these data was investigated by dichotomizing response time at increasing elapsed times and applying standard item parameter estimation procedures. It is concluded that this approach to psychometric modeling, which explicitly incorporates information on the mental models examinees use in solving an item, is workable and important for future developments in psychometrics.
Index terms: cognitive psychology, continuous response, item response theory, mental rotation, response latency.

Item: Using Bayesian decision theory to design a computerized mastery test (1990). Lewis, Charles; Sheehan, Kathleen.
A theoretical framework for mastery testing based on item response theory and Bayesian decision theory is described. The idea of sequential testing is developed, with the goal of providing shorter tests for individuals who have clearly mastered (or clearly not mastered) a given subject and longer tests for those for whom the mastery decision is not as clear-cut. In a simulated application of the approach to a professional certification examination, it is shown that average test lengths can be reduced by half without sacrificing classification accuracy.
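The sequential idea can be illustrated with a minimal two-state Bayes sketch: update the posterior probability of mastery after each item and stop as soon as it leaves an indifference region. The response probabilities and thresholds below are illustrative assumptions; the paper's framework uses IRT-based likelihoods and explicit loss functions rather than these fixed values.

```python
def update_posterior(prior_master, p_correct_master, p_correct_nonmaster, response):
    """Bayes update of P(master) after one scored item."""
    like_m = p_correct_master if response else 1.0 - p_correct_master
    like_n = p_correct_nonmaster if response else 1.0 - p_correct_nonmaster
    post = prior_master * like_m
    return post / (post + (1.0 - prior_master) * like_n)

def sequential_mastery(responses, p_m=0.8, p_n=0.5, prior=0.5,
                       pass_at=0.95, fail_at=0.05, max_items=20):
    """Variable-length mastery test sketch: administer items until the
    posterior probability of mastery crosses pass_at or fail_at, so that
    clear-cut examinees get short tests and borderline ones get long tests."""
    post = prior
    for n, u in enumerate(responses[:max_items], start=1):
        post = update_posterior(post, p_m, p_n, u)
        if post >= pass_at:
            return "master", n
        if post <= fail_at:
            return "nonmaster", n
    return "undecided", min(len(responses), max_items)
```

With these assumed probabilities, a run of correct answers ends the test after seven items and a run of incorrect answers after four, which is the mechanism behind the shorter average test lengths reported.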
Index terms: Bayesian decision theory, computerized mastery testing, item response theory, sequential testing, variable-length tests.

Item: The relationship of expert-system scored constrained free-response items to multiple-choice and open-ended items (1990). Bennett, Randy Elliot; Rock, Donald A.; Braun, Henry I.; Frye, Douglas; Spohrer, James C.; Soloway, Elliot.
This study examined the relationship of an expert-system scored constrained free-response item (requiring the student to debug a faulty computer program) to two other item types: (1) multiple-choice and (2) free-response (requiring production of a program). Confirmatory factor analysis was used to test the fit of a three-factor model to these data and to compare that fit with three alternatives. The models were fit using two random-half samples, one given a faulty program containing one bug and the other a program with three bugs. A single-factor model best fit the data for the sample taking the one-bug constrained free-response item, and a two-factor model fit the data somewhat better for the second sample. In addition, the factor intercorrelations showed this item type to be highly related to both the free-response and multiple-choice measures.
Index terms: artificial intelligence, constructed-response items, expert-system scoring, free-response items, open-ended items.

Item: Sensitivity of marginal maximum likelihood estimation of item and ability parameters to the characteristics of the prior ability distributions (1990). Seong, Tae-je.
The sensitivity of marginal maximum likelihood estimation of item and ability (θ) parameters was examined when the prior θ distributions are not matched to the underlying θ distributions. Thirty sets of 45-item test data were generated by specifying three types of underlying θ distributions. They were then analyzed with PC-BILOG.
Appropriate specification of the prior θ distribution increased the accuracy of estimation for item and θ parameters when the sample size was large. With a small dataset, appropriate specification of the prior increased the accuracy of θ parameter estimation but did not have that effect on item parameter estimation. Only with a large dataset and matched underlying and prior θ distributions did increasing the number of quadrature points improve the accuracy of estimation of the item parameters. However, the accuracy of θ estimation was increased by increasing the number of quadrature points, regardless of sample size and the appropriateness of the prior θ distribution. The number of examinees had an important effect on the accuracy of item parameter estimation.
Index terms: ability distribution, BILOG, item response theory, marginal maximum likelihood estimation, parameter estimation, quadrature points.

Item: The effect of item selection procedure and stepsize on computerized adaptive attitude measurement using the rating scale model (1990). Dodd, Barbara G.
Real and simulated datasets were used to investigate the effects of systematically varying two major variables on the operating characteristics of computerized adaptive testing (CAT) applied to instruments consisting of polychotomously scored rating scale items. The two variables studied were the item selection procedure and the stepsize method used until maximum likelihood trait estimates could be calculated.
The findings suggested that (1) item pools consisting of as few as 25 items may be adequate for CAT; (2) the variable stepsize method of preliminary trait estimation produced fewer cases of nonconvergence than a fixed stepsize procedure; and (3) the scale-value item selection procedure used with a minimum standard error stopping rule outperformed the information item selection technique used with a minimum information stopping rule, in terms of the frequency of nonconvergent cases, the number of items administered, and the correlations of CAT θ estimates with full-scale estimates and known θ values. The implications of these findings for implementing CAT with rating scale items are discussed.
Index terms: adaptive testing, attitude measurement, computerized adaptive testing, item response theory, rating scale model.

Item: Determining the significance of estimated signed and unsigned areas between two item response functions (1990). Raju, Nambury S.
Asymptotic sampling distributions (means and variances) of estimated signed and unsigned areas between two item response functions (IRFs) are presented for the Rasch model, the two-parameter model, and the three-parameter model with fixed lower asymptotes. In item bias or differential item functioning research, it may be of interest to determine whether the estimated signed and unsigned areas between IRFs calibrated on two different groups differ significantly from 0. The usefulness of these sampling distributions in this context is discussed and illustrated. More empirical research with the proposed significance tests is necessary.
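The quantities whose sampling distributions are derived can be written down directly. Raju gives closed-form expressions for these areas; the sketch below merely approximates them numerically for two hypothetical item parameter sets, as an illustration of what is being tested. (For equal discriminations and zero lower asymptotes, the signed area reduces to the difference in difficulties.)

```python
import math

def icf(theta, a, b, c=0.0):
    """Item response function (3PL with fixed lower asymptote c)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def areas(item_ref, item_focal, lo=-6.0, hi=6.0, steps=1200):
    """Signed and unsigned areas between two item response functions,
    approximated by a midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    signed = unsigned = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        d = icf(t, *item_ref) - icf(t, *item_focal)
        signed += d * h
        unsigned += abs(d) * h
    return signed, unsigned
```

When the two IRFs cross (for example, equal difficulties but different discriminations), the signed area cancels to zero while the unsigned area stays positive, which is why both quantities are of interest in DIF work.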
Index terms: asymptotic mean and variance, differential item functioning, item bias, item response functions, item response theory.

Item: Tree versus geometric representation of tests and items (1990). Beller, Michal.
Factor-analytic techniques and multidimensional scaling models are the traditional ways of representing the interrelations among tests and items; both can be classified as geometric approaches. This study attempted to broaden the scope of models traditionally used by applying an additive tree model (ADDTREE), which belongs to the family of network models. Correlation matrices were obtained from three studies and analyzed using two representation models: Smallest Space Analysis (SSA), a multidimensional scaling model, and ADDTREE. The results of the two analyses were compared on the criteria of goodness of fit and interpretability. To enable a comparison with the more traditional factor-analytic approach, the data were also subjected to principal components analyses. ADDTREE fared better in both comparisons. Moreover, ADDTREE lends itself readily to an interpretation in terms of a hierarchical cluster structure, whereas it is difficult to interpret SSA's dimensions. ADDTREE's close fit to the data and its coherence of presentation make it a convenient means of representing tests and items.
Index terms: additive trees, ADDTREE, factor analysis, hierarchical clustering, multidimensional scaling, Smallest Space Analysis.