Applied Psychological Measurement, Volume 14, 1990
Persistent link for this collection: https://hdl.handle.net/11299/103305
Browsing by issue date; now showing items 1-20 of 31.
Item: A structural theory of spatial abilities (1990). Guttman, Ruth; Epstein, Elizabeth E.; Amir, Marianne; Guttman, Louis.
A cylindrical-wedge model is proposed to represent the correlational structure of a variety of spatial ability tests. The model corresponds to the design of the tests' content, according to three facets: (1) type of rule task, (2) dimensionality of the test items, and (3) need to mentally rotate test objects in space. Additional facets are suggested to refine the theoretical and empirical structure. The model emphasizes regionality for representing interrelationships, as an alternative to factor-analytic models that seek meaningful reference axes. The axis approach has not supplied an unambiguous theory that unifies content classification with the empirical structure of spatial abilities; it is also technically more awkward and less parsimonious than the regional approach. This paper advances theory and data analysis in the field of spatial ability by providing a unified conceptual framework that can be refined and expanded systematically, and that serves as an actual experimental design that can easily be executed by other workers in the field. Existing data are shown to support the regional cylindrical-wedge model.
Index terms: facet theory, factor analysis, intelligence, mapping sentence, Smallest Space Analysis, spatial ability.

Item: Implications of three causal models for the measurement of halo error (1990). Fisicaro, Sebastiano A.; Lance, Charles E.
The appropriateness of a traditional correlational measure of halo error (the difference between dimensional rating intercorrelations and dimensional true-score intercorrelations) is reexamined in the context of three causal models of halo error. Mathematical derivations indicate that the traditional correlational measure typically underestimates halo error in ratings and can suggest no halo error, or even "negative" halo error, when positive halo error actually occurs.
A corrected correlational measure is derived that avoids these problems, and the traditional and corrected measures are compared empirically. Results suggest that use of the traditional correlational measure of halo error be discontinued.
Index terms: halo, halo effect, halo error, performance ratings, rating accuracy, rating errors.

Item: Test construction by means of linear programming (1990). De Gruijter, Dato N. M.
The use of linear programming in the selection of test items entails setting a target information value for several ability levels, then constructing a test of minimum length that satisfies the constraints given by the target values. In the present paper the case of the uniform target is reconsidered. The dependency of item selection on item pool characteristics is demonstrated, and the relevance of uniform targets for test construction and the applicability of linear programming for test construction are discussed.
Index terms: item response theory, item selection, linear programming, test length.

Item: Individual differences in unfolding preference data: A restricted latent class approach (1990). Böckenholt, Ulf; Böckenholt, Ingo.
A latent class scaling approach is presented for modeling paired comparison and "pick any/t" data obtained in a preference study. Whereas the latent class part of the model identifies homogeneous subgroups that are characterized by their choice probabilities for a set of alternatives, the scaling part of the model describes the single-peakedness structure of the choice data. Procedures are suggested for examining the unfolding structure in an unrestricted latent class solution. Two applications are presented to illustrate the technique. In the first application, scaling solutions obtained from a latent class scaling model and a marginal maximum likelihood latent trait model are compared.
Index terms: latent class analysis, paired comparison data, pick any/t data, unfolding models.

Item: Robustness of marginal maximum likelihood estimation in the Rasch model (1990). Zwinderman, Aeilko H.; Van den Wollenberg, Arnold L.
Simulation studies examined the effect of misspecification of the latent ability (θ) distribution on the accuracy and efficiency of marginal maximum likelihood (MML) item parameter estimates and on MML statistics used to test sufficiency and conditional independence. Results were compared to the conditional maximum likelihood (CML) approach. Results showed that if θ is assumed to be normally distributed when its distribution is actually skewed, MML estimators lose accuracy and efficiency compared to CML estimators. The effects are not large, though they increase as the skewness of the number-correct score distribution increases. However, statistics used to test the sufficiency and conditional independence assumptions of the Rasch model in the MML approach are very sensitive to misspecification of the θ distribution.
Index terms: ability distribution, conditional likelihood, efficiency, goodness of fit, marginal likelihood, Rasch model, robustness.

Item: A cluster-based method for test construction (1990). Boekkooi-Timminga, Ellen.
Several methods for optimal test construction from item banks have recently been proposed using information functions. The main problem with these methods is the large amount of time required to identify an optimal test. In this paper, a new method is presented for the Rasch model that considers groups of interchangeable items instead of individual items. The process of item clustering is described, the cluster-based test construction model is outlined, and the computational procedure and results are given. Results indicate that this method produces accurate results in small amounts of time.
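The cluster-based idea can be illustrated with a toy sketch. The paper formulates test construction as an optimization model over clusters of interchangeable Rasch items; the greedy selection rule, cluster width, and target values below are illustrative assumptions of this sketch, not the author's algorithm.

```python
import math
from collections import defaultdict

def rasch_info(theta, b):
    """Fisher information of a Rasch item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def cluster_items(difficulties, width=0.5):
    """Group items whose difficulties fall in the same interval of `width`;
    items within a cluster are treated as interchangeable."""
    clusters = defaultdict(list)
    for i, b in enumerate(difficulties):
        clusters[round(b / width)].append(i)
    return clusters

def build_test(difficulties, theta_targets, target_info, width=0.5):
    """Greedy sketch: repeatedly draw one item from the cluster whose centre
    adds the most information at the still-unmet target ability points."""
    clusters = cluster_items(difficulties, width)
    selected = []
    info = {t: 0.0 for t in theta_targets}
    while any(info[t] < target_info for t in theta_targets):
        unmet = [t for t in theta_targets if info[t] < target_info]
        best = max((k for k in clusters if clusters[k]),
                   key=lambda k: sum(rasch_info(t, k * width) for t in unmet))
        item = clusters[best].pop()
        selected.append(item)
        for t in theta_targets:
            info[t] += rasch_info(t, difficulties[item])
    return selected
```

Because any item in a cluster is as good as any other, the search space shrinks from individual items to cluster counts, which is where the reported speed gain comes from.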
Index terms: information functions, item banking, item response theory, linear programming, test construction.

Item: Standard errors of correlations adjusted for incidental selection (1990). Allen, Nancy L.; Dunbar, Stephen B.
The standard error of correlations that have been adjusted for selection with commonly used formulas developed by Pearson (1903) was investigated. The major purposes of the study were (1) to provide large-sample approximations of the standard error of a correlation adjusted using the Pearson-Lawley three-variable correction formula; (2) to examine the standard errors of adjusted correlations under specific conditions; and (3) to compare various estimates of the standard errors under direct and indirect selection. Two theory-based large-sample estimates of the standard error of a correlation adjusted for indirect selection were developed using the delta method. These two estimates were compared to one another, to a bootstrap estimate, and to an empirical standard deviation of a series of adjusted correlations generated in a simulation study. The simulation study manipulated factors defined by sample size, selection ratio, underlying population distribution, and population correlations in situations that satisfied the basic assumptions of the Pearson-Lawley procedures. The results indicated that the large-sample and bootstrap estimates were very similar when the sample size was 500; in most cases, the simpler of the two large-sample approximations appears to offer a reasonable estimate of the standard error of an adjusted correlation without resorting to complex, computer-intensive approaches.
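For reference, the Pearson-Lawley three-variable correction underlying these adjusted correlations can be sketched as below. This is a standard textbook form of the formula for incidental (indirect) selection on a third variable z; the variable names are mine, and the paper's delta-method standard errors are not reproduced here.

```python
import math

def pearson_lawley_3var(r_xy, r_xz, r_zy, S_z, s_z):
    """Adjust the restricted correlation r_xy for incidental selection on a
    third variable z, given the unrestricted (S_z) and restricted (s_z)
    standard deviations of z."""
    u = (S_z / s_z) ** 2  # ratio of unrestricted to restricted variance of z
    num = r_xy + r_xz * r_zy * (u - 1.0)
    den = math.sqrt((1.0 + r_xz ** 2 * (u - 1.0)) *
                    (1.0 + r_zy ** 2 * (u - 1.0)))
    return num / den
```

When S_z equals s_z (no restriction), the formula returns r_xy unchanged; as the restriction grows, the adjusted value moves toward the unrestricted population correlation.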
Index terms: correlation coefficients, missing data, Pearson-Lawley corrections, selection, standard errors of correlations, validity studies.

Item: A comparison of item- and person-fit methods of assessing model-data fit in IRT (1990). Reise, Steven P.
Many item-fit statistics have been proposed for assessing whether the responses to test items, aggregated across examinees, conform to IRT test models. Conversely, person-fit statistics have been proposed for assessing whether an examinee's responses, aggregated across items, are congruent with a specified IRT model. Statistical procedures to assess item fit have differed from those to assess person fit. This research compared a χ² item-fit index with a likelihood-based person-fit index. Eight dichotomous (0,1) data matrices were simulated under the three-parameter logistic test model. Both the likelihood-based and χ² fit statistics were then computed for examinees and items, and Type I and Type II error rates were analyzed. With data simulated to fit the IRT model, the χ² test overidentified examinees and items as misfitting, whereas the likelihood-based fit index held closer to the specified α levels. The two fit indices gave consistent (mis)fit-to-model results in 94 and 97 percent of cases for items and examinees, respectively, across simulations. Under simulated conditions of data misfit, the χ² statistic detected misfit at a higher rate than the likelihood-based statistic, indicating that the χ² statistic was slightly more sensitive to response-pattern aberrancy. However, other considerations led to a recommendation to employ the likelihood-based index in applied fit analyses to evaluate both examinee and item model-data (mis)fit.
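A widely used likelihood-based person-fit index of the kind compared here is the standardized log-likelihood statistic lz. The sketch below computes it under a three-parameter logistic model; the exact index and scaling used in the study may differ, so treat this as an illustration of the family, not the paper's statistic.

```python
import math

def p3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def lz(responses, theta, items):
    """Standardized log-likelihood person-fit index: the observed
    response-pattern log-likelihood, centred and scaled by its expectation
    and variance under the model. Values near 0 indicate fit; large
    negative values flag aberrant response patterns."""
    l0 = exp_l = var_l = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p3pl(theta, a, b, c)
        l0 += u * math.log(p) + (1 - u) * math.log(1.0 - p)
        exp_l += p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
        var_l += p * (1.0 - p) * math.log(p / (1.0 - p)) ** 2
    return (l0 - exp_l) / math.sqrt(var_l)
```

A Guttman-consistent pattern (correct on easy items, incorrect on hard ones) scores near or above zero, while the reversed pattern is pushed strongly negative.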
Index terms: chi-square index, item fit, item response theory, model fit, person fit, response aberrancy.

Item: Using the circular equating paradigm for comparison of linear equating models (1990). Gafni, Naomi; Melamed, Estela.
Equating error was estimated for the same test by three linear equating methods in three paradigms: (1) single-link equating of a test to itself, in which a test was administered on two different dates and the later administration was equated to the earlier one; (2) circular equating through a chain, starting and ending at the same test; and (3) pseudo-circular equating, in which a test was equated to itself as in the first approach, through equating chains containing different numbers of links as in the second approach. The mean difference between the actual scores and the equated scores, as well as the root mean square of this difference, served as the criterion measures for equating error. The results suggested that the Tucker method was superior for the conventional circular equating chain, whereas the Levine and VCI methods yielded smaller errors in about half the equating chains in the pseudo-circular paradigm. Unexpectedly, no clear relationship was found between the number of links in the equating chain and the resulting error.
Index terms: circular equating, equating chains, equating error, equating methods, linear equating.

Item: Theoretical and empirical comparison of the Mokken and the Rasch approach to IRT (1990). Meijer, Rob R.; Sijtsma, Klaas; Smid, Nico G.
The Mokken model of monotone homogeneity, the Mokken model of double monotonicity, and the Rasch model are compared theoretically and empirically, with respect to restrictiveness to empirical test data, properties of the scale, and accuracy of measurement.
Application of goodness-of-fit procedures to empirical data largely confirmed the expected ordering of the models by restrictiveness: Almost all items were in concordance with the model of monotone homogeneity, and fewer items complied with the model of double monotonicity and the Rasch model. The model of monotone homogeneity was found to be a suitable alternative to more restrictive models for basic testing applications; more sophisticated applications, such as equating and adaptive testing, appear to require the use of parametric models.
Index terms: goodness of fit, item response theory, measurement properties, Mokken model, Rasch model.

Item: Some contrasts between maximum likelihood factor analysis and alpha factor analysis (1990). Kaiser, Henry F.; Derflinger, Gerhard.
The fundamental mathematical model of Thurstone's common factor analysis is reviewed, and the basic covariance matrices of maximum likelihood factor analysis (MLFA) and alpha factor analysis (AFA) are presented. Putting aside the principles on which they are based, the two methods are compared in terms of a number of computational and scaling contrasts that follow from their respective developments. The paper concludes with a discussion of the number-of-factors problem, the weighting problem in MLFA and AFA, and possible bases for a choice between the two.
Index terms: alpha factor analysis, common factor analysis, maximum likelihood factor analysis, number of common factors, scaling and weighting in common factor analysis.

Item: Estimation problems in the block-diagonal model of the multitrait-multimethod matrix (1990). Brannick, Michael T.; Spector, Paul E.
The most popular method for analyzing the multitrait-multimethod (MTMM) matrix has been confirmatory factor analysis (CFA). The block-diagonal model, in which trait effects, trait correlations, method effects, and method correlations are simultaneously estimated, is examined in detail.
Analysis of published data from 18 correlation matrices showed estimation problems in all but one case. Simulations were used to show how identification and specification difficulties may account for these problems. Even trivial misspecification of a single parameter can prevent program convergence. These problems render the CFA block-diagonal approach to analyzing MTMM data less useful than has generally been thought.
Index terms: construct validity, covariance structure modeling, factor analysis, multitrait-multimethod matrix, parameter estimation in confirmatory factor analysis.

Item: A method for the age standardization of test scores (1990). Schagen, I. P.
A procedure is presented to generate standardized scores from raw test data that are, as far as possible, age-independent and normally distributed. The model is fitted to the percentile points of the raw score distribution and assumes a linear trend of each percentile with age. The fitted slopes can be constant or can vary quadratically with the percentiles. A nonlinear transformation of the data is also possible, to allow for "ceiling effects." These models are described, the methods used to fit them to test data are discussed, examples of their use in standardizing tests are presented, and the use of the diagnostic plots produced by the program is discussed.
Index terms: age standardization, linear regression, nonlinear regression, nonparallel regression, parallel linear regression, percentiles, score transformation.

Item: A generative analysis of a three-dimensional spatial task (1990). Bejar, Isaac I.
The feasibility of incorporating research results from cognitive science into the modeling of performance on psychometric tests and the construction of test items is considered, particularly the feasibility of modeling performance on a three-dimensional rotation task within the context of item response theory (IRT).
Three-dimensional items were selected because of the rich literature on the mental models that are used in their solution. An 80-item three-dimensional rotation test was constructed, and an inexpensive computer system was developed to administer the test and record performance, including response-time data. Data were collected on high school juniors and seniors. As expected, angular disparity was a potent determinant of item difficulty. The applicability of IRT to these data was investigated by dichotomizing response time at increasing elapsed times and applying standard item parameter estimation procedures. It is concluded that this approach to psychometric modeling, which explicitly incorporates information on the mental models examinees use in solving an item, is workable and important for future developments in psychometrics.
Index terms: cognitive psychology, continuous response, item response theory, mental rotation, response latency.

Item: Using Bayesian decision theory to design a computerized mastery test (1990). Lewis, Charles; Sheehan, Kathleen.
A theoretical framework for mastery testing based on item response theory and Bayesian decision theory is described. The idea of sequential testing is developed, with the goal of providing shorter tests for individuals who have clearly mastered (or clearly not mastered) a given subject and longer tests for those for whom the mastery decision is not as clear-cut. In a simulated application of the approach to a professional certification examination, it is shown that average test lengths can be reduced by half without sacrificing classification accuracy.
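The sequential idea can be illustrated with a minimal two-state Bayes sketch: update the posterior probability of mastery after each item and stop as soon as it leaves an indifference region. The response probabilities and thresholds below are illustrative assumptions; the paper's framework uses IRT-based likelihoods and explicit loss functions rather than these fixed values.

```python
def update_posterior(prior_master, p_correct_master, p_correct_nonmaster, response):
    """Bayes update of P(master) after one scored item."""
    like_m = p_correct_master if response else 1.0 - p_correct_master
    like_n = p_correct_nonmaster if response else 1.0 - p_correct_nonmaster
    post = prior_master * like_m
    return post / (post + (1.0 - prior_master) * like_n)

def sequential_mastery(responses, p_m=0.8, p_n=0.5, prior=0.5,
                       pass_at=0.95, fail_at=0.05, max_items=20):
    """Variable-length mastery test sketch: administer items until the
    posterior probability of mastery crosses pass_at or fail_at, so that
    clear-cut examinees get short tests and borderline ones get long tests."""
    post = prior
    for n, u in enumerate(responses[:max_items], start=1):
        post = update_posterior(post, p_m, p_n, u)
        if post >= pass_at:
            return "master", n
        if post <= fail_at:
            return "nonmaster", n
    return "undecided", min(len(responses), max_items)
```

With these assumed probabilities, a run of correct answers ends the test after seven items and a run of incorrect answers after four, which is the mechanism behind the shorter average test lengths reported.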
Index terms: Bayesian decision theory, computerized mastery testing, item response theory, sequential testing, variable-length tests.

Item: The relationship of expert-system scored constrained free-response items to multiple-choice and open-ended items (1990). Bennett, Randy Elliot; Rock, Donald A.; Braun, Henry I.; Frye, Douglas; Spohrer, James C.; Soloway, Elliot.
This study examined the relationship of an expert-system scored constrained free-response item (requiring the student to debug a faulty computer program) to two other item types: (1) multiple-choice and (2) free-response (requiring production of a program). Confirmatory factor analysis was used to test the fit of a three-factor model to these data and to compare that fit with three alternatives. The models were fit using two random-half samples, one given a faulty program containing one bug and the other a program with three bugs. A single-factor model best fit the data for the sample taking the one-bug constrained free-response item, and a two-factor model fit the data somewhat better for the second sample. In addition, the factor intercorrelations showed this item type to be highly related to both the free-response and multiple-choice measures.
Index terms: artificial intelligence, constructed-response items, expert-system scoring, free-response items, open-ended items.

Item: Sensitivity of marginal maximum likelihood estimation of item and ability parameters to the characteristics of the prior ability distributions (1990). Seong, Tae-je.
The sensitivity of marginal maximum likelihood estimation of item and ability (θ) parameters was examined when the prior θ distributions are not matched to the underlying θ distributions. Thirty sets of 45-item test data were generated by specifying three types of underlying θ distributions. They were then analyzed with PC-BILOG.
Appropriate specification of the prior θ distribution increased the accuracy of estimation for item and θ parameters when the sample size was large. With a small dataset, appropriate specification of the prior increased the accuracy of θ parameter estimation but did not have that effect on item parameter estimation. Only with a large dataset and matched underlying and prior θ distributions did increasing the number of quadrature points improve the accuracy of estimation of the item parameters. However, the accuracy of θ estimation was increased by increasing the number of quadrature points, regardless of sample size and the appropriateness of the prior θ distribution. The number of examinees had an important effect on the accuracy of item parameter estimation.
Index terms: ability distribution, BILOG, item response theory, marginal maximum likelihood estimation, parameter estimation, quadrature points.

Item: The effect of item selection procedure and stepsize on computerized adaptive attitude measurement using the rating scale model (1990). Dodd, Barbara G.
Real and simulated datasets were used to investigate the effects of systematically varying two major variables on the operating characteristics of computerized adaptive testing (CAT) applied to instruments consisting of polychotomously scored rating scale items. The two variables studied were the item selection procedure and the stepsize method used until maximum likelihood trait estimates could be calculated.
The findings suggested that (1) item pools consisting of as few as 25 items may be adequate for CAT; (2) the variable stepsize method of preliminary trait estimation produced fewer cases of nonconvergence than a fixed stepsize procedure; and (3) the scale-value item selection procedure used with a minimum standard error stopping rule outperformed the information item selection technique used with a minimum information stopping rule, in terms of the frequency of nonconvergent cases, the number of items administered, and the correlations of CAT θ estimates with full-scale estimates and known θ values. The implications of these findings for implementing CAT with rating scale items are discussed.
Index terms: adaptive testing, attitude measurement, computerized adaptive testing, item response theory, rating scale model.

Item: Determining the significance of estimated signed and unsigned areas between two item response functions (1990). Raju, Nambury S.
Asymptotic sampling distributions (means and variances) of estimated signed and unsigned areas between two item response functions (IRFs) are presented for the Rasch model, the two-parameter model, and the three-parameter model with fixed lower asymptotes. In item bias or differential item functioning research, it may be of interest to determine whether the estimated signed and unsigned areas between IRFs calibrated on two different groups differ significantly from 0. The usefulness of these sampling distributions in this context is discussed and illustrated. More empirical research with the proposed significance tests is necessary.
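The quantities whose sampling distributions are derived can be written down directly. Raju gives closed-form expressions for these areas; the sketch below merely approximates them numerically for two hypothetical item parameter sets, as an illustration of what is being tested. (For equal discriminations and zero lower asymptotes, the signed area reduces to the difference in difficulties.)

```python
import math

def icf(theta, a, b, c=0.0):
    """Item response function (3PL with fixed lower asymptote c)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def areas(item_ref, item_focal, lo=-6.0, hi=6.0, steps=1200):
    """Signed and unsigned areas between two item response functions,
    approximated by a midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    signed = unsigned = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        d = icf(t, *item_ref) - icf(t, *item_focal)
        signed += d * h
        unsigned += abs(d) * h
    return signed, unsigned
```

When the two IRFs cross (for example, equal difficulties but different discriminations), the signed area cancels to zero while the unsigned area stays positive, which is why both quantities are of interest in DIF work.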
Index terms: asymptotic mean and variance, differential item functioning, item bias, item response functions, item response theory.

Item: Tree versus geometric representation of tests and items (1990). Beller, Michal.
Factor-analytic techniques and multidimensional scaling models are the traditional ways of representing the interrelations among tests and items; both can be classified as geometric approaches. This study attempted to broaden the scope of models traditionally used by applying an additive tree model (ADDTREE), which belongs to the family of network models. Correlation matrices were obtained from three studies and analyzed using two representation models: Smallest Space Analysis (SSA), a multidimensional scaling model, and ADDTREE. The results of the two analyses were compared on the criteria of goodness of fit and interpretability. To enable a comparison with the more traditional factor-analytic approach, the data were also subjected to principal components analyses. ADDTREE fared better in both comparisons. Moreover, ADDTREE lends itself readily to an interpretation in terms of a hierarchical cluster structure, whereas it is difficult to interpret SSA's dimensions. ADDTREE's close fit to the data and its coherence of presentation make it a convenient means of representing tests and items.
Index terms: additive trees, ADDTREE, factor analysis, hierarchical clustering, multidimensional scaling, Smallest Space Analysis.