Applied Psychological Measurement, Volume 13, 1989
Now showing 1 - 20 of 34
Confirmatory factor analyses of multitrait-multimethod data: Many problems and a few solutions (1989)
Marsh, Herbert W.
During the last 15 years there has been a steady increase in the popularity and sophistication of the confirmatory factor analysis (CFA) approach to multitrait-multimethod (MTMM) data. This approach, however, incurs some important problems, the most serious being the ill-defined solutions that plague MTMM studies and the assumption that so-called method factors reflect primarily the influence of method effects. In three different MTMM studies, ill-defined solutions were frequent and alternative parameterizations designed to solve this problem tended to mask the symptoms instead of eliminating the problem. More importantly, so-called method factors apparently represented trait variance in addition to, or instead of, method variance for at least some models in all three studies. Further support for this counterinterpretation of method factors was found when external validity criteria were added to the MTMM models and correlated with trait and so-called method factors. This problem, when it exists, invalidates the traditional interpretation of trait and method factors and the comparison of different MTMM models. A new specification of method effects as correlated uniquenesses instead of method factors was less prone to ill-defined solutions and, apparently, to the confounding of trait and method effects.
Index terms: confirmatory factor analysis, construct validity, convergent validity, correlated uniquenesses, discriminant validity, empirical underidentification, LISREL, method effects, multitrait-multimethod analysis.

A consumer's guide to LOGIST and BILOG (1989)
Mislevy, Robert J.; Stocking, Martha L.
Since its release in 1976, Wingersky, Barton, and Lord’s (1982) LOGIST has been the most widely used computer program for estimating the parameters of the three-parameter logistic item response model.
An alternative program, Mislevy and Bock’s (1983) BILOG, has recently become available. This paper compares the approaches taken by the two programs and offers some guidelines for choosing between the two programs for particular applications.
Index terms: Bayesian estimation, BILOG, IRT estimation procedures, LOGIST, marginal maximum likelihood, maximum likelihood, three-parameter logistic model estimation procedures.

Detection of invalid response patterns on the California Psychological Inventory (1989)
Lanning, Kevin
When faced with the task of responding to a personality questionnaire, an individual may respond with a number of strategies or test-taking attitudes. Among these, deceptive (fake) and disengaged (random) attitudes are of particular interest, for these can potentially mislead and misinform test users. A two-stage model was devised to detect deceptive and disengaged protocols on the California Psychological Inventory. Using parameters from signal detection theory, this model is found to be highly sensitive in detecting invalidity.
Index terms: California Psychological Inventory, expected utility, faking on personality inventories, personality assessment, random response patterns, signal detection theory.

Paradoxes, contradictions, and illusions (1989)
Humphreys, Lloyd G.; Drasgow, Fritz
There is no contradiction between a powerful significance test based on a difference score and the necessity for reliable measurement of the dependent measure in a controlled experiment. In fact, the former requires the latter. In this paper we review the conclusions that were drawn by Humphreys and Drasgow (1989) and show that Overall’s (1989) "contradiction" is an illusion derived from imprecise language.
Index terms: analysis of covariance, baseline correction, control of individual differences, difference scores, measurement of change, reliability of the marginal distribution, statistical power, within-group reliabilities.

Inhibition in prolonged work tasks (1989)
Van der Ven, Ad H. G. S.; Smit, J. C.; Jansen, R. W. T. L.
A new model is presented that explains reaction time fluctuations in prolonged work tasks. The model extends the so-called Poisson-Erlang model and can account for long-term trend effects in the reaction time curve. The model is consistent with Spearman’s hypothesis that inhibition increases during work and decreases during rest. Predictions concerning the long-term trend were tested against data from the Bourdon-Vos cancellation test. The long-term trend in the mean and in the variance was perfectly described by the model. A goodness-of-fit test comparing frequency distributions of observed reaction and simulated reaction times was also supported by the model.
Index terms: concentration, continuous work, distraction, inhibition, prolonged work, reaction time, response time.

Congeneric modeling of reliability using censored variables (1989)
Brown, R. L.
This paper explores the use of Jöreskog’s (1970) congeneric modeling approach to reliability using censored quantitative variables, and discusses the compound problem of non-normality and attenuation that occurs when estimating censored continuous variables. Two monte carlo studies were conducted. The first study demonstrated the inappropriateness of using normal theory generalized least-squares (NTGLS) for estimating reliability on censored variables. The second study compared three different estimation procedures (NTGLS, asymptotically distribution free (ADF) estimators, and latent TOBIT estimators) as to their efficiency in estimating individual and composite reliability on censored variables.
Results from the studies indicate that problems of non-normality and attenuation must be addressed before accurate reliability estimates may be obtained.
Index terms: censored variables, congeneric model, covariance modeling, monte carlo study, reliability, TOBIT correlations.

Psychometric properties of finite-state scores versus number-correct and formula scores: A simulation study (1989)
García-Pérez, Miguel A.; Frary, Robert B.
As developed by García-Pérez (1987), finite-state scores are nonlinear transformations of the proportions of conventional multiple-choice responses that are correct, incorrect, and omitted. They estimate the proportions of item alternatives which the examinees had the knowledge needed to classify (as correct or incorrect) before seeing them together in the items. The present study used simulation techniques to generate conventional test responses and to track the proportions of alternatives the examinees could classify independently before taking the test and the proportions they could classify after taking the test. Then the finite-state scores were computed and compared with these actual values and with number-correct and formula scores based on the conventional responses. Highly favorable results were obtained leading to recommendations for the use of finite-state scores. These results were almost the same when the simulation proceeded according to the model and when it was based on a naturalistic process completely independent of the model. Hence the scoring procedures on which finite-state scores are based are both accurate and robust.
Index terms: applied measurement models, examinee behavior, finite-state scores, guessing, multiple-choice tests, test scoring.

Estimating unrestricted population parameters from restricted sample data in employment testing (1989)
Burke, Michael J.; Normand, Jacques; Doran, Lucinda
This study examined the accuracy of Alexander, Alliger, and Hanges’ (1984) method for estimating unrestricted univariate predictor means and variances from sample data drawn from three populations in two personnel selection contexts: (1) where there was direct nonstrict truncation on the predictor, and (2) where there was direct strict truncation on the predictor. In addition, the accuracy of corrected (estimated unrestricted) validity coefficients based on estimated population predictor standard deviations was assessed in the nonstrict truncation condition. In general, there was inconsistency in the accuracy of the population predictor mean and standard deviation estimates obtained across the present datasets and conditions. Caution is advised in the interpretation and reporting of corrected validity coefficients in employment testing based on estimated population predictor standard deviations.
Index terms: employment testing, personnel selection, range restriction, true validity estimation, unrestricted population parameters.

The reliability of a linear composite of nonequivalent subtests (1989)
Rozeboom, William W.
Traditional formulas for estimating the reliability of a composite test from its internal item statistics are inappropriate to judge the reliability of multiple regressions and other weighted composites of subtests that are appreciably nonequivalent. Formulas are provided here for the reliability of such a composite given the reliabilities of its component subtests, followed by a comparison of the composite’s reliability to that of its components. Compositing can easily incur a substantial loss of reliability, though gains are entirely possible as well.
Index terms: combining nonequivalent subtests, composite reliability, item weighting, nonequivalent subtests, nonhomogeneous item composites.

Some comments on the relation between reliability and statistical power (1989)
Humphreys, Lloyd G.; Drasgow, Fritz
Several articles have discussed the curious fact that a difference score with zero reliability can nonetheless allow a powerful test of change. This statistical legerdemain should not be overemphasized for three reasons. First, although the reliability of the difference score may be unrelated to power, the reliabilities of the variables used to create the difference scores are directly related to the power of the test. Second, with what some will regard as additional legerdemain, it is possible to define reliability in the context of a difference score in such a way that power is a direct function of reliability. The third and most serious objection to the conclusion that the reliability of a difference score is unimportant is that the underlying statistical model used in its derivation is rarely appropriate for psychological data.
Index terms: control of individual differences, difference scores, reliability, reliability of the marginal distribution, statistical power, within-group reliabilities.

Adaptive and conventional versions of the DAT: The first complete test battery comparison (1989)
Henly, Susan J.; Klebe, Kelli J.; McBride, James R.; Cudeck, Robert
A group of covariance structure models was examined to ascertain the similarity between conventionally administered and computerized adaptive (CAT) versions of the complete battery of the Differential Aptitude Tests (DAT). Two factor analysis models developed from classical test theory and three models with a multiplicative structure for these multitrait-multimethod data were developed and then fit to sample data in a double cross-validation design.
All three direct-product models performed better than the factor analysis models in both calibration and cross-validation subsamples. The cross-validated, disattenuated correlation between the administration methods in the best-performing direct-product model was very high in both groups (.98 and .97), suggesting that the CAT version of the DAT is an adequate representation of the conventional test battery. However, some evidence suggested that there are substantial differences between the printed and computerized versions of the one speeded test in the battery.
Index terms: adaptive tests, computerized adaptive testing, covariance structure, cross-validation, Differential Aptitude Tests, direct-product models, factor analysis, multitrait-multimethod matrices.

Distinguishing between measurements and dependent variables (1989)
Overall, John E.
Humphreys and Drasgow (1989b) recognize two types of dependent variables: the original measurements collected in an experiment and mathematical variables that are subjected to statistical analysis. Overall and Woodward (1975) were explicitly concerned with the latter, whereas Humphreys and Drasgow contend that they were concerned with reliability of the original measurements from which difference scores may be computed. These are quite different matters. Criticisms should focus on points of disagreement, and there has never been any disagreement concerning the importance of reliability of the original measurements. The notion that treatment effects should be considered a part of the true variance for calculation of reliability estimates is rejected as stemming from their failure to understand the basic difference between reliability and validity.
Index terms: control of individual differences, difference scores, measurement of change, reliability of the marginal distribution, statistical power, within-group reliabilities.

Adaptive estimation when the unidimensionality assumption of IRT is violated (1989)
Folk, Valerie G.; Green, Bert F.
This study examined some effects of using a unidimensional IRT model when the assumption of unidimensionality was violated. Adaptive and nonadaptive tests were formed from two-dimensional item sets. The tests were administered to simulated examinee populations with different correlations of the two underlying abilities. Scores from the adaptive tests tended to be related to one or the other ability rather than to a composite. Similar but less disparate results were obtained with IRT scoring of nonadaptive tests, whereas the conventional standardized number-correct score was equally related to both abilities. Differences in item selection from the adaptive administration and in item parameter estimation were also examined and related to differences in ability estimation.
Index terms: ability estimation, adaptive testing, item parameter estimation, item response theory, multidimensionality.

A comparison of pseudo-Bayesian and joint maximum likelihood procedures for estimating item parameters in the three-parameter IRT model (1989)
Skaggs, Gary; Stevenson, José
This study compared pseudo-Bayesian and joint maximum likelihood procedures for estimating item parameters for the three-parameter logistic model in item response theory. Two programs, ASCAL and LOGIST, which employ the two methods were compared using data simulated from a three-parameter model. Item responses were generated for sample sizes of 2,000 and 500, test lengths of 35 and 15, and examinees of high, medium, and low ability. The results showed that the item characteristic curves estimated by the two methods were more similar to each other than to the generated item characteristic curves.
Pseudo-Bayesian estimation consistently produced more accurate item parameter estimates for the smaller sample size, whereas joint maximum likelihood was more accurate as test length was reduced.
Index terms: ASCAL, item response theory, joint maximum likelihood estimation, LOGIST, parameter estimation, pseudo-Bayesian estimation, three-parameter model.

Contradictions can never a paradox resolve (1989)
Overall, John E.
The fact that difference scores tend to be less reliable than the original measurements from which they are calculated should not be a matter of concern in testing the significance of treatment-induced change. The reliabilities of the original measurements are important because unreliability attenuates correlation, and substantial correlation between prescores and postscores is required for difference scores to be of value in controlling for individual differences. Reliability notwithstanding, difference scores provide superior control over true baseline differences in quasi-experimental research, whereas the analysis of covariance (ANCOVA) is generally preferable for baseline control in randomized experimental designs.
Index terms: analysis of covariance, baseline correction, difference scores, measurement of change, reliability.

Correction of an orthogonal procrustes rotation procedure described by Guilford and Hoepfner (1989)
Ten Berge, Jos M. F.
Index terms: factor matching, least-squares rotation, target rotation.

Two-dimensional configurations on unidimensional stimulus sets in nonmetric multidimensional scaling (1989)
Davison, Mark L.; Hearn, Marsha
When unidimensional stimulus sets are subjected to a nonmetric scaling in two dimensions, the stimuli frequently form a C- or S-shaped configuration. In simulated unidimensional data scaled in two dimensions, stimuli formed a C-shaped configuration when the monotone function relating distances to dissimilarity data was negatively accelerating.
They formed an S-shaped configuration when the monotone function was positively accelerating. Results suggest that when unidimensional stimulus sets are scaled in two dimensions using a rational starting configuration, the nature of the two-dimensional configuration can indicate the general form of the function mapping psychological dissimilarity, represented as distance in the scaling model, onto the observed response scale.
Index terms: data transformations, multidimensional scaling, paired comparisons, proximity data, unidimensional scaling, unidimensionality.

A Monte Carlo examination of external unfolding (1989)
Thompson, Paul
Monte carlo techniques were used to examine regression approaches to external unfolding. The present analysis examined the technique to determine if various characteristics of the points are recovered (such as ideal points). Characteristics of the situations were manipulated, including number of stimuli, error level, and measurement level. Generally, monotonic analyses resulted in good recovery. Recovery was poor when the data were generated by a weighted Euclidean process. Negative weights were commonly encountered, apparently due to error in the data. In this approach, estimation is done by statistical techniques, and some statistical concerns should be taken into account when examining the results.
Index terms: external unfolding, monte carlo simulation, multidimensional scaling, nonmetric methods, parameter estimation, unfolding.

Unidimensional IRT calibration of compensatory and noncompensatory multidimensional items (1989)
Ackerman, Terry A.
The characteristics of unidimensional ability estimates obtained from data generated using multidimensional compensatory models were compared with estimates from noncompensatory IRT models.
Reckase, Carlson, Ackerman, and Spray (1986) reported that when a compensatory model is used and item difficulty is confounded with dimensionality, the composition of the unidimensional ability estimates differs for different points along the unidimensional ability (θ) scale. Eight datasets (four compensatory, four noncompensatory) were generated for four different levels of correlated two-dimensional θs. In each dataset, difficulty was confounded with dimensionality and then calibrated using LOGIST and BILOG. The confounding of difficulty and dimensionality affected the BILOG calibration of response vectors using matched multidimensional item parameters more than it affected the LOGIST calibration. As the correlation between the generated two-dimensional θs increased, the response data became more unidimensional as shown in bivariate plots of the mean θ̂₁ as opposed to the mean of θ̂₂ for specified unidimensional quantiles.
Index terms: BILOG, compensatory IRT models, IRT ability estimation, LOGIST, multidimensional item response theory, noncompensatory IRT models.

PACM: A two-stage procedure for analyzing structural models (1989)
Lehmann, Donald R.; Gupta, Sunil
An alternative procedure for estimating structural equations models is described. The two-stage procedure, Path Analysis of Covariance Matrix (PACM), separately estimates the measurement and structural models using standard least-squares procedures. PACM was empirically compared to simultaneous maximum likelihood estimation of measurement and structural models using LISREL. PACM produced results similar to LISREL in many cases; it also seems to have advantages when dealing with large-scale problems, model misspecifications, collinearity among indicators, and missing data.
Index terms: causal models, confirmatory factor analysis, LISREL, path analysis, structural equations models.