Applied Psychological Measurement, Volume 14, 1990
Persistent link for this collection: https://hdl.handle.net/11299/103305
Browsing Applied Psychological Measurement, Volume 14, 1990 by Title
Now showing 1 - 20 of 31
Item: Bias and the effect of priors in Bayesian estimation of parameters of item response models (1990). Gifford, Janice A.; Swaminathan, Hariharan.
The effectiveness of a Bayesian approach to the estimation problem in item response models has been sufficiently documented in recent years. Although research has indicated that Bayesian estimates, in general, are more accurate than joint maximum likelihood (JML) estimates, the effect of the choice of priors on the Bayesian estimates is not well known. Moreover, the extent to which the Bayesian estimates are biased in comparison with JML estimates is not known. The effect of priors and the amount of bias in Bayesian estimates are examined in this paper through simulation studies. It is shown that different specifications of prior information have relatively modest effects on the Bayesian estimates. For small samples, it is shown that the Bayesian estimates are less biased than their JML counterparts.
Index terms: accuracy, Bayesian estimates, bias, item response models, joint maximum likelihood estimates, priors.

Item: A cluster-based method for test construction (1990). Boekkooi-Timminga, Ellen.
Several methods for optimal test construction from item banks have recently been proposed using information functions. The main problem with these methods is the large amount of time required to identify an optimal test. In this paper, a new method is presented for the Rasch model that considers groups of interchangeable items, instead of individual items. The process of item clustering is described, the cluster-based test construction model is outlined, and the computational procedure and results are given. Results indicate that this method produces accurate results in small amounts of time.
Index terms: information functions, item banking, item response theory, linear programming, test construction.

Item: A comparison of item- and person-fit methods of assessing model-data fit in IRT (1990). Reise, Steven P.
Many item-fit statistics have been proposed for assessing whether the responses to test items aggregated across examinees conform to IRT test models. Conversely, person-fit statistics have been proposed for assessing whether an examinee's responses aggregated across items are congruent with a specified IRT model. Statistical procedures to assess item fit have differed from those used to assess person fit. This research compared a χ² item-fit index with a likelihood-based person-fit index. Eight 0,1 data matrices were simulated under the three-parameter logistic test model. Both the likelihood-based and χ² fit statistics were then computed for examinees and items, and Type I and Type II error rates were analyzed. With data simulated to fit the IRT model, the χ² test overidentified examinees and items as misfitting, while the likelihood-based fit index held closer to the specified α levels. The two fit indices gave consistent (mis)fit-to-model results in 94 and 97 percent of cases for items and examinees, respectively, across simulations. Under simulated conditions of data misfit, the χ² statistic detected misfit at a higher rate than the likelihood-based statistic, indicating that the χ² statistic was slightly more sensitive to response-pattern aberrancy. However, other considerations led to a recommendation for employing the likelihood-based index in applied fit analyses to evaluate both examinee and item model-data (mis)fit.
Index terms: chi-square index, item fit, item response theory, model fit, person fit, response aberrancy.

Item: Determining the significance of estimated signed and unsigned areas between two item response functions (1990). Raju, Nambury S.
Asymptotic sampling distributions (means and variances) of estimated signed and unsigned areas between two item response functions (IRFs) are presented for the Rasch model, the two-parameter model, and the three-parameter model with fixed lower asymptotes. In item bias or differential item functioning research, it may be of interest to determine whether the estimated signed and unsigned areas between IRFs calibrated with two different groups are significantly different from 0. The usefulness of these sampling distributions in this context is discussed and illustrated. More empirical research with the proposed significance tests is necessary.
Index terms: asymptotic mean and variance, differential item functioning, item bias, item response functions, item response theory.

Item: The effect of item selection procedure and stepsize on computerized adaptive attitude measurement using the rating scale model (1990). Dodd, Barbara G.
Real and simulated datasets were used to investigate the effects of the systematic variation of two major variables on the operating characteristics of computerized adaptive testing (CAT) applied to instruments consisting of polychotomously scored rating scale items. The two variables studied were the item selection procedure and the stepsize method used until maximum likelihood trait estimates could be calculated.
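The fixed- and variable-stepsize rules for provisional trait estimation mentioned above can be conveyed by a rough sketch; the step value, pool bounds, and halving rule below are illustrative assumptions, not Dodd's exact implementation.

```python
# Until an examinee has responded in both a high and a low category, the
# maximum likelihood estimate of theta is not finite, so CAT systems nudge
# a provisional theta up or down after each item instead.

def fixed_step(theta, scored_high, step=0.7):
    """Fixed stepsize: move theta by a constant amount (0.7 is illustrative)."""
    return theta + step if scored_high else theta - step

def variable_step(theta, scored_high, upper=3.0, lower=-3.0):
    """Variable stepsize: move theta halfway toward the pool's upper or lower
    bound, so successive steps shrink as theta approaches a bound."""
    target = upper if scored_high else lower
    return theta + (target - theta) / 2.0

theta = 0.0
for high in [True, True, False]:
    theta = variable_step(theta, high)
# Sequence: 0.0 -> 1.5 -> 2.25 -> -0.375
```

The shrinking steps of the variable rule are one plausible reason it produced fewer nonconvergent cases: it cannot overshoot the bounds of the provisional scale the way a fixed step can.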
The findings suggested that (1) item pools that consist of as few as 25 items may be adequate for CAT; (2) the variable stepsize method of preliminary trait estimation produced fewer cases of nonconvergence than the use of a fixed stepsize procedure; and (3) the scale-value item selection procedure used in conjunction with a minimum standard error stopping rule outperformed the information item selection technique used in conjunction with a minimum information stopping rule in terms of the frequencies of nonconvergent cases, the number of items administered, and the correlations of CAT θ estimates with full-scale estimates and known θ values. The implications of these findings for implementing CAT with rating scale items are discussed.
Index terms: adaptive testing, attitude measurement, computerized adaptive testing, item response theory, rating scale model.

Item: Effect of scale adjustment on the comparison of item and ability parameters (1990). Liou, Michelle.
The standardized mean-squared difference (SMSD) has been used for summarizing the bias of parameter estimates in the three-parameter logistic (3PL) model. Due to the indeterminacy problem of the 3PL model, researchers must select a common scale for comparing the theoretical and estimated parameters. The use of different scales can yield noncomparable SMSD values, which in turn can affect the comparison of bias between different parameters. This research used three methods for selecting the common scale. Through a simulation, the three scaling methods were used to numerically demonstrate their effect on SMSD values.
Index terms: equating, indeterminacy problem, Samejima scale, standardized mean-squared difference, Stocking and Lord scale, three-parameter logistic model.

Item: Estimating item and ability parameters in homogeneous tests with the person characteristic function (1990). Carroll, John B.
On the basis of Monte Carlo runs, in which item response data were generated for a variety of test characteristics, procedures for estimating item and ability parameters for homogeneous, unidimensional tests are developed on the assumption that values of the slope parameter a and the guessing parameter c are constant over items. The procedures focus on estimates of the a parameter, regarded as an important statistic for characterizing an ability. This parameter is estimated from person characteristic functions for different levels of the total raw score distribution. The procedures can be applied to datasets with relatively small or very large Ns and with either relatively small or large numbers of items. They are illustrated with data from several cognitive ability tests.
Index terms: cognitive ability tests, homogeneous tests, item parameter estimation, item response theory, person characteristic function.

Item: Estimation problems in the block-diagonal model of the multitrait-multimethod matrix (1990). Brannick, Michael T.; Spector, Paul E.
The most popular method used to analyze the multitrait-multimethod (MTMM) matrix has been confirmatory factor analysis (CFA). The block-diagonal model, in which trait effects, trait correlations, method effects, and method correlations are simultaneously estimated, is examined in detail. Analysis of published data from 18 correlation matrices showed estimation problems in all but one case. Simulations were used to show how identification and specification difficulties may account for these problems. Even trivial misspecification of a single parameter can prevent program convergence.
These problems render the CFA block-diagonal approach to analyzing MTMM data less useful than has generally been thought.
Index terms: construct validity, covariance structure modeling, factor analysis, multitrait-multimethod matrix, parameter estimation in confirmatory factor analysis.

Item: Fitting a polytomous item response model to Likert-type data (1990). Muraki, Eiji.
This study examined the application of the MML-EM algorithm to the parameter estimation problems of the normal ogive and logistic polytomous response models for Likert-type items. A rating-scale model was developed based on Samejima's (1969) graded response model. The graded response model includes a separate slope parameter for each item and an item response parameter. In the rating-scale model, the item response parameter is resolved into two parameters: the item location parameter, and the category threshold parameter characterizing the boundary between response categories. For a Likert-type questionnaire, where a single scale is employed to elicit different responses to the items, this item response model is expected to be more useful for analysis because the item parameters can be estimated separately from the threshold parameters associated with the points on a single Likert scale. The advantages of this type of model are shown by analyzing simulated data and data from the General Social Surveys.
Index terms: EM algorithm, General Social Surveys, graded response model, item response model, Likert scale, marginal maximum likelihood, polytomous item response model, rating-scale model.

Item: Fitting the two-parameter model to personality data (1990). Reise, Steven P.; Waller, Niels G.
The Multidimensional Personality Questionnaire (MPQ; Tellegen, 1982) was parameterized using the two-parameter logistic item response model.
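As context for the two-parameter logistic model named here: its item response function is P(θ) = 1 / (1 + exp(−a(θ − b))). A minimal sketch follows; some treatments also insert the scaling constant D = 1.7, which is omitted here for simplicity.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic IRF: probability of a keyed response for an
    item with discrimination a and location b at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is exactly 0.5 regardless of a;
# a governs how steeply the curve rises around b.
print(p_2pl(0.0, 1.2, 0.0))  # 0.5
```

For personality items, "keyed response" replaces "correct response": b locates the item on the trait continuum and a indexes how sharply endorsement probability changes around that point.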
This entailed assessment of the suitability of personality data for item response analyses, including the assessment of dimensionality, monotonicity of item response, and data-model fit. The latter issue received special emphasis. Similarities and differences between maximum-performance and typical-performance data are discussed in relation to item response theory. Results suggest that the two-parameter model fits the MPQ data and that researchers engaged in the assessment of normal-range personality processes have much to gain from exploiting item response models.
Index terms: item fit, item response theory, Multidimensional Personality Questionnaire, personality measurement, two-parameter model.

Item: A generative analysis of a three-dimensional spatial task (1990). Bejar, Isaac I.
The feasibility of incorporating research results from cognitive science into the modeling of performance on psychometric tests and the construction of test items is considered, particularly the feasibility of modeling performance on a three-dimensional rotation task within the context of item response theory (IRT). Three-dimensional items were selected because of the rich literature on the mental models that are used in their solution. An 80-item, three-dimensional rotation test was constructed. An inexpensive computer system was also developed to administer the test and record performance, including response-time data. Data were collected on high school juniors and seniors. As expected, angular disparity was a potent determinant of item difficulty. The applicability of IRT to these data was investigated by dichotomizing response time at increasing elapsed times and applying standard item parameter estimation procedures. It is concluded that this approach to psychometric modeling, which explicitly incorporates information on the mental models examinees use in solving an item, is workable and important for future developments in psychometrics.
Index terms: cognitive psychology, continuous response, item response theory, mental rotation, response latency.

Item: Implications of three causal models for the measurement of halo error (1990). Fisicaro, Sebastiano A.; Lance, Charles E.
The appropriateness of a traditional correlational measure of halo error (the difference between dimensional rating intercorrelations and dimensional true-score intercorrelations) is reexamined in the context of three causal models of halo error. Mathematical derivations indicate that the traditional correlational measure typically will underestimate halo error in ratings and can suggest no halo error, or even "negative" halo error, when positive halo error actually occurs. A corrected correlational measure that avoids these problems is derived, and the traditional and corrected measures are compared empirically. Results suggest that use of the traditional correlational measure of halo error be discontinued.
Index terms: halo, halo effect, halo error, performance ratings, rating accuracy, rating errors.

Item: Improving IRT item bias detection with iterative linking and ability scale purification (1990). Park, Dong-gun; Lautenschlager, Gary J.
The effectiveness of several iterative methods of item response theory (IRT) item bias detection was examined in a simulation study. The situations employed were based on biased items created using a two-dimensional IRT model. Previous research demonstrated that the non-iterative application of some IRT parameter linking procedures produced unsatisfactory results in a simulation study involving unidirectional item bias. A modified form of Drasgow's iterative item parameter linking method and an adaptation of Lord's test purification procedure were examined in conditions that simulated unidirectional and mixed-directional forms of item bias. The results illustrate that iterative linking holds promise for differentiating biased from unbiased items under several item bias conditions.
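The general flavor of iterative linking can be conveyed by a generic mean/sigma sketch: relink the metrics using only items that currently look unbiased, and repeat until the retained set stabilizes. This is an illustrative simplification, not the modified Drasgow or Lord procedure the authors actually used, and the flagging threshold is an arbitrary assumption.

```python
import statistics

def mean_sigma_link(b_ref, b_focal):
    """One mean/sigma step: slope A and intercept B placing focal-group
    difficulty (b) estimates on the reference-group metric."""
    A = statistics.stdev(b_ref) / statistics.stdev(b_focal)
    B = statistics.mean(b_ref) - A * statistics.mean(b_focal)
    return A, B

def iterative_link(b_ref, b_focal, threshold=0.5, max_iter=20):
    """Relink using only items whose transformed b difference is below the
    threshold, until the retained set stabilizes.  Assumes at least two
    items always survive flagging (a sketch, not production code)."""
    keep = list(range(len(b_ref)))
    for _ in range(max_iter):
        A, B = mean_sigma_link([b_ref[i] for i in keep],
                               [b_focal[i] for i in keep])
        new_keep = [i for i in range(len(b_ref))
                    if abs((A * b_focal[i] + B) - b_ref[i]) < threshold]
        if new_keep == keep:
            break
        keep = new_keep
    return A, B, keep
```

The point of iterating is that biased items contaminate the initial linking constants; dropping flagged items and relinking lets the metric transformation converge toward one based only on unbiased items.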
In addition, a combination of methods, involving cycles of iterative linking followed by ability scale purification, was found to be even more effective than iterative linking alone. This combination of procedures totally eliminated false-positive misidentifications for the most pervasive item bias condition, and false-negative misidentifications were also reduced. Combining iterative linking with ability scale purification appears to be a viable method for analyzing multidimensional IRT data with unidimensional IRT item-bias methods.
Index terms: ability scale purification, item bias, item response theory, iterative linking, iterative methods, metric linking, multidimensional IRT model.

Item: Individual differences in unfolding preference data: A restricted latent class approach (1990). Böckenholt, Ulf; Böckenholt, Ingo.
A latent class scaling approach is presented for modeling paired comparison and "pick any/t" data obtained in a preference study. Whereas the latent class part of the model identifies homogeneous subgroups that are characterized by their choice probabilities for a set of alternatives, the scaling part of the model describes the single-peakedness structure of the choice data. Procedures are suggested for examining the unfolding structure in an unrestricted latent class solution. Two applications are presented to illustrate the technique. In the first application, scaling solutions obtained from a latent class scaling model and a marginal maximum likelihood latent trait model are compared.
Index terms: latent class analysis, paired comparison data, pick any/t data, unfolding models.

Item: Longitudinal factor score estimation using the Kalman filter (1990). Oud, Johan H.; Van den Bercken, John H.; Essers, Raymond J.
The advantages of the Kalman filter as a factor score estimator in the presence of longitudinal data are described.
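For readers unfamiliar with the Kalman filter, the scalar case conveys the core predict/update recursion. The state-space names below are generic textbook notation, not the authors' LISREL-based specification.

```python
def kalman_predict(x, p, a, q):
    """Time update for the state equation x_t = a * x_{t-1} + w, Var(w) = q."""
    return a * x, a * a * p + q

def kalman_update(x_pred, p_pred, y, c, r):
    """Measurement update for y_t = c * x_t + v, Var(v) = r: blend the
    prediction with the new observation in proportion to the Kalman gain."""
    k = p_pred * c / (c * c * p_pred + r)   # Kalman gain
    x_new = x_pred + k * (y - c * x_pred)   # filtered state (factor score)
    p_new = (1.0 - k * c) * p_pred          # filtered variance
    return x_new, p_new

# Equal prior and measurement variance -> the estimate lands halfway.
print(kalman_update(0.0, 1.0, 1.0, 1.0, 1.0))  # (0.5, 0.5)
```

In the factor score setting, the time update carries the latent score forward through the dynamic model and the measurement update corrects it with each new wave of indicators, which is what makes the filter attractive for longitudinal data.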
Because the Kalman filter presupposes the availability of a dynamic state space model, the state space model is reviewed first and shown to be translatable into the LISREL model. Several extensions of the LISREL model specification are discussed in order to enhance the applicability of the Kalman filter to behavioral research data. The Kalman filter and its main properties are summarized. Relationships are shown between the Kalman filter and two well-known cross-sectional factor score estimators: the regression estimator and the Bartlett estimator. The indeterminacy problem of factor scores is also discussed in the context of Kalman filtering, and the differences are described between Kalman filtering on the basis of a zero-means and a structured-means LISREL model. By using a structured-means LISREL model, the Kalman filter is capable of estimating absolute latent developmental curves. An educational research example is presented.
Index terms: factor score estimation, indeterminacy of factor scores, Kalman filter, LISREL, longitudinal LISREL modeling, longitudinal factor analysis, state space modeling.

Item: A method for the age standardization of test scores (1990). Schagen, I. P.
A procedure is presented to generate standardized scores from raw test data that are, as far as possible, age independent and normally distributed. The model is fitted to the percentile points of the raw score distribution and assumes a linear trend of each percentile with age. The fitted slopes can be constant or can vary quadratically with the percentiles. A nonlinear transformation of the data is also possible to allow for "ceiling effects." These models are described and the methods used to fit them to test data are discussed; examples are presented of their use in standardizing tests, and the use of the diagnostic plots produced by the program is discussed.
Index terms: age standardization, linear regression, nonlinear regression, nonparallel regression, parallel linear regression, percentiles, score transformation.

Item: On the construct validity of multiple-choice items for reading comprehension (1990). Van den Bergh, Huub.
In this study, 590 third-grade students took one of four reading comprehension tests with either multiple-choice items or open-ended items. Each student also took 32 tests indicating 16 semantic Structure-of-Intellect (SI) abilities. Four conditions or groups were distinguished on the basis of the reading comprehension tests. The four 33 × 33 correlation matrices were analyzed simultaneously with a four-group LISREL model. The 16 intellectual abilities explained approximately 62% of the variance in true reading comprehension scores. None of the SI abilities proved to be differentially related to item type. Therefore, it was concluded that item type for reading comprehension is congeneric with respect to the SI abilities measured.
Index terms: construct validity, item format, free response, reading comprehension, Structure-of-Intellect model.

Item: Problems in the measurement of latent variables in structural equations causal models (1990). Cohen, Patricia; Cohen, Jacob; Teresi, Jeanne; Marchi, Margaret L.; Velez, C. Noemi.
Some problems in the measurement of latent variables in structural equations causal models are presented, with examples from recent empirical studies. Latent variables that are theoretically the source of correlation among the empirical indicators are differentiated from unmeasured variables that are related to the empirical indicators for other reasons. It is pointed out that these should also be represented by different analytical models, and that much published research has treated this distinction as if it had no analytic consequences.
The connection between this theoretical distinction and disattenuation effects in latent variable models is shown, and problems with these estimates are discussed. Finally, recommendations are made for decisions about whether and how to measure latent variables when manifest variables are potentially available.
Index terms: causal models, disattenuation, emergent variables, latent variable measurement, latent variables, structural equations modeling.

Item: Rasch models in latent classes: An integration of two approaches to item analysis (1990). Rost, Jürgen.
A model is proposed that combines the theoretical strength of the Rasch model with the heuristic power of latent class analysis. It assumes that the Rasch model holds for all persons within a latent class, but it allows for different sets of item parameters between the latent classes. An estimation algorithm is outlined that gives conditional maximum likelihood estimates of item parameters for each class. No a priori assumption about the item order in the latent classes or the class sizes is required. Application of the model is illustrated, both for simulated data and for real data.
Index terms: conditional likelihood, EM algorithm, latent class analysis, Rasch model.

Item: The relationship of expert-system scored constrained free-response items to multiple-choice and open-ended items (1990). Bennett, Randy Elliot; Rock, Donald A.; Braun, Henry I.; Frye, Douglas; Spohrer, James C.; Soloway, Elliot.
This study examined the relationship of an expert-system scored constrained free-response item (requiring the student to debug a faulty computer program) to two other item types: (1) multiple-choice and (2) free-response (requiring production of a program). Confirmatory factor analysis was used to test the fit of a three-factor model to these data and to compare the fit of the model to three alternatives.
These models were fit using two random-half samples, one given a faulty program containing one bug and the other a program with three bugs. A single-factor model best fit the data for the sample taking the one-bug constrained free-response item, and a two-factor model fit the data somewhat better for the second sample. In addition, the factor intercorrelations showed this item type to be highly related to both the free-response and multiple-choice measures.
Index terms: artificial intelligence, constructed-response items, expert-system scoring, free-response items, open-ended items.
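As a closing illustration of the signed and unsigned areas discussed in Raju's entry above: with equal lower asymptotes, the signed area between two 3PL item response functions has the closed form (1 − c)(b₂ − b₁), which a simple numerical sketch can confirm. The parameter values and the D = 1.7 scaling below are illustrative conventions, not values from the article.

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic IRF (D = 1.7 scaling, a common convention)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def areas(item1, item2, lo=-6.0, hi=6.0, n=2000):
    """Approximate the signed and unsigned areas between two IRFs over a
    wide theta range with a midpoint rule (illustrative, not Raju's
    closed-form expressions)."""
    h = (hi - lo) / n
    signed = unsigned = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        d = p_3pl(t, *item1) - p_3pl(t, *item2)
        signed += d * h
        unsigned += abs(d) * h
    return signed, unsigned

# Items (a, b, c): equal a and c, difficulties 0.0 vs 1.0.
# Closed form predicts signed area = (1 - 0.2) * (1.0 - 0.0) = 0.8.
s, u = areas((1.0, 0.0, 0.2), (1.0, 1.0, 0.2))
```

Because the two curves here never cross, the unsigned area equals the signed area; when group-specific calibrations cross (for example, unequal discriminations), the unsigned area exceeds the absolute signed area, which is why both statistics are of interest in differential item functioning work.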