Applied Psychological Measurement, Volume 19, 1995
Persistent link for this collection: https://hdl.handle.net/11299/114835
Item: An alternative approach for IRT observed-score equating of number-correct scores (1995). Zeng, Lingjia; Kolen, Michael J.
An alternative approach for item response theory observed-score equating is described. The number-correct score distributions needed in equating are found by numerical integration over the theoretical or empirical distributions of examinees’ traits. The item response theory true-score equating method and the observed-score equating method described by Lord, in which the number-correct score distributions are summed over a sample of trait estimates, are compared in a real test example. In a computer simulation, the observed-score equating methods based on numerical integration and summation were compared using data generated from standard normal and skewed populations. The method based on numerical integration was found to be less biased, especially at the two ends of the score distribution. This method can be implemented without the need to estimate trait level for individual examinees, and it is less computationally intensive than the method based on summation. Index terms: equating, item response theory, numerical integration, observed-score equating.
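The numerical-integration step lends itself to a compact sketch. Assuming, for illustration only, a 3PL model and a standard normal trait distribution (the paper also allows empirical trait distributions), the marginal number-correct distribution is a quadrature-weighted sum of conditional score distributions, each obtainable with the standard Lord-Wingersky recursion. Function names and item parameters here are hypothetical:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at trait level theta."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def score_dist_at_theta(probs):
    """Lord-Wingersky recursion: number-correct distribution given theta."""
    dist = np.array([1.0])
    for p in probs:
        dist = np.append(dist, 0) * (1 - p) + np.append(0, dist) * p
    return dist

def marginal_score_dist(a, b, c, n_quad=41):
    """Integrate the conditional score distribution over a N(0,1) trait density."""
    nodes = np.linspace(-4, 4, n_quad)
    weights = np.exp(-0.5 * nodes**2)
    weights /= weights.sum()                  # normalized rectangle-rule weights
    dist = np.zeros(len(a) + 1)
    for theta, w in zip(nodes, weights):
        dist += w * score_dist_at_theta(p_3pl(theta, a, b, c))
    return dist

# Hypothetical three-item example; result is P(X = 0..3) and sums to 1.
a = np.array([1.0, 1.4, 0.8])
b = np.array([-0.5, 0.0, 0.7])
c = np.array([0.2, 0.2, 0.2])
print(marginal_score_dist(a, b, c))
```

Equipercentile equating would then proceed on the marginal distributions of the two forms; the quadrature scheme above is a simplification, not the authors’ implementation.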
Item: Analysis of differential item functioning in translated assessment instruments (1995). Budgell, Glen R.; Raju, Nambury S.; Quartetti, Douglas A.
The usefulness of three IRT-based methods and the Mantel-Haenszel technique in evaluating the measurement equivalence of translated assessment instruments was investigated. A 15-item numerical test and an 18-item reasoning test that were originally developed in English and then translated to French were used. The analyses were based on four groups, each containing 1,000 examinees. Two groups of English-speaking examinees were administered the English version of the tests; the other two were French-speaking examinees who were administered the French version of the tests. The percentage of items identified with significant differential item functioning (DIF) in this study was similar to findings in previous large-sample studies. The four DIF methods showed substantial consistency in identifying items with significant DIF when replicated. Suggestions for future research are provided. Index terms: area measures, differential item functioning, item response theory, language translations, Lord’s χ², Mantel-Haenszel procedure.

Item: Analyzing homogeneity and heterogeneity of change using Rasch and latent class models: A comparative and integrative approach (1995). Meiser, Thorsten; Hein-Eggers, Monika; Rompe, Pamela; Rudinger, Georg
The application of unidimensional Rasch models to longitudinal data assumes homogeneity of change over persons. Using latent class models, several classes with qualitatively distinct patterns of development can be taken into account; thus, heterogeneity of change is assumed. The mixed Rasch model integrates both the Rasch and the latent class approach by dividing the population of persons into classes that conform to Rasch models with class-specific parameters. Thus, qualitatively different patterns of change can be modeled with the homogeneity assumption retained within each class, but not between classes. In contrast to the usual latent class approach, the mixed Rasch model includes a quantitative differentiation among persons in the same class. Thus, quantitative differences in the level of the latent attribute are disentangled from the qualitative shape of development. A theoretical comparison of the formal approaches is presented here, as well as an application to empirical longitudinal data. In the context of personality development in childhood and early adolescence, the existence of different developmental trajectories is demonstrated for two aspects of personality. Relations between the latent trajectories and discrete exogenous variables are investigated. Index terms: latent class analysis, latent structure analysis, measurement of change, mixture distribution models, Rasch model, rating scale model.

Item: Complex composites: Issues that arise in combining different modes of assessment (1995). Wilson, Mark; Wang, Wen-chung
Data from the California Learning Assessment System are used to examine certain characteristics of tests designed as composites of items of different modes. The characteristics include rater severity, test information, and definition of the latent variable. Three different assessment modes (multiple-choice, open-ended, and investigation items; the latter two are referred to as performance-based modes) were combined in a test across three different test forms. Rater severity was investigated by incorporating a parameter for each rater in an item response model that was then used to analyze the data. Some rater severities were found to be quite extreme, and the impact of this variation on both total scores and trait level estimates was examined. Within-rater variation in severity also was examined and found to be significant. The information contributions of the three modes were compared: performance-based items provided more information than multiple-choice items and provided the greatest precision at higher levels of the latent variable. A projection-like method was applied to investigate the effects of assessment mode on the definition of the latent variable. The multiple-choice items added information to the performance-based variable, and results from the projection-like method differed little in practice from those obtained when the latent trait was defined jointly by the multiple-choice and the performance-based items. Index terms: equating, linking, multiple assessment modes, polytomous item response models, rater effects.

Item: Computerized adaptive testing with polytomous items (1995). Dodd, Barbara G.; De Ayala, R. J.; Koch, William R.
Polytomous item response theory models and the research that has been conducted to investigate a variety of possible operational procedures for polytomous model-based computerized adaptive testing (CAT) are reviewed. Studies that compared polytomous CAT systems based on competing item response theory models that are appropriate for the same measurement objective, as well as applications of polytomous CAT in marketing and educational psychology, also are reviewed. Directions for future research using polytomous model-based CAT are suggested. Index terms: computerized adaptive testing, polytomous item response theory, polytomous scoring.

Item: Conceptual notes on models for discrete polytomous item responses (1995). Mellenbergh, Gideon J.
The following types of discrete item responses are distinguished: nominal-dichotomous, ordinal-dichotomous, nominal-polytomous, and ordinal-polytomous. Bock (1972) presented a model for nominal-polytomous item responses that, when applied to dichotomous responses, yields Birnbaum’s (1968) two-parameter logistic model. Applying Bock’s model to ordinal-polytomous items leads to a conceptual problem: the ordinal nature of the response variable must be preserved, which can be achieved using three different methods. A number of existing models are derived using these three methods. The structure of these models is similar, but they differ in the interpretation and qualities of their parameters. Information, parameter invariance, log-odds differences invariance, and model violation also are discussed. Information and parameter invariance of dichotomous item response theory (IRT) also apply to polytomous IRT. Specific objectivity of the Rasch model for dichotomous items is a special case of log-odds differences invariance of polytomous items. Differential item functioning of dichotomous IRT is a special case of measurement model violation of polytomous IRT. Index terms: adjacent categories, continuation ratios, cumulative probabilities, differential item functioning, log-odds differences invariance, measurement model violation, parameter invariance, polytomous IRT models.
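The three methods named in Mellenbergh’s index terms (adjacent categories, continuation ratios, cumulative probabilities) differ only in which log-odds of the ordinal response they model. A small illustration, using made-up category probabilities rather than anything from the paper:

```python
import numpy as np

# Hypothetical category probabilities for one item with four ordered
# categories (0..3); any valid probability vector would do.
p = np.array([0.10, 0.30, 0.40, 0.20])
p_above = np.cumsum(p[::-1])[::-1]   # p_above[k] = P(X >= k)

# 1. Adjacent categories (e.g., partial credit family): category k+1 vs. k.
adjacent = np.log(p[1:] / p[:-1])

# 2. Continuation ratios (e.g., sequential models): above k vs. exactly k.
continuation = np.log(p_above[1:] / p[:-1])

# 3. Cumulative probabilities (e.g., graded response family): >= k vs. < k.
cumulative = np.log(p_above[1:] / (1 - p_above[1:]))

print(adjacent, continuation, cumulative)
```

Each choice of log-odds yields a different family of polytomous models with the structural similarities, but distinct parameter interpretations, that the abstract describes.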
Item: DIF assessment for polytomously scored items: A framework for classification and evaluation (1995). Potenza, Maria T.; Dorans, Neil J.
Increased use of alternatives to the traditional dichotomously scored multiple-choice item yields complex responses that require complex scoring rules. Some of these new item types can be polytomously scored. DIF methodology is well-defined for traditional dichotomously scored multiple-choice items. This paper provides a classification scheme of DIF procedures for dichotomously scored items that is applicable to new DIF procedures for polytomously scored items. In the process, a formal development of a polytomous version of a dichotomous DIF technique is presented. Several polytomous DIF techniques are evaluated in terms of statistical and practical criteria. Index terms: DIF methodology, differential item functioning, item bias, polytomous scoring, statistical criteria for differential item functioning.

Item: Distinctive and incompatible properties of two common classes of IRT models for graded responses (1995). Andrich, David
Two classes of models for graded responses, the first based on the work of Thurstone and the second based on the work of Rasch, are juxtaposed and shown to satisfy important, but mutually incompatible, criteria and to reflect different response processes. Specifically, in the Thurstone models, if adjacent categories are joined to form a new category, either before or after the data are collected, then the probability of a response in the new category is the sum of the probabilities of the responses in the original categories. However, the model does not have the explicit property that if the categories are so joined, the estimate of the location of the entity or object being measured is invariant before and after the joining. For the Rasch models, if a pair of adjacent categories is joined before the data are collected, the estimate of the location of the entity is the same before and after the joining, but the probability of a response in the new category is not the sum of the probabilities of the responses in the original categories. Furthermore, if data satisfy the model and the categories are joined after the data are collected, then the data no longer satisfy the same Rasch model with the smaller number of categories. These differences imply that the choice between these two classes of models for graded responses is not simply a matter of preference; they also permit a better understanding of the choice of models for graded response data as a function of the underlying processes they are intended to represent. Index terms: graded responses, joining assumption, polytomous IRT models, Rasch model, Thurstone model.
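The joining property can be checked numerically. The schematic sketch below, with hypothetical threshold values and logistic boundary curves, joins two middle categories by dropping the threshold between them and compares category probabilities before and after; it is an illustration of the contrast, not Andrich’s derivation:

```python
import numpy as np

def thurstone_probs(theta, thresholds):
    """Cumulative-boundary (Thurstone-type) category probabilities."""
    above = 1 / (1 + np.exp(-(theta - np.asarray(thresholds))))  # P(X >= k+1)
    bounds = np.concatenate(([1.0], above, [0.0]))
    return bounds[:-1] - bounds[1:]

def rasch_probs(theta, thresholds):
    """Rasch (adjacent-category) model category probabilities."""
    num = np.exp(np.concatenate(([0.0], np.cumsum(theta - np.asarray(thresholds)))))
    return num / num.sum()

theta, taus = 0.5, [-1.0, 0.0, 1.5]          # hypothetical person and thresholds
for f in (thurstone_probs, rasch_probs):
    before = f(theta, taus)
    after = f(theta, [taus[0], taus[2]])     # categories 1 and 2 joined
    print(f.__name__, before[1] + before[2], after[1])
# Under the Thurstone-type model the two numbers match exactly;
# under the Rasch model they generally do not.
```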
Item: The distribution of person fit using true and estimated person parameters (1995). Nering, Michael L.
A variety of methods have been developed to determine the extent to which a person’s response vector fits an item response theory model. These person-fit methods are statistical methods that allow researchers to identify nonfitting response vectors. The most promising method has been the lz statistic, a standardized person-fit index. Reise & Due (1991) concluded that under the null condition (i.e., when data were simulated to fit the model) lz performed reasonably well. The present study extended the findings of past researchers (e.g., Drasgow, Levine, & McLaughlin, 1987; Molenaar & Hoijtink, 1990; Reise & Due, 1991). Results show that lz may not perform as expected when estimated person parameters (θ̂) are used rather than true θ. This study also examined the influence of the pseudo-guessing parameter, the method used to identify nonfitting response vectors, and the method used to estimate θ. When θ was better estimated, lz was more normally distributed, and the false positive rate for a single cut score did not characterize the distribution of lz. Changing the c parameter from .20 to 0.0 did not improve the normality of the lz distribution. Index terms: appropriateness measurement, Bayesian estimation, item response theory, maximum likelihood estimation, person fit.
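For reference, lz standardizes the log-likelihood of a response vector against its model-implied mean and variance. A minimal sketch under an assumed 3PL model, with hypothetical item parameters; note that in practice theta is an estimate, which is exactly the complication the study examines:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL response probabilities."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def lz(responses, theta, a, b, c):
    """Standardized log-likelihood person-fit statistic l_z."""
    p = p_3pl(theta, a, b, c)
    q = 1 - p
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(q))
    expected = np.sum(p * np.log(p) + q * np.log(q))
    variance = np.sum(p * q * np.log(p / q) ** 2)
    return (l0 - expected) / np.sqrt(variance)

# Made-up four-item example.
a = np.array([1.0, 1.3, 0.8, 1.1])
b = np.array([-1.0, -0.2, 0.4, 1.2])
c = np.full(4, 0.2)
u = np.array([1, 1, 0, 0])
print(lz(u, theta=0.0, a=a, b=b, c=c))
```

Large negative values of lz flag response vectors that are less likely than the model predicts.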
Item: The effects of correlated errors on generalizability and dependability coefficients (1995). Bost, James E.
This study investigated the effects of correlated errors on the person × occasion design, in which the confounding effect of equal time intervals results in correlated error terms in the linear model. Two specific error correlation structures were examined: the first-order stationary autoregressive (SAR1), and the first-order nonstationary autoregressive (NAR1) with increasing variance parameters. The effects of correlated errors on the existing generalizability and dependability coefficients were assessed by simulating data with known variances (six different combinations of person, occasion, and error variances), occasion sizes, person sizes, correlation parameters, and increasing variance parameters. Estimates derived from the simulated data were compared to their true values. The traditional estimates were acceptable when the error terms were not correlated and the error variances were equal. The coefficients were underestimated when the errors were uncorrelated with increasing error variances. However, when the errors were correlated with equal variances, the traditional formulas overestimated both coefficients. When the errors were correlated with increasing variances, the traditional formulas both overestimated and underestimated the coefficients. Finally, increasing the number of occasions sampled improved the generalizability coefficient estimates more than the dependability coefficient estimates. Index terms: changing error variances, computer simulation, correlated errors, dependability coefficients, generalizability coefficients.

Item: Effects of differing item parameters on closed-interval DIF statistics (1995). Feinstein, Zachary S.
The closed-interval signed area (CSA) and closed-interval unsigned area (CUA) statistics were studied by Monte Carlo simulation to detect differential item functioning when the reference and focal groups had different parameter distributions. When the pseudo-guessing parameter was varied, the CSA was better able than the CUA to detect moderate to large differences between the groups. However, the effect of the pseudo-guessing parameter varied depending on item discriminations. Index terms: closed-interval measures, differential item functioning, item response theory, monte carlo simulation, signed area measures, unsigned area measures.
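The two area statistics differ only in whether differences between the groups’ item response functions are allowed to cancel across the interval. A sketch of the computation with hypothetical 3PL parameters and simple trapezoidal integration (the paper’s exact interval and integration method may differ):

```python
import numpy as np

def p_3pl(theta, a, b, c):
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def closed_interval_areas(ref, foc, lo=-3.0, hi=3.0, n=601):
    """Signed (CSA) and unsigned (CUA) areas between the reference and
    focal item response functions over a closed theta interval."""
    theta = np.linspace(lo, hi, n)
    diff = p_3pl(theta, *ref) - p_3pl(theta, *foc)
    csa = np.trapz(diff, theta)           # positive and negative parts cancel
    cua = np.trapz(np.abs(diff), theta)   # no cancellation
    return csa, cua

# Hypothetical (a, b, c) parameters for one item in each group.
print(closed_interval_areas(ref=(1.2, 0.0, 0.20), foc=(1.2, 0.5, 0.15)))
```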
Item: Fitting polytomous item response theory models to multiple-choice tests (1995). Drasgow, Fritz; Levine, Michael V.; Tsien, Sherman; Williams, Bruce; Mead, Alan D.
This study examined how well current software implementations of four polytomous item response theory models fit several multiple-choice tests. The models were Bock’s (1972) nominal model, Samejima’s (1979) multiple-choice Model C, Thissen & Steinberg’s (1984) multiple-choice model, and Levine’s (1993) maximum-likelihood formula scoring model. The parameters of the first three of these models were estimated with Thissen’s (1986) MULTILOG computer program; Williams & Levine’s (1993) FORSCORE program was used for Levine’s model. Tests from the Armed Services Vocational Aptitude Battery, the Scholastic Aptitude Test, and the American College Test Assessment were analyzed. The models were fit in estimation samples of approximately 3,000; cross-validation samples of approximately 3,000 were used to evaluate goodness of fit. Both fit plots and χ² statistics were used to determine the adequacy of fit. Bock’s model provided surprisingly good fit; adding parameters to the nominal model did not yield improvements in fit. FORSCORE provided generally good fit for Levine’s nonparametric model across all tests. Index terms: Bock’s nominal model, FORSCORE, maximum likelihood formula scoring, MULTILOG, polytomous IRT.

Item: Full-information factor analysis for polytomous item responses (1995). Muraki, Eiji; Carlson, James E.
A full-information item factor analysis model for multidimensional, polytomously scored item response data is developed as an extension of previous work by several authors. The model is expressed both in factor-analytic and item response theory parameters. Reckase’s multidimensional parameters for the model are discussed, as well as the related geometry. An EM algorithm for estimation of the model parameters is presented, and results of the analysis of item response data by a computer program incorporating this algorithm are presented. Index terms: EM algorithm, full-information item factor analysis, multidimensional item response theory, polytomous response data.

Item: Hyperbolic cosine latent trait models for unfolding direct responses and pairwise preferences (1995). Andrich, David
The hyperbolic cosine unfolding model for direct responses of persons to individual stimuli is elaborated in three ways. First, the parameter of the stimulus, which reflects a region within which people are more likely to respond positively than negatively, is shown to be a property of the data and not arbitrary as first supposed. Second, the model is used to construct a related model for pairwise preferences. This model, for which joint maximum likelihood estimates are derived, satisfies strong stochastic transitivity. Third, the role of substantive theory in evaluating the fit between the data and the models, in which unique solutions for the estimates are not guaranteed, is explored by analyzing responses of one group of persons to a single set of stimuli obtained both as direct responses and as pairwise preferences. Index terms: direct responses, hyperbolic cosine model, item response theory, latent trait models, pair comparisons, pairwise preferences, unfolding models.

Item: Introduction to the Polytomous IRT Special Issue (1995). Drasgow, Fritz

Item: IRT-based internal measures of differential functioning of items and tests (1995). Raju, Nambury S.; Van der Linden, Wim J.; Fleer, Paul F.
Internal measures of differential functioning of items and tests (DFIT) based on item response theory (IRT) are proposed. Within the DFIT context, the new differential test functioning (DTF) index leads to two new measures of differential item functioning (DIF) with the following properties: (1) the compensatory DIF (CDIF) indexes for all items in a test sum to the DTF index for that test, and, unlike current DIF procedures, the CDIF index for an item does not assume that the other items in the test are unbiased; (2) the noncompensatory DIF (NCDIF) index, which assumes that the other items in the test are unbiased, is comparable to some of the IRT-based DIF indexes; and (3) CDIF and NCDIF, as well as DTF, are equally valid for polytomous and multidimensional IRT models. Monte Carlo study results, comparing these indexes with Lord’s χ² test, the signed area measure, and the unsigned area measure, demonstrate that the DFIT framework is accurate in assessing DTF, CDIF, and NCDIF. Index terms: area measures of DIF, compensatory DIF, differential functioning of items and tests (DFIT), differential item functioning, differential test functioning, Lord’s χ², noncompensatory DIF, nonuniform DIF, uniform DIF.
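The additivity property in (1) is easy to see in a Monte Carlo sketch. With d_i the per-item difference between focal- and reference-group response functions evaluated over a focal-group trait sample, and D = Σ d_i the test-level difference, NCDIF_i = E[d_i²], CDIF_i = Cov(d_i, D) + μ_{d_i} μ_D, and DTF = E[D²] = Σ CDIF_i. The model choice, parameters, and sample below are hypothetical, not the paper’s study design:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probabilities for a vector of examinees by a vector of items."""
    return c + (1 - c) / (1 + np.exp(-a * np.subtract.outer(theta, b)))

def dfit(theta_focal, ref, foc):
    """Monte Carlo versions of NCDIF, CDIF, and DTF over a focal-group
    theta sample; ref and foc are (a, b, c) parameter arrays per group."""
    d = p_3pl(theta_focal, *foc) - p_3pl(theta_focal, *ref)   # per-item differences
    D = d.sum(axis=1)                                         # test-level difference
    ncdif = (d ** 2).mean(axis=0)
    cov = ((d - d.mean(0)) * (D - D.mean())[:, None]).mean(0)
    cdif = cov + d.mean(0) * D.mean()
    dtf = (D ** 2).mean()
    return ncdif, cdif, dtf          # cdif.sum() reproduces dtf

# Hypothetical two-item example with DIF in the first item only.
theta = np.random.default_rng(1).normal(size=5000)
ref = (np.array([1.0, 1.2]), np.array([0.0, 0.5]), np.array([0.2, 0.2]))
foc = (np.array([1.0, 1.2]), np.array([0.3, 0.5]), np.array([0.2, 0.2]))
ncdif, cdif, dtf = dfit(theta, ref, foc)
print(ncdif, cdif, dtf, np.isclose(cdif.sum(), dtf))
```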
Item: Item response theory for scores on tests including polytomous items with ordered responses (1995). Thissen, David; Pommerich, Mary; Billeaud, Kathleen; Williams, Valerie S. L.
Item response theory (IRT) provides procedures for scoring tests including any combination of rated constructed-response and keyed multiple-choice items, in that each response pattern is associated with some modal or expected a posteriori estimate of trait level. However, various considerations that frequently arise in large-scale testing make response-pattern scoring an undesirable solution. Methods are described, based on IRT, that provide scaled scores, or estimates of trait level, for each summed score for rated responses, or for combinations of rated responses and multiple-choice items. These methods may be used to combine the useful scale properties of IRT-based scores with the practical virtues of a scale based on a summed score for each examinee. Index terms: graded response model, item response theory, ordered responses, polytomous models, scaled scores.

Item: A minimum χ² method for equating tests under the graded response model (1995). Kim, Seock-Ho; Cohen, Allan S.
The minimum χ² method for computing equating coefficients for tests with dichotomously scored items was extended to the case of Samejima’s graded response items. The minimum χ² method was compared with the test response function method (also referred to as the test characteristic curve method), in which the equating coefficients are obtained by matching the test response functions of the two tests. The minimum χ² method was much less demanding computationally and yielded equating coefficients that differed little from those obtained using the test response function approach. Index terms: equating, graded response model, item response theory, minimum χ² method, test response function method.

Item: The optimal degree of smoothing in equipercentile equating with postsmoothing (1995). Zeng, Lingjia
The effects of different degrees of smoothing on the results of equipercentile equating in the random groups design were investigated using a postsmoothing method based on cubic splines. A computer-based procedure for selecting a desirable degree of smoothing was introduced, based on two criteria: (1) that the equating function is reasonably smooth, as evaluated by the second derivatives of the cubic spline functions, and (2) that the equated score distribution is close to that of the old form. The equating functions obtained by smoothing the equipercentile equivalents with a fixed degree of smoothing and with a degree selected by the computer-based procedure were evaluated in computer simulations for four tests. The results suggest that no single fixed degree of smoothing was optimal in all situations. The degrees of smoothing selected by the computer-based procedure were better than the best fixed degrees of smoothing for two of the four tests studied; for one of the other two tests, the degrees selected by the computer-based procedure performed better than, or nearly as well as, the best fixed degrees. Index terms: computer simulation, cubic spline, equating, equipercentile equating, smoothing.

Item: Pairwise parameter estimation in Rasch models (1995). Zwinderman, Aeilko H.
Rasch model item parameters can be estimated consistently with a pseudo-likelihood method based on comparing responses to pairs of items, irrespective of the other items. The pseudo-likelihood method is comparable to Fischer’s (1974) Minchi method. A simulation study found that the pseudo-likelihood estimates and their (estimated) standard errors were comparable to conditional and marginal maximum likelihood estimates. The method is extended to estimate parameters of the linear logistic test model, allowing the design matrix to vary between persons. Index terms: item parameter estimation, linear logistic test model, Minchi estimation, pseudo-likelihood, Rasch model.
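The pairwise idea can be sketched for the dichotomous Rasch model: conditional on a person answering exactly one item of a pair (i, j) correctly, the probability that it was item i is 1 / (1 + exp(beta_i - beta_j)), which depends only on the difficulty difference and not on the person. The pseudo-likelihood multiplies these conditional probabilities over all pairs and persons. The sketch below, with scipy-based optimization and the identification constraint beta_1 = 0, is an illustration of the technique, not Zwinderman’s implementation:

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def pairwise_logpl(beta, X):
    """Pseudo-log-likelihood over item pairs with exactly one correct response."""
    ll = 0.0
    for i, j in combinations(range(X.shape[1]), 2):
        pair = X[:, i] + X[:, j] == 1          # informative persons for (i, j)
        n_i = X[pair, i].sum()                 # item i correct, item j wrong
        n_j = pair.sum() - n_i
        eta = beta[j] - beta[i]                # log-odds that item i was correct
        ll += n_i * eta - (n_i + n_j) * np.log1p(np.exp(eta))
    return ll

def estimate_difficulties(X):
    """Maximize the pseudo-likelihood with the first difficulty fixed at 0."""
    k = X.shape[1]
    obj = lambda free: -pairwise_logpl(np.concatenate(([0.0], free)), X)
    res = minimize(obj, np.zeros(k - 1), method="BFGS")
    return np.concatenate(([0.0], res.x))

# Quick check on simulated Rasch data with known difficulties.
rng = np.random.default_rng(7)
true_beta = np.array([0.0, -0.5, 0.5, 1.0])
theta = rng.normal(size=2000)
P = 1 / (1 + np.exp(-(theta[:, None] - true_beta)))
X = (rng.random(P.shape) < P).astype(int)
print(estimate_difficulties(X))   # should be close to true_beta
```

Because each pair is conditioned on its own subtotal, person parameters drop out of every factor, which is what makes the estimates consistent without estimating theta.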