Applied Psychological Measurement, Volume 15, 1991
Persistent link for this collection: https://hdl.handle.net/11299/103307
Browsing Applied Psychological Measurement, Volume 15, 1991 by Issue Date
Now showing 1 - 20 of 32
Confirmatory factor analyses of multitrait-multimethod data: A comparison of alternative models (1991)
Marsh, Herbert W.; Bailey, Michael
Alternative models for confirmatory factor analysis of multitrait-multimethod (MTMM) data were evaluated by varying the number of traits and methods and sample size for 255 MTMM matrices constructed from real data (Study 1), and for 180 MTMM matrices constructed from simulated data (Study 2). The correlated uniqueness model converged to proper solutions for 99% (Study 1) and 96% (Study 2) of the MTMM matrices, whereas the general model typically used converged to proper solutions for only 24% (Study 1) and 22% (Study 2) of the MTMM matrices. The general model was usually ill-defined (100% in Study 1, 90% in Study 2) for small MTMM matrices with small Ns, but performed better when the size of the MTMM matrix and N were larger. Even when both models converged to proper solutions, however, parameter estimates for the correlated uniqueness model were more accurate and precise in relation to known population parameters in Study 2.
Index terms: confirmatory factor analysis, construct validity, discriminant validity, LISREL, method effects, multitrait-multimethod analysis, underidentified models.

An equal-level approach to the investigation of multitrait-multimethod matrices (1991)
Schweizer, Karl
An equal-level approach that yields new information for the evaluation of multitrait-multimethod (MTMM) matrices is described. The procedure is based on the analysis of item-composite relations, composite-composite relations, composites, and facets. A main characteristic of the equal-level approach is the induction of equality in data-level prior to carrying out comparisons between coefficients, because in many cases such inequalities may lead to inaccurate conclusions. Methods are proposed for ensuring comparability of coefficients even if an MTMM design includes different numbers of items for traits and methods.
The concept of disaggregation is assigned a key position in the investigation of convergent and discriminant validity. In addition, measures are proposed for avoiding other distortions resulting from partial self-correlations.
Index terms: disaggregated correlations, equal-level approach, multitrait-multimethod analysis, partial self-correlations, Spearman-Brown formula.

Expert-system scores for complex constructed-response quantitative items: A study of convergent validity (1991)
Bennett, Randy Elliot; Sebrechts, Marc M.; Rock, Donald A.
This study investigated the convergent validity of expert-system scores for four mathematical constructed-response item formats. A five-factor model comprised of four constructed-response format factors and a Graduate Record Examination (GRE) General Test quantitative factor was posed. Confirmatory factor analysis was used to test the fit of this model and to compare it with several alternatives. The five-factor model fit well, although a solution comprised of two highly correlated dimensions (GRE-quantitative and constructed-response) represented the data almost as well. These results extend the meaning of the expert system’s constructed-response scores by relating them to a well-established quantitative measure and by indicating that they signify the same underlying proficiency across item formats.
Index terms: automatic scoring, constructed response, expert system, free-response items, open-ended items.

Coefficients for interrater agreement (1991)
Zegers, Frits E.
The degree of agreement between two raters who rate a number of objects on a certain characteristic can be expressed by means of an association coefficient (e.g., the product-moment correlation). A large number of association coefficients have been proposed, many of which belong to the class of Euclidean coefficients (ECs).
A discussion of desirable properties of ECs demonstrates how the identity coefficient and its generalizations, which constitute a family of ECs, can be used to assess interrater agreement. This family of ECs contains coefficients for both nominal and non-nominal (ordinal and metric) data. In particular, it is pointed out which information contained in the data is accounted for by the various coefficients and which information is ignored.
Index terms: association coefficients, correlation, Euclidean coefficients, generalized identity coefficients, interrater agreement.

The relationship of power of statistical tests to range of talent: A correction and amplification (1991)
Humphreys, Lloyd G.

Item-rest regressions, item response functions, and the relation between test forms (1991)
De Gruijter, Dato N. M.; De Jong, John H. A. L.
Levine (1982) used item-rest regressions for the estimation of item parameters, and this relationship was exploited in this research in the context of vertical equating. Results from a simulation and an empirical dataset were used to demonstrate that item-rest regressions were useful in verifying the relationship between two tests obtained from item parameter estimates. It is shown that in vertical equating designs the Rasch model cannot replicate the relationship between tests at the lower score levels when guessing is present. At higher score levels, however, the correct transformation function can be estimated, irrespective of the IRT model used.
Index terms: equating, guessing parameter, item response functions, item-rest regression, Rasch model.

An investigation of ordinal true score test theory (1991)
Donoghue, John R.; Cliff, Norman
The validity of the assumptions underlying Cliff’s (1989) ordinal true score theory (OTST) was investigated in a three-stage study. OTST makes only ordinal assumptions about the data, and provides a means of converting ordinal item information into summary ordinal information about examinees.
Stage 1 was a simulation based on a classical (weak true score) test theory model. Stage 2 used a long empirical test to approximate the true order. Stage 3 was an extensive simulation based on the three-parameter logistic model. The results of all three studies were consistent; the assumption of local ordinal uncorrelatedness was violated in that partial item-item gamma (γ) correlations were positive instead of 0. The assumption of proportional distribution of ties was violated: pairs tied on one item were not distributed on the other as prescribed. The item-true order tau (τ) correlation was consistently overestimated, although the estimated τ correlated highly with the true τ. The τ correlation between total score and true order was also consistently overestimated. Stage 3 showed that these effects occurred under all conditions, although they were smaller under some conditions.
Index terms: classical test theory, item response models, local independence, monte carlo simulation, nonparametric test models, ordinal regression, ordinal test models, test theory.

Influence of the criterion variable on the identification of differentially functioning test items using the Mantel-Haenszel statistic (1991)
Clauser, Brian E.; Mazor, Kathleen; Hambleton, Ronald K.
This study investigated the effectiveness of the Mantel-Haenszel (MH) statistic in detecting differentially functioning (DIF) test items when the internal criterion was varied. Using a dataset from a statewide administration of a life skills examination, a sample of 1,000 Anglo-American and 1,000 Native American examinee item response sets were analyzed. The MH procedure was first applied to all the items involved. The items were then categorized as belonging to one or more of four subtests based on the skills or knowledge needed to select the correct response. Each subtest was then analyzed as a separate test, using the MH procedure.
Three control subtests were also established using random assignment of test items and were analyzed using the MH procedure. The results revealed that the choice of criterion, total test score versus subtest score, had a substantial influence on the classification of items as to whether or not they were differentially functioning in the Anglo-American and Native American groups. Evidence for the convergence of judgmental and statistical procedures was found in the unusually high proportion of DIF items within one of the classifications and in the results of the reanalysis of this group of items.
Index terms: differential item functioning, item bias, Mantel-Haenszel statistic, test bias.

Appropriateness measurement for some multidimensional test batteries (1991)
Drasgow, Fritz; Levine, Michael V.; McLaughlin, Mary E.
Model-based methods for the detection of individuals inadequately measured by a test have generally been limited to unidimensional tests. Extensions of unidimensional appropriateness indices are developed here for multi-unidimensional tests (i.e., multidimensional tests composed of unidimensional subtests). Simulated and real data were used to evaluate the effectiveness of the multitest appropriateness indices. Very high rates of detection of spuriously high and spuriously low response patterns were obtained with the simulated data. These detection rates were comparable to rates obtained for long unidimensional tests (both simulated and real) with approximately the same number of items. For real data, similarly high detection rates were obtained in the spuriously high condition; slightly lower detection rates were observed for the spuriously low condition. Several directions for future research are described.
Index terms: appropriateness measurement, item response theory, multidimensional tests, optimal appropriateness measurement, polychotomous measurement.

Three approaches to determining the dimensionality of binary items (1991)
Roznowski, Mary; Tucker, Ledyard R.; Humphreys, Lloyd G.
A monte carlo investigation of three approaches to assessing the dimensionality of binary items used a population model that allowed sampling of items and examinees and provided for variation and control of important parameters. The model was realistic of performance of binary items in current tests of cognitive abilities. Three indices were investigated: one based on the property of local independence of unidimensional tests (the independence index), one based on patterns of second factor loadings derived from simplex theory (the pattern index), and one that reflects the shape of the curve of successive eigenvalues (the ratio of differences index). The last index was used for matrices of phi coefficients, tetrachoric correlations, and variances-covariances. The local independence index reported here was the most accurate dimensionality index. The pattern index was accurate under many combinations of parameters, but decreased substantially at the highest level of factor correlations and the widest dispersion of item difficulties. None of the eigenvalue indices produced satisfactory accuracy, except under the most favorable combinations of parameters. Nonetheless, the eigenvalues of variance-covariance matrices provided a more accurate basis for dimensionality decisions than tetrachoric correlations, which have been the statistic of choice of many investigators. Recommendations for use are also given.
Index terms: binary items, dimensionality, factor analysis, phi correlations, tetrachoric correlations.

Effects of passage and item scrambling on equating relationships (1991)
Harris, Deborah J.
This study investigated the effects of passage and item scrambling on equipercentile and item response theory equating using a random groups design. For all four tests and for both scramblings used, differences in item and examinee statistics were found to exist between all three forms used (the base form and the two scrambled forms). Up to 50% of the examinees administered a scrambled form would have received a different scale score if the base form equating, rather than the scrambled form equating, had been used to convert their number-correct scores. It is, therefore, suggested that caution be used when scrambled forms are being administered, because in applications such as that studied here, the effects of applying the equating results obtained using a base form to the number-correct scores obtained on a scrambled form can be quite substantial in terms of the numbers of examinees who would receive different scores.
Index terms: context effects, equating, item scrambling.

A generative approach to the modeling of isomorphic hidden-figure items (1991)
Bejar, Isaac I.; Yocom, Peter
A generative approach to psychometric modeling consists of encoding information about the cognitive processes and structures that underlie test performance into an item-generation algorithm in such a way that the generated items have known psychometric parameters. An important by-product of the approach is that the knowledge about the response process is tested every time a test is administered. Validation thus becomes an ongoing process rather than an occasional event. This approach is illustrated through an analysis of hidden-figure items, and is shown to be feasible.
Index terms: construct validity, generative modeling, isomorphic problems, item difficulty, spatial ability, validation.

An empirical study of the effects of small datasets and varying prior variances on item parameter estimation in BILOG (1991)
Harwell, Michael R.; Janosky, Janine E.
Long-standing difficulties in estimating item parameters in item response theory (IRT) have been addressed recently with the application of Bayesian estimation models. The potential of these methods is enhanced by their availability in the BILOG computer program. This study investigated the ability of BILOG to recover known item parameters under varying conditions. Data were simulated for a two-parameter logistic IRT model under conditions of small numbers of examinees and items, and different variances for the prior distributions of discrimination parameters. The results suggest that for samples of at least 250 examinees and 15 items, BILOG accurately recovers known parameters using the default variance. The quality of the estimation suffers for smaller numbers of examinees under the default variance, and for larger prior variances in general. This raises questions about how practitioners select a prior variance for small numbers of examinees and items.
Index terms: BILOG, item parameter estimation, item response theory, parameter recovery, prior distributions, simulation.

Appropriate moderated regression and inappropriate research strategy: A demonstration of information loss due to scale coarseness (1991)
Russell, Craig J.; Pinto, Jeffrey K.; Bobko, Philip
Paunonen and Jackson (1988) demonstrated that stepwise moderated regression provides a test of interaction effects that protects the nominal Type I error rate. However, the stepwise procedure has also been characterized as failing to detect interaction effects in empirical studies. This issue has led to questions regarding the method’s statistical power (Bobko, 1986; Zedeck, 1971) in applied research.
It is demonstrated that, because of a research strategy frequently used in empirical investigations, the probability of Type II error in detecting a true interaction effect is unknown. Specifically, the number of scale steps used in measuring the dependent variable is shown to result in a form of systematic error that can spuriously increase or decrease the expected effect size of the interaction. The problem is also discussed in the context of testing more complex models. Recommendations for eliminating this problem in future research designs are provided.
Index terms: information loss, interaction effects, Likert scales, moderated regression, response transformation.

The use of prior distributions in marginalized Bayesian item parameter estimation: A didactic (1991)
Harwell, Michael R.; Baker, Frank B.
The marginal maximum likelihood estimation (MMLE) procedure (Bock & Lieberman, 1970; Bock & Aitkin, 1981) has led to advances in the estimation of item parameters in item response theory. Mislevy (1986) extended this approach by employing the hierarchical Bayesian estimation model of Lindley and Smith (1972). Mislevy’s procedure posits prior probability distributions for both ability and item parameters, and is implemented in the PC-BILOG computer program. This paper extends the work of Harwell, Baker, and Zwarts (1988), who provided the mathematical and implementation details of MMLE in an earlier didactic paper, by encompassing Mislevy’s marginalized Bayesian estimation of item parameters. The purpose was to communicate the essential conceptual and mathematical details of Mislevy’s procedure to practitioners and to users of PC-BILOG, thus making it more accessible.
Index terms: Bayesian estimation, BILOG, item parameter estimation, item response theory.

A comparison of bivariate smoothing methods in common-item equipercentile equating (1991)
Hanson, Bradley A.
The effectiveness of smoothing the bivariate distributions of common and noncommon item scores in the frequency estimation method of common-item equipercentile equating was examined. The mean squared error of equating was computed for several equating methods and sample sizes, for two sets of population bivariate distributions of equating and nonequating item scores defined using data from a professional licensure exam. Eight equating methods were compared: five equipercentile methods and three linear methods. One of the equipercentile methods was unsmoothed equipercentile equating. Four methods of smoothed equipercentile (SEP) equating were considered: two based on log-linear models, one based on the four-parameter beta binomial model, and one based on the four-parameter beta compound binomial model. The three linear equating methods were the Tucker method, the Levine Equally Reliable method, and the Levine Unequally Reliable method. The results indicated that smoothed distributions produced more accurate equating functions than the unsmoothed distributions, even for the largest sample size. Tucker linear equating produced more accurate results than SEP equating when the systematic error introduced by assuming a linear equating function was small relative to the random error of the methods of SEP equating.
Index terms: common-item equating, equating, log-linear models, smoothing, strong true score models.

Postscript to "The reliability of a linear composite of nonequivalent subtests" (1991)
Rozeboom, William W.
Some practicality clarifications for use of the composite-reliability formula developed in Rozeboom (1989) are described. A microcomputer program is announced to implement the calculations, and a prior source of related work is identified.
Index terms: composite reliability, item weighting, nonequivalent subtests, non-homogeneous item composites.

On the efficiency of IRT models when applied to different sampling designs (1991)
Berger, Martijn P. F.
The problem of obtaining designs that result in the greatest precision of the parameter estimates is encountered in at least two situations in which item response theory (IRT) models are used. In so-called two-stage testing procedures, certain designs may be specified that match difficulty levels of test items with abilities of examinees. The advantage of such designs is that the variance of the estimated parameters can be controlled. In situations in which IRT models are applied to different groups, efficient multiple-matrix sampling designs are applicable. The choice of matrix sampling designs will also influence the variance of the estimated parameters. Heuristic arguments are given here to formulate the efficiency of a design in terms of an asymptotic generalized variance criterion, and a comparison is made of the efficiencies of several designs. It is shown that some designs may be found to be most efficient for the one- and two-parameter model, but not necessarily for the three-parameter model.
Index terms: efficiency, generalized variance, item response theory, optimal design.

The discriminating power of items that measure more than one dimension (1991)
Reckase, Mark D.; McKinley, Robert L.
Determining a correct response to many test items frequently requires more than one ability. This paper describes the characteristics of items of this type by proposing generalizations of the item response theory concepts of discrimination and information. The conceptual framework for these statistics is presented, and the formulas for the statistics are derived for the multidimensional extension of the two-parameter logistic model. Use of the statistics is demonstrated for a form of the ACT Mathematics Usage Test.
Index terms: item discrimination, item information, item response theory, multidimensional item response theory.

Reliability of ratings for multiple judges: Intraclass correlation and metric scales (1991)
Fagot, Robert F.
Scale-dependent procedures are presented for assessing the reliability of ratings for multiple judges using intraclass correlation. Scale type is defined in terms of admissible transformations, and standardizing transformations for ratio and interval scales are presented to solve the problem of adjusting ratings for "arbitrary scale factors" (unit and/or origin of the scale). The theory of meaningfulness of numerical statements is introduced and the coefficient of relational agreement (Stine, 1989b) is defined as the degree of agreement among judges, with respect to (scale-dependent) empirically meaningful relationships. Other topics discussed include the treatment of variability due to judges in relation to scale type, and the reliability of magnitude estimates in psychophysics.
Index terms: coefficient of agreement, intraclass correlation, meaningfulness, metric scales, reliability of rating scales.