# Applied Psychological Measurement, Volume 11, 1987

Persistent link for this collection: https://hdl.handle.net/11299/103298

## Browse

### Recent Submissions

#### Power and robustness in product-moment correlation (1987)

Fowler, Robert L.

The power of statistical tests based on four popular product-moment correlation coefficients was examined when relatively small samples (10 ≤ N ≤ 100) are drawn from bivariate populations of several different distributional shapes. Analytical procedures for determining theoretical power under conditions of bivariate normality are presented for the Pearson ($r_p$), Spearman ($r_s$), point-biserial ($r_{pb}$), and phi ($r_{fp}$) coefficients. A Monte Carlo study supported previous conclusions that t as a test of $H_0$: ρ = 0, with $r_p$ estimating ρ, is robust over a wide range of non-normality; however, frequent use of $r_s$ leads to greater power under identical distributional assumption violations. The proportion of power due to Type III errors was also specified both analytically and empirically, and revealed the relative invulnerability of most statistical tests to directional misinterpretation.

#### A stochastic three-way unfolding model for asymmetric binary data (1987)

DeSarbo, Wayne S.; Lehmann, Donald R.; Holbrook, Morris B.; Havlena, William J.; Gupta, Sunil

This paper presents a new stochastic three-way unfolding method designed to analyze asymmetric three-way, two-mode binary data. As in the metric three-way unfolding models presented by DeSarbo (1978) and by DeSarbo and Carroll (1980, 1981, 1985), this procedure estimates a joint space of row and column objects, as well as weights reflecting the third way of the array, such as individual differences. Unlike the traditional metric three-way unfolding model, this new methodology is based on stochastic assumptions using an underlying threshold model, generalizing the work of DeSarbo and Hoffman (1986) to three-way and asymmetric binary data. The literature concerning the spatial treatment of such binary data is reviewed.
The nonlinear probit-like model is described, as well as the maximum likelihood algorithm used to estimate its parameter values. Results of a Monte Carlo study applying this new method to synthetic datasets are presented. The new method was also applied to real data from a study concerning word (emotion) associations in consumer behavior. Possibilities for future research and applications are discussed.

#### Open-ended versus multiple-choice response formats--it does make a difference for diagnostic purposes (1987)

Birenbaum, Menucha; Tatsuoka, Kikumi K.

The purpose of the present study was to examine the effect of response format, open-ended (OE) versus multiple-choice (MC), on the diagnosis of examinee misconceptions in a procedural task. A test in fraction addition arithmetic was administered to 285 eighth-grade students, 148 of whom responded to the OE version of the test and 137 to the MC version. The two datasets were compared with respect to the underlying structure of the test, the number of different error types, and the diagnosed sources of misconception (bugs) reflected in the response patterns. The overall results indicated considerable differences between the two formats, with more favorable results for the OE format. The effect of item format on examinee responses has been studied extensively in the past decade. The equivalence of open-ended (OE) items (also known as free-response or recall items) and multiple-choice (MC) items (also known as recognition items) has been addressed by psychometricians and cognitive psychologists. From an information-processing point of view, different models for the two formats have been suggested (e.g., Bender, 1980).
The commonly held view suggests that recall items require examinees to both search for and retrieve information, whereas recognition items require them only to discriminate among the presented information.

#### Effects of variations in item step values on item and test information in the partial credit model (1987)

Dodd, Barbara G.; Koch, William R.

Simulated data were used to investigate systematically the impact of various orderings of step difficulties on the distribution of item information for the partial credit model. It was found that the distribution of information for an item was a function of (1) the range of the step difficulty values, (2) the number of step difficulties that were out of sequential order, and (3) the distance between the step values that were out of order. Also, by using relative efficiency comparisons, the relationship between the step estimates and the distribution of item information was used to demonstrate the effects of various test revisions (through the addition and/or deletion of items with specific step characteristics) on the resulting test’s precision of measurement. The usefulness of item and test information functions for specific measurement applications of the partial credit model is also discussed.

#### Methodology review: Clustering methods (1987)

Milligan, Glenn W.; Cooper, Martha C.

A review of clustering methodology is presented, with emphasis on algorithm performance and the resulting implications for applied research. After an overview of the clustering literature, the clustering process is discussed within a seven-step framework. The four major types of clustering methods can be characterized as hierarchical, partitioning, overlapping, and ordination algorithms. The validation of such algorithms refers to the problem of determining the ability of the methods to recover cluster configurations which are known to exist in the data.
Validation approaches include mathematical derivations, analyses of empirical datasets, and Monte Carlo simulation methods. Next, interpretation and inference procedures in cluster analysis are discussed. Inference procedures involve testing for significant cluster structure and the problem of determining the number of clusters in the data. The paper concludes with two sets of recommendations. One set deals with topics in clustering that would benefit from continued research into the methodology. The other set offers recommendations for applied analyses within the framework of the clustering process.

#### A generalized logistic item response model parameterizing test score inappropriateness (1987)

Strandmark, Nancy L.; Linn, Robert L.

The person response curve has been suggested as a possible model for test score inappropriateness (Lumsden, 1977, 1978; Weiss, 1973). The two-parameter person response curve proposed by Lumsden includes a person slope parameter but abandons the notion of differential item relatedness to the underlying trait. As an alternative, a generalized logistic model is considered that includes all item parameters of the three-parameter logistic model (Birnbaum, 1968). In addition to the usual person location parameter, the model has extra person parameters representing two possible characterizations of test score inappropriateness: a slope parameter indicating the degree to which a person responds differently to items of varying difficulty, and an asymptote parameter measuring a person’s proclivity to engage in effective guessing or to omit items in the presence of partial information. To assess the model’s feasibility, statistical comparisons were made between parameter estimates from data simulated according to the model and the original simulation parameters.
The results seem encouraging, but additional empirical study is needed before firm conclusions can be drawn.

#### Maximum likelihood estimation of multiple correlations and canonical correlations with categorical data (1987)

Lee, Sik-Yum; Poon, Wai-Yin

In the behavioral and social sciences, investigators frequently encounter latent continuous variables which are observable only in polytomous form. This paper considers the estimation of multiple correlations and canonical correlations for these variables. Two approaches, the maximum likelihood and the partitioned maximum likelihood, are established based on the corresponding multivariate polyserial and polychoric correlations. A simulation study was conducted to compare the various kinds of estimators.

#### Use of the log odds ratio to assess the reliability of dichotomous questionnaire data (1987)

Sprott, D. A.; Vogel-Sprott, M. D.

The use of the log odds ratio to measure test-retest reliability of dichotomous questionnaire response data is discussed. Its application is illustrated using questionnaire data on family history of problem drinking. The superiority of the log odds ratio as a measure of reliability of such data is discussed. Uninformative datasets are characterized.

#### "Technical and practical issues in equating: A discussion of four papers": Reply (1987)

Brennan, Robert L.; Kolen, Michael J.

We would like to thank Angoff (1987) for his thoughtful and extensive review of the Kolen and Brennan (1987) and Brennan and Kolen (1987) papers. His comments were very helpful to us in clarifying our thinking about a number of issues. Although we find ourselves in agreement with most of his comments, there are two issues that we believe merit further consideration: synthetic population weights and the circular equating paradigm. In retrospect, our initial discussion of these topics probably should have been more extensive.
We hope that the following reply will clarify our position with respect to these two issues.

#### Technical and practical issues in equating: A discussion of four papers (1987)

Angoff, William H.

Many of the articles on equating that have appeared during the last 35 years have been concerned with the development and exposition of alternative models of equating, their error functions, and their robustness in the face of violations of the assumptions basic to their development. The four papers presented here are somewhat different. Their purpose, generally, is to go beyond theory, to examine the implications of special problems observed in the application of equating methodology, to search for clarifications and improvements in technique, and to investigate ways in which equating methods may be applied to practical testing problems. Each paper addresses a different set of problems; the present discussion will not attempt to find common issues among them, but will consider each separately in serial order.

#### Some practical issues in equating (1987)

Brennan, Robert L.; Kolen, Michael J.

The practice of equating frequently involves not only the choice of a statistical equating procedure but also consideration of practical issues that bear upon the use and/or interpretation of equating results. In this paper, major emphasis is given to issues involved in identifying, quantifying, and (to the extent possible) eliminating various sources of error in equating. Other topics considered include content specifications and equating, equating in the context of cutting scores, reequating, and the effects of a security breach on equating.
To simplify discussion, some issues are treated from the linear equating perspective in Kolen and Brennan (1987).

#### Linear equating models for the common-item nonequivalent-populations design (1987)

Kolen, Michael J.; Brennan, Robert L.

The Tucker and Levine equally reliable linear methods for test form equating in the common-item nonequivalent-populations design are formulated in a way that promotes understanding of the methods. The formulation emphasizes population notions and is used to draw attention to the practical differences between the methods. It is shown that the Levine method weights group differences more heavily than the Tucker method. A scheme for forming a synthetic population is suggested that is intended to facilitate interpretation of equating results. A procedure for displaying form and group differences is developed that also aids interpretation.

#### The use of presmoothing and postsmoothing to increase the precision of equipercentile equating (1987)

Fairbank, Benjamin A.

The effectiveness of smoothing in reducing sample-dependent errors in equipercentile equating of short ability or achievement tests is examined. Fourteen smoothers were examined, 7 applied to the distributions of scores before equating and 7 applied to the resulting equipercentile points. The data for the study included both results of simulations and results obtained in the operational administration of a large testing program. Negative hypergeometric presmoothing was more effective than the other presmoothers. Among the postsmoothers, both orthogonal regression and cubic splines were effective, especially the latter. The use of smoothing methods must be considered in light of their costs (increases in average signed deviations) and benefits (decreases in root mean square deviations).
For many purposes, the benefits of smoothing with the negative hypergeometric may outweigh its costs.

#### Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances (1987)

Cook, Linda L.; Petersen, Nancy S.

This paper focuses on a discussion of how various equating methods are affected by (1) sampling error, (2) sample characteristics, and (3) characteristics of anchor test items. Studies that examine the effect of analytic techniques for smoothing or modeling marginal and bivariate frequency distributions on the accuracy of equipercentile equating are reviewed. A need for simulation and empirical studies designed to evaluate the effectiveness of analytic smoothing techniques for recovering the underlying distribution when sample size, test length, and distributional shape are varied is identified. Studies that examine the question of whether an equating transformation remains the same regardless of the group used to define it are also reviewed. The results of some studies suggested that this may not be a problem for forms of a homogeneous test constructed to be similar in all respects. Results of other studies indicated that examinees who take a test on different administration dates may vary in systematic ways and thus affect equating results. Finally, studies which examine the characteristics of anchor test items are reviewed. It is concluded that whenever groups differ in level and dispersion of ability, special care must be taken to assure that the anchor test is a miniature of the total test.

#### Introduction to Problems, Perspectives, and Practical Issues in Equating (1987)

Brennan, Robert L.

#### Cross-validation of the WISC-R factorial structure using three-mode principal components analysis and perfect congruence analysis (1987)

Kroonenberg, Pieter M.; Ten Berge, Jos M. F.

By using three-mode principal components analysis and perfect congruence analysis in conjunction, the factorial structure of the 11 correlation matrices of the Wechsler Intelligence Scale for Children-Revised was analyzed within a single framework. This allows a unified description showing both the strong similarities between the standardization samples and some small differences related to age. Furthermore, claims about comparability between the WISC-R factorial structure, the structures of other independently conducted studies, and those of several translations of the WISC-R were evaluated. Again the overall similarity was striking, albeit most studies showed lower explained variances. Some age effects seemed to be present here as well. The contribution of three-mode principal components analysis was found to lie primarily in the simultaneous analysis of the standardization samples, while perfect congruence analysis allowed the evaluation of the strengths and the correlations of the common WISC-R components in all studies without encountering rotation problems.

#### The correction for restriction of range and nonlinear regressions: An analytic study (1987)

Gross, Alan L.; Fleischman, Lynn E.

The effect of a nonlinear regression function on the accuracy of the restriction of range correction formula was investigated using analytic methods. Expressions were derived for the expected mean square error (EMSE) of both the correction formula and the squared correlation computed in the selected group, with respect to their use as estimators of the population relationship. The relative accuracy of these two estimators was then studied as a function of the form of the regression, the form of the marginal distribution of x scores, the strength of the relationship, sample size, and the degree of selection.
Although the relative accuracy of the correction formula was comparable for both linear and concave regression forms, the correction formula performed poorly when the regression form was convex. Further, even when the regression is linear or concave, it may not be advantageous to employ the correction formula unless the xy relationship is strong and sample size is large.

#### Component latent trait models for paragraph comprehension tests (1987)

Embretson, Susan E.; Wetzel, C. Douglas

The cognitive characteristics of paragraph comprehension items were studied by comparing models that deal with two general processing stages: text representation and response decision. The models that were compared included the propositional structure of the text (Kintsch & van Dijk, 1978), various counts of surface structure variables and word frequency (Drum et al., 1981), a taxonomy of levels of text questions (Anderson, 1972), and some new models that combine features of these models. Calibrations from the linear logistic latent trait model allowed evaluation of the impact of the cognitive variables on item responses. The results indicate that successful prediction of item difficulty is obtained from models with wide representation of both text and decision processing. This suggests that items can be screened for processing difficulty prior to being administered to examinees. However, the results also have important implications for test validity in that the two processing stages involve two different ability dimensions.

#### Lord's chi-square test of item bias with estimated and with known person parameters (1987)

McLaughlin, Mary E.; Drasgow, Fritz

Properties of Lord’s chi-square test of item bias were studied in a computer simulation. θ parameters were drawn from a standard normal distribution and responses to a 50-item test were generated using SAT-V item parameters estimated by Lord.
One hundred independent samples were generated under each of the four combinations of two sample sizes (N = 1,000 and N = 250) and two logistic models (two- and three-parameter). LOGIST was used to estimate item and person parameters simultaneously. For each of the 50 items, 50 independent chi-square tests of the equality of item parameters were calculated. Proportions of significant chi-squares were calculated over items and samples, at alpha levels of .0005, .001, .005, .01, .05, and .10. The overall proportions significant were as high as 11 times the nominal alpha level. The proportion significant for some items was as high as .32 when the nominal alpha level was .05. When person parameters were held fixed at their true values and only item parameters were estimated, the actual rejection rates were close to the nominal rates.

#### An application of the three-parameter IRT model to vertical equating (1987)

Harris, Deborah J.; Hoover, H. D.

This study examined the effectiveness of the three-parameter IRT model in vertically equating five overlapping levels of a mathematics computation test. One to four test levels were administered within intact classrooms to randomly equivalent groups of third through eighth grade students. Test characteristic curves were derived for each grade/test level combination. It was generally found that an examinee would receive a higher ability estimate if the test level administered had been calibrated on less able examinees. Practical implications for "out-of-level" and adaptive testing are discussed.