Applied Psychological Measurement, Volume 05, 1981
Persistent link for this collection: https://hdl.handle.net/11299/97276
Item: A Cautionary Note on Estimating the Reliability of a Mastery Test with the Beta-Binomial Model (1981). Wilcox, Rand R.
Based on recently published papers, it might be tempting to routinely apply the beta-binomial model to obtain a single-administration estimate of the reliability of a mastery test. Using real data, the paper illustrates two practical problems with estimating reliability in this manner: first, the model might give a poor fit to the data, which can seriously affect the reliability estimate; second, inadmissible estimates of the parameters in the beta-binomial model might be obtained. Two possible solutions are described and illustrated.
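For readers who want to see the inadmissibility problem concretely, the following is a minimal sketch (not Wilcox's own procedure) using the standard method-of-moments estimators for the beta-binomial; the reliability formula n/(n + alpha + beta) is the usual single-administration result under that model, and the score vector is hypothetical:

```python
import numpy as np

def beta_binomial_moments(scores, n_items):
    """Method-of-moments estimates of the beta-binomial parameters
    (alpha, beta) from number-correct scores on an n-item test."""
    m1 = np.mean(scores)                     # first raw moment
    m2 = np.mean(np.square(scores))          # second raw moment
    denom = n_items * (m2 / m1 - m1 - 1) + m1
    alpha = (n_items * m1 - m2) / denom
    beta = (n_items - m1) * (n_items - m2 / m1) / denom
    return alpha, beta

scores = np.array([14, 17, 19, 12, 20, 18, 16, 15, 19, 13])  # hypothetical data
n = 20
a, b = beta_binomial_moments(scores, n)
if a <= 0 or b <= 0:
    # The inadmissible case the abstract warns about: the moment estimates
    # fall outside the parameter space, e.g., when the scores are
    # underdispersed relative to a binomial distribution.
    print(f"inadmissible estimates: alpha={a:.3f}, beta={b:.3f}")
else:
    reliability = n / (n + a + b)            # single-administration estimate
    print(f"alpha={a:.3f}, beta={b:.3f}, reliability={reliability:.3f}")
```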
Item: Estimating the parameters of Emrick's mastery testing model (1981). Van der Linden, Wim J.
Emrick's model is a latent class or state model for mastery testing that entails a simple rule for separating masters from nonmasters with respect to a homogeneous domain of items. His method for estimating the model parameters has only restricted applicability inasmuch as it assumes a mixing parameter equal to .50 and an a priori known ratio of the two latent success probabilities. The maximum likelihood method is also available but yields an intractable system of estimation equations that can only be solved iteratively. The emphasis in this paper is on estimates that can be computed by hand but are nonetheless accurate enough for most practical situations. It is shown how the method of moments can be used to obtain such "quick and easy" estimates. In addition, an endpoint method is discussed that assumes that the parameters can be estimated from the tails of the sample distribution. A Monte Carlo experiment demonstrated that for a great variety of parameter values, test lengths, and sample sizes, the method of moments yields excellent results and is uniformly much better than the endpoint method.

Item: Analysis of test results via log-linear models (1981). Baker, Frank B.; Subkoviak, Michael J.
The recently developed log-linear model procedures are applied to three types of data arising in a measurement context. First, because of the historical intersection of survey methods and test norming, the log-linear model approach should have direct utility in the analysis of norm-referenced test results. Several different schemes for analyzing the homogeneity of test score distributions are presented that provide a finer analysis of such data than was previously available. Second, the analysis of a contingency table resulting from the cross-classification of students on the basis of criterion-referenced test results and instructionally related variables is presented. Third, the intersection of log-linear models and item parameter estimation procedures under latent trait theory is shown. The illustrative examples in each of these areas suggest that log-linear models can be a versatile and useful data analysis technique in a measurement context.

Item: A note on a statistical paradigm for the evaluation of cognitive structure in physics instruction (1981). Gliner, Gail S.
Data from Shavelson's (1972) study investigating the change in cognitive structure due to instruction in Newtonian mechanics were reanalyzed using the quadratic assignment (QA) approach. This application of the QA technique involves a nonparametric confirmatory procedure to evaluate whether a hypothesized structure is present in a proximity matrix representing cognitive structure. The proximity matrices in Shavelson's study were obtained from multiple-response word association tests at a pretest and at the end of each day of a 5-day instructional sequence. The Euclidean distance measure Shavelson used to evaluate change toward a representation of the instructional content, however, did not measure structural change in the proximity matrices for cognitive structure. The present reanalysis using the QA paradigm showed that cognitive structure after instruction was similar to content structure before instruction, and word association tests did not measure any subtle changes toward greater similarity to content structure. However, the QA results provided some evidence for Shavelson's contention that the experimental group's cognitive structure changed toward greater homogeneity later in instruction.

Item: Individual differences in the validity of a cognitive processing model for responses to personality inventories (1981). De Boeck, Paul
An individual difference hypothesis was tested with respect to the validity of a vector-type cognitive processing model for inventory responses. The validity index may also be considered an index of the conformity of inventory responses to the meaning structure of the items. Three short adjective and sentence type inventories were used, two of them consisting of mixed sets of items and one consisting of positive sets. Because the intercorrelations of the validities were only weakly positive, it was concluded that individual differences are minor. Furthermore, evidence was found for an effect of the order of presentation of the inventories and for an effect of the inventory composition (positive or mixed), but not for a higher validity of the model for adjective than for sentence types of inventories.

Item: Measuring equity in intimate relations (1981). Traupmann, Jane; Petersen, Robert; Utne, Mary; Hatfield, Elaine
It has been suggested that equity theory, a social psychological theory concerned with fairness in casual relationships, should be applicable to intimate relations as well. As a first step in that direction, this report describes the development of the Traupmann-Utne-Walster Equity/Inequity Scales, which measure the level of equity that intimate couples perceive in their relationships. The scales, which include items from four areas of concern for intimates (personal concerns, emotional concerns, day-to-day concerns, and opportunities gained or lost), are described, and data from two empirical studies are reported. The first study demonstrates the internal consistency reliability of the scales. The second study reports data relevant to the construct validity of the scales. Two constructs derived from equity theory, affect and satisfaction, were shown to behave in the predicted way when the Traupmann-Utne-Walster Scales are used as the measure of inequity.

Item: O-factor analysis of mood ratings (1981). Edvardsson, Bo; Vegelius, Jan
This paper demonstrates the usefulness of O-factor analysis in studying how feelings are structured and how situational factors influence feelings. Four subjects made self-ratings on several occasions using variables they had chosen themselves. Data were O-factor analyzed, and factors were interpreted in terms of feelings common to the situations which loaded on the factors. Problems and applicability of the method in psychotherapy research are discussed.

Item: A contribution to the construct validity of the Tennessee Self-Concept Scale: A confirmatory factor analysis (1981). McGuire, Beth; Tinsley, Howard E.
Non-statistical confirmatory factor analyses of the items on the Tennessee Self-Concept Scale (TSCS) were performed on samples of 678 university students and 341 male juvenile offenders to test hypotheses regarding the internal structure of the instrument. For the college sample, good confirmation of the external and internal frames of reference postulated by Fitts (1965) was obtained, but support for the internal x external cross-classification was not. No support for any of the hypotheses was found for the juvenile sample; rather, one major factor emerged. These findings are related to Super's theory of self-concept development, and implications of these findings regarding the psychometric properties of the TSCS and its use are discussed.

Item: Scaling Interpersonal Checklist items to a circular model (1981). McCormick, Clarence C.; Kavanagh, Jack A.
The items of the Interpersonal Checklist (ICL) were scaled to a circular model by two different procedures: (1) the items were sorted by two different samples into categories corresponding to the labels of the eight ICL octants, and (2) each item was scaled twice by a third sample, first on the 9-point bipolar scale Hate-Love and second on the 9-point bipolar scale Dominance-Submissiveness. The two sets of ratings were found to correlate -.09, indicating that the two postulated dimensions presumed to underlie the circular order are orthogonal. The items were then plotted in the plane formed by using the two scales as axes. The circular scale values calculated for the two sorting procedures correlated .95. The scale values obtained from the sorting procedure and those from the two-dimensional procedure correlated .89. In general, the plotted items followed a circular order from close synonymity to antonymity and back. Several gaps on the circle were found, indicating an inadequate sampling of items. Scale values were calculated for each of the eight ICL scales using the circular scale values as item weights. When these values were plotted and compared with a factor plot of the eight ICL scales, the plots were remarkably similar. Many items were found to be displaced by the scaling procedures from the placements given by the authors of the ICL. Most of these displacements were found to be related to an intensity dimension postulated by the authors of the ICL. In general, the mild/moderate items were scaled toward the Love and Dominance poles, and the strong/extreme items were scaled toward the Hate and Submissive poles, thus pulling the items away from the scales they were intended to represent. Some of the major implications of the use of these procedures in the construction of personality instruments are discussed.

Item: Solving measurement problems with an answer-until-correct scoring procedure (1981). Wilcox, Rand R.
Answer-until-correct (AUC) tests have been in use for some time. Pressey (1950) pointed to their advantages in enhancing learning, and Brown (1965) proposed a scoring procedure for AUC tests that appears to increase reliability (Gilman & Ferry, 1972; Hanna, 1975). This paper describes a new scoring procedure for AUC tests that (1) makes it possible to determine whether guessing is at random, (2) gives a measure of how "far away" guessing is from being random, (3) corrects observed test scores for partial information, and (4) yields a measure of how well an item reveals whether an examinee knows or does not know the correct response. In addition, the paper derives the optimal linear estimate (under squared-error loss) of true score that is corrected for partial information, as well as another formula score under the assumption that the Dirichlet-multinomial model holds. Once certain parameters are estimated, the latter formula score makes it possible to correct for partial information using only the examinee's usual number-correct observed score. The importance of this formula score is discussed. Finally, various statistical techniques are described that can be used to check the assumptions underlying the proposed scoring procedure.
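The abstract does not reproduce the scoring formulas, but the idea behind check (1) can be illustrated with a simple stand-in: if an examinee's first attempt is wrong, purely random guessing makes the final attempt number uniform over the remaining alternatives. A hypothetical sketch (not Wilcox's statistic; the data are invented):

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical answer-until-correct records for one 4-alternative item:
# the attempt on which each examinee finally chose the keyed response.
attempts = np.array([1, 1, 2, 1, 3, 1, 4, 2, 1, 1, 2, 1, 1, 3, 1, 2, 1, 1, 4, 2])
k = 4

# Examinees who missed on the first attempt were presumably guessing.
# If that guessing is random over the k - 1 remaining alternatives,
# the final attempt number is uniform on 2..k, which a chi-square
# goodness-of-fit test against uniform expected counts can check.
guessers = attempts[attempts >= 2]
observed = np.bincount(guessers, minlength=k + 1)[2:]   # counts of attempts 2..k
stat, p = chisquare(observed)          # default expectation is uniform
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```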
Item: Information structure for geometric analogies: A test theory approach (1981). Whitely, Susan E.; Schneider, Lisa M.
Although geometric analogies are popular items for measuring intelligence, the information processes that are involved in their solution have not been studied in a test theory context. In the current study, processing is examined by testing alternative models of information structure on geometric analogies. In contrast to the treatment of models in other studies that have appeared in the cognitive literature, the models are tested jointly as mathematical models of processing and as latent trait models of individual differences. The joint modeling was achieved by applying the one-parameter linear logistic latent trait model to predict response accuracy from information structure. The results supported the model that distinguished between spatial distortion and spatial displacement transformations, which have opposite effects on item difficulty. Further, no significant sex differences in overall accuracy or processing were observed. Implications of the results for processing mechanisms and test design are discussed.
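The linear logistic latent trait model referenced here constrains each item's Rasch difficulty to a weighted sum of its structural features, b_i = sum_k q_ik * eta_k. The following toy sketch fits that structure by joint maximum likelihood as a logistic regression on simulated data; the two feature columns (distortion and displacement counts) and all values are hypothetical, and conditional ML is the classical estimator for this model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_persons, n_items = 200, 8

# Hypothetical Q matrix: how many distortion / displacement
# transformations each analogy item involves (columns are features).
Q = rng.integers(0, 3, size=(n_items, 2)).astype(float)
eta_true = np.array([0.9, -0.4])   # distortion raises difficulty, displacement lowers it
b = Q @ eta_true                   # LLTM constraint: difficulty is linear in features
theta = rng.normal(0.0, 1.0, n_persons)

# Simulate Rasch responses: P(correct) = logistic(theta_p - b_i).
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
y = (rng.random((n_persons, n_items)) < p).astype(int).ravel()

# Joint ML fit: person dummy codes plus the negated feature columns, so
# the item effect is forced through the LLTM structure.  A huge C makes
# the ridge penalty negligible while keeping estimates finite for
# persons with perfect or zero score patterns.
person_dummies = np.kron(np.eye(n_persons), np.ones((n_items, 1)))
X = np.hstack([person_dummies, np.tile(-Q, (n_persons, 1))])
model = LogisticRegression(C=1e4, fit_intercept=False, max_iter=5000)
model.fit(X, y)
print("estimated eta:", model.coef_[0][-2:])   # rough recovery of eta_true
```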
Item: Evaluating goodness of fit in nonmetric multidimensional scaling by ALSCAL (1981). MacCallum, Robert C.
Two types of information are provided to aid users of ALSCAL in evaluating goodness of fit in nonmetric two-way and three-way multidimensional scaling analyses. First, equations are developed for estimating the expected values of SSTRESS and STRESS for random data. Second, a table is provided giving mean values of SSTRESS and STRESS for structured artificial data. This information provides the empirical investigator with a second comparative basis for evaluating values of these indices.
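The article derives expected values analytically; a brute-force stand-in is to scale random dissimilarities and record the resulting fit index. A sketch using scikit-learn's nonmetric MDS and Kruskal's STRESS-1 formula (ALSCAL's SSTRESS would substitute squared distances; this is an assumption-laden approximation, not the paper's equations):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.isotonic import IsotonicRegression
from sklearn.manifold import MDS

def kruskal_stress1(diss, X):
    """Kruskal's STRESS-1 of configuration X against dissimilarities diss."""
    d = pdist(X)                                        # configuration distances
    dhat = IsotonicRegression().fit_transform(diss, d)  # monotone disparities
    return np.sqrt(np.sum((d - dhat) ** 2) / np.sum(d ** 2))

rng = np.random.default_rng(1)
n_objects, n_dims, n_reps = 12, 2, 20
stress = []
for _ in range(n_reps):
    diss = rng.random(n_objects * (n_objects - 1) // 2)   # random data
    mds = MDS(n_components=n_dims, metric=False,
              dissimilarity="precomputed", random_state=0)
    X = mds.fit_transform(squareform(diss))
    stress.append(kruskal_stress1(diss, X))

# An observed STRESS from real data can then be judged against this
# random-data baseline, in the spirit of the article's tables.
print(f"mean STRESS-1 for random dissimilarities: {np.mean(stress):.3f}")
```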
Item: The Rasch model as a loglinear model (1981). Mellenbergh, Gideon J.; Vijn, Pieter
The Rasch model is formulated as a loglinear model. The goodness of fit and parameter estimates of the Rasch model can be obtained using the iterative proportional fitting algorithm for loglinear models. It is shown in an example that the relation between the estimates from the iterative proportional fitting algorithm and those from the unconditional maximum likelihood Rasch algorithm is almost perfectly linear. The Rasch model can be extended with a design for the items, which can be formulated as a loglinear model.
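The iterative proportional fitting algorithm the authors apply is easy to state in isolation: alternately rescale a table so its row and then column totals match target margins. A minimal two-way illustration (the paper embeds the Rasch model in a larger loglinear design; this toy shows only the algorithm itself, fitting an independence model):

```python
import numpy as np

def ipf(start, row_margins, col_margins, tol=1e-10, max_iter=500):
    """Iterative proportional fitting: rescale rows and columns in turn
    until the table reproduces the target margins."""
    fitted = start.astype(float).copy()
    for _ in range(max_iter):
        fitted *= (row_margins / fitted.sum(axis=1))[:, None]
        fitted *= (col_margins / fitted.sum(axis=0))[None, :]
        if np.allclose(fitted.sum(axis=1), row_margins, atol=tol):
            break
    return fitted

# Toy two-way table; starting from a table of ones and matching the
# observed margins fits the independence loglinear model, so the result
# equals the familiar expected counts row_total * col_total / N.
observed = np.array([[30.0, 10.0], [20.0, 40.0]])
expected = ipf(np.ones_like(observed),
               observed.sum(axis=1), observed.sum(axis=0))
print(expected)   # [[20. 20.] [30. 30.]]
```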
Item: A cross-cultural analysis of the fairness of the Cattell Culture Fair Intelligence Test using the Rasch model (1981). Nenty, H. Johnson; Dinero, Thomas E.
Logistic models can be used to estimate item parameters of a unifactor test that are free of the examinee groups used. The Rasch model was used to identify items in the Cattell Culture Fair Intelligence Test that did not conform to this model for a group of Nigerian high school students and for a group of American students, groups believed to be different with respect to race, culture, and type of schooling. For both groups a factor analysis yielded a single factor accounting for 90% of the test's variance. Although all items conformed to the Rasch model for both groups, 13 of the 46 items had significant between-score-group fit statistics in either the American or the Nigerian sample or both; these were removed from further analyses. Bias was defined as a difference in the estimation of item difficulties. There were six items biased in "favor" of the American group and five in "favor" of the Nigerian group; the remaining 22 items were not identified as biased. The American group appeared to perform better on classification of geometric forms, while the Nigerians did better on progressive matrices. It was suggested that the replicability of these findings be tested, especially across other types of stimuli.

Item: Nonmetric interactive multidimensional scaling with multiple subjects (1981). Hamer, Robert M.
A method for nonmetric interactive multidimensional scaling (MDS) of similarity judgments is described which is also capable of using responses from previous judges to supplement the judgments of a current subject. The method combines recent advances in interactive MDS with recent advances in numerical methods in MDS to produce a program capable (1) of performing nonmetric interactive MDS and (2) of fitting a wide variety of models, such as the individual differences model. The empirical investigation compared three versions of the system: (1) a metric simple Euclidean model-fitting version (similar to previous interactive scaling programs); (2) a metric individual differences version; and (3) a nonmetric individual differences version. There were no statistically significant differences among the three versions.

Item: Analogical reasoning under different methods of test administration (1981). Dillon, Ronna F.
One hundred eighty-five college undergraduates were given the Advanced Progressive Matrices under one of five conditions of testing: standard, simple feedback, examinee verbalization during problem solution, elaborated feedback, and full elaboration. The Group Embedded Figures Test, Paragraph Completion Test, and Zelniker and Jeffrey's revision of the Matching Familiar Figures Test were also administered. The study was designed (1) to investigate the differential effects of method of test administration on performance for college students and (2) to examine the relationship of individual differences dimensions and varying conditions of testing. Analysis of variance coupled with orthogonal comparisons revealed higher levels of performance under the more elaborative testing conditions. The cognitive style variables were differentially related to performance in the different testing conditions. The processing dimensions were related to performance to a higher degree under partially elaborative conditions than under either nonelaborative procedures or full elaboration. Results are discussed in terms of an activation model.

Item: The Remote Associates Test as a predictor of productivity in brainstorming groups (1981). Forbach, Gary B.; Evans, Ronald G.
Two studies investigated the validity of the Remote Associates Test (RAT) in predicting productivity in brainstorming groups. In Study 1, groups of high and low RAT scorers discussed two problems relevant to social concerns (energy conservation, rape prevention). In Study 2, Alternate Uses and Consequences problems were discussed by groups composed of heterogeneous RAT scorers. In each study the RAT was significantly related to fluency, flexibility, and originality of ideas generated by group members, with these effects appearing consistently across problems. In addition, Study 2 indicated that the RAT's relationships to creativity indices were independent of verbal intelligence. Preliminary data were also gathered regarding RAT relationships to idea generation while working individually and regarding the potential value of the Marlowe-Crowne Scale as a predictor of brainstorming productivity.

Item: Nonverbal communication tests as predictors of success in psychology and counseling (1981). Livingston, Samuel A.
Six tests of nonverbal communication skills were investigated in an attempt to improve prediction of success for psychologists and counselors. The subjects were graduate students at two different schools; the criterion variables were faculty members' judgments of the students' academic work, interpersonal relations, personal characteristics, and "predicted effectiveness" in the profession. Faculty ratings were collected several months after students were tested. One of the six nonverbal communication tests predicted faculty ratings of several characteristics at both schools. This test was uncorrelated with the Graduate Record Examinations and only weakly correlated with the Group Embedded Figures Test, as were most of the other nonverbal communication tests.

Item: The role of noncognitive measures in medical school admissions (1981). Stricker, Lawrence J.
The value of noncognitive measures in medical school admissions was assessed in light of the existing literature. These measures appear to have limited usefulness in predicting success in academic work but may be valuable in forecasting performance in clinical training and performance as a physician, as well as choice of the type of practice and its location. Noncognitive measures may also be valuable in forecasting the decisions of admissions committees; their use as moderator variables, however, is problematic. Newer personality and interest inventories, along with biographical questionnaires, are the most promising measures. Older interest inventories may have some value, but traditional personality inventories and projective techniques, as well as interviews, seem to have limited usefulness. The merit of the other measures is uncertain: letters of recommendation are probably of little use, but cognitive style tests, objective performance devices, and special adaptations of projective techniques deserve more attention. The evaluation of noncognitive measures is hampered by inadequate criteria. Distortion by examinees threatens all self-report measures but can be controlled.
Item: Factorial invariance in student ratings of instruction (1981). Bejar, Isaac I.; Doyle, Kenneth O.
The factorial invariance of student ratings of instruction across three curricular areas was investigated by means of maximum likelihood factor analysis. The results indicate that a one-factor model was not completely adequate from a statistical point of view. Nevertheless, a single factor was accepted as reasonable from a practical point of view. It was concluded that the single factor was invariant across the three curricular groups. The reliability of the single factor was essentially the same in the three groups, and in every case it was very high. Some of the theoretical and practical implications of the study are discussed.