Applied Psychological Measurement, Volume 03, 1979

Persistent link for this collection

https://hdl.handle.net/11299/97274

WISC-R information and Digit Span scores of American and Canadian children
(1979) Beauchamp, David P.; Samuels, Douglas D.; Griffore, Robert J.
This study investigated differences in performance between third-grade American and Canadian children on two subtests of the Wechsler Intelligence Scale for Children-Revised (WISC-R). The Information subtest, which is culturally laden, and the Digit Span subtest, which is less subject to cultural factors, were administered to 30 American and 30 Canadian children. Mean Information scores in the two groups were not significantly different, but Canadian children scored significantly higher on Digit Span. This difference was attributable to their higher mean score on Digit Span Forward. Canadian and American Digit Span Backward mean scores did not differ. Results are discussed in terms of Canadian and American educational and test-taking experiences.
Internal invalidity in pretest-posttest self-report evaluations and a re-evaluation of retrospective pretests
(1979) Howard, George S.; Ralph, Kenneth M.; Gulanick, Nancy A.; Maxwell, Scott E.; Nance, Don W.; Gerber, Sterling K.
True experimental designs (Designs 4, 5, and 6 of Campbell & Stanley, 1963) are thought to provide internally valid results. This paper describes five studies involving the evaluation of various treatment interventions and identifies a source of internal invalidity when self-report measures are used in a Pretest-Posttest manner. An alternative approach (Retrospective Pretest-Posttest design) to measuring change is suggested, and data comparing its accuracy with the traditional Pretest-Posttest design in measuring treatment effects are presented. Finally, the implications of these findings for evaluation research using self-report instruments and the strengths and limitations of retrospective measures are discussed.
The causal influence of anxiety on academic achievement for students of differing intellectual ability
(1979) Heinrich, Darlene L.
The present study examined the relationship between anxiety and learning within the context of drive theory and trait-state anxiety theory. It was hypothesized that trait anxiety (A-trait) would influence state anxiety (A-state), which in turn would influence academic achievement. The subjects were 86 students enrolled in a graduate education course for whom measures of A-state, A-trait, and achievement were obtained concurrently at three times during the course. GRE scores were used as measures of intellectual ability. Data were analyzed using the frequency-of-change-in-product-moment technique (Yee & Gage, 1968), a causal analysis statistic which permits the determination of source and direction of causal influence in lagged correlational data. Results showed that A-trait influenced A-state and achievement, but the relationship between A-state and achievement was ambiguous. When intellectual ability was considered, A-trait was found to influence A-state and achievement, but only for high-ability students.
Validity and cross-validity of metric and nonmetric multiple regression
(1979) MacCallum, Robert C.; Cornelius, Edwin T., III; Champney, Timothy
Several questions are raised concerning differences between traditional metric multiple regression, which assumes all variables to be measured on interval scales, and nonmetric multiple regression, which treats variables measured on any scale. Both models are applied to 30 derivation and cross-validation samples drawn from two sets of empirical data composed of ordinally scaled variables. Results indicate that the nonmetric model is, on the average, far superior in fitting derivation samples but that it exhibits much more shrinkage than the metric model. The metric technique fits better than the nonmetric in cross-validation samples. In addition, results produced by the nonmetric model are more unstable across repeated samples. A probable cause of these results is presented, and the need for further research is discussed.
Pretesting as determinant of attitude change in evaluation research
(1979) Hoogstraten, Joh
Two experiments were done to study the biasing effects of a pretest on subsequent posttest results. The problem of the first experiment was the evaluation of a programmed textbook used by psychology freshmen. It used a separate-sample pretest-posttest design and showed that a pretest containing mostly negative statements on programmed instruction confounded posttest results. The second experiment, using a different treatment, studied the pretest effects of positive or negative statements. The positive version counteracted the development of negative feelings towards the treatment. The negative version did not show a similar sensitizing effect. This was considered a consequence of the rather controversial character of the treatment and the obligatory participation of subjects. The negative statements perhaps confirmed existing attitudes. Three suggestions to control for pretest sensitization effects were given: (1) use research designs with control conditions; (2) separate the pretest phase from the posttest phase; and (3) give more emphasis to designs without pretests.
Empirical versus random item selection in the design of intelligence test short forms: The WISC-R example
(1979) Goh, David S.
This study demonstrated that the design of current intelligence test short forms could be improved by employing a more effective method of item selection based on psychometric theory. Two short forms of the recently published WISC-R were developed, one employing a design determined by empirical item analysis results of the standard test battery and the other employing the well-known Yudin scheme determined by systematic random selection of test items. In all analyses the item analysis method of item selection was shown to yield more accurate results than the Yudin procedure. Practical usefulness as well as limitations of the present WISC-R short form are discussed.
Assessing achievement motive of American and Israeli managers: Design and application of a three-facet measure
(1979) Elizur, Dov
A questionnaire to assess the presence of achievement motive in various populations was developed and its structure analyzed. A facet definition of achievement motive was suggested which provided guidelines for the creation of items and the formulation of hypotheses. The Achievement Motive Questionnaire was administered to 132 U.S. and 114 Israeli middle managers from various public and private organizations. The results support the main hypotheses. An empirical double-ordered conceptual system was obtained which reflects the two facets characteristic to this study: (1) kind of confrontation (being confronted or confronting an answer with a challenge) and (2) time perspective (before, during, or after) relative to task performance. The behavior modalities facet was found to order the conceptual space from instrumental to affective and cognitive modality. The systematic construction of the questionnaire based on the facet definition of achievement motive made it possible to distinguish differences in achievement tendencies between the U.S. and the Israeli samples.
Ordering power of separate versus grouped true-false tests: Interaction of type of test with knowledge levels of examinees
(1979) Hsu, Louis M.
The ordering power of an objective test was defined in terms of the probability that this test led to the correct ranking of examinees. A comparison of the relative ordering power of separate and grouped-items true-false (T-F) tests indicated that neither type of test was uniformly superior to the other across all levels of knowledge of examinees. Instead, separate-items T-F tests were found to be superior in discriminating among examinees with medium and high levels of knowledge, and grouped-items T-F tests with two and three items per cluster were found to be superior for discriminating among examinees with low levels of knowledge. These findings do not support blanket recommendations such as Ebel’s (1978) that "test constructors should avoid constructing items in multiple-choice form which are essentially collections of T-F statements" (p. 43) or that, in general, "it is better to present such statements as independent T-F items" (p. 43). Rather, they are similar to Lord’s (1977) findings concerning the relative efficiency of multiple-choice tests with different numbers of options per question for examinees of differing ability levels.
The Rasch model as additive conjoint measurement
(1979) Perline, Richard; Wright, Benjamin D.; Wainer, Howard
The object of this paper is to present Rasch’s psychometric model as a special case of additive conjoint measurement. The connection between these two areas has been discussed before, but largely ignored. Because the theory of conjoint measurement has been formulated deterministically, there have been some difficulties in its application. It is pointed out in this paper that the Rasch model, which is a stochastic model, does not suffer from this fault. The exposition centers on the analyses of two data sets, each of which was analyzed using Rasch scaling methods as well as some of the methods of conjoint measurement. The results, using the different procedures, are compared.
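As a minimal sketch of why the Rasch model has an additive structure, the snippet below uses the standard Rasch response function, in which the log-odds of a correct response equal person ability minus item difficulty. The ability and difficulty values are hypothetical and are not taken from the data sets analyzed in the paper.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the standard Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(p):
    """Logit transform of a probability."""
    return math.log(p / (1.0 - p))

# Hypothetical person abilities and item difficulties (illustration only).
abilities = [-1.0, 0.0, 1.5]
difficulties = [-0.5, 0.5]

# Each cell of the person-by-item table of log-odds is a row effect minus
# a column effect, which is the additive form conjoint measurement looks for.
for theta in abilities:
    row = [log_odds(rasch_probability(theta, b)) for b in difficulties]
    print(theta, [round(v, 3) for v in row])
```

On the logit scale, the difference between any two persons is the same for every item, which is one way to read the additivity the abstract refers to.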
The reliability of dichotomous judgments: Unequal numbers of judges per subject
(1979) Fleiss, Joseph L.; Cuzick, Jack
Consider a reliability study in which different subjects are judged on a dichotomous trait by different sets of judges, possibly unequal in number. A kappa-like measure of reliability is proposed, its correspondence to an intraclass correlation coefficient is pointed out, and a test for its statistical significance is presented. A numerical example is given.
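The abstract does not give the formula, so the following is only a sketch under an assumption: it uses the form of the kappa-like statistic commonly attributed to this paper, kappa = 1 - sum_i[x_i(n_i - x_i)/n_i] / [N(n_bar - 1) p_bar q_bar], where n_i is the number of judges for subject i and x_i the number of positive judgments. The function name and the example counts are hypothetical.

```python
def kappa_unequal_judges(n_judges, n_positive):
    """Kappa-like reliability for dichotomous judgments with unequal
    numbers of judges per subject (assumed form, see lead-in)."""
    if len(n_judges) != len(n_positive):
        raise ValueError("one judge count and one positive count per subject")
    n_subjects = len(n_judges)
    total_ratings = sum(n_judges)
    mean_judges = total_ratings / n_subjects
    p_bar = sum(n_positive) / total_ratings   # overall proportion positive
    q_bar = 1.0 - p_bar
    if mean_judges <= 1 or p_bar in (0.0, 1.0):
        raise ValueError("kappa is undefined for these data")
    # Within-subject disagreement, summed over subjects.
    within = sum(x * (n - x) / n for n, x in zip(n_judges, n_positive))
    return 1.0 - within / (n_subjects * (mean_judges - 1.0) * p_bar * q_bar)

# Hypothetical study: four subjects rated by 2, 3, 4, and 3 judges.
print(kappa_unequal_judges([2, 3, 4, 3], [2, 0, 3, 1]))
```

When every subject's judges agree unanimously, the within-subject term is zero and the statistic equals 1.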
Binomial test models and item difficulty
(1979) Van der Linden, Wim J.
In choosing a binomial test model, it is important to know exactly what conditions are imposed on item difficulty. In this paper these conditions are examined for both a deterministic and a stochastic conception of item responses. It appears that they are more restrictive than is generally understood and differ for both conceptions. When the binomial model is applied to a fixed examinee, the deterministic conception imposes no conditions on item difficulty but requires instead that all items have characteristic functions of the Guttman type. In contrast, the stochastic conception allows non-Guttman items but requires that all characteristic functions must intersect at the same point, which implies equal classically defined difficulty. The beta-binomial model assumes identical characteristic functions for both conceptions, and this also implies equal difficulty. Finally, the compound binomial model entails no restrictions on item difficulty.
Dimensions and clusters: A hybrid approach to classification
(1979) Skinner, Harvey A.
A hybrid strategy is described for integrating the dimensional and discrete clusters approaches to classification research. First, a parsimonious set of dimensions is sought through a multiple replications design. The computations employ a two-stage least squares solution that is based on a sequential application of the Eckart and Young (1936) decomposition. Second, relatively homogeneous subgroups are identified within this low dimensional space using a clustering or density search algorithm. To facilitate interpretation of the final solution, an ideal type concept is introduced that is similar to the "idealized individual" interpretation of multidimensional scaling. Depending upon the model chosen, the independent contribution of elevation, scatter, and shape parameters may be differentiated in defining profile similarity.
An empirical study of the accuracy of corrections for restriction in range due to explicit selection
(1979) Greener, Jack M.; Osburn, H. G.
An empirical study of the corrections for restriction in range due to explicit selection resulted in the following conclusions. (1) The corrected sample correlation was no more accurate than the uncorrected sample correlation for low unrestricted population correlations in the range .10 to .25. (2) For large unrestricted population correlations in the range .60 to .80, the corrected sample correlation was always more accurate than the uncorrected sample correlation. (3) For moderate (.30 to .55) unrestricted population correlations, the corrected sample correlation was typically more accurate than the uncorrected sample correlation. (4) The correction was very sensitive to moderate departures from linearity but was quite robust in the face of rather substantial departures from homoscedasticity.
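The abstract does not state which correction formula was studied. As a sketch, the snippet below assumes the standard correction for direct (explicit) selection on the predictor, often called Thorndike's Case 2: r_c = r u / sqrt(1 - r^2 + r^2 u^2), with u the ratio of the unrestricted to the restricted predictor standard deviation. The function name and input values are hypothetical.

```python
import math

def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Correct a correlation for explicit selection on the predictor
    (assumed Case 2 formula, see lead-in)."""
    u = sd_unrestricted / sd_restricted
    ru = r_restricted * u
    return ru / math.sqrt(1.0 - r_restricted ** 2 + ru ** 2)

# Hypothetical example: r = .30 in a selected group whose predictor SD is
# half that of the unselected applicant population.
print(round(correct_for_range_restriction(0.30, 10.0, 5.0), 4))
```

With no restriction (u = 1) the formula returns the observed correlation unchanged, which is a quick sanity check on any implementation.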
On the robustness of a class of naive estimators
(1979) Wainer, Howard; Thissen, David
A class of naive estimators of correlation was tested for robustness, accuracy, and efficiency against Pearson’s r, Tukey’s r, and Spearman’s rho. It was found that this class of estimators seems in some respects to be superior, being less affected by outliers, reasonably efficient, and frequently more easily calculated. The definition and details of the use of these naive estimators are the subject of this paper.
The hierarchical structure of formal operational tasks
(1979) Bart, William M.; Mertens, Donna M.
The hierarchical structure of the formal operational period of Piaget’s theory of cognitive development was explored through the application of ordering theoretic methods to a set of data that systematically tapped the various formal operational schemes. The results suggest that the tasks within some schemes are empirically equivalent. While the response patterns were quite varied, the results do suggest that some common structure may underlie performance on the tasks, thus supporting Piaget’s notion of the integrative structure of the period.
Scaling behavioral anchors
(1979) Landy, Frank J.; Barnes, Janet L.
Although behaviorally anchored rating scales (BARS) have both intuitive and empirical appeal, they have not always yielded superior results in contrast with graphic rating scales. The present study examined the issue of how behavioral descriptions are anchored. Subjects scaled anchors describing teaching performance in a college classroom using either a graphic rating procedure or a pair-comparison procedure. The two different methods resulted in scale anchors with different properties, particularly with respect to item dispersions. It was proposed that the choice of an anchoring procedure depends on the nature of the actual rating process.
Systematic Errors in Approximations to the Standard Error of Measurement and Reliability
(1979) Kleinke, David J.
Lord’s approximation to the standard error of measurement of a test uses only n, the number of items. Millman’s is based on n and p̄, the mean difficulty. Saupe has used Lord’s approximation to derive an approximation to the reliability. Through an empirical demonstration involving 200 classroom tests, all three approximations are shown to be biased. The Lord and Millman approximations overestimate $s_x\sqrt{1-\mathrm{KR20}}$, and thus Saupe’s underestimates $r_{xx'}$ for these tests. The unweighted mean of the tests’ mean item difficulties was .68, supporting Lord’s original warning that his approximation be used cautiously with tests that are either very difficult or very easy. Still, the approximations did correlate very highly with their criteria, supporting their continued limited use.
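To make the comparison concrete, the sketch below computes KR-20 and the corresponding standard error of measurement (the total-score standard deviation times the square root of 1 minus KR-20) from a small item-response matrix, and sets it beside an approximation that depends only on the number of items. The constant 0.432 is the value commonly quoted for Lord's approximation; it is an assumption here, not taken from the abstract, and the response data are hypothetical. Millman's and Saupe's formulas are not reproduced.

```python
import math

def kr20_and_sem(responses):
    """Return (KR-20, SEM) for a list of 0/1 response rows, one per examinee.
    Population (divide-by-N) variances are used throughout."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    # Sum of item variances p(1 - p) over items.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1.0 - p)
    kr20 = (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)
    sem = math.sqrt(var_total) * math.sqrt(1.0 - kr20)
    return kr20, sem

def lord_sem(n_items, constant=0.432):
    """Approximate SEM from test length alone (assumed constant, see lead-in)."""
    return constant * math.sqrt(n_items)

# Hypothetical 4-examinee, 3-item test.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
kr20, sem = kr20_and_sem(data)
print(round(kr20, 3), round(sem, 3), round(lord_sem(3), 3))
```

A toy matrix this small says nothing about bias; the study's conclusion rests on 200 classroom tests.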
Academic achievement and individual differences in learning processes
(1979) Schmeck, Ronald R.; Echternacht, Gary J.
This study was concerned with the degree of relationship between academic achievement, as assessed by college grade-point average, and information-processing habits relevant to learning, as assessed by the scales of the Inventory of Learning Processes (ILP). The ILP scales of Synthesis-Analysis, Fact Retention, and Elaborative Processing were significantly related to GPA and scores on the American College Testing (ACT) Program Assessment. Thus, the successful student seems to process information in depth and encode it elaboratively, while simultaneously retaining the details of the original information. Unexpectedly, the Study Methods scale demonstrated a small but significant negative relationship with ACT scores. A path analysis suggested that the effects which Fact Retention and Elaborative Processing have upon GPA are mainly direct, while the effect of Synthesis-Analysis is mostly interpreted by ACT.
Application of a simplex process model to six years of cognitive development in four demographic groups
(1979) Humphreys, Lloyd G.; Park, Randolph D.; Parsons, Charles K.
A simplex process model of the cross-lagged correlation paradigm was applied to 16 tests administered to samples of black and white males and black and white females in Grades 5, 7, 9, and 11. Listening, a measure of aural comprehension, consistently anticipated individual differences on an intellectual composite in all four groups. The other achievement test of the STEP series anticipated individual differences on the so-called aptitude tests of SCAT, which in turn anticipated individual differences on the narrow information scores obtained from the Test of General Information (TGI). This model may be more powerful in revealing lags than the traditional methods of analyzing cross-lagged differences in longitudinal data. The model does not require stationarity and can produce a meaningful outcome in its absence.
Knowledge of results and the proportion of positive feedback on tests of ability
(1979) Prestwood, J. Stephen
Students were administered one of three conventional or one of three stradaptive vocabulary tests with or without knowledge of results (KR). The three tests of each type differed in the expected proportion of correct responses to the test items and thus in the expected proportion of positive feedback. Results indicated that the mean maximum-likelihood estimates of individuals’ abilities varied as a joint function of KR-provision and test difficulty. Students receiving KR scored highest on the most-difficult test and lowest on the least-difficult test; students receiving no KR scored highest on the least-difficult test and did most poorly on the most-difficult test. Although the students perceived the differences in test difficulty, there were no effects on mean student anxiety or motivation scores attributable to difficulty or proportion of positive feedback alone. Regardless of the proportion of positive feedback, students reacted very favorably to receiving KR, and its provision increased the mean level of reported motivation.