Applied Psychological Measurement

Persistent link for this community

https://conservancy.umn.edu/handle/11299/93227

Applied Psychological Measurement (APM), a research journal that publishes leading-edge methodological research on the measurement of individual differences in psychological, educational, and other social sciences variables, was begun in 1977 by Professor David J. Weiss of the Department of Psychology at the University of Minnesota. Professor Weiss edited the journal for its first 25 years. APM was first published by West Publishing Company of St. Paul, Minnesota, during 1977-1980. In 1980, Professor Weiss founded Applied Psychological Measurement Inc., a non-profit tax-exempt scientific publishing corporation, to continue publishing APM. APM Inc. acquired the copyrights from West Publishing for Volumes 1 through 4 and published the journal through its 20th volume (1996). APM was sold in 1997 to Sage Publications, which continues to publish the journal. Funds from that sale and subsequent royalties remain with APM Inc. and are used to support graduate student research in psychological and educational measurement.

This archive includes all the regular articles published in Volumes 1 through 20 (1977 through 1996). Published material not included in these archives, or in the archived Tables of Contents, includes material such as Book Reviews, Computer Program Exchanges, Software Reviews, and announcements. This material is available in the printed copies of the journal and can be obtained through a library. In the few cases for which corrections were published after the original articles were published, the corrections have been inserted into the article files in this archive.

Effects of computerized administration on scores on the Minnesota Multiphasic Personality Inventory
(1977) Biskin, Bruce H.; Kolotkin, Ronette L.
This study investigated the effects of administering a personality inventory by computer. Both the results of the initial study and a replication suggest that significant differences exist between paper-pencil and computer administrations of the MMPI on the cannot say (?) scale and scale 6 (Paranoia). However, there appears to be no set of items that would account for these scale differences. Differences on the ? scale were explained in terms of the different methods used to omit items in each condition. Differences on scale 6 were small, and the clinical significance of that difference needs to be investigated further. Implications for future research on computer-administered personality instruments are discussed.
The relationship between the perceived risk and attractiveness of gambles: A multidimensional analysis
(1977) Nygren, Thomas E.
Judgments of perceived risk and attractiveness for a set of 50 two-outcome gambles were obtained from 39 college students. The data were used to test various ordinal properties of the gambles implied by Pollatsek and Tversky’s theory of risk and Coombs’ Portfolio theory. In addition, the MDPREF multidimensional scaling procedure was used (1) to test the assumption that gambles are perceived and evaluated as multidimensional stimuli; (2) to determine the characteristics of gambles affecting perceived risk and attractiveness; (3) to assess the extent of individual differences in perception of gambles; and (4) to test the implication of Portfolio theory that attractiveness is a function of perceived risk and expected value. The results supported the multidimensional nature of gambles and the implications of Portfolio theory. In the MDPREF analyses large individual differences were found in perceived risk and attractiveness of gambles. Potential uses of multidimensional scaling techniques in further research on individual differences in gambling behavior are proposed and discussed.
Using computerized tests to measure new dimensions of abilities: An exploratory study
(1977) Cory, Charles H.; Rimland, Bernard; Bryson, Rebecca A.
A battery of Graphic Information Processing Tests (GRIP) was developed to utilize the display characteristics of computer terminals in measuring abilities important for processing visually presented information. The GRIP battery was especially intended to assess five "real world" personal attributes which have been difficult to measure with paper-and-pencil tests. The experimental tests were administered to 385 Navy enlisted men and evaluated in conjunction with paper-and-pencil tests of the same attributes as well as with operational cognitive tests and biographical variables. The GRIP tests were found to be useful for measuring short-term memory and sequential reasoning abilities.
On the relationships between short-term learning and fluid and crystallized intelligence
(1977) Hundal, P. S.; Horn, John L.
Eleven indicants of intelligence and 10 measures of short-term learning were studied in a sample of 265 fourteen-year-olds using the inter-battery methods developed by Tucker. The results indicated two broad factors of intelligence, interpreted as fluid intelligence (Gf) and crystallized intelligence (Gc), coordinate with two broad factors of short-term learning, interpreted as indicating primary memory (PM) and secondary acquisition (SAC). To a considerable extent the learning variables were independent of the indicants of intelligence, thus suggesting (in conformance with previous findings) that intelligence should not be equated with learning over short periods of time. The major variance in common between short-term learning and intelligence variables is linked to meaningful associations and learning mediated by such associations, but to a lesser extent both Gf and Gc involve the span of apprehension of primary memory. The results suggest that acquisition mediated by meaningful associations is more nearly characteristic of Gc than of Gf, but this may mainly reflect the selection of variables used in this study.
Scoring field dependence: A methodological analysis of five rod-and-frame scoring systems
(1977) McGarvey, Bill; Maruyama, Geoffrey; Miller, Norman
The most consistently used scoring system for the rod-and-frame task has been the total (or average) number of degrees in error from the true vertical, regardless of the initial or final directions of the rod and frame. Since a logical case can be made for at least four alternative scoring systems, a thorough comparison of all five systems seemed appropriate. Comparisons consisted of: (1) an internal consistency/reliability analysis, with split-half and test-retest reliabilities and a multitrait-multimethod matrix analysis of each scoring system, chair, frame, and man position; (2) a repeated measures ANOVA, with ethnic group, sex, and grade as between factors and chair, frame, and man positions as within factors; and (3) correlations of each scoring system with a selected set of external criteria. Results suggest strong support for use of the natural logarithm of the sum of absolute errors as the preferred scoring system, that concern with the confounding of field dependence and the E effect is largely unwarranted, and that all but one of the scoring systems perform adequately.
Effects of individual optimization in setting the boundaries of dichotomous items on accuracy of estimation
(1977) Samejima, Fumiko
Applying the normal ogive model of latent trait theory, two sets of data, simulated and empirical, were analyzed. The objective was to determine how much accuracy of estimation of the subjects’ latent ability can be maintained by tailoring for each testee the order of presentation of the items and the border of dichotomization for each item. This was compared to the information provided by the original graded test items. Results indicated that tailored testing is promising especially when the number of items is not too small, and that a graded item can effectively be used as the initial item in tailored testing because of its branching effect.
On the equivalence of constructed-response and multiple-choice tests
(1977) Traub, Ross E.; Fisher, Charles W.
Two sets of mathematical reasoning and two sets of verbal comprehension items were cast into each of three formats (constructed response, standard multiple-choice, and Coombs multiple-choice) in order to assess whether tests with identical content but different formats measure the same attribute, except for possible differences in error variance and scaling factors. The resulting 12 tests were administered to 199 eighth-grade students. The hypothesis of equivalent measures was rejected for only two comparisons: the constructed-response measure of verbal comprehension was different from both the standard and the Coombs multiple-choice measures of this ability. Maximum likelihood factor analysis confirmed the hypothesis that a five-factor structure will give a satisfactory account of the common variance among the 12 tests. As expected, the two major factors were mathematical reasoning and verbal comprehension. Contrary to expectation, only one of the other three factors bore a (weak) resemblance to a format factor. Tests marking the ability to follow directions, recall and recognition memory, and risk-taking were included, but these variables did not correlate as expected with the three minor factors.
Bayesian tailored testing and the influence of item bank characteristics
(1977) Jensema, Carl J.
Owen’s (1969) Bayesian tailored testing method is introduced along with a brief review of its derivation. The characteristics of a good item bank are outlined and explored in terms of their influence on the Bayesian tailoring process. The results clearly demonstrate the importance of a good item bank: one having a sufficient number of items with high discrimination, low guessing probability, and a uniform distribution of difficulty.
The CES-D Scale: A self-report depression scale for research in the general population
(1977) Radloff, Lenore Sawyer
The CES-D scale is a short self-report scale designed to measure depressive symptomatology in the general population. The items of the scale are symptoms associated with depression which have been used in previously validated longer scales. The new scale was tested in household interview surveys and in psychiatric settings. It was found to have very high internal consistency and adequate test-retest repeatability. Validity was established by patterns of correlations with other self-report measures, by correlations with clinical ratings of depression, and by relationships with other variables which support its construct validity. Reliability, validity, and factor structure were similar across a wide variety of demographic characteristics in the general population samples tested. The scale should be a useful tool for epidemiologic studies of depression.
Ability factor differentiation, Grades 5 through 11
(1977) Atkin, Robert; Bray, Robert; Davison, Mark L.; Herzberger, Sharon; Humphreys, Lloyd G.; Selzer, Uzi
Factor analyses have been computed in samples of white male and female and black male and female students for the same 16 cognitive variables at grade levels 5, 7, 9, and 11. Samples for each of the four independent groups remained constant at the four grade levels. The latent roots as analyzed in three ways show a clear but small increase in the number of common factors during this time period, particularly for the white groups. Rotated factor loadings also support the differentiation hypothesis. For the white males, who showed the clearest evidence for differentiation of abilities, rotated loadings provide descriptions of the emerging factors. Although the evidence for differentiation is less clear in white females, the emerging factors appear to become identical by the 11th grade. Data for black males and females, which are based on smaller Ns, are more ambiguous.
Some properties of a Bayesian adaptive ability testing strategy
(1977) McBride, James R.
Four Monte Carlo simulation studies of Owen’s Bayesian sequential procedure for adaptive mental testing were conducted. In contrast to previous simulation studies of this procedure, which have concentrated on evaluating it in terms of the correlation of its test scores with simulated ability in a normal population, these four studies explored a number of additional properties, both in a normally distributed population and in a distribution-free context. Study 1 replicated previous studies with finite item pools, but examined such properties as the bias of estimate, mean absolute error, and correlation of test length with ability. Studies 2 and 3 examined the same variables in a number of hypothetical infinite item pools, investigating the effects of item discriminating power, guessing, and variable vs. fixed test length. Study 4 investigated some properties of the Bayesian test scores as latent trait estimators. The properties of interest included the conditional bias of the ability estimates, the information curve of the trait estimates, and the relationship of test length to ability level. The results of these studies indicated that the ability estimates derived from the Bayesian testing strategy were highly correlated with ability level. However, the ability estimates were also highly correlated with number of items administered, were non-linearly biased, and provided measurements which were not of equal precision at all levels of ability.
Content validity: The source of my discontent
(1977) Guion, Robert M.
The concept of content validity takes on special importance where invoked to justify use of a test. The term 1) refers to psychological measurement, 2) using samples of behavior, sampling both stimulus and response components, and 3) implies representativeness in sampling. Examples are given to show that content sampling may be considered a form of operationalism in defining constructs. Five conditions are proposed as necessary if one is to accept the use of a measuring instrument as a valid operational definition on the basis of content sampling alone.
Intransitivity on paired-comparisons instruments: The relationship of the total circular triad score to stimulus circular triads
(1977) Hendel, Darwin D.
Intransitivity associated with the method of paired comparisons for scaling stimulus objects has been hypothesized in previous research to relate to the psychological and/or physical distance between stimulus objects. The purpose of the present study was to determine whether paired-comparisons intransitivity is a function of intransitivity associated with specific stimulus objects rather than a function of the entire set of stimulus objects. Three 190-item, paired-comparisons instruments with diverse content (i.e., vocational needs, mate preferences, and food preferences) were designed to examine the relationship between Stimulus Circular Triads and the Total Circular Triad score and were administered to 276 high school and 358 college students. Results of univariate correlational analyses and multiple-regression techniques suggested that paired-comparisons intransitivity relates to individual differences variables associated with the respondent, although there were differences in the absolute level of intransitivity associated with each of the three sets of stimuli.
A quantitative method for separation of semantic subspaces
(1977) Tzeng, Oliver C. S.
A new method for separating affective and denotative meaning subsystems in semantic differential ratings of any homogeneous concept domain is developed and illustrated using personality ratings data. Results indicated that Osgood’s Evaluation, Potency and Activity dimensions were dominant in the underlying semantic structure of personality conceptions and that four dimensions in the "other" subspace, orthogonal to Affect, were clearly interpretable, "affect-free" descriptive features of personalities. Possible applications of this model to other social and psychological research are discussed.
Relative utility of computerized versus paper-and-pencil tests for predicting job performance
(1977) Cory, Charles H.
This article, the second of two, presents predictive validity data for on-job performance for a set of computerized Graphic and Interactive Processing (GRIP) tests in conjunction with data for both experimental paper-and-pencil and operational tests. Validity coefficients for job element and global criteria are reported for four different jobs. Experimental variables substantially enhanced the predictive accuracy of the operational battery for Sonar Technicians. Most experimental tests with significant validities were computer-administered. The GRIP tests were more useful than paper-and-pencil tests for identifying personnel skilled in Interpreting Visual Displays, Adjusting Equipment, and Working Under Distractions. They were useful supplements to paper-and-pencil tests for identifying skill in four additional job elements.
A use of the information function in tailored testing
(1977) Samejima, Fumiko
Several important and useful implications in latent trait theory, with direct implications for individualized adaptive or tailored testing, are pointed out. A way of using the information function in tailored testing in connection with the standard error of estimation of the ability level using maximum likelihood estimation is suggested. It is emphasized that the standard error of estimation should be considered as the major index of dependability, as opposed to the reliability of a test. The concept of weak parallel forms is expanded to testing procedures in which different sets of items are presented to different examinees. Examples are given. Researchers have tended to use latent trait theory rather than classical test theory in research on individualized adaptive or tailored testing. This is quite natural, since latent trait theory has definite merits over classical test theory in many crucial matters. Because of the lack of opportunities to really learn the theory, however, these researchers tend to overlook some important implications in latent trait theory. As a result, its full use has not yet materialized. Not only are information functions seldom used to maximum advantage, but also those who have tried to use latent trait theory still use some popular concepts in classical test theory, such as reliability. In this paper, the author points out some important implications in latent trait theory which are not fully understood and appreciated among researchers, and gives some practical suggestions for its use.
Some item analysis and test theory for a system of computer-assisted test construction for individualized instruction
(1977) Lord, Frederic M.
Under given conditions, conventional testing and computer-generated repeatable testing (CGRT) are equally effective for estimating examinee ability; CGRT is more effective than conventional testing for estimating the mean ability level of a group; and CGRT is less effective for estimating ability differences among individuals. These conclusions are drawn from domain-referenced test theory as distinguished from norm-referenced test theory.
Test-free person measurement with the Rasch simple logistic model
(1977) Tinsley, Howard E.; Dawis, Rene V.
This research investigated the use of the Rasch simple logistic model in obtaining test-free ability estimates. Two tests each of word, picture, symbol, and number analogies were administered to college and high school students. Differences between scores on each pair of tests were analyzed to determine whether the ability estimates were independent of the tests employed. The results indicate that raw-score ability estimates are influenced by the difficulty of the items used in measurement but that Rasch ability estimates are relatively independent of the difficulty of these items. The need is discussed for additional research in which an individualized item-presentation procedure is used with the Rasch model.
Inter-inventory predictability and content overlap of the 16 PF and the CPI
(1977) Campbell, John B.; Chun, Ki-Taek
Each scale on the 16 PF and the CPI was predicted from the scales on the other inventory using both standard and stepwise multiple regression (N = 241 undergraduates). The discrepancy between the predictabilities obtained by these two methods was minimal. The cross-validational shrinkage of the stepwise regression predictabilities, when examined by both the conventional and the McNemar methods, was quite small. The mean predictability was .63 for the 16 PF and .64 for the CPI. Four 16 PF and five CPI scales were "highly predictable," while four 16 PF and three CPI scales were essentially non-predictable. Seven scales from each inventory appeared to have counterparts in the other inventory. Thus, despite major differences in philosophy and strategy of construction, the overall predictability remained the same whether the 16 PF scales were predicted from the CPI scales or vice versa. Furthermore, the pattern of predictabilities suggested a substantial overlap between the 16 PF "Adjustment vs. Anxiety" and CPI "Adjustment" factors, and between the 16 PF "Introversion vs. Extroversion" and CPI "Extroversion" factors.
Studies of voluntary visual attention: Theory, methods, and psychometric issues
(1977) Nunnally, Jum C.; Lemond, L. Charles; Wilson, William H.
The paper discusses the study of voluntary visual attention (VVA), a relatively new area of active experimentation. VVA concerns "natural" viewing behavior or visual browsing when the subject is under no constraints regarding the distribution of attention. This is contrasted with traditional studies of directed visual attention, such as the typical study of visual judgment in tachistoscopic research. Discussed are (1) the logic of investigating VVA, (2) a comprehensive set of constructs that are thought to be of theoretical importance, (3) methods for calibrating these variables in terms of treatment parameters, (4) the logic of scaling both independent and dependent variables, (5) a summary of salient findings, (6) some recent findings not previously reported, and (7) an overview of the psychometric issues in the study of VVA.