Applied Psychological Measurement, Volume 08, 1984
Persistent link for this collection: https://hdl.handle.net/11299/100664
Item: Eigenvalue shrinkage in principal components based factor analysis (1984)
Bobko, Philip; Schemmer, F. Mark
The concept of shrinkage, as (1) a statistical phenomenon of estimator bias and (2) a reduction in explained variance resulting from cross-validation, is explored for statistics based on sample eigenvalues. Analytic solutions and previous research imply that the magnitude of eigenvalue shrinkage is a function of the type of shrinkage, sample size, the number of variables in the correlation matrix, the ordinal root position, the population eigenstructure, and the choice of principal components analysis or principal factors analysis. Hypotheses relating these specific independent variables to the magnitude of shrinkage were tested by means of a Monte Carlo simulation. In particular, the independent variable of population eigenstructure is shown to have an important effect on shrinkage. Finally, regression equations are derived that describe the linear relation of population and cross-validated eigenvalues to the original eigenvalues, sample size, ordinal position, and the number of variables factored. These equations give researchers a tool for accurately predicting eigenvalue shrinkage from available sample information.

Item: The validity of item bias techniques with math word problems (1984)
Ironson, Gail; Homan, Susan; Willis, Ruth; Signer, Barbara
Item bias research has compared methods empirically using both computer simulation with known amounts of bias and real data with unknown amounts of bias. This study extends previous research by "planting" biased items in the realistic context of math word problems. "Biased" items are those in which the reading level is too high for a group of students, so that the items are unable to assess the students' math knowledge. Of the three methods assessed (Angoff's transformed difficulty, Camilli's full chi-square, and Linn and Harnisch's item response theory, IRT, approach), only the IRT approach performed well. Removing the biased items had a minor effect on the validity for the minority group.

Item: Relationships between the Thurstone, Coombs, and Rasch approaches to item scaling (1984)
Jansen, Paul G. W.
Andrich (1978) derived a formal equivalency between Thurstone's Case V specialization of the law of comparative judgment for paired comparisons, with a logistic function substituted for the normal, and the Rasch model for direct responses. The equivalency was corroborated by a specific substantive psychological interpretation of the Rasch binary item response probability. Studying the relationships between the Thurstone and Rasch models from a perspective other than Andrich's, namely, from a data-theoretical point of view, it appears that the equivalency rests on an implicit assumption about the subject population. This assumption (1) is rather restrictive, so its empirical validity seems low, and (2) seems to contradict the substantive reasoning corroborating the Thurstone-Rasch equivalency. It is argued that the Thurstone model cannot be considered the sample-independent pair comparison counterpart of the Rasch model. An alternative pair comparison equivalent of the Rasch model is tentatively proposed. Finally, the theoretical and practical implications of Andrich's study and of the present study are discussed.
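To make the equivalency Jansen reexamines concrete, the two models Andrich (1978) connected can be written as follows; the notation (theta for person ability, b for item difficulty, s for stimulus scale values) is ours, added for illustration:

\[
P(X_{vi} = 1 \mid \theta_v, b_i) \;=\; \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)} \qquad \text{(Rasch model for direct responses)}
\]

\[
P(i \text{ judged greater than } j) \;=\; \frac{\exp(s_i - s_j)}{1 + \exp(s_i - s_j)} \qquad \text{(Case V with a logistic response function)}
\]

The two expressions have the same algebraic form; Jansen's point, per the abstract above, is that reading one as the other requires an implicit and restrictive assumption about the subject population.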
Item: An investigation of methods for reducing sampling error in certain IRT procedures (1984)
Wingersky, Marilyn S.; Lord, Frederic M.
The sampling errors of maximum likelihood estimates of item response theory parameters are studied for the case in which both person and item parameters are estimated simultaneously. A check on the validity of the standard error formulas is carried out. The effect of varying sample size, test length, and the shape of the ability distribution is investigated. Finally, the effect of anchor-test length on the standard error of item parameters is studied numerically for the situation, common in equating studies, in which two groups of examinees each take a different test form together with the same anchor test. The results encourage the use of rectangular or bimodal ability distributions, and also the use of very short anchor tests.

Item: An application of latent class models to assessment data (1984)
Haertel, Edward
Responses of 17-year-olds to selected 1977-78 National Assessment of Educational Progress (NAEP) mathematics exercises were analyzed using latent class models. A single model was fitted to data from five independent samples of examinees, each of which responded to a different set of six algebra or prealgebra exercises. Four categories of items were found, defining five levels of content mastery, ranging from examinees unable to solve any of the exercises (43%) through those able to solve all of them (19%). The methods demonstrated are broadly applicable to assessment data, including matrix-sampled data, and provide an aggregate description of examinee abilities independent of the specific characteristics of the individual exercises administered.

Item: Item profile analysis for tests developed according to a table of specifications (1984)
Kolen, Michael J.; Jarjoura, David
An approach to analyzing items is described that emphasizes the heterogeneous nature of many achievement and professional certification tests. The approach focuses on the categories of a table of specifications, which often serves as a blueprint for constructing such tests, and is characterized by profile comparisons of observed and expected correlations of item scores with category scores. A multivariate generalizability theory model provides the foundation for the approach, and the concept of a profile of expected correlations is derived from the model. Data from a professional certification testing program are used for illustration, and an attempt is made to provide links with test development issues and generalizability theory.

Item: Ability metric transformations involved in vertical equating under item response theory (1984)
Baker, Frank B.
The metric transformations of the ability scales involved in three equating techniques (external anchor test, internal anchor test, and a pooled-groups procedure) were investigated. Simulated item response data for two unique tests and a common test were obtained for two groups that differed in mean ability and variability. The obtained metrics for various combinations of groups and tests were transformed to a common metric and then to the underlying ability metric. The results showed reasonable agreement between the transformed obtained metrics and the underlying ability metric. They also showed that the largest errors in the ability score statistics occurred under the external anchor test procedure and the smallest under the pooled procedure. Although the pooled procedure performed well, it was affected by unequal variances in the two groups of examinees.
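Transformations like those Baker studies place separately calibrated ability metrics on a common scale; under IRT, two estimates of the same metric differ by a linear transformation theta* = A * theta + B. The following Python sketch illustrates one common way of estimating A and B (the mean-sigma method from anchor-item difficulties); the function name and toy data are our illustrative assumptions, not Baker's procedure:

    import numpy as np

    def mean_sigma_transform(b_from, b_to):
        # Estimate theta* = A * theta + B from difficulty estimates of the
        # common (anchor) items obtained in two separate calibrations.
        b_from, b_to = np.asarray(b_from, float), np.asarray(b_to, float)
        A = b_to.std(ddof=1) / b_from.std(ddof=1)  # match dispersions
        B = b_to.mean() - A * b_from.mean()        # match means
        return A, B

    # Toy anchor-item difficulties from two calibrations of the same items.
    A, B = mean_sigma_transform([-1.2, -0.4, 0.1, 0.8, 1.5],
                                [-0.7, 0.0, 0.6, 1.2, 2.1])
    # Abilities estimated on the first metric, rescaled to the second.
    theta_rescaled = A * np.array([-0.5, 0.3, 1.1]) + B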
Item: Two simple models for rater effects (1984)
De Gruijter, Dato N. M.
In many examinations, essays of different examinees are rated by different rater pairs. This paper discusses the estimation of rater effects for rating designs in which rater pairs overlap in a special way. Two models for rater effects are considered: an additive model and a nonlinear model. An illustration with empirical data is provided.

Item: Relationship between corresponding Armed Services Vocational Aptitude Battery (ASVAB) and computerized adaptive testing (CAT) subtests (1984)
Moreno, Kathleen E.; Wetzel, C. Douglas; McBride, James R.; Weiss, David J.
The relationships between selected subtests from the Armed Services Vocational Aptitude Battery (ASVAB) and corresponding subtests administered as computerized adaptive tests (CAT) were investigated using Marine recruits as subjects. Three adaptive subtests were shown to correlate as well with the ASVAB as did a second administration of the ASVAB, even though the CAT subtests contained only half as many items. Factor analysis showed the CAT subtests to load on the same factors as the corresponding ASVAB subtests, indicating that the same abilities were being measured. The preenlistment Armed Forces Qualification Test (AFQT) composite scores were predicted as well from the CAT subtest scores as from the retest ASVAB subtest scores, even though the CAT contained only three of the four AFQT subtests. It is concluded that CAT can achieve the same measurement precision as a conventional test with half the number of items.

Item: Errors of measurement and standard setting in mastery testing (1984)
Kane, Michael T.; Wilson, Jennifer
A number of studies have estimated the dependability of domain-referenced mastery tests for a fixed cutoff score. Other studies have estimated the dependability of judgments about the cutoff score. Each of these two types of dependability introduces error. Brennan and Lockwood (1980) analyzed the two kinds of error together but assumed that the two sources of error were uncorrelated. This paper extends that analysis of the total error in estimates of the difference between the domain score and the cutoff score to allow for covariance between the two types of error.
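In our notation (not the paper's), with \(\hat{x}\) the estimated domain score and \(\hat{c}\) the estimated cutoff score, the total error variance Kane and Wilson analyze decomposes as

\[
\sigma^2(\hat{x} - \hat{c}) \;=\; \sigma^2(\hat{x}) + \sigma^2(\hat{c}) - 2\,\sigma(\hat{x}, \hat{c}),
\]

where the covariance term is exactly what Brennan and Lockwood (1980) assumed to be zero.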
Item: Examination of an extension of Guttman's model of ability tests (1984)
Tziner, Aharon; Rimmer, Avigdor
An extension of Guttman's structural model of ability tests was devised and investigated with two samples consisting, respectively, of 335 and 225 males. The examinees in the first sample came for vocational guidance after their military service and were administered a 17-test battery. The second sample consisted of applicants for various jobs in an organization and were administered a 14-test battery. For each sample, a matrix of intercorrelations between scores was obtained based on the number of correct responses. The matrices were submitted to Guttman-Lingoes Smallest Space Analysis. The two-dimensional structure found was a radex in which (1) the facet of the language of presentation radially divided the space and (2) the facet of mental operation formed concentric rings. The significance of these findings for theoretical and applied problems relating to ability tests is discussed.

Item: Thorndike, Thurstone, and Rasch: A comparison of their methods of scaling psychological and educational tests (1984)
Engelhard, George, Jr.
The purpose of this study is to describe and compare the methods used by Thorndike, Thurstone, and Rasch for calibrating test items. Thorndike and Thurstone represent a traditional psychometric approach to this problem, whereas Rasch represents a more modern conceptualization derived from latent trait theory. These three major theorists in psychological and educational measurement were concerned with a common set of issues that seem to recur in a cyclical manner in psychometric theory. One such issue involves the invariance of item parameters. Each recognized the importance of eliminating the effects of an arbitrary sample in the estimation of item parameters; the differences generally arise from the specific methods chosen to deal with the problem. Thorndike attempted to solve the problem of item invariance by adjusting for mean differences in ability distributions. Thurstone extended Thorndike's work by proposing two adjustments: an adjustment for differences in the dispersions of ability in addition to Thorndike's adjustment for mean differences. Rasch's method implies a third adjustment, which involves the addition of a response model for each person in the sample. Data taken from Trabue (1916) are used to illustrate and compare how Thorndike, Thurstone, and Rasch would approach a common problem, namely, the calibration of a single set of items administered to several groups.

Item: Reply to van der Linden's "Thoughts on the use of decision theory to set cutoff scores" (1984)
De Gruijter, Dato N. M.; Hambleton, Ronald K.
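To illustrate the mean and dispersion adjustments described in the Engelhard item above, here is a toy Python sketch. The probit transformation of p-values, the function name, and the alignment to the first group are our own illustrative assumptions, not the historical procedures of Thorndike or Thurstone:

    from statistics import NormalDist
    import numpy as np

    def align_difficulties(p_by_group):
        # p_by_group: dict mapping group name -> proportions correct for the
        # same set of items. Difficulties are probit-transformed p-values
        # (harder items get larger values); each group's scale is aligned
        # to the first group's.
        z = {g: np.array([NormalDist().inv_cdf(1 - p) for p in ps])
             for g, ps in p_by_group.items()}
        ref = z[next(iter(z))]
        aligned = {}
        for g, zg in z.items():
            # Thorndike-style adjustment: shift so group means agree.
            mean_only = zg - zg.mean() + ref.mean()
            # Thurstone-style adjustment: also rescale so dispersions agree.
            mean_and_sd = ((zg - zg.mean())
                           * (ref.std(ddof=1) / zg.std(ddof=1))
                           + ref.mean())
            aligned[g] = {"mean_only": mean_only, "mean_and_sd": mean_and_sd}
        return aligned

    aligned = align_difficulties({
        "grade5": [0.85, 0.60, 0.40, 0.20],
        "grade8": [0.95, 0.80, 0.65, 0.45],
    })

Rasch's approach, as the abstract notes, replaces this kind of post hoc rescaling with a response model containing a parameter for each person, which is the third adjustment Engelhard describes.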