Applied Psychological Measurement, Volume 10, 1986

  • Item
    Item banking in computer-based instructional systems
    (1986) Baker, Frank B.
    This paper examines item banking within computer-based instructional systems from both a systems and a measurement perspective. Traditionally, computer-aided instruction involves little testing, although there is a trend to incorporate posttests in the sessions. However, computer-managed instruction has incorporated testing since its inception. The tests employed are similar in most respects to teacher-made classroom tests. The test results are used as the basis for diagnosis, prescription, and management procedures for individual or small groups of students. At the classroom level, test banking may be more appropriate than item banking. Because of the tight linkage of the tests to instructional procedures, the basic measurement issue appears to be the degree to which approaches that evolved from standardized achievement testing can be applied to the large number of short tests employed in computer-based instructional systems.
  • Item
    Some applications of optimization algorithms in test design and adaptive testing
    (1986) Theunissen, T. J. J. M.
    Some test design problems can be seen as combinatorial optimization problems. Several suggestions are presented, with various possible applications. Results obtained thus far are promising; the methods suggested can also be used with highly structured test specifications.
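A common way to make this concrete is to write a test design problem as a 0-1 (binary) program; the generic formulation below maximizes test information at a target ability $\theta_0$ subject to a fixed test length and content quotas, and is a sketch of the general approach rather than the exact model proposed in the paper.

$$
\max_{x_1,\dots,x_I}\; \sum_{i=1}^{I} I_i(\theta_0)\, x_i
\quad\text{subject to}\quad
\sum_{i=1}^{I} x_i = n,\qquad
\sum_{i \in C_k} x_i \le n_k \;\;(k = 1,\dots,K),\qquad
x_i \in \{0,1\},
$$

where $x_i$ indicates whether item $i$ is selected, $I_i(\theta_0)$ is its information at the target ability, and the sets $C_k$ with quotas $n_k$ encode content specifications (the quota constraints are illustrative).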
  • Item
    Development of a testing service system
    (1986) Van Thiel, Catharina C.; Zwarts, Michel A.
    The development of an integrated system for the storage of items and the construction and analysis of tests is described. The system is being developed both as a general facility for the Dutch Institute of Educational Measurement and as a support system for the use and maintenance of item banks in schools. The methodology of developing the system is described with attention to the system architecture and to the results of the first stage of the system development.
  • Item
    The changing conception of measurement: A commentary
    (1986) Hambleton, Ronald K.
    This paper comments on the contributions to this special issue on item banking. An historical framework for viewing the papers is provided by brief reviews of the literature in the areas of item response theory, item banking, and computerized testing. In general, the eight papers are viewed as contributing valuable technical knowledge for implementing testing programs with the aid of item banks.
  • Item
    Rule-based item bank construction and evaluation within the linear logistic framework
    (1986) Hornke, Lutz F.; Habon, Michael W.
    In cognition research, item writing rules are considered a necessary prerequisite of item banking. A set of 636 items was constructed using prespecified cognitive operations. An evaluation of test data from some 7,400 examinees revealed 446 homogeneous items. Some items had to be discarded because of printing flaws, and others because of operation complexity or other well-describable reasons. However, cognitive operations explained item difficulty parameters quite well; further cross-validation research may contribute to an item writing approach which attempts to bring psychological theory and psychometric models closer together. This will eventually free item construction from item writer idiosyncrasies.
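For reference, the linear logistic framework referred to here (Fischer's linear logistic test model) decomposes each Rasch item difficulty into contributions of the cognitive operations used to construct the item; the notation below is generic.

$$
\beta_i \;=\; \sum_{j=1}^{m} q_{ij}\,\eta_j \;+\; c,
$$

where $\beta_i$ is the Rasch difficulty of item $i$, $q_{ij}$ records how often operation $j$ is required by item $i$, $\eta_j$ is the difficulty contribution of operation $j$, and $c$ is a normalization constant. The finding that cognitive operations explained item difficulty quite well amounts to this decomposition fitting the estimated $\beta_i$ closely.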
  • Item
    Banking non-dichotomously scored items
    (1986) Masters, Geofferey N.; Evans, John
    A method for constructing a bank of items scored in two or more ordered response categories is described and illustrated. This method enables multistep problems, rating scale items, question "clusters," and other items using partial credit scoring to be calibrated and incorporated into an item bank, and it provides a mechanism for computer adaptive testing with items of this type. Procedures are described for calibrating an initial set of items, for testing the fit of items to the underlying measurement model, and for linking new items to an existing item bank. The method is illustrated using items from the Watson-Glaser Critical Thinking Appraisal.
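For reference, a standard statement of the partial credit model, the measurement model usually associated with this approach: for an item $i$ with ordered score categories $0, 1, \dots, m_i$, the probability that person $n$ scores $x$ is

$$
P(X_{ni} = x) \;=\;
\frac{\exp \sum_{j=0}^{x} (\theta_n - \delta_{ij})}
     {\sum_{k=0}^{m_i} \exp \sum_{j=0}^{k} (\theta_n - \delta_{ij})},
\qquad x = 0, 1, \dots, m_i,
$$

where $\theta_n$ is the person parameter, the $\delta_{ij}$ are step difficulties, and the $j = 0$ term of each sum is defined to be zero. Banking and linking then operate on the step difficulties $\delta_{ij}$ much as they do on the difficulties of dichotomous items.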
  • Item
    An empirical Bayesian approach to item banking
    (1986) Van der Linden, Wim J.; Eggen, Theo J. H. M.
    A procedure for the sequential optimization of the calibration of an item bank is given. The procedure is based on an empirical Bayesian approach to a reformulation of the Rasch model as a model for paired comparisons between the difficulties of test items in which ties are allowed to occur. First, it is shown how a paired-comparisons design deals with the usual incompleteness of calibration data and how the item parameters can be estimated using this design. Next, the procedure for a sequential optimization of the item parameter estimators is given, both for individuals responding to pairs of items and for item and examinee groups of any size. The paper concludes with a discussion of the choice of the first priors in the procedure and the problems involved in its generalization to other item response models.
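The reformulation rests on a standard conditional argument: when two Rasch items $i$ and $j$ are answered by the same examinee and exactly one response is correct, the person parameter cancels, leaving a paired comparison between the item difficulties (ties correspond to both responses correct or both incorrect). A sketch of the key identity, not necessarily in the paper's notation:

$$
P\!\left(X_{ni} = 1 \mid X_{ni} + X_{nj} = 1\right)
= \frac{\exp(\theta_n - b_i)}{\exp(\theta_n - b_i) + \exp(\theta_n - b_j)}
= \frac{\exp(b_j - b_i)}{1 + \exp(b_j - b_i)},
$$

which no longer depends on $\theta_n$, so incomplete calibration data can be treated as an incomplete paired-comparisons design on the items.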
  • Item
    Linking item parameters onto a common scale
    (1986) Vale, C. David
    An item bank typically contains items from several tests that have been calibrated by administering them to different groups of examinees. The parameters of the items must be linked onto a common scale. A linking technique consists of an anchoring design and a transformation method. Four basic anchoring designs are the unanchored, anchor-items, anchor-group, and double-anchor designs. The transformation method consists of the system of equations that is used to translate the anchor information and put the item parameters on a common scale. Several transformation methods are discussed briefly. A simulation study is presented that compared the equivalent-groups method with the anchor-items method, using varying numbers of common items, applied both to the situation in which the groups were equivalent and one in which they were not. The results confirm previous findings that the equivalent-groups method is adequate when the groups are in fact equivalent. When the groups are not equivalent, accurate linking can be obtained with as few as two common items. A more efficient interlaced anchor-items design can provide accurate linking without the expense of including explicit common items in each of the tests.
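As an illustration of one widely used anchor-items transformation, the sketch below implements the mean/sigma method (an assumption for illustration; the abstract does not list the specific transformation methods compared).

```python
import numpy as np

def mean_sigma_link(b_anchor_old, b_anchor_new):
    """Linking coefficients (A, B) from anchor-item difficulties, chosen so
    difficulties on the old scale map to the new scale via b -> A*b + B."""
    b_old = np.asarray(b_anchor_old, dtype=float)
    b_new = np.asarray(b_anchor_new, dtype=float)
    A = b_new.std(ddof=1) / b_old.std(ddof=1)
    B = b_new.mean() - A * b_old.mean()
    return A, B

def rescale_items(a, b, A, B):
    """Put discrimination (a) and difficulty (b) parameters estimated in the
    old calibration onto the common (new) scale."""
    return np.asarray(a, dtype=float) / A, A * np.asarray(b, dtype=float) + B

# Hypothetical difficulties of three anchor items under two separate calibrations:
A, B = mean_sigma_link([-1.2, 0.0, 0.8], [-0.9, 0.3, 1.2])
```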
  • Item
    The changing conception of measurement in education and psychology
    (1986) Van der Linden, Wim J.
    Since the era of Binet and Spearman, classical test theory and the ideal of the standard test have gone hand in hand, in part because both are based on the same paradigm of experimental control by manipulation and randomization. Their longevity is a consequence of this mutually beneficial symbiosis. A new type of theory and practice in testing is replacing the standard test by the test item bank, and classical test theory by item response theory. In this paper it is shown how these also reinforce and complete each other.
  • Item
    An exploration of the robustness of four test equating models
    (1986) Skaggs, Gary; Lissitz, Robert W.
    This Monte Carlo study explored how four commonly used test equating methods (linear, equipercentile, and item response theory methods based on the Rasch and three-parameter models) responded to tests of different psychometric properties. The four methods were applied to generated data sets where mean item difficulty and discrimination as well as level of chance scoring were manipulated. In all cases, examinee ability was matched to the level of difficulty of the tests. The results showed the Rasch model not to be very robust to violations of the equal discrimination and non-chance scoring assumptions. There were also problems with the three-parameter model, but these were due primarily to estimation and linking problems. The recommended procedure for tests similar to those studied is the equipercentile method.
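For readers unfamiliar with the recommended method, a minimal sketch of equipercentile equating for integer-scored forms under an equivalent-groups design (a generic implementation, not the study's code):

```python
import numpy as np

def percentile_ranks(scores, max_score):
    """Percentile rank at each integer score point (midpoint convention)."""
    scores = np.asarray(scores)
    points = np.arange(max_score + 1)
    below = np.array([(scores < p).sum() for p in points])
    at = np.array([(scores == p).sum() for p in points])
    return 100.0 * (below + 0.5 * at) / scores.size

def equipercentile_equate(x_scores, y_scores, max_score):
    """Map each form-X score point to the form-Y score with the same
    percentile rank (linear interpolation; assumes ranks are increasing)."""
    pr_x = percentile_ranks(x_scores, max_score)
    pr_y = percentile_ranks(y_scores, max_score)
    return np.interp(pr_x, pr_y, np.arange(max_score + 1))
```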
  • Item
    Assessing the dimensionality of a set of test items
    (1986) Hambleton, Ronald K.; Rovinelli, Richard J.
    This study compared four methods of determining the dimensionality of a set of test items: linear factor analysis, nonlinear factor analysis, residual analysis, and a method developed by Bejar (1980). Five artificial test datasets (for 40 items and 1,500 examinees) were generated to be consistent with the three-parameter logistic model and the assumption of either a one- or a two-dimensional latent space. Two variables were manipulated: (1) the correlation between the traits (r = .10 or r = .60) and (2) the percent of test items measuring each trait (50% measuring each trait, or 75% measuring the first trait and 25% measuring the second trait). While linear factor analysis in all instances overestimated the number of underlying dimensions in the data, nonlinear factor analysis with linear and quadratic terms led to correct determination of the item dimensionality in the three datasets where it was used. Both the residual analysis method and Bejar’s method proved disappointing. These results suggest the need for extreme caution in using linear factor analysis, residual analysis, and Bejar’s method until more investigations of these methods can confirm their adequacy. Nonlinear factor analysis appears to be the most promising of the four methods, but more experience in applying the method seems necessary before wide-scale use can be recommended.
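A sketch of the kind of data generation the study describes, using a compensatory two-dimensional extension of the three-parameter logistic model with simple structure; the functional form and all parameter values here are illustrative assumptions, not the study's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items, rho = 1500, 40, 0.60     # trait correlation as in one condition

# Correlated latent traits
theta = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n_persons)

# Simple structure: the first 75% of items measure trait 1, the rest trait 2
dim = np.where(np.arange(n_items) < int(0.75 * n_items), 0, 1)
a = rng.uniform(0.5, 1.5, n_items)           # discriminations (illustrative)
b = rng.normal(0.0, 1.0, n_items)            # difficulties (illustrative)
c = np.full(n_items, 0.2)                    # lower asymptotes (illustrative)

z = a * (theta[:, dim] - b)                  # person-by-item logits
p = c + (1.0 - c) / (1.0 + np.exp(-1.7 * z)) # 3PL response probabilities (D = 1.7)
responses = (rng.random((n_persons, n_items)) < p).astype(int)
```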
  • Item
    Rejoinder to "The Mokken scale: A critical discussion."
    (1986) Mokken, Robert J.; Lewis, Charles; Sijtsma, Klaas
    The nonparametric approach to constructing and evaluating tests based on binary items proposed by Mokken has been criticized by Roskam, van den Wollenberg, and Jansen. It is contended that their arguments misrepresent the objectives of this approach, that their criticisms of the role of the H coefficient in the procedures are irrelevant or erroneous, and that they fail to distinguish the inherent requirements (and limitations) of general nonparametric models and procedures from those of parametric ones. It is concluded that Mokken’s procedures provide a useful tool for researchers in the social sciences who wish to construct and evaluate tests for measuring theoretically meaningful latent traits while avoiding the strong parametric assumptions of traditional item response theory.
  • Item
    The Mokken scale: A critical discussion
    (1986) Roskam, Edward E.; Van den Wollenberg, Arnold L.; Jansen, Paul G. W.
    The Mokken scale is critically discussed. It is argued that Loevinger’s H, adapted by Mokken and advocated as a coefficient of scalability, is sensitive to properties of the item set which are extraneous to Mokken’s requirement of holomorphy of item response curves. Therefore, when defined in terms of H, the Mokken scale is ambiguous. It is furthermore argued that item-selection free statistical inferences concerning the latent person order appear to be insufficiently based on double monotony alone, and that the Rasch model is the only item response model fulfilling this requirement. Finally, it is contended that the Mokken scale is an unfruitful compromise between the requirements of a Guttman scale and the requirements of classical test theory.
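For reference, the coefficient at issue: for an item pair $(i, j)$, Loevinger's $H$ as used by Mokken is

$$
H_{ij} \;=\; 1 - \frac{F_{ij}}{E_{ij}}
\;=\; \frac{\operatorname{cov}(X_i, X_j)}{\operatorname{cov}_{\max}(X_i, X_j)},
\qquad
H \;=\; 1 - \frac{\sum_{i<j} F_{ij}}{\sum_{i<j} E_{ij}},
$$

where $F_{ij}$ is the observed frequency of Guttman errors for the pair, $E_{ij}$ is its expectation under marginal independence, and $\operatorname{cov}_{\max}$ is the largest covariance attainable given the item marginals; the dependence of these quantities on the marginals is one way to see why $H$ can reflect the composition of the item set and not only the shape of the item response curves.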
  • Item
    The use of item statistics in the calibration of an item bank
    (1986) De Gruijter, Dato N. M.
    An IRT analysis based on p (proportion correct) and r (item-test correlation) is proposed for a group of tests having items in common. The procedure is a generalization of a procedure proposed by De Gruijter and Mooijaart (1983) which is related to procedures for the factor analysis of dichotomous data. The procedure results in IRT item parameters using data from examinee groups with subsets of common items; it is, therefore, particularly appropriate for calibrating items for use in small-scale item banks. Simulated data are used to illustrate the procedure.
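As background, the classical heuristic that procedures of this kind generalize (not the paper's own estimator): under a two-parameter normal-ogive model with normally distributed ability, an item's proportion correct $p_i$ and its biserial correlation with ability $\rho_i$ (in practice approximated by an item-test biserial) determine the item parameters via

$$
a_i \;=\; \frac{\rho_i}{\sqrt{1 - \rho_i^{2}}},
\qquad
b_i \;=\; \frac{\Phi^{-1}(1 - p_i)}{\rho_i},
$$

where $\Phi^{-1}$ is the standard normal quantile function. The procedure described in the abstract extends this kind of reasoning to several examinee groups linked by common items.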
  • Item
    Simple and weighted unfolding threshold models for the spatial representation of binary choice data
    (1986) DeSarbo, Wayne S.; Hoffman, Donna L.
    This paper describes the development of an unfolding methodology designed to analyze "pick any" or "pick any/n" binary choice data (e.g., decisions to buy or not to buy various products). Maximum likelihood estimation procedures are used to obtain a joint space representation of both persons and objects. A review of the relevant literature concerning the spatial treatment of such binary choice data is presented. The nonlinear logistic model type is described, as well as the alternating maximum likelihood algorithm used to estimate the parameter values. The results of an application of the spatial choice model to a synthetic data set in a Monte Carlo analysis are presented. An application concerning consumer (intended) choices for nine competitive brands of sports cars is discussed. Future research may provide a means of generalizing the model to accommodate three-way choice data.
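In generic form, an unfolding threshold model for such data makes the probability that person $i$ picks object $j$ a decreasing function of the squared distance between the person's ideal point and the object's location in an $R$-dimensional joint space; a schematic logistic version (not necessarily the paper's exact specification) is

$$
P(\text{person } i \text{ picks object } j)
\;=\; \frac{1}{1 + \exp\!\left(d_{ij}^{2} - \tau\right)},
\qquad
d_{ij}^{2} \;=\; \sum_{r=1}^{R} \left(y_{ir} - z_{jr}\right)^{2},
$$

so that a choice becomes likely when the object lies within roughly distance $\sqrt{\tau}$ of the ideal point; a weighted variant would allow dimension weights inside $d_{ij}^{2}$.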
  • Item
    Graphical analysis of item response theory residuals
    (1986) Ludlow, Larry H.
    A graphical comparison of empirical versus simulated residual variation is presented as one way to assess the goodness of fit of an item response theory (IRT) model. The two forms of residual variation were generated through the separate calibration of empirical data and data "tailored" to fit the model, given the empirical parameter estimates. A variety of techniques illustrate the utility of using tailored residuals as a specific baseline against which empirical residuals may be understood: the baseline residuals serve as the reference background for interpreting the empirical residuals and for isolating and identifying departures from fit. Although the Rasch model is applied in this paper, the principles that are discussed and illustrated hold for the residual analysis of any IRT model.
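A minimal sketch of the comparison described here, for the Rasch case: compute standardized residuals for the empirical data, simulate "tailored" data from the estimated parameters, compute the same residuals for the simulated data, and compare the two sets graphically. Function names and the plotting choice are illustrative.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch model probabilities for persons (theta) crossed with items (b)."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def std_residuals(x, theta, b):
    """Standardized residuals z = (x - P) / sqrt(P * (1 - P))."""
    p = rasch_prob(theta, b)
    return (x - p) / np.sqrt(p * (1.0 - p))

def tailored_data(theta, b, rng):
    """Simulate responses that fit the model at the estimated parameters."""
    p = rasch_prob(theta, b)
    return (rng.random(p.shape) < p).astype(int)

# Illustrative use, given an observed matrix X and estimates theta_hat, b_hat:
#   rng = np.random.default_rng(1)
#   z_emp = std_residuals(X, theta_hat, b_hat)
#   z_base = std_residuals(tailored_data(theta_hat, b_hat, rng), theta_hat, b_hat)
#   (then plot, e.g., overlaid histograms of z_emp.ravel() and z_base.ravel())
```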
  • Item
    Covariance and regression slope models for studying validity generalization
    (1986) Raju, Nambury S.; Fralicx, Rodney; Steinhaus, Stephen D.
    Two new models, the covariance and regression slope models, are proposed for assessing validity generalization. The new models are less restrictive in that they require only one hypothetical distribution (distribution of range restriction for the covariance model and distribution of predictor reliability for the regression slope model) for their implementation, in contrast to the correlation model which requires hypothetical distributions for criterion reliability, predictor reliability, and range restriction. The new models, however, are somewhat limited in their applicability since they both assume common metrics for predictors and criteria across validation studies. Several simulation (Monte Carlo) studies showed the new models to be quite accurate in estimating the mean and variance of population true covariances and regression slopes. The results also showed that the accuracy of the covariance, regression slope, and correlation models is affected by the degree to which hypothetical distributions of artifacts match their true distributions; the regression slope model appears to be slightly more robust than the other two models.
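For context, the correlation model that these alternatives are compared against decomposes the observed variance of validity coefficients into true variance plus artifactual variance; in its simplest form (sampling error only; a background sketch, not the models proposed here),

$$
\hat{\sigma}^{2}_{\rho} \;=\; \sigma^{2}_{r} - \sigma^{2}_{e},
\qquad
\sigma^{2}_{e} \;\approx\; \frac{\left(1 - \bar{r}^{2}\right)^{2}}{\bar{N} - 1},
$$

where $\sigma^{2}_{r}$ is the observed variance of validities across studies, $\bar{r}$ their mean, and $\bar{N}$ the average sample size; the full correlation model adds corrections based on hypothetical distributions of criterion reliability, predictor reliability, and range restriction, which is the requirement the covariance and regression slope models relax.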
  • Item
    A cautionary note on the use of LISREL's automatic start values in confirmatory factor analysis studies
    (1986) Brown, R. L.
    The accuracy of parameter estimates provided by the major computer programs for confirmatory factor analysis studies is questioned. This note demonstrates an inconsistency in parameter estimates across two of the major programs (LISREL and EQS), with the inconsistency attributed to the use of LISREL VI's automatic start values for the estimation of generalized least squares models.
  • Item
    Small N does not always justify Rasch model
    (1986) De Gruijter, Dato N. M.
    In many applications of item response theory, it is of little consequence whether the Rasch model or a more accurate but more complicated item response model is used. With small sample sizes, it might be advantageous to employ the Rasch model. A clear counterexample is the case of optimal item selection under guessing.
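The counterexample turns on the item information function of the three-parameter logistic model: with a nonzero lower asymptote $c_i$, the most informative item for an examinee at $\theta$ is no longer the one with $b_i = \theta$. For reference (standard results, not specific to this note),

$$
I_i(\theta) \;=\; D^{2} a_i^{2}\,\frac{Q_i(\theta)}{P_i(\theta)}
\left[\frac{P_i(\theta) - c_i}{1 - c_i}\right]^{2},
\qquad
\theta_{\max} \;=\; b_i + \frac{1}{D a_i}\,
\ln\!\left(\frac{1 + \sqrt{1 + 8 c_i}}{2}\right),
$$

where $P_i(\theta)$ is the 3PL response probability, $Q_i = 1 - P_i$, and $\theta_{\max}$ is the ability at which the item is most informative. Equivalently, when guessing is possible the most informative item for a given examinee is somewhat easier than $b_i = \theta$, whereas the Rasch model ($a_i = 1$, $c_i = 0$) always selects $b_i = \theta$.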
  • Item
    An estimator of examinee-level measurement error variance that considers test form difficulty adjustments
    (1986) Jarjoura, David
    A model and estimator for examinee-level measurement error variance are developed. Although the binomial distribution is basic to the modeling, the proposed error model provides some insights into problems associated with simple binomial error, and yields estimates of error that are quite distinct from binomial error. By taking into consideration test form difficulty adjustments often used in standardized tests, the model is linked also to indices designed for identifying unusual item response patterns. In addition, average error variance under the model is approximately that which would be obtained through a KR-20 estimate of reliability, thus providing a unique justification for this popular index. Empirical results using odd-even and alternate-forms measures of error variance tend to favor the proposed model over the binomial.
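For reference, the two standard quantities the abstract compares against (not the proposed estimator): the simple binomial estimate of an examinee's error variance on an $n$-item test, and the KR-20 reliability coefficient,

$$
\hat{\sigma}^{2}_{E \mid x} \;=\; \frac{x\,(n - x)}{n - 1},
\qquad
\text{KR-20} \;=\; \frac{n}{n - 1}\left(1 - \frac{\sum_{i=1}^{n} p_i\,q_i}{s_X^{2}}\right),
$$

where $x$ is the examinee's number-correct score, $p_i$ and $q_i = 1 - p_i$ are the proportions answering item $i$ correctly and incorrectly, and $s_X^{2}$ is the variance of observed scores.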