Browsing by Subject "Item response theory"
Item: An Automated Test Assembly Approach Using Item Response Theory to Enhance Evidence of Measurement Invariance (2022-06). Cooperman, Allison.

When creating a new test to measure a latent trait, test developers must select items that together demonstrate desirable psychometric properties. Automated test assembly (ATA) algorithms allow test developers to systematically compare possible item combinations based on the test's goals. ATA algorithms afford the flexibility to incorporate various psychometric criteria when evaluating a new test. However, few algorithms have integrated analyses of item- and test-level bias, particularly within the item response theory framework. This dissertation proposes an approach that balances common indices of test score precision and model fit while simultaneously accounting for differing measurement models between two groups. Three Monte Carlo studies were designed to evaluate the proposed method, termed "Unbiased-ATA". The first study found that, in many testing scenarios, Unbiased-ATA appropriately constructed tests with evidence of measurement invariance (MI), item fit, and test information function alignment. Importantly, Unbiased-ATA's performance depended on the accuracy of both the differential item functioning (DIF) detection method and the item parameter estimation. The second study revealed that differentially weighting the Unbiased-ATA objective function criteria did not substantially affect the method's performance. The final study found that Unbiased-ATA produced tests with stronger psychometric properties than an objective function based solely on test score precision. Yet adding a criterion for item-level MI did not noticeably improve tests' psychometric strength above and beyond a criterion for test-level MI. Future directions for integrating ATA, test bias, and test fairness more broadly in psychological and educational measurement are discussed.
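The abstract does not spell out the Unbiased-ATA objective function, so the following is only a minimal, hypothetical sketch of the general idea of weighting test score precision against item-level bias evidence in a single objective: a greedy pass over a 2PL item pool in which the function names, weights, and binary DIF flags are all assumptions made for illustration, not the dissertation's algorithm.

    import numpy as np

    def item_information(a, b, theta):
        # Fisher information of a 2PL item at ability level(s) theta
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        return a**2 * p * (1.0 - p)

    def greedy_ata(a, b, dif_flag, theta_grid, test_length, w_info=1.0, w_dif=2.0):
        # Select items one at a time, scoring each candidate by its summed
        # information over the theta grid minus a penalty if it was flagged
        # for DIF by some prior screening step.
        pool, selected = list(range(len(a))), []
        for _ in range(test_length):
            scores = {j: w_info * item_information(a[j], b[j], theta_grid).sum()
                         - w_dif * dif_flag[j]
                      for j in pool}
            best = max(scores, key=scores.get)
            selected.append(best)
            pool.remove(best)
        return selected

    # Hypothetical 30-item pool: discriminations, difficulties, DIF flags
    rng = np.random.default_rng(0)
    a, b = rng.uniform(0.8, 2.0, 30), rng.normal(0.0, 1.0, 30)
    dif_flag = rng.binomial(1, 0.2, 30)  # 1 = flagged by a DIF screen
    print(greedy_ata(a, b, dif_flag, np.linspace(-3, 3, 13), test_length=10))

Operational ATA more often formulates item selection as a mixed-integer program solved exactly; the greedy pass above only shows how precision and bias criteria can share one objective function.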
Item: Between-person and within-person subscore reliability: comparison of unidimensional and multidimensional IRT models (2013-06). Bulut, Okan.

The importance of subscores in educational and psychological assessments is undeniable. Subscores yield diagnostic information that can be used to determine how each examinee's abilities and skills vary over different content domains. One of the most common criticisms of reporting and using subscores is their insufficient reliability. This study employs a new reliability approach that allows the evaluation of between-person subscore reliability as well as within-person subscore reliability. Using this approach, unidimensional IRT (UIRT) and multidimensional IRT (MIRT) models are compared in terms of subscore reliability in simulation and real data studies. The simulation conditions are subtest length, correlations among subscores, and number of subtests. Both unidimensional and multidimensional subscores are estimated with the maximum a posteriori (MAP) method. Subscore reliability of the ability estimates is evaluated in light of between-person reliability, within-person reliability, and total profile reliability. The results suggest that the MIRT model performs better than the UIRT model under all simulation conditions. Multidimensional subscore estimation benefits from correlations among subscores as ancillary information, and it yields more reliable subscore estimates than unidimensional subscore estimation. Subtest length is positively associated with both between-person and within-person reliability. Higher correlations among subscores improve between-person reliability, while they substantially decrease within-person reliability. The number of subtests seems to influence between-person reliability slightly but has no effect on within-person reliability. The two estimation methods provide similar results with real data as well.
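The between-person reliability referred to above can be illustrated with a standard empirical reliability computation for shrunken Bayesian (MAP or EAP) ability estimates. This is a generic sketch, not the dissertation's procedure, and the estimates and posterior standard deviations below are simulated placeholders.

    import numpy as np

    def empirical_reliability(theta_hat, se_hat):
        # For shrunken (MAP/EAP) scores, a common empirical reliability is
        # var(theta_hat) / (var(theta_hat) + mean(se^2)): the share of
        # observed-score variance not attributable to estimation error.
        obs_var = np.var(theta_hat, ddof=1)
        err_var = np.mean(np.square(se_hat))
        return obs_var / (obs_var + err_var)

    # Hypothetical MAP subscore estimates and posterior SDs for one subtest
    rng = np.random.default_rng(1)
    theta_hat = rng.normal(0.0, 0.9, 500)  # 500 examinees
    se_hat = rng.uniform(0.3, 0.5, 500)
    print(round(empirical_reliability(theta_hat, se_hat), 3))

Within-person reliability, by contrast, concerns the dependability of differences among a single examinee's subscores (the score profile); the study's specific formulation is not reproduced here.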
Item: Incorporating Response Times in Item Response Theory Models of Reading Comprehension Fluency (2017-06). Su, Shiyang.

With online assessment becoming mainstream and the recording of response times becoming straightforward, the importance of response times as a measure of psychological constructs has been recognized, and the literature on modeling response times has grown over the last few decades. Previous studies have formulated models and theories to explain the construct underlying response times and the relationship between response times and response accuracy, and to understand examinees' behaviors. Unlike most existing psychometric models, the current study builds on the idea of reading comprehension fluency in the reading literature and proposes several item response theory based models that combine response times and response accuracy. To better understand the construct of reading comprehension fluency, the study used a new computer-administered assessment of reading comprehension and recorded both the response and the response time for each item. Response times connect examinees' performance on the reading comprehension test to the concepts of fluency and automaticity in the reading literature, concepts that are evidenced by responses that are accurate and appropriately fast. The study evaluates reading comprehension fluency through two approaches: one with polytomously scored variables and one with conditional variables. The models show the benefits of using response time information for improving construct validity when the measured latent construct is reading comprehension fluency. The study thereby contributes to an interpretation of the latent trait of reading fluency, and the models can be used to identify the intervals along the comprehension continuum in which students tend to read fluently.

Item: Latent Class Models: Design and Diagnosis (2019-01). Shang, Zhuoran.

A restricted latent class model is a family of latent variable models with broad applications in psychological and educational assessment, where the model is restricted via a latent structure matrix to reflect pre-specified assumptions about latent attributes. In this dissertation, I focus on the design and diagnosis of such models. First, the latent structure is often provided by experts and assumed to be correct upon construction, which may be subjective and misspecified. Recognizing this problem, I establish identifiability conditions that ensure the estimability of the structure matrix. Building on this theoretical development, a likelihood-based method is proposed to estimate and update the latent structure from the data. Second, cognitive diagnosis models (CDMs), a group of such latent class models, usually assume that test items require mastery of specific skills, represented by latent attributes, and that each skill is either fully mastered or not mastered by a subject. As a consequence, the concept of partial mastery may not be well accounted for. I propose a new class of models, partial mastery CDMs (PM-CDMs). This class generalizes both CDMs, by allowing for partial mastery, and mixed membership models, by specifying mixed membership for each latent attribute dimension. I demonstrate that PM-CDMs can be represented as restricted latent class models and propose a Bayesian approach for estimation. Simulation studies show that the proposed method outperforms existing approaches to latent structure estimation, and that the PM-CDM makes it possible to investigate the impact of model misspecification with respect to partial mastery. I illustrate these methods with data from educational assessment: latent structure estimation provides interpretable results for the fraction subtraction data, and the English test data demonstrate a case where the PM-CDM improves model fit over classical model specifications.

Item: Multilevel modeling of item position effects (2012-04). Albano, Anthony D.

In many testing programs it is assumed that the context or position in which an item is administered does not have a differential effect on examinee responses to the item, or at least that any such effect is negligible. Violations of this assumption may bias item response theory estimates of item and person parameters. This study examines the potentially biasing effects of item position. Previous work has approached position effects in testing from a variety of methodological perspectives, with a corresponding variety of findings. This study presents a hierarchical generalized linear model, a type of multilevel model, for estimating item position effects. Previous approaches to estimating and modeling position effects are described within a multilevel framework, and an extension of these approaches is demonstrated, one that incorporates item position as a continuous variable. Position effects are estimated as interactions between the position and the item; in other words, as slopes, or changes in item difficulty per shift in the position of the item within the test form. The model is demonstrated using real and simulated data. Real data came from two sources: a K-12 reading achievement test administered to over 90,000 students, in which pilot items were included in random positions, and pilot sections of the GRE administered to roughly 1,800 examinees, in which the same items appeared in different positions across forms. Data were simulated to have item position effects similar to those found in the real data studies and in previous research. A base model and two position effect models were then compared in terms of parameter recovery and fit to the simulated data. Practical applications of the model are discussed.
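One plausible formalization of the position-slope idea in the Albano abstract, assuming a Rasch-type hierarchical generalized linear model (the dissertation's exact parameterization may differ), is

    \operatorname{logit} P(y_{pi} = 1) = \theta_p - \left( b_i + \gamma_i \, \mathrm{pos}_{pi} \right)

where theta_p is the ability of person p, b_i is the difficulty of item i at the reference position, pos_pi is the (possibly centered) position at which person p encountered item i, and gamma_i is the item's position effect: the change in difficulty per one-position shift within the form. A base model fixes every gamma_i = 0, and a common-slope variant fixes gamma_i = gamma, which suggests the kind of base versus position-effect model comparison the abstract describes.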
Item: Revisiting three comparisons of unobserved conditional invariance techniques for the detection of differential item functioning (2014-12). Love, Quintin Ulysses Adrian.

Within social science research, data are often collected using measurement instruments that produce ordered-categorical data. When comparing scores from a measurement instrument across subpopulations, measurement invariance must be a tenable assumption. Confirmatory factor analysis (CFA) and item response theory (IRT) are two unobserved conditional approaches to assessing measurement invariance. Within the research literature, three often-cited simulation studies compare these two unobserved conditional invariance techniques. Because the research designs of the three studies varied greatly, their results are contradictory and not directly comparable. In this simulation study, the true positive (TP) and false positive (FP) rates of the IRT and CFA approaches to assessing measurement invariance are evaluated under four manipulated factors: (a) source of differential item functioning (DIF), (b) size of DIF, (c) sample size, and (d) baseline model. The parameters used for data generation came from a five-item unidimensional scale with four ordered categories (i.e., a Likert scale). The results suggest that the IRT model using a free baseline is the most precise model. Additionally, regardless of the model chosen, a free-baseline model is most favorable across all conditions of source of DIF, size of DIF, and sample size. Finally, the TP and FP rates of the studied models vary as a function of source of DIF, size of DIF, sample size, and baseline model. The significance of these results for social science research is discussed.
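The free-baseline approach favored in the Love abstract can be illustrated with a generic likelihood-ratio DIF test. The sketch below assumes the free model (the studied item's parameters differ across groups, with an anchor identifying the scale) and the constrained model (those parameters equated) have already been fitted elsewhere; the log-likelihood values are hypothetical.

    from scipy.stats import chi2

    def lrt_dif(loglik_free, loglik_constrained, n_constraints):
        # Likelihood-ratio statistic comparing the constrained (nested)
        # model to the free baseline; a large value flags DIF in the
        # studied item.
        stat = -2.0 * (loglik_constrained - loglik_free)
        return stat, chi2.sf(stat, df=n_constraints)

    # Hypothetical fitted log-likelihoods; equating the studied item's
    # discrimination and difficulty across groups gives df = 2.
    stat, p = lrt_dif(loglik_free=-10412.8, loglik_constrained=-10418.3,
                      n_constraints=2)
    print(f"LRT = {stat:.2f}, df = 2, p = {p:.4f}")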