Browsing by Subject "IRT"
Now showing 1 - 4 of 4
Item: Application of the bifactor model to computerized adaptive testing (2011-01). Seo, Dong Gi.

Most CAT research has been conducted under the framework of unidimensional IRT. However, many psychological variables are multidimensional and might benefit from a multidimensional approach to CAT. In addition, a number of psychological variables (e.g., quality of life, depression) can be conceptualized as consistent with a bifactor model (Holzinger & Swineford, 1937), in which there is a general dimension and some number of subdomains, with each item loading on only one of those subdomains. The present study extended the bifactor CAT work of Weiss and Gibbons (2007) by comparing it to a fully multidimensional bifactor method using multidimensional maximum likelihood estimation and Bayesian estimation for the bifactor model (the MBICAT algorithm). Although Weiss and Gibbons applied the bifactor model to CAT (the BICAT algorithm), their methods for item selection and scoring were based on unidimensional IRT. This study therefore investigated a fully multidimensional bifactor CAT algorithm using simulated data. The MBICAT algorithm was compared to the two BICAT algorithms under three factors: the number of group factors, the group-factor discrimination condition, and the estimation method. A fixed test length was used as the termination criterion for the CATs in Study 1. The accuracy of estimates from the BICAT and MBICAT algorithms was evaluated with the correlation between true and estimated scores, the root mean square error (RMSE), and the observed standard error (OSE). Two termination criteria (OSE = .50 and .55) were used to investigate the efficiency of the MBICAT in Study 2. The study demonstrated that the MBICAT algorithm worked well when latent scores on the secondary dimension were estimated properly. Although the MBICAT algorithm did not improve accuracy or efficiency for the general-factor scores relative to the two BICAT algorithms, it did improve accuracy and efficiency for the group factors. In the two BICAT algorithms, the use of differential entry on the group factors did not make a difference, in terms of accuracy and efficiency, compared to selecting the initial item at a trait level of 0 for both the general-factor and group-factor scales (Gibbons et al., 2008).
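For readers unfamiliar with the model structure this abstract describes, the Python sketch below illustrates a bifactor 2PL-type response function, in which every item loads on the general factor and exactly one group factor, together with a simple maximum-information selection step on the general factor. It is only a rough illustration under assumed parameter names and a toy item bank; it is not the BICAT or MBICAT algorithm itself.

```python
# Minimal sketch of a bifactor 2PL response function and an information-based
# item-selection step. All parameter names (a_gen, a_grp, d, group) and values
# are illustrative assumptions, not the dissertation's notation or algorithm.
import numpy as np

def bifactor_p(theta_gen, theta_grp, a_gen, a_grp, d, group):
    """P(correct) when each item loads on the general factor and one group factor."""
    z = a_gen * theta_gen + a_grp * theta_grp[group] + d
    return 1.0 / (1.0 + np.exp(-z))

def select_next_item(theta_gen, theta_grp, a_gen, a_grp, d, group, administered):
    """Pick the unadministered item with maximum information about the general factor."""
    p = bifactor_p(theta_gen, theta_grp, a_gen, a_grp, d, group)
    info = (a_gen ** 2) * p * (1.0 - p)      # Fisher information w.r.t. theta_gen
    info[list(administered)] = -np.inf       # exclude items already given
    return int(np.argmax(info))

# Toy item bank: 6 items, 2 group factors
rng = np.random.default_rng(0)
a_gen = rng.uniform(0.8, 2.0, 6)             # general-factor discriminations
a_grp = rng.uniform(0.5, 1.5, 6)             # group-factor discriminations
d = rng.normal(0.0, 1.0, 6)                  # intercepts
group = np.array([0, 0, 0, 1, 1, 1])         # which group factor each item measures

next_item = select_next_item(0.0, np.zeros(2), a_gen, a_grp, d, group, administered={2})
print("next item:", next_item)
```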
Item: A comparative study of item-level fit indices in item response theory (2009-07). Davis, Jennifer Paige.

Item-level fit indices (IFIs) in item response theory (IRT) are designed to assess the degree to which an estimated item response function approximates an observed item response pattern. There are numerous IFIs whose theoretical sampling distributions are specified; however, in some cases little is known about how closely these indices follow their theoretical distributions in practice. If an IFI departs substantially from its theoretical distribution, the degree of misfit will be misestimated, and test developers will have little idea of whether their models accurately depict true item response behavior. A Monte Carlo simulation study was therefore conducted to assess the degree to which many available IFIs follow their theoretical distributions.

The IFIs examined in this study were (1) Infit (VI) and Outfit (VO), two IFIs commonly used with the Rasch model; (2) Yen's (1981) χ² (Q1) and Orlando and Thissen's (2000) χ² (QO); (3) three Lagrange multiplier statistics [LM(a), LM(b), and LM(ab)] proposed by Glas (1999); and (4) Drasgow, Levine, and Williams' (1985) person-fit statistic Lz, modified by Reise (1990) to assess item fit. The primary research objective was to determine how a number of factors (listed below) affect the Type I error rates and empirical sampling distributions of IFIs; the relationship between IFIs and item parameters was also examined. The crossed between-subjects conditions were: IRT model (1-, 2-, and 3-parameter); data noise, operationalized as strictly unidimensional vs. essentially unidimensional data; item discrimination (high and low); test length (n = 15 and n = 75); and sample size (N = 500 and N = 1,500). There were also two crossed within-subjects factors to capture the impact of item and person parameter estimation error. The dependent variables were IFI Type I error rates and empirical sampling distribution moments across 18,750 replicated items. Data were analyzed and summarized using ANOVA, Pearson correlations, and graphical procedures, and the Kolmogorov-Smirnov test was used to directly assess distributional assumptions. The results indicated that QO was the only statistic to adhere closely to its theoretical sampling distribution across all study conditions. For the VI, VO, Lz, and Q1 statistics, sampling distributions were strongly influenced by test length, parameter estimation error, and, to a lesser degree, sample size. In the absence of parameter estimation error, all statistics more closely approximated their theoretical sampling distributions and were little affected by the other study conditions. The presence of person parameter estimation error tended to inflate sampling distribution means, whereas the presence of item parameter estimation error tended to deflate sampling distribution variances. VI, VO, and Lz functioned very similarly to one another, with Type I error rates tending to be grossly inflated for n = 15 and deflated for n = 75 when both person and item parameter error were present. Q1 Type I error rates were also grossly inflated for n = 15 but were near nominal levels for n = 75. Finally, the LM statistics generally exhibited inflated Type I error rates and were moderately influenced by IRT model and discrimination; only for LM(b) did the empirical sampling distributions tend to approach the theoretical distributions, primarily when discrimination was lower or, for the 3-parameter model, at both levels of discrimination.
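As a toy illustration of the kind of Monte Carlo check described above, the Python sketch below simulates a correctly specified Rasch item with known (true) person and item parameters and estimates the empirical Type I error rate of a Q1-style chi-square statistic against a nominal reference distribution. The decile grouping and the degrees-of-freedom choice are simplifying assumptions for this toy setup, not the study's actual procedures, and it mirrors only the no-estimation-error condition.

```python
# Toy Monte Carlo check of how often a Q1-style chi-square item fit statistic
# exceeds its nominal critical value under a correctly specified Rasch model.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
N, n_groups, n_reps, b = 1500, 10, 500, 0.0   # examinees, theta groups, replications, item difficulty
crit = chi2.ppf(0.95, df=n_groups)            # nominal 5% critical value (df choice is illustrative)

rejections = 0
for _ in range(n_reps):
    theta = rng.normal(0, 1, N)
    p_true = 1 / (1 + np.exp(-(theta - b)))   # Rasch response probabilities
    x = rng.binomial(1, p_true)               # simulated item responses
    # group examinees into theta deciles and compare observed vs. expected proportions
    edges = np.quantile(theta, np.linspace(0, 1, n_groups + 1))
    q1 = 0.0
    for g in range(n_groups):
        if g == n_groups - 1:
            in_g = (theta >= edges[g]) & (theta <= edges[g + 1])
        else:
            in_g = (theta >= edges[g]) & (theta < edges[g + 1])
        n_g = in_g.sum()
        obs_p, exp_p = x[in_g].mean(), p_true[in_g].mean()
        q1 += n_g * (obs_p - exp_p) ** 2 / (exp_p * (1 - exp_p))
    rejections += q1 > crit

print(f"empirical Type I error rate: {rejections / n_reps:.3f}")   # should be near .05
```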
Item: Evaluating Alternative Item Response Theory Approaches to Account for Missing Data in Children's Word Dictation Responses (2024-08). An, Jechun.

Students' responses to Word Dictation curriculum-based measurement (CBM) in writing tend to include many missing values, especially items not reached because of the three-minute time limit. The large number of non-ignorable not-reached responses in Word Dictation can be accounted for using alternative item response theory (IRT) approaches. In addition, these alternative approaches can be used to estimate students' writing productivity as well as their accuracy.

The purpose of this study was to evaluate the Word Dictation performance of elementary students who struggle with writing, using a classical IRT approach that considers writing accuracy only and alternative IRT approaches that consider both writing accuracy and productivity. The study used data from a larger research project that evaluated the effectiveness of a professional development program designed to support elementary teachers in implementing data-based instruction for students struggling with writing. Participants were recruited at two sites in the Midwestern United States. A total of 523 elementary students completed screening tests to determine eligibility for the larger project. The Word Dictation CBM in writing, used for screening, was designed to measure transcription skills at the word level by asking students to write dictated words as accurately as they can. I examined the extent to which students' results differed by comparing the classical IRT approach, a latent regression model (LRM), and an item response tree (IRTree) model approach. The approaches that account for not-reached items (the IRTree model and the LRM) yielded different ranges of writing ability even for students with the same score under the classical IRT approach. First, goodness-of-fit, item fit, person fit, and model fit were evaluated for each approach to demonstrate that the models had good fit indices. In addition, the conditional standard errors of measurement (cSEM) across models revealed that the alternative IRT approaches evaluated ability more accurately and precisely than the classical IRT approaches. Second, treating not-reached responses as either incorrect or missing made a difference in the ability parameters of students with different productivity performance, ultimately underestimating students performing at different levels. In addition, special education eligibility was a significant factor in comparisons of rank-order differences for several models, but not in the comparison between the classical IRT missing model and the alternative IRT models (the IRTree model and the LRM). Third, most results were consistently replicated across Word Dictation Forms A and B. The major inconsistencies involved English Language Learner (ELL) eligibility, both in relating child-level factors to writing productivity and accuracy and in relating rank-order differences to child-level factors. Handling missing responses is difficult but involves important procedures that lead to more accurate estimation of ability parameters in the context of CBM. Although no single best evaluation approach could be identified, it is possible that the abilities of students with particular statuses, including special education eligibility and ELL eligibility, were under- or overestimated when missing data were not taken into account. A better understanding of students' writing performance, as it relates to writing productivity and accuracy, could ultimately support teachers in using instructionally meaningful data for individualized instruction.
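The contrast the abstract draws between scoring not-reached items as incorrect and treating them as ignorable missing data can be seen in a very small example. The Python sketch below, with made-up item difficulties and a made-up response pattern, computes a Rasch maximum likelihood ability estimate under both codings; it is not the LRM or IRTree model used in the study.

```python
# Minimal sketch: the same observed attempts yield different Rasch ability
# estimates depending on whether not-reached items are scored 0 or ignored.
# Item difficulties and the response vector are hypothetical.
import numpy as np
from scipy.optimize import minimize_scalar

b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])   # hypothetical item difficulties
x = np.array([1, 1, 0, 1, np.nan, np.nan])        # last two items not reached (time limit)

def neg_loglik(theta, x, b):
    """Rasch negative log-likelihood over non-missing responses."""
    keep = ~np.isnan(x)
    p = 1 / (1 + np.exp(-(theta - b[keep])))
    return -np.sum(x[keep] * np.log(p) + (1 - x[keep]) * np.log(1 - p))

def mle(x, b):
    return minimize_scalar(neg_loglik, bounds=(-4, 4), args=(x, b), method="bounded").x

theta_missing = mle(x, b)                      # not-reached treated as ignorable missing
theta_incorrect = mle(np.nan_to_num(x), b)     # not-reached scored as incorrect (0)
print(f"theta (missing): {theta_missing:.2f}, theta (incorrect): {theta_incorrect:.2f}")
```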
Item: On-The-Fly Parameter Estimation Based on Item Response Theory in Item-based Adaptive Learning Systems (2020-11). Jiang, Shengyu.

An online learning system can offer customized content that caters to each learner's needs, and such systems have seen growing interest from industry and academia alike in recent years. Noting the similarity between online learning and the more established adaptive testing procedures, research has focused on applying the techniques of adaptive testing to the learning environment. Yet because of inherent differences between learning and testing, major challenges remain that hinder the development of adaptive learning systems. To tackle these challenges, a new online learning system is proposed that features a Bayesian algorithm for computing item and person parameters on the fly. The new algorithm is validated in two separate simulation studies, and the results show that the system, while cost-effective to build and easy to implement, can also achieve adequate adaptivity and measurement precision for the individual learner.
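To give a concrete sense of what computing person parameters "on the fly" can look like, the Python sketch below performs a grid-based Bayesian (EAP) update of a learner's ability after each observed response under a 2PL model. The dissertation's algorithm also updates item parameters online; that part is omitted here, and all item parameters and values are illustrative assumptions rather than the system's actual configuration.

```python
# Rough sketch of one ingredient of on-the-fly estimation: a grid-based
# Bayesian (EAP) update of ability after each response under a 2PL model.
import numpy as np

grid = np.linspace(-4, 4, 81)                 # ability grid
posterior = np.exp(-0.5 * grid**2)            # standard normal prior (unnormalized)
posterior /= posterior.sum()

def update(posterior, a, b, correct):
    """Multiply the current posterior by the 2PL likelihood of one response."""
    p = 1 / (1 + np.exp(-a * (grid - b)))
    like = p if correct else 1 - p
    posterior = posterior * like
    return posterior / posterior.sum()

# stream of (discrimination, difficulty, observed correctness) for administered items
for a, b, correct in [(1.2, -0.5, True), (0.9, 0.3, True), (1.5, 1.0, False)]:
    posterior = update(posterior, a, b, correct)
    eap = np.sum(grid * posterior)            # running ability estimate
    print(f"EAP after item (b={b:+.1f}): {eap:+.2f}")
```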