Browsing by Subject "International Large-scale Assessments"
Item
Linking Errors Introduced by Rapid Guessing Responses When Employing Multigroup Concurrent IRT Scaling (2024) Deng, Jiayi

Test score comparability in international large-scale assessments (LSAs) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic versions of test forms onto a common score scale. One example is the multigroup concurrent IRT calibration method, which estimates item and ability parameters across multiple linguistic groups of test-takers. The method assigns common item parameters to most items and groups, while a select few items are allowed group-specific parameters. Although prior researchers used empirical data from international LSAs to demonstrate that multigroup concurrent IRT calibration offers greater global comparability of score scales, they assumed comparable test-taking effort across cultural and linguistic populations. This assumption may not hold because of differential rapid guessing (RG) rates, which can bias item parameter estimation. To address this gap, I conducted a real data analysis and a simulation study. The objective of the current study is to investigate the linking errors introduced by RG responses when multigroup concurrent IRT calibration is employed.

In the real data analysis, data from the Arabic and Chinese groups in the PISA 2018 Form 18 science module were linked, with RG responses flagged using response time information. Despite observed differential RG, the linking procedure proved robust with respect to anchor identification and ability estimation. In the simulation, data were generated for two groups with varying motivation levels. These groups were administered two linguistic versions of a test form comprising multiple-choice items. Factors such as differential RG rate, the association between ability and RG propensity, group impact, sample size, and model fit criteria were considered. The assessment focused on anchor identification accuracy, item parameter estimation accuracy, and ability parameter estimation accuracy and precision. The findings showed that multigroup concurrent IRT calibration was robust against differential RG, with sample size and group impact being the primary factors influencing errors. However, differential RG could affect ability estimation precision and item parameter estimation accuracy.
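
The abstract states that RG responses were flagged using response time information but does not specify the flagging rule. The sketch below illustrates one common response-time heuristic, a normative threshold set at a fraction of each item's median response time; the fraction, floor, and function name are illustrative assumptions, not the dissertation's actual procedure.

```python
import numpy as np

def flag_rapid_guessing(response_times, threshold_fraction=0.10, floor_seconds=1.0):
    """Flag rapid-guessing (RG) responses with a normative response-time threshold.

    response_times : 2-D array (examinees x items) of response times in seconds.
    A response is flagged as RG when it falls below a per-item threshold, here
    `threshold_fraction` of the item's median response time (with a lower floor).
    This is one common heuristic; the rule used in the dissertation is not stated.
    """
    rt = np.asarray(response_times, dtype=float)
    item_medians = np.nanmedian(rt, axis=0)                    # typical effortful time per item
    thresholds = np.maximum(threshold_fraction * item_medians,  # fraction of the median ...
                            floor_seconds)                      # ... but never below the floor
    return rt < thresholds                                      # boolean RG flags

# Example: 5 examinees x 3 items of simulated response times (seconds)
rng = np.random.default_rng(0)
times = rng.lognormal(mean=3.0, sigma=0.4, size=(5, 3))
times[0, 1] = 1.2   # inject one suspiciously fast response
print(flag_rapid_guessing(times))
```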
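The simulation design (two groups with differential RG rates, two linguistic versions of a multiple-choice form, group impact via different ability means) could be set up along the lines of the sketch below. The 2PL response model, chance-level success rate for RG responses, sample sizes, and parameter ranges are assumptions for illustration, not the dissertation's generating model.

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_group(n_persons, a, b, rg_rate, chance=0.25, group_mean=0.0):
    """Generate 2PL multiple-choice responses for one group, contaminated by RG.

    `rg_rate` is the probability that any given response is a rapid guess;
    RG responses are correct only at the chance rate. The 2PL model, chance
    rate, and group means here are illustrative assumptions.
    """
    theta = rng.normal(group_mean, 1.0, size=n_persons)                # abilities (group impact via mean)
    p_effortful = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))      # 2PL correct-response probabilities
    rg_flags = rng.random((n_persons, len(b))) < rg_rate               # which responses are rapid guesses
    p = np.where(rg_flags, chance, p_effortful)                        # RG responses succeed only by chance
    responses = (rng.random(p.shape) < p).astype(int)
    return responses, rg_flags

# Illustrative item pool shared across the two linguistic versions
a = rng.uniform(0.8, 2.0, size=30)          # discriminations
b = rng.normal(0.0, 1.0, size=30)           # difficulties
resp_g1, _ = simulate_group(1000, a, b, rg_rate=0.02, group_mean=0.0)   # low-RG group
resp_g2, _ = simulate_group(1000, a, b, rg_rate=0.15, group_mean=-0.3)  # high-RG group with group impact
```

The two generated response matrices would then be calibrated concurrently, with most items constrained to common parameters and misfitting items freed to group-specific parameters, to study how the RG contamination propagates into anchor identification and parameter estimates.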