Reliability and Validity Evidence of Diagnostic Methods: Comparison of Diagnostic Classification Models and Item Response Theory-Based Methods

JANG, YOO JEONG
Date available: 2022-11-14
Date issued: 2022-05
https://hdl.handle.net/11299/243167
University of Minnesota Ph.D. dissertation. 2022. Major: Educational Psychology. Advisors: Michael Rodriguez, Mark Davison. 1 computer file (PDF); 153 pages.

Despite the increasing demand for diagnostic information, observed subscores have often been reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on the classical test theory (CTT) and item response theory (IRT) frameworks have been proposed to improve the quality of subscores. More recently, the diagnostic classification model (DCM) framework has also attracted increasing attention as a powerful diagnostic tool that can provide fine-grained diagnostic feedback. Despite this potential, there has been a dearth of research evaluating the psychometric quality of DCM, especially in comparison with diagnostic methods from other psychometric frameworks. This simulation study therefore compared DCM with two IRT-based subscore estimation methods in terms of the classification accuracy, distinctiveness, and incremental criterion-related validity evidence of subscores. Manipulated factors included diagnostic method, subscale length, item difficulty distribution, intercorrelations of subscores, and criterion validity coefficients. For classification accuracy, all diagnostic methods yielded comparable results when the center of the item difficulty distribution coincided with mean examinee ability and the cut-scores. However, when average item difficulty was mismatched with mean examinee ability and the cut-scores, DCM yielded substantially higher or lower classification accuracy than the IRT-based methods, with the direction and magnitude of the discrepancy depending on the type of agreement measure employed. For subscore distinctiveness, DCM yielded subscores that were more distinct from each other and from overall scores than those of the IRT-based methods when continuous rather than discrete subscores were used.
Lastly, regarding incremental criterion-related validity evidence, the contribution of DCM estimates over and above overall scores tended to be comparable to, but slightly smaller than, that of the IRT-based methods. Additionally, higher classification accuracy was associated with longer subscales, item difficulty distributions more closely aligned with the examinee ability distribution and cut-scores, and higher intercorrelations of subscores. The same conditions, except for higher intercorrelations of subscores, also tended to be associated with higher subscore distinctiveness. In contrast, the incremental criterion-related validity evidence of subscores was largely a function of the intercorrelations of subscores and the magnitude of the criterion validity coefficients: it increased with lower intercorrelations of subscores and higher criterion validity coefficients. In general, the results of this study suggest that IRT-based methods would be preferable to DCM as diagnostic tools when item responses are obtained from IRT-based assessment forms.

Language: en
Keywords: Classification accuracy; Criterion-related validity; Diagnostic classification models; Multidimensional item response theory; Subscore augmentation; Subscores
Type: Thesis or Dissertation
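The incremental criterion-related validity evidence summarized in the abstract concerns how much a subscore adds to predicting an external criterion beyond the overall score. The abstract does not name the index used, so the following is only a minimal sketch under an assumed choice: Delta R^2 from a two-step hierarchical regression (overall score first, then one subscore), computed from the three pairwise correlations. The function names and correlation values are hypothetical. The correlation between the subscore and the overall score is used here as a simple stand-in for the subscore intercorrelations the study manipulated; it is not the study's simulation design.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of criterion y on predictors x1 and x2,
    computed from the pairwise correlations r(y,x1), r(y,x2), r(x1,x2)."""
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def incremental_r_squared(r_y_total: float, r_y_sub: float, r_total_sub: float) -> float:
    """Hypothetical helper: Delta R^2, the criterion variance explained by a
    subscore over and above the overall score (step-2 R^2 minus step-1 R^2)."""
    r2_step1 = r_y_total**2  # step 1: overall score as the only predictor
    r2_step2 = r_squared_two_predictors(r_y_total, r_y_sub, r_total_sub)  # step 2: add subscore
    return r2_step2 - r2_step1


if __name__ == "__main__":
    # Hypothetical correlations chosen for illustration, not values from the dissertation.
    for r_total_sub in (0.9, 0.7, 0.5):
        delta = incremental_r_squared(0.5, 0.5, r_total_sub)
        print(f"r(total, sub) = {r_total_sub:.1f}: Delta R^2 = {delta:.4f}")
```

When both predictors have the same validity coefficient r, the expression reduces to Delta R^2 = r^2 (1 - r_12) / (1 + r_12). This quantity shrinks as the subscore overlaps more with the overall score and grows with r, which matches the direction of the pattern the abstract reports, though the study's actual models and indices may differ.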