Nguyen, Trung
2018-03-14
2018-03-14
2017-12
https://hdl.handle.net/11299/194562
University of Minnesota Ph.D. dissertation. December 2017. Major: Health Services Research, Policy and Administration. Advisor: Angeline Carlson. 1 computer file (PDF); ix, 165 pages.

Many hospitalizations are considered preventable if health practitioners, through management programs and interventions, can address early disease symptoms and accurately identify the patient-level or healthcare-system-level risk factors that lead to these hospitalizations. However, there appears to be no reliable method in practice for identifying these risk factors and predicting which patients are at risk of hospitalization. Accordingly, the goal of this thesis was to provide an in-depth application of Random Forests (RF), a machine learning technique, to specify and examine important factors for predicting hospitalizations and/or rehospitalizations of patients with chronic obstructive pulmonary disease (COPD). Using claims data from a single, large, Midwestern, self-insured employer group, the first part of this dissertation built and validated several RF models to arrive at reliable final models for identifying the contributing predictors. The final RF model for hospitalization prediction showed high accuracy (89%), high sensitivity (83%), high specificity (93%), a high c-statistic (0.88), and good agreement (Kappa = 0.77). The final RF model for rehospitalization prediction showed high accuracy (89%), high sensitivity (100%), high specificity (83%), a high c-statistic (0.92), and good agreement (Kappa = 0.79). In the second part, the core of our contributions, we identified variable importance for hospitalization and/or rehospitalization using the Mean Decrease of Impurity variable importance measure (i.e., Mean Decrease of Gini).
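As a rough illustration of the quantity behind the Mean Decrease of Gini measure, the sketch below computes the Gini impurity decrease for a single split in plain Python. A random forest accumulates such decreases over every split made on a variable and averages them across trees. The function names and the toy labels are hypothetical and are not taken from the dissertation's data or code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy labels (assumed for illustration): 1 = hospitalized, 0 = not hospitalized.
parent = [1, 1, 1, 0, 0, 0]
perfect_split = gini_decrease(parent, [1, 1, 1], [0, 0, 0])
weak_split = gini_decrease(parent, [1, 1, 0], [1, 0, 0])
```

A variable whose splits separate the classes cleanly (as in `perfect_split`) accumulates a larger decrease than one whose splits leave both child nodes mixed (as in `weak_split`), which is why it ranks higher in importance.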
Our analyses showed that the important variables for hospitalization prediction were comorbidity index, outpatient visit, care management, prescription of COPD drugs, prescription of cardiovascular drugs, and number of prescriptions. For rehospitalization prediction, the important variables were comorbidity index, post-discharge prescription of COPD drugs, pre-discharge care management, length of stay in hospital, gender, pre-discharge number of prescriptions, post-discharge prescription of cardiovascular drugs, and emergency room visit. Finally, the last part of this dissertation showed how the models might be applied in practice by ranking patients into levels of risk of hospitalization and/or rehospitalization. In the analytic sample for hospitalization prediction, the final model classified 51.0% of patients as low risk (P Hospitalization < 50%), 23.1% as medium risk (50% ≤ P Hospitalization < 80%), and 25.9% as high risk (P Hospitalization ≥ 80%). In the analytic sample for rehospitalization prediction, the final model classified 53.6% of patients as low risk (P Rehospitalization < 50%), 16.9% as medium risk (50% ≤ P Rehospitalization < 80%), and 29.5% as high risk (P Rehospitalization ≥ 80%).

en
Using Random Forest Model For Risk Prediction Of Hospitalization And Rehospitalization Associated With Chronic Obstructive Pulmonary Disease
Thesis or Dissertation
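The risk levels described in the abstract can be sketched as a simple mapping from a predicted probability to a tier. This is a minimal sketch that assumes the probability is expressed on a 0 to 1 scale. The function name `risk_level` is hypothetical, and only the 50% and 80% cut-offs come from the abstract.

```python
def risk_level(probability):
    """Map a predicted probability (0.0 to 1.0) to a risk tier.

    Cut-offs follow the abstract: low < 50%, medium 50% to < 80%, high >= 80%.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.80:
        return "high"
    if probability >= 0.50:
        return "medium"
    return "low"


# Example probabilities (made up for illustration, not model output).
tiers = [risk_level(p) for p in (0.12, 0.50, 0.79, 0.80, 0.97)]
```

In practice the input would be the predicted probability of the positive class from the fitted model for each patient.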