Advancing precision medicine: harnessing data from diverse sources and individuals to predict medication efficacy and safety.
Abstract
My research centers on integrating diverse data sources to develop predictive models for individualized medication treatment. These sources include clinical studies and multi-omic data, as well as real-world data (RWD) such as electronic health records (EHRs). My passion for this research area stems from a deep-seated commitment to enhancing patient outcomes by optimizing medication selection and use. As an inpatient pharmacy intern during my PharmD curriculum, I had direct interactions with patients, documenting their medication-related inquiries and needs. It became evident that a one-size-fits-all approach is inadequate and that the key to improving medication outcomes lies in individualized therapy. This thesis presents the following research projects:
Deciphering medication exposure and response differences to address health disparities in underrepresented populations: warfarin dosing in Minnesota Hmong as an example (Chapters 2 and 3)
Warfarin, a commonly prescribed anticoagulant, exhibits highly variable dosing requirements among individuals. Our prior research on the Hmong, an underserved and underrepresented East Asian subgroup, revealed unique pharmacogenomic (PGx) traits impacting their warfarin dosing. Analyzing data from two Hmong cohorts (n=236 and n=198), we found significantly higher CYP2C9*3 allele frequencies (18.9% vs. 3.0%) and lower predicted warfarin maintenance doses (19.8 vs. 21.3 mg/week) compared to other East Asians. These genetic differences, combined with non-genetic factors, result in distinct warfarin dosing requirements for Hmong individuals. In a subsequent retrospective cohort study using Minnesota Fairview EHR data from January 1, 2016 to September 30, 2020, we compared Hmong (N=55) and East Asian (N=34) patients receiving warfarin. The results corroborated that Hmong patients required a lower warfarin maintenance dose, underwent more international normalized ratio (INR) measurements, and had a higher risk of bleeding events compared to other East Asians.
This project focused on investigating variations in medication dosing and treatment outcomes by leveraging clinical and genetic data. This focus is especially important for underserved populations, who often receive treatments based on population averages, which can result in inefficacy or an increased prevalence of side effects. Throughout this research, I became aware of the gap in high-quality data for underserved populations, primarily due to their limited participation in clinical studies. Recognizing this limitation, I turned my attention to the use of RWD, with a particular emphasis on EHR data, as a feasible and reliable resource to compensate for the lack of clinical data available for analysis in underserved populations. Furthermore, I have come to appreciate the significance of employing combinatorial and multifactorial approaches to predictive modeling, rather than relying solely on genetic or clinical data. Machine learning (ML) is widely acknowledged to be particularly useful for analyzing and predicting outcomes based on multidimensional features. These realizations have shaped my current research focus, which centers on mining EHR data for medication efficacy and outcome modeling using advanced artificial intelligence (AI)/ML techniques.
Leveraging multi-omic data to address variations in medication exposure and responses: a phase II clinical trial investigating Hmong’s response to vitamin C as a potential gout treatment based on genome and microbiome analysis (Chapter 4)
Hmong men in Minnesota display a higher prevalence of gout and hyperuricemia. Despite conflicting evidence regarding the efficacy of vitamin C as a treatment for gout, an exploration of its therapeutic potential based on an individual's multi-omic signature could unveil predictive markers for treatment success. In line with community-based participatory research (CBPR) principles, we conducted a Phase II clinical trial with the primary goal of evaluating the impact of vitamin C on serum urate levels in Hmong adults, both with and without gout/hyperuricemia (Gout/HU).
We enrolled a total of N=135 Hmong adults, with or without Gout/HU, who provided comprehensive medical, demographic, dietary, and anthropometric information. Among the compliant participants (N=62), comprising N=36 with Gout/HU and N=25 healthy controls, vitamin C was administered at 500 mg twice daily for 8 weeks. Pre- and post-treatment blood and urine samples were collected for urate measurements, alongside stool samples for assessing the gut microbiome. Salivary DNA was also obtained to explore genetic markers relevant to uric acid disposition.
Preliminary analysis of the N=62 compliant participants indicates that vitamin C does not have a statistically significant impact on lowering serum urate levels in Hmong adults, irrespective of Gout/HU status. Ongoing analysis is anticipated to reveal clinical, genetic, and microbiome markers important for predicting an individual's response to vitamin C in reducing serum urate.
Optimizing identification and prediction of medication side effects using natural language processing (NLP) and machine learning (ML): phenotyping and predicting statin muscle symptoms using EHR data as an example (Chapters 5 and 6)
Around half of Americans aged 65 and above rely on statins to mitigate the risks of cardiovascular disease. Statin-associated muscle symptoms (SAMS) often lead to statin discontinuation and are documented in clinical notes within EHRs. NLP, a subfield of AI, can extract such real-world information from patients' EHRs, which, in turn, can be used to build ML algorithms that predict which patients are at risk for SAMS and statin intolerance. In this project, we aimed to develop SAMS phenotyping and prediction algorithms using academic health center (AHC) information exchange (IE) EHR data. Our approach involved obtaining structured and unstructured EHR data from statin users and manually establishing a gold standard set of SAMS cases and controls in 200 patients from clinical notes. We developed both ML and rule-based algorithms, considering various criteria such as ICD codes, statin allergy, creatine kinase elevation, and keyword mentions in clinical notes. The best-performing algorithm, the combined rule-based (CRB) algorithm, which integrated clinical notes and structured data criteria, achieved a precision of 0.85, a recall of 0.71, and an F1 score of 0.77 against the gold standard set.
Next, using our SAMS phenotyping algorithm, we identified SAMS cases and controls from Fairview EHR data. We used the Least Absolute Shrinkage and Selection Operator (LASSO) regression model to identify significant features for pharmacological SAMS (PSAMS). PSAMS Risk Stratification (PSAMS-RS) scores were calculated, and the clinical utility of stratifying PSAMS risk was assessed by comparing hazard ratios (HRs) between the 4th and 1st score quartiles. PSAMS cases were identified in 1.9% of the derivation cohort and 1.5% of the validation cohort. Sixteen of 38 clinical features were determined to be significant predictors of PSAMS risk. Patients in the 4th quartile of PSAMS-RS scores had an over sevenfold higher hazard of developing PSAMS than those in the 1st quartile.
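The rule-based phenotyping step and its evaluation against a gold standard set, as described above, can be illustrated with a minimal Python sketch. This is not the dissertation's CRB algorithm: the keyword pattern, the record fields, the toy records, and the choice to flag a patient when any single criterion is met are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword pattern; the actual terms used in the thesis are not given here.
MUSCLE_KEYWORDS = re.compile(
    r"\b(myalgia|muscle (?:pain|ache|cramps?|weakness))\b", re.IGNORECASE
)


@dataclass
class PatientRecord:
    """Assumed, simplified view of one statin user's EHR data."""
    has_icd_code: bool    # muscle-related ICD code present
    statin_allergy: bool  # statin listed in the allergy field
    ck_elevated: bool     # creatine kinase above some chosen threshold
    note_text: str        # free-text clinical notes


def combined_rule_based(record: PatientRecord) -> bool:
    """Flag a record if any structured criterion or a note keyword is present.

    The OR-combination is an assumption; a real algorithm may weight or
    require several criteria, and would need to handle negation in notes.
    """
    structured_hit = (
        record.has_icd_code or record.statin_allergy or record.ck_elevated
    )
    note_hit = MUSCLE_KEYWORDS.search(record.note_text) is not None
    return structured_hit or note_hit


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, gold):
    """Compare predicted labels with manually reviewed gold-standard labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy records and gold labels, invented purely to exercise the functions.
TOY_RECORDS = [
    PatientRecord(True, False, False, ""),
    PatientRecord(False, False, False, "Reports muscle pain after atorvastatin."),
    PatientRecord(False, False, False, "No complaints today."),
    PatientRecord(False, False, True, ""),
    PatientRecord(False, False, False, "Fatigue since starting statin."),
]
TOY_GOLD = [True, True, False, False, True]

if __name__ == "__main__":
    predictions = [combined_rule_based(r) for r in TOY_RECORDS]
    print(precision_recall_f1(predictions, TOY_GOLD))
    print(round(f1_from(0.85, 0.71), 2))
```

As a consistency check, applying the harmonic-mean formula to the reported precision (0.85) and recall (0.71) gives approximately 0.77, which agrees with the reported F1 score.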
This project initially developed and validated a rule-based algorithm for identifying pharmacological SAMS (PSAMS). Subsequently, we introduced the PSAMS-RS score, a user-friendly tool that stratifies patients' risk of developing PSAMS after initiating statin therapy. The tools developed in this project can help clinicians identify and predict potential SAMS-related statin noncompliance and therefore take preventative measures to improve statin adherence. Because the majority of patient-reported medication nonadherence events are documented within EHRs in unstructured format, mining and interpreting such information with novel informatics tools such as NLP has clinical and translational impact: it provides more reliable information and could therefore complement the development of risk prediction models that are more accurate and patient-centered. While challenges remain, such as the need for external validation and uncertainty regarding the integration of these tools into routine clinical care, future research will focus on leveraging ML and NLP to optimize the development and integration of predictive models in routine clinical care.
Description
University of Minnesota Ph.D. dissertation. March 2024. Major: Experimental & Clinical Pharmacology. Advisor: Robert Straka. 1 computer file (PDF); x, 110 pages.
Suggested Citation
Sun, Boguang. (2024). Advancing precision medicine: harnessing data from diverse sources and individuals to predict medication efficacy and safety. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/273531.
