Extracting Information on Dietary Supplements from Clinical Notes in Electronic Health Record Systems Through Natural Language Processing Techniques

Fan, Yadan
2020-05-04
2020-05-04
2020-02
https://hdl.handle.net/11299/213117
University of Minnesota Ph.D. dissertation. February 2020. Major: Health Informatics. Advisor: Rui Zhang. 1 computer file (PDF); xii, 115 pages.

Patient safety has become an increasingly important concern as the popularity and consumption of dietary supplements (DS) grow. DS are promoted and regulated as food, without rigorous pre-marketing testing or Food and Drug Administration (FDA) approval, which raises concerns about their safety and efficacy. Current post-marketing surveillance relies primarily on voluntarily submitted reports of suspected adverse events (AEs) caused by DS. Such a reporting scheme inherently suffers from underestimation and reporting bias. Additionally, there remains a paucity of clinical trials evaluating the pharmaceutical mechanisms and safety of DS. These limitations have created a critical need for alternative data sources for active pharmacovigilance on DS safety. Clinical notes in electronic health records (EHR) can address this need: they are a valuable data source documenting comprehensive real-world information with respect to patient safety in the course of care visits. The essential first step toward DS pharmacovigilance studies is therefore to automatically extract the DS usage and safety information embedded in unstructured clinical notes. Natural language processing (NLP), and more precisely information extraction (IE), offers a set of enabling techniques and tools that can facilitate this automatic extraction. In this dissertation, IE methods were developed and evaluated with the aim of extracting DS information from clinical notes. First, a study was conducted to demonstrate the feasibility of using word embeddings to expand the terminology of DS in clinical notes.
Through the extrinsic evaluation, the terms for 14 commonly used DS were expanded with semantic variants, brand names, and misspellings. The expanded terms proved valuable in note and patient identification tasks, retrieving more notes and patients than two sets of baseline terms. Second, to detect and extract DS named entities and their relations with events (i.e., indications or AEs), named entity recognition (NER) and relation extraction (RE) tasks were performed. Both machine learning and deep learning methods were evaluated and compared on these two tasks. Deep learning models were found to be more efficient and scalable than machine learning models. Finally, machine learning-based and rule-based classifiers were built to automatically classify the use status of DS into four categories (i.e., Continuing, Discontinued, Started, Unclassified). The machine learning-based classifier performed better when the sample size was doubled. The techniques and methods developed in this dissertation can be integrated into existing EHR or NLP systems for automatic DS IE, which can potentially advance active DS surveillance and improve patient safety via clinical decision support.

en
Biomedical natural language processing; Clinical notes; Dietary supplements; Electronic health records; Information extraction; patient's safety
Thesis or Dissertation