Bompelli, Anusha
2020-10-26
2020-10-26
2020-08
https://hdl.handle.net/11299/216774
University of Minnesota M.S. thesis. August 2020. Major: Health Informatics. Advisor: Rui Zhang. 1 computer file (PDF); 44 pages.

Dietary supplements (DSs) have been widely used in the U.S. and evaluated in clinical trials as potential interventions for various diseases. However, many clinical trials struggle to recruit enough eligible patients in a timely fashion, causing delays or even early termination. Using electronic health records to find patients who meet clinical trial eligibility criteria has been shown to be a promising way to assess recruitment feasibility and accelerate the recruitment process. Natural Language Processing (NLP) techniques have been used extensively to extract concepts from clinical trial eligibility criteria. However, a significant obstacle is identifying an efficient Named Entity Recognition (NER) system to parse the eligibility criteria. The study comprises two parts. In the first part of the study, the objectives were to (1) understand the data elements associated with DS trials' eligibility criteria and assess whether they can be mapped to the OMOP Common Data Model (CDM); and (2) develop and evaluate NLP methods, especially deep learning-based models, for extracting eligibility criteria data elements. We analyzed the eligibility criteria of 100 randomly selected DS clinical trials and identified both computable and non-computable criteria. We mapped the annotated entities to the OMOP CDM, adding novel entities (e.g., DS). We also evaluated a deep learning model (Bi-LSTM-CRF) for extracting these entities on the CLAMP platform, which achieved an average F1 measure of 0.601. This study shows the feasibility of automatically parsing eligibility criteria following the OMOP CDM for future cohort identification.
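As an illustration of the F1 measure reported above, the following minimal sketch scores predicted entities against gold-standard annotations using exact span and type matching. The matching rule, the function name `precision_recall_f1`, and the toy entities are assumptions made for illustration; the abstract does not state how the thesis matched entities or averaged scores.

```python
from typing import Set, Tuple

# An entity is (start offset, end offset, entity type).
# The offsets and type labels used below are toy values, not data from the thesis.
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Score predictions against gold annotations with exact span-and-type matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one eligibility criterion.
gold = {(0, 8, "Condition"), (15, 24, "Drug"), (30, 41, "DS")}
predicted = {(0, 8, "Condition"), (15, 24, "DS"), (30, 41, "DS"), (50, 55, "Drug")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here two of the four predictions match a gold entity exactly, so precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7. An "average" F1 across entity types could then be taken as the mean of such per-type scores, though that averaging scheme is likewise an assumption.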
In the second part of the study, the objective was to examine the performance of standard open-source clinical NLP systems on the task of Named Entity Recognition (NER) for a corpus outside the domain for which these systems were developed. We used NLP-ADAPT (Artifact Discovery and Preparation Toolkit) to compare existing biomedical NLP systems (BioMedICUS, CLAMP, cTAKES, and MetaMap) and their Boolean ensemble in identifying entities in the eligibility criteria of 150 randomly selected Dietary Supplement (DS) clinical trials. We created a custom mapping of the gold-standard annotated entities to UMLS semantic types to align them with the annotations from each system. All systems in NLP-ADAPT used their default pipelines to extract entities based on our custom mappings. The systems performed reasonably well in extracting UMLS concepts belonging to the Disorders and the Chemicals and Drugs semantic groups. Among the individual systems, cTAKES performed best for the Chemicals and Drugs and the Disorders semantic groups, and BioMedICUS performed best for Procedures, Living Beings, Concepts and Ideas, and Devices. The Boolean ensemble, however, outperformed the individual systems. This study sets a baseline that can potentially be improved with modifications to the NLP-ADAPT pipeline.
en
Clinical Trial Eligibility Criteria
Information Extraction
Named Entity Recognition
Natural Language Processing
NLP-ADAPT
Natural Language Processing Methods to Automatically Parse Eligibility Criteria in Dietary Supplements Clinical Trials
Thesis or Dissertation