Natural Language Processing Methods to Automatically Parse Eligibility Criteria in Dietary Supplements Clinical Trials

Title

Natural Language Processing Methods to Automatically Parse Eligibility Criteria in Dietary Supplements Clinical Trials

Published Date

2020-08

Type

Thesis or Dissertation

Abstract

Dietary supplements (DSs) are widely used in the U.S. and have been evaluated in clinical trials as potential interventions for various diseases. However, many clinical trials struggle to recruit enough eligible patients in a timely fashion, which causes delays or even early termination. Using electronic health records to find patients who meet clinical trial eligibility criteria has been shown to be a promising way to assess recruitment feasibility and accelerate the recruitment process. Natural Language Processing (NLP) techniques have been used extensively to extract concepts from clinical trial eligibility criteria. However, a significant obstacle is identifying an efficient Named Entity Recognition (NER) system to parse these criteria. This study comprises two parts. In the first part, the objectives were to (1) understand the data elements associated with DS trials’ eligibility criteria and assess whether they can be mapped to the OMOP Common Data Model (CDM); and (2) develop and evaluate NLP methods, especially deep learning-based models, for extracting eligibility criteria data elements. We analyzed the eligibility criteria of 100 randomly selected DS clinical trials and identified both computable and non-computable criteria. We mapped the annotated entities to the OMOP CDM with novel entities (e.g., DS). We also evaluated a deep learning model (Bi-LSTM-CRF) for extracting these entities on the CLAMP platform, which achieved an average F1 measure of 0.601. This part of the study shows the feasibility of automatically parsing eligibility criteria following the OMOP CDM for future cohort identification.
In the second part, the objective was to examine how well standard open-source clinical NLP systems perform NER on a corpus outside the domain for which they were developed. We used NLP-ADAPT (Artifact Discovery and Preparation Toolkit) to compare existing biomedical NLP systems (BioMedICUS, CLAMP, cTAKES, and MetaMap) and their Boolean ensemble in identifying entities in the eligibility criteria of 150 randomly selected DS clinical trials. We created a custom mapping from the gold-standard annotated entities to UMLS semantic types to align them with the annotations from each system. All systems in NLP-ADAPT used their default pipelines to extract entities based on our custom mappings. The systems performed reasonably well in extracting UMLS concepts belonging to the Disorders and Chemicals and Drugs semantic groups. Among the individual systems, cTAKES performed best for the Chemicals and Drugs and Disorders semantic groups, and BioMedICUS performed best for Procedures, Living Beings, Concepts and Ideas, and Devices. The Boolean ensemble, however, outperformed the individual systems. This study sets a baseline that could potentially be improved with modifications to the NLP-ADAPT pipeline.
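
As an illustration only, the idea of combining several NER systems with Boolean operators and scoring the result against a gold standard can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's or NLP-ADAPT's implementation: the names `Entity`, `boolean_or`, `boolean_and`, and `f1_score` are hypothetical, the toy spans are invented for the example, and exact span-and-label matching is assumed because the abstract does not state the matching criterion.

```python
from __future__ import annotations

from typing import NamedTuple


class Entity(NamedTuple):
    """Hypothetical entity record: character span plus a semantic group label."""
    start: int  # offset where the span begins
    end: int    # offset where the span ends (exclusive)
    group: str  # e.g. "Disorders"


def boolean_or(*system_outputs: set[Entity]) -> set[Entity]:
    """Union ensemble: keep an entity if any system produced it."""
    return set().union(*system_outputs)


def boolean_and(*system_outputs: set[Entity]) -> set[Entity]:
    """Intersection ensemble: keep an entity only if every system produced it.

    Requires at least one system output.
    """
    return set.intersection(*system_outputs)


def f1_score(predicted: set[Entity], gold: set[Entity]) -> float:
    """F1 under exact span-and-label matching (an assumption of this sketch)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented toy annotations for one criterion sentence (not data from the thesis).
gold = {
    Entity(0, 8, "Disorders"),
    Entity(20, 29, "Chemicals and Drugs"),
    Entity(40, 47, "Procedures"),
}
system_a = {Entity(0, 8, "Disorders"), Entity(20, 29, "Chemicals and Drugs")}
system_b = {
    Entity(0, 8, "Disorders"),
    Entity(40, 47, "Procedures"),
    Entity(50, 55, "Devices"),
}

for name, prediction in [
    ("system_a", system_a),
    ("system_b", system_b),
    ("A OR B", boolean_or(system_a, system_b)),
    ("A AND B", boolean_and(system_a, system_b)),
]:
    print(f"{name}: F1 = {f1_score(prediction, gold):.3f}")
```

In this toy case the union recovers entities that each system misses on its own, at the cost of also keeping one system's false positive, while the intersection trades recall for precision. Which combination works best on real eligibility criteria is an empirical question that a sketch like this cannot answer.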

Description

University of Minnesota M.S. thesis. August 2020. Major: Health Informatics. Advisor: Rui Zhang. 1 computer file (PDF); 44 pages.

Suggested citation

Bompelli, Anusha. (2020). Natural Language Processing Methods to Automatically Parse Eligibility Criteria in Dietary Supplements Clinical Trials. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/216774.
