TRUST: Clinical Text Retrieval and Use towards Scientific Rigor and Transparent Process

Rapid proliferation and adoption of the electronic health record (EHR) have led to seamless integration of clinical research into practice and have facilitated healthcare decision-making by enabling an accurate and timely supply of health information. Leveraging this supply of information, the Institute of Medicine envisioned the concept of continuously Learning Health Systems (LHS) in 2007, with the aim of first deriving knowledge from routine care data and then translating that knowledge into evidence-based clinical practice. Achieving this vision requires a robust data and informatics infrastructure with the following properties: 1) high-throughput and real-time methods for data retrieval, extraction, and analysis, 2) transparent and reproducible processes to ensure scientific rigor in clinical research, and 3) implementable and generalizable scientific findings. One common approach to deriving knowledge from care data is chart review, a manual method of practice-based knowledge discovery in which patient medical records are reviewed by hand. Because a significant portion of clinical information is represented as text, this manual approach can be time-consuming and costly. With the implementation of EHRs, chart review can be automated by systematically extracting data from structured fields and by leveraging natural language processing (NLP) techniques to extract information from text. However, rigorous development and evaluation of NLP algorithms for a specific chart review task requires data abstraction and annotation (i.e., the manual creation of a gold standard clinical corpus against which the developed NLP algorithm is evaluated).
In EHR-based settings, however, there is a lack of standard processes or best practices for creating such a corpus, owing to the heterogeneity of institutional EHR systems and to process variation between single-site and multi-site research settings. Recent advances in healthcare AI have highlighted the need for detailed provenance of the data used to train and validate AI models. Secondary use of EHRs for clinical research that leverages AI technologies such as NLP therefore requires documenting provenance information about the process used to retrieve and organize the raw data, as well as to extract and annotate the training data. We define this process as the clinical Text Retrieval and Use towards Scientific rigor and Transparent (TRUST) process. As EHR-based research becomes increasingly integrated into clinical care, it is important to have a systematic understanding of the TRUST process, its use in developing informatics tools and methods, and its overall impact on research reproducibility. In this work, we propose a multi-phase method to develop informatics frameworks and best practices that ensure reproducible TRUST processes for single-site and multi-site studies. In the following chapters, we propose: 1) a definition of reproducibility in the context of the secondary use of EHRs, 2) methods to assess various levels of data heterogeneity caused by differing EHR systems and inter-institutional variations, 3) approaches to examine the implications of data heterogeneity for reproducibility, 4) steps to develop frameworks, best practices, and reporting standards conforming to the TRUST process, and 5) an application of the TRUST process in a real-world case study.

Keywords

Electronic Health Records

Information Provenance

Information Quality

Natural Language Processing

Reproducibility

Description

University of Minnesota Ph.D. dissertation. 2021. Major: Biomedical Informatics and Computational Biology. Advisors: Hongfang Liu, Yuk Sham. 1 computer file (PDF); 151 pages.

Collections

Dissertations

Suggested citation

Fu, Sunyang. (2021). TRUST: Clinical Text Retrieval and Use towards Scientific Rigor and Transparent Process. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/226410.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.
