Development Of Semi-Automated Tools To Map Cancer Research Common Data Elements To The Biomedical Research Integrated Domain Group Model

Published Date

2020-03

Type

Thesis or Dissertation

Abstract

While data standards can facilitate research by making data easier to share, manually mapping to those standards creates an obstacle to their adoption. Semi-automated mapping strategies can reduce the manual mapping burden. This research addresses the mapping dilemma by applying well-established and emerging techniques to a real-world use case. First, machine learning approaches were used and evaluated to map Common Data Elements (CDEs) from the National Cancer Institute’s (NCI) cancer Data Standards Registry and Repository to the Biomedical Research Integrated Domain Group (BRIDG) model. Second, a graph database that incorporates the CDEs, the BRIDG model, and the NCI Thesaurus was developed and evaluated. A shortest path algorithm was then used to predict mappings from CDEs to classes in the BRIDG model. Finally, an analysis was conducted to determine the strengths and weaknesses of each approach, highlight data quality issues, and determine when either approach or a combination of the approaches provides the optimal results. The results indicate that an artificial neural network-based mapping tool can predict CDE-to-BRIDG class mappings with between 34% and 94% accuracy but is limited by the availability of training data. The results also show that a graph database can be used to map CDEs to BRIDG classes but is limited by the subjective nature of the mapping process. An optimal mapping tool combines machine learning and graph database techniques with the knowledge and experience of a human subject matter expert.
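The shortest-path idea in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not say which graph database or algorithm variant was used, so the example below assumes a plain in-memory adjacency list and an unweighted breadth-first search. All node names (`CDE:example_element`, `NCIt:concept_A`, `BRIDG:ClassX`, and so on) and both function names are hypothetical placeholders, not real registry entries.

```python
from collections import deque


def shortest_paths_to_targets(graph, start, targets):
    """Return the hop count from `start` to each reachable node in `targets`.

    Uses breadth-first search, which finds shortest paths in an
    unweighted graph. `graph` is an adjacency list (dict of lists).
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return {t: dist[t] for t in targets if t in dist}


def predict_mapping(graph, cde, bridg_classes):
    """Predict the BRIDG class closest to `cde`, or None if none is reachable.

    Ties are broken alphabetically so the result is deterministic.
    """
    reachable = shortest_paths_to_targets(graph, cde, bridg_classes)
    if not reachable:
        return None
    return min(sorted(reachable), key=reachable.get)


# Hypothetical toy graph: one CDE linked to BRIDG classes through
# thesaurus concepts. Edges are listed in both directions (undirected).
graph = {
    "CDE:example_element": ["NCIt:concept_A"],
    "NCIt:concept_A": ["CDE:example_element", "NCIt:concept_B", "BRIDG:ClassX"],
    "NCIt:concept_B": ["NCIt:concept_A", "BRIDG:ClassY"],
    "BRIDG:ClassX": ["NCIt:concept_A"],
    "BRIDG:ClassY": ["NCIt:concept_B"],
}

candidates = ["BRIDG:ClassX", "BRIDG:ClassY"]
prediction = predict_mapping(graph, "CDE:example_element", candidates)
print(prediction)
```

In this toy graph the CDE is two hops from `BRIDG:ClassX` and three hops from `BRIDG:ClassY`, so the nearer class is returned. A ranked list of candidates, rather than a single prediction, would likely suit the human-in-the-loop workflow the abstract recommends.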

Description

University of Minnesota Ph.D. dissertation. March 2020. Major: Biomedical Informatics and Computational Biology. Advisors: Guoqian Jiang, Chad Myers. 1 computer file (PDF); xi, 110 pages + 1 supplemental file.

Suggested citation

Renner, Robinette. (2020). Development Of Semi-Automated Tools To Map Cancer Research Common Data Elements To The Biomedical Research Integrated Domain Group Model. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215075.
