Development Of Semi-Automated Tools To Map Cancer Research Common Data Elements To The Biomedical Research Integrated Domain Group Model
2020-03
Title
Development Of Semi-Automated Tools To Map Cancer Research Common Data Elements To The Biomedical Research Integrated Domain Group Model
Authors
Renner, Robinette
Published Date
2020-03
Type
Thesis or Dissertation
Abstract
Data standards facilitate research by making data easier to share, but the manual effort required to map data to those standards is an obstacle to their adoption. Semi-automated mapping strategies can reduce this burden. This research addresses the mapping problem by applying well-established and emerging techniques to a real-world use case. First, machine learning approaches were used to map Common Data Elements (CDEs) from the National Cancer Institute’s (NCI) cancer Data Standards Registry and Repository to the Biomedical Research Integrated Domain Group (BRIDG) model, and their performance was evaluated. Second, a graph database that incorporates the CDEs, the BRIDG model, and the NCI Thesaurus was developed and evaluated, and a shortest path algorithm was used to predict mappings from CDEs to classes in the BRIDG model. Finally, the two approaches were analyzed to determine the strengths and weaknesses of each, highlight data quality issues, and determine when either approach, or a combination of the two, provides the best results. The results indicate that an artificial neural network-based mapping tool can predict CDE-to-BRIDG class mappings with between 34% and 94% accuracy but is limited by the availability of training data. The results also show that a graph database can be used to map CDEs to BRIDG classes but is limited by the subjective nature of the mapping process. An optimal mapping tool combines machine learning and graph database techniques with the knowledge and experience of a human subject matter expert.
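The graph-based idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes a small in-memory, unweighted graph instead of a graph database, uses breadth-first search as the shortest path algorithm (the abstract does not say which one was used), and every node name and edge below is hypothetical.

```python
from collections import deque

# Toy undirected graph. All node names and edges are hypothetical and
# exist only to illustrate the idea; they are not taken from the caDSR,
# the NCI Thesaurus, or the BRIDG model.
EDGES = [
    ("CDE:patient_birth_date", "NCIt:Birth Date"),
    ("NCIt:Birth Date", "NCIt:Date"),
    ("NCIt:Birth Date", "BRIDG:BiologicEntity"),
    ("NCIt:Date", "BRIDG:Activity"),
    ("CDE:lab_test_name", "NCIt:Laboratory Procedure"),
    ("NCIt:Laboratory Procedure", "NCIt:Procedure"),
    ("NCIt:Procedure", "BRIDG:Activity"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def nearest_bridg_class(adjacency, start, prefix="BRIDG:"):
    """Breadth-first search from a CDE node to the closest BRIDG node.

    Returns (bridg_node, path) for the first BRIDG node reached, or None
    if the start node is unknown or no BRIDG node is reachable.
    """
    if start not in adjacency:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node.startswith(prefix):
            return node, path
        # Sort neighbors so that ties are broken deterministically.
        for neighbor in sorted(adjacency[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


ADJACENCY = build_adjacency(EDGES)
```

Calling `nearest_bridg_class(ADJACENCY, "CDE:patient_birth_date")` returns the predicted class together with the path that justifies it, which a subject matter expert could then review before accepting the mapping.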
Description
University of Minnesota Ph.D. dissertation. March 2020. Major: Biomedical Informatics and Computational Biology. Advisors: Guoqian Jiang, Chad Myers. 1 computer file (PDF); xi, 110 pages + 1 supplemental file.
Suggested citation
Renner, Robinette. (2020). Development Of Semi-Automated Tools To Map Cancer Research Common Data Elements To The Biomedical Research Integrated Domain Group Model. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215075.