Developing accessible informatics tools for integrated genomic-proteomic data analysis

Published Date

2019-11

Type

Thesis or Dissertation

Abstract

Mass spectrometry (MS)-based proteomics is widely used to identify and quantify proteins present in biological samples. Emerging multi-omics approaches integrate next-generation DNA and RNA sequencing data with MS-based proteomic data to identify novel and known protein products (proteoforms) in a sample, which may come from a single organism (proteogenomics) or a community of organisms (metaproteomics). These methods can offer a more complete molecular picture of the complex biological samples used in human health and environmental studies. In these MS-based proteomics approaches, tandem mass spectrometry (MS/MS) data derived from peptides are matched against a database of amino-acid sequences translated from DNA or RNA sequencing to confirm the presence of proteoforms. However, proteogenomic and metaproteomic databases are significantly larger than those used in traditional MS-based proteomics, which decreases sensitivity for identifying true peptide-spectrum matches (PSMs) when MS/MS spectra are matched to sequences in these databases. Once peptides are identified and used to infer protein presence and quantities, there is also a need for advanced tools that compare the response of proteins with that of their corresponding RNA transcripts, in order to analyze the molecular mechanisms underlying biology and disease. Ideally, all of these informatics tools would be accessible to lab scientists within a user-friendly platform, to promote wide adoption and impact in diverse research studies. To address these challenges, we have developed software tools and workflows in the freely available and user-friendly Galaxy bioinformatics platform, with the objective of providing solutions to multi-omics challenges in MS-based proteomics and making them accessible to others.
First, we implemented a novel database sectioning method, integrated it into the suite of tools developed for the Galaxy for proteomics (Galaxy-P) project, and evaluated its utility in metaproteomics and proteogenomics applications. Second, we created a comprehensive workflow for proteogenomics that can efficiently utilize RNA and protein data to identify novel protein variants and proteoforms. Third, we developed a Galaxy-P-based tool for comparing the abundance levels of RNA and proteins, enabling integrated analysis of quantitative transcriptomic and proteomic datasets. Collectively, this work delivers on our goals of developing accessible and reproducible software tools and workflows that match MS/MS data against large databases more efficiently and improve integrated analysis in multi-omics applications, helping to enable new discoveries in biological and biomedical research.

Description

University of Minnesota Ph.D. dissertation. November 2019. Major: Biomedical Informatics and Computational Biology. Advisor: Timothy Griffin. 1 computer file (PDF); x, 151 pages.

Suggested citation

Kumar, Praveen. (2019). Developing accessible informatics tools for integrated genomic-proteomic data analysis. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/225893.
