Developing accessible informatics tools for integrated genomic-proteomic data analysis

Mass spectrometry (MS)-based proteomics is widely used to identify and quantify proteins present in biological samples. Emerging multi-omics approaches integrate next-generation DNA and RNA sequencing data with MS-based proteomic data to identify novel and known protein products (proteoforms) in a sample, which may come from a single organism (proteogenomics) or a community of organisms (metaproteomics). These methods can offer a more complete molecular picture of complex biological samples used in human health and environmental studies. In these MS-based proteomics approaches, tandem mass spectrometry (MS/MS) data derived from peptides are matched against a database containing amino acid sequences translated from DNA or RNA sequencing data to confirm the presence of proteoforms. However, proteogenomic and metaproteomic databases are significantly larger than those used in traditional MS-based proteomics, which decreases sensitivity for identifying true peptide spectrum matches (PSMs) when MS/MS data are matched to sequences in these databases. Once peptides are identified and used to infer protein presence and quantities, there is also a need for advanced tools that compare the response of proteins to that of their corresponding RNA transcripts, in order to analyze the underlying molecular mechanisms of biology and disease. Ideally, all of these informatics tools would be accessible to lab scientists within a user-friendly platform, to promote wide adoption and impact in diverse research studies. To address these challenges, we have developed software tools and workflows in the freely available and user-friendly Galaxy bioinformatics platform, with the objective of providing solutions to multi-omics challenges in MS-based proteomics and making them accessible to others.
First, we implemented a novel database sectioning method, integrated it into the suite of tools developed for the Galaxy for proteomics (Galaxy-P) project, and evaluated its utility in metaproteomics and proteogenomics applications. Second, we created a comprehensive workflow for proteogenomics that can efficiently utilize RNA and protein data to identify novel protein variants and proteoforms. Third, we developed a Galaxy-P-based tool for comparing the abundance levels of RNA and proteins for integrated analysis of quantitative transcriptomic and proteomic datasets. Collectively, this work has delivered on our goals of developing accessible and reproducible software tools and workflows for more efficient matching of MS/MS data with large databases and for improved integrated analysis in multi-omics applications, which can help enable new discoveries in biological and biomedical research.

Keywords

Bioinformatics

metaproteomics

proteogenomics

proteomics

quantitative proteo-transcriptomics

tandem mass spectrometry

Description

University of Minnesota Ph.D. dissertation. November 2019. Major: Biomedical Informatics and Computational Biology. Advisor: Timothy Griffin. 1 computer file (PDF); x, 151 pages.

Collections

Dissertations

Suggested citation

Kumar, Praveen. (2019). Developing accessible informatics tools for integrated genomic-proteomic data analysis. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/225893.

