Developing accessible informatics tools for integrated genomic-proteomic data analysis
2019-11
Title
Developing accessible informatics tools for integrated genomic-proteomic data analysis
Authors
Kumar, Praveen
Published Date
2019-11
Type
Thesis or Dissertation
Abstract
Mass spectrometry (MS)-based proteomics is widely used to identify and quantify proteins present in biological samples. Emerging multi-omics approaches integrate next-generation DNA and RNA sequencing data with MS-based proteomic data to identify novel and known protein products (proteoforms) in a sample, which may come from a single organism (proteogenomics) or a community of organisms (metaproteomics). These methods can offer a more complete molecular picture of the complex biological samples used in human health and environmental studies. In these MS-based proteomics approaches, tandem mass spectrometry (MS/MS) data derived from peptides are matched against a database of amino acid sequences translated from DNA or RNA sequencing data to confirm the presence of proteoforms. However, proteogenomic and metaproteomic databases are significantly larger than those used in traditional MS-based proteomics, which decreases the sensitivity of identifying true peptide-spectrum matches (PSMs) when MS/MS data are matched to sequences in these databases. Once peptides are identified and used to infer protein presence and quantities, advanced tools are also needed to compare the responses of proteins with those of their corresponding RNA transcripts, in order to analyze the underlying molecular mechanisms of biology and disease. Ideally, all of these informatics tools would be accessible to lab scientists within a user-friendly platform, to promote wide adoption and impact in diverse research studies. To address these challenges, we have developed software tools and workflows in the freely available and user-friendly Galaxy bioinformatics platform, with the objective of providing solutions to multi-omics challenges in MS-based proteomics and making them accessible to others.
First, we implemented a novel database sectioning method, integrated it into the suite of tools developed for the Galaxy for proteomics (Galaxy-P) project, and evaluated its utility in metaproteomics and proteogenomics applications. Second, we created a comprehensive workflow for proteogenomics that can efficiently utilize RNA and protein data to identify novel protein variants and proteoforms. Third, we developed a Galaxy-P-based tool for comparing the abundance levels of RNA and proteins for integrated analysis of quantitative transcriptomic and proteomic datasets. Collectively, this work has delivered on our goals of developing accessible and reproducible software tools and workflows that match MS/MS data against large databases more efficiently and improve integrated analysis in multi-omics applications, which can help enable new discoveries in biological and biomedical research.
Description
University of Minnesota Ph.D. dissertation. November 2019. Major: Biomedical Informatics and Computational Biology. Advisor: Timothy Griffin. 1 computer file (PDF); x, 151 pages.
Suggested citation
Kumar, Praveen. (2019). Developing accessible informatics tools for integrated genomic-proteomic data analysis. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/225893.