Data mining techniques for enhancing protein function prediction
2010-04
Loading...
View/Download File
Persistent link to this item
Statistics
View StatisticsJournal Title
Journal ISSN
Volume Title
Title
Data mining techniques for enhancing protein function prediction
Authors
Published Date
2010-04
Publisher
Type
Thesis or Dissertation
Abstract
Proteins are the most essential and versatile macromolecules of life, and the knowledge of their functions is crucial for obtaining a basic understanding of the cellular processes operating in an organism as well as for important applications in biotechnology, such as the development of new drugs, better crops, and synthetic biochemicals such as biofuels. Recent revolutions in biotechnology has given us numerous high-throughput experimental technologies that generate very useful data, such as gene expression and protein interaction data, that provide high-resolution snapshots of complex cellular processes and a novel avenue to understand their underlying mechanisms. In particular, several computational approaches based on the principle of Guilt by Association (GBA) have been proposed to predict the function(s) of the protein are inferred from those of other proteins that are "associated" to it in these data sets. In this thesis, we have developed several novel methods for improving the performance of these approaches by making use of the unutilized and under-utilized information in genomic data sets, as well as their associated knowledge bases. In particular, we have developed pre-processing methods for handling data quality issues with gene expression (microarray) data sets and protein interaction networks that aim to enhance the utility of these data sets for protein function prediction. We have also developed a method for incorporating the inter-relationships between functional classes, as captured by the ontologies in Gene Ontology, into classification-based protein function prediction algoriths, which enabled us to improve the quality of predictions made for several functional classes, particularly those with very few member proteins (rare classes). Finally, we have developed a novel association analysis-based biclustering algorithm to address two major challenges with traditional biclustering algorithms, namely an exhaustive search of all valid biclusters satisfying the definition specified by the algorithm, and the ability to search for small biclusters. This algorithm makes it possible to discover smaller sized biclusters that are more significantly enriched with specific GO terms than those produced by the traditional biclustering algorithms. Overall, the methods proposed in this thesis are expected to help uncover the functions of several unannotated proteins (or genes), as shown by specific examples cited in some of the chapters.
To conclude, we also suggest several opportunities for further progress on the very important problem of protein function prediction
Description
University of Minnesota Ph.D. dissertation. May 2010. Major: Computer Science. Advisor: Vipin Kumar.1 computer file (PDF); xv, 176 pages. Ill. (some col.)
Related to
Replaces
License
Collections
Series/Report Number
Funding information
Isbn identifier
Doi identifier
Previously Published Citation
Other identifiers
Suggested citation
Pandey, Gaurav. (2010). Data mining techniques for enhancing protein function prediction. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/92467.
Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.