Browsing by Author "Kauffman, Christopher"
Now showing 1 - 4 of 4
A Generalized Framework for Protein Sequence Annotation (2007-10-15)
Rangwala, Huzefa; Kauffman, Christopher; Karypis, George
Over the last decade several data mining techniques have been developed for determining structural and functional properties of individual protein residues using sequence and sequence-derived information. These protein residue annotation problems are often formulated as either classification or regression problems and solved using a common set of techniques. We develop a generalized protein sequence annotation toolkit (prosat) for solving classification or regression problems using support vector machines. The key characteristic of our method is its effective use of window-based information to capture the local environment of a protein sequence residue. This window information is used with several kernel functions available within our framework. We show the effectiveness of using the previously developed normalized second order exponential kernel function and experiment with local window-based information at different levels of granularity. We report empirical results on a diverse set of classification and regression problems: prediction of solvent accessibility, secondary structure, local structure alphabet, transmembrane helices, DNA-protein interaction sites, contact order, and regions of disorder are all explored. Our methods show either comparable or superior results to several state-of-the-art application-tuned prediction methods for these problems. prosat provides practitioners with an efficient and easy-to-use tool for a wide variety of annotation problems. The results of some of these predictions can be used to assist in solving the overarching 3D structure prediction problem.

An Analysis of Information Content Present in Protein-DNA Interactions (2007-09-11)
Kauffman, Christopher; Karypis, George
Understanding the role proteins play in regulating DNA replication is essential to forming a complete picture of how the genome manifests itself. In this work, we examine the feasibility of predicting the residues of a protein essential to binding by analyzing protein-DNA interactions from an information theoretic perspective. Through the lens of mutual information, we explore which properties of protein sequence and structure are most useful in determining binding residues with a particular focus on sequence features. We find that the quantity of information carried in most features is small with respect to DNA-contacting residues, the bulk being provided by sequence features along with a select few structural features. Supplemental information for this article is available at http://www.cs.umn.edu/~kauffman/supplements/psb2008

Finding Functionally Related Genes by Local and Global Analysis of MEDLINE Abstracts (2004-06-29)
Nakken, Sigve; Kauffman, Christopher; Karypis, George
Discovery of biological relationships between genes is one of the keys to understanding the complex functional nature of the human genome. Currently, most of the knowledge about interrelating genes is found in immense amounts of various biomedical literature. Hence, extraction of biological contexts occurring in free text represents a valuable tool in gaining knowledge about gene interactions. We present a textual analysis of documents associated with pairs of genes, and describe how this approach can be used to discover and annotate functional relationships among genes. A study on a subset of human genes shows that our analysis tool can act as a ranking mechanism for sets of genes based on their functional relatedness.

Improving Homology Models for Protein-Ligand Binding Sites (2008-04-04)
Kauffman, Christopher; Rangwala, Huzefa; Karypis, George
In order to improve the prediction of protein-ligand binding sites through homology modeling, we incorporate knowledge of the binding residues into the modeling framework. Residues are identified as binding or nonbinding based on their true labels as well as labels predicted from structure and sequence. The sequence predictions were made using a support vector machine framework which employs a sophisticated window-based kernel. Binding labels are used with a very sensitive sequence alignment method to align the target and template. Relevant parameters governing the alignment process are searched for optimal values. Based on our results, homology models of the binding site can be improved if a priori knowledge of the binding residues is available. For target-template pairs with low sequence identity and high structural diversity, our sequence-based prediction method provided sufficient information to realize this improvement.