Browsing by Author "Deshpande, Mukund"
Now showing 1 - 8 of 8
Automated Approaches for Classifying Structures (2002-06-26)
Deshpande, Mukund; Kuramochi, Michihiro; Karypis, George
In this paper we study the problem of classifying chemical compound datasets. We present an algorithm that first mines the chemical compound dataset to discover discriminating sub-structures; these discriminating sub-structures are used as features to build a powerful classifier. The advantage of our classification technique is that it requires very little domain knowledge and can easily handle large chemical datasets. We evaluated the performance of our classifier on two widely available chemical compound datasets and have found it to give good results.

Evaluation of Techniques for Classifying Biological Sequences* (2001-10-18)
Deshpande, Mukund; Karypis, George
In recent years we have witnessed an exponential increase in the amount of biological information, either DNA or protein sequences, that has become available in public databases. This has been followed by an increased interest in developing computational techniques to automatically classify these large volumes of sequence data into various categories corresponding to their role in the chromosomes, their structure, and/or their function. In this paper we evaluate some of the widely used sequence classification algorithms and develop a framework for modeling sequences so that traditional machine learning algorithms, such as support vector machines, can be applied easily. Our detailed experimental evaluation shows that the SVM-based approaches achieve higher classification accuracy than the more traditional sequence classification algorithms, such as Markov model based techniques and K-nearest neighbor based approaches.

Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds (2003-03-10)
Deshpande, Mukund; Kuramochi, Michihiro; Karypis, George
In this paper we study the problem of classifying chemical compound datasets.
We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of our approach is that during classification model construction, all relevant sub-structures are available, allowing the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and outperforms existing schemes by 10% to 35% on average.

Item-Based Top-N Recommendation Algorithms (2003-01-20)
Deshpande, Mukund; Karypis, George
The explosive growth of the world-wide-web and the emergence of e-commerce have led to the development of recommender systems, a personalized information filtering technology used to identify a set of N items that will be of interest to a certain user. User-based collaborative filtering is the most successful technology for building recommender systems to date, and is extensively used in many commercial recommender systems. Unfortunately, the computational complexity of these methods grows linearly with the number of customers, which in typical commercial applications can grow to several million. To address these scalability concerns, item-based recommendation techniques have been developed that analyze the user-item matrix to identify relations between the different items, and use these relations to compute the list of recommendations. In this paper we present one such class of item-based recommendation algorithms that first determine the similarities between the various items and then use them to identify the set of items to be recommended.
The key steps in this class of algorithms are (i) the method used to compute the similarity between the items, and (ii) the method used to combine these similarities in order to compute the similarity between a basket of items and a candidate recommender item. Our experimental evaluation on nine real datasets shows that the proposed item-based algorithms are up to two orders of magnitude faster than the traditional user-neighborhood based recommender systems and provide recommendations with comparable or better quality.

Promoter Prediction of Prokaryotes (2001-07-23)
Kuramochi, Michihiro; Deshpande, Mukund; Karypis, George; Zhang, Qing; Kapur, Vivek
The availability of computational methods to identify and define the precise structure and location of promoters in prokaryotic genomes will provide a critical first step towards understanding the mechanisms by which genes are organized and regulated. We examine three different methods for promoter identification: two adopted from related work and one a novel approach based on feature extraction. Through a set of experiments, we evaluated their prediction accuracy in distinguishing promoter regions from non-coding regions.

Selective Markov Models for Predicting Web-Page Accesses (2000-10-30)
Deshpande, Mukund; Karypis, George
The problem of predicting a user's behavior on a web-site has gained importance due to the rapid growth of the world-wide-web and the need to personalize and influence a user's browsing experience. Markov models and their variations have been found well suited for addressing this problem. Of the different variations of Markov models, it is generally found that higher-order Markov models display high predictive accuracies. However, higher-order models are also extremely complicated due to their large number of states, which increases their space and runtime requirements.
In this paper we present different techniques for intelligently selecting parts of different order Markov models so that the resulting model has a reduced state complexity and improved prediction accuracy. We have tested our models on various datasets and have found that their performance is consistently superior to that obtained by higher-order Markov models.

Using Conjunction of Attribute Values for Classification (2002-03-12)
Deshpande, Mukund; Karypis, George
Advances in the efficient discovery of frequent itemsets in large databases have led to the development of a number of schemes that use frequent itemsets to aid in the development of accurate and efficient classifiers. These approaches use the frequent itemsets to generate a set of composite features that expand the dimensionality of the underlying dataset. In this paper, we build upon this work and (i) present a variety of schemes for composite feature selection that achieve a substantial reduction in the number of features without adversely affecting the accuracy gains, and (ii) show (both analytically and experimentally) that the composite feature space can lead to improved classification models in the context of support vector machines, in which the dimensionality can automatically be expanded by the use of appropriate kernel functions.

wCLUTO: A Web-Enabled Clustering Toolkit (2003-02-19)
Rasmussen, Matthew; Deshpande, Mukund; Karypis, George; Johnson, James; Crow, John A.; Retzel, Ernest F.
As structural and functional genomics efforts provide the biological community with ever-broadening sets of inter-related data, the need to explore such complex information for subtle relationships expands. We present wCluto, a web-enabled version of the stand-alone application Cluto, designed to apply clustering methods to genomic information. Its first application is focused on the clustering of transcriptome data from microarrays.
Data can be uploaded by the user into the clustering tool, a choice of several clustering methods can be made and configured, and data is presented to the user in a variety of visual formats, including a three-dimensional "mountain" view of the clusters. Parameters can be explored to rapidly examine a variety of clustering results, and the resulting clusters can be downloaded either for manipulation by other programs or in a format suitable for publication.