Browsing by Subject "association rules"
Now showing 1 - 3 of 3

Clustering Based on Association Rule Hypergraphs (1997)
Han, Euihong; Karypis, George; Kumar, Vipin; Mobasher, Bamshad
Traditional clustering algorithms, used in data mining for transactional databases, are mainly concerned with grouping transactions, but they do not generally provide an adequate mechanism for grouping items found within these transactions. Item clustering, on the other hand, can be useful in many data mining applications. We propose a new method for clustering related items in transactional databases that is based on partitioning an association rule hypergraph, where each association rule defines a hyperedge. We also discuss some of the applications of item clustering, such as the discovery of meta-rules among item clusters, and clustering of transactions. We evaluated our scheme experimentally on data from a number of domains, and, wherever applicable, compared it with AutoClass. In our experiment with stock-market data, our clustering scheme is able to successfully group stocks that belong to the same industry group. In the experiment with congressional voting data, this method is quite effective in finding clusters of transactions that correspond to either Democrat or Republican voting patterns. We found clusters of segments of protein-coding sequences from a protein coding database that share the same functionality and thus are very valuable to biologists for determining the functionality of new proteins. We also found clusters of related words in documents retrieved from the World Wide Web (a common and important application in information retrieval).
These experiments demonstrate that our approach holds promise in a wide range of domains, and is much faster than traditional clustering algorithms such as AutoClass.

Clustering in a High-Dimensional Space Using Hypergraph Models (1997)
Han, Eui-Hong; Karypis, George; Kumar, Vipin; Mobasher, Bamshad
Clustering of data in a high-dimensional space is of great interest in many data mining applications. Most traditional algorithms, such as K-means or AutoClass, fail to produce meaningful clusters in such data sets even when they are used with well-known dimensionality reduction techniques such as Principal Component Analysis and Latent Semantic Indexing. In this paper, we propose a method for clustering data in a high-dimensional space based on a hypergraph model. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph. A hyperedge represents a relationship (affinity) among a subset of the data, and the weight of the hyperedge reflects the strength of this affinity. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. We present results of experiments on three different data sets: S&P 500 stock data for the period of 1994-1996, protein coding data, and Web document data. Wherever applicable, we compared our results with those of AutoClass and the K-means clustering algorithm on the original data as well as on the reduced-dimensionality data obtained via Principal Component Analysis or Latent Semantic Indexing. These experiments demonstrate that our approach is applicable and effective in a wide range of domains. More specifically, our approach performed much better than traditional schemes for high-dimensional data sets in terms of quality of clusters and runtime.
Our approach was also able to filter out noise data from the clusters very effectively without compromising the quality of the clusters.

Web Mining: Information and Pattern Discovery on the World Wide Web (1997)
Cooley, Robert; Mobasher, Bamshad; Srivastava, Jaideep
Two important and active areas of current research are data mining and the World Wide Web. A natural combination of the two areas, sometimes referred to as Web mining, has been the focus of several recent research projects and papers. As with any emerging research area, there is no established vocabulary, leading to confusion when comparing research efforts. Different terms for the same concept or different definitions being attached to the same word are commonplace. The term Web mining has been used in two distinct ways. The first, which is referred to as Web content mining in this paper, describes the process of information or resource discovery from millions of sources across the World Wide Web. The second, which we call Web usage mining, is the process of mining Web access logs or other user information for user browsing and access patterns on one or more Web localities. In this paper we define Web mining and, in particular, present an overview of the various research issues, techniques, and development efforts in Web content mining and Web usage mining. We focus mainly on the problems and proposed techniques associated with Web usage mining as an emerging research area. We also present a general architecture for Web usage mining and briefly describe WEBMINER, a system based on the proposed architecture. We conclude this paper by listing issues that need the attention of the research community.