Clustering in a High-Dimensional Space Using Hypergraph Models
Loading...
View/Download File
Persistent link to this item
Statistics
View StatisticsJournal Title
Journal ISSN
Volume Title
Title
Clustering in a High-Dimensional Space Using Hypergraph Models
Alternative title
Published Date
1997
Publisher
Type
Report
Abstract
Clustering of data in a large dimension space is of a great interest in many data mining applications.
Most of the traditional algorithms such as K-means or AutoCJass fail to produce meaningful clusters in such
data sets even when they are used with well known dimensionality reduction techniques such as Principal
Component Analysis and Latent Semantic Indexing. In this paper, we propose a method for clustering of
data in a high dimensional space based on a hypergraph model. The hypergraph model maps the relationship
present in the original data in high dimensional space into a hypergraph. A hyperedge ;epresents a
relationship (affinity) among subsets of data and the weight of the hyperedge reflects the strength of this
affinity. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the
corresponding data items in each partition are highly related and the weight of the hyperedges cut by the
partitioning is minimized. We present results of experiments on three different data sets: S&PSOO stock
data for the period of 1994-1996, protein coding data, and Web document data. Wherever aplicable, we
compared our results with those of AutoClass and K-means clustering algorithm on original data as well as
on the reduced dimensionality data obtained via Principal Component Analysis or Latent Semantic Indexing
scheme. These experiments demonstrate that our approach is applicable and effective in a wide range
of domains. More specifically, our approach performed much better than traditional schemes [or high dimensional
data sets in terms of quality of clusters and runtime. Our approach was also able to filter out
noise data from the clusters very effectively without compromising the quaJity of the clusters.
Description
Related to
Replaces
License
Series/Report Number
Technical Report; 97-063
Funding information
This work was supported by NSF ASC-9634719, by Army Research Office contract DA/DAAH04-95-l-0538, by Army High Performance
Computing Research Center cooperative agreement number DAAH04-95-2-0003/contract number DAAH04-95-C-0008, the content of which does
not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Additional support was provided
by the IBM Partnership Award. and by the IBM SUR equipment grant. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute.
Isbn identifier
Doi identifier
Previously Published Citation
Other identifiers
Suggested citation
Han, Eui-Hong; Karypis, George; Kumar, Vipin; Mobasher, Bamshad. (1997). Clustering in a High-Dimensional Space Using Hypergraph Models. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215349.
Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.