CLUTO - A Clustering Toolkit

Clustering algorithms divide data into meaningful or useful groups, called clusters, such that the intra-cluster similarity is maximized and the inter-cluster similarity is minimized. These discovered clusters can be used to explain the characteristics of the underlying data distribution and thus serve as the foundation for various data mining and analysis techniques. Applications of clustering include characterizing different customer groups based on purchasing patterns, categorizing documents on the World Wide Web, grouping genes and proteins that have similar functionality, and grouping spatial locations prone to earthquakes from seismological data. CLUTO is a software package for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters. CLUTO provides three different classes of clustering algorithms that operate either directly in the object's feature space or in the object's similarity space. These algorithms are based on the partitional, agglomerative, and graph-partitioning paradigms. A key feature of most of CLUTO's clustering algorithms is that they treat the clustering problem as an optimization process that seeks to maximize or minimize a particular clustering criterion function defined either globally or locally over the entire clustering solution space. CLUTO provides a total of seven different criterion functions that can be used to drive both partitional and agglomerative clustering algorithms. Most of these criterion functions have been shown to produce high-quality clustering solutions in high-dimensional datasets, especially those arising in document clustering. In addition to these criterion functions, CLUTO provides some of the more traditional local criteria (e.g., single-link, complete-link, and UPGMA) that can be used in the context of agglomerative clustering.
Furthermore, CLUTO provides graph-partitioning-based clustering algorithms that are well suited for finding clusters that form contiguous regions spanning different dimensions of the underlying feature space. CLUTO's distribution consists of both stand-alone programs for clustering and analyzing these clusters, as well as a library through which an application program can directly access the various clustering and analysis algorithms implemented in CLUTO.
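
To make the idea of clustering as criterion-function optimization concrete, the following is a minimal Python sketch and not CLUTO's implementation or API. It assumes unit-length vectors compared by cosine similarity, and it uses one illustrative global criterion (the sum, over clusters, of the length of each cluster's summed vector), which the abstract does not name. The function names `normalize`, `criterion`, and `refine`, the toy data, and the greedy move strategy are all hypothetical choices made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def criterion(vectors, labels, k):
    """Illustrative global criterion: sum over clusters of the length of the
    cluster's composite (summed) vector. Larger means tighter clusters."""
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in range(k)]
    for v, c in zip(vectors, labels):
        for i, x in enumerate(v):
            sums[c][i] += x
    return sum(math.sqrt(sum(x * x for x in s)) for s in sums)

def refine(vectors, labels, k, max_passes=20):
    """Greedy partitional refinement: move an object to another cluster
    whenever the move increases the criterion; stop when no move helps."""
    labels = list(labels)
    best = criterion(vectors, labels, k)
    for _ in range(max_passes):
        moved = False
        for i in range(len(vectors)):
            current = labels[i]
            for c in range(k):
                if c == current:
                    continue
                labels[i] = c  # tentatively move object i to cluster c
                value = criterion(vectors, labels, k)
                if value > best + 1e-12:
                    best, current, moved = value, c, True
            labels[i] = current  # keep the best assignment found for object i
        if not moved:
            break
    return labels, best

# Toy data (assumed): two objects near the x-axis, two near the y-axis,
# deliberately started in a poor partition.
points = [normalize(p) for p in [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]]
start = [0, 1, 0, 1]
final_labels, final_value = refine(points, start, k=2)
print(final_labels, round(final_value, 3))
```

On this toy input the refinement ends with the two x-axis objects in one cluster and the two y-axis objects in the other. Recomputing the criterion from scratch for every tentative move keeps the sketch short; a practical implementation would update the cluster sums incrementally.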

Collections

Computer Science & Engineering (CS&E) Technical Reports

Series/Report Number

Technical Report; 02-017

Suggested citation

Karypis, George. (2002). CLUTO - A Clustering Toolkit. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215521.

