Association Analysis for Real-valued Data: Definitions and Application to Microarray Data

Published Date

2008-03-03

Type

Report

Abstract

The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in several domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, the search schemes used by these algorithms cannot search the space of all possible biclusters exhaustively. Pattern mining algorithms in association analysis also essentially produce biclusters as their result, since the patterns consist of items that are supported by a subset of all the transactions. However, a major limitation of the numerous techniques developed in association analysis is that they can only analyze data sets consisting of binary and/or categorical variables, and applying them to real-valued data sets often involves a lossy transformation, such as discretization or binarization of the attributes. In this paper, we propose a novel association analysis framework for exhaustively and efficiently mining range support patterns from real-valued data sets. On the one hand, this framework reduces the loss of information incurred by binarization- and discretization-based approaches; on the other, it enables the exhaustive discovery of coherent biclusters. We compared the performance of our framework with that of two standard biclustering algorithms by evaluating the functional coherence of patterns/biclusters derived from microarray data. These experiments show that the real-valued patterns discovered by our framework are better enriched by small, biologically interesting functional classes. We also demonstrate the complementarity between our framework and the commonly used biclustering algorithm ISA, using specific examples of patterns that are found and functions that are covered by the former but not the latter.
The source code and data sets used in this paper are available at http://www.cs.umn.edu/vk/gaurav/rap.
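
The abstract's point about lossy binarization can be illustrated with a toy example. The sketch below is not the report's method: the matrix values, the threshold, the tolerance, and the helper names `binarize` and `coherent_rows` are all assumptions made for illustration, and the closeness criterion is a hypothetical stand-in rather than the report's definition of range support.

```python
# Toy matrix: rows are transactions (e.g. conditions), columns are items (e.g. genes).
# All values, the threshold, and the tolerance are made up for illustration.
data = [
    [0.9, 1.0, 0.1],
    [2.0, 0.6, 0.2],
    [1.5, 1.4, 3.0],
    [0.2, 0.3, 0.1],
]

def binarize(matrix, threshold):
    """Map each value to 1 if it is at least `threshold`, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in matrix]

def coherent_rows(matrix, items, tolerance):
    """Return indices of rows whose values on `items` are positive and close together.

    Closeness is (max - min) / min <= tolerance, a hypothetical criterion
    chosen for this sketch, not the report's definition.
    """
    rows = []
    for i, row in enumerate(matrix):
        values = [row[j] for j in items]
        lo, hi = min(values), max(values)
        if lo > 0 and (hi - lo) / lo <= tolerance:
            rows.append(i)
    return rows

binary = binarize(data, threshold=0.5)

# Rows that "support" items 0 and 1 after binarization.
binary_support = [i for i, row in enumerate(binary) if row[0] == 1 and row[1] == 1]

# Rows in which items 0 and 1 actually take similar real values.
real_support = coherent_rows(data, items=[0, 1], tolerance=0.2)

print(binary_support)
print(real_support)
```

In this toy data the binarized view counts row 1 as supporting the item pair even though its values (2.0 and 0.6) are far apart, which is the kind of information loss a real-valued approach is meant to reduce.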

Suggested citation

Pandey, Gaurav; Atluri, Gowtham; Steinbach, Michael; Myers, Chad L.; Kumar, Vipin. (2008). Association Analysis for Real-valued Data: Definitions and Application to Microarray Data. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215750.
