TAPER: An Efficient Two-Step Approach for All-Pairs Correlation Query in Transaction Databases

Loading...
Thumbnail Image

View/Download File

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

TAPER: An Efficient Two-Step Approach for All-Pairs Correlation Query in Transaction Databases

Alternative title

Published Date

2003-05-01

Publisher

Type

Report

Abstract

Given a user-specified minimum correlation threshold and atransaction database with N items, all-strong-pairs correlation query finds all item pairs with correlations above the threshold. However, when the number of items and transactions are large, the computation cost of this query can be very high. In this paper, we identify an upper bound of Pearson's correlation coefficient for binary variables. This upper bound is not only much cheaper to compute thanPearson's correlation coefficient but also exhibits a special monotone property which allows pruning of many item pairs even without computing their upper bounds. A two-step all-strong-pairs correlation query (TAPER) algorithm is proposed to exploit these properties in a filter-and-refine manner. Furthermore, we provide an algebraic cost model which shows that the computation savings from pruning is independent or improves when the number of items is increased in data sets with common Zipf or linear rank-support distributions. Experimental results from synthetic and real data sets exhibit similar trends and show that the TAPER algorithm can be an order of magnitude faster thanbrute-force alternatives. Furthermore, as demonstrated by our experiments on both realand synthetic data sets, TAPER can achieve high pruning ratio and outperform brute-force approach several orders of magnitude especially for high correlation thresholds. Finally, in terms of scalability, the pruning ratio of TAPER can be kept no change or even slightly increase with the increase of the number of attributes.

Keywords

Description

Related to

Replaces

License

Series/Report Number

Technical Report; 03-020

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation

Xiong, Hui; Shekhar, Shashi; Tan, Pang-ning; Kumar, Vipin. (2003). TAPER: An Efficient Two-Step Approach for All-Pairs Correlation Query in Transaction Databases. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215563.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.