PL2AP: Fast Parallel Cosine Similarity Search

Anastasiu, David C.
Karypis, George
2015-11-16
Loading...
Thumbnail Image

View/Download File

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

PL2AP: Fast Parallel Cosine Similarity Search

Published Date

2015-11-16

Publisher

Type

Report

Abstract

Solving the AllPairs similarity search problem entails finding all pairs of vectors in a high dimensional sparse dataset that have a similarity value higher than a given threshold. The output form this problem is a crucial component in many real-world applications, such as clustering, online advertising, recommender systems, near-duplicate document detection, and query refinement. A number of serial algorithms have been proposed that solve the problem by pruning many of the possible similarity candidates for each query object, after accessing only a few of their non-zero values. The pruning process results in unpredictable memory access patterns that can reduce search efficiency. In this context, we introduce pL2AP, which efficiently solves the AllPairs cosine similarity search problem in a multi-core environment. Our method uses a number of cache-tiling optimizations, combined with fine-grained dynamically balanced parallel tasks, to solve the problem 1.5x--232x faster than existing parallel baselines on datasets with hundreds of millions of non-zeros.

Keywords

Description

Related to

Replaces

License

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Suggested citation

Anastasiu, David C.; Karypis, George. (2015). PL2AP: Fast Parallel Cosine Similarity Search. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215982.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.