PL2AP: Fast Parallel Cosine Similarity Search
2015-11-16
Loading...
View/Download File
Persistent link to this item
Statistics
View StatisticsJournal Title
Journal ISSN
Volume Title
Title
PL2AP: Fast Parallel Cosine Similarity Search
Alternative title
Authors
Published Date
2015-11-16
Publisher
Type
Report
Abstract
Solving the AllPairs similarity search problem entails finding all pairs
of vectors in a high dimensional sparse dataset that have a similarity
value higher than a given threshold. The output form this problem is
a crucial component in many real-world applications, such as clustering,
online advertising, recommender systems, near-duplicate document detection,
and query refinement. A number of serial algorithms have been proposed that
solve the problem by pruning many of the possible similarity candidates for
each query object, after accessing only a few of their non-zero values.
The pruning process results in unpredictable memory access patterns that
can reduce search efficiency. In this context, we introduce pL2AP, which
efficiently solves the AllPairs cosine similarity search problem in
a multi-core environment. Our method uses a number of cache-tiling optimizations,
combined with fine-grained dynamically balanced parallel tasks, to solve
the problem 1.5x--232x faster than existing parallel baselines on datasets
with hundreds of millions of non-zeros.
Keywords
Description
Related to
Replaces
License
Series/Report Number
Technical Report; 15-017
Funding information
Isbn identifier
Doi identifier
Previously Published Citation
Other identifiers
Suggested citation
Anastasiu, David C.; Karypis, George. (2015). PL2AP: Fast Parallel Cosine Similarity Search. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215982.
Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.