Scalable GPU Algorithms for Similarity Query Processing and Clustering on Trajectory Data
2019-11
Loading...
View/Download File
Persistent link to this item
Statistics
View StatisticsJournal Title
Journal ISSN
Volume Title
Title
Scalable GPU Algorithms for Similarity Query Processing and Clustering on Trajectory Data
Authors
Published Date
2019-11
Publisher
Type
Thesis or Dissertation
Abstract
With the increasing prevalence of location sensor devices like GPS, it has been possible to collect large datasets of a special type of spatio-temporal data called trajectory data. A trajectory is a discrete sequence of positions that a moving object occupies in space as time passes. Such large datasets enable researchers to study the behavior of the objects describing these movements by issuing spatial queries such as trajectory similarity queries and trajectory clustering. Top-K trajectory similarity queries retrieve the K most similar trajectories to a given query trajectory. This query has applications in many areas, such as urban planning, ecology and social networking; however, this query is computationally expensive. In this work, we introduce a new parallel top-K trajectory similarity query technique for GPUs, FastTopK, to deal with these challenges. Our experiments on two large real-life datasets showed that FastTopK produces on average 107.96X smaller candidate result sets, and 3.36X faster query execution times than the existing state of-the-art technique, TKSimGPU. The second type of trajectory query covered in this work is trajectory clustering. One important algorithm for clustering is DBSCAN, which is especially useful for finding clusters of arbitrary shapes. As opposed to other clustering techniques, like K-means, it does not require the number of clusters to be specified as an input parameter, and it is highly robust to outliers. However, DBSCAN has a worst-case quadratic time complexity that makes it difficult to handle large dataset sizes. To address this problem, several works have been proposed that exploit the massive parallelism of GPUs for DBSCAN clustering of point data. Nonetheless, none of these works have been experimentally compared against each other and none have been extended to cluster trajectory data. In this thesis, we review the existing GPU algorithms for DBSCAN clustering on point data and conduct the first experimental study comparing these GPU algorithms using three real-world datasets to identify the best performing algorithm. Our results show that G-DBSCAN is the fastest being up to 969X faster than CPU DBSCAN on 128K points, while CUDA-DClust is the best performing GPU algorithm in terms of execution time and memory requirements, performing 53X faster than CPU DBSCAN while taking up to 166X less memory than G-DBSCAN. Lastly, we use the work of our analysis of all the existing GPU-based DBSCAN clustering algorithms on point data to develop a new GPU-based trajectory clustering algorithm, GTRACLUS. Our experiments show that on two real world datasets, trajectory clustering can be made more than 20X faster than the CPU-based TRACLUS by using the proposed GTRACLUS algorithm.
Keywords
Description
University of Minnesota M.S. thesis.November 2019. Major: Computer Science. Advisor: Eleazar Leal. 1 computer file (PDF); x, 119 pages.
Related to
Replaces
License
Series/Report Number
Funding information
Isbn identifier
Doi identifier
Previously Published Citation
Other identifiers
Suggested citation
Mustafa, Hamza. (2019). Scalable GPU Algorithms for Similarity Query Processing and Clustering on Trajectory Data. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/211706.
Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.