Scalable GPU Algorithms for Similarity Query Processing and Clustering on Trajectory Data

Mustafa, Hamza2020-02-262020-02-262019-11https://hdl.handle.net/11299/211706University of Minnesota M.S. thesis.November 2019. Major: Computer Science. Advisor: Eleazar Leal. 1 computer file (PDF); x, 119 pages.With the increasing prevalence of location sensor devices like GPS, it has been possible to collect large datasets of a special type of spatio-temporal data called trajectory data. A trajectory is a discrete sequence of positions that a moving object occupies in space as time passes. Such large datasets enable researchers to study the behavior of the objects describing these movements by issuing spatial queries such as trajectory similarity queries and trajectory clustering. Top-K trajectory similarity queries retrieve the K most similar trajectories to a given query trajectory. This query has applications in many areas, such as urban planning, ecology and social networking; however, this query is computationally expensive. In this work, we introduce a new parallel top-K trajectory similarity query technique for GPUs, FastTopK, to deal with these challenges. Our experiments on two large real-life datasets showed that FastTopK produces on average 107.96X smaller candidate result sets, and 3.36X faster query execution times than the existing state of-the-art technique, TKSimGPU. The second type of trajectory query covered in this work is trajectory clustering. One important algorithm for clustering is DBSCAN, which is especially useful for ﬁnding clusters of arbitrary shapes. As opposed to other clustering techniques, like K-means, it does not require the number of clusters to be speciﬁed as an input parameter, and it is highly robust to outliers. However, DBSCAN has a worst-case quadratic time complexity that makes it diﬃcult to handle large dataset sizes. To address this problem, several works have been proposed that exploit the massive parallelism of GPUs for DBSCAN clustering of point data. Nonetheless, none of these works have been experimentally compared against each other and none have been extended to cluster trajectory data. In this thesis, we review the existing GPU algorithms for DBSCAN clustering on point data and conduct the ﬁrst experimental study comparing these GPU algorithms using three real-world datasets to identify the best performing algorithm. Our results show that G-DBSCAN is the fastest being up to 969X faster than CPU DBSCAN on 128K points, while CUDA-DClust is the best performing GPU algorithm in terms of execution time and memory requirements, performing 53X faster than CPU DBSCAN while taking up to 166X less memory than G-DBSCAN. Lastly, we use the work of our analysis of all the existing GPU-based DBSCAN clustering algorithms on point data to develop a new GPU-based trajectory clustering algorithm, GTRACLUS. Our experiments show that on two real world datasets, trajectory clustering can be made more than 20X faster than the CPU-based TRACLUS by using the proposed GTRACLUS algorithm.enclusteringCUDADBSCANGPUsimilarity querytrajectoryScalable GPU Algorithms for Similarity Query Processing and Clustering on Trajectory DataThesis or Dissertation