Samad, Abdul
2021-12-16
2021-12-16
2021-05
https://hdl.handle.net/11299/225666
University of Minnesota M.S. thesis. May 2021. Major: Computer Science. Advisor: Eleazar Leal. 1 computer file (PDF); viii, 76 pages.
The popularity of location-based social media and GPS-enabled mobile devices has produced a large amount of streaming trajectory data. Each streaming trajectory consists of the sequence of positions that a moving object occupies in time and is generated in an online fashion, coming at high speed. Disciplines such as social networking, urban planning, ecology, and epidemiology have great interest in querying this type of data. However, the large volume of streaming trajectories poses scalability challenges that can be addressed by efficient indexing structures and in-memory distributed architectures such as Spark. Despite this, no streaming trajectory query processing algorithm has been proposed that uses indices and distributed architectures to tackle this large-scale problem. To address this, we propose a novel in-memory predictive multi-level indexing technique, called PIMMLI, that leverages the distributed Spark Streaming framework to process spatio-temporal queries on streaming trajectories in an efficient manner. We evaluated the effectiveness of PIMMLI on 3 real-life large-scale datasets. These experiments showed that PIMMLI had an average improvement of 3.5X in total query execution and indexing time over DITA, an existing state-of-the-art batch processing algorithm for spatio-temporal querying on trajectories, and of 34.09X in query execution time over an approach that uses no indices.
en
distributed computing
predictive indexing
range query
spark streaming
trajectory
PIMMLI: Predictive In-Memory Multi-Level Indexing for Distributed Trajectory Streams
Thesis or Dissertation