Urban Informatics is the practice of using computer technology to support core city functions: planning, governance and operations. This technology consists of hardware, software, databases, sensors, and communication devices used to develop and sustain more livable and healthier cities. Urban Informatics provides governments with the tools to make data-driven decisions regarding long-term plans, predict and respond to current and upcoming situations, and even help with day-to-day tasks such as monitoring water use and waste processing. New and immense location-aware datasets formally defined in this thesis as Spatial Big Data are emerging from a variety of sources and can be used to find novel and interesting patterns for use in urban informatics. Spatial big data is the key component driving the emerging field of Urban Informatics at the intersection of people, places, and technology. However, spatial big data presents challenges for existing spatial computing systems to store, process, and analyze such large datasets. With these challenges come new opportunities in many fields of computer science research, such as spatial data mining and spatial database systems. This thesis contains original research on two types of spatial big data, each study focusing on a different aspect of handling spatial big data (storage, processing, and analysis). Below we describe each data type through a real-world problem with challenges, related work, novel algorithmic solutions, and experimental analysis.
To address the challenge of analysis of spatial big data, we studied the problem of finding primary corridors in bicycle GPS datasets. Given a set of GPS trajectories on a road network, the goal of the All-Pair Network Trajectory Similarity (APNTS) problem is to calculate the similarity between all trajectories using the Network Hausdorff Distance. This problem is important for a variety of societal applications, such as facilitating greener travel via bicycle corridor identification. The APNTS problem is challenging due to the high cost of computing the exact Network Hausdorff Distance between trajectories in spatial big datasets. Previous work on the APNTS problem takes over 16 hours of computation time on a real-world dataset of bicycle GPS trajectories in Minneapolis, MN. In contrast, this work focuses on a scalable method for the APNTS problem using the idea of row-wise computation, resulting in a computation time of less than 6 minutes on the same datasets. We provide a case study for transportation services using a data-driven approach to identify primary bicycle corridors for public transportation by leveraging emerging GPS trajectory datasets. Experimental results on real-world and synthetic data show a two orders of magnitude improvement over previous work. To address the challenge of storage of spatial big data, we studied the problem of storing spatio-temporal networks in spatial database systems. Given a spatio-temporal network and a set of database query operators, the goal of the Storing Spatio-Temporal Networks (SSTN) problem is to produce an efficient data storage method that minimizes disk I/O access costs. Storing and accessing spatio-temporal networks is increasingly important in many societal applications such as transportation management and emergency planning. This problem is challenging due to strains on traditional adjacency list representations when storing temporal attribute values from the sizable increase in length of the time-series. Current approaches for the SSTN problem focus on orthogonal partitioning (e.g., snapshot, longitudinal, etc.), which may produce excessive I/O costs when performing traversal-based spatio-temporal network queries (e.g., route evaluation, arrival time prediction, etc) due to the desired nodes not being allocated to a common page. We propose a Lagrangian-Connectivity Partitioning (LCP) technique to efficiently store and access spatio-temporal networks that utilizes the interaction between nodes and edges in a network. Experimental evaluation using the Minneapolis, MN road network showed that LCP outperforms traditional orthogonal approaches. The work in this thesis the first step toward understanding the immense challenges and novel applications of Spatial Big Data Analytics for Urban Informatics. In this thesis, we define spatial big data and propose novel approaches for storing and analyzing two popular spatial big data types: GPS trajectories and spatio-temporal networks. We conclude the thesis by exploring future work in the processing of spatial big data.