Li, Yan
2022-08-29
2022-08-29
2022-05
https://hdl.handle.net/11299/241426
University of Minnesota Ph.D. dissertation. May 2022. Major: Computer Science. Advisor: Shashi Shekhar. 1 computer file (PDF); x, 132 pages.

Geospatial artificial intelligence (GeoAI) is the generalization of conventional artificial intelligence (AI) to meet the challenges posed by spatial data. Spatial data, i.e., data annotated with spatial information such as locations and shapes, has become increasingly available over the last decade and has transformed lives by providing novel ways of observing the world and of understanding places and the relations between them. For example, large amounts of onboard diagnostics data from vehicles have become available with the popularity of telematics devices equipped with GPS chips, making it possible to monitor vehicles’ real-world performance, which is valuable for domains such as vehicle mechanics, transportation science, and city planning. In many other domains, such as smart cities and public health, spatial data has become critical as well. For example, during the Covid-19 pandemic, mobile tracking data from devices with GPS chips has been used as an important means of contact tracing and of surveying travel patterns. A McKinsey Digital report estimates that personal spatial data could help save consumers about $600 billion by 2020.

Recent years have witnessed significant advances in AI in both academia and industry. Its fast development is powered by big data and high-performance computing platforms that support the development, training, and deployment of AI methods at reasonable cost. Even though spatial data are critical, valuable, and collected at a large scale, and AI techniques have been applied successfully to many problems such as computer vision and natural language processing, spatial data pose great challenges to conventional AI techniques. The first challenge is the gap between AI techniques and domain knowledge.
Conventional AI techniques rarely consider domain knowledge (e.g., physics laws and epidemiology models), making their results hard to interpret and prone to violating domain constraints even with large volumes of data. On the other hand, domain knowledge by itself is insufficient because it relies on simplifying assumptions that may not approximate complex real-world scenarios well. The other challenges are caused by the properties of spatial data, namely spatial autocorrelation, spatial heterogeneity, and spatial continuity. Spatial autocorrelation describes the fact that data samples (e.g., temperature, precipitation) at different spatial locations are correlated with each other and are affected by their geographical neighbors, which violates the common i.i.d. (i.e., independent and identically distributed) assumption underlying many machine learning models. Spatial heterogeneity refers to the fact that data samples at different spatial locations differ from each other, so there may be no universal model that is applicable globally. Spatial continuity refers to the conflict between the continuity of geographic space and the discrete representation of spatial data.

This thesis investigates novel and societally important GeoAI techniques for emerging spatial datasets such as multi-attributed trajectories and categorical point sets. Multiple novel approaches are proposed to address the challenges these datasets pose to conventional AI techniques. Specifically, a Quad-Grid Filter & Refine algorithm is introduced to detect local spatial colocation patterns, taking into account the spatial heterogeneity of colocation patterns. The algorithm can detect colocation patterns that may not be prevalent globally but are prevalent in local regions, and it is much more computationally efficient than the baseline algorithm.
Second, the thesis investigates the problem of discovering contrasting spatial colocation patterns, i.e., patterns that have different prevalence in two groups of spatial datasets. It leverages the domain knowledge that neighborhood relationships between categorical spatial objects may convey important information, and introduces a filter & refine algorithm that uses the anti-monotone property of a proposed metric, which measures the prevalence difference of any colocation pattern between the two groups. Third, the thesis discusses a point-set classification method for multiplexed pathology images. Inspired by the domain assumption that the spatial configuration of cells may vary under different health conditions, this thesis introduces a neural network architecture that captures the spatial configurations of categorical point sets by modeling pairwise relationships. Last, the thesis introduces a physics-guided K-means algorithm to estimate the energy consumption of a vehicle traveling along a path, which combines the physics laws governing vehicle energy consumption with a machine learning model. The thesis also proposes a path-centric path selection algorithm that uses the proposed energy consumption estimation model and considers the spatial autocorrelation property of the data.

en
GeoAI for Emerging Spatial Datasets
Thesis or Dissertation