Spatial data have tremendous value and are necessary components of many important societal applications. In recent years, our world has witnessed a revolution brought about by spatial technologies (e.g., Google Maps, Waze, Uber, Lyft, Grubhub, Lime, autonomous driving). According to a McKinsey Global Institute report, location data were projected to generate about $600 billion in annual revenue by 2020, with applications in energy, health, retail, etc. The world's economy also relies heavily on location and time data from over 2 billion GPS receivers, and these data are essential to sectors such as banking, aviation, policing, emergency services, and telecommunications. Meanwhile, new types of spatial data are emerging at unprecedented scales and varieties (e.g., 25 GB per hour per connected vehicle, and a projected 47.7 PB per year from NASA by 2022).

While spatial data are critical, valuable, and collected at massive scales, they pose great challenges to traditional artificial intelligence (AI) techniques when applied to important societal problems. This thesis addresses three of these challenges. First, spatial data (e.g., crime or disease distributions, air quality) are often directly linked to our lived environments. As a result, decisions made on such data tend to have direct impacts on the lives of citizens, and thus require statistical robustness to avoid errors that can carry high economic and social costs (e.g., a false alarm about a crime hotspot). Second, spatial data exhibit interdependency and variability, violating the common i.i.d. (independent and identically distributed) assumption in traditional statistics. This introduces new challenges to traditional optimization problems, where spatial interdependency between nearby locations is often neglected and understudied (e.g., the spatial contiguity required in land allocation). Finally, data and domain knowledge gaps are common in geospatial problems.
For example, while Earth observation imagery is available in the tens of petabytes, there are very limited training data for many important objects or events (e.g., tree data for preventing fires and power blackouts), and expert knowledge is often required to create such data. This thesis investigates novel GeoAI techniques that explicitly address these challenges across three types of AI tasks: learning (i.e., unsupervised clustering), planning (i.e., spatial constraints and optimization), and perception (i.e., geospatial object mapping). First, the thesis proposes a significant DBSCAN approach for statistically-robust clustering that controls the rate of spurious patterns. This work introduces a model of statistical significance for DBSCAN as well as a dual-convergence algorithm to speed up the computation. Second, the thesis proposes a fragmentation-free spatial allocation algorithm that explicitly models interdependency constraints among decision variables during optimization. Specifically, it introduces an optimization formulation with new spatial decision variables to model spatial contiguity and regularity constraints. It also proposes a hierarchical fragmentation elimination algorithm, as well as a multi-layer integral image, to efficiently solve the problem in a heuristic manner. Third, the thesis proposes a domain-knowledge-assisted learning framework (i.e., TIMBER) to map geospatial objects (i.e., trees) with limited training data. The TIMBER framework introduces a geometric optimization formulation and a fast solver that generate candidate tree-like structures for the deep learning model, greatly reducing both the difficulty of learning and the heavy demand for training data. It also proposes a core object reduction algorithm to improve computational performance.
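As context for the acceleration technique mentioned above, the following is a minimal sketch of the standard (single-layer) integral-image idea that a multi-layer integral image builds on; the grid values, function names, and the toy "suitability score" framing are illustrative assumptions, not the thesis's actual formulation. An integral image stores 2-D prefix sums so that the total value of any rectangular region can be queried in constant time, which is what makes repeated region evaluations cheap during allocation.

```python
import numpy as np

def build_integral_image(grid):
    """Return an (H+1) x (W+1) prefix-sum table with a zero border,
    where table[r, c] equals the sum of grid[:r, :c]."""
    integral = np.zeros((grid.shape[0] + 1, grid.shape[1] + 1))
    integral[1:, 1:] = grid.cumsum(axis=0).cumsum(axis=1)
    return integral

def region_sum(integral, r0, c0, r1, c1):
    """Sum of grid[r0:r1, c0:c1] using four O(1) table lookups."""
    return (integral[r1, c1] - integral[r0, c1]
            - integral[r1, c0] + integral[r0, c0])

# Toy grid of hypothetical per-cell suitability scores.
grid = np.arange(12).reshape(3, 4)
table = build_integral_image(grid)
print(region_sum(table, 1, 1, 3, 3))  # sum of cells 5, 6, 9, 10 -> 30.0
```

A multi-layer variant would maintain several such tables (e.g., at different resolutions or for different quantities), but each layer answers region queries with the same four-lookup pattern shown here.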
Extensive experiments and case studies show that the proposed approaches greatly outperform existing work in solution quality, and that the proposed acceleration techniques greatly reduce computational cost.