Adopting Markov Logic Networks for Big Spatial Data and Applications

Sabek, Ibrahim
Issued: 2020-01
Available: 2020-05-04
https://hdl.handle.net/11299/213095
University of Minnesota Ph.D. dissertation. January 2020. Major: Computer Science. Advisor: Mohamed Mokbel. 1 computer file (PDF); 140 pages.

Markov Logic Networks (MLN) have become a de facto statistical learning and inference framework for efficient, user-friendly analysis of massive data, with applications in knowledge base construction and data cleaning, among others. Meanwhile, large-scale spatial data analysis has gained much interest in recent years due to the need to extract insights from spatial data. However, existing spatial analysis solutions typically cannot satisfy the scalability requirements of most applications, as they were not originally designed for the huge volumes of spatial data being generated today. Unfortunately, none of these solutions exploits the power of the MLN framework to boost the usability, scalability, and accuracy of spatial analysis applications. The main goal of this thesis is to provide the first research effort to combine the two worlds of MLN and spatial data analysis. We address the two main challenges that any spatial analysis application faces when using MLN. The first challenge is how to modify the core processing and functionalities of MLN to make it aware of the distinguishing features of spatial data. The core of MLN is composed of two main components: grounding using factor graphs and inference using Gibbs sampling. The factor graph is the main data structure for learning and inferring the weights of the MLN features, while Gibbs sampling infers the values of model variables and computes their associated probabilities using the weighted MLN features. The second challenge is how to efficiently represent spatial analysis problems (e.g., spatial regression) using MLN.
This requires finding an equivalent first-order logic representation for any input spatial analysis problem, one that ensures the problem can be executed appropriately using MLN. This thesis makes the following contributions. First, we present Sya, the first spatial probabilistic knowledge base construction system based on the spatial-aware MLN framework. We show our spatial extensions to the different MLN layers implemented inside Sya, including language, grounding, and inference. We then introduce three scalable spatial analysis systems, namely TurboReg, RegRocket, and Flash, which are equipped with efficient first-order logic representations for different spatial analysis problems using MLN.

Language: en
Keywords: Factor Graph; Gibbs Sampling; Knowledge Bases; Markov Logic Networks; Scalability; Spatial Data Analysis
Type: Thesis or Dissertation
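The abstract describes the MLN core as two steps: grounding weighted first-order rules into a factor graph, then running Gibbs sampling over that graph to estimate the probability of each variable. The following is a minimal Python sketch of those two steps on a toy spatial grid. It is not the implementation used in Sya, TurboReg, RegRocket, or Flash. The rules (Observed(c) => Label(c) and a neighbor-smoothing rule Neighbor(c, d) AND Label(c) => Label(d)), their weights, and the helper names `ground_factor_graph` and `gibbs_marginals` are all hypothetical, chosen only to illustrate how a spatial relationship can become factors.

```python
import math
import random
from itertools import product

# Hypothetical toy MLN over a rows x cols grid of cells. Assumed weighted rules:
#   w_obs: Observed(c) => Label(c)
#   w_nbr: Neighbor(c, d) AND Label(c) => Label(d)   (spatial smoothing)


def ground_factor_graph(rows, cols, observed, w_obs=1.5, w_nbr=0.8):
    """Ground both rules into factors of the form (weight, variables, feature).

    A feature returns 1 when its ground clause is satisfied and 0 otherwise.
    """
    cells = list(product(range(rows), range(cols)))
    factors = []
    for cell in cells:
        if cell in observed:
            # Observed(c) is true evidence, so the clause reduces to Label(c).
            factors.append((w_obs, (cell,), lambda x: int(x[0])))
    for r, c in cells:
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Label(c) => Label(d) is violated only if c is true and d is false.
                factors.append(
                    (w_nbr, ((r, c), (nr, nc)), lambda x: int(not x[0] or x[1]))
                )
    return cells, factors


def gibbs_marginals(cells, factors, n_sweeps=2000, burn_in=200, seed=0):
    """Estimate P(Label(c) = true) for every cell by Gibbs sampling."""
    rng = random.Random(seed)
    # Map each variable to the factors it appears in; only these are needed
    # to compute its conditional distribution given all other variables.
    touching = {v: [] for v in cells}
    for factor in factors:
        for v in factor[1]:
            touching[v].append(factor)
    state = {v: rng.random() < 0.5 for v in cells}
    counts = {v: 0 for v in cells}
    for sweep in range(n_sweeps):
        for v in cells:
            # Sum of satisfied weights for v = False and v = True, rest fixed.
            scores = []
            for value in (False, True):
                state[v] = value
                scores.append(sum(w * feat(tuple(state[u] for u in vs))
                                  for w, vs, feat in touching[v]))
            # P(v = True | rest) = exp(s_T) / (exp(s_F) + exp(s_T)).
            p_true = 1.0 / (1.0 + math.exp(scores[0] - scores[1]))
            state[v] = rng.random() < p_true
        if sweep >= burn_in:
            for v in cells:
                counts[v] += state[v]
    kept = n_sweeps - burn_in
    return {v: counts[v] / kept for v in cells}


cells, factors = ground_factor_graph(3, 3, observed={(0, 0), (0, 1)})
marginals = gibbs_marginals(cells, factors)
for cell in cells:
    print(cell, round(marginals[cell], 2))
```

Because every neighbor pair is grounded in both directions, the two implication factors together reward adjacent cells that share a label. This is one simple way a spatial-smoothing assumption can be expressed as weighted clauses; the actual systems may encode spatial relationships differently.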