Browsing by Author "Gupta, Jayant"
Item An Introduction to Spatial Data Mining (2018-08-08) Golmohammadi, Jamal; Xie, Yiqun; Gupta, Jayant; Li, Yan; Cai, Jiannan; Detor, Samantha; Roh, Abigail; Shekhar, Shashi
The goal of spatial data mining is to discover potentially useful, interesting, and non-trivial patterns from spatial datasets. Spatial data mining is important for societal applications in public health, public safety, agriculture, environmental science, climate, etc. For example, in epidemiology, spatial data mining helps to find areas with a high concentration of disease incidents to manage disease outbreaks. Computerized methods are needed to discover spatial patterns since the volume and velocity of spatial data exceed what the available human experts can analyze. In addition, spatial data has unique characteristics like spatial autocorrelation and spatial heterogeneity which violate the i.i.d. (independent and identically distributed data samples) assumption of traditional statistics and data mining methods. So, using traditional methods may miss patterns or may yield spurious patterns which are costly (e.g., stigmatization) in spatial applications. Also, there are other intrinsic challenges such as MAUP (Modifiable Areal Unit Problem) as illustrated by a current court case debating gerrymandering in elections. Spatial data mining considers the unique characteristics and challenges of spatial data and domain knowledge of the target application to discover more accurate and interesting patterns. In this article, we discuss tools and computational methods of spatial data mining, focusing on the primary spatial pattern families: hotspot detection, colocation detection, spatial prediction, and spatial outlier detection. Hotspot detection methods use domain information to accurately model more active and high-density areas. Colocation detection methods find objects whose instances are in proximity to each other in a location.
Spatial prediction approaches explicitly model the neighborhood relationship of locations to predict target variables from input features. The goal of spatial outlier detection methods is to find data that are different from their neighbors.

Item An Introduction to Spatial Data Mining (University Consortium for Geographic Information Science, 2020) Golmohammadi, Jamal; Xie, Yiqun; Gupta, Jayant; Farhadloo, Majid; Li, Yan; Cai, Jiannan; Detor, Samantha; Roh, Abigail; Shekhar, Shashi
The goal of spatial data mining is to discover potentially useful, interesting, and non-trivial patterns from spatial datasets (e.g., GPS trajectories of smartphones). Spatial data mining is societally important, with applications in public health, public safety, climate science, etc. For example, in epidemiology, spatial data mining helps to find areas with a high concentration of disease incidents to manage disease outbreaks. Computational methods are needed to discover spatial patterns since the volume and velocity of spatial data exceed the ability of human experts to analyze it. Spatial data has unique characteristics like spatial autocorrelation and spatial heterogeneity which violate the i.i.d. (independent and identically distributed) assumption of traditional statistics and data mining methods. Therefore, using traditional methods may miss patterns or may yield spurious patterns, which are costly in societal applications. Further, there are additional challenges such as MAUP (Modifiable Areal Unit Problem) as illustrated by a recent court case debating gerrymandering in elections. In this article, we discuss tools and computational methods of spatial data mining, focusing on the primary spatial pattern families: hotspot detection, colocation detection, spatial prediction, and spatial outlier detection. Hotspot detection methods use domain information to accurately model more active and high-density areas.
Colocation detection methods find objects whose instances are in proximity to each other in a location. Spatial prediction approaches explicitly model the neighborhood relationship of locations to predict target variables from input features. Spatial outlier detection methods find data that differ from their neighbors. Lastly, we describe future research and trends in spatial data mining.

Item Linear Hotspot Discovery on All Simple Paths: A Summary of Results (2019-09-10) Tang, Xun; Gupta, Jayant; Shekhar, Shashi
Spatial hotspot discovery aims at discovering regions with a statistically significant concentration of activities. It has shown great value in many important societal applications such as transportation engineering, public health, and public safety. This paper formulates the problem of Linear Hotspot Detection on All Simple Paths (LHDA), which identifies hotspots from the complete set of simple paths enumerated from a given spatial network. LHDA overcomes the limitations of existing methods, which miss hotspots that naturally occur along linear simple paths on a road network. The problem is #P-hard due to the exponential number of simple paths. To address the computational challenges, we propose a novel algorithm named bi-directional fragment-multi-graph traversal (ASP_FMGT) and two path reduction approaches, ASP_NR and ASP_HD. Extensive theoretical and experimental analyses show that ASP_FMGT has substantially improved performance over a baseline approach using depth-first search with backtracking (ASP_Base) while keeping the solution complete and correct.
Moreover, case studies on real-world datasets showed that ASP_FMGT outperforms existing approaches, both by discovering previously unknown hotspots and by locating known hotspots with higher accuracy.

Item Responsible Spatial Data Science (2023-06) Gupta, Jayant
The goal of responsible spatial data science is to encourage the design and development of spatial methods, processes, algorithms, and systems to discover spatial patterns (e.g., hotspots, colocations) that reduce adverse impacts on the communities that use them. Related work on fairness issues (F) for discrete classes may not generalize well to continuous geographical spaces and may be confounded by spatial autocorrelation. Similarly, existing frameworks may not be enough to ensure accountability (A) of location-based services. A lack of transparency (T) in the choice of spatial units may lead to misinformed conclusions, and without appropriate ethical (E) tools, location privacy is at stake. Addressing the limitations of related work on FATE issues is important for the development and adoption of responsible practices with important societal applications in ecology, navigation, public health, etc. Further, responsible spatial data science is a key emerging topic motivated by a recent U.S. executive order, development of European Commission guidelines, and industrial standards (e.g., Microsoft Responsible AI Standard). Developing responsible spatial data science techniques is challenging due to spatially biased datasets, limited accountability frameworks, recurring patterns of movement, the modifiable areal unit problem (MAUP) (i.e., results depend on the spatial unit of analysis), and specific properties of spatial datasets (e.g., heterogeneity, autocorrelation). This thesis addresses three key challenges due to a lack of adherence to the responsible spatial data science principles while mining spatial pattern families.
First, to address the challenges arising from spatial variability while building deep neural network models, the thesis proposed a spatial variability aware deep neural network (SVANN) approach where each neural network weight is a map (i.e., varies across geographic locations) rather than a scalar, as used in traditional one-size-fits-all (OSFA) approaches. Within SVANN, the thesis described two types of training and prediction methods. Then, the thesis proposed a generalized form of SVANN where the neural network architecture varies across geographical locations. The thesis also provided a taxonomy of SVANN types and a physics-inspired interpretation model. Second, to enhance algorithmic transparency, the thesis discussed spatial dimensions of algorithmic transparency. Beyond the well-known Modifiable Areal Unit Problem, the thesis showed (via mathematical proofs as well as case studies with census data and census-based synthetic micro-population data) that values of many measures (e.g., Gini index, dissimilarity index) diminish monotonically with increasing spatial-unit size in a hierarchical space partitioning (e.g., block, block group, tract); however, the ranking based on spatially aggregated measures remains sensitive to the scale of spatial partitions (e.g., block, block group). Then, the thesis proposed the concept of partial aggregates and provided the partial aggregates and the algorithms to compute them for three measures, namely, the Gini index, the index of dissimilarity, and IQSR. The thesis also provided a modification of a well-known aggregate function classification and used it to organize the three measures and their partial aggregates. Third, to account for emerging taxonomies (i.e., representation of parent-child relations between spatial objects) in spatial colocations, the thesis proposed a taxonomy-aware colocation miner (TCM) algorithm which uses a user-defined taxonomy to find taxonomy-aware colocation patterns.
The thesis also proposed the TCM-Prune algorithm, which prunes duplicate colocation instances having a parent-child relation.

Item Understanding COVID-19 Effects on Mobility: A Community-Engaged Approach (2022) Sharma, Arun; Farhadloo, Majid; Li, Yan; Kulkarni, Aditya; Gupta, Jayant; Shekhar, Shashi
Given aggregated mobile device data, the goal is to understand the impact of COVID-19 policy interventions on mobility. This problem is vital due to important societal use cases, such as safely reopening the economy. Challenges include understanding and interpreting questions of interest to policymakers, cross-jurisdictional variability in the choice and timing of interventions, the large data volume, and unknown sampling bias. The related work has explored the COVID-19 impact on travel distance, time spent at home, and the number of visitors at different points of interest. However, many policymakers are interested in long-duration visits to high-risk business categories and in understanding the spatial selection bias to interpret summary reports. We provide an Entity Relationship diagram, system architecture, and implementation to support queries on long-duration visits, in addition to fine-resolution device count maps to understand spatial bias. We closely collaborated with policymakers to derive the system requirements and evaluate the system components, the summary reports, and visualizations.