Scalable data analytics techniques for summarizing spatial network-based observations

Loading...
Thumbnail Image

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

Scalable data analytics techniques for summarizing spatial network-based observations

Published Date

2014-05

Publisher

Type

Thesis or Dissertation

Abstract

Summarizing spatial network-based observations (SSNO) entails finding a compact description or representation of large spatial or spatio-temporal datasets. For example, transportation planners and engineers may need to identify road segments that pose risks for pedestrians and require redesign. However, SSNO is computationally challenging for the following reasons: (1) There may be a large number of k-subsets of connected components in the network, (2) Patterns may not obey the monotonicity property, and (3) There may be a large number of candidates. To address the challenge of the large number of k-subsets of connected components in the network}, we explored the problem of spatial network activity summarization. In spatial network activity summarization (SNAS), we are given a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports) and the goal is to find k shortest paths that summarize the activities. SNAS is important for applications where observations occur along linear paths such as roadways, train tracks, etc. Previous work has focused on either geometry or subgraph-based approaches (e.g., only one path), and cannot summarize activities using multiple paths. This work proposes a K-Main Routes (KMR) approach that discovers k shortest paths to summarize activities. To improve performance, KMR uses network Voronoi, divide and conquer, and pruning strategies. Experimental results on synthetic and real data show that KMR with our performance-tuning decisions yields substantial computational savings without reducing summary path coverage. We also present a case study comparing KMR with K-Means on real data.To address the challenge of patterns not obeying the monotonicity property, we studied the problem of geo-referenced time-series summarization. Given a set of regions with activity counts at each time instant (e.g., a listing of countries with number of mass protests or disease cases over time) and a spatial neighbor relation, geo-referenced time-series summarization (GTS) finds k-full trees that maximize activity coverage. GTS has important potential societal applications such as understanding the spread of political unrest, disease, crimes, etc. Previous approaches for spatio-temporal data mining detect anomalous or unusual areas and do not summarize activities. We propose a k-full tree (kFT) approach for GTS which features an algorithmic refinement (i.e., voronoi partition assignment) that leads to computational savings without affecting result quality. Analytical and experimental results show that the algorithmic refinement substantially reduces computational cost. We also present a case study that shows the output of our approach on Arab Spring data. To address the challenge of a large number of candidates, we tackled the problem of significant linear hotspot discovery (SLHD). Given a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports), SLHD finds shortest paths in the spatial network where the concentration of activities is unusually high (i.e., statistically significant). SLHD is important for societal applications in transportation safety, public safety, or public health such as finding paths with significant concentrations of accidents, crimes, or diseases. SaTScan may miss many significant paths since a large fraction of the area bounded by circles for activities on a path will be empty. Previous network-based approaches only consider a small fraction of the network and only one significant network component (e.g., path). We propose novel algorithms for discovering statistically significant linear hotspots using the ideas of likelihood pruning, Monte Carlo speedup, and dynamic segmentation. We present a case study comparing the proposed approach with existing techniques on real data. Experimental results show that the proposed algorithm, with our algorithmic refinements, yields substantial computational savings without reducing result quality.The work in this thesis is the first step towards understanding the immense challenges and novel applications of summarizing spatial network-based observations for transportation safety, public safety, disaster response, etc. In this thesis, we have formally modeled spatial and spatio-temporal network datasets and have begun to explore new data analytics techniques (e.g., K-Main Routes) that may have many uses in real-world applications. We conclude this thesis by exploring additional data summarization techniques for spatial networks, their computational challenges, and possible directions for future work.

Keywords

Description

University of Minnesota Ph.D. dissertation. May 2014. Major: Computer Science. Advisor: Shashi Shekhar. 1 computer file (PDF); ix, 107 pages.

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation

Oliver, Dev. (2014). Scalable data analytics techniques for summarizing spatial network-based observations. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/165053.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.