Browsing by Author "Oliver, Dev"
Now showing 1 - 3 of 3
- Results Per Page
- Sort Options
Item Crime pattern analysis: A spatial frequent pattern mining approach(2012-05-10) Shekhar, Shashi; Mohan, Pradeep; Oliver, Dev; Zhou, XunCrime pattern analysis (CPA) is the process of analytical reasoning facilitated by an understanding about the nature of an underlying spatial framework that generates crime. For example, law enforcement agencies may seek to identify regions of sudden increase in crime activity, namely, crime outbreaks. Many analytical tools facilitate this reasoning process by providing support for techniques such as hotspot analysis. However, in practice, police departments are desirous of scalable tools for existing techniques and new insights including, interaction between different crime types. Identifying new insights using scalable tools may help reduce the human effort that may be required in CPA. Formally, given a spatial crime dataset and other information familiar to law enforcement agencies, the CPA process identifies interesting, potentially useful and previously unknown crime patterns. For example, analysis of an urban crime dataset might reveal that downtown bars frequently lead to assaults just after bar closing. However, CPA is challenging due to: (a) the large size of crime datasets, and (b) a potentially large collection of interesting crime patterns. This chapter explores, spatial frequent pattern mining (SFPM), which is a spatial data driven approach for CPA and describes SFPM in the context of one type of CPA, outbreak detection. We present a case study to discover interesting, useful and non-trivial crime outbreaks in a dataset from Lincoln, NE. A review of emerging trends and new research needs in CPA methods for study to discover interesting, useful and non-trivial crime outbreaks in a dataset from outbreak detection is also presented.Item Scalable data analytics techniques for summarizing spatial network-based observations(2014-05) Oliver, DevSummarizing spatial network-based observations (SSNO) entails finding a compact description or representation of large spatial or spatio-temporal datasets. For example, transportation planners and engineers may need to identify road segments that pose risks for pedestrians and require redesign. However, SSNO is computationally challenging for the following reasons: (1) There may be a large number of k-subsets of connected components in the network, (2) Patterns may not obey the monotonicity property, and (3) There may be a large number of candidates. To address the challenge of the large number of k-subsets of connected components in the network}, we explored the problem of spatial network activity summarization. In spatial network activity summarization (SNAS), we are given a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports) and the goal is to find k shortest paths that summarize the activities. SNAS is important for applications where observations occur along linear paths such as roadways, train tracks, etc. Previous work has focused on either geometry or subgraph-based approaches (e.g., only one path), and cannot summarize activities using multiple paths. This work proposes a K-Main Routes (KMR) approach that discovers k shortest paths to summarize activities. To improve performance, KMR uses network Voronoi, divide and conquer, and pruning strategies. Experimental results on synthetic and real data show that KMR with our performance-tuning decisions yields substantial computational savings without reducing summary path coverage. We also present a case study comparing KMR with K-Means on real data.To address the challenge of patterns not obeying the monotonicity property, we studied the problem of geo-referenced time-series summarization. Given a set of regions with activity counts at each time instant (e.g., a listing of countries with number of mass protests or disease cases over time) and a spatial neighbor relation, geo-referenced time-series summarization (GTS) finds k-full trees that maximize activity coverage. GTS has important potential societal applications such as understanding the spread of political unrest, disease, crimes, etc. Previous approaches for spatio-temporal data mining detect anomalous or unusual areas and do not summarize activities. We propose a k-full tree (kFT) approach for GTS which features an algorithmic refinement (i.e., voronoi partition assignment) that leads to computational savings without affecting result quality. Analytical and experimental results show that the algorithmic refinement substantially reduces computational cost. We also present a case study that shows the output of our approach on Arab Spring data. To address the challenge of a large number of candidates, we tackled the problem of significant linear hotspot discovery (SLHD). Given a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports), SLHD finds shortest paths in the spatial network where the concentration of activities is unusually high (i.e., statistically significant). SLHD is important for societal applications in transportation safety, public safety, or public health such as finding paths with significant concentrations of accidents, crimes, or diseases. SaTScan may miss many significant paths since a large fraction of the area bounded by circles for activities on a path will be empty. Previous network-based approaches only consider a small fraction of the network and only one significant network component (e.g., path). We propose novel algorithms for discovering statistically significant linear hotspots using the ideas of likelihood pruning, Monte Carlo speedup, and dynamic segmentation. We present a case study comparing the proposed approach with existing techniques on real data. Experimental results show that the proposed algorithm, with our algorithmic refinements, yields substantial computational savings without reducing result quality.The work in this thesis is the first step towards understanding the immense challenges and novel applications of summarizing spatial network-based observations for transportation safety, public safety, disaster response, etc. In this thesis, we have formally modeled spatial and spatio-temporal network datasets and have begun to explore new data analytics techniques (e.g., K-Main Routes) that may have many uses in real-world applications. We conclude this thesis by exploring additional data summarization techniques for spatial networks, their computational challenges, and possible directions for future work.Item Significant Linear Hotspot Discovery(2016-11-18) Tang, Xun; Eftelioglu, Emre; Oliver, Dev; Shekhar, ShashiGiven a spatial network and a collection of activities (e.g., pedestrian fatality reports, crime reports), Significant Linear Hotspot Discovery (SLHD) finds all shortest paths in the spatial network where the concentration of activities is statistically significantly high. SLHD is important for societal applications in transportation safety or public safety such as finding paths with significant concentrations of accidents or crimes. SLHD is challenging because 1) there are a potentially large number of candidate paths (? 1016) in a given dataset with millions of activities and road network nodes and 2) test statistic (e.g., density ratio) is not monotonic. Hotspot detection approaches on Euclidean space (e.g., SaTScan) may miss significant paths since a large fraction of an area bounded by shapes in Euclidean space for activities on a path will be empty. Previous network-based approaches consider only paths between road intersections but not activities. This paper proposes novel models and algorithms for discovering statistically significant linear hotspots using the algorithms of neighbor node filter, shortest path tree pruning, and Monte Carlo speedup. We present case studies comparing the proposed approaches with existing techniques on real data. Experimental results show that the proposed algorithms yield substantial computational savings without reducing result quality.