Browsing by Author "Kang, James"
Now showing 1 - 5 of 5
Item Context Inclusive Function Evaluation: A Case Study with EM-Based Multi-Scale Multi-Granular Image Classification
(2008-07-30) Gandhi, Vijay; Kang, James; Shekhar, Shashi; Ju, Junchang; Kolaczyk, Eric D.; Gopal, Sucharita
Many statistical queries, such as maximum likelihood estimation, involve finding the best candidate model given a set of candidate models and a quality estimation function. This problem is common in important applications like land-use classification at multiple spatial resolutions from remote sensing raster data. Such a problem is computationally challenging because evaluating the quality estimation function for each candidate model is costly. For example, a recently proposed method of multi-scale, multi-granular classification incurs high computational overhead because it evaluates the function for each candidate model independently before comparison. In contrast, we propose an upper-bound-based, context-inclusive approach that reduces computational overhead based on the context, i.e., the value of the quality estimation function for the best candidate model so far. We also prove that an upper bound exists for each candidate model and that the proposed algorithm is correct. Experimental results using land-use classification at multiple spatial resolutions from satellite imagery show that the proposed approach reduces the computational cost significantly.

Item Discovering Flow Anomalies: A SWEET Approach
(2009-03-09) Kang, James; Shekhar, Shashi; Wennen, Christine; Novak, Paige
Given a percentage-threshold and readings from a pair of consecutive upstream and downstream sensors, flow anomaly discovery identifies dominant time intervals where the fraction of time instants of significantly mismatched sensor readings exceeds the given percentage-threshold. Discovering flow anomalies (FAs) is an important problem due to applications such as environmental flow monitoring networks and early warning detection systems for water quality problems.
However, mining FAs is computationally expensive because of the large (potentially infinite) number of time instants of measurement and potentially long delays due to stagnant (e.g., lakes) or slow-moving (e.g., wetlands) water bodies between consecutive sensors. Traditional outlier detection methods (e.g., t-test) are suited for detecting transient FAs (i.e., time instants of significant mismatches across consecutive sensors) and cannot detect persistent FAs (i.e., long variable time-windows with a high fraction of time-instant transient FAs) due to the lack of a pre-defined window size. In contrast, we propose a Smart Window Enumeration and Evaluation of persistence-Thresholds (SWEET) method to efficiently explore the search space of all possible window lengths. Computational overhead is reduced significantly by restricting the start and end points of a window to coincide with transient FAs, using a smart counter and efficient pruning techniques. Analytical evaluation shows that the proposed method is correct and complete. Experimental evaluation using synthetic and real datasets shows that our proposed approach outperforms naive alternatives.

Item Identifying Clusters in Marked Spatial Point Processes: A Summary of Results
(2006-03-20) Mane, Sandeep; Kang, James; Shekhar, Shashi; Srivastava, Jaideep; Murray, Carson; Pusey, Anne
Clustering of marked spatial point processes is an important problem in many application domains (e.g., behavioral ecology). Classical clustering approaches handle homogeneous spatial points and hence cannot cluster marked spatial point processes. In this paper, we propose a novel, intuitive approach, the Merge Algorithm, to hierarchically cluster marked spatial point processes. This approach treats all spatial point processes in a dendrogram's sub-tree as a single spatial point process while clustering. The resulting dendrogram for a marked spatial point process needs to be analyzed by a domain expert to identify clusters.
To remove the subjective nature of the clusters identified, we propose a novel statistical method, the Cluster Identification Algorithm, to partition a dendrogram into clusters. This approach identifies (cuts) a dendrogram's sub-tree as a cluster if that sub-tree's intra-subtree similarity is significantly higher than its inter-subtree similarity. Experiments with the Jane Goodall Institute's chimpanzee ecological dataset from Gombe National Park, Tanzania, show that our proposed methods identified clusters compatible with those identified by domain experts.

Item Spatial Data Mining
(2014-10-01) Shekhar, Shashi; Evans, Michael R.; Kang, James
Explosive growth in geospatial data and the emergence of new spatial technologies emphasize the need for automated discovery of spatial knowledge. Spatial data mining is the process of discovering interesting and previously unknown, but potentially useful, patterns from large spatial databases. The complexity of spatial data and intrinsic spatial relationships limit the usefulness of conventional data mining techniques for extracting spatial patterns. In this chapter, we explore the emerging field of spatial data mining, focusing on four major topics: prediction and classification, outlier detection, co-location mining, and clustering. We conclude with a look at future research needs.

Item Spatial Databases
(2007-09-19) Gandhi, Vijay; Kang, James; Shekhar, Shashi
Spatial database research has continued to advance greatly over the past three decades, addressing the growing data management and analysis needs of spatial applications. This research has produced a taxonomy of models for space, conceptual models, spatial query languages and query processing, spatial file organization and indexes, and spatial data mining. However, emerging needs for spatial database systems include the handling of 3D spatial data, the temporal dimension of spatial data, and spatial data visualization.
In addition, the rise of new systems such as sensor networks and multi-core processors is likely to have an impact on spatial databases. The goal of this paper is to provide a broad overview of the recent advancements in spatial databases and research needs in each area.