Browsing by Author "Zhang, Pusheng"
Now showing 1 - 6 of 6
- Results Per Page
- Sort Options
Item A Unified Approach to Spatial Outlier Detection(2001-12-10) Shekhar, Shashi; Lu, Chang-tien; Zhang, PushengSpatial outliers represent locations which are significantly different from their neighborhoods even though they may not be significantly different from the entire population. Identification of spatialoutliers can lead to the discovery of unexpected, interesting, and implicit knowledge, such as local instability. In this paper, we first provide a general definition of $S$-outliers for spatial outliers. This definition subsumes the traditional definitions of spatial outliers. Second, we characterize the computation structure of spatial outlier detection methods andpresent scalable algorithms. Third, we provide a cost model of the proposed algorithms. Finally, we provide experimental evaluations of our algorithms using a Minneapolis-St. Paul(Twin Cities) traffic data set.Item Correlation Analysis of Spatial Time Series Datasets: A Filter-and-Refine Approach(2002-12-03) Zhang, Pusheng; Huang, Yan; Shekhar, Shashi; Kumar, VipinA spatial time series dataset is a collection of time series, each referencing a location in a common spatial framework. Correlation analysis is often used to identify pairs of interacting elements from the cross product of two spatial time series datasets. However, the computational cost of correlation analysis is very high when the dimension of the time series and the number of locationsin the spatial frameworks are large. The key contribution of this paper is the use of spatial autocorrelation among spatial neighboring time series to reduce the computational cost. A filter-and-refine algorithm based on coning, i.e.group of locations, is proposed to reduce the cost of correlation analysis over a pair of spatial time series datasets. Cone-level correlation computation can be used to eliminate (filter out) a large number of element pairs whosecorrelation is clearly below (or above) a given threshold. Element pair correlation needs to be computed for remaining pairs. Using algebraic cost models and experimental studies with Earth science datasets, we show that the filter-and-refine approach can save a large fraction of the computational cost, particularly when the minimal correlation threshold is high.Item Data Mining and Visualization of Twin-Cities Traffic Data(2001-03-08) Shekhar, Shashi; Lu, Chang-tien; Chawla, Sanjay; Zhang, PushengData Mining(DM) is the process of extracting implicit, valuable, and interesting information from large sets of data. As huge amounts of data have been stored in traffic and transportation databases, data warehouses, geographic information systems, and other information repositories, data mining is receiving substantial interest from both academia and industry. The Twin-Cities traffic archival stores sensor network measurements collected from the freeway system in the Twin-Cities metropolitan area. In this paper, we construct a traffic data warehousing model which facilitates on-line analytical processing(OLAP), employ current data mining techniques to analyze the Twin-Cities traffic data set, and visualize the discoveries on the highway map. We also discuss some research issues in mining traffic and transportation data.Item Detecting Graph-based Spatial Outliers: Algorithms and Applications(2001-03-08) Shekhar, Shashi; Lu, Chang-tien; Zhang, PushengIdentification of outliers can lead to the discovery of unexpected, interesting, and implicit knowledge. Existing methods are designed for detecting spatial outliers in multidimensional geometric data sets, where a distance metric is available. In this paper, we focus on detecting spatial outliers in graph structured data sets. We define tests for spatial outliers in graph structured data sets, analyze the statistical foundation underlying our approach, design a fast algorithm to detect spatial outliers, provide the cost model for outlier detection procedures. In addition, we provide experimental results from the application of our algorithm on a Minneapolis-St. Paul(Twin-Cities) traffic dataset to show its effectiveness and usefulness.Item Exploiting a Page Level Upper Bound for Multi-Type Nearest Neighbor Queries(2005-03-31) Ma, Xiaobin; Shekhar, Shashi; Xiong, Hui; Zhang, PushengGiven a query point and a collection of spatial features, a multi-type nearest neighbor query finds the shortest tour for the query point in a way such that only one instance of each feature type is visited during the tour. For example, a tourist may be interested in finding the shortest tour which starts at a hotel and passes through a post office, a gas station, and a grocery store. The multi-type nearest query problem is different from the traditional nearest neighbor query problem, since there are many objects for each feature type and the shortest tour should pass through only one object from each feature type. In this paper, we propose R-tree based optimal solutions, which exploit a page-level upper bound for efficient computation. Also, since this problem is a generalized Traveling Salesman Problem (TSP) and is NP-hard, we provide several heuristic methods for the case that there are a large number of feature types in the data. Finally, experimental results are provided to show the strength of the proposed algorithms and design decisions related to performance tuning.Item Mining Time-Profiled Associations: A Preliminary Study(2005-04-04) Yoo, Jin Soung; Zhang, Pusheng; Shekhar, ShashiA time-profiled association is an association pattern consistent with a query sequence along time, e.g., identifying interacting relationship of droughts and wild fires in Australia with the El Nino phenomenon in the past 50 years. Association patterns by traditional association rule mining approaches reveal the generic dependency among variables, however, the evolution of these patterns along time is not captured. Hence the time-profiled association mining is used to incorporate the temporal evolution of association patterns and identify the co-occurred patterns consistent along time. Mining time-profiled associations is computational challenging due to large size of itemset space and long time points in practice. A naive approach of mining time-profiled associations can be characterized using a two-phase paradigm. The first phase generates the statistical parameter (e.g., support) sequences along time, and the second phase retrieves similar sequences with the query sequence. However, exponentially increasing computational costs of generating all combinatorial candidate itemsets become prohibitively expensive for the previous work. In this paper, we propose a novel one-step algorithm to unify the generation of the sequence of statistical parameters and sequence retrieval. The proposed algorithm substantially reduces the itemset search space by pruning candidate itemsets based on the monotone property of lower bounding measure of sequence of statistical parameters. Experimental results show that our algorithm outperforms the naive approach.