Browsing by Author "Huang, Yan"
Now showing 1 - 9 of 9
Item A Framework for Discovering Co-location Patterns in Data Sets with Extended Spatial Objects (2003-09-22) Xiong, Hui; Shekhar, Shashi; Huang, Yan; Kumar, Vipin; Ma, Xiaobin; Soung Yoo, Jin
Co-location patterns are subsets of spatial features (e.g. freeways, frontage roads) usually located together in geographic space. Recent literature has provided a transaction-free approach to discover co-location patterns over spatial point data sets to avoid potential loss of proximity relationship information in partitioning continuous geographic space into transactions. This paper provides a more general transaction-free approach to mine data sets with extended spatial objects, e.g. line-strings and polygons. Key challenges include modeling neighborhoods and relationships among extended spatial objects as well as controlling the related geometric computation costs. Based on a buffer-based definition of neighborhoods, a new model for finding co-location patterns over extended spatial objects is proposed. Furthermore, this paper presents two pruning approaches, namely a prevalence-based pruning approach and a geometric filter-and-refine approach. Experimental evaluation with a real data set (the roadmap of the Minneapolis and St. Paul metropolitan area) shows that the geometric filter-and-refine approach can speed up the prevalence-based pruning approach by a factor of 30 to 40. Finally, the extended co-location mining algorithm proposed in this paper has been used to select the most challenging field test routes for a novel GPS-based approach to assessing road user charges.

Item Accessibility and non-work destination choice: a microscopic analysis of GPS travel data (2014-01) Huang, Yan
The advancements of GPS and GIS technologies provide new opportunities for investigating vehicle trip generation and destination choice at the microscopic level.
This research models how land use and road network structure influence non-work, non-home vehicle trip generation and non-work destination choice in the context of trip chains, using in-vehicle GPS travel data from the Minneapolis-St. Paul Metropolitan Area. The research includes three key parts: modeling non-work vehicle trip generation, modeling non-work, single-destination choice, and modeling non-work, two-destination choice. It contributes to methodologies for modeling single-destination and multiple-destination choice and tests several hypotheses that had not been investigated before. In modeling non-work vehicle trip generation, this research identifies the correlation of trips made by the same individual in the trip generation models. To control for this effect, five mixed-effects models are systematically applied: mixed-effects linear model, mixed-effects log-linear model, mixed-effects negative binomial model, and mixed-effects ordered logistic model. The mixed-effects ordered logistic model produces the highest goodness of fit for our data and is therefore recommended. In modeling non-work, single-destination choice, this research proposes a new method to build choice sets which combines survival analysis and random sampling. A systematic comparison of the goodness of fit of models with various choice set sizes is also performed to determine an appropriate choice set size. In modeling non-work, multiple-destination choice, this research proposes and compares three new approaches to build choice sets for two-destination choice in the context of trip chains. The outcomes of these approaches are empirically compared, and we recommend the major/minor-destination approach for modeling two-destination choice. The modeling procedure can be expanded to trip chains with more than two destinations.
Our empirical findings reveal that: (1) Although accessibility around home is not found to have statistically significant effects on non-work vehicle trips, the diversity of services within 10 to 15 minutes and 15 to 20 minutes from home can help reduce the number of non-work vehicle trips. (2) Accessibility and diversity of services at destinations influence destination choice, but they do not exert the same level of impact. The major destination in a trip chain tends to influence the decision more than the minor destination. (3) The more dissimilar the two destinations in a trip chain are, the more attractive the trip chain is. (4) Route-specific network measures such as turn index, speed discontinuity, axis of travel, and trip chains' travel time saving ratio display statistically significant effects on destination choice. Our findings have implications for transportation planning aimed at creating flourishing retail clusters and reducing the amount of vehicle travel.

Item Correlation Analysis of Spatial Time Series Datasets: A Filter-and-Refine Approach (2002-12-03) Zhang, Pusheng; Huang, Yan; Shekhar, Shashi; Kumar, Vipin
A spatial time series dataset is a collection of time series, each referencing a location in a common spatial framework. Correlation analysis is often used to identify pairs of interacting elements from the cross product of two spatial time series datasets. However, the computational cost of correlation analysis is very high when the dimension of the time series and the number of locations in the spatial frameworks are large. The key contribution of this paper is the use of spatial autocorrelation among spatially neighboring time series to reduce the computational cost. A filter-and-refine algorithm based on coning, i.e. grouping of locations, is proposed to reduce the cost of correlation analysis over a pair of spatial time series datasets.
Cone-level correlation computation can be used to eliminate (filter out) a large number of element pairs whose correlation is clearly below (or above) a given threshold. Element pair correlation then needs to be computed only for the remaining pairs. Using algebraic cost models and experimental studies with Earth science datasets, we show that the filter-and-refine approach can save a large fraction of the computational cost, particularly when the minimal correlation threshold is high.

Item Dictionary Design Algorithms for Vector Map Compression (2002-01-05) Shekhar, Shashi; Huang, Yan; Djugash, Judy
Vector maps (e.g. road maps) are important in a variety of applications including mobile computing. Due to the large size of vector maps, only a small part of a map (e.g. the part relevant to the current location of the vehicle) can be cached in hand-held or in-vehicle devices used for mobile computing. Compression techniques for vector maps can help cache larger subsets of maps and reduce the communication costs of downloading newer subsets of maps during travel. Dictionary-based compression is one common means of data compression. This paper explores the problem of designing dictionaries for dictionary-based compression techniques for vector maps. We propose a novel clustering-based dictionary design. The proposed approach adapts the dictionary to a given dataset, yielding better approximation. Experimental evaluation shows that when the dictionary size is fixed, the proposed clustering-based technique achieves better accuracy compared with conventional approaches.

Item Discovering Co-location Patterns from Spatial Datasets: A General Approach (2002-10-10) Huang, Yan; Shekhar, Shashi; Xiong, Hui
Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together.
For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though boolean spatial feature types (also called spatial events) may correspond to items in association rules over market-basket datasets, there is no natural notion of transactions. This creates difficulty in using traditional measures (e.g. support, confidence) and applying association rule mining algorithms which use support-based pruning. We propose a notion of user-specified neighborhoods in place of transactions to specify groups of items. New interest measures for spatial co-location patterns are proposed which are robust in the face of potentially infinite overlapping neighborhoods. We also propose a family of algorithms to mine frequent spatial co-location patterns. Experimental results are provided to show the strength of each algorithm and design decisions related to performance tuning.

Item Discovering Spatial Co-location Patterns: A summary of Results (2001-02-20) Shekhar, Shashi; Huang, Yan
Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though boolean spatial feature types (also called spatial events) may correspond to items in association rules over market-basket datasets, there is no natural notion of transactions. This creates difficulty in using traditional measures (e.g. support, confidence) as well as association rule mining algorithms using support-based pruning.
We propose to use a notion of user-specified neighborhoods in place of transactions to specify groups of items. New interest measures for spatial co-location patterns are proposed which are robust in the face of potentially infinite overlapping neighborhoods. We also propose an algorithm to mine frequent spatial co-location patterns and analyze its correctness and completeness. We plan to carry out experimental evaluations and performance tuning in the near future.

Item The Ergodic Theorem and Markov Chain Strong Laws (2015-08) Huang, Yan
The purpose of this paper is to explain the pointwise Ergodic Theorem and then to apply it to stationary Markov chains. The Ergodic Theorem shows that the time-averages of a stationary sequence of random variables converge almost surely, and also gives a way to evaluate the limit of these averages. In the setting of Markov chains, the Ergodic Theorem can be used to obtain an important convergence fact about Markov chains.

Item Performance Evaluation of Co-location Miner (2002-05-01) Shekhar, Shashi; Huang, Yan; Xiong, Hui
Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though boolean spatial feature types (also called spatial events) may correspond to items in association rules over market-basket datasets, there is no natural notion of transactions. This creates difficulty in using traditional measures (e.g. support, confidence) as well as association rule mining algorithms using support-based pruning. We recently defined the problem of mining spatial co-location patterns and proposed the Co-location Miner, an algorithm for mining co-locations.
In this paper, we present an experimental performance evaluation of Co-location Miner. For the purpose of comparison, we consider two other approaches, namely the pure geometric approach and the pure combinatorial approach. Empirical evaluation shows that the pure geometric method performs much better than the pure combinatorial method when generating size 2 co-locations; however, it becomes much slower when generating co-locations with more than 2 features. Co-location Miner integrates the best features of the above two approaches and provides the best overall performance. Experimental results also show that Co-location Miner is robust in the face of noise and scales up gracefully with increases in the number of spatial feature types, the maximum size of co-location patterns, and the number of instances of spatial features.

Item The Multi-resolution Co-location Miner: A New Algorithm to Find Co-location Patterns in Spatial Dataset (2002-05-01) Shekhar, Shashi; Huang, Yan
Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though boolean spatial feature types (also called spatial events) may correspond to items in association rules over market-basket datasets, there is no natural notion of transactions. This creates difficulty in using traditional measures (e.g.
support, confidence) and applying association rule mining algorithms which use support-based pruning. In our recent work, we proposed a notion of user-specified neighborhoods in place of transactions to specify groups of items, new interest measures for spatial co-location patterns which are robust in the face of potentially infinite overlapping neighborhoods, and an algorithm to mine frequent spatial co-location patterns, and we analyzed its correctness and completeness. The Co-location Miner generates candidate prevalent co-locations at the spatial feature level and generates table instances for the candidate co-locations to check their prevalence. When the set of false candidate prevalent co-locations is large, the performance of the Co-location Miner decreases. Due to spatial autocorrelation, the locations of individual spatial features of a point data set are often clustered spatially; because the Co-location Miner does not take spatial autocorrelation into consideration, it is computationally expensive on such data. In this paper, a new algorithm called Multi-resolution Co-location Miner is presented. The proposed algorithm has two logical phases, namely filter and refinement. The filter phase summarizes the original point dataset into a smaller lattice dataset using space partitioning, which allows the computation of upper bounds of the interest measures. It eliminates many non-interesting co-locations, reducing the set of candidates to be explored by the refinement phase, which computes the true values of the interest measures. We show that the proposed algorithm is correct and complete, and experiments show that it is several times faster than the traditional Co-location Miner algorithm on a dataset with spatial autocorrelation.
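
The two-phase idea in the last abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: it handles only size-2 co-locations, it assumes a participation-index-style interest measure (the abstract does not name the measure), and it assumes a square grid whose cell side equals the neighborhood distance `d`. The function names `coarse_index`, `exact_index`, and `is_prevalent` are hypothetical. With that cell size, any two points within distance `d` fall in the same or adjacent cells, so the grid-level value can never be smaller than the exact one and is safe to use as a filter.

```python
from itertools import product
from math import floor, hypot


def cell_of(p, size):
    """Map a point to the integer grid cell that contains it."""
    return (floor(p[0] / size), floor(p[1] / size))


def coarse_index(a_pts, b_pts, d):
    """Filter phase: grid-level upper bound on the interest measure.

    An instance counts as participating if the other feature occupies
    the same cell or one of the 8 adjacent cells (cell side = d).
    """
    def ratio(src, dst):
        occupied = {cell_of(q, d) for q in dst}
        hits = 0
        for p in src:
            cx, cy = cell_of(p, d)
            if any((cx + dx, cy + dy) in occupied
                   for dx, dy in product((-1, 0, 1), repeat=2)):
                hits += 1
        return hits / len(src)

    return min(ratio(a_pts, b_pts), ratio(b_pts, a_pts))


def exact_index(a_pts, b_pts, d):
    """Refinement phase: exact value from point-to-point distances."""
    def ratio(src, dst):
        hits = sum(
            1 for p in src
            if any(hypot(p[0] - q[0], p[1] - q[1]) <= d for q in dst)
        )
        return hits / len(src)

    return min(ratio(a_pts, b_pts), ratio(b_pts, a_pts))


def is_prevalent(a_pts, b_pts, d, threshold):
    """Run the cheap filter first; refine only surviving candidates."""
    if coarse_index(a_pts, b_pts, d) < threshold:
        return False  # pruned without any distance computation
    return exact_index(a_pts, b_pts, d) >= threshold
```

For example, `is_prevalent([(0.5, 0.5), (10.5, 10.5)], [(0.6, 0.6), (50, 50)], 1.0, 0.6)` is rejected by the filter alone, because only half of the instances of each feature have the other feature in a neighboring cell. Both feature lists are assumed to be non-empty.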