Query Processing in Spatial Database Systems: Declustering and Clustering Techniques
Loading...
View/Download File
Persistent link to this item
Statistics
View StatisticsJournal Title
Journal ISSN
Volume Title
Title
Query Processing in Spatial Database Systems: Declustering and Clustering Techniques
Authors
Published Date
1997
Publisher
Type
Report
Abstract
Range and join queries are the two most important queries in a spatial database system.
In spatial query processing, spatial objects are compared with each other using spatial
relationships. A spatial range-query is an operation that returns objects from a set of spatial
objects which satisfy a spatial predicate with a given range. A spatial join operation
combines objects of two or more relations, according to a predicate defined on the spatial
attributes of the objects. The research question in this thesis concerns how to parallelize
the spatial range and join query processing in order to support a high performance spatial
database application. A spatial query can be parallelized either by function-partitioning
or by data-part itioning. Data-Partitioning techniques divide the data among different processors
and then independently execute the sequential algorithm on each processor. Data
partitioning for the range query operation involves declustering of spatial data, while data
partitioning for the spatial join involves clustering of spatial data. If the static partitioning
methods fail to equally distribute the load among different processors, the load-balance
may be improved by redistributing parts of the data to idle processors using Dynamic
Load-Balancing (DLB) techniques.
Declustering and load-balancing are important issues in the parallelization of rangequery
operations. The current literature provides efficient methods for declustering spatial
point-data. However, there has been little work done towards developing efficient declustering
methods for collections of extended objects like chains of line-segments and polygons. In
this thesis, we provide a framework for declustering collections of extended spatial objects
by identifying the following key issues: (i) the work-load metric, (ii) the spatial-extent of
the work-load, (iii) the dist_ribution of the work-load over the spatial-extent, and (iv) the
declustering method. We identify and experimentally evaluate alternatives for each of these
issues. In addition, we also provide a framework for dynamically balancing the load between different
processors. We experimentally evaluate the proposed declustering and load-balancing
methods on a distributed memory MIMD machine (Cray T3D) and shared-memory machine
(SGI Challenge). Experimental results show that the spatial-extent and the work-load metric
are important issues in developing a declustering method. Experiments also show that
the replication of data is usually needed to facilitate dynamic load-balancing, as the cost of
local processing is often less than the cost of data transfer for extended spatial objects. In
addition, we also show that the effectiveness of dynamic load-balancing techniques can be
improved by using declustering methods to determine the subsets of spatial objects to be
transferred during run-time.
A spatial join is often performed in two steps: a filter step and a refinement step. In
this thesis, we focus on the refinement step of the spatial join. The refinement step of the
spatial join takes as input a sequence of pairs of tuples and checks each tuple to see if the
join predicate is satisfied for that tuple. This is similar to the join index processing done in
traditional relational databases. We develop min-cut graph partitioning based methods for
join processing using a join index. We use min-cut graph partitioning as a new heuristic
for solving the page access sequence problem for fixed size buffer in sequential systems. We
show that the number of page accesses needed to compute a join using join index in a fixed
buffer environment is bounded by the sum of sizes of the base relations and the size of
the cut-set of the page connectivity graph. Since the min-cut graph partitioning aims to
minimize the size of t he cut-set, this proposed heuristic is a direct method. Experiments
with benchmark data sets show that the graph-partitioning based heuristic out performs
the existing methods, particularly when join selectivity is high and buffer space is small.
In parallel systems, the join index partitioninig problem to balance the load and to improve
the data localization can be modelled as min-cut page connectivity graph partitioning
problem. This formulation provides a formal framework for the join index partitioning problem.
In addt ion, it also yields superior ·performance over other methods as shown in our
experiments with shared-disk architecture for processing benchmark data sets.
Keywords
Description
Related to
Replaces
License
Series/Report Number
Technical Report; 97-026
Funding information
Isbn identifier
Doi identifier
Previously Published Citation
Other identifiers
Suggested citation
Ravada, Sivakumar. (1997). Query Processing in Spatial Database Systems: Declustering and Clustering Techniques. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215308.
Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.