Ravada, Sivakumar2020-09-022020-09-021997https://hdl.handle.net/11299/215308Range and join queries are the two most important queries in a spatial database system. In spatial query processing, spatial objects are compared with each other using spatial relationships. A spatial range-query is an operation that returns objects from a set of spatial objects which satisfy a spatial predicate with a given range. A spatial join operation combines objects of two or more relations, according to a predicate defined on the spatial attributes of the objects. The research question in this thesis concerns how to parallelize the spatial range and join query processing in order to support a high performance spatial database application. A spatial query can be parallelized either by function-partitioning or by data-part itioning. Data-Partitioning techniques divide the data among different processors and then independently execute the sequential algorithm on each processor. Data partitioning for the range query operation involves declustering of spatial data, while data partitioning for the spatial join involves clustering of spatial data. If the static partitioning methods fail to equally distribute the load among different processors, the load-balance may be improved by redistributing parts of the data to idle processors using Dynamic Load-Balancing (DLB) techniques. Declustering and load-balancing are important issues in the parallelization of rangequery operations. The current literature provides efficient methods for declustering spatial point-data. However, there has been little work done towards developing efficient declustering methods for collections of extended objects like chains of line-segments and polygons. In this thesis, we provide a framework for declustering collections of extended spatial objects by identifying the following key issues: (i) the work-load metric, (ii) the spatial-extent of the work-load, (iii) the dist_ribution of the work-load over the spatial-extent, and (iv) the declustering method. We identify and experimentally evaluate alternatives for each of these issues. In addition, we also provide a framework for dynamically balancing the load between different processors. We experimentally evaluate the proposed declustering and load-balancing methods on a distributed memory MIMD machine (Cray T3D) and shared-memory machine (SGI Challenge). Experimental results show that the spatial-extent and the work-load metric are important issues in developing a declustering method. Experiments also show that the replication of data is usually needed to facilitate dynamic load-balancing, as the cost of local processing is often less than the cost of data transfer for extended spatial objects. In addition, we also show that the effectiveness of dynamic load-balancing techniques can be improved by using declustering methods to determine the subsets of spatial objects to be transferred during run-time. A spatial join is often performed in two steps: a filter step and a refinement step. In this thesis, we focus on the refinement step of the spatial join. The refinement step of the spatial join takes as input a sequence of pairs of tuples and checks each tuple to see if the join predicate is satisfied for that tuple. This is similar to the join index processing done in traditional relational databases. We develop min-cut graph partitioning based methods for join processing using a join index. We use min-cut graph partitioning as a new heuristic for solving the page access sequence problem for fixed size buffer in sequential systems. We show that the number of page accesses needed to compute a join using join index in a fixed buffer environment is bounded by the sum of sizes of the base relations and the size of the cut-set of the page connectivity graph. Since the min-cut graph partitioning aims to minimize the size of t he cut-set, this proposed heuristic is a direct method. Experiments with benchmark data sets show that the graph-partitioning based heuristic out performs the existing methods, particularly when join selectivity is high and buffer space is small. In parallel systems, the join index partitioninig problem to balance the load and to improve the data localization can be modelled as min-cut page connectivity graph partitioning problem. This formulation provides a formal framework for the join index partitioning problem. In addt ion, it also yields superior ·performance over other methods as shown in our experiments with shared-disk architecture for processing benchmark data sets.en-USQuery Processing in Spatial Database Systems: Declustering and Clustering TechniquesReport