Efficient geometric algorithms for preference top-k queries, stochastic line arrangements, and proximity problems

Li, Yuan2017-10-092017-10-092017-06https://hdl.handle.net/11299/190539University of Minnesota Ph.D. dissertation. June 2017. Major: Computer Science. Advisor: Ravi Janardan. 1 computer file (PDF); x, 150 pages.Problems arising in diverse real-world applications can often be modeled by geometric objects such as points, lines, and polygons. The goal of this dissertation research is to design efficient algorithms for such geometric problems and provide guarantees on their performance via rigorous theoretical analysis. Three related problems are discussed in this thesis. The first problem revisits the well-known problem of answering preference top-k queries, which arise in a wide range of applications in databases and computational geometry. Given a set of n points, each with d real-valued attributes, the goal is to organize the points into a suitable data structure so that user preference queries can be answered efficiently. A query consists of a d-dimensional vector w, representing a user's preference for each attribute, and an integer k, representing the number of data points to be retrieved. The answer to a query is the k highest-scoring points relative to w, where the score of a point, p, is designed to reflect how well it captures, in aggregate, the user's preferences for the different attributes. This thesis contributes efficient exact solutions in low dimensions (2D and 3D), and a new sampling-based approximation algorithm in higher dimensions. The second problem extends the fundamental geometric concept of a line arrangement to stochastic data. A line arrangement in the plane is a partition of the plane into vertices, edges, and faces. Surprisingly, diverse problems, including the preference top-k query and k-order Voronoi Diagram, essentially boil down to answering questions about the set of k-topmost lines at some abscissa. This thesis considers line arrangements in a new setting, where each line has an associated existence probability representing uncertainty that is inherent in real-world data. An upper-bound is derived on the expected number of changes in the set of k-topmost lines, taken over the entire x-axis, and a worst-case upper bound is given for k = 1. Also, given is an efficient algorithm to compute the most likely k-topmost lines in the arrangement. Applications of this problem including the most likely Voronoi Diagram in R^1 and stochastic preference top-k query are discussed. The third problem discussed is geometric proximity search in both the stochastic setting and the query-retrieval setting. Under the stochastic setting, the thesis considers two fundamental problems, namely, the stochastic closest pair problem and the k most likely nearest neighbor search. In both problems, the data points are assumed to lie on a tree embedded in R^2 and distances are measured along the tree (a so-called tree space). For the former, efficient solutions are given to compute the probability that the closest pair distance of a realization of the input is at least l and to compute the expected closest pair distance. For the latter, the thesis generalizes the concept of most likely Voronoi Diagram from R^1 to tree space and bounds its combinatorial complexity. A data structure for the diagram and an algorithm to construct it are also given. For the query-retrieval version which is considered in R^2, the goal is to retrieve the closest pair within a user-specified query range. The contributions here include efficient data structures and algorithms that have fast query time while using linear or near-linear space for a variety of query shapes. In addition, a generic framework is presented, which returns a closest pair that is no farther apart than the closest pair in a suitably shrunken version of the query range.enalgorithmscomputational geometrydata structurespreference queriesproximity searchstochasticityEfficient geometric algorithms for preference top-k queries, stochastic line arrangements, and proximity problemsThesis or Dissertation