Peng, Cheng2022-03-172022-03-172022-01https://hdl.handle.net/11299/226644University of Minnesota Ph.D. dissertation. 2022. Major: Computer Science. Advisor: Volkan Isler. 1 computer file (PDF); 123 pages.Our lives are becoming more convenient than ever with the fast-growing availability of autonomous systems for applications ranging from indoor cleaning to transportation. Systems and algorithms are being developed at a rapid pace to capture the markets driving these applications. Capturing data to visualize and understand the operating environments in these applications is crucial. Whether it is to capture a High-Definition (HD) images or the semantic information regarding humans and other objects, the need for higher-fidelity data increases and so does the demand for more efficient methods to collect it. In this dissertation, we focus on efficiently collecting and extracting 3D geometric and semantic information around us using affordable robotic systems such as unmanned aerial vehicles. This dissertation is divided into four main parts: In the first part, we work on the problem of selecting a small number of views for high-quality 3D reconstruction and generating a mosaic of a planar scene from an existing trajectory of views. We present a novel reconstruction uncertainty model that directly reflects the triangulation quality in 3D. We model the true uncertainty of each pixel as a cone originating from the camera center, which faithfully represents the uncertainty geometrically. We show that one can select two good views and obtain a reconstruction which is almost as good as merging all possible views from the entire viewing plane. We also show that a coarse camera grid (of resolution proportional to the scene depth) can provide a good reconstruction of the entire world plane. In the second part, we generalize our results to beyond planar scenes. More specifically, we seek a minimum length trajectory for a drone to take images of an unknown scene such that the final 3D reconstruction model obtained from these images is of high quality.We propose an efficient view sampling space called adaptive viewing plane that provides an effective alternative to uniform sampling in 3D space. Our method naturally adapts to scene geometries and produces high-quality reconstructions. Another key contribution is that our method admits a constant factor theoretical bound for the trajectory length comparing to the optimal one. In the third part, we turn our attention to coverage path planning for large-scale urban environments. To approach this problem, we model buildings as a set of surfaces with a fixed number of distinct surface normal. The coverage problem can then be transformed into a novel variant of the Traveling Salesperson Problem with Neighborhoods (generalized cone-TSPN). We present a polynomial-time approximation algorithm to the generalized cone-TSPN problem with a constant factor memory consumption. We use this result to present a method to cover a large-scale city subject to a geometric view quality constraint and present an upper bound which limits the deviation of our performance from the optimal performance. In the last part, we focus on attention-driven view planning for semantic object detection. The previous work pays equal importance to all geometric details for a high-quality representation of the entire scene. However, a complete geometric reconstruction is not necessary for semantic tasks. Therefore, we focus on recovering semantic information from a given environment. More specifically, we want to find the shortest tour such that all objects of interest can be detected successfully. We formulate the detection-based trajectory planning problem as a stochastic Traveling Salesperson Problem with Neighborhoods, and propose a method that obtains a polynomial approximation factor for the trajectory length. Finally, we show that our algorithm can efficiently detect and recognize the license plates for all vehicles on a parking lot from a photo-realistic simulation environment. In general, this dissertation provides insights into problems ranging from passive key-frame selection to active view planning for high-fidelity reconstruction. By exploring the explicit geometry of the underlying spaces, we make progress towards view selection and trajectory planning for high-quality coverage, reconstruction, and detection. As the demand for an HD map of the world increases for autonomous systems and virtual reality, our methods will become even more crucial especially in large-scale environments.en3D reconstructionComputer VisionPath planningRoboticstrajectory planningView planningView Selection for Large Scale 3D ReconstructionThesis or Dissertation