Efficient and Accurate Visual-Inertial Localization and Mapping
Authors
Ke, Tong
Published Date
2023-01
Type
Thesis or Dissertation
Abstract
Thanks to the growing availability of low-cost and compact sensors, today's mobile devices (e.g., smartphones, watches, and glasses) are capable of performing a variety of tasks that enhance our quality of life. In particular, we can use mobile devices to localize ourselves in an environment and create a map of it, i.e., simultaneous localization and mapping (SLAM), which enables a body of applications in human navigation, augmented reality (AR), and virtual reality (VR). In GPS-denied areas, such as indoors, one of the dominant SLAM solutions is to combine information from cameras and inertial measurement units (IMUs). These systems are usually known as vision-aided inertial navigation systems (VINS). Using VINS to carry out SLAM tasks on a mobile device has been successful over the past few years, primarily due to their low cost and high accuracy. Nevertheless, many challenges remain in visual-inertial SLAM, and this thesis focuses on addressing four of them.
First, since computational resources are often limited, VINS estimators always face a trade-off between accuracy and efficiency. Depending on the resources available, VINS estimators can be extremely accurate but slow, e.g., bundle adjustment (BA), or efficient but prone to long-term drift, e.g., visual-inertial odometry (VIO). One state-of-the-art paradigm combines VIO and BA in a multi-threaded manner so as to achieve low-latency localization through the VIO while correcting its drift with the precise map built by the BA. However, to maintain the efficiency of the VIO, approximations are required when combining the data from the VIO and BA, which leads to suboptimal estimates. The first contribution of this thesis is a unified VINS estimator that performs both real-time localization and accurate mapping in a single estimator, as opposed to splitting them into two. Since no further fusion between localization and mapping is necessary, the estimator achieves both efficiency and accuracy.
Second, in practice there are scenarios where a dense map of the environment is necessary. The maps created by VINS, however, are typically represented as sparse 3D point clouds concentrated in the textured regions of the environment. These maps offer essential information for localization but limited clues about the scene's structure. Recent developments in deep learning enable estimating such a dense map even from a single image, but these systems do not exploit the geometric information available from multiple views. To take advantage of the strengths of both deep learning and multi-view-geometry-based SLAM systems, this thesis proposes a method that produces a dense map of the environment by combining information from deep neural networks and SLAM, achieving high depth accuracy.
Third, when a pre-built map of the environment is available, it is important to localize against the map efficiently and robustly. To perform this task, we usually determine the position and orientation (pose) of the camera from observations of n mapped points, i.e., Perspective-n-Point (PnP), where a minimal solver for Perspective-3-Point (P3P) is required for rejecting outliers. Numerous P3P solvers have been published since 1841; however, the majority of them first solve for the distances between the camera and the mapped points, after which expensive post-processing is employed to recover the camera pose.
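For reference, the classical distance-based formulation alluded to above (standard material dating back to Grunert's 1841 solution, not the method proposed in this thesis) poses P3P as three law-of-cosines constraints in the unknown camera-to-point distances:

$$
d_i^2 + d_j^2 - 2\,d_i d_j \cos\theta_{ij} = \lVert \mathbf{p}_i - \mathbf{p}_j \rVert^2,
\qquad (i,j) \in \{(1,2),\,(1,3),\,(2,3)\},
$$

where $d_i$ is the unknown distance from the camera center to the known map point $\mathbf{p}_i$, and $\theta_{ij}$ is the angle between the unit bearing vectors of the measurements of points $i$ and $j$. Eliminating variables reduces this system to a quartic in one unknown; once the distances are found, the camera pose must still be recovered by aligning the back-projected points with the map, which is the expensive post-processing step the abstract refers to.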
The contribution of this thesis to the P3P problem is an algebraic method that directly solves for the camera pose, which is shown to be more efficient and robust than previous state-of-the-art approaches (a generic version of this outlier-rejection pipeline is sketched after the abstract).
Lastly, VINS rely heavily on the quality of visual information, which in practice may not always be reliable due to challenging scenes (e.g., low-texture regions or moving objects). In these cases, the IMU can still provide information for localization, but its estimates drift quickly because of rapidly accumulating errors. On a person-held device, one may learn a human-walking model that provides additional information for reducing the IMU's drift, i.e., pedestrian dead reckoning (PDR), which has become increasingly reliable thanks to recent advances in deep learning. Nonetheless, the errors of such systems still accumulate over time, albeit at a lower rate. To address this drift, a map is necessary; to this end, this thesis proposes a novel system that incorporates a map into a PDR system, localizing the device with only an IMU while ensuring bounded position errors.
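To make the PnP outlier-rejection pipeline concrete, below is a minimal sketch in Python using OpenCV's generic solvePnPRansac with its built-in P3P minimal solver (cv2.SOLVEPNP_P3P). This is off-the-shelf tooling, not the algebraic solver proposed in the thesis, and the synthetic scene (map points, intrinsics, outlier rate) is assumed purely for demonstration.

```python
# Illustrative sketch: map-based localization via PnP with RANSAC outlier
# rejection, using OpenCV's built-in P3P minimal solver (SOLVEPNP_P3P).
# Generic tooling, not the thesis's solver; the scene below is synthetic.
import numpy as np
import cv2

rng = np.random.default_rng(0)

# Synthetic map: 50 3D points in front of a camera with known intrinsics K.
pts_3d = rng.uniform(-1.0, 1.0, (50, 3)) + np.array([0.0, 0.0, 4.0])
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
rvec_gt = np.array([0.10, -0.20, 0.05])  # ground-truth rotation (axis-angle)
tvec_gt = np.array([0.30, -0.10, 0.20])  # ground-truth translation

# Project the map into the image, then corrupt 20% of the matches with
# gross outliers (e.g., wrong feature-to-map associations).
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, None)
pts_2d = pts_2d.reshape(-1, 2)
bad = rng.choice(len(pts_2d), size=10, replace=False)
pts_2d[bad] += rng.uniform(50.0, 100.0, (10, 2))

# RANSAC repeatedly draws minimal samples, calls the P3P solver for pose
# hypotheses (OpenCV uses a 4th point to disambiguate the up-to-four P3P
# roots), and keeps the hypothesis with the most reprojection inliers.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts_3d, pts_2d, K, None,
    iterationsCount=200, reprojectionError=2.0,
    flags=cv2.SOLVEPNP_P3P)

print("success:", ok, "| inliers:", len(inliers), "of", len(pts_2d))
print("rvec:", rvec.ravel(), "tvec:", tvec.ravel())
```

Within such a pipeline, the minimal solver runs once per RANSAC hypothesis, so a faster and more robust P3P, such as the direct algebraic method proposed in the thesis, lowers the cost of the entire localization step.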
Description
University of Minnesota Ph.D. dissertation. January 2023. Major: Computer Science. Advisor: Stergios Roumeliotis. 1 computer file (PDF); viii, 96 pages.
Suggested citation
Ke, Tong. (2023). Efficient and Accurate Visual-Inertial Localization and Mapping. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/257123.