On the Efficiency and Consistency of Visual-Inertial Localization and Mapping

Simultaneous localization and mapping (SLAM) is a prerequisite for a wide range of applications, such as robot navigation in GPS-denied areas, autonomous driving, and augmented or virtual reality. As inertial measurement units (IMUs) and cameras become ubiquitous, visual-inertial SLAM (VI-SLAM) systems have prevailed in these applications, in part because of the sensors' complementary sensing capabilities and their decreasing cost and size. Although successful VI-SLAM systems have been developed over the past decade, many challenges still limit their performance, especially when they are deployed on resource-constrained mobile devices (e.g., smartphones, tablets, and wearable computers). In this dissertation, we address three key challenges to improve the efficiency, accuracy, and consistency of VI-SLAM.

The first part of this dissertation considers short-term VI-SLAM, also known as visual-inertial odometry (VIO), in which the system optimizes over only a bounded-size sliding window of recent states (poses and features) so as to achieve constant processing time. While high computational efficiency is critical for such systems, a main limitation of existing VIO algorithms is that they require double-precision arithmetic because the VIO problem is ill-conditioned. To address this issue, we present a square-root inverse sliding-window filter for highly efficient and accurate VIO. By maintaining and updating the upper-triangular Cholesky factor of the Hessian matrix, our estimator achieves the same effective precision as regular filters while using only half the word length, thus enabling a single-precision implementation. This leads to significant speedups compared to double-precision alternatives, especially on mobile devices whose co-processors provide a 4-fold speedup for 32-bit floating-point operations.
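A short illustration of why keeping the Cholesky factor helps: if H = R^T R, the 2-norm condition number of H is the square of that of R, so working with R roughly halves the number of significant digits that ill-conditioning consumes. The sketch below shows a textbook square-root information update, assuming whitened (unit-variance) scalar measurements: each measurement row is folded into an upper-triangular factor R, together with a right-hand side r satisfying R x_hat = r, using Givens rotations, so H is never formed. It is a minimal, hypothetical example (`add_measurement` and `back_substitute` are names invented here) and omits the sliding window, marginalization, IMU propagation, and the specific inverse-filter formulation of the dissertation.

```python
import math


def add_measurement(R, r, h, z):
    """Fold one whitened scalar measurement z = h . x into (R, r).

    R is upper triangular with information matrix H = R^T R, and R x_hat = r.
    Givens rotations zero out the new row, so the returned factor satisfies
    R_new^T R_new = R^T R + h h^T without ever forming H explicitly.
    """
    n = len(R)
    R = [row[:] for row in R]  # work on copies
    r = list(r)
    h = list(h)
    for k in range(n):
        if h[k] == 0.0:
            continue  # nothing to eliminate in this column
        rho = math.hypot(R[k][k], h[k])
        c, s = R[k][k] / rho, h[k] / rho
        # Rotate row k of [R | r] against the new row [h | z].
        for j in range(k, n):
            R[k][j], h[j] = c * R[k][j] + s * h[j], -s * R[k][j] + c * h[j]
        r[k], z = c * r[k] + s * z, -s * r[k] + c * z
    return R, r  # the leftover z is the whitened residual (unused here)


def back_substitute(R, r):
    """Solve R x = r for upper-triangular R."""
    n = len(R)
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = r[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x


# Hypothetical usage: weak zero-mean prior, then four noise-free measurements.
sigma0 = 1e3  # assumed prior standard deviation
R = [[1.0 / sigma0 if i == j else 0.0 for j in range(3)] for i in range(3)]
r = [0.0, 0.0, 0.0]
x_true = [1.0, -2.0, 0.5]
for h in ([1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]):
    z = sum(hi * xi for hi, xi in zip(h, x_true))
    R, r = add_measurement(R, r, h, z)
print(back_substitute(R, r))  # close to x_true, up to a tiny bias from the prior
```

Only rotations (orthogonal transformations) touch R, which is what keeps the factor well behaved numerically; the normal-equations alternative would square the condition number by building H first.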
In the second part of this dissertation, we study VI-SLAM systems deployed on mobile platforms with restricted motion (e.g., ground robots or self-driving cars). On such platforms, we observe that the localization errors of VI-SLAM systems are significantly larger than on platforms moving freely in 3D space. Investigating this issue, we find that the restricted motions ground robots often undergo (e.g., constant speed or acceleration, or no rotation) alter the observability properties of VI-SLAM and introduce additional unobservable directions (e.g., the scale, or the roll and pitch angles). As a result, little or no information about these directions can be obtained, which degrades the localization accuracy of the VI-SLAM estimator. To address this limitation, we extend the VI-SLAM system to incorporate additional information from wheel-encoder data and planar-motion constraints, which significantly improves positioning accuracy for wheeled robots moving primarily on a plane.

Lastly, we address the long-term VI-SLAM problem. In such systems, in addition to the local optimization performed by short-term VI-SLAM, past states are globally adjusted using loop-closure measurements (re-observations of previously mapped features) to reduce global drift and maintain long-term accuracy. To achieve real-time operation, however, existing approaches often assume that previously estimated states (e.g., previous keyframes or maps) are perfectly known, which leads to inconsistent estimates: the estimated covariance is unduly small and does not correctly represent the uncertainty of the current state estimates, and later combining these overly optimistic estimates with new measurements further degrades the system's accuracy. Instead, building on the idea of the Schmidt-Kalman filter, we derive a new consistent approximate method in the information domain, which has linear memory requirements and adjustable (constant to linear) processing cost. By employing this method in different configurations, we realize an efficient and accurate long-term VI-SLAM system, RISE-SLAM, with improved estimation consistency.
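The Schmidt-Kalman (or "consider") idea that the third part builds on can be sketched with a classical covariance-form update for a scalar measurement. This is an illustration under stated assumptions, not the information-domain method or RISE-SLAM from the dissertation: states flagged as inactive in the hypothetical `active` mask (think of old keyframe poses) receive zero gain, so their estimates and covariance block stay unchanged, yet their uncertainty and cross-correlations still enter the gain through P rather than being treated as perfectly known. The Joseph-form covariance update is used because it remains valid for this deliberately suboptimal gain. `schmidt_update` and its helpers are names invented for this example.

```python
def mat_vec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matmul(A, B):
    """Matrix-matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def schmidt_update(x, P, h, z, noise_var, active):
    """Scalar Schmidt-Kalman update; active[i] == False freezes state i."""
    n = len(x)
    Ph = mat_vec(P, h)                                       # P h
    s = sum(hi * phi for hi, phi in zip(h, Ph)) + noise_var  # innovation variance
    k = [Ph[i] / s if active[i] else 0.0 for i in range(n)]  # zero gain on frozen states
    innovation = z - sum(hi * xi for hi, xi in zip(h, x))
    x_new = [x[i] + k[i] * innovation for i in range(n)]
    # Joseph form: (I - k h^T) P (I - k h^T)^T + noise_var * k k^T,
    # which is valid for any gain, including the truncated one above.
    A = [[(1.0 if i == j else 0.0) - k[i] * h[j] for j in range(n)] for i in range(n)]
    P_new = matmul(matmul(A, P), transpose(A))
    for i in range(n):
        for j in range(n):
            P_new[i][j] += noise_var * k[i] * k[j]
    return x_new, P_new


# Hypothetical 3-state example: states 0 and 1 are active, state 2 is frozen
# (e.g., an old pose involved in a loop-closure-like measurement).
P0 = [[1.0, 0.2, 0.1],
      [0.2, 2.0, 0.3],
      [0.1, 0.3, 0.5]]
h = [1.0, 0.0, 1.0]
x_s, P_s = schmidt_update([0.0, 0.0, 0.0], P0, h, 0.7, 0.1, [True, True, False])
print(x_s[2], P_s[2][2])  # prints: 0.0 0.5
```

Here the active-state estimates and the active and cross-covariance blocks coincide with those of an ordinary Kalman update (the same call with every entry of `active` set to True), while the frozen state keeps its prior variance of 0.5 instead of shrinking to 0.3. The saving comes from never updating the potentially large frozen block; the price is a conservative, rather than overconfident, covariance for those states.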

Keywords

Estimation and filtering

Estimation consistency

Observability analysis

Simultaneous localization and mapping

Visual-inertial odometry

Visual-inertial SLAM

Description

University of Minnesota Ph.D. dissertation. February 2024. Major: Electrical/Computer Engineering. Advisor: Stergios Roumeliotis. 1 computer file (PDF); xi, 184 pages.

Collections

Dissertations

Suggested citation

Wu, Kejian. (2024). On the Efficiency and Consistency of Visual-Inertial Localization and Mapping. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/262002.
