On the Efficiency and Consistency of Visual-Inertial Localization and Mapping

Wu, Kejian
2024-03-29
2024-02
https://hdl.handle.net/11299/262002
University of Minnesota Ph.D. dissertation. February 2024. Major: Electrical/Computer Engineering. Advisor: Stergios Roumeliotis. 1 computer file (PDF); xi, 184 pages.

Simultaneous localization and mapping (SLAM) is a prerequisite for a wide range of applications, such as robot navigation in GPS-denied areas, autonomous driving, and augmented or virtual reality. As inertial measurement units (IMUs) and cameras become ubiquitous, visual-inertial SLAM (VI-SLAM) systems have prevailed in these applications, in part because of the sensors' complementary capabilities and their decreasing cost and size. Although successful VI-SLAM systems have been developed over the past decade, many challenges still limit their performance, especially when they are deployed on resource-constrained mobile devices (e.g., smartphones, tablets, and wearable computers). In this dissertation, we seek to address three key challenges to improve the efficiency, accuracy, and consistency of VI-SLAM. The first part of this dissertation considers short-term VI-SLAM, also known as visual-inertial odometry (VIO), where the system restricts its optimization to a bounded-size sliding window of recent states (poses and features) to achieve constant processing time. While high computational efficiency is critical for such systems, a main limitation of existing VIO algorithms is that they require double-precision arithmetic because the VIO problem is ill-conditioned. To address this issue, we present a square-root inverse sliding-window filter for highly efficient and accurate VIO. By maintaining and updating the upper-triangular Cholesky factor of the Hessian matrix, our estimator achieves the same effective precision as regular filters while using only half the wordlength, thus enabling a single-precision implementation.
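Why the triangular factor tolerates a shorter wordlength can be illustrated with a minimal sketch on a hypothetical least-squares problem (not the dissertation's filter). Because R^T R = H, the condition number of the factor R is the square root of that of the Hessian H, so solving with R in float32 loses roughly half as many digits as forming and solving H in float32. The helper names (`make_problem`, `solve_square_root_f32`, `solve_normal_f32`) are illustrative assumptions; the QR, Cholesky, and triangular solves are written out by hand so that every step visibly runs in float32.

```python
import numpy as np


def make_problem(m=40, n=6, cond=1e4, seed=0):
    """Hypothetical least-squares problem J x ~= r with cond(J) == cond."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))  # m x n, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))  # n x n orthogonal
    s = np.logspace(0.0, -np.log10(cond), n)          # singular values 1 ... 1/cond
    J = U @ np.diag(s) @ V.T
    x_true = np.ones(n)
    return J, J @ x_true, x_true


def back_substitute(R, y):
    """Solve R x = y for upper-triangular R, staying in R's dtype."""
    x = np.zeros_like(y)
    for i in range(len(y) - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x


def forward_substitute(L, y):
    """Solve L x = y for lower-triangular L, staying in L's dtype."""
    x = np.zeros_like(y)
    for i in range(len(y)):
        x[i] = (y[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x


def solve_square_root_f32(J, r):
    """Householder QR in float32; the resulting R satisfies R^T R = J^T J."""
    A = J.astype(np.float32)
    y = r.astype(np.float32)
    n = A.shape[1]
    for k in range(n):
        v = A[k:, k].copy()
        v[0] += np.copysign(np.sqrt(v @ v), v[0])  # reflector that zeroes column k below the diagonal
        v /= np.sqrt(v @ v)
        A[k:, k:] -= 2 * np.outer(v, v @ A[k:, k:])
        y[k:] -= 2 * v * (v @ y[k:])               # apply the same reflector to the right-hand side
    return back_substitute(np.triu(A[:n]), y[:n])


def solve_normal_f32(J, r):
    """Form the Hessian H = J^T J explicitly in float32 and solve via Cholesky."""
    A = J.astype(np.float32)
    H = A.T @ A                                    # cond(H) = cond(J)^2
    g = A.T @ r.astype(np.float32)
    n = H.shape[0]
    R = np.zeros_like(H)
    for i in range(n):                             # upper Cholesky factor, R^T R = H
        d = H[i, i] - R[:i, i] @ R[:i, i]
        if d <= 0:
            raise np.linalg.LinAlgError("float32 Hessian is not numerically positive definite")
        R[i, i] = np.sqrt(d)
        R[i, i + 1:] = (H[i, i + 1:] - R[:i, i] @ R[:i, i + 1:]) / R[i, i]
    return back_substitute(R, forward_substitute(R.T, g))


def relative_error(x, x_true):
    return float(np.linalg.norm(x.astype(np.float64) - x_true) / np.linalg.norm(x_true))


def compare(cond=1e4):
    J, r, x_true = make_problem(cond=cond)
    err_sqrt = relative_error(solve_square_root_f32(J, r), x_true)
    try:
        err_normal = relative_error(solve_normal_f32(J, r), x_true)
    except np.linalg.LinAlgError:
        err_normal = float("inf")                  # Cholesky of the float32 Hessian broke down
    return {
        "cond_J": float(np.linalg.cond(J)),
        "cond_H": float(np.linalg.cond(J.T @ J)),
        "sqrt_f32": err_sqrt,
        "normal_f32": err_normal,
    }


print(compare())
```

With cond(J) = 1e4, the Hessian's condition number is about 1e8, which exceeds the roughly 1.7e7 that a 24-bit float32 significand can resolve. The normal-equation route is therefore expected to return a badly wrong solution or to fail in its Cholesky step, while the QR route remains accurate to roughly three digits in single precision.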
Single-precision arithmetic yields significant speedups over double-precision alternatives, especially on mobile devices whose co-processors execute 32-bit floating-point operations four times faster. In the second part of this dissertation, we study VI-SLAM systems deployed on mobile platforms with restricted motion (e.g., ground robots or self-driving cars). In such cases, we observe that the localization errors of VI-SLAM systems are significantly larger than those on platforms moving freely in 3D space. We investigate this issue and find that the restricted motions ground robots often undergo (e.g., constant speed or acceleration, or no rotation) alter the observability properties of VI-SLAM and introduce additional unobservable directions (e.g., the scale, or the roll and pitch angles). As a result, little or no information can be obtained along these directions, which degrades the localization accuracy of the VI-SLAM estimator. To address this limitation, we extend the VI-SLAM system to incorporate additional information from wheel-encoder data and planar-motion constraints, which significantly improves positioning accuracy for wheeled robots moving primarily on a plane. Lastly, we address the long-term VI-SLAM problem. In such systems, in addition to the local optimization of short-term VI-SLAM, past states are globally adjusted using loop-closure measurements (re-observations of previously mapped features) to reduce global drift and maintain long-term accuracy. To achieve real-time operation, however, existing approaches often assume previously estimated states (e.g., previous keyframes or maps) to be perfectly known, which leads to inconsistent estimates.
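The overconfidence that results from treating earlier estimates as perfectly known can be seen in a hypothetical one-dimensional example (not taken from the dissertation): a robot position x with prior variance p_x re-observes a landmark m whose earlier estimate has variance p_m, through z = m - x + n with noise variance sigma2. For simplicity the sketch assumes x, m, and n are mutually independent; the names `compare_updates` and `PosteriorVariances` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PosteriorVariances:
    reported_naive: float  # variance a filter reports if it treats the landmark as exact
    actual_naive: float    # true variance of that filter's position error
    joint: float           # variance when the landmark uncertainty p_m is modeled


def compare_updates(p_x: float, p_m: float, sigma2: float) -> PosteriorVariances:
    """Hypothetical 1D model: z = m - x + n, with x, m, n mutually independent."""
    # Naive filter: ignores p_m, so its gain and reported variance use p_x + sigma2 only.
    k = p_x / (p_x + sigma2)
    reported = (1.0 - k) * p_x
    # Its position error is (1 - k) e_x + k e_m + k n; the k * e_m term is never accounted for.
    actual = (1.0 - k) ** 2 * p_x + k ** 2 * (p_m + sigma2)
    # Consistent update: the innovation variance includes the landmark variance p_m.
    joint = p_x - p_x ** 2 / (p_x + p_m + sigma2)
    return PosteriorVariances(reported, actual, joint)


print(compare_updates(p_x=1.0, p_m=0.5, sigma2=0.1))
```

With p_x = 1.0, p_m = 0.5, and sigma2 = 0.1, the naive filter reports a variance of 1/11 (about 0.091), whereas the actual variance of its own error is 61/121 (about 0.504), above even the consistent joint value of 0.375.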
Such inconsistency means that the estimated covariance is unduly small and does not correctly represent the uncertainty of the current state estimates; combining these overly optimistic estimates with new measurements later on further degrades the system's accuracy. Instead, based on the idea of the Schmidt-Kalman filter, we derive a new consistent approximate method in the information domain, which has linear memory requirements and adjustable (constant to linear) processing cost. By employing this method with different configurations, we realize an efficient and accurate long-term VI-SLAM system, RISE-SLAM, which improves estimation consistency.

Language: en
Keywords: Estimation and filtering; Estimation consistency; Observability analysis; Simultaneous localization and mapping; Visual-inertial odometry; Visual-inertial SLAM
Thesis or Dissertation
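The Schmidt-Kalman (or "consider") filter idea on which the final part builds can be sketched in its textbook covariance form; the dissertation's method instead works in the information domain, which this sketch does not reproduce. Hypothetical setup: a static 2D robot position x and a previously mapped 2D landmark m, measured as z = m - x + n. The Schmidt update computes the usual gain but zeroes its rows for the map states, so the map estimate and its covariance block are never changed, while the Joseph-form covariance update (valid for any gain, unlike the shorter (I - KH)P form) keeps the cross-covariance consistent with the gain actually applied. All names and numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: static 2D robot position x and a previously mapped 2D landmark m.
N_X, N_M = 2, 2
H = np.hstack([-np.eye(N_X), np.eye(N_M)])   # z = m - x + noise
R = 0.1 * np.eye(2)                          # measurement noise covariance
P0 = np.diag([1.0, 1.0, 0.5, 0.5])           # prior: x and m uncorrelated


def gain_and_covariance(P, schmidt):
    """One linear KF update. With schmidt=True the map rows of the gain are
    zeroed, so map states are 'considered' (kept fixed) rather than estimated."""
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T          # K = P H^T S^{-1}
    if schmidt:
        K[N_X:, :] = 0.0
    I_KH = np.eye(P.shape[0]) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T  # Joseph form: exact for any gain
    return K, P_new


def run_filter(n_updates, schmidt):
    """Repeated updates of the static state; gains do not depend on the data."""
    P, gains = P0.copy(), []
    for _ in range(n_updates):
        K, P = gain_and_covariance(P, schmidt)
        gains.append(K)
    return P, gains


def empirical_error_cov(gains, n_samples=100_000, seed=0):
    """Monte Carlo check: apply the fixed gains to simulated data and measure
    the actual covariance of the estimation error."""
    rng = np.random.default_rng(seed)
    truth = rng.multivariate_normal(np.zeros(N_X + N_M), P0, size=n_samples)
    est = np.zeros_like(truth)               # every run starts at the prior mean
    for K in gains:
        noise = rng.multivariate_normal(np.zeros(2), R, size=n_samples)
        z = truth @ H.T + noise
        est = est + (z - est @ H.T) @ K.T
    err = truth - est
    return err.T @ err / n_samples


P_full, _ = run_filter(3, schmidt=False)
P_schmidt, schmidt_gains = run_filter(3, schmidt=True)
P_emp = empirical_error_cov(schmidt_gains)
print("full    P_xx diag:", np.diag(P_full)[:N_X])
print("Schmidt P_xx diag:", np.diag(P_schmidt)[:N_X])
print("Monte Carlo  diag:", np.diag(P_emp)[:N_X])
```

In this sketch the Schmidt filter's position covariance ends up somewhat larger than the full filter's, because the landmark is never refined, but it matches the Monte Carlo error covariance, i.e., it stays consistent. A practical implementation would exploit the fixed map block to avoid touching it at all, which is where the savings come from; this dense toy version does not attempt that.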