Journal ArticleDOI

Visual-Inertial Sensor Fusion: Localization, Mapping and Sensor-to-Sensor Self-calibration

01 Jan 2011-The International Journal of Robotics Research (SAGE Publications)-Vol. 30, Iss: 1, pp 56-79
TL;DR: This paper describes an algorithm, based on the unscented Kalman filter, for self-calibration of the transform between a camera and an inertial measurement unit (IMU), which demonstrates accurate estimation of both the calibration parameters and the local scene structure.
Abstract: Visual and inertial sensors, in combination, are able to provide accurate motion estimates and are well suited for use in many robot navigation tasks. However, correct data fusion, and hence overall performance, depends on careful calibration of the rigid body transform between the sensors. Obtaining this calibration information is typically difficult and time-consuming, and normally requires additional equipment. In this paper we describe an algorithm, based on the unscented Kalman filter, for self-calibration of the transform between a camera and an inertial measurement unit (IMU). Our formulation rests on a differential geometric analysis of the observability of the camera—IMU system; this analysis shows that the sensor-to-sensor transform, the IMU gyroscope and accelerometer biases, the local gravity vector, and the metric scene structure can be recovered from camera and IMU measurements alone. While calibrating the transform we simultaneously localize the IMU and build a map of the surroundings, all without additional hardware or prior knowledge about the environment in which a robot is operating. We present results from simulation studies and from experiments with a monocular camera and a low-cost IMU, which demonstrate accurate estimation of both the calibration parameters and the local scene structure.
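The abstract lists the quantities the filter estimates jointly: the IMU pose, the gyroscope and accelerometer biases, the local gravity vector, the camera-IMU transform, and the positions of the mapped landmarks. A minimal sketch of how such a self-calibrating filter state might be assembled is given below; the variable names, ordering, and quaternion parametrization are assumptions made for illustration, not the paper's exact formulation.

```python
# Illustrative composition of a self-calibrating visual-inertial filter state,
# following the quantities listed in the abstract. Variable names, ordering and
# the quaternion parametrization are assumptions made for this sketch.
import numpy as np

def build_state(p_wi, v_wi, q_wi, b_g, b_a, g_w, p_ic, q_ic, landmarks):
    """Stack the jointly estimated quantities into one flat state vector.

    p_wi, v_wi : IMU position and velocity in the world frame, each (3,)
    q_wi       : IMU orientation quaternion, (4,)
    b_g, b_a   : gyroscope and accelerometer biases, each (3,)
    g_w        : local gravity vector, (3,)
    p_ic, q_ic : camera-IMU translation (3,) and rotation quaternion (4,)
    landmarks  : (N, 3) array of mapped 3D scene points
    """
    return np.concatenate([p_wi, v_wi, q_wi, b_g, b_a, g_w,
                           p_ic, q_ic, np.asarray(landmarks).ravel()])

# Example: 10 landmarks -> 26 + 30 = 56 state dimensions for the filter to estimate.
x = build_state(np.zeros(3), np.zeros(3), np.array([1.0, 0, 0, 0]),
                np.zeros(3), np.zeros(3), np.array([0, 0, -9.81]),
                np.zeros(3), np.array([1.0, 0, 0, 0]), np.zeros((10, 3)))
assert x.shape == (56,)
```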
Citations
Journal ArticleDOI
TL;DR: This work forms a rigorously probabilistic cost function that combines reprojection errors of landmarks and inertial terms and compares the performance to an implementation of a state-of-the-art stochastic cloning sliding-window filter.
Abstract: Combining visual and inertial measurements has become popular in mobile robotics, since the two sensing modalities offer complementary characteristics that make them the ideal choice for accurate visual-inertial odometry or simultaneous localization and mapping (SLAM). While historically the problem has been addressed with filtering, advancements in visual estimation suggest that nonlinear optimization offers superior accuracy, while still being tractable in complexity thanks to the sparsity of the underlying problem. Taking inspiration from these findings, we formulate a rigorously probabilistic cost function that combines reprojection errors of landmarks and inertial terms. The problem is kept tractable, and real-time operation ensured, by limiting the optimization to a bounded window of keyframes through marginalization. Keyframes may be spaced in time by arbitrary intervals, while still being related by linearized inertial terms. We present evaluation results on complementary datasets recorded with our custom-built stereo visual-inertial hardware that accurately synchronizes accelerometer and gyroscope measurements with imagery. A comparison of both a stereo and monocular version of our algorithm, with and without online extrinsics estimation, is shown with respect to ground truth. Furthermore, we compare the performance to an implementation of a state-of-the-art stochastic cloning sliding-window filter. This competitive reference implementation performs tightly coupled, filtering-based visual-inertial odometry. While our approach declaredly demands more computation, we show its superior performance in terms of accuracy.
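The cost minimized in this work combines visual and inertial residuals over a window of keyframes. As a schematic illustration (the symbols below are illustrative, not the paper's exact notation), such a visual-inertial objective can be written as a sum of weighted landmark reprojection errors e_r and weighted inertial error terms e_s between consecutive keyframes:

```latex
\[
J(\mathbf{x}) =
\underbrace{\sum_{k}\sum_{j \in \mathcal{J}(k)}
  {\mathbf{e}_{r}^{j,k}}^{\top} \mathbf{W}_{r}^{j,k}\, \mathbf{e}_{r}^{j,k}}_{\text{landmark reprojection errors}}
\;+\;
\underbrace{\sum_{k}
  {\mathbf{e}_{s}^{k}}^{\top} \mathbf{W}_{s}^{k}\, \mathbf{e}_{s}^{k}}_{\text{inertial terms between keyframes}}
\]
```

Here the W are the information (inverse-covariance) matrices of the respective residuals and J(k) indexes the landmarks observed in keyframe k; marginalization restricts the sums to a bounded window of keyframes.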

1,472 citations


Cites background from "Visual-Inertial Sensor Fusion: Loca..."

  • ...…cameras (Jia and Evans, 2012; Li et al., 2013), offline (Lobo and Dias, 2007; Mirzaei and Roumeliotis, 2007, 2008) and online (Jones and Soatto, 2011; Kelly and Sukhatme, 2011; Dong-Si and Mourikis, 2012; Weiss et al., 2012) calibration of the relative position and orientation of camera and IMU....

    [...]

  • ...…2013), iterated EKFs (IEKFs) (Strelow and Singh, 2003, 2004) and unscented Kalman filters (UKFs) (Shin and El-Sheimy, 2004; Ebcin and Veth, 2007; Kelly and Sukhatme, 2011) to name a few, which over the years showed an impressive improvement in precision and a reduction in computational complexity....

    [...]

Journal ArticleDOI
TL;DR: A novel, real-time EKF-based VIO algorithm is proposed, which achieves consistent estimation by ensuring the correct observability properties of its linearized system model, and performing online estimation of the camera-to-inertial measurement unit (IMU) calibration parameters.
Abstract: In this paper, we focus on the problem of motion tracking in unknown environments using visual and inertial sensors. We term this estimation task visual-inertial odometry (VIO), in analogy to the well-known visual-odometry problem. We present a detailed study of extended Kalman filter (EKF)-based VIO algorithms, by comparing both their theoretical properties and empirical performance. We show that an EKF formulation where the state vector comprises a sliding window of poses (the multi-state-constraint Kalman filter (MSCKF)) attains better accuracy, consistency, and computational efficiency than the simultaneous localization and mapping (SLAM) formulation of the EKF, in which the state vector contains the current pose and the features seen by the camera. Moreover, we prove that both types of EKF approaches are inconsistent, due to the way in which Jacobians are computed. Specifically, we show that the observability properties of the EKF's linearized system models do not match those of the underlying system, which causes the filters to underestimate the uncertainty in the state estimates. Based on our analysis, we propose a novel, real-time EKF-based VIO algorithm, which achieves consistent estimation by (i) ensuring the correct observability properties of its linearized system model, and (ii) performing online estimation of the camera-to-inertial measurement unit (IMU) calibration parameters. This algorithm, which we term MSCKF 2.0, is shown to achieve accuracy and consistency higher than even an iterative, sliding-window fixed-lag smoother, in both Monte Carlo simulations and real-world testing.
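The MSCKF-style formulation keeps a sliding window of past camera poses in the EKF state ("stochastic cloning"). Below is a minimal, generic sketch of the cloning step that such sliding-window filters use: a copy of the current pose sub-block is appended to the state, and the covariance is augmented with the corresponding Jacobian. The index-based cloning and the assumption that the clone is an exact copy of existing state entries are simplifications for illustration, not the paper's implementation.

```python
# Generic sketch of "stochastic cloning" as used by sliding-window filters such
# as the MSCKF: append a copy of the current pose sub-block to the state and
# augment the covariance with the corresponding Jacobian.
import numpy as np

def clone_pose(x, P, pose_idx):
    """Append a copy of x[pose_idx] to the state and augment the covariance.

    x        : (n,) state vector
    P        : (n, n) state covariance
    pose_idx : list of indices of the pose entries to clone
    """
    n = x.size
    S = np.eye(n)[pose_idx]                 # selects the cloned entries
    J = np.vstack([np.eye(n), S])           # Jacobian of [x; x[pose_idx]] w.r.t. x
    x_aug = np.concatenate([x, x[pose_idx]])
    P_aug = J @ P @ J.T                     # = [[P, P S^T], [S P, S P S^T]]
    return x_aug, P_aug
```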

670 citations


Cites background or methods from "Visual-Inertial Sensor Fusion: Loca..."

  • ...…the EKF Jacobians are computed, even though the IMU’s rotation about gravity (the yaw) is not observable in VIO (see, e.g., (Jones and Soatto, 2011; Kelly and Sukhatme, 2011; Martinelli, 2012)), it appears to be observable in the linearized system model used by the MSCKF, and the same occurs in…...

    [...]

  • ...The observability properties of the nonlinear system in visual–inertial navigation have recently been studied in (Jones and Soatto, 2011; Kelly and Sukhatme, 2011; Martinelli, 2012)....

    [...]

  • ...…present-day algorithms in this class are either extended Kalman filter (EKF)-based methods (Mourikis and Roumeliotis, 2007; Jones and Soatto, 2011; Kelly and Sukhatme, 2011), or methods utilizing iterative minimization over a window of states (Konolige and Agrawal, 2008; Dong-Si and Mourikis,…...

    [...]

  • ...Moreover, based on the analysis of (Jones and Soatto, 2011; Kelly and Sukhatme, 2011), we know that the camera-to-IMU transformation is observable for general trajectories....

    [...]

Proceedings ArticleDOI
01 Sep 2015
TL;DR: A monocular visual-inertial odometry algorithm which achieves accurate tracking performance while exhibiting a very high level of robustness by directly using pixel intensity errors of image patches, leading to a truly power-up-and-go state estimation system.
Abstract: In this paper, we present a monocular visual-inertial odometry algorithm which, by directly using pixel intensity errors of image patches, achieves accurate tracking performance while exhibiting a very high level of robustness. After detection, the tracking of the multilevel patch features is closely coupled to the underlying extended Kalman filter (EKF) by directly using the intensity errors as innovation term during the update step. We follow a purely robocentric approach where the locations of 3D landmarks are always estimated with respect to the current camera pose. Furthermore, we decompose landmark positions into a bearing vector and a distance parametrization, whereby we employ a minimal representation of differences on a corresponding σ-algebra in order to achieve better consistency and to improve the computational performance. Due to the robocentric, inverse-distance landmark parametrization, the framework does not require any initialization procedure, leading to a truly power-up-and-go state estimation system. The presented approach is successfully evaluated in a set of highly dynamic hand-held experiments as well as directly employed in the control loop of a multirotor unmanned aerial vehicle (UAV).
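The landmark parametrization described in the abstract stores, for each feature, a bearing vector in the current camera frame plus an (inverse-)distance scalar rather than a global 3D point. A minimal sketch of converting between this parametrization and a camera-frame point is given below; whether distance or inverse distance is stored, and the minimal representation used for bearing differences, are details of the paper not reproduced here.

```python
# Sketch of a robocentric landmark parametrization: a unit bearing vector in the
# current camera frame plus an inverse-distance scalar.
import numpy as np

def to_point(bearing, inv_depth):
    """Camera-frame 3D landmark position from a bearing vector and inverse distance."""
    mu = np.asarray(bearing, dtype=float)
    mu /= np.linalg.norm(mu)          # enforce unit norm
    return mu / inv_depth

def to_bearing_inv_depth(p_c):
    """Inverse mapping: camera-frame point -> (unit bearing, inverse distance)."""
    d = np.linalg.norm(p_c)
    return np.asarray(p_c, dtype=float) / d, 1.0 / d
```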

665 citations


Cites background or methods from "Visual-Inertial Sensor Fusion: Loca..."

  • ...[5], additional IMU measurements can be relatively simply integrated into the ego-motion estimation, whereby calibration parameters can be co-estimated online [14], [12]....

    [...]

  • ...While targeting a simple and consistent approach and avoiding ad-hoc solutions, we adapt the structure of the standard visual-inertial EKF-SLAM formulation [14], [12]....

    [...]

  • ...Along the lines of other visual-inertial EKF approaches ([14], [12]) we fully integrate visual features into the state of the Kalman filter (see also section II-A)....

    [...]

  • ...The overall structure of the filter is derived from the one employed in [14], [12]: The inertial measurements are used to propagate the state of the filter, while the visual information is taken into account during the filter update steps....

    [...]

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A novel framework for jointly estimating the temporal offset between measurements of different sensors and their spatial displacements with respect to each other is presented; it is enabled by continuous-time batch estimation and extends previous work by seamlessly incorporating time offsets within the rigorous theoretical framework of maximum likelihood estimation.
Abstract: In order to increase accuracy and robustness in state estimation for robotics, a growing number of applications rely on data from multiple complementary sensors. For the best performance in sensor fusion, these different sensors must be spatially and temporally registered with respect to each other. To this end, a number of approaches have been developed to estimate these system parameters in a two-stage process, first estimating the time offset and subsequently solving for the spatial transformation between sensors. In this work, we present a novel framework for jointly estimating the temporal offset between measurements of different sensors and their spatial displacements with respect to each other. The approach is enabled by continuous-time batch estimation and extends previous work by seamlessly incorporating time offsets within the rigorous theoretical framework of maximum likelihood estimation. Experimental results for a camera to inertial measurement unit (IMU) calibration prove the ability of this framework to accurately estimate time offsets up to a fraction of the smallest measurement period.
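Schematically, the joint estimation works by letting the unknown time offset d shift the times at which the continuous-time state is sampled in one sensor's measurement model, so that d is estimated together with the spatial transform in a single maximum-likelihood problem. The notation below is illustrative, not the paper's:

```latex
\[
\hat{\boldsymbol{\theta}},\, \hat{d}
= \arg\max_{\boldsymbol{\theta},\, d}\;
  p\bigl(\{\mathbf{y}_k\}, \{\mathbf{z}_m\} \,\bigm|\, \boldsymbol{\theta}, d\bigr),
\qquad
\mathbf{y}_k = \mathbf{h}\bigl(\mathbf{x}(t_k + d),\, \mathbf{T}_{ci}\bigr) + \mathbf{n}_k ,
\]
```

where x(t) is the continuous-time trajectory, T_ci the camera-IMU transform (both collected in θ), y_k the camera measurements stamped at t_k, and z_m the IMU measurements.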

626 citations

Proceedings ArticleDOI
05 Nov 2018
TL;DR: This tutorial provides principled methods to quantitatively evaluate the quality of an estimated trajectory from visual(-inertial) odometry (VO/VIO), which is the foundation of benchmarking the accuracy of different algorithms.
Abstract: In this tutorial, we provide principled methods to quantitatively evaluate the quality of an estimated trajectory from visual(-inertial) odometry (VO/VIO), which is the foundation of benchmarking the accuracy of different algorithms. First, we show how to determine the transformation type to use in trajectory alignment based on the specific sensing modality (i.e., monocular, stereo and visual-inertial). Second, we describe commonly used error metrics (i.e., the absolute trajectory error and the relative error) and their strengths and weaknesses. To make the methodology presented for VO/VIO applicable to other setups, we also generalize our formulation to any given sensing modality. To facilitate the reproducibility of related research, we publicly release our implementation of the methods described in this tutorial.
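As a concrete illustration of the absolute trajectory error described above, the sketch below applies a previously computed alignment to the estimated positions and reports the RMSE against ground truth. The choice of alignment (yaw-plus-translation, SE(3), or Sim(3), depending on the sensing modality) is assumed to have been made already, and the function name is illustrative.

```python
# Sketch of the absolute trajectory error (ATE) on positions: apply a previously
# computed alignment (rotation R, translation t, optional scale s) to the
# estimate, then take the RMSE of the position differences.
import numpy as np

def ate_rmse(p_est, p_gt, R, t, s=1.0):
    """p_est, p_gt: (N, 3) estimated / ground-truth positions, associated by timestamp."""
    p_aligned = s * (R @ p_est.T).T + t
    err = np.linalg.norm(p_aligned - p_gt, axis=1)
    return np.sqrt(np.mean(err ** 2))
```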

456 citations


Cites background from "Visual-Inertial Sensor Fusion: Loca..."

  • ...This yaw-only rigid body transformation (one DoF rotation plus a translation) corresponds to the four unobservable DoFs for visual-inertial systems [12]....

    [...]

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
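For context, a minimal example of detecting SIFT features and matching them with Lowe's ratio test using OpenCV is sketched below (cv2.SIFT_create is available in recent OpenCV releases; older builds require the contrib package). This illustrates the general technique, not the implementation used in either paper.

```python
# Minimal example of SIFT detection and ratio-test matching with OpenCV.
import cv2

def match_sift(img1, img2, ratio=0.75):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    # Lowe's ratio test: keep a match only if its best distance is clearly
    # smaller than the second-best, which suppresses ambiguous correspondences.
    return kp1, kp2, [m for m, n in knn if m.distance < ratio * n.distance]
```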

46,906 citations


"Visual-Inertial Sensor Fusion: Loca..." refers methods in this paper

  • ...SIFT features are invariant to changes in scale and rotation, and partially invariant to changes in illumination; a fast C implementation of SIFT is available (Vedaldi and Fulkerson 2009)....

    [...]

  • ...We employ SIFT (Lowe 2004) as our feature detector....

    [...]

Book
01 Jan 1983

34,729 citations

Book
01 Nov 1996

8,608 citations


"Visual-Inertial Sensor Fusion: Loca..." refers methods in this paper

  • ...The matrix square root of Pa(tk−1) is found by Cholesky decomposition (Golub and Van Loan 1996)....

    [...]

  • ...The matrix square root of Pa+(tk−1) is found by Cholesky decomposition (Golub and Van Loan 1996)....

    [...]

Book
01 Jan 1985
TL;DR: In this paper, a systematic feedback design theory for solving the problems of asymptotic tracking and disturbance rejection for linear distributed parameter systems is presented, which is intended to support the development of flight controllers for increasing the high angle of attack or high agility capabilities of existing and future generations of aircraft.
Abstract: The principal goal of this three-year research effort was to enhance the research base which would support efforts to systematically control, or take advantage of, dominant nonlinear or distributed parameter effects in the evolution of complex dynamical systems. Such an enhancement is intended to support the development of flight controllers for increasing the high angle of attack or high agility capabilities of existing and future generations of aircraft and missiles. The principal investigating team has succeeded in the development of a systematic methodology for designing feedback control laws solving the problems of asymptotic tracking and disturbance rejection for nonlinear systems with unknown, or uncertain, real parameters. Another successful research project was the development of a systematic feedback design theory for solving the problems of asymptotic tracking and disturbance rejection for linear distributed parameter systems. The technical details which needed to be overcome are discussed more fully in this final report.

8,525 citations

Journal ArticleDOI
08 Nov 2004
TL;DR: The motivation, development, use, and implications of the UT are reviewed, which show it to be more accurate, easier to implement, and to use the same order of calculations as linearization.
Abstract: The extended Kalman filter (EKF) is probably the most widely used estimation algorithm for nonlinear systems. However, more than 35 years of experience in the estimation community has shown that it is difficult to implement, difficult to tune, and only reliable for systems that are almost linear on the time scale of the updates. Many of these difficulties arise from its use of linearization. To overcome this limitation, the unscented transformation (UT) was developed as a method to propagate mean and covariance information through nonlinear transformations. It is more accurate, easier to implement, and uses the same order of calculations as linearization. This paper reviews the motivation, development, use, and implications of the UT.
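A minimal sketch of the unscented transformation follows: 2n+1 sigma points are drawn deterministically from the prior mean and covariance (using a Cholesky factor as the matrix square root), pushed through the nonlinear function, and re-combined into a transformed mean and covariance. The symmetric weighting with a single scaling parameter kappa is one common choice; the cited paper reviews further variants.

```python
# Minimal sketch of the unscented transformation: draw 2n+1 sigma points from
# (mean, cov), push them through a nonlinear function f, and re-combine them
# into a transformed mean and covariance.
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """f maps an (n,) state to an (m,) output; mean is (n,), cov is (n, n)."""
    n = mean.size
    L = np.linalg.cholesky((n + kappa) * cov)          # columns are sigma offsets
    sigma = np.vstack([mean, mean + L.T, mean - L.T])  # (2n + 1, n) sigma points
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    y = np.array([np.atleast_1d(f(s)) for s in sigma]) # propagate each point
    y_mean = w @ y
    diff = y - y_mean
    y_cov = (w[:, None] * diff).T @ diff
    return y_mean, y_cov
```

In the filter discussed above, the state is first augmented with a process-noise component (see the excerpt concerning Eq. (60) below) before the sigma points are drawn.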

6,098 citations


"Visual-Inertial Sensor Fusion: Loca..." refers background or methods in this paper

  • ...For Gaussian state distributions, the posterior estimate produced by the UKF is accurate to the third order, while the EKF estimate is accurate to the first order only (van der Merwe and Wan 2004)....

    [...]

  • ...For example, Davison et al. (2007) describe an extended Kalman filter (EKF)-based system that is able to localize a camera in a room-sized environment....

    [...]

  • ...Their algorithm uses an iterated EKF to fuse IMU data with camera measurements of known corner points on a planar calibration target....

    [...]

  • ...Our filter implementation augments the state vector and state covariance matrix with a process noise component, as described by Julier and Uhlmann (2004): xa(tk) = [x(tk); n(tk)]  (60), where xa(tk) is the augmented state vector, of size N, at time tk, and n(tk) is the 12 × 1 process noise…...

    [...]

  • ...Our choice of the UKF is motivated by its superior performance compared with the EKF for many non-linear problems....

    [...]