Proceedings ArticleDOI

Event-Based Visual Inertial Odometry

TL;DR: This paper presents the first algorithm to fuse a purely event-based tracking algorithm with an inertial measurement unit, to provide accurate metric tracking of a camera's full 6-DOF pose.
Abstract: Event-based cameras provide a new visual sensing model by detecting changes in image intensity asynchronously across all pixels on the camera. By providing these events at extremely high rates (up to 1 MHz), they allow for sensing in both high-speed and high-dynamic-range situations where traditional cameras may fail. In this paper, we present the first algorithm to fuse a purely event-based tracking algorithm with an inertial measurement unit, to provide accurate metric tracking of a camera's full 6-DOF pose. Our algorithm is asynchronous, and provides measurement updates at a rate proportional to the camera velocity. The algorithm selects features in the image plane, and tracks spatiotemporal windows around these features within the event stream. An Extended Kalman Filter with a structureless measurement model then fuses the feature tracks with the output of the IMU. The camera poses from the filter are then used to initialize the next step of the tracker and reject failed tracks. We show that our method successfully tracks camera motion on the Event-Camera Dataset in a number of challenging situations.
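
As a small illustration of the tracking step described above, the sketch below gathers the events that fall inside a spatiotemporal window around a tracked feature. It is a hypothetical helper, not the authors' code, and the (t, x, y, polarity) event layout is an assumption.

```python
import numpy as np

# Hypothetical helper, not the authors' implementation: gather the events
# inside a spatiotemporal window around a tracked feature, i.e. within
# `radius` pixels of the feature position and within [t_start, t_end).
# Assumed event layout: one row per event as (t, x, y, polarity).

def events_in_window(events, feature_xy, radius, t_start, t_end):
    ev = np.asarray(events, dtype=np.float64)
    t, x, y = ev[:, 0], ev[:, 1], ev[:, 2]
    in_time = (t >= t_start) & (t < t_end)
    in_space = (np.abs(x - feature_xy[0]) <= radius) & \
               (np.abs(y - feature_xy[1]) <= radius)
    return ev[in_time & in_space]
```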


Citations
Journal ArticleDOI
TL;DR: This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras.
Abstract: Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of μs), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
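
As a concrete illustration of the event output described above, each event can be treated as a tuple of time, pixel location, and polarity; the minimal sketch below (assumed array layout, not taken from the survey) accumulates a window of events into a signed "event frame".

```python
import numpy as np

# Minimal sketch, not from the survey: an event encodes time, pixel location,
# and the sign of the brightness change. Assumed layout per event:
# (t, x, y, p) with p in {-1, +1}.

def accumulate_event_image(events, height, width):
    """Sum event polarities per pixel over a time window ("event frame")."""
    img = np.zeros((height, width), dtype=np.float32)
    for t, x, y, p in events:
        img[int(y), int(x)] += p
    return img

# Example: three synthetic events on a 4x4 sensor.
events = [(0.001, 1, 2, +1), (0.002, 1, 2, +1), (0.003, 3, 0, -1)]
print(accumulate_event_image(events, 4, 4))
```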

697 citations


Cites background or methods from "Event-Based Visual Inertial Odometr..."

  • ...ks are extracted from the events, and then these point trajectories on the image plane are fused with IMU measurements using state-of-the-art VIO algorithms, such as [207], [208], [209]. For example, [119] tracked features using [116], and combined them with IMU data by means of the Kalman filter in [207]. Recently, [49] proposed to synthesize motion-compensated event images [115] and then detect-and-tr...


  • ... exploit the fine temporal information of individual events for estimation, and therefore tend to depart from traditional computer vision algorithms [23], [26], [32], [49], [115], [116], [117], [118], [119], [120]. The review [7] quantitatively compares accuracy and computational cost for frame-based versus event-driven optical flow. Events are processed differently depending on their representation. Som...


  • ...ilt in [116] from motion-compensated events, producing point-set–based templates to which new events are registered. These features allowed to tackle the moving camera scenario in natural scenes [46], [119]. Also in this scenario, [49] proposed to apply traditional feature detectors [167] and trackers [168] on patches of motion-compensated event images [115]. Hence, motion-compensated events provide a us...


  • ... [204], [182], [205], [250], [251]. The most popular one is described in [205], which has been used to benchmark visual odometry and visual-inertial odometry methods [26], [49], [50], [115], [118], [119], [203]. This dataset is also popular to evaluate corner detectors [122], [123] and feature trackers [46], [159]. Datasets for recognition are currently of limited size compared to traditional compute...


Proceedings ArticleDOI
21 May 2018
TL;DR: This paper evaluates an array of publicly-available VIO pipelines on different hardware configurations, including several single-board computer systems that are typically found on flying robots, and considers the pose estimation accuracy, per-frame processing time, and CPU and memory load while processing the EuRoC datasets.
Abstract: Flying robots require a combination of accuracy and low latency in their state estimation in order to achieve stable and robust flight. However, due to the power and payload constraints of aerial platforms, state estimation algorithms must provide these qualities under the computational constraints of embedded hardware. Cameras and inertial measurement units (IMUs) satisfy these power and payload constraints, so visual-inertial odometry (VIO) algorithms are popular choices for state estimation in these scenarios, in addition to their ability to operate without external localization from motion capture or global positioning systems. It is not clear from existing results in the literature, however, which VIO algorithms perform well under the accuracy, latency, and computational constraints of a flying robot with onboard state estimation. This paper evaluates an array of publicly-available VIO pipelines (MSCKF, OKVIS, ROVIO, VINS-Mono, SVO+MSF, and SVO+GTSAM) on different hardware configurations, including several single-board computer systems that are typically found on flying robots. The evaluation considers the pose estimation accuracy, per-frame processing time, and CPU and memory load while processing the EuRoC datasets, which contain six degree of freedom (6DoF) trajectories typical of flying robots. We present our complete results as a benchmark for the research community.
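
For reference, a minimal sketch of one accuracy metric of the kind used in such evaluations: root-mean-square absolute trajectory error over translation, assuming the estimated and ground-truth positions are already time-associated and expressed in a common frame (the trajectory alignment step is omitted here).

```python
import numpy as np

# Illustrative sketch of an absolute trajectory error metric: RMSE of the
# translation error between matched estimated and ground-truth positions.
# Assumes the trajectories are already aligned and time-associated.

def ate_rmse(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: (N, 3) arrays of matched positions, in meters."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

Per-frame processing time and CPU/memory load would be logged separately while the pipeline runs.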

358 citations


Cites methods from "Event-Based Visual Inertial Odometr..."

  • ...The extended Kalman filter backend in [29] implements this formulation of the MSCKF for event-based camera inputs, but has been adapted to feature tracks from standard cameras....


Journal ArticleDOI
09 Feb 2018
TL;DR: This letter presents a large dataset with a synchronized stereo pair of event-based cameras, carried on a handheld rig, flown by a hexacopter, driven on top of a car, and mounted on a motorcycle, in a variety of different illumination levels and environments.
Abstract: Event-based cameras are a new passive sensing modality with a number of benefits over traditional cameras, including extremely low latency, asynchronous data acquisition, high dynamic range, and very low power consumption. There has been a lot of recent interest and development in applying algorithms to use the events to perform a variety of three-dimensional perception tasks, such as feature tracking, visual odometry, and stereo depth estimation. However, the field currently lacks the wealth of labeled data that exists for traditional cameras for use in both testing and development. In this letter, we present a large dataset with a synchronized stereo pair of event-based cameras, carried on a handheld rig, flown by a hexacopter, driven on top of a car, and mounted on a motorcycle, in a variety of different illumination levels and environments. From each camera, we provide the event stream, grayscale images, and inertial measurement unit (IMU) readings. In addition, we utilize a combination of IMU, a rigidly mounted lidar system, indoor and outdoor motion capture, and GPS to provide accurate pose and depth images for each camera at up to 100 Hz. For comparison, we also provide synchronized grayscale images and IMU readings from a frame-based stereo camera system.
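
A small sketch of one common way to use such ground truth: interpolating the ~100 Hz poses to arbitrary event or image timestamps, with linear interpolation for position and spherical linear interpolation for orientation. Array layouts and function names are assumptions for illustration, not part of the dataset's tooling.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Sketch (not the dataset's tooling): interpolate ~100 Hz ground-truth poses
# to arbitrary query timestamps that lie within the pose time range.
# pose_t: (N,) timestamps; pose_xyz: (N, 3); pose_quat: (N, 4) as (x, y, z, w).

def interpolate_poses(pose_t, pose_xyz, pose_quat, query_t):
    slerp = Slerp(pose_t, Rotation.from_quat(pose_quat))
    rot = slerp(query_t)                                   # interpolated orientations
    xyz = np.stack([np.interp(query_t, pose_t, pose_xyz[:, k])
                    for k in range(3)], axis=1)            # per-axis linear interpolation
    return xyz, rot
```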

280 citations


Cites methods from "Event-Based Visual Inertial Odometr..."

  • ...The authors in [17] and [18] proposed novel methods to perform feature tracking in the event space, which they extended in [19] and [20] to perform visual and visual inertial odometry, respectively....


Journal ArticleDOI
TL;DR: Event cameras as discussed by the authors are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes.
Abstract: Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.

277 citations

Proceedings ArticleDOI
02 Dec 2019
TL;DR: In this article, a grid-based representation for event cameras is proposed, which can learn the input event representation together with the task-dedicated network in an end-to-end manner.
Abstract: Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatio-temporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it by a standard vision pipeline, e.g., a Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations by means of strictly differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that our approach to learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods.
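
For context, the sketch below builds a fixed (non-learned) member of the family of grid representations this framework unifies: a voxel grid with bilinear interpolation of event polarity along the time axis. The learned variant replaces this fixed triangular kernel with a trainable one; the helper itself is illustrative, not the authors' code.

```python
import numpy as np

# Fixed-kernel event voxel grid: B temporal bins, each event's polarity is
# split between its two nearest bins with bilinear (triangular) weights.
# events: iterable of (t, x, y, p); returns an array of shape (num_bins, H, W).

def event_voxel_grid(events, height, width, num_bins):
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    ts = np.array([e[0] for e in events], dtype=np.float64)
    t0, t1 = ts.min(), ts.max()
    scale = (num_bins - 1) / max(t1 - t0, 1e-9)
    for (t, x, y, p) in events:
        tb = (t - t0) * scale                   # continuous bin coordinate
        b0 = int(np.floor(tb))
        w1 = tb - b0                            # weight for the upper bin
        grid[b0, int(y), int(x)] += p * (1.0 - w1)
        if b0 + 1 < num_bins:
            grid[b0 + 1, int(y), int(x)] += p * w1
    return grid
```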

212 citations

References
Proceedings ArticleDOI
16 Jun 2012
TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.
Abstract: Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti

11,283 citations


"Event-Based Visual Inertial Odometr..." refers background or methods in this paper

  • ...In each EM step, the template point sets {l̃j} are subsampled using sphere decimation [8], with radius 1 pixel....


  • ...In Table 1, we present the mean position error as a percentage of total distance traveled and rotation error over distance traveled for each sequence, which are common metrics for VIO applications [8]....


Proceedings ArticleDOI
21 Jun 1994
TL;DR: A feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world are proposed.
Abstract: No feature-based vision system can work unless good features can be identified and tracked from frame to frame. Although tracking itself is by and large a solved problem, selecting features that can be tracked well and correspond to physical points in the world is still hard. We propose a feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world. These methods are based on a new tracking algorithm that extends previous Newton-Raphson style search methods to work under affine image transformations. We test performance with several simulations and experiments.
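
The selection criterion this work is known for (the "Shi-Tomasi score" referenced elsewhere on this page) thresholds the smaller eigenvalue of the 2x2 image-gradient matrix of a candidate window. The helper below is an illustrative sketch of that score, not the authors' implementation.

```python
import numpy as np

# Shi-Tomasi-style feature score: the smaller eigenvalue of the 2x2 gradient
# matrix G = sum([[Ix*Ix, Ix*Iy], [Ix*Iy, Iy*Iy]]) over a candidate window.
# A window is a good feature when this score exceeds a threshold.

def shi_tomasi_score(patch):
    """patch: 2-D float array (a small window around a candidate pixel)."""
    iy, ix = np.gradient(patch.astype(np.float64))
    g = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    return float(np.linalg.eigvalsh(g)[0])   # smaller eigenvalue
```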

8,432 citations


"Event-Based Visual Inertial Odometr..." refers methods in this paper

  • ...The actual corner detection is performed with FAST corners [17], with the image split into cells of fixed size, and the corner with the highest Shi-Tomasi score [18] within each cell being selected, as in [5]....


Journal ArticleDOI
TL;DR: A new heuristic for feature detection is presented and, using machine learning, a feature detector is derived from this which can fully process live PAL video using less than 5 percent of the available processing time.
Abstract: The repeatability and efficiency of a corner detector determines how likely it is to be useful in a real-world application. The repeatability is important because the same scene viewed from different positions should yield features which correspond to the same real-world 3D locations. The efficiency is important because this determines whether the detector combined with further processing can operate at frame rate. Three advances are described in this paper. First, we present a new heuristic for feature detection and, using machine learning, we derive a feature detector from this which can fully process live PAL video using less than 5 percent of the available processing time. By comparison, most other detectors cannot even operate at frame rate (Harris detector 115 percent, SIFT 195 percent). Second, we generalize the detector, allowing it to be optimized for repeatability, with little loss of efficiency. Third, we carry out a rigorous comparison of corner detectors based on the above repeatability criterion applied to 3D scenes. We show that, despite being principally constructed for speed, on these stringent tests, our heuristic detector significantly outperforms existing feature detectors. Finally, the comparison demonstrates that using machine learning produces significant improvements in repeatability, yielding a detector that is both very fast and of very high quality.
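
For a quick usage example, OpenCV ships an implementation of this detector; the snippet below (an illustrative sketch with an arbitrary threshold and a random placeholder image, not the paper's original code) detects FAST corners on a grayscale image.

```python
import cv2
import numpy as np

# Usage sketch with OpenCV's FAST implementation; threshold and the random
# placeholder image are arbitrary example values.
img = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(img, None)
print(len(keypoints), "corners detected")
```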

1,847 citations


"Event-Based Visual Inertial Odometr..." refers methods in this paper

  • ...The actual corner detection is performed with FAST corners [17], with the image split into cells of fixed size, and the corner with the highest Shi-Tomasi score [18] within each cell being selected, as in [5]....


Proceedings ArticleDOI
29 Sep 2014
TL;DR: A semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods and applied to micro-aerial-vehicle state-estimation in GPS-denied environments is proposed.
Abstract: We propose a semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods. The semi-direct approach eliminates the need of costly feature extraction and robust matching techniques for motion estimation. Our algorithm operates directly on pixel intensities, which results in subpixel precision at high frame-rates. A probabilistic mapping method that explicitly models outlier measurements is used to estimate 3D points, which results in fewer outliers and more reliable points. Precise and high frame-rate motion estimation brings increased robustness in scenes of little, repetitive, and high-frequency texture. The algorithm is applied to micro-aerial-vehicle state-estimation in GPS-denied environments and runs at 55 frames per second on the onboard embedded computer and at more than 300 frames per second on a consumer laptop. We call our approach SVO (Semi-direct Visual Odometry) and release our implementation as open-source software.
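
As a rough illustration of the "direct" part of such methods, the sketch below evaluates a photometric residual between a reference patch and the corresponding patch in the current image. The patch size, names, and the omitted projection/warping step are assumptions for illustration, not SVO's implementation.

```python
import numpy as np

# Conceptual sketch of a direct photometric residual: sum of squared intensity
# differences between a reference patch and its counterpart in the current
# image. The reprojection that produces cur_uv is assumed to happen elsewhere.

def photometric_residual(ref_img, cur_img, ref_uv, cur_uv, half=2):
    """ref_uv, cur_uv: (u, v) pixel coordinates; patches are (2*half+1)^2."""
    r0, c0 = int(ref_uv[1]), int(ref_uv[0])
    r1, c1 = int(cur_uv[1]), int(cur_uv[0])
    ref_patch = ref_img[r0-half:r0+half+1, c0-half:c0+half+1].astype(np.float64)
    cur_patch = cur_img[r1-half:r1+half+1, c1-half:c1+half+1].astype(np.float64)
    return float(np.sum((cur_patch - ref_patch) ** 2))
```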

1,814 citations


"Event-Based Visual Inertial Odometr..." refers methods in this paper

  • ...The actual corner detection is performed with FAST corners [17], with the image split into cells of fixed size, and the corner with the highest Shi-Tomasi score [18] within each cell being selected, as in [5]....


Proceedings ArticleDOI
10 Apr 2007
TL;DR: The primary contribution of this work is the derivation of a measurement model that is able to express the geometric constraints that arise when a static feature is observed from multiple camera poses, and is optimal, up to linearization errors.
Abstract: In this paper, we present an extended Kalman filter (EKF)-based algorithm for real-time vision-aided inertial navigation. The primary contribution of this work is the derivation of a measurement model that is able to express the geometric constraints that arise when a static feature is observed from multiple camera poses. This measurement model does not require including the 3D feature position in the state vector of the EKF and is optimal, up to linearization errors. The vision-aided inertial navigation algorithm we propose has computational complexity only linear in the number of features, and is capable of high-precision pose estimation in large-scale real-world environments. The performance of the algorithm is demonstrated in extensive experimental results, involving a camera/IMU system localizing within an urban area.
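
A minimal sketch of the structureless-measurement idea described above: after linearization, the residual is approximately H_x x_err + H_f f_err + noise, and left-multiplying by a basis of the left null space of H_f removes the feature-position error from the update. Notation and the helper name are assumed for illustration, not copied from the paper.

```python
import numpy as np
from scipy.linalg import null_space

# Structureless projection: eliminate the feature-position error by projecting
# the residual and state Jacobian onto the left null space of H_f.

def structureless_projection(r, H_x, H_f):
    A = null_space(H_f.T)        # columns satisfy A.T @ H_f == 0
    return A.T @ r, A.T @ H_x    # projected residual and state Jacobian
```

The projected residual and Jacobian then feed a standard EKF update over the camera/IMU states only.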

1,435 citations


"Event-Based Visual Inertial Odometr..." refers background or methods in this paper

  • ...We then left multiply r by the left null space, A, of the feature Jacobian, H_F, as in [14], to eliminate the feature position up to a first order approximation:...


  • ...Similar to the MSCKF [14], we eliminate the depth from the measurement equations so that we do not have to keep triangulated features in the state vector....


  • ...For compactness, we do not expand on the fine details of the filter, and instead refer interested readers to [13] and [14]....


  • ...As in [14], we perform one final step to reduce the dimensionality of the above residual....


  • ...6 then employs an Extended Kalman Filter with a structureless vision model, as first introduced in [14]....
