Motion Estimation Made Easy: Evolution and Trends in Visual Odometry
01 Jan 2019, pp. 305-331
TL;DR: This chapter introduces vision-based motion estimation for beginners, covering different aspects of the task, and provides a list of datasets for visual odometry and allied research areas for ready reference.
Abstract: With rapid advancements in the area of mobile robotics and industrial automation, a growing need has arisen for accurate navigation and localization of moving objects. Camera-based motion estimation is one such technique, which is gaining huge popularity owing to its simplicity and use of limited resources in generating a motion path. In this chapter, an attempt is made to introduce this topic for beginners, covering different aspects of the vision-based motion estimation task. The theoretical section provides a brief overview of computer vision fundamentals specific to the pose estimation task, followed by a systematic discussion of visual odometry (VO) schemes under different categories. The evolution of VO schemes over the last few decades is discussed under two broad categories, that is, geometric and non-geometric approaches. The geometric approaches are further detailed under three different classes, that is, feature-based, appearance-based, and hybrid feature-and-appearance-based schemes. The non-geometric approach is one of the recent paradigm shifts from conventional pose estimation techniques and is discussed in a separate section. Towards the end, a list of different datasets for visual odometry and allied research areas is provided for ready reference.
Citations
TL;DR: In this article, the authors surveyed state-of-the-art visual odometry and visual inertial odometry (VIO) approaches and compared the latest research works in this field.
Abstract: Vision-based localization systems, namely visual odometry (VO) and visual inertial odometry (VIO), have attracted great attention recently. They are regarded as critical modules for building fully autonomous systems. The simplicity of visual and inertial state estimators, along with their applicability in resource-constrained platforms, has motivated the robotics community to research and develop novel approaches that maximize their robustness and reliability. In this paper, we survey state-of-the-art VO and VIO approaches. In addition, studies related to localization in visually degraded environments are also reviewed. The reviewed VO techniques and related studies have been analyzed in terms of key design aspects, including appearance-based, feature-based, and learning-based approaches. On the other hand, research studies related to VIO have been categorized based on the degree and type of fusion process into loosely-coupled, semi-tightly coupled, or tightly-coupled approaches and filtering- or optimization-based paradigms. This paper provides an overview of the main components of visual localization and the key design aspects highlighting the pros and cons of each approach, and compares the latest research works in this field. Finally, a detailed discussion of the challenges associated with the reviewed approaches and future research considerations is formulated.
24 citations
TL;DR: An attempt is made to remove the redundant features from the VO pipeline that do not have a significant effect on the estimation process with a probabilistic approach based on fast mutual information (MI) computation suggested here as the basis for removing features.
Abstract: Visual odometry (VO) is one of the promising techniques that estimates pose using the camera and does not necessarily require other sensor aiding. With increasing automation and the use of miniaturized systems such as mobile devices, wearable gadgets, and gaming consoles, demand for efficient algorithms has risen. In this paper, an attempt is made to remove the redundant features from the VO pipeline that do not have a significant effect on the estimation process. A probabilistic approach based on fast mutual information (MI) computation is suggested here as the basis for removing features. The MI value acts as a beacon for selecting distinct features while eliminating the redundant ones, thus improving the overall system speed and reducing storage requirements. The proposed MI-based feature selection framework for VO has been evaluated on the publicly available KITTI vision benchmark suite and EuRoC MAV datasets. The estimated trajectory results have shown that the proposed technique is better in terms of computational efficiency and has similar accuracy compared to the normal VO pipeline. Further investigations have also been carried out over the VSLAM framework to test its applicability in a real-time system.
7 citations
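The mutual-information criterion above can be illustrated with a minimal plug-in estimator over paired discrete samples (a generic sketch, not the paper's fast MI computation; the function name `mutual_information` is illustrative):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = sum_{x,y} p(x,y) log(p(x,y) / (p(x) p(y))),
    in nats, from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts
    px, py = Counter(xs), Counter(ys)   # marginal counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) = c/n, p(x) = px[x]/n, p(y) = py[y]/n
        # => p(x,y) / (p(x) p(y)) = c*n / (px[x] * py[y])
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi
```

In a selection scheme of this kind, a feature whose measurements carry little extra information relative to the already-selected set (low MI gain) is a candidate for removal.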
TL;DR: This paper conducts a comprehensive review of sensor modalities, including Inertial Measurement Units (IMUs), Light Detection and Ranging (LiDAR), radio detection and ranging (radar), and cameras, as well as applications of polymers in these sensors, for indoor odometry.
Abstract: Although Global Navigation Satellite Systems (GNSSs) generally provide adequate accuracy for outdoor localization, this is not the case for indoor environments, due to signal obstruction. Therefore, a self-contained localization scheme is beneficial under such circumstances. Modern sensors and algorithms endow moving robots with the capability to perceive their environment, and enable the deployment of novel localization schemes, such as odometry or Simultaneous Localization and Mapping (SLAM). The former focuses on incremental localization, while the latter concurrently stores an interpretable map of the environment. In this context, this paper conducts a comprehensive review of sensor modalities, including Inertial Measurement Units (IMUs), Light Detection and Ranging (LiDAR), radio detection and ranging (radar), and cameras, as well as applications of polymers in these sensors, for indoor odometry. Furthermore, analysis and discussion of the algorithms and the fusion frameworks for pose estimation and odometry with these sensors are performed. This paper thus traces the pathway of indoor odometry from principle to application. Finally, some future prospects are discussed.
4 citations
01 Jan 2010
TL;DR: A dense structure model is developed for stereo image based Simultaneous Localization And Mapping (SLAM) to model dense environment structure incrementally by robustly integrating disparity maps from current and previous time instants.
Abstract: In this paper a dense structure model is developed for stereo image based Simultaneous Localization And Mapping (SLAM). It is proposed to model dense environment structure incrementally by robustly integrating disparity maps from current and previous time instants. In this way disparities can be refined over time to favor consistent 3D structure over noise. The analytical search bounds for disparities are transferred into the current map to allow efficient re-localization. The cost function is image-based and it is minimized by combining Iteratively Reweighted Least-Squares (IRLS) with exhaustive search for finding motion and disparity parameters respectively.
3 citations
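The Iteratively Reweighted Least-Squares (IRLS) idea used above for robust minimization can be shown on a toy robust-location problem (a hedged sketch with Huber-style weights; `irls_mean` and `delta` are illustrative, not the paper's image-based cost function):

```python
def irls_mean(values, iters=20, delta=1.0):
    """Robust location estimate via IRLS: repeatedly solve a weighted
    least-squares problem, downweighting large residuals (Huber weights)."""
    mu = sum(values) / len(values)  # start from the ordinary mean
    for _ in range(iters):
        weights = []
        for v in values:
            r = abs(v - mu)
            # Huber weight: 1 inside the delta band, delta/r outside.
            weights.append(1.0 if r <= delta else delta / r)
        mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mu
```

The same loop structure carries over to motion and disparity estimation: each IRLS pass solves a weighted normal-equation system, with weights recomputed from the current residuals so that outlier pixels contribute little.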
TL;DR: An in-depth review of state-of-the-art visual and point cloud odometry methods, along with a direct performance comparison of some of these techniques in the autonomous driving context, and addresses how current AI advances constitute a way to overcome the current development plateau.
Abstract: The expansion of autonomous driving operations requires the research and development of accurate and reliable self-localization approaches. These include visual odometry methods, in which accuracy is potentially superior to GNSS-based techniques while also working in signal-denied areas. This paper presents an in-depth review of state-of-the-art visual and point cloud odometry methods, along with a direct performance comparison of some of these techniques in the autonomous driving context. The evaluated methods include camera, LiDAR, and multi-modal approaches, featuring knowledge and learning-based algorithms, which are compared from a common perspective. This set is subject to a series of tests on road driving public datasets, from which the performance of these techniques is benchmarked and quantitatively measured. Furthermore, we closely discuss their effectiveness against challenging conditions such as pronounced lighting variations, open spaces, and the presence of dynamic objects in the scene. The research demonstrates increased accuracy in point cloud-based methods by surpassing visual techniques by roughly 33.14% in trajectory error. This survey also identifies a performance stagnation in state-of-the-art methodologies, especially in complex conditions. We also examine how multi-modal architectures can circumvent individual sensor limitations. This aligns with the benchmarking results, where the multi-modal algorithms exhibit greater consistency across all scenarios, outperforming the best LiDAR method (CT-ICP) by 5.68% in translational drift. Additionally, we address how current AI advances constitute a way to overcome the current development plateau.
3 citations
References
TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form; these provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing conditions.
Abstract: A new paradigm, Random Sample Consensus (RANSAC), for fitting a model to experimental data is introduced. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors, and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors. A major portion of this paper describes the application of RANSAC to the Location Determination Problem (LDP): given an image depicting a set of landmarks with known locations, determine that point in space from which the image was obtained. In response to a RANSAC requirement, new results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form. These results provide the basis for an automatic system that can solve the LDP under difficult viewing conditions.
23,396 citations
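The hypothesize-and-verify loop at the heart of RANSAC can be sketched in a few lines of Python (a minimal line-fitting illustration, not the paper's LDP formulation; `ransac_line` and the threshold value are our own):

```python
import random

def ransac_line(points, iters=200, threshold=0.5, seed=0):
    """Fit y = m*x + b to 2D points by random sample consensus."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        # 1. Draw a minimal sample: two points determine a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # degenerate (vertical) sample, try again
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # 2. Count inliers: points within the residual threshold.
        inliers = [(x, y) for (x, y) in points
                   if abs(y - (m * x + b)) < threshold]
        # 3. Keep the model with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers
```

In visual odometry the same loop runs over minimal sets of point correspondences (e.g. five or eight points for the essential matrix) instead of point pairs, with the reprojection error as the inlier test.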
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
16,989 citations
07 May 2006
TL;DR: A novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Abstract: In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF's strong performance.
13,011 citations
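The integral-image trick that makes SURF's box filtering fast can be sketched as follows (a minimal pure-Python illustration; function names are ours):

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum over the inclusive rectangle [y0..y1] x [x0..x1] in O(1):
    four table lookups, independent of the box size."""
    s = ii[y1][x1]
    if y0 > 0:
        s -= ii[y0 - 1][x1]
    if x0 > 0:
        s -= ii[y1][x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1][x0 - 1]
    return s
```

Because any box filter response costs four lookups regardless of scale, the Hessian-based detector can evaluate its approximated second-order filters at many scales without rescaling the image.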
24 Aug 1981
TL;DR: In this paper, the spatial intensity gradient of the images is used to find a good match using a type of Newton-Raphson iteration, which can be generalized to handle rotation, scaling and shearing.
Abstract: Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is faster because it examines far fewer potential matches between the images than existing techniques. Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted for use in a stereo vision system.
12,944 citations
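The Newton-Raphson registration idea can be illustrated in one dimension: estimate a sub-sample translation between a signal and a template from the intensity gradient (a simplified sketch of the Lucas-Kanade update, not the paper's full 2D method; names are illustrative):

```python
def estimate_shift(signal, template, iters=20):
    """Estimate d such that signal(x + d) ~= template(x), by iterating
    the gradient-based Newton-Raphson update of Lucas-Kanade style."""
    def sample(f, x):
        # Linear interpolation with edge clamping.
        i = max(0, min(len(f) - 2, int(x)))
        t = x - i
        return f[i] * (1 - t) + f[i + 1] * t

    d = 0.0
    for _ in range(iters):
        num = den = 0.0
        for x in range(1, len(template) - 1):
            # Spatial gradient of the signal at the current estimate.
            g = (sample(signal, x + d + 1) - sample(signal, x + d - 1)) / 2.0
            # Residual between template and warped signal.
            err = template[x] - sample(signal, x + d)
            num += g * err
            den += g * g
        if den == 0:
            break
        d += num / den  # Newton-Raphson step
    return d
```

The 2D version solves the same normal equations with a 2x2 gradient matrix per step; the generalization to rotation, scaling and shearing adds those parameters to the warp being linearized.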
06 Nov 2011
TL;DR: This paper proposes a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise, and demonstrates through experiments how ORB is at two orders of magnitude faster than SIFT, while performing as well in many situations.
Abstract: Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments how ORB is at two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smart phone.
8,702 citations
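Binary descriptors such as BRIEF/ORB are compared by Hamming distance, which is what makes matching so fast; a brute-force matcher can be sketched as follows (descriptors modeled as plain integers, and the names and distance cutoff are illustrative):

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings held as ints."""
    return bin(a ^ b).count("1")

def match_descriptors(desc_a, desc_b, max_dist=64):
    """Brute-force nearest-neighbour matching of binary descriptors
    (e.g. 256-bit ORB vectors): for each descriptor in desc_a, find the
    closest in desc_b by Hamming distance, keeping matches under max_dist."""
    matches = []
    for i, da in enumerate(desc_a):
        j, d = min(((j, hamming(da, db)) for j, db in enumerate(desc_b)),
                   key=lambda p: p[1])
        if d <= max_dist:
            matches.append((i, j, d))
    return matches
```

On real hardware the XOR-and-popcount inner loop maps to single instructions, which is the source of the two-orders-of-magnitude speedup over floating-point SIFT distances.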