Proceedings ArticleDOI: 10.1109/SSPS.2017.8071595

Improved pose estimation by inlier refinement for visual odometry

01 May 2017, pp. 224-228
Abstract: Visual odometry is a well-known technique for computing the rotation and translation of a moving vehicle from a camera mounted on it. Vision-based navigation of this kind serves applications such as autonomous navigation, motion tracking and obstacle detection. This paper presents an approach for estimating vehicle motion by detecting and matching scale-invariant SURF features across consecutive image frames. The set of matched features is passed sequentially through an outlier removal and an inlier selection stage in order to discard inconsistent correspondences. Additionally, the proposed scheme incorporates a bucketing technique to ensure that features are spatially distributed over the whole image. The proposed inlier selection-cum-outlier rejection scheme has been applied to the publicly available KITTI dataset and is found to perform satisfactorily compared to either outlier rejection or inlier selection alone.
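For concreteness, the pipeline described above can be sketched in a few lines of Python with OpenCV. This is a minimal illustration, not the paper's implementation: SURF requires an OpenCV build with the contrib modules (cv2.xfeatures2d), and the grid size, per-cell cap, ratio-test threshold and RANSAC parameters below are illustrative defaults.

```python
import cv2
import numpy as np

def bucket_keypoints(keypoints, img_shape, grid=(8, 8), per_cell=10):
    """Bucketing: keep at most `per_cell` strongest keypoints in each
    grid cell so features stay spread over the whole image instead of
    clustering on texture-rich regions."""
    h, w = img_shape[:2]
    cells = {}
    for kp in keypoints:
        cx = min(int(kp.pt[0] * grid[1] / w), grid[1] - 1)
        cy = min(int(kp.pt[1] * grid[0] / h), grid[0] - 1)
        cells.setdefault((cy, cx), []).append(kp)
    kept = []
    for cell in cells.values():
        cell.sort(key=lambda k: k.response, reverse=True)
        kept.extend(cell[:per_cell])
    return kept

def relative_pose(img1, img2, K):
    """Frame-to-frame rotation R and (unit-scale) translation t from
    SURF matches, filtered by a ratio test and RANSAC."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp1 = bucket_keypoints(surf.detect(img1, None), img1.shape)
    kp2 = bucket_keypoints(surf.detect(img2, None), img2.shape)
    kp1, des1 = surf.compute(img1, kp1)
    kp2, des2 = surf.compute(img2, kp2)

    # First filter: Lowe-style ratio test on nearest-neighbour matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
            good.append(pair[0])

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Second filter: RANSAC while estimating the essential matrix;
    # recoverPose then keeps the inliers that pass the cheirality check.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, cv2.RANSAC, 0.999, 1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```

Bucketing caps the number of features per grid cell, so that texture-rich regions do not dominate the pose estimate.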


Topics: Visual odometry (61%), Feature (computer vision) (55%), Pose (54%)
Citations

Book ChapterDOI: 10.1007/978-3-030-03000-1_13
Shashi Poddar, Rahul Kottath, Vinod Karar
01 Jan 2019
Abstract: With rapid advances in mobile robotics and industrial automation, there is a growing need for accurate navigation and localization of moving objects. Camera-based motion estimation is one such technique, gaining popularity owing to its simplicity and the limited resources it needs to generate a motion path. This chapter introduces the topic for beginners, covering different aspects of the vision-based motion estimation task. The theoretical section briefly reviews the computer vision fundamentals specific to pose estimation, followed by a systematic discussion of visual odometry (VO) schemes under different categories. The evolution of VO schemes over the last few decades is discussed under two broad categories, geometric and non-geometric approaches. The geometric approaches are further detailed under three classes: feature-based, appearance-based, and hybrid feature-and-appearance-based schemes. The non-geometric approach, a recent paradigm shift from conventional pose estimation techniques, is discussed in a separate section. Towards the end, a list of datasets for visual odometry and allied research areas is provided for ready reference.
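As a complement to the chapter's taxonomy, the core of any feature-based geometric VO scheme is the composition of frame-to-frame estimates into a trajectory. Below is a minimal monocular sketch, assuming a helper like the hypothetical relative_pose function shown earlier; note that monocular VO recovers translation only up to scale, and the composition order depends on the pose convention of the relative-pose routine.

```python
import numpy as np

def accumulate_trajectory(frames, K, scale=1.0):
    """Chain frame-to-frame (R, t) estimates into camera positions.
    `relative_pose` is the hypothetical helper sketched earlier;
    `scale` must come from an external source (speed sensor, ground
    truth), since monocular VO cannot observe absolute scale."""
    R_tot = np.eye(3)                 # orientation of the first camera
    t_tot = np.zeros((3, 1))          # position of the first camera
    positions = [t_tot.copy()]
    for prev, curr in zip(frames, frames[1:]):
        R, t = relative_pose(prev, curr, K)
        # Composition order assumes (R, t) map points from the previous
        # camera frame into the current one; flip it if your convention
        # differs.
        t_tot = t_tot + scale * (R_tot @ t)
        R_tot = R @ R_tot
        positions.append(t_tot.copy())
    return positions
```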


Topics: Visual odometry (65%), Motion estimation (55%), Pose (55%)

7 Citations


Book ChapterDOI: 10.1007/978-981-32-9291-8_1
Abstract: Visual odometry is a popular technique for estimating motion in GPS-challenged environments, and its accuracy depends on the features extracted from the images. In past attempts to improve feature distinctiveness, these features have become longer and more complex, requiring more storage space and computational power for matching. This paper attempts to reduce the length of these feature descriptors while maintaining similar accuracy in pose estimation. It proposes and evaluates the elimination of feature indices based on a variance analysis of the descriptor columns. The features with reduced descriptor length are applied in a 3D-to-2D visual odometry pipeline and evaluated on the KITTI dataset. The proposed variance-based descriptor length reduction is found to reduce the overall time taken by the motion estimation framework while estimating the transformation with accuracy similar to that of the full-length feature vector.
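The idea of variance-based descriptor shortening can be sketched in a few lines of NumPy: rank descriptor columns by their variance over a sample of features and keep only the most variable ones. The selection rule and keep=32 below are illustrative assumptions, not necessarily the paper's exact criterion.

```python
import numpy as np

def select_descriptor_columns(descriptors, keep=32):
    """Rank descriptor dimensions by their variance over a sample of
    features and keep the `keep` most variable columns.  Low-variance
    columns change little between features and contribute little to
    discrimination, so dropping them shortens the descriptor and
    speeds up matching at a small cost in accuracy."""
    variances = descriptors.var(axis=0)           # per-column variance
    cols = np.argsort(variances)[::-1][:keep]     # most variable first
    return np.sort(cols)

# Hypothetical usage: compute the column set once on training
# descriptors, then index every new descriptor matrix before matching.
# cols = select_descriptor_columns(train_des, keep=32)
# reduced = des[:, cols]
```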


Topics: Visual odometry (63%), Feature (computer vision) (59%), Feature vector (57%)
References

Journal ArticleDOI: 10.1145/358669.358692
Martin A. Fischler, Robert C. Bolles
Abstract: A new paradigm, Random Sample Consensus (RANSAC), for fitting a model to experimental data is introduced. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors, and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors. A major portion of this paper describes the application of RANSAC to the Location Determination Problem (LDP): given an image depicting a set of landmarks with known locations, determine the point in space from which the image was obtained. In response to a RANSAC requirement, new results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form. These results provide the basis for an automatic system that can solve the LDP under difficult viewing and analysis conditions.
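The hypothesize-and-verify loop that RANSAC introduced is compact enough to sketch. Below is an illustrative NumPy implementation for 2D line fitting; the iteration count and inlier tolerance are arbitrary example values.

```python
import numpy as np

def ransac_line(points, n_iters=500, inlier_tol=1.0, rng=None):
    """Fit a 2D line by RANSAC: repeatedly fit a minimal sample
    (2 points), count how many points fall within `inlier_tol` of that
    line, and keep the largest consensus set."""
    rng = rng or np.random.default_rng()
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        p1, p2 = points[rng.choice(len(points), 2, replace=False)]
        d = p2 - p1
        norm = np.hypot(d[0], d[1])
        if norm == 0:                           # degenerate sample
            continue
        normal = np.array([-d[1], d[0]]) / norm  # unit normal of the line
        dist = np.abs((points - p1) @ normal)    # point-to-line distance
        inliers = dist < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers  # refit a least-squares model on these afterwards
```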


Topics: RANSAC (69%), Smoothing (55%), Image processing (54%)

20,503 Citations


Open access Book ChapterDOI: 10.1007/11744023_32
Herbert Bay, Tinne Tuytelaars, Luc Van Gool
07 May 2006
Abstract: In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF's strong performance.
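The integral images that give SURF its speed are easy to illustrate: once the summed-area table is built, any axis-aligned box filter costs four lookups regardless of its size, which is what makes the Hessian approximation cheap at every scale. A small NumPy sketch of the idea:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border row/column, so that
    box_sum works uniformly at the image edges."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) via four lookups; the cost is
    independent of the filter size, unlike a direct convolution."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```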


  • Fig. 2. Left: Detected interest points for a sunflower field. This kind of scene clearly shows the nature of the features obtained from Hessian-based detectors. Middle: Haar wavelet types used for SURF. Right: Detail of the Graffiti scene showing the size of the descriptor window at different scales.
  • Fig. 5. An example image from the reference set (left) and the test set (right). Note the difference in viewpoint and colours.
  • Fig. 6. Repeatability score for image sequences, from left to right and top to bottom: Wall and Graffiti (viewpoint change), Leuven (lighting change) and Boat (zoom and rotation).
  • Table 1. Thresholds, number of detected points and calculation time for the detectors in our comparison (first image of the Graffiti scene, 800 × 640).
  • Table 2. Computation times for the joint detector-descriptor implementations, tested on the first image of the Graffiti sequence. The thresholds are adapted in order to detect the same number of interest points for all methods. These relative speeds are also representative for other images.

Topics: Scale-invariant feature transform (57%), GLOH (56%), Interest point detection (54%)

12,404 Citations


Proceedings ArticleDOI: 10.1109/CVPR.2012.6248074
Andreas Geiger, Philip Lenz, Raquel Urtasun
16 Jun 2012
Abstract: Today, visual recognition systems are still rarely employed in robotics applications, perhaps mainly because of the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high-resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km total length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti
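For readers who want to try the odometry benchmark, KITTI's ground-truth pose files store one flattened row-major 3×4 [R|t] matrix per line. The loader below reflects that published format; the drift metric is a deliberately simplified illustration, not KITTI's official averaged relative error.

```python
import numpy as np

def load_kitti_poses(path):
    """Load KITTI odometry ground truth: one pose per line, stored as a
    flattened row-major 3x4 [R|t] matrix mapping the current camera
    frame into the coordinate frame of the first one."""
    poses = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            T = np.eye(4)
            T[:3, :4] = np.array(line.split(), dtype=np.float64).reshape(3, 4)
            poses.append(T)
    return poses

def endpoint_drift(gt_poses, est_poses):
    """Simplified illustration of VO error: distance between the final
    ground-truth and estimated positions.  KITTI's official metric
    instead averages rotation/translation errors over sub-sequences of
    fixed lengths (100 m, 200 m, ...)."""
    return float(np.linalg.norm(gt_poses[-1][:3, 3] - est_poses[-1][:3, 3]))
```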


Topics: Visual odometry (61%), Stereo cameras (58%), Optical flow (53%)

7,520 Citations


Open access Journal ArticleDOI: 10.1177/0278364913491297
Abstract: We present a novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research. In total, we recorded 6 hours of traffic scenarios at 10-100 Hz using a variety of sensor modalities, including high-resolution color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. The scenarios are diverse, capturing real-world traffic situations, and range from freeways through rural areas to inner-city scenes with many static and dynamic objects. Our data is calibrated, synchronized and timestamped, and we provide both the rectified and the raw image sequences. Our dataset also contains object labels in the form of 3D tracklets, and we provide online benchmarks for stereo, optical flow, object detection and other tasks. This paper describes our recording platform, the data format and the utilities that we provide.
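A small helper for iterating one drive's images, assuming the directory layout described in Fig. 4 below ('date' and 'drive' placeholders, image_0x stream folders); the exact folder names should be verified against a local copy of the dataset.

```python
from pathlib import Path

def list_frames(root, date, drive, camera="image_02"):
    """List the image files of one recording, assuming the layout from
    Fig. 4: <root>/<date>/<date>_drive_<drive>_sync/<camera>/data/*.png.
    By the dataset's convention image_00/image_01 are the grayscale
    stereo pair and image_02/image_03 the color pair; verify the exact
    names against your local copy."""
    data_dir = Path(root) / date / f"{date}_drive_{drive}_sync" / camera / "data"
    return sorted(data_dir.glob("*.png"))

# Hypothetical usage:
# for img_path in list_frames("kitti_raw", "2011_09_26", "0001"):
#     process(img_path)
```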


  • Fig. 1. Recording platform. Our VW Passat station wagon is equipped with four video cameras (two color and two grayscale cameras), a rotating 3D laser scanner and a combined GPS/IMU inertial navigation system.
  • Fig. 10. Egomotion, sequence count and length. This figure shows (from left to right) the egomotion (velocity and acceleration) of our recording platform over the whole dataset. Note that sequences with a purely static observer are excluded from these statistics. The length of the available sequences is shown as a histogram counting the number of frames per sequence. The rightmost figure shows the number of frames/images per scene category.
  • Fig. 9. Number of object labels per class and image. This figure shows how often an object occurs in an image. Since our labeling efforts focused on cars and pedestrians, these are the most predominant classes here.
  • Fig. 4. Structure of the provided zip files and their location within a global file structure that stores all KITTI sequences. Here, 'date' and 'drive' are placeholders, and 'image 0x' refers to the 4 video camera streams.
  • Fig. 3. Sensor setup. This figure illustrates the dimensions and mounting positions of the sensors (red) with respect to the vehicle body. Heights above ground are marked in green and measured with respect to the road surface. Transformations between sensors are shown in blue.

Topics: Stereo cameras (58%), Object detection (54%), Inertial navigation system (51%)

4,713 Citations


Open access Journal ArticleDOI: 10.1016/0004-3702(95)00022-4
Abstract: This paper proposes a robust approach to image matching by exploiting the only available geometric constraint, namely the epipolar constraint. The images are uncalibrated: the motion between them and the camera parameters are not known. Thus, the images can be taken by different cameras or by a single camera at different time instants. An exhaustive search for the epipolar geometry would be prohibitively expensive. The idea underlying our approach is to use classical techniques (correlation and relaxation methods in our particular implementation) to find an initial set of matches, and then use a robust technique, the Least Median of Squares (LMedS), to discard false matches in this set. The epipolar geometry can then be accurately estimated using a meaningful image criterion. More matches are eventually found, as in stereo matching, by using the recovered epipolar geometry. A large number of experiments have been carried out with very good results. Regarding the relaxation technique, we define a new measure of matching support, which allows a higher tolerance to deformation with respect to rigid transformations in the image plane and a smaller contribution from distant matches than from nearby ones. A new strategy for updating matches is developed, which selects only those matches having both high matching support and low matching ambiguity. This update strategy differs from the classical “winner-take-all”, which easily gets stuck at a local minimum, and from “loser-take-nothing”, which is usually very slow. The proposed algorithm has been widely tested and works remarkably well in scenes with many repetitive patterns.
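OpenCV exposes a Least-Median-of-Squares estimator that matches this use case. The sketch below estimates the fundamental matrix with LMedS and keeps only the matches consistent with the recovered epipolar geometry; unlike RANSAC it needs no inlier threshold, but it assumes fewer than half of the matches are false and at least eight correspondences.

```python
import cv2
import numpy as np

def lmeds_filter(pts1, pts2):
    """Estimate the fundamental matrix with Least Median of Squares and
    return only the matches consistent with the recovered epipolar
    geometry.  pts1/pts2 are Nx2 float32 arrays of matched points; at
    least 8 correspondences and an outlier ratio below 50% are assumed."""
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_LMEDS)
    inliers = mask.ravel().astype(bool)
    return F, pts1[inliers], pts2[inliers]
```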


Topics: Epipolar geometry (69%), Fundamental matrix (computer vision) (60%), Image processing (53%)

1,540 Citations


Performance Metrics
No. of citations received by the paper in previous years:

Year  Citations
2020  1
2019  1