Proceedings ArticleDOI

Piecewise Rigid Scene Flow

01 Dec 2013-pp 1377-1384
TL;DR: A novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments is introduced; it achieves leading performance levels, exceeding competing 3D scene flow methods, and even yields better 2D motion estimates than all tested dedicated optical flow techniques.
Abstract: Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixel-to-segment assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed energy combines an occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Optimization proceeds in two stages: Starting from an initial superpixelization, we estimate the shape and motion parameters of all segments by assigning a proposal from a set of moving planes. Then the pixel-to-segment assignment is updated, while holding the shape and motion parameters of the moving planes fixed. We demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI benchmark. We achieve leading performance levels, exceeding competing 3D scene flow methods, and even yielding better 2D motion estimates than all tested dedicated optical flow techniques.
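Schematically, the energy described in the abstract can be written (our notation, not the paper's exact formulation) as an occlusion-sensitive data term plus shape, motion, and segmentation regularizers:

```latex
E(S,\Pi) \;=\; \sum_{p} D_{\mathrm{occ}}\big(p,\;\pi_{S(p)}\big)
\;+\; \lambda \sum_{i \sim j} V_{\mathrm{shape}}(\pi_i,\pi_j)
\;+\; \mu \sum_{i \sim j} V_{\mathrm{motion}}(\pi_i,\pi_j)
\;+\; \kappa \sum_{p \sim q} V_{\mathrm{seg}}\big(S(p),S(q)\big)
```

Here S(p) is the pixel-to-segment assignment and π_i collects the plane position, normal vector, and rigid motion of segment i. The two-stage optimization then alternates between choosing each π_i from a proposal set of moving planes (holding S fixed) and updating S (holding all π_i fixed).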


Citations
Proceedings ArticleDOI
07 Jun 2015
TL;DR: A novel model and dataset for 3D scene flow estimation with an application to autonomous driving by representing each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object.
Abstract: This paper proposes a novel model and dataset for 3D scene flow estimation with an application to autonomous driving. Taking advantage of the fact that outdoor scenes often decompose into a small number of independently moving objects, we represent each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object. This minimal representation increases robustness and leads to a discrete-continuous CRF where the data term decomposes into pairwise potentials between superpixels and objects. Moreover, our model intrinsically segments the scene into its constituting dynamic components. We demonstrate the performance of our model on existing benchmarks as well as a novel realistic dataset with scene flow ground truth. We obtain this dataset by annotating 400 dynamic scenes from the KITTI raw data collection using detailed 3D CAD models for all vehicles in motion. Our experiments also reveal novel challenges which cannot be handled by existing methods.
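The minimal representation this abstract describes — each superpixel carries a plane and an index into a small list of rigidly moving objects — can be sketched as a pair of records. This is a hedged illustration; the names, fields, and helper below are ours, not the authors' code:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RigidObject:
    """One independently moving object: a 6-DoF rigid motion."""
    R: np.ndarray  # 3x3 rotation matrix
    t: np.ndarray  # 3-vector translation

@dataclass
class Superpixel:
    """One superpixel: a 3D plane plus an index into the object list."""
    plane: np.ndarray  # plane parameters (e.g. normal scaled by inverse distance)
    obj: int           # index of the rigid object this superpixel moves with

def flow_of_point(X, sp, objects):
    """3D scene flow of a point X lying on superpixel `sp`: the point
    inherits the rigid motion of the object the superpixel is assigned to."""
    o = objects[sp.obj]
    return o.R @ X + o.t - X
```

Because many superpixels share one object, the motion parameters are estimated jointly from all of an object's pixels, which is the robustness argument the abstract makes.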

1,918 citations


Cites background or methods from "Piecewise Rigid Scene Flow"

  • ...Finally, we also include the results of Vogel’s piece-wise rigid scene flow (PRSF) approach [37]....


  • ...Similarly, the more challenging KITTI benchmark [12] has been leveraged for evaluation in [35, 37]....


  • ...While a number of methods have recently demonstrated impressive performance in this context [25,35,37,39], none of them explicitly takes advantage of the fact that such scenes can often be considered as a small collection of independently moving 3D objects....


  • ...[35,37] proposed a slanted-plane model which assigns each pixel to an image segment and each segment to one of several rigidly moving 3D plane proposals, thus casting the task as a discrete optimization problem which can be solved using α-expansion and QPBO [26]....


  • ...In contrast to [35, 37], we model the 3D structure of the scene as a collection of planar patches and the 3D motion of these patches by a small number of rigidly moving objects which we optimize jointly....


Proceedings ArticleDOI
TL;DR: In this article, three large-scale synthetic stereo video datasets are proposed to enable training and evaluation of disparity, optical flow, and scene flow estimation with convolutional networks.
Abstract: Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. Our datasets are the first large-scale datasets to enable training and evaluating scene flow methods. Besides the datasets, we present a convolutional network for real-time disparity estimation that provides state-of-the-art results. By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.

1,759 citations

Journal Article
TL;DR: In this paper, the first stage of many stereo algorithms, matching cost computation, is addressed by learning a similarity measure on small image patches using a convolutional neural network; a series of post-processing steps follow: cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter.
Abstract: We present a method for extracting depth information from a rectified image pair. Our approach focuses on the first stage of many stereo algorithms: the matching cost computation. We approach the problem by learning a similarity measure on small image patches using a convolutional neural network. Training is carried out in a supervised manner by constructing a binary classification data set with examples of similar and dissimilar pairs of patches. We examine two network architectures for this task: one tuned for speed, the other for accuracy. The output of the convolutional neural network is used to initialize the stereo matching cost. A series of post-processing steps follow: cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter. We evaluate our method on the KITTI 2012, KITTI 2015, and Middlebury stereo data sets and show that it outperforms other approaches on all three data sets.

860 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work trains a convolutional neural network to predict how well two image patches match and uses it to compute the stereo matching cost, which achieves an error rate of 2.61% on the KITTI stereo dataset.
Abstract: We present a method for extracting depth information from a rectified image pair. We train a convolutional neural network to predict how well two image patches match and use it to compute the stereo matching cost. The cost is refined by cross-based cost aggregation and semiglobal matching, followed by a left-right consistency check to eliminate errors in the occluded regions. Our stereo method achieves an error rate of 2.61% on the KITTI stereo dataset and is currently (August 2014) the top performing method on this dataset.
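Plugging a learned patch score into a stereo matching-cost volume looks roughly like the sketch below. In MC-CNN the `similarity` callable is the trained network's output on two patches; here we substitute a trivial stand-in (negative absolute difference), and the function names are ours:

```python
import numpy as np

def matching_cost_volume(left, right, max_disp, similarity):
    """Build a stereo matching-cost volume from a patch-similarity score.

    `similarity(a, b)` scores how well pixels of the left image match
    right-image pixels shifted by a candidate disparity; the score is
    negated so that lower cost = better match. Out-of-range disparities
    keep an infinite cost.
    """
    h, w = left.shape
    cost = np.full((h, w, max_disp + 1), np.inf)
    for d in range(max_disp + 1):
        if d == 0:
            cost[:, :, 0] = -similarity(left, right)
        else:
            # left pixel x corresponds to right pixel x - d
            cost[:, d:, d] = -similarity(left[:, d:], right[:, :-d])
    return cost

# Trivial stand-in for the learned score: negative absolute difference.
sad_score = lambda a, b: -np.abs(a - b)
```

The resulting volume is exactly what the abstract's post-processing chain (cost aggregation, semiglobal matching, consistency check) then refines.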

762 citations


Additional excerpts

  • ...[17] 4....


Book ChapterDOI
06 Sep 2014
TL;DR: A new optimization algorithm is proposed for the authors' SLIC-like objective which preserves connectedness of image segments and exploits shape regularization in the form of boundary length; the results are achieved an order of magnitude faster than with competing approaches.
Abstract: In this paper we propose a slanted plane model for jointly recovering an image segmentation, a dense depth estimate as well as boundary labels (such as occlusion boundaries) from a static scene given two frames of a stereo pair captured from a moving vehicle. Towards this goal we propose a new optimization algorithm for our SLIC-like objective which preserves connectedness of image segments and exploits shape regularization in the form of boundary length. We demonstrate the performance of our approach in the challenging stereo and flow KITTI benchmarks and show superior results to the state-of-the-art. Importantly, these results can be achieved an order of magnitude faster than competing approaches.
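Under the slanted-plane assumption shared by these methods, each segment's disparity is an affine function of pixel position, so scoring a plane hypothesis reduces to a robust residual over the segment's pixels. A minimal sketch (the parameter names and the truncated-L1 penalty are illustrative choices, not any paper's exact data term):

```python
import numpy as np

def plane_disparity(plane, xs, ys):
    """Disparity induced at pixels (xs, ys) by a slanted plane (a, b, c):
    d(x, y) = a*x + b*y + c."""
    a, b, c = plane
    return a * xs + b * ys + c

def segment_energy(disp_obs, plane, xs, ys, trunc=3.0):
    """Data cost of assigning a segment's pixels, with observed
    disparities disp_obs, to one plane hypothesis (truncated L1)."""
    r = np.abs(disp_obs - plane_disparity(plane, xs, ys))
    return np.minimum(r, trunc).sum()
```

Discrete inference then amounts to picking, for every segment, the plane proposal with the lowest such energy plus smoothness terms between neighbouring segments.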

368 citations


Cites background or methods from "Piecewise Rigid Scene Flow"

  • ...Unfortunately, these slanted plane methods have involved time-consuming optimization algorithms (several minutes per frame) such as particle belief propagation [30,31] or algorithms based on plane proposals with fusion moves and iterated cut-based segmentations [26]....


  • ...Importantly, it does so at least an order of magnitude faster than existing slanted plane methods [30,31,26], while outperforming the state-of-the-art on the challenging KITTI benchmark [9]....


  • ...Current leading techniques are slanted plane methods, which assume that the 3D scene is piece-wise planar and the motion is rigid or piece-wise rigid [30,31,26]....


  • ...[26] handles scenes with moving objects using a segmentation of a reference image with both a planar surface and a six dimensional rigid motion associated with each image segment....


  • ...Previous work employed particle methods to solve continuous-discrete problems by forming a sequence of discrete MRFs, which can be minimized using message passing algorithms [30,31] or fusion moves with QPBO [26]....


References
Proceedings ArticleDOI
16 Jun 2012
TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.
Abstract: Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at: www.cvlibs.net/datasets/kitti

11,283 citations


"Piecewise Rigid Scene Flow" refers methods in this paper

  • ...For quantitative evaluation we test our method on twoframe stereo pairs from the KITTI dataset [8]....


Journal ArticleDOI
TL;DR: This paper describes the Semi-Global Matching (SGM) stereo method, which uses a pixelwise, Mutual Information based matching cost for compensating radiometric differences of input images and demonstrates a tolerance against a wide range of radiometric transformations.
Abstract: This paper describes the semiglobal matching (SGM) stereo method. It uses a pixelwise, mutual information (MI)-based matching cost for compensating radiometric differences of input images. Pixelwise matching is supported by a smoothness constraint that is usually expressed as a global cost function. SGM performs a fast approximation by pathwise optimizations from all directions. The discussion also addresses occlusion detection, subpixel refinement, and multibaseline matching. Additionally, postprocessing steps for removing outliers, recovering from specific problems of structured environments, and the interpolation of gaps are presented. Finally, strategies for processing almost arbitrarily large images and fusion of disparity images using orthographic projection are proposed. A comparison on standard stereo images shows that SGM is among the currently top-ranked algorithms and is best if subpixel accuracy is considered. The complexity is linear in the number of pixels and the disparity range, which results in a runtime of just 1-2 seconds on typical test images. An in-depth evaluation of the MI-based matching cost demonstrates a tolerance against a wide range of radiometric transformations. Finally, examples of reconstructions from huge aerial frame and pushbroom images demonstrate that the presented ideas are working well on practical problems.
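The pathwise approximation at SGM's core is a small dynamic program. One direction of it (left to right) can be sketched in NumPy as follows; the P1/P2 defaults and the vectorization are our choices, not Hirschmüller's implementation:

```python
import numpy as np

def aggregate_left_to_right(cost, p1=1.0, p2=8.0):
    """Aggregate a matching-cost volume along one SGM path (left to right).

    cost: (H, W, D) pixelwise matching costs.
    SGM recurrence per pixel p and disparity d:
        L(p, d) = C(p, d) + min( L(p-1, d),
                                 L(p-1, d +/- 1) + P1,
                                 min_d' L(p-1, d') + P2 )
                  - min_d' L(p-1, d')
    """
    h, w, d = cost.shape
    agg = np.empty_like(cost, dtype=float)
    agg[:, 0] = cost[:, 0]
    for x in range(1, w):
        prev = agg[:, x - 1]                        # (H, D)
        prev_min = prev.min(axis=1, keepdims=True)  # (H, 1)
        shift_up = np.full_like(prev, np.inf)
        shift_dn = np.full_like(prev, np.inf)
        shift_up[:, 1:] = prev[:, :-1] + p1         # transition d-1 -> d
        shift_dn[:, :-1] = prev[:, 1:] + p1         # transition d+1 -> d
        jump = prev_min + p2                        # arbitrary disparity jump
        best = np.minimum(np.minimum(prev, shift_up),
                          np.minimum(shift_dn, jump))
        agg[:, x] = cost[:, x] + best - prev_min
    return agg
```

The full method sums such aggregated volumes over eight or sixteen path directions and takes the per-pixel argmin over disparities.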

3,302 citations


"Piecewise Rigid Scene Flow" refers background or methods in this paper

  • ...As baselines we use our implementations of three methods: L1-regularized 3D scene flow (LSF, [3]); locally rigid 3D scene flow (Rig, [22]); and independently derived 2D stereo (semi-global matching [9]) and optical flow (census data term, total generalized variation [25] regularization), indicated by (S+F)....


  • ...Yet, despite significant progress in both stereo [4, 9, 26] and 2D optical flow estimation [5, 16, 17], existing 3D scene flow techniques [e....


Book ChapterDOI
11 May 2004
TL;DR: By proving that this scheme implements a coarse-to-fine warping strategy, this work gives a theoretical foundation for warping which has been used on a mainly experimental basis so far and demonstrates its excellent robustness under noise.
Abstract: We study an energy functional for computing optical flow that combines three assumptions: a brightness constancy assumption, a gradient constancy assumption, and a discontinuity-preserving spatio-temporal smoothness constraint. In order to allow for large displacements, linearisations in the two data terms are strictly avoided. We present a consistent numerical scheme based on two nested fixed point iterations. By proving that this scheme implements a coarse-to-fine warping strategy, we give a theoretical foundation for warping which has been used on a mainly experimental basis so far. Our evaluation demonstrates that the novel method gives significantly smaller angular errors than previous techniques for optical flow estimation. We show that it is fairly insensitive to parameter variations, and we demonstrate its excellent robustness under noise.
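In symbols, the functional combines the three assumptions roughly as follows (the standard form of the Brox et al. energy; the exact smoothness term and the small constant ε vary across presentations):

```latex
E(u,v) \;=\; \int_{\Omega} \Psi\!\Big( |I(\mathbf{x}+\mathbf{w}) - I(\mathbf{x})|^2
      \;+\; \gamma\, |\nabla I(\mathbf{x}+\mathbf{w}) - \nabla I(\mathbf{x})|^2 \Big)\, d\mathbf{x}
\;+\; \alpha \int_{\Omega} \Psi\!\big( |\nabla_3 u|^2 + |\nabla_3 v|^2 \big)\, d\mathbf{x},
\qquad \Psi(s^2) = \sqrt{s^2 + \varepsilon^2}
```

with w = (u, v, 1)^T. Keeping the data terms non-linearised is what permits large displacements, and the two nested fixed-point iterations are what the paper proves to implement coarse-to-fine warping.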

2,902 citations


"Piecewise Rigid Scene Flow" refers background in this paper

  • ...Yet, despite significant progress in both stereo [4, 9, 26] and 2D optical flow estimation [5, 16, 17], existing 3D scene flow techniques [e....


Book ChapterDOI
07 May 1994
TL;DR: A new approach to the correspondence problem that makes use of non-parametric local transforms as the basis for correlation, which can result in improved performance near object boundaries when compared with conventional methods such as normalized correlation.
Abstract: We propose a new approach to the correspondence problem that makes use of non-parametric local transforms as the basis for correlation. Non-parametric local transforms rely on the relative ordering of local intensity values, and not on the intensity values themselves. Correlation using such transforms can tolerate a significant number of outliers. This can result in improved performance near object boundaries when compared with conventional methods such as normalized correlation. We introduce two non-parametric local transforms: the rank transform, which measures local intensity, and the census transform, which summarizes local image structure. We describe some properties of these transforms, and demonstrate their utility on both synthetic and real data.
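The census transform and its Hamming-distance matching cost — used in the main paper's data term over a 7×7 window — can be sketched in NumPy as follows (the edge padding and function names are ours):

```python
import numpy as np

def census_transform(img, radius=3):
    """Census transform over a (2r+1)x(2r+1) window (7x7 for radius=3).

    Each pixel gets a bit string recording, for every neighbour, whether
    that neighbour is darker than the centre pixel; only the relative
    ordering of intensities matters, not the values themselves.
    """
    h, w = img.shape
    r = radius
    padded = np.pad(img, r, mode='edge')
    bits = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[r + dy:r + dy + h, r + dx:r + dx + w]
            bits.append(neighbour < img)
    return np.stack(bits, axis=-1)  # (h, w, (2r+1)^2 - 1) boolean descriptor

def hamming_cost(census_a, census_b):
    """Per-pixel Hamming distance between two census descriptors."""
    return np.count_nonzero(census_a != census_b, axis=-1)
```

Because the descriptor depends only on intensity ordering, a monotonic lighting change leaves it unchanged, which is the robustness property the excerpts above rely on.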

1,952 citations


"Piecewise Rigid Scene Flow" refers background or methods in this paper

  • ...In detail, we scale the Hamming distances by 1/24 for the census data term, see [28], and set λ = 10μ, γ = 1, κ = 1....


  • ...Alternative choices include the more robust census transform [28]....


  • ...We address the challenging lighting conditions using the census transform [28] over a 7×7 neighborhood to measure the data fidelity ρ, which has been shown to cope well with complex outdoor lighting [14]....


Journal ArticleDOI
TL;DR: A system for representing moving images with sets of overlapping layers that is more flexible than standard image transforms and can capture many important properties of natural image sequences.
Abstract: We describe a system for representing moving images with sets of overlapping layers. Each layer contains an intensity map that defines the additive values of each pixel, along with an alpha map that serves as a mask indicating the transparency. The layers are ordered in depth and they occlude each other in accord with the rules of compositing. Velocity maps define how the layers are to be warped over time. The layered representation is more flexible than standard image transforms and can capture many important properties of natural image sequences. We describe some methods for decomposing image sequences into layers using motion analysis, and we discuss how the representation may be used for image coding and other applications. >
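Back-to-front compositing of such intensity/alpha layers follows the standard "over" rule; a minimal sketch (list ordering and names are ours):

```python
import numpy as np

def composite(layers):
    """Back-to-front 'over' compositing of (intensity, alpha) layers.

    layers: list ordered back to front; each entry is a pair
    (intensity_map, alpha_map) with alpha in [0, 1] acting as the
    transparency mask described in the layered representation.
    """
    out = np.zeros_like(layers[0][0], dtype=float)
    for intensity, alpha in layers:
        out = alpha * intensity + (1.0 - alpha) * out
    return out
```

In the full system each layer is additionally warped by its velocity map before compositing, which is how the representation models motion over time.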

1,360 citations


"Piecewise Rigid Scene Flow" refers background in this paper

  • ...In the context of stereo disparity and optical flow, explicit modeling of discontinuities by means of segmentation or layer-based formulations has a long history [23] and has recently gained renewed attention: Bleyer et al....
