Journal ArticleDOI

A layered stereo matching algorithm using image segmentation and global visibility constraints

01 May 2005 - ISPRS Journal of Photogrammetry and Remote Sensing (Elsevier) - Vol. 59, Iss. 3, pp. 128-150
TL;DR: Qualitative and quantitative results obtained for benchmark image pairs show that the proposed algorithm outperforms most state-of-the-art matching algorithms currently listed on the Middlebury stereo evaluation website.
Abstract: This work describes a stereo algorithm that takes advantage of image segmentation, assuming that disparity varies smoothly inside a segment of homogeneous colour and depth discontinuities coincide with segment borders. Image segmentation allows our method to generate correct disparity estimates in large untextured regions and precisely localize depth boundaries. The disparity inside a segment is represented by a planar equation. To derive the plane model, an initial disparity map is generated. We use a window-based approach that exploits the results of segmentation. The size of the match window is chosen adaptively. A segment's planar model is then derived by robust least squared error fitting using the initial disparity map. In a layer extraction step, disparity segments that are found to be similar according to a plane dissimilarity measurement are combined to form a single robust layer. We apply a modified mean-shift algorithm to extract clusters of similar disparity segments. Segments of the same cluster build a layer, the plane parameters of which are computed from its spatial extent using the initial disparity map. We then optimize the assignment of segments to layers using a global cost function. The quality of the disparity map is measured by warping the reference image to the second view and comparing it with the real image. Z-buffering enforces visibility and allows the explicit detection of occlusions. The cost function measures the colour dissimilarity between the warped and real views, and penalizes occlusions and neighbouring segments that are assigned to different layers. Since the problem of finding the assignment of segments to layers that minimizes this cost function is NP-complete, an efficient greedy algorithm is applied to find a local minimum. Layer extraction and assignment are alternately applied.
Qualitative and quantitative results obtained for benchmark image pairs show that the proposed algorithm outperforms most state-of-the-art matching algorithms currently listed on the Middlebury stereo evaluation website. The technique achieves particularly good results in areas with depth discontinuities and related occlusions, where missing stereo information is substituted from surrounding regions. Furthermore, we apply the algorithm to a self-recorded image set and show 3D visualizations of the derived results.
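The plane-fitting step described in the abstract (a disparity plane per segment, obtained by robust least-squares fitting to the initial disparity map) can be illustrated with a minimal sketch. It assumes the plane model d = a*x + b*y + c and uses a simple refit-and-reject loop with a median-based cutoff; the function names, the `threshold` and `max_iters` parameters, and the rejection rule are illustrative assumptions, not the authors' exact procedure.

```python
from statistics import median


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / M[i][i]
    return x


def fit_plane(points):
    """Ordinary least-squares fit of d = a*x + b*y + c to (x, y, d) samples."""
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x, y, d in points:
        v = (x, y, 1.0)
        for i in range(3):
            rhs[i] += v[i] * d
            for j in range(3):
                A[i][j] += v[i] * v[j]
    return solve3(A, rhs)


def fit_plane_robust(points, threshold=1.0, max_iters=10):
    """Refit repeatedly, dropping samples whose residual exceeds the cutoff.

    The cutoff max(threshold, 2.5 * median residual) is an assumption made
    for this sketch so that one gross outlier cannot reject every inlier.
    """
    inliers = list(points)
    params = fit_plane(inliers)
    for _ in range(max_iters):
        a, b, c = params
        residuals = [abs(a * x + b * y + c - d) for x, y, d in points]
        cutoff = max(threshold, 2.5 * median(residuals))
        kept = [p for p, r in zip(points, residuals) if r <= cutoff]
        if len(kept) < 3 or kept == inliers:
            break
        inliers = kept
        params = fit_plane(inliers)
    return tuple(params)
```

Given the valid pixels of one segment as `(x, y, disparity)` tuples, `fit_plane_robust` returns the plane parameters `(a, b, c)`; a single mismatched pixel in an otherwise planar segment is discarded after the first refit.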
Citations
Journal ArticleDOI
TL;DR: This paper describes the Semi-Global Matching (SGM) stereo method, which uses a pixelwise, Mutual Information based matching cost for compensating radiometric differences of input images and demonstrates a tolerance against a wide range of radiometric transformations.
Abstract: This paper describes the semiglobal matching (SGM) stereo method. It uses a pixelwise, mutual information (MI)-based matching cost for compensating radiometric differences of input images. Pixelwise matching is supported by a smoothness constraint that is usually expressed as a global cost function. SGM performs a fast approximation by pathwise optimizations from all directions. The discussion also addresses occlusion detection, subpixel refinement, and multibaseline matching. Additionally, postprocessing steps for removing outliers, recovering from specific problems of structured environments, and the interpolation of gaps are presented. Finally, strategies for processing almost arbitrarily large images and fusion of disparity images using orthographic projection are proposed. A comparison on standard stereo images shows that SGM is among the currently top-ranked algorithms and is best if subpixel accuracy is considered. The complexity is linear in the number of pixels and the disparity range, which results in a runtime of just 1-2 seconds on typical test images. An in-depth evaluation of the MI-based matching cost demonstrates a tolerance against a wide range of radiometric transformations. Finally, examples of reconstructions from huge aerial frame and pushbroom images demonstrate that the presented ideas work well on practical problems.
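The "pathwise optimization" mentioned in this abstract can be sketched for a single path direction. The recurrence below, with a small penalty `p1` for disparity changes of one level and a larger penalty `p2` for bigger jumps, follows the commonly published SGM formulation; the penalties are not named in the abstract itself, and the function name and the list-of-lists cost layout are assumptions of this sketch.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one path.

    costs[i][d] is the pixelwise matching cost of the i-th pixel on the path
    at disparity d. Returns the aggregated costs with the same layout.
    """
    n_disp = len(costs[0])
    agg = [list(costs[0])]  # the first pixel on the path has no predecessor
    for i in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = prev[d]                          # same disparity, no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # change by one level
            if d + 1 < n_disp:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)         # any larger jump
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(costs[i][d] + best - prev_min)
        agg.append(row)
    return agg
```

In a full implementation the aggregated costs from all path directions would be summed per pixel and disparity before picking the minimum; this sketch covers only one direction.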

3,302 citations



Proceedings ArticleDOI
20 Aug 2006
TL;DR: A novel stereo matching algorithm is proposed that utilizes color segmentation on the reference image and a self-adapting matching score that maximizes the number of reliable correspondences; planar surface patches are estimated with a new technique that is more robust to outliers.
Abstract: A novel stereo matching algorithm is proposed that utilizes color segmentation on the reference image and a self-adapting matching score that maximizes the number of reliable correspondences. The scene structure is modeled by a set of planar surface patches which are estimated using a new technique that is more robust to outliers. Instead of assigning a disparity value to each pixel, a disparity plane is assigned to each segment. The optimal disparity plane labeling is approximated by applying belief propagation. Experimental results using the Middlebury stereo test bed demonstrate the superior performance of the proposed method.

969 citations


Cites background or methods from "A layered stereo matching algorithm..."

  • ...Stereo matching continues to be an active research area as is proven by a large number of recent publications dedicated to this topic [1, 2, 4, 6, 9, 12]....

    [...]

  • ...Recently, segment-based methods [1, 2, 4, 6, 11] have attracted attention due to their good performance....

    [...]

  • ...Fourth, an optimal disparity plane assignment (optimal labeling) is approximated using greedy [1, 11] or graph cuts [2, 4, 6] optimization....

    [...]

Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper proposes a matching network which is able to produce very accurate results in less than a second of GPU computation, and exploits a product layer which simply computes the inner product between the two representations of a siamese architecture.
Abstract: In the past year, convolutional neural networks have been shown to perform extremely well for stereo estimation. However, current architectures rely on siamese networks which exploit concatenation followed by further processing layers, requiring a minute of GPU computation per image pair. In contrast, in this paper we propose a matching network which is able to produce very accurate results in less than a second of GPU computation. Towards this goal, we exploit a product layer which simply computes the inner product between the two representations of a siamese architecture. We train our network by treating the problem as multi-class classification, where the classes are all possible disparities. This allows us to get calibrated scores, which result in much better matching performance when compared to existing approaches.
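The matching step described in this abstract (an inner product between the two siamese feature representations, treated as a multi-class classification over disparities) can be sketched as follows. The feature vectors here are plain lists standing in for learned network outputs, and the function name, the left-reference convention `x - d`, and the data layout are assumptions of this sketch.

```python
import math


def disparity_probabilities(left_feat, right_feat, x, max_disp):
    """Softmax over inner products between left pixel x and right pixels x - d.

    left_feat and right_feat are per-pixel feature vectors of one rectified
    scanline. Entry d of the result is the probability assigned to disparity d.
    """
    scores = []
    for d in range(max_disp + 1):
        xr = x - d
        if xr < 0:  # candidate falls outside the right image
            break
        scores.append(sum(a * b for a, b in zip(left_feat[x], right_feat[xr])))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]
```

Taking the index of the largest probability gives a winner-takes-all disparity for the pixel.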

822 citations


Cites background from "A layered stereo matching algorithm..."

  • ...They have a long history, dating back to [2] and were shown to be very successful on the Middlebury benchmark [22, 15, 3, 24] as well as on KITTI [25, 26, 27]....

    [...]

Proceedings ArticleDOI
01 Jan 2011
TL;DR: The method reconstructs highly slanted surfaces and achieves impressive disparity details with sub-pixel precision; its slanted support windows can also feed global stereo methods, which allows explicit treatment of occlusions and handling of large untextured regions.
Abstract: Common local stereo methods match support windows at integer-valued disparities. The implicit assumption that pixels within the support region have constant disparity does not hold for slanted surfaces and leads to a bias towards reconstructing frontoparallel surfaces. This work overcomes this bias by estimating an individual 3D plane at each pixel onto which the support region is projected. The major challenge of this approach is to find a pixel’s optimal 3D plane among all possible planes whose number is infinite. We show that an ideal algorithm to solve this problem is PatchMatch [1] that we extend to find an approximate nearest neighbor according to a plane. In addition to PatchMatch’s spatial propagation scheme, we propose (1) view propagation where planes are propagated among left and right views of the stereo pair and (2) temporal propagation where planes are propagated from preceding and consecutive frames of a video when doing temporal stereo. Adaptive support weights are used in matching cost aggregation to improve results at disparity borders. We also show that our slanted support windows can be used to compute a cost volume for global stereo methods, which allows for explicit treatment of occlusions and can handle large untextured regions. In the results we demonstrate that our method reconstructs highly slanted surfaces and achieves impressive disparity details with sub-pixel precision. In the Middlebury table, our method is currently top-performer among local methods and takes rank 2 among approximately 110 competitors if sub-pixel precision is considered.
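The frontoparallel bias described in this abstract can be made concrete with a small sketch that evaluates a support window under a slanted plane, where every pixel in the window gets its own (generally fractional) disparity. The plane parameterization d = a*x + b*y + c, the absolute-difference cost, and all names are assumptions of this sketch; the adaptive support weights mentioned in the abstract are omitted.

```python
def sample(row, xf):
    """Linearly interpolate a scanline at a fractional position (clamped)."""
    xf = min(max(xf, 0.0), len(row) - 1.0)
    x0 = int(xf)
    x1 = min(x0 + 1, len(row) - 1)
    t = xf - x0
    return (1 - t) * row[x0] + t * row[x1]


def window_cost(left, right, cx, cy, plane, radius=1):
    """Sum of absolute differences over a window centred on (cx, cy).

    Each window pixel (x, y) is matched at x - d, where d = a*x + b*y + c
    comes from the candidate plane rather than from one shared disparity.
    """
    a, b, c = plane
    total = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            d = a * x + b * y + c
            total += abs(left[y][x] - sample(right[y], x - d))
    return total
```

On a surface whose disparity grows with x, a plane with a non-zero `a` can reach zero cost while any constant-disparity plane `(0, 0, c)` cannot, which is the bias the abstract refers to.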

687 citations


Cites methods from "A layered stereo matching algorithm..."

  • ..., [2, 12]) extract several planes using an initial disparity map in the first step....

    [...]

Journal ArticleDOI
01 Apr 2011
TL;DR: This paper describes efficient coding methods for video and depth data, and synthesis methods are presented, which mitigate errors from depth estimation and coding, for the generation of views.
Abstract: Current 3-D video (3DV) technology is based on stereo systems. These systems use stereo video coding for pictures delivered by two input cameras. Typically, such stereo systems only reproduce these two camera views at the receiver and stereoscopic displays for multiple viewers require wearing special 3-D glasses. On the other hand, emerging autostereoscopic multiview displays emit a large number of views to enable 3-D viewing for multiple users without requiring 3-D glasses. For representing a large number of views, a multiview extension of stereo video coding is used, typically requiring a bit rate that is proportional to the number of views. However, since the quality improvement of multiview displays will be governed by an increase of emitted views, a format is needed that allows the generation of arbitrary numbers of views with the transmission bit rate being constant. Such a format is the combination of video signals and associated depth maps. The depth maps provide disparities associated with every sample of the video signal that can be used to render arbitrary numbers of additional views via view synthesis. This paper describes efficient coding methods for video and depth data. For the generation of views, synthesis methods are presented, which mitigate errors from depth estimation and coding.
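Rendering an additional view from a video sample and its disparity, as this abstract describes, can be sketched as a forward warp of one scanline. The sign convention (a pixel at x moves to x - d), integer rounding of the target position, and the z-buffer rule that the larger disparity (the nearer surface) wins a collision are assumptions of this sketch, not details taken from the paper.

```python
def warp_scanline(colors, disparities):
    """Forward-warp one scanline into a second view.

    Returns a list of the same width; positions that no source pixel maps to
    stay None and mark holes (disocclusions) to be filled by a later step.
    """
    width = len(colors)
    out = [None] * width
    zbuf = [float("-inf")] * width  # largest disparity seen per target pixel
    for x, (color, d) in enumerate(zip(colors, disparities)):
        xt = x - int(round(d))
        if 0 <= xt < width and d > zbuf[xt]:
            zbuf[xt] = d       # nearer surface overwrites the farther one
            out[xt] = color
    return out
```

For example, with colours `a b c d e` and disparities `0 0 2 2 0`, the foreground pixels `c` and `d` cover `a` and `b` in the new view, and two holes open up behind them.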

420 citations


Cites background from "A layered stereo matching algorithm..."

  • ...Usually, depth estimation algorithms attempt to match corresponding signal components in two or more original cameras, using a matching function [44] with different area support and size [3]....

    [...]

References
Journal ArticleDOI
Zhengyou Zhang
TL;DR: A flexible technique to easily calibrate a camera that only requires the camera to observe a planar pattern shown at a few (at least two) different orientations is proposed and advances 3D computer vision one more step from laboratory environments to real world use.
Abstract: We propose a flexible technique to easily calibrate a camera. It only requires the camera to observe a planar pattern shown at a few (at least two) different orientations. Either the camera or the planar pattern can be freely moved. The motion need not be known. Radial lens distortion is modeled. The proposed procedure consists of a closed-form solution, followed by a nonlinear refinement based on the maximum likelihood criterion. Both computer simulation and real data have been used to test the proposed technique and very good results have been obtained. Compared with classical techniques which use expensive equipment such as two or three orthogonal planes, the proposed technique is easy to use and flexible. It advances 3D computer vision one more step from laboratory environments to real world use.

13,200 citations


"A layered stereo matching algorithm..." refers methods in this paper

  • ...We calibrated the cameras using the method described in Zhang (2000) and transformed the images into epipolar geometry....

    [...]

  • ...A similar approach was taken by Zhang and Kambhamettu (2002). We follow their idea to measure the reliability of a segment’s disparity information by the density of valid points....

    [...]

Journal ArticleDOI
TL;DR: This paper has designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms.
Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.

7,458 citations


"A layered stereo matching algorithm..." refers background or methods in this paper

  • ...The reader is referred to Scharstein and Szeliski (2002) for an extensive survey of prior work....

    [...]

  • ...…the graph-based method of Kolmogorov and Zabih (2002), a belief propagation algorithm (Sun et al., 2003), an algorithm using dynamic programming (Birchfield and Tomasi, 1999b) and an implementation of sum-of-squared-differences (SSD) by Scharstein and Szeliski (2002) that uses shiftable windows....

    [...]

  • ...We demonstrated the performance of the proposed algorithm using the test bed of Scharstein and Szeliski (2002)....

    [...]

  • ...We evaluated our algorithm using the test bed proposed by Scharstein and Szeliski (2002)....

    [...]

  • ...For quantitative evaluation, Scharstein and Szeliski (2002) measure the percentage of unoccluded pixels whose absolute disparity error is greater than one....

    [...]
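The error measure quoted in the last excerpt (the percentage of unoccluded pixels whose absolute disparity error is greater than one) can be written down directly. The function name, the boolean occlusion mask, and the nested-list image layout are assumptions of this sketch.

```python
def bad_pixel_percentage(estimated, ground_truth, occluded, max_error=1.0):
    """Percentage of unoccluded pixels with |estimate - truth| > max_error.

    All three arguments are 2-D lists of equal shape; occluded[y][x] is True
    where the pixel is excluded from the evaluation.
    """
    bad = 0
    counted = 0
    for est_row, gt_row, occ_row in zip(estimated, ground_truth, occluded):
        for est, gt, occ in zip(est_row, gt_row, occ_row):
            if occ:
                continue
            counted += 1
            if abs(est - gt) > max_error:
                bad += 1
    return 100.0 * bad / counted if counted else 0.0
```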

Journal ArticleDOI
TL;DR: This work presents two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves that allow important cases of discontinuity preserving energies.
Abstract: Many tasks in computer vision involve assigning a label (such as disparity) to every pixel. A common constraint is that the labels should vary smoothly almost everywhere while preserving sharp discontinuities that may exist, e.g., at object boundaries. These tasks are naturally stated in terms of energy minimization. The authors consider a wide class of energies with various smoothness constraints. Global minimization of these energy functions is NP-hard even in the simplest discontinuity-preserving case. Therefore, our focus is on efficient approximation algorithms. We present two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves. These moves can simultaneously change the labels of arbitrarily large sets of pixels. In contrast, many standard algorithms (including simulated annealing) use small moves where only one pixel changes its label at a time. Our expansion algorithm finds a labeling within a known factor of the global minimum, while our swap algorithm handles more general energy functions. Both of these algorithms allow important cases of discontinuity preserving energies. We experimentally demonstrate the effectiveness of our approach for image restoration, stereo and motion. On real data with ground truth, we achieve 98 percent accuracy.
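The kind of energy this abstract refers to, a per-pixel data term plus a smoothness term over neighbouring pixel pairs, can be illustrated by evaluating it for a given labeling. The sketch assumes a Potts smoothness term (a constant penalty whenever two 4-connected neighbours carry different labels), which is one simple discontinuity-preserving choice; the function name and data layout are hypothetical, and the graph-cut moves themselves are not implemented here.

```python
def labeling_energy(data_cost, labels, smooth_weight):
    """Energy of a labeling on a 4-connected grid under a Potts model.

    data_cost[y][x][l] is the cost of giving pixel (x, y) the label l;
    labels[y][x] is the label actually assigned.
    """
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for y in range(height):
        for x in range(width):
            energy += data_cost[y][x][labels[y][x]]
            # Count each neighbouring pair once: right and down neighbours.
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                energy += smooth_weight
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                energy += smooth_weight
    return energy
```

An expansion or swap move would be accepted only if it lowers this value, so a function like this is also a convenient check when testing an optimizer.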

7,413 citations

Proceedings ArticleDOI
01 Jan 1999
TL;DR: This paper proposes two algorithms that use graph cuts to compute a local minimum even when very large moves are allowed, and generates a labeling such that there is no expansion move that decreases the energy.
Abstract: In this paper we address the problem of minimizing a large class of energy functions that occur in early vision. The major restriction is that the energy function's smoothness term must only involve pairs of pixels. We propose two algorithms that use graph cuts to compute a local minimum even when very large moves are allowed. The first move we consider is an α-β-swap: for a pair of labels α, β, this move exchanges the labels between an arbitrary set of pixels labeled α and another arbitrary set labeled β. Our first algorithm generates a labeling such that there is no swap move that decreases the energy. The second move we consider is an α-expansion: for a label α, this move assigns an arbitrary set of pixels the label α. Our second algorithm, which requires the smoothness term to be a metric, generates a labeling such that there is no expansion move that decreases the energy. Moreover, this solution is within a known factor of the global minimum. We experimentally demonstrate the effectiveness of our approach on image restoration, stereo and motion.

3,199 citations


"A layered stereo matching algorithm..." refers methods in this paper

  • ...Boykov et al. (2001) present an efficient greedy algorithm based on graph cuts....

    [...]

  • ...(Hirschmüller et al., 2002; Mühlmann et al., 2002). In our work, too, we take advantage of the efficient incremental computation of the DSI in the generation of the initial disparity map. Furthermore, we use different window sizes starting with smaller windows in order to preserve fine image details wherever possible. Disparity estimates for less-textured regions are then derived by using larger window sizes. Cooperative approaches (Zitnick and Kanade, 2000; Zhang and Kambhamettu, 2002; Mayer, 2003) locally compute matching scores using match windows. Nevertheless, they show “global behaviour” by iteratively refining the correlation scores using the uniqueness and continuity constraints. Zhang and Kambhamettu (2002) take advantage of image segmentation in the calculation of the initial matching scores. Furthermore, they exploit the results of the segmentation in their choice of local support area, preventing the support area from overlapping a depth discontinuity. Similarly to Zhang and Kambhamettu (2002), we use the output of image segmentation to propagate reliable disparity information inside a segment....

    [...]


Proceedings ArticleDOI
18 Jun 2003
TL;DR: A method for acquiring high-complexity stereo image pairs with pixel-accurate correspondence information using structured light that does not require the calibration of the light sources and yields registered disparity maps between all pairs of cameras and illumination projectors.
Abstract: Progress in stereo algorithm performance is quickly outpacing the ability of existing stereo data sets to discriminate among the best-performing algorithms, motivating the need for more challenging scenes with accurate ground truth information. This paper describes a method for acquiring high-complexity stereo image pairs with pixel-accurate correspondence information using structured light. Unlike traditional range-sensing approaches, our method does not require the calibration of the light sources and yields registered disparity maps between all pairs of cameras and illumination projectors. We present new stereo data sets acquired with our method and demonstrate their suitability for stereo algorithm evaluation. Our results are available at http://www.middlebury.edu/stereo/.

1,840 citations


"A layered stereo matching algorithm..." refers methods in this paper

  • ...We further evaluated the proposed algorithm on a more complex scene using the Teddy test set taken from Scharstein and Szeliski (2003)....

    [...]

  • ...Furthermore, we present disparity maps for a more complex scene that was taken from Scharstein and Szeliski (2003) and for a self-recorded stereo pair....

    [...]

  • ...Furthermore, we applied our method to a more complex image pair taken from Scharstein and Szeliski (2003) and to self-recorded data....

    [...]

  • ...16c. Pixels for which the method of Scharstein and Szeliski (2003) fails to produce the ground truth are coloured black....

    [...]