Proceedings ArticleDOI

View synthesis based on Conditional Random Fields and graph cuts

TL;DR: A novel method is proposed to synthesize intermediate views from two stereo images and disparity maps that is robust to errors in the disparity maps, together with an explicit probabilistic model that uses Conditional Random Fields and graph cuts to efficiently select the best candidate for each disoccluded pixel.
Abstract: We propose a novel method to synthesize intermediate views from two stereo images and disparity maps that is robust to errors in the disparity maps. The proposed method computes a placement matrix from each disparity map that can be used to correct errors when warping pixels from the reference view to the virtual view. The second contribution is a new hole filling method that uses depth, edge, and segmentation information to aid the process of filling disoccluded pixels. The proposed method selects pixels from segmented regions that are connected to the disoccluded region as candidates to fill the disoccluded pixels. We also provide an explicit probabilistic model to select the best candidate for each disoccluded pixel efficiently with Conditional Random Fields (CRFs) and graph cuts.
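The abstract above describes a two-stage pipeline: forward warping guided by the disparity maps, followed by filling of disoccluded pixels. As a rough illustration (not the paper's placement-matrix method or its CRF/graph-cut selection), the sketch below shows only the forward-warping and disocclusion-detection stage, assuming rectified views, per-pixel horizontal disparities, and a virtual camera at fraction alpha of the baseline.

    import numpy as np

    def warp_to_virtual(image, disparity, alpha=0.5):
        """Forward-warp a reference view toward a virtual view located at
        fraction `alpha` of the baseline.  Returns the warped image and a
        boolean mask of disoccluded (never written) pixels."""
        h, w = disparity.shape
        virtual = np.zeros_like(image)
        best_disp = np.full((h, w), -np.inf)      # z-buffer: keep the nearest surface
        filled = np.zeros((h, w), dtype=bool)

        for y in range(h):
            for x in range(w):
                d = float(disparity[y, x])
                xv = int(round(x - alpha * d))    # horizontal shift along the baseline
                if 0 <= xv < w and d > best_disp[y, xv]:
                    virtual[y, xv] = image[y, x]
                    best_disp[y, xv] = d
                    filled[y, xv] = True

        return virtual, ~filled                   # True in the mask marks disocclusions

The returned disocclusion mask is exactly where a hole-filling step, such as the paper's CRF-based candidate selection, has to supply colors.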
Citations
Proceedings ArticleDOI
16 May 2011
TL;DR: A virtual view synthesis approach based on depth-image-based rendering (DIBR) that proves effective and reliable in both subjective and objective evaluations.
Abstract: We propose an effective virtual view synthesis approach which utilizes depth-image-based rendering (DIBR). In our scheme, two reference color images and their associated depth maps are used to generate an arbitrary virtual viewpoint. First, the main and auxiliary viewpoint images are warped to the virtual viewpoint. After that, cracks and error points are removed to enhance image quality. Then, we fill the disocclusions of the virtual viewpoint image warped from the main viewpoint with the help of the auxiliary viewpoint. To reduce color discontinuity in the virtual view, the brightness of the two reference viewpoint images is adjusted. Finally, the remaining holes are filled by a depth-assisted asymmetric dilation inpainting method. Simulations show that the view synthesis approach is effective and reliable in both subjective and objective evaluations.
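As a rough illustration of the complementation step above (not the authors' implementation), the sketch below assumes both reference views have already been warped to the virtual viewpoint and that each warped view carries a validity mask; the crack removal, brightness adjustment, and dilation inpainting stages are omitted.

    import numpy as np

    def complement_with_auxiliary(main_warp, main_valid, aux_warp, aux_valid):
        """Fill disocclusions of the view warped from the main viewpoint with
        pixels warped from the auxiliary viewpoint.  The `*_valid` arrays are
        boolean masks marking pixels that received a value during warping."""
        merged = main_warp.copy()
        take_aux = ~main_valid & aux_valid        # holes in main that aux covers
        merged[take_aux] = aux_warp[take_aux]
        holes_left = ~main_valid & ~aux_valid     # remaining holes go to inpainting
        return merged, holes_left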

35 citations


Cites background from "View synthesis based on Conditional..."

  • ...Another critical problem is that disocclusion areas will appear in the virtual views, so how to fill the disocclusions is an active research aspect [4, 5]....

    [...]

  • ...In order to solve this problem, view synthesis has been studied extensively in many research laboratories and institutes [3, 4, 5, 6]....

    [...]

Journal ArticleDOI
TL;DR: A novel superpixel extraction algorithm using a higher-order energy optimization framework that generates better results, with well-aligned boundaries and homogeneous regions, than existing superpixel algorithms.
Abstract: A novel superpixel extraction algorithm using a higher-order energy optimization framework is proposed in this paper. We first adopt the k-means clustering technique to quickly obtain an initial superpixel result. Then a higher-order energy function is employed to optimize and refine these initial superpixels. We use a more general higher-order energy function that includes a first-order data term, a second-order smoothness term, and a higher-order term. Pre-segmentation results provide prior information about edges and segmented regions for our higher-order energy term. According to the texture measurement in different local regions, our algorithm adaptively computes the proper ratios of the different energy terms to obtain better superpixel performance. The experimental results demonstrate that our method using the higher-order energy generates better results, with well-aligned boundaries and homogeneous regions, than existing superpixel algorithms.
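For context, the k-means initialization mentioned above typically clusters pixels by color and position from seeds on a regular grid, in the spirit of SLIC. The sketch below is a simplified, assumption-laden version of that initialization only (the higher-order refinement is not shown), and it assigns every pixel against every center, which is practical only for small images.

    import numpy as np

    def init_superpixels(lab_image, num_superpixels, n_iter=5, compactness=10.0):
        """Grid-seeded k-means over (L, a, b, y, x) features producing an
        initial superpixel labeling.  `lab_image` is an (H, W, 3) CIELAB
        array; spatial coordinates are scaled so compactness is comparable
        to color distance."""
        h, w, _ = lab_image.shape
        step = max(1, int(np.sqrt(h * w / num_superpixels)))

        # per-pixel features: color plus scaled position
        yy, xx = np.mgrid[0:h, 0:w]
        feats = np.concatenate(
            [lab_image.reshape(-1, 3).astype(float),
             yy.reshape(-1, 1) * compactness / step,
             xx.reshape(-1, 1) * compactness / step], axis=1)

        # seed cluster centers on a regular grid
        sy, sx = np.mgrid[step // 2:h:step, step // 2:w:step]
        centers = feats[(sy * w + sx).ravel()].copy()

        labels = np.zeros(h * w, dtype=int)
        for _ in range(n_iter):
            # assign each pixel to its nearest center (full search)
            d2 = np.stack([((feats - c) ** 2).sum(axis=1) for c in centers], axis=1)
            labels = d2.argmin(axis=1)
            # recompute centers from their members
            for k in range(len(centers)):
                members = feats[labels == k]
                if len(members):
                    centers[k] = members.mean(axis=0)

        return labels.reshape(h, w)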

29 citations


Cites methods from "View synthesis based on Conditional..."

  • ...SUPERPIXELS [2] group together pixels of similar characteristics and are widely used in a variety of image processing algorithms such as segmentation [19], [20], [30], [31], [34], synthesis [8], [16], saliency detection [23], [25], [34], and tracking [4]....

    [...]

Proceedings ArticleDOI
22 May 2011
TL;DR: This paper presents a new method for view synthesis that is both fast and accurate and has applications in free-viewpoint television, angular scalability for 3D video coding/decoding, and stereo-to-multiview conversion.
Abstract: From a rectified stereo image pair, the task of view synthesis is to generate images from any viewpoint along the baseline. The main difficulty of the problem is how to fill occluded regions. In this paper, we present a new method for view synthesis that is both fast and accurate. Occlusions are filled using color and disparity information to produce consistent pixel estimates. Results are comparable to current state-of-the-art methods in terms of objective measures while computation time is drastically reduced. This work has applications in free-viewpoint television, angular scalability for 3D video coding/decoding, and stereo-to-multiview conversion.

27 citations


Cites background, methods, and results from "View synthesis based on Conditional..."

  • ...The PSNR value is only 1 dB lower than the result in [2], but has the same SSIM value....

    [...]

  • ...The proposed method gives comparable performance to that of [2] while providing a factor of 6 increase in speed....

    [...]

  • ...In comparing our results with those in [2], the same images and disparity maps (all from the Middlebury Database) and computing environment were used....

    [...]

  • ...The PSNR values in dB and the SSIM indices are given in Table 1, compared with those reported in [2]....

    [...]

  • ...The proposed method was implemented entirely in MATLAB, while the graph cuts and mean shift portion of the code in [2] was done in C++....

    [...]

Proceedings ArticleDOI
29 Oct 2014
TL;DR: Experimental results show that the proposed method significantly improves the inter-view consistency for multiview images and depth maps, compared to those of previous methods.
Abstract: This paper proposes a new inter-view consistent hole filling method in view extrapolation for multi-view image generation. In stereopsis, inter-view consistency regarding structure, color, and luminance is one of the crucial factors that affect the overall viewing quality of three-dimensional image contents. In particular, the inter-view inconsistency could induce visual stress on the human visual system. To ensure the inter-view consistency, the proposed method suggests a hole filling method in an order from the nearest to farthest view to the reference view by propagating the filled color information in the preceding view. In addition, a novel depth map filling method is incorporated to achieve the inter-view consistency. Experimental results show that the proposed method significantly improves the inter-view consistency for multiview images and depth maps, compared to those of previous methods.
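As a schematic of the filling order described above (not the authors' code), the sketch below processes the extrapolated views from the nearest to the farthest from the reference and hands each filled result to the next view as a prior; fill_one_view is a placeholder for any single-view hole-filling routine that can accept such a prior.

    def fill_views_consistently(views, hole_masks, fill_one_view):
        """Fill holes view by view, nearest to farthest from the reference,
        propagating the colors filled in the preceding view so later views
        stay consistent in structure, color, and luminance."""
        filled_prior = None
        results = []
        for view, holes in zip(views, hole_masks):   # ordered nearest -> farthest
            filled = fill_one_view(view, holes, prior=filled_prior)
            results.append(filled)
            filled_prior = filled                    # propagate to the next view
        return results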

27 citations


Additional excerpts

  • ...Index Terms— Inter-view consistency, view synthesis, multi-view image, hole filling....

    [...]

Journal ArticleDOI
TL;DR: A novel stereo view synthesis algorithm that is highly accurate with respect to inter-view consistency, enabling stereo content to be viewed on autostereoscopic displays, along with a simplified GPU-accelerated version of the approach implemented in CUDA.
Abstract: In this paper we present a novel stereo view synthesis algorithm that is highly accurate with respect to inter-view consistency, thus enabling stereo content to be viewed on autostereoscopic displays. The algorithm finds identical occluded regions within each virtual view and aligns them together to extract a surrounding background layer. The background layer for each occluded region is then used with an exemplar-based inpainting method to synthesize all virtual views simultaneously. Our algorithm requires the alignment and extraction of background layers for each occluded region; however, these two steps are done efficiently, with lower computational complexity than previous approaches based on exemplar-based inpainting. Thus, it is more efficient than existing algorithms that synthesize one virtual view at a time. This paper also describes a simplified GPU-accelerated version of the approach and its implementation in CUDA. Our CUDA method has sublinear complexity in terms of the number of views that need to be generated, which makes it especially useful for generating content for autostereoscopic displays that require many views to operate. An objective of our work is to allow the user to change depth and viewing perspective on the fly. Therefore, to further accelerate the CUDA variant of our approach, we present a modified version of our method that warps background pixels from the reference views to a middle view to recover the background. We then use an exemplar-based inpainting method to fill in the occluded regions. We warp the foreground from the reference images and the background from the filled regions to synthesize new virtual views on the fly. Our experimental results indicate that the simplified CUDA implementation decreases running time by orders of magnitude with negligible loss in quality.
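The accelerated variant sketched at the end of the abstract composites each virtual view from a warped foreground layer and a background layer that is recovered and inpainted once. Below is a hedged sketch of that composition step only; warp_layer is a placeholder for a DIBR warping routine, and splitting foreground from background by a disparity threshold is an assumption made for illustration.

    import numpy as np

    def layered_synthesis(reference, disparity, background_filled, alpha,
                          fg_threshold, warp_layer):
        """Composite a virtual view from (a) foreground pixels warped from the
        reference view and (b) a background layer that was already recovered
        and hole-filled (e.g. by exemplar-based inpainting).
        `warp_layer(image, disparity, alpha, mask)` is a placeholder that
        warps only the pixels selected by `mask` and returns the warped layer
        plus a validity mask in the target view."""
        fg_mask = disparity >= fg_threshold        # treat near objects as foreground
        fg_warp, fg_valid = warp_layer(reference, disparity, alpha, mask=fg_mask)

        out = background_filled.copy()             # hole-filled background underneath
        out[fg_valid] = fg_warp[fg_valid]          # warped foreground on top
        return out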

21 citations

References
Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and its promise is demonstrated through comparisons with both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
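For reference, the structural similarity index combines luminance, contrast, and structure comparisons into the closed form SSIM(x, y) = ((2*mu_x*mu_y + C1)(2*sigma_xy + C2)) / ((mu_x^2 + mu_y^2 + C1)(sigma_x^2 + sigma_y^2 + C2)). The snippet below is a simplified single-window version for grayscale images in [0, 255]; the published method evaluates this locally in an 11x11 Gaussian-weighted sliding window and averages the resulting map.

    import numpy as np

    def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
        """Single-window SSIM between two grayscale images with values in
        [0, 255].  The reference implementation applies the same formula in
        a sliding Gaussian window and averages the local results."""
        x = np.asarray(x, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        mx, my = x.mean(), y.mean()
        vx, vy = x.var(), y.var()
        cov = ((x - mx) * (y - my)).mean()
        return ((2 * mx * my + c1) * (2 * cov + c2)) / \
               ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))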

40,609 citations

Journal ArticleDOI
TL;DR: It is proved the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density.
Abstract: A general non-parametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure: the mean shift. For discrete data, we prove the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks - discontinuity-preserving smoothing and image segmentation - are described as applications. In these algorithms, the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.
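To make the recursive procedure concrete, here is a small sketch of mean shift with a flat kernel: starting from a point, repeatedly move to the mean of all samples within the bandwidth until the shift vanishes, which converges to a nearby mode of the density. For the smoothing and segmentation applications, the same iteration is run in the joint spatial-range (position plus color) domain.

    import numpy as np

    def mean_shift_mode(points, start, bandwidth, max_iter=100, tol=1e-5):
        """Flat-kernel mean shift: move the estimate to the mean of all
        samples within `bandwidth` until convergence to a density mode.
        `points` is an (N, D) array; `start` is a length-D starting point."""
        x = np.asarray(start, dtype=float)
        for _ in range(max_iter):
            dist = np.linalg.norm(points - x, axis=1)
            neighbors = points[dist <= bandwidth]
            if len(neighbors) == 0:                # no support: stop
                break
            shifted = neighbors.mean(axis=0)
            if np.linalg.norm(shifted - x) < tol:  # reached a stationary point
                return shifted
            x = shifted
        return x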

11,727 citations


"View synthesis based on Conditional..." refers methods in this paper

  • ...To fill the disoccluded regions, we use [6] to segment the refined virtual view....

    [...]

  • ...In the initial step, color segments in the left IL and right IR reference images are extracted by [6]....

    [...]

Journal ArticleDOI
TL;DR: This work presents two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves that allow important cases of discontinuity preserving energies.
Abstract: Many tasks in computer vision involve assigning a label (such as disparity) to every pixel. A common constraint is that the labels should vary smoothly almost everywhere while preserving sharp discontinuities that may exist, e.g., at object boundaries. These tasks are naturally stated in terms of energy minimization. The authors consider a wide class of energies with various smoothness constraints. Global minimization of these energy functions is NP-hard even in the simplest discontinuity-preserving case. Therefore, our focus is on efficient approximation algorithms. We present two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves. These moves can simultaneously change the labels of arbitrarily large sets of pixels. In contrast, many standard algorithms (including simulated annealing) use small moves where only one pixel changes its label at a time. Our expansion algorithm finds a labeling within a known factor of the global minimum, while our swap algorithm handles more general energy functions. Both of these algorithms allow important cases of discontinuity preserving energies. We experimentally demonstrate the effectiveness of our approach for image restoration, stereo and motion. On real data with ground truth, we achieve 98 percent accuracy.
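The expansion-move strategy can be summarized by its outer loop: for each label alpha, every pixel may either keep its current label or switch to alpha, and the best such move is found exactly with a single binary s-t min-cut. The sketch below shows only that outer loop; expansion_move stands in for the graph-cut subproblem and energy for the full data-plus-smoothness objective.

    def alpha_expansion(labels, label_set, energy, expansion_move):
        """Cycle over labels, applying the best expansion move for each, and
        repeat until no move lowers the energy (a local minimum with respect
        to expansion moves).  `expansion_move(labels, alpha)` returns the
        optimal one-label expansion, solved via a binary graph cut;
        `energy(labels)` evaluates the full labeling energy."""
        best = energy(labels)
        improved = True
        while improved:
            improved = False
            for alpha in label_set:
                candidate = expansion_move(labels, alpha)
                cand_energy = energy(candidate)
                if cand_energy < best:
                    labels, best = candidate, cand_energy
                    improved = True
        return labels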

7,413 citations


"View synthesis based on Conditional..." refers methods in this paper

  • ...We use the winner-take-all approach and a fast alpha-expansion graph cuts algorithm [7] to infer the final synthesized image....

    [...]

Proceedings ArticleDOI
TL;DR: Details of a system that allows for an evolutionary introduction of depth perception into the existing 2D digital TV framework are presented and compared with the classical approach of "stereoscopic" video.
Abstract: This paper presents details of a system that allows for an evolutionary introduction of depth perception into the existing 2D digital TV framework. The work is part of the European Information Society Technologies (IST) project “Advanced Three-Dimensional Television System Technologies” (ATTEST), an activity where industries, research centers and universities have joined forces to design a backwards-compatible, flexible and modular broadcast 3D-TV system. At the very heart of the described new concept is the generation and distribution of a novel data representation format, which consists of monoscopic color video and associated per-pixel depth information. From these data, one or more “virtual” views of a real-world scene can be synthesized in real-time at the receiver side (i.e., a 3D-TV set-top box) by means of so-called depth-image-based rendering (DIBR) techniques. This publication will provide: (1) a detailed description of the fundamentals of this new approach to 3D-TV; (2) a comparison with the classical approach of “stereoscopic” video; (3) a short introduction to DIBR techniques in general; (4) the development of a specific DIBR algorithm that can be used for the efficient generation of high-quality “virtual” stereoscopic views; (5) a number of implementation details that are specific to the current state of the development; (6) research on the backwards-compatible compression and transmission of 3D imagery using state-of-the-art MPEG (Moving Pictures Expert Group) tools.
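In DIBR, the per-pixel depth is first converted to a horizontal disparity before warping; for a parallel camera setup the relation is disparity = f * b / Z, with f the focal length in pixels, b the (virtual) baseline, and Z the depth. A minimal helper illustrating that conversion is shown below; the depth-map smoothing used as preprocessing and the warping itself are not included.

    import numpy as np

    def depth_to_disparity(depth, focal_length_px, baseline):
        """Convert per-pixel depth to horizontal disparity (in pixels) for a
        parallel stereo setup: disparity = focal_length * baseline / depth.
        `depth` must be in the same units as `baseline`; values are clamped
        away from zero to avoid division by zero."""
        depth = np.maximum(np.asarray(depth, dtype=float), 1e-6)
        return focal_length_px * baseline / depth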

1,560 citations


"View synthesis based on Conditional..." refers methods in this paper

  • ...In [1], a Gaussian filter is used to smooth the disparity map in the preprocessing step to eliminate disocclusion pixels....

    [...]

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A large number of stereo datasets with ground-truth disparities are constructed, a subset of which is used to learn the parameters of conditional random fields (CRFs); experimental results illustrate the potential of this approach for automatically learning the parameters of models with richer structure than standard hand-tuned MRF models.
Abstract: State-of-the-art stereo vision algorithms utilize color changes as important cues for object boundaries. Most methods impose heuristic restrictions or priors on disparities, for example by modulating local smoothness costs with intensity gradients. In this paper we seek to replace such heuristics with explicit probabilistic models of disparities and intensities learned from real images. We have constructed a large number of stereo datasets with ground-truth disparities, and we use a subset of these datasets to learn the parameters of conditional random fields (CRFs). We present experimental results illustrating the potential of our approach for automatically learning the parameters of models with richer structure than standard hand-tuned MRF models.
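The conditional random fields referred to here take the usual pairwise form over pixels p and neighboring pairs (p, q), with data terms on disparities and smoothness terms that may depend on local image content; the parameters of these terms are what is learned from the ground-truth data. Written generically (a standard formulation, not the paper's exact model):

    P(d \mid I) = \frac{1}{Z(I)} \exp\big(-E(d, I)\big), \qquad
    E(d, I) = \sum_{p} U_p(d_p; I) + \sum_{(p,q) \in \mathcal{N}} V_{pq}(d_p, d_q; I)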

893 citations


"View synthesis based on Conditional..." refers methods in this paper

  • ...In this work, we use a lattice structured CRF and graph cuts minimization framework proposed in [4, 5] for stereo vision....

    [...]

  • ...In our experiment, the images from the Middlebury data set [4] are used to evaluate the proposed method....

    [...]