Proceedings ArticleDOI

Image-based rendering using image-based priors

01 Jan 2003 - pp 1176-1183
TL;DR: The paper's second contribution is to constrain the generated views to lie in the space of images whose texture statistics are those of the input images, which amounts to an image-based prior on the reconstruction which regularizes the solution, yielding realistic synthetic views.
Abstract: Given a set of images acquired from known viewpoints, we describe a method for synthesizing the image which would be seen from a new viewpoint. In contrast to existing techniques, which explicitly reconstruct the 3D geometry of the scene, we transform the problem to the reconstruction of colour rather than depth. This retains the benefits of geometric constraints, but projects out the ambiguities in depth estimation which occur in textureless regions. On the other hand, regularization is still needed in order to generate high-quality images. The paper's second contribution is to constrain the generated views to lie in the space of images whose texture statistics are those of the input images. This amounts to an image-based prior on the reconstruction which regularizes the solution, yielding realistic synthetic views. Examples are given of new view generation for cameras interpolated between the acquisition viewpoints - which enables synthetic steadicam stabilization of a sequence with a high level of realism.
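
To make the prior concrete: the sketch below scores a candidate view by summing, over its patches, the squared distance to the nearest patch harvested from the input images, so that views whose local texture statistics match the inputs receive low energy. This is a minimal illustration under stated assumptions (grayscale NumPy images, a fixed patch grid, hypothetical function names), not the paper's implementation, which couples such a prior with a photoconsistency term in a single objective.

```python
import numpy as np

def extract_patch_library(images, size=5, stride=2):
    """Harvest size x size patches from the input (grayscale) images."""
    patches = []
    for img in images:
        H, W = img.shape
        for r in range(0, H - size + 1, stride):
            for c in range(0, W - size + 1, stride):
                patches.append(img[r:r + size, c:c + size].ravel())
    return np.stack(patches)

def texture_prior_energy(view, library, size=5):
    """Energy of a synthesized view under the patch-library prior: the sum,
    over non-overlapping view patches, of the squared distance to the
    nearest library patch. Lower energy means 'more like the inputs'."""
    energy = 0.0
    H, W = view.shape
    for r in range(0, H - size + 1, size):
        for c in range(0, W - size + 1, size):
            p = view[r:r + size, c:c + size].ravel()
            energy += np.min(np.sum((library - p) ** 2, axis=1))
    return energy
```
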
Citations
Journal ArticleDOI
27 Jul 2009
TL;DR: This paper presents interactive image editing tools using a new randomized algorithm for quickly finding approximate nearest-neighbor matches between image patches, and proposes additional intuitive constraints on the synthesis process that offer the user a level of control unavailable in previous methods.
Abstract: This paper presents interactive image editing tools using a new randomized algorithm for quickly finding approximate nearest-neighbor matches between image patches. Previous research in graphics and vision has leveraged such nearest-neighbor searches to provide a variety of high-level digital image editing tools. However, the cost of computing a field of such matches for an entire image has eluded previous efforts to provide interactive performance. Our algorithm offers substantial performance improvements over the previous state of the art (20-100x), enabling its use in interactive editing tools. The key insights driving the algorithm are that some good patch matches can be found via random sampling, and that natural coherence in the imagery allows us to propagate such matches quickly to surrounding areas. We offer theoretical analysis of the convergence properties of the algorithm, as well as empirical and practical evidence for its high quality and performance. This one simple algorithm forms the basis for a variety of tools -- image retargeting, completion and reshuffling -- that can be used together in the context of a high-level image editing application. Finally, we propose additional intuitive constraints on the synthesis process that offer the user a level of control unavailable in previous methods.

2,888 citations
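
A compact sketch of the PatchMatch idea summarized above (random initialization, propagation of good matches to already-scanned neighbours, then random search over shrinking windows) might look as follows. This is an unoptimized NumPy illustration with hypothetical names, assuming grayscale images; it is not the authors' interactive-rate implementation.

```python
import numpy as np

def patch_ssd(A, B, ay, ax, by, bx, p):
    """Sum of squared differences between two p x p patches."""
    return np.sum((A[ay:ay + p, ax:ax + p] - B[by:by + p, bx:bx + p]) ** 2)

def patchmatch(A, B, p=7, iters=5, rng=None):
    """Approximate nearest-neighbour field from patches of A to patches of B."""
    rng = rng or np.random.default_rng(0)
    Ha, Wa = A.shape[0] - p + 1, A.shape[1] - p + 1
    Hb, Wb = B.shape[0] - p + 1, B.shape[1] - p + 1
    # 1) random initialization of the field
    nnf = np.stack([rng.integers(0, Hb, (Ha, Wa)),
                    rng.integers(0, Wb, (Ha, Wa))], axis=-1)
    cost = np.array([[patch_ssd(A, B, y, x, *nnf[y, x], p)
                      for x in range(Wa)] for y in range(Ha)], dtype=float)
    for it in range(iters):
        d = 1 if it % 2 == 0 else -1          # alternate scan direction
        ys = range(Ha) if d == 1 else range(Ha - 1, -1, -1)
        xs = range(Wa) if d == 1 else range(Wa - 1, -1, -1)
        for y in ys:
            for x in xs:
                # 2) propagation: shift already-scanned neighbours' matches
                for ny, nx, dy, dx in ((y - d, x, d, 0), (y, x - d, 0, d)):
                    if 0 <= ny < Ha and 0 <= nx < Wa:
                        by, bx = nnf[ny, nx][0] + dy, nnf[ny, nx][1] + dx
                        if 0 <= by < Hb and 0 <= bx < Wb:
                            c = patch_ssd(A, B, y, x, by, bx, p)
                            if c < cost[y, x]:
                                nnf[y, x], cost[y, x] = (by, bx), c
                # 3) random search in windows of shrinking radius
                w = max(Hb, Wb)
                while w >= 1:
                    by = rng.integers(max(0, nnf[y, x][0] - w),
                                      min(Hb, nnf[y, x][0] + w + 1))
                    bx = rng.integers(max(0, nnf[y, x][1] - w),
                                      min(Wb, nnf[y, x][1] + w + 1))
                    c = patch_ssd(A, B, y, x, by, bx, p)
                    if c < cost[y, x]:
                        nnf[y, x], cost[y, x] = (by, bx), c
                    w //= 2
    return nnf
```
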

Journal ArticleDOI
01 Aug 2004
TL;DR: This paper shows how high-quality video-based rendering of dynamic scenes can be accomplished using multiple synchronized video streams combined with novel image-based modeling and rendering algorithms, and develops a novel temporal two-layer compressed representation that handles matting.
Abstract: The ability to interactively control viewpoint while watching a video is an exciting application of image-based rendering. The goal of our work is to render dynamic scenes with interactive viewpoint control using a relatively small number of video cameras. In this paper, we show how high-quality video-based rendering of dynamic scenes can be accomplished using multiple synchronized video streams combined with novel image-based modeling and rendering algorithms. Once these video streams have been processed, we can synthesize any intermediate view between cameras at any time, with the potential for space-time manipulation. In our approach, we first use a novel color segmentation-based stereo algorithm to generate high-quality photoconsistent correspondences across all camera views. Mattes for areas near depth discontinuities are then automatically extracted to reduce artifacts during view synthesis. Finally, a novel temporal two-layer compressed representation that handles matting is developed for rendering at interactive rates.

1,677 citations


Cites methods from "Image-based rendering using image-b..."

  • ...A method to avoid such visible artifacts is to use image priors [Fitzgibbon et al. 2003], but it is not clear if such a technique can be used for real-time rendering....

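As a rough illustration of the last stage of such a view-interpolation pipeline, the sketch below cross-blends two source images that have already been warped into the novel viewpoint, weighting each by the novel camera's proximity to its source and filling holes in one warp from the other. The names, validity masks, and linear weighting are illustrative assumptions; they do not reproduce the paper's two-layer matting representation.

```python
import numpy as np

def blend_warped_views(warp_l, warp_r, valid_l, valid_r, t):
    """Cross-blend two H x W x 3 source views already warped into the novel
    viewpoint. t in [0, 1] is the novel camera's normalized position between
    the left and right source cameras; valid_* are H x W masks marking
    pixels the corresponding warp actually covered."""
    w_l = (1.0 - t) * valid_l.astype(float)
    w_r = t * valid_r.astype(float)
    total = np.maximum(w_l + w_r, 1e-8)  # avoid dividing by zero in holes
    return (w_l[..., None] * warp_l + w_r[..., None] * warp_r) / total[..., None]
```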

Proceedings ArticleDOI
20 Jun 2005
TL;DR: A framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks, developed using a Products-of-Experts framework.
Abstract: We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks. The approach extends traditional Markov random field (MRF) models by learning potential functions over extended pixel neighborhoods. Field potentials are modeled using a Products-of-Experts framework that exploits nonlinear functions of many linear filter responses. In contrast to previous MRF approaches all parameters, including the linear filters themselves, are learned from training data. We demonstrate the capabilities of this Field of Experts model with two example applications, image denoising and image inpainting, which are implemented using a simple, approximate inference scheme. While the model is trained on a generic image database and is not tuned toward a specific application, we obtain results that compete with and even outperform specialized techniques.

1,167 citations
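
To make the model concrete, here is a minimal sketch of evaluating a Field-of-Experts energy with Student-t experts, the form commonly used in this line of work: each learned linear filter is applied at every clique and the robust log penalties are summed. The function name is hypothetical, and the filters and weights are assumed to come from a trained model. Denoising or inpainting then amounts to descending the gradient of this energy plus a data term, which is essentially the simple approximate inference scheme the abstract refers to.

```python
import numpy as np
from scipy.signal import convolve2d

def foe_energy(x, filters, alphas):
    """Field-of-Experts energy with Student-t experts:
        E(x) = sum_i alpha_i * sum_c log(1 + 0.5 * (J_i * x)_c^2),
    where J_i are learned linear filters and c ranges over all valid
    filter placements (the extended cliques of the MRF)."""
    E = 0.0
    for J, a in zip(filters, alphas):
        r = convolve2d(x, J, mode="valid")  # response of J on every clique
        E += a * np.sum(np.log1p(0.5 * r ** 2))
    return E
```
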

Journal ArticleDOI
TL;DR: The approach provides a practical method for learning high-order Markov random field models with potential functions that extend over large pixel neighborhoods with non-linear functions of many linear filter responses.
Abstract: We develop a framework for learning generic, expressive image priors that capture the statistics of natural scenes and can be used for a variety of machine vision tasks. The approach provides a practical method for learning high-order Markov random field (MRF) models with potential functions that extend over large pixel neighborhoods. These clique potentials are modeled using the Product-of-Experts framework that uses non-linear functions of many linear filter responses. In contrast to previous MRF approaches all parameters, including the linear filters themselves, are learned from training data. We demonstrate the capabilities of this Field-of-Experts model with two example applications, image denoising and image inpainting, which are implemented using a simple, approximate inference scheme. While the model is trained on a generic image database and is not tuned toward a specific application, we obtain results that compete with specialized techniques.

848 citations

Posted Content
TL;DR: Empirical evaluation demonstrates the effectiveness of the unsupervised learning framework: monocular depth estimation performs comparably with supervised methods that use either ground-truth pose or depth for training, and pose estimation performs favorably compared to established SLAM systems under comparable input settings.
Abstract: We present an unsupervised learning framework for the task of monocular depth and camera motion estimation from unstructured video sequences. We achieve this by simultaneously training depth and camera pose estimation networks using the task of view synthesis as the supervisory signal. The networks are thus coupled via the view synthesis objective during training, but can be applied independently at test time. Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach: 1) monocular depth performing comparably with supervised methods that use either ground-truth pose or depth for training, and 2) pose estimation performing favorably compared to established SLAM systems under comparable input settings.

717 citations
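
The supervisory signal described above can be sketched as a photometric reconstruction loss: warp the source frame into the target view using the predicted depth and relative pose, then compare intensities. The NumPy sketch below is an illustration under stated assumptions (hypothetical names, grayscale images, points projecting in front of the camera, and nearest-neighbour sampling in place of the differentiable bilinear sampling used for training), not the authors' code.

```python
import numpy as np

def inverse_warp(src, depth, K, T):
    """Warp a source image into the target view using the target view's
    predicted depth map and the relative pose T (4x4, target -> source
    camera coordinates). K is the 3x3 camera intrinsics matrix."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], 0).reshape(3, -1)  # homogeneous
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)           # backproject
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    proj = K @ (T @ cam_h)[:3]                                    # into source
    u = (proj[0] / proj[2]).reshape(H, W)
    v = (proj[1] / proj[2]).reshape(H, W)
    # nearest-neighbour sampling for brevity (bilinear in practice)
    ui = np.clip(np.round(u).astype(int), 0, W - 1)
    vi = np.clip(np.round(v).astype(int), 0, H - 1)
    return src[vi, ui]

def photometric_loss(target, src, depth, K, T):
    """Mean absolute photometric error of the synthesized target view."""
    return np.mean(np.abs(target - inverse_warp(src, depth, K, T)))
```
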

References
Journal ArticleDOI
TL;DR: A taxonomy of dense two-frame stereo methods is presented, together with a stand-alone, flexible C++ implementation that enables the evaluation of individual components and can easily be extended to include new algorithms.
Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.

7,458 citations


"Image-based rendering using image-b..." refers background in this paper

  • ...of stereo reconstructions [11, 19, 20], volumetric techniques such as space carving [3, 13, 15, 21, 26], and other volumetric approaches [24]....

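As a concrete instance of the "local" methods this taxonomy organizes into stages (matching cost, cost aggregation, disparity selection), here is a minimal winner-take-all SSD block matcher. Names and parameters are hypothetical, and float-valued rectified grayscale images are assumed.

```python
import numpy as np
from scipy.signal import convolve2d

def ssd_disparity(left, right, max_disp=32, win=5):
    """Winner-take-all block matching: per-pixel SSD matching cost,
    box-filter aggregation over a win x win window, then argmin over
    candidate disparities."""
    H, W = left.shape
    box = np.ones((win, win))
    cost = np.full((max_disp, H, W), np.inf)
    for d in range(max_disp):
        sq = np.zeros((H, W))
        sq[:, d:] = (left[:, d:] - right[:, :W - d]) ** 2   # matching cost
        agg = convolve2d(sq, box, mode="same")              # aggregation
        agg[:, :d] = np.inf                                 # no match possible
        cost[d] = agg
    return np.argmin(cost, axis=0)                          # WTA selection
```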

Journal ArticleDOI
Julian Besag1
TL;DR: Assuming the local characteristics of the true scene can be represented by a non-degenerate Markov random field (MRF), a simple iterative method of reconstruction is proposed that avoids both the enormous computational burden of exact Bayesian estimation and the undesirable large-scale properties of the random field.
Abstract: A continuous two-dimensional region is partitioned into a fine rectangular array of sites or "pixels", each pixel having a particular "colour" belonging to a prescribed finite set. The true colouring of the region is unknown but, associated with each pixel, there is a possibly multivariate record which conveys imperfect information about its colour according to a known statistical model. The aim is to reconstruct the true scene, with the additional knowledge that pixels close together tend to have the same or similar colours. In this paper, it is assumed that the local characteristics of the true scene can be represented by a non-degenerate Markov random field. Such information can be combined with the records by Bayes' theorem and the true scene can be estimated according to standard criteria. However, the computational burden is enormous and the reconstruction may reflect undesirable large-scale properties of the random field. Thus, a simple, iterative method of reconstruction is proposed, which does not depend on these large-scale characteristics. The method is illustrated by computer simulations in which the original scene is not directly related to the assumed random field. Some complications, including parameter estimation, are discussed. Potential applications are mentioned briefly.

4,490 citations


"Image-based rendering using image-b..." refers methods in this paper

  • ...For this work, we have implemented a variant of the iterated conditional modes (ICM) algorithm (Besag, 1986), alternately optimizing the photoconsistency and texture priors....

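A minimal sketch of ICM itself, of which the citing paper's optimizer is a variant, is shown below for image restoration: visit each pixel in turn and give it the label minimizing a local energy, then repeat. The Gaussian data term, the Potts-style smoothness penalty, the label set, and beta are illustrative assumptions rather than Besag's exact formulation.

```python
import numpy as np

def icm_restore(noisy, labels, beta=1.5, iters=5):
    """Iterated conditional modes: greedily update each pixel to the label
    minimizing (data cost) + beta * (number of disagreeing 4-neighbours)."""
    labels = np.asarray(labels, dtype=float)
    # initialize at the maximum-likelihood (data-only) labelling
    est = labels[np.argmin((noisy[..., None] - labels) ** 2, axis=-1)]
    H, W = est.shape
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                nbrs = [est[ny, nx]
                        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                        if 0 <= ny < H and 0 <= nx < W]
                costs = [(noisy[y, x] - l) ** 2 +
                         beta * sum(n != l for n in nbrs) for l in labels]
                est[y, x] = labels[int(np.argmin(costs))]
    return est
```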

Proceedings ArticleDOI
01 Aug 1996
TL;DR: This paper describes a sampled representation for light fields that allows for both efficient creation and display of inward and outward looking views, and describes a compression system that is able to compress the light fields generated by more than a factor of 100:1 with very little loss of fidelity.
Abstract: A number of techniques have been proposed for flying through scenes by redisplaying previously rendered or digitized views. Techniques have also been proposed for interpolating between views by warping input images, using depth information or correspondences between multiple images. In this paper, we describe a simple and robust method for generating new views from arbitrary camera positions without depth information or feature matching, simply by combining and resampling the available images. The key to this technique lies in interpreting the input images as 2D slices of a 4D function, the light field. This function completely characterizes the flow of light through unobstructed space in a static scene with fixed illumination. We describe a sampled representation for light fields that allows for both efficient creation and display of inward and outward looking views. We have created light fields from large arrays of both rendered and digitized images. The latter are acquired using a video camera mounted on a computer-controlled gantry. Once a light field has been created, new views may be constructed in real time by extracting slices in appropriate directions. Since the success of the method depends on having a high sample rate, we describe a compression system that is able to compress the light fields we have generated by more than a factor of 100:1 with very little loss of fidelity. We also address the issues of antialiasing during creation, and resampling during slice extraction. CR Categories: I.3.2 [Computer Graphics]: Picture/Image Generation — Digitizing and scanning, Viewing algorithms; I.4.2 [Computer Graphics]: Compression — Approximate methods Additional keywords: image-based rendering, light field, holographic stereogram, vector quantization, epipolar analysis

4,426 citations


"Image-based rendering using image-b..." refers methods in this paper

  • ...Implicit-geometry techniques (Gortler et al., 1996; Levoy and Hanrahan, 1996; Matusik et al., 2002; McMillan and Bishop, 1995) assemble the pixels of the synthesized view from the rays sampled by the pixels of the input images....

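In the simplest case, "extracting slices" reduces to interpolating between the nearest captured views on the camera plane. The sketch below blends the four nearest views for a fractional camera position (s, t); it assumes a densely sampled two-plane light field stored as an array L[s, t, u, v], keeps (s, t) inside the camera grid, and omits the u-v interpolation and antialiasing the paper discusses.

```python
import numpy as np

def render_st(L, s, t):
    """Render the view at fractional camera position (s, t) of a two-plane
    light field L[s, t, u, v] by bilinearly blending the four nearest
    captured views (quadrilinear interpolation restricted to the camera
    plane)."""
    s0, t0 = int(np.floor(s)), int(np.floor(t))
    s1 = min(s0 + 1, L.shape[0] - 1)
    t1 = min(t0 + 1, L.shape[1] - 1)
    fs, ft = s - s0, t - t0
    return ((1 - fs) * (1 - ft) * L[s0, t0] + fs * (1 - ft) * L[s1, t0] +
            (1 - fs) * ft * L[s0, t1] + fs * ft * L[s1, t1])
```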

Proceedings ArticleDOI
01 Aug 1996
TL;DR: A new method for capturing the complete appearance of both synthetic and real world objects and scenes, representing this information, and then using this representation to render images of the object from new camera positions.
Abstract: This paper discusses a new method for capturing the complete appearance of both synthetic and real world objects and scenes, representing this information, and then using this representation to render images of the object from new camera positions. Unlike the shape capture process traditionally used in computer vision and the rendering process traditionally used in computer graphics, our approach does not rely on geometric representations. Instead we sample and reconstruct a 4D function, which we call a Lumigraph. The Lumigraph is a subset of the complete plenoptic function that describes the flow of light at all positions in all directions. With the Lumigraph, new images of the object can be generated very quickly, independent of the geometric or illumination complexity of the scene or object. The paper discusses a complete working system including the capture of samples, the construction of the Lumigraph, and the subsequent rendering of images from this new representation.

2,986 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: A non-parametric method for texture synthesis that aims at preserving as much local structure as possible and produces good results for a wide variety of synthetic and real-world textures.
Abstract: A non-parametric method for texture synthesis is proposed. The texture synthesis process grows a new image outward from an initial seed, one pixel at a time. A Markov random field model is assumed, and the conditional distribution of a pixel given all its neighbors synthesized so far is estimated by querying the sample image and finding all similar neighborhoods. The degree of randomness is controlled by a single perceptually intuitive parameter. The method aims at preserving as much local structure as possible and produces good results for a wide variety of synthetic and real-world textures.

2,972 citations


"Image-based rendering using image-b..." refers background or methods in this paper

  • ...Our texture representation, as a library of exemplar image patches, derives from this and from the recent texture synthesis literature (Efros and Leung, 1999; Wei and Levoy, 2000)....


  • ...As the form of p_texture is typically very difficult to represent analytically (Huang and Mumford, 1999), we follow (Efros and Leung, 1999; Freeman and Pasztor, 1999) and represent our texture prior as a library of texture patches....

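The core of such a non-parametric sampler can be sketched in a few lines: for one unfilled pixel, compare its partially known neighbourhood against every neighbourhood in the sample texture and draw a value from the near-best matches. Names are hypothetical, grayscale images are assumed, and the plain SSD here stands in for the Gaussian-weighted SSD of Efros and Leung.

```python
import numpy as np

def synthesize_pixel(sample, window, mask, eps=0.1, rng=None):
    """One Efros-Leung step: given the (2k+1) x (2k+1) window around an
    unfilled pixel and a mask marking which of its neighbours are already
    synthesized, sample a value from the matching sample neighbourhoods."""
    rng = rng or np.random.default_rng(0)
    k = window.shape[0] // 2
    H, W = sample.shape
    dists, values = [], []
    for y in range(k, H - k):
        for x in range(k, W - k):
            nb = sample[y - k:y + k + 1, x - k:x + k + 1]
            dists.append(np.sum(mask * (nb - window) ** 2))  # SSD, known only
            values.append(sample[y, x])
    dists = np.asarray(dists)
    ok = dists <= dists.min() * (1 + eps)   # keep near-best candidates
    return rng.choice(np.asarray(values)[ok])
```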