Topic

View synthesis

About: View synthesis is a research topic. Over its lifetime, 1,701 publications have been published within this topic, receiving 42,333 citations.


Papers
Proceedings ArticleDOI
01 Jan 2003
TL;DR: A multiple-view layered representation for tracking and segmenting multiple objects is proposed: a MAP solution estimates layer parameters that are consistent across views, and a persistent representation of occupancy is maintained despite occlusion without enforcing a particular parametric shape model.
Abstract: We propose a multiple view layered representation for tracking and segmentation of multiple objects in a scene. Existing layered approaches are dominated by the single view case and generally exploit only motion cues. We extend this to integrate static, dynamic and structural cues over a pair of views. The goal is to update coherent correspondence information sequentially, producing a multi-object tracker as a natural byproduct. We formulate a MAP solution for estimating layer parameters which are consistent across views, with the EM algorithm used to determine both the hidden segmentation labelling and motion parameters. A persistent representation of occupancy is maintained in spite of occlusion without enforcing a particular parametric shape model. An immediate application is dynamic novel view synthesis, for which our layered approach offers a direct and convenient representation.

23 citations
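The abstract above alternates an E-step (inferring the hidden segmentation labelling) with an M-step (re-estimating layer motion parameters). As a rough, generic illustration of that EM structure only, not the paper's actual two-view model with static, dynamic and structural cues, the following Python sketch soft-assigns pixels to motion layers from per-pixel flow vectors; the function name and the Gaussian motion likelihood are assumptions made here for illustration.

```python
# Toy EM sketch for layered motion segmentation (illustrative only; the
# paper's actual model integrates cues across two views and maintains a
# persistent occupancy representation, which is not reproduced here).
import numpy as np

def em_layers(flow, n_layers=2, n_iters=20, sigma=1.0):
    """flow: (N, 2) per-pixel motion vectors; returns soft labels and layer motions."""
    rng = np.random.default_rng(0)
    motions = flow[rng.choice(len(flow), n_layers, replace=False)]  # init layer motions
    for _ in range(n_iters):
        # E-step: responsibility of each layer for each pixel (Gaussian motion likelihood)
        d2 = ((flow[:, None, :] - motions[None, :, :]) ** 2).sum(-1)
        resp = np.exp(-d2 / (2 * sigma ** 2))
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate each layer's motion as a responsibility-weighted mean
        motions = (resp[:, :, None] * flow[:, None, :]).sum(0) / (resp.sum(0)[:, None] + 1e-12)
    return resp, motions
```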

Book ChapterDOI
01 Jan 2003
TL;DR: Automatic recovery of camera motion and scene structure from video sequences has been a staple of computer vision research for over a decade and now represents one of the success stories of computer vision.
Abstract: The goal of automatic recovery of camera motion and scene structure from video sequences has been a staple of computer vision research for over a decade. As an area of endeavour, it has seen both steady and explosive progress over time, and now represents one of the success stories of computer vision. This task, automatic camera tracking or “matchmoving”, is the sine qua non of modern special effects, allowing the seamless insertion of computer generated objects onto live-action backgrounds (figure 2.1 shows an example). It has moved from a research problem for a small number of uncalibrated images to commercial software which can automatically track cameras through thousands of frames [1]. In addition, camera tracking is an important preprocess for many computer vision algorithms such as multiple-view shape reconstruction, novel view synthesis and autonomous vehicle navigation.

23 citations

Journal ArticleDOI
TL;DR: This paper considers the two DIBR algorithms used in the Moving Picture Experts Group view synthesis reference software, and develops a scheme for the encoder to estimate the distortion of the synthesized virtual view at the decoder when the reference texture and depth sequences experience transmission errors such as packet loss.
Abstract: Depth-image-based rendering (DIBR) is frequently used in multiview video applications such as free-viewpoint television. In this paper, we consider the two DIBR algorithms used in the Moving Picture Experts Group view synthesis reference software, and develop a scheme for the encoder to estimate the distortion of the synthesized virtual view at the decoder when the reference texture and depth sequences experience transmission errors such as packet loss. We first develop a graphical model to analyze how random errors in the reference depth image affect the synthesized virtual view. The warping competition rule adopted in the DIBR algorithms is explicitly represented by the graphical model. We then consider the case where packet loss occurs to both the encoded texture and depth images during transmission and develop a recursive optimal distribution estimation (RODE) method to calculate the per-pixel texture and depth probability distributions in each frame of the reference views. The RODE is then integrated with the graphical model method to estimate the distortion in the synthesized view caused by packet loss. Experimental results verify the accuracy of the graphical model method, the RODE, and the combined estimation scheme.

23 citations
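The RODE method above tracks per-pixel probability distributions of the texture and depth references under packet loss. As a much-simplified sketch in the same spirit (a ROPE-style first/second-moment recursion, not the paper's actual RODE, and without the DIBR warping-competition graphical model), the following Python illustrates how an encoder might estimate expected decoder-side distortion when a lost region is concealed from the previous frame; the function name and the concealment model are assumptions for illustration.

```python
# Simplified first/second-moment recursion for expected decoder distortion
# under packet loss (ROPE-style illustration; the paper's RODE additionally
# tracks full per-pixel texture and depth distributions and feeds them into
# a DIBR warping model, which is omitted here).
import numpy as np

def expected_distortion(frames, p_loss):
    """frames: list of 2-D arrays of original pixel values (intra-coded for simplicity).
    Concealment model: a lost region is replaced by the previous decoded pixel."""
    e1 = np.zeros_like(frames[0], dtype=float)   # E[decoded value]
    e2 = np.zeros_like(frames[0], dtype=float)   # E[decoded value^2]
    dist = []
    for x in frames:
        x = x.astype(float)
        # received with prob (1 - p): decoder shows x; lost with prob p: previous pixel
        e1, e2 = (1 - p_loss) * x + p_loss * e1, (1 - p_loss) * x ** 2 + p_loss * e2
        dist.append(np.mean(x ** 2 - 2 * x * e1 + e2))  # E[(x - decoded)^2] per frame
    return dist
```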

Proceedings ArticleDOI
TL;DR: A view selection method inspired by plenoptic sampling, followed by transform-based view coding and view synthesis prediction to code residual views, is introduced; the scheme has improved rate-distortion performance and better preserves the structure of the perceived light fields.
Abstract: Full parallax light field displays require high pixel density and huge amounts of data. Compression is a necessary tool used by 3D display systems to cope with the high bandwidth requirements. One of the formats adopted by MPEG for 3D video coding standards is the use of multiple views with associated depth maps. Depth maps enable the coding of a reduced number of views, and are used by compression and synthesis software to reconstruct the light field. However, most of the developed coding and synthesis tools target linearly arranged cameras with small baselines. Here we propose to use the 3D video coding format for full parallax light field coding. We introduce a view selection method inspired by plenoptic sampling followed by transform-based view coding and view synthesis prediction to code residual views. We determine the minimal requirements for view sub-sampling and present the rate-distortion performance of our proposal. We also compare our method with established video compression techniques, such as H.264/AVC, H.264/MVC, and the new 3D video coding algorithm, 3DV-ATM. Our results show that our method not only has improved rate-distortion performance but also better preserves the structure of the perceived light fields.

23 citations
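The pipeline above codes a sub-sampled set of anchor views and predicts the remaining views through view synthesis. A minimal sketch of the anchor-selection step on a full-parallax camera grid is shown below, assuming a simple regular stride; the actual sub-sampling interval in the paper is derived from plenoptic sampling theory, and the residual-view prediction is not reproduced here.

```python
# Minimal sketch of anchor-view selection on a full-parallax camera grid
# (illustrative; the paper derives the sub-sampling interval from plenoptic
# sampling and codes residual views with view synthesis prediction).
import numpy as np

def select_anchor_views(rows, cols, step):
    """Return a boolean mask of shape (rows, cols): True = anchor view coded directly,
    False = residual view to be predicted from synthesized neighbours."""
    mask = np.zeros((rows, cols), dtype=bool)
    mask[::step, ::step] = True
    return mask

anchors = select_anchor_views(rows=9, cols=9, step=4)
print(anchors.sum(), "anchor views out of", anchors.size)   # 9 anchor views out of 81
```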

Proceedings Article
01 Jan 2021
TL;DR: MINE predicts a 4-channel image (RGB and volume density) at arbitrary depth values to jointly reconstruct the camera frustum and fill in occluded contents, which can then be easily rendered into novel RGB or depth views using differentiable rendering.
Abstract: In this paper, we propose MINE to perform novel view synthesis and depth estimation via dense 3D reconstruction from a single image. Our approach is a continuous depth generalization of the Multiplane Images (MPI) by introducing the NEural radiance fields (NeRF). Given a single image as input, MINE predicts a 4-channel image (RGB and volume density) at arbitrary depth values to jointly reconstruct the camera frustum and fill in occluded contents. The reconstructed and inpainted frustum can then be easily rendered into novel RGB or depth views using differentiable rendering. Extensive experiments on RealEstate10K, KITTI and Flowers Light Fields show that our MINE outperforms state-of-the-art by a large margin in novel view synthesis. We also achieve competitive results in depth estimation on iBims-1 and NYU-v2 without annotated depth supervision. Our source code is available at this https URL

23 citations
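MINE predicts RGB and volume density at sampled depth values and renders novel views with differentiable rendering. The standard front-to-back compositing of such (RGB, density) planes, as used in MPI/NeRF-style renderers, looks roughly like the sketch below; this is the generic volume-rendering step only, not MINE's network or its homography warping, and the function name is made up for illustration.

```python
# Generic front-to-back compositing of fronto-parallel (RGB, density) planes,
# as in MPI / NeRF-style volume rendering (rendering step only; assumes at
# least two depth planes sorted from near to far).
import numpy as np

def composite_planes(rgb, sigma, depths):
    """rgb: (D, H, W, 3), sigma: (D, H, W) volume density, depths: (D,) sorted near->far.
    Returns the composited (H, W, 3) image."""
    deltas = np.diff(depths, append=depths[-1] + (depths[-1] - depths[-2]))
    alpha = 1.0 - np.exp(-sigma * deltas[:, None, None])      # per-plane opacity
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=0)           # transmittance after each plane
    trans = np.concatenate([np.ones_like(trans[:1]), trans[:-1]], axis=0)  # before each plane
    weights = alpha * trans                                    # contribution of each plane
    return (weights[..., None] * rgb).sum(axis=0)
```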


Network Information
Related Topics (5)
Image segmentation: 79.6K papers, 1.8M citations (86% related)
Feature (computer vision): 128.2K papers, 1.7M citations (86% related)
Object detection: 46.1K papers, 1.3M citations (85% related)
Convolutional neural network: 74.7K papers, 2M citations (85% related)
Feature extraction: 111.8K papers, 2.1M citations (84% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    54
2022    117
2021    189
2020    158
2019    114
2018    102