Topic

View synthesis

About: View synthesis is a research topic. Over its lifetime, 1,701 publications have been published within this topic, receiving 42,333 citations.


Papers
Proceedings ArticleDOI
26 Sep 2001
TL;DR: This work addresses the application of computer vision to semi-immersive teleconferencing, and presents a prototype vision system synthesising a physically plausible video of a speaker to be displayed at a remote conferencing station.
Abstract: We address the application of computer vision to semi-immersive teleconferencing, and present a prototype vision system synthesising a physically plausible video of a speaker to be displayed at a remote conferencing station. The main system components are a hierarchical, efficient large-baseline disparity estimator and a view synthesis module. We illustrate and discuss results on a real-speaker sequence. We regard the development of such a system in the domain of advanced teleconferencing as the main contribution of this work.

18 citations
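The pipeline above hinges on a dense disparity map between the two camera views before any view is synthesised. As a rough illustration of that first stage only, here is a minimal brute-force block-matching sketch for a rectified grayscale pair; the function name, window size and SAD cost are illustrative assumptions and bear no relation to the paper's hierarchical large-baseline estimator.

```python
import numpy as np

def block_matching_disparity(left, right, max_disparity=64, window=5):
    """Minimal SAD block matching for a rectified grayscale stereo pair.

    Brute-force illustration only; assumes `left` and `right` are 2-D arrays
    of equal shape and that disparity is a leftward shift in the right image.
    """
    h, w = left.shape
    half = window // 2
    disparity = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.float32)
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disparity, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1].astype(np.float32)
                cost = np.abs(patch - cand).sum()  # sum of absolute differences
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity
```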

Proceedings ArticleDOI
01 Oct 2007
TL;DR: This paper presents an efficient image-based rendering system capable of performing online stereo matching and view synthesis at high speed, completely on the graphics processing unit (GPU).
Abstract: This paper presents an efficient image-based rendering system capable of performing online stereo matching and view synthesis at high speed, completely on the graphics processing unit (GPU). Given two rectified stereo images, our algorithm first extracts the disparity map with a stream-centric dense depth estimation approach. For high-quality view synthesis, multi-label masks are then automatically generated to adaptively post-process occlusions and ambiguously estimated regions. To allow even faster interactive view generation, an alternative forward warping method is also integrated. The experiments show that our algorithm yields photorealistic intermediate views of high image quality. The optimized implementation also provides state-of-the-art stereo analysis and view synthesis speed, achieving over 47 fps with 450×375 stereo images and 60 disparity levels on an Nvidia GeForce 7900 graphics card.

18 citations
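The abstract mentions an alternative forward warping method for fast intermediate-view generation. The following hedged sketch shows the basic idea on the CPU: each left-image pixel is shifted by a fraction of its disparity into the virtual view, with nearer surfaces (larger disparity) winning on collisions. Holes are simply left empty here, whereas the paper post-processes them with multi-label masks on the GPU; the function name and the `alpha` parameter are illustrative assumptions.

```python
import numpy as np

def forward_warp_intermediate(left, disparity, alpha=0.5):
    """Forward-warp the left image to a virtual view at fraction `alpha`
    of the baseline (0 = left view, 1 = right view).

    `left` may be grayscale (H, W) or color (H, W, 3); `disparity` is (H, W).
    Unfilled pixels remain zero (holes / disocclusions).
    """
    h, w = disparity.shape
    synth = np.zeros_like(left)
    depth_buf = np.full((h, w), -1.0)  # keep the largest disparity (nearest surface)
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            xt = int(round(x - alpha * d))  # shift pixel toward the virtual viewpoint
            if 0 <= xt < w and d > depth_buf[y, xt]:
                depth_buf[y, xt] = d
                synth[y, xt] = left[y, x]
    return synth
```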

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed depth view synthesis method provides high-quality depth images for the current view and the proposed VSP modes provide high coding gains, especially on the anchor frames.
Abstract: The view synthesis prediction (VSP) method exploits inter-view correlations by generating an additional reference frame in multiview video coding. This paper describes a multiview depth video coding scheme that incorporates depth view synthesis and additional prediction modes. In the proposed scheme, we exploit the reconstructed neighboring depth frame to generate an additional reference depth image for the current viewpoint to be coded, using the depth-image-based rendering technique. In order to generate high-quality reference depth images, we use pre-processing on depth, depth image warping, and two types of hole-filling methods depending on the number of available reference views. After synthesizing the additional depth image, we encode the depth video using the proposed additional prediction modes, named VSP modes, which refer to the synthesized depth image. In particular, the VSP_SKIP mode refers to the co-located block of the synthesized frame without coding motion vectors or residual data, which yields most of the coding gains. Experimental results demonstrate that the proposed depth view synthesis method provides high-quality depth images for the current view and that the proposed VSP modes provide high coding gains, especially on anchor frames.

18 citations
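As background for the depth image warping and hole filling steps mentioned above, here is a hedged sketch of a basic depth-image-based-rendering warp from a single horizontally displaced reference view: each depth sample is shifted by the disparity it implies, nearer surfaces win on collisions, and remaining holes are filled from the neighbouring background (farther) depth. The simple f·B/Z camera model, the single-reference hole-filling heuristic and all parameter names are assumptions for illustration, not the paper's method.

```python
import numpy as np

def dibr_depth_warp(depth, focal, baseline):
    """Warp a depth map (values > 0, in metres) to a horizontally shifted
    virtual viewpoint and fill holes from the background depth.
    """
    h, w = depth.shape
    warped = np.zeros((h, w), dtype=np.float32)
    for y in range(h):
        # 1) warp: shift each sample by its own disparity, keep nearest surface
        for x in range(w):
            z = float(depth[y, x])
            if z <= 0:
                continue
            xt = x + int(round(focal * baseline / z))  # disparity = f * B / Z
            if 0 <= xt < w and (warped[y, xt] == 0 or z < warped[y, xt]):
                warped[y, xt] = z
        # 2) hole filling: take the farther of the two nearest valid neighbours
        for x in range(w):
            if warped[y, x] == 0:
                left = warped[y, x - 1] if x > 0 else 0.0
                right_vals = warped[y, x + 1:]
                nz = right_vals[right_vals > 0]
                right = float(nz[0]) if nz.size else 0.0
                warped[y, x] = max(left, right)  # prefer background (larger Z)
    return warped
```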

Proceedings Article
01 Jan 2019
TL;DR: This work devises an approach that exploits known geometric properties of the scene (per-frame camera extrinsics and depth) in order to warp reference views into the new ones, and obtains images that are geometrically consistent with all the views in the scene camera system.
Abstract: Given a set of reference RGBD views of an indoor environment and a new viewpoint, our goal is to predict the view from that location. Prior work on new-view generation has predominantly focused on significantly constrained scenarios, typically involving artificially rendered views of isolated CAD models. Here we tackle a much more challenging version of the problem. We devise an approach that exploits known geometric properties of the scene (per-frame camera extrinsics and depth) in order to warp reference views into the new ones. The defects in the generated views are handled by a novel RGBD inpainting network, PerspectiveNet, that is fine-tuned for a given scene in order to obtain images that are geometrically consistent with all the views in the scene camera system. Experiments conducted on the ScanNet and SceneNet datasets reveal performance superior to strong baselines.

18 citations
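The geometric warping step described above (using per-frame camera extrinsics and depth) can be illustrated with a standard unproject-transform-reproject loop. The sketch below is a generic version under assumed pinhole intrinsics K and a 4x4 relative transform T_ref_to_new; disocclusions remain as holes, which is what the paper's PerspectiveNet inpaints. All names are illustrative, not the authors' code.

```python
import numpy as np

def reproject_rgbd(rgb, depth, K, T_ref_to_new):
    """Warp a reference RGBD view into a new camera.

    rgb: (H, W, 3), depth: (H, W) z-depth in the reference camera,
    K: 3x3 pinhole intrinsics, T_ref_to_new: 4x4 transform from the
    reference camera frame to the new camera frame.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T                      # unproject pixels to camera rays
    pts_ref = rays * depth.reshape(-1, 1)                # 3D points in the reference frame
    pts_h = np.concatenate([pts_ref, np.ones((pts_ref.shape[0], 1))], axis=1)
    pts_new = (pts_h @ T_ref_to_new.T)[:, :3]            # move points into the new frame
    proj = pts_new @ K.T                                 # project with the same intrinsics
    z = proj[:, 2:3]
    valid = z[:, 0] > 1e-6
    uv = np.round(proj[:, :2] / np.maximum(z, 1e-6)).astype(int)
    out = np.zeros_like(rgb)
    zbuf = np.full((h, w), np.inf)
    colors = rgb.reshape(-1, rgb.shape[-1])
    for (u, v), zz, c, ok in zip(uv, z[:, 0], colors, valid):
        if ok and 0 <= u < w and 0 <= v < h and zz < zbuf[v, u]:
            zbuf[v, u] = zz                              # nearest surface wins
            out[v, u] = c
    return out
```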

Book ChapterDOI
23 Aug 2020
TL;DR: A novel siamese network is introduced that employs a gating layer for better reconstruction of the latent volumetric representation and, consequently, better final visual results; a novel loss is also introduced to explicitly enforce consistency across generated views both in space and in time.
Abstract: Novel view video synthesis aims to synthesize videos from novel viewpoints, given input captures of a human performance taken from multiple reference viewpoints over consecutive time steps. Despite great advances in model-free novel view synthesis, existing methods present three limitations when applied to complex and time-varying human performance. First, these methods (and related datasets) mainly consider simple and symmetric objects. Second, they do not enforce explicit consistency across generated views. Third, they focus on static and non-moving objects. The fine-grained details of a human subject can therefore suffer from inconsistencies when synthesized across different viewpoints or time steps. To tackle these challenges, we introduce a human-specific framework that employs a learned 3D-aware representation. Specifically, we first introduce a novel siamese network that employs a gating layer for better reconstruction of the latent volumetric representation and, consequently, the final visual results. Moreover, features from consecutive time steps are shared inside the network to improve temporal consistency. Second, we introduce a novel loss to explicitly enforce consistency across generated views both in space and in time. Third, we present the Multi-View Human Action (MVHA) dataset, consisting of nearly 1,200 synthetic human performances captured from 54 viewpoints. Experiments on the MVHA, Pose-Varying Human Model and ShapeNet datasets show that our method outperforms state-of-the-art baselines both in view generation quality and in spatio-temporal consistency.

18 citations
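The gating layer named as the first contribution is not specified further in the abstract. As a rough illustration only, here is a generic gated-convolution pattern in PyTorch, in which a sigmoid gate branch modulates a feature branch element-wise so the network can suppress unreliable parts of a latent representation; this is an assumption about the general mechanism, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Generic gated convolution: features * sigmoid(gate), computed from the
    same input by two parallel convolutions."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        # The gate (0..1) decides how much of each feature passes through.
        return torch.sigmoid(self.gate(x)) * torch.relu(self.feature(x))


# Example: gate a feature map extracted from one reference view.
x = torch.randn(1, 64, 32, 32)
y = GatedConv2d(64, 64)(x)   # same spatial size, gated features
```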


Network Information
Related Topics (5)
Image segmentation: 79.6K papers, 1.8M citations, 86% related
Feature (computer vision): 128.2K papers, 1.7M citations, 86% related
Object detection: 46.1K papers, 1.3M citations, 85% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    54
2022    117
2021    189
2020    158
2019    114
2018    102