
Showing papers on "View synthesis published in 2016"


Book ChapterDOI
08 Oct 2016
TL;DR: This work addresses the problem of novel view synthesis, that is, given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints, and shows that for both objects and scenes this approach synthesizes novel views of higher perceptual quality than previous CNN-based techniques.
Abstract: We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows – 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.

660 citations
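
The core operation described in this abstract, reconstructing the target view by sampling input-view pixels at predicted 2-D coordinates, can be illustrated with a short sketch. This is a minimal bilinear-warping illustration and not the authors' implementation; the array shapes, the function name, and the use of scipy.ndimage.map_coordinates are assumptions.

import numpy as np
from scipy.ndimage import map_coordinates

def warp_with_appearance_flow(src, flow):
    # src:  (H, W, 3) input view
    # flow: (H, W, 2) predicted appearance flow, giving for every target pixel
    #       the absolute (x, y) coordinate in the input view to copy from
    h, w, _ = src.shape
    xs, ys = flow[..., 0], flow[..., 1]
    channels = [map_coordinates(src[..., c], [ys.ravel(), xs.ravel()],
                                order=1, mode='nearest').reshape(h, w)
                for c in range(3)]
    return np.stack(channels, axis=-1)

# Toy check: an identity flow reproduces the input view.
src = np.random.rand(64, 64, 3)
ys, xs = np.mgrid[0:64, 0:64].astype(float)
assert np.allclose(warp_with_appearance_flow(src, np.stack([xs, ys], -1)), src)

In the paper's framework a CNN predicts the flow field from the input view and the desired viewpoint transformation; the warp itself is the differentiable sampling step sketched above.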


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work presents a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets, and is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.
Abstract: Deep networks have recently enjoyed enormous success when applied to recognition and classification problems in computer vision [22, 33], but their use in graphics problems has been limited ([23, 7] are notable recent exceptions). In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches, which consist of multiple complex stages of processing, each of which requires careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network, which then directly produces the pixels of the unseen view. The benefits of our approach include generality (we only require posed image sets and can easily apply our method to different domains), and high quality results on traditionally difficult scenes. We believe this is due to the end-to-end nature of our system, which is able to plausibly generate pixels according to color, depth, and texture priors learnt automatically from the training data. We show view interpolation results on imagery from the KITTI dataset [12], from data from [1] as well as on Google Street View images. To our knowledge, our work is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.

551 citations


Journal ArticleDOI
11 Nov 2016
TL;DR: In this paper, a learning-based approach is proposed to synthesize new views from a sparse set of input views, using two sequential convolutional neural networks to model the disparity and color estimation components and training both networks simultaneously by minimizing the error between the synthesized and ground truth images.
Abstract: With the introduction of consumer light field cameras, light field imaging has recently become widespread. However, there is an inherent trade-off between the angular and spatial resolution, and thus, these cameras often sparsely sample in either spatial or angular domain. In this paper, we use machine learning to mitigate this trade-off. Specifically, we propose a novel learning-based approach to synthesize new views from a sparse set of input views. We build upon existing view synthesis techniques and break down the process into disparity and color estimation components. We use two sequential convolutional neural networks to model these two components and train both networks simultaneously by minimizing the error between the synthesized and ground truth images. We show the performance of our approach using only four corner sub-aperture views from the light fields captured by the Lytro Illum camera. Experimental results show that our approach synthesizes high-quality images that are superior to the state-of-the-art techniques on a variety of challenging real-world scenes. We believe our method could potentially decrease the required angular resolution of consumer light field cameras, which allows their spatial resolution to increase.

435 citations
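
As background to the two-stage pipeline described above, the disparity-based warping of a corner view to the target angular position follows the standard Lambertian light-field relation (notation mine, and the sign convention depends on the parameterization; this is not necessarily the paper's exact formulation):

\[ \hat{L}_{u}(\mathbf{s}) \;=\; L_{u_i}\!\bigl(\mathbf{s} + (u_i - u)\, d(\mathbf{s})\bigr), \]

where \(\mathbf{s}\) is the spatial pixel coordinate, \(u_i\) the angular position of corner view \(i\), \(u\) the novel angular position, and \(d\) the disparity map estimated by the first CNN; the second CNN then predicts the final colour of the novel view from the four warped corner views.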


Journal ArticleDOI
TL;DR: This paper provides a fundamental examination of the hole generation mechanism in the DIBR-oriented view synthesis process and proposes utilizing the occluded information to identify and locate the relevant background pixels around a hole.
Abstract: View synthesis with depth-image-based rendering (DIBR) has attracted great interest in that it can provide a virtual image at any arbitrary viewpoint in 3-D video and free-viewpoint TV. An inherent problem in DIBR view synthesis is the occurrence of holes in the synthesized image, also known as the disocclusion problem. The disoccluded regions need to be handled properly in order to generate a synthesized view of good quality. This paper provides a fundamental examination of the hole generation mechanism in the DIBR-oriented view synthesis process. A necessary and sufficient condition for hole generation is first shown, and the corresponding hole location and length are obtained analytically. Furthermore, given that conventional hole filling algorithms may fail to fill a hole correctly when lacking (adequate) visible background information, we propose utilizing the occluded (invisible) information to identify and locate the relevant background pixels around a hole. We then make use of the visible and invisible background information together to perform hole filling. Experimental results validate our hole generation model, demonstrating agreement with our analytical results, while our proposed hole filling approach shows superior performance in terms of the visual quality of the synthesized views.

67 citations
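
A minimal sketch of how such holes arise, under the standard rectified, horizontal-shift DIBR model rather than the paper's exact analytical condition: with the virtual camera displaced horizontally from the reference, a reference pixel at column x warps to

\[ x_v \;=\; x - d(x), \qquad d(x) \;=\; \frac{f\,B}{Z(x)}, \]

so two adjacent reference pixels land a distance \( 1 + d(x) - d(x+1) \) apart in the virtual view. A hole therefore opens exactly when the disparity drops across the pair, i.e. \( d(x+1) < d(x) \), and its length is approximately \( d(x) - d(x+1) \) pixels: disocclusions appear at depth discontinuities, with size proportional to the disparity jump.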


Journal ArticleDOI
TL;DR: This work proposes a new coding scheme for 3-D High Efficiency Video Coding (HEVC) that allows it to take full advantage of temporal correlations in the intermediate view and improve the existing synthesis from adjacent views.
Abstract: Multiview video (MVV) plus depth formats use view synthesis to build intermediate views from existing adjacent views at the receiver side. Traditional view synthesis exploits the disparity information to interpolate an intermediate view by considering inter-view correlations. However, temporal correlation between different frames of the intermediate view can also be used to improve the synthesis. We propose a new coding scheme for 3-D High Efficiency Video Coding (HEVC) that allows us to take full advantage of temporal correlations in the intermediate view and improve the existing synthesis from adjacent views. We use optical flow techniques to derive dense motion vector fields (MVFs) from the adjacent views and then warp them to the level of the intermediate view. This allows us to construct multiple temporal predictions of the synthesized frame. A second contribution is an adaptive fusion method that judiciously selects between temporal and inter-view prediction to eliminate artifacts associated with each prediction type. The proposed system is compared against the state-of-the-art view synthesis reference software 1-D Fast technique used in 3-D HEVC standardization. Three intermediary views are synthesized. Gains of up to 1.21 dB Bjontegaard Delta peak SNR are shown when evaluated on several standard MVV test sequences.

56 citations
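
A rough sketch of the temporal-prediction idea: estimate dense motion on an adjacent coded view and reuse it to warp the previously synthesized intermediate frame forward, giving one temporal candidate that can then be fused with the usual inter-view synthesis. OpenCV's Farneback flow stands in for the paper's dense motion-vector-field derivation, and the per-pixel fusion rule is a simplified assumption.

import cv2
import numpy as np

def temporal_prediction(prev_synth, prev_adj, cur_adj):
    # Backward flow on the adjacent view: current frame -> previous frame.
    g_prev = cv2.cvtColor(prev_adj, cv2.COLOR_BGR2GRAY)
    g_cur = cv2.cvtColor(cur_adj, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g_cur, g_prev, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g_cur.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    # Warp the previously synthesized intermediate frame with that motion.
    return cv2.remap(prev_synth, map_x, map_y, cv2.INTER_LINEAR)

def fuse(temporal_pred, interview_pred, temporal_err, interview_err):
    # Simplified stand-in for the paper's adaptive fusion: pick, per pixel,
    # the prediction with the lower estimated error.
    use_temporal = temporal_err < interview_err
    return np.where(use_temporal[..., None], temporal_pred, interview_pred)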


Proceedings ArticleDOI
01 Jun 2016
TL;DR: A hole filling approach based on background reconstruction is proposed, in which the temporal correlation information in both the 2D video and its corresponding depth map is exploited to construct a background video that is used to eliminate holes in the synthesized video.
Abstract: Depth image based rendering (DIBR) plays a key role in 3D video synthesis, by which other virtual views can be generated from a 2D video and its depth map. However, in the synthesis process, background occluded by foreground objects might be exposed in the new view, resulting in holes in the synthesized video. In this paper, a hole filling approach based on background reconstruction is proposed, in which the temporal correlation information in both the 2D video and its corresponding depth map is exploited to construct a background video. To construct a clean background video, the foreground objects are detected and removed. Motion compensation is also applied to make the background reconstruction model suitable for the moving-camera scenario. Each frame is projected to the current plane, where a modified Gaussian mixture model is performed. The constructed background video is used to eliminate the holes in the synthesized video. Our experimental results indicate that the proposed approach achieves better quality of the synthesized 3D video compared with other methods.

50 citations
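
A condensed sketch of the background-reconstruction step, assuming a static camera for brevity (the paper additionally removes foreground objects and applies motion compensation before modelling, and uses a modified Gaussian mixture model): OpenCV's MOG2 accumulates a background image over the 2D video, which can then patch the disoccluded pixels of a synthesized frame. Variable names and the hole-mask handling are assumptions.

import cv2

def build_background(frames, history=200):
    # Fit a Gaussian-mixture background model over a sequence of BGR frames.
    mog2 = cv2.createBackgroundSubtractorMOG2(history=history,
                                              detectShadows=False)
    for frame in frames:
        mog2.apply(frame)
    return mog2.getBackgroundImage()

def fill_holes(synth_frame, hole_mask, warped_background):
    # Replace hole pixels (hole_mask > 0) in the synthesized view with the
    # reconstructed background, itself warped to the virtual viewpoint first.
    out = synth_frame.copy()
    out[hole_mask > 0] = warped_background[hole_mask > 0]
    return out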


Journal ArticleDOI
TL;DR: A new reference view selection problem is cast that seeks the subset of views minimizing the distortion over a view navigation window defined by the user under bandwidth constraints, and an effective polynomial time algorithm using dynamic programming is proposed to solve the optimization problem.
Abstract: In multiview applications, camera views can be used as reference views to synthesize additional virtual viewpoints, allowing users to freely navigate within a 3D scene. However, bandwidth constraints may restrict the number of reference views sent to clients, limiting the quality of the synthesized viewpoints. In this work, we study the problem of in-network reference view synthesis aimed at improving the navigation quality at the clients. We consider a distributed cloud network architecture, where data stored in a main cloud is delivered to end users with the help of cloudlets, i.e., resource-rich proxies close to the users. We argue that, in case of limited bandwidth from the cloudlet to the users, re-sampling the viewpoints of the 3D scene at the cloudlet (i.e., synthesizing novel virtual views in the cloudlets to be used as new references at the decoder) is beneficial compared to mere subsampling of the original set of camera views. We therefore cast a new reference view selection problem that seeks the subset of views minimizing the distortion over a view navigation window defined by the user under bandwidth constraints. We prove that the problem is NP-hard, and we propose an effective polynomial time algorithm using dynamic programming to solve the optimization problem under general assumptions that cover most of the multiview scenarios in practice. Simulation results confirm the performance gain offered by virtual view synthesis in the network.

32 citations
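
A toy sketch of the flavour of the selection problem: choose which reference views to keep under a rate budget so that the summed navigation distortion of the gaps between consecutive kept views is minimal. This generic dynamic program over (view index, remaining budget) only illustrates the problem structure; the paper's algorithm, cost model, and synthesis-aware distortion function are more elaborate, and the names below are made up.

from functools import lru_cache

def select_reference_views(n, rate, budget, gap_cost):
    # Views are indexed 0..n-1; views 0 and n-1 are always kept as anchors.
    # rate[i] is the transmission cost of view i; gap_cost(i, j) is the
    # navigation distortion when i and j are consecutive kept references.
    @lru_cache(maxsize=None)
    def dp(i, remaining):
        if i == n - 1:
            return 0.0, ()
        best = (float('inf'), ())
        for j in range(i + 1, n):                  # candidate next reference
            if rate[j] <= remaining:
                tail_d, tail_sel = dp(j, remaining - rate[j])
                best = min(best, (gap_cost(i, j) + tail_d, (j,) + tail_sel))
        return best
    total, selected = dp(0, budget - rate[0])
    return total, (0,) + selected

# 5 equally spaced cameras, unit rates, budget for 3 views, distortion of a
# gap growing quadratically with its width -> keeps views 0, 2 and 4.
print(select_reference_views(5, [1] * 5, 3, lambda i, j: (j - i - 1) ** 2))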


Patent
13 May 2016
TL;DR: A system and method of deep learning using deep networks to predict new views from existing images may generate and improve models and representations from large-scale data, and can be used in graphics generation.
Abstract: A system and method of deep learning using deep networks to predict new views from existing images may generate and improve models and representations from large-scale data. This system and method of deep learning may employ a deep architecture performing new view synthesis directly from pixels, trained from large numbers of posed image sets. A system employing this type of deep network may produce pixels of an unseen view based on pixels of neighboring views, lending itself to applications in graphics generation.

31 citations


Journal ArticleDOI
TL;DR: The virtual view synthesis procedure and the distortion propagation from existing views to virtual views are analyzed in detail, and a virtual view distortion/PSNR estimation method is then derived; experimental results demonstrate that the proposed method estimates the PSNRs of virtual views accurately.
Abstract: In three-dimensional videos (3-DVs) with n-view texture videos plus n-view depth maps, virtual views can be synthesized from neighboring texture videos and the associated depth maps. To evaluate the system performance or guide the rate-distortion-optimization process of 3-DV coding, the distortion/PSNR of the virtual view should be calculated by measuring the quality difference between the virtual view synthesized from compressed 3-DVs and one synthesized from uncompressed 3-DVs, which increases the complexity of a 3-DV system. In order to reduce the complexity of the 3-DV system, it is better to estimate virtual view distortions/PSNRs directly without rendering virtual views. In this paper, the virtual view synthesis procedure and the distortion propagation from existing views to virtual views are analyzed in detail, and a virtual view distortion/PSNR estimation method is then derived. Experimental results demonstrate that the proposed method estimates the PSNRs of virtual views accurately. The squared correlation coefficient and the root mean squared error between the PSNRs estimated by the proposed method and the actual PSNRs are 0.998 and 2.012 on average over all the tested sequences. Since the proposed method is implemented row-by-row independently, it is also friendly to parallel design. The execution time for each row of pictures with 1024×768 resolution is only 0.079 s, while for pictures with 1920×1088 resolution it is only 0.155 s.

30 citations


Posted Content
TL;DR: In this article, a CNN-based approach is proposed to synthesize, from an input image, novel views of the same object or scene observed from arbitrary viewpoints. Instead of synthesizing pixels from scratch, the network learns to copy them from the input image by predicting appearance flows.
Abstract: We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows -- 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.

Journal ArticleDOI
TL;DR: An analytical model to estimate the depth-error-induced virtual view synthesis distortion (VVSD) in 3D video, taking the distance between reference and virtual views (virtual view position) into account is proposed.
Abstract: We propose an analytical model to estimate the depth-error-induced virtual view synthesis distortion (VVSD) in 3D video, taking the distance between reference and virtual views (virtual view position) into account. In particular, we start with a comprehensive preanalysis and discussion of several possible VVSD scenarios. Taking the intrinsic characteristics of each scenario into consideration, we specifically classify them into four clusters: 1) overlapping region; 2) disocclusion and boundary region; 3) edge region; and 4) infrequent region. We propose to model VVSD as the linear combination of the distortion under different scenarios (DDS) weighted by the probability under different scenarios (PDS). We show analytically that DDS and PDS can be related to the virtual view position using quadratic/biquadratic models and linear models, respectively. Experimental results verify that the proposed model is capable of estimating the relationship between VVSD and the distance between reference and virtual views. Therefore, our model can be used to inform the reference view setup for capturing, or to estimate the distortion at certain virtual view positions, when depth information is compressed.
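
In symbols, the model described above can be written as (notation mine, following the abstract):

\[ D_{\mathrm{VVSD}}(v) \;=\; \sum_{k} P_k(v)\, D_k(v), \qquad k \in \{\text{overlapping, disocclusion/boundary, edge, infrequent}\}, \]

where \(v\) is the distance between the reference and virtual views, each per-scenario distortion \(D_k(v)\) is fitted with a quadratic or biquadratic model in \(v\), and each scenario probability \(P_k(v)\) with a linear model in \(v\).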

Journal ArticleDOI
TL;DR: In view of supporting future high-quality auto-stereoscopic 3D displays and Free Navigation virtual/augmented reality applications with sparse, arbitrarily arranged camera setups, innovative depth estimation and virtual view synthesis techniques with global optimizations over all camera views should be developed.
Abstract: ISO/IEC MPEG and ITU-T VCEG have recently jointly issued a new multiview video compression standard, called 3D-HEVC, which reaches unprecedented compression performance for linear, dense camera arrangements. In view of supporting future high-quality auto-stereoscopic 3D displays and Free Navigation virtual/augmented reality applications with sparse, arbitrarily arranged camera setups, innovative depth estimation and virtual view synthesis techniques with global optimizations over all camera views should be developed. Preliminary studies in response to the MPEG-FTV (Free viewpoint TV) Call for Evidence suggest these targets are within reach, with at least 6% bitrate gains over 3D-HEVC technology.

Journal ArticleDOI
TL;DR: A layered depth image (LDI) is introduced in the original camera view, in which occluded background is identified and filled so that, when the LDI data is rendered to a virtual view, no disocclusions appear and views with consistent data are produced, also handling translucent disocclusions.

Posted Content
TL;DR: In this article, a recurrent convolutional encoder-decoder network is proposed to synthesize novel views of a 3D object from a single image; its recurrent structure captures long-term dependencies along a sequence of transformations.
Abstract: An important problem for both graphics and vision is to synthesize novel views of a 3D object from a single image. This is particularly challenging due to the partial observability inherent in projecting a 3D object onto the image space, and the ill-posedness of inferring object shape and pose. However, we can train a neural network to address the problem if we restrict our attention to specific object categories (in our case faces and chairs) for which we can gather ample training data. In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image. The recurrent structure allows our model to capture long-term dependencies along a sequence of transformations. We demonstrate the quality of its predictions for human faces on the Multi-PIE dataset and for a dataset of 3D chair models, and also show its ability to disentangle latent factors of variation (e.g., identity and pose) without using full supervision.
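
A compact PyTorch sketch of this kind of recurrent convolutional encoder-decoder (layer sizes, the scalar action encoding, and the 64x64 resolution are assumptions for illustration; the published model's exact architecture and its identity/pose disentangling are not reproduced here):

import torch
import torch.nn as nn

class RecurrentRotator(nn.Module):
    # Encode one image, apply T rotation "actions" through an LSTM cell,
    # and decode an output image after each step.
    def __init__(self, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(                      # 3x64x64 -> hidden
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 16 * 16, hidden))
        self.cell = nn.LSTMCell(hidden + 1, hidden)    # +1 for the action
        self.dec = nn.Sequential(                      # hidden -> 3x64x64
            nn.Linear(hidden, 64 * 16 * 16), nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, img, actions):
        # img: (B, 3, 64, 64); actions: (B, T, 1) rotation increments.
        h = self.enc(img)
        c = torch.zeros_like(h)
        frames = []
        for t in range(actions.size(1)):
            h, c = self.cell(torch.cat([h, actions[:, t]], dim=1), (h, c))
            frames.append(self.dec(h))
        return torch.stack(frames, dim=1)              # (B, T, 3, 64, 64)

out = RecurrentRotator()(torch.rand(2, 3, 64, 64), torch.zeros(2, 4, 1))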

Proceedings ArticleDOI
01 Jan 2016
TL;DR: In this approach, virtual views are synthesized using two neighboring real views, but the disoccluded areas are not inpainted; they are filled with information from further real views, additional views traditionally not included in the view synthesis.
Abstract: In the paper, we propose a new method for virtual view synthesis called Multiview Synthesis. In our approach, virtual views are synthesized using two neighboring real views, but the disoccluded areas are not inpainted; they are filled with information from further real views, additional views traditionally not included in the view synthesis. The whole synthesis is performed with triangles rather than with individual pixels. The proposed Multiview Synthesis also includes additional steps of adaptive color correction, blurred edge removal, and spatial edge blurring. The experimental results show that both the objective and subjective quality of the synthesized views is significantly higher than the quality of views obtained using the state-of-the-art MPEG reference view synthesis software.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that spatio-temporal inconsistencies are significantly reduced using the proposed method and subjective and objective qualities are improved compared to state-of-the-art reference methods.
Abstract: Depth-image-based rendering (DIBR) is a commonly used method for synthesizing additional views using the video-plus-depth (V+D) format. A critical issue with DIBR-based view synthesis is the lack of information behind foreground objects. This lack manifests as disocclusions, i.e., holes next to the foreground objects in rendered virtual views, a consequence of the virtual camera "seeing" behind the foreground object. The disocclusions are larger in the extrapolation case, i.e., the single-camera case. Texture synthesis methods (inpainting methods) aim to fill these disocclusions by producing plausible texture content. However, virtual views inevitably exhibit both spatial and temporal inconsistencies at the filled disocclusion areas, depending on the scene content. In this paper, we propose a layered depth image (LDI) approach that improves the spatio-temporal consistency. In the process of LDI generation, depth information is used to classify the foreground and background in order to form a static scene sprite from a set of neighboring frames. Occlusions in the LDI are then identified and filled using inpainting, such that no disocclusions appear when the LDI data is rendered to a virtual view. In addition to the depth information, optical flow is computed to extract the stationary parts of the scene and to classify the occlusions in the inpainting process. Experimental results demonstrate that spatio-temporal inconsistencies are significantly reduced using the proposed method. Furthermore, subjective and objective qualities are improved compared to state-of-the-art reference methods.
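
The key idea of filling disocclusions from background rather than foreground texture can be approximated in a few lines. This is a heavily simplified stand-in (no layered depth image and no optical-flow sprite) that uses OpenCV's Telea inpainting; the foreground mask is assumed to come from the depth-based classification step.

import cv2
import numpy as np

def fill_disocclusions(virtual_view, hole_mask, foreground_mask):
    # Temporarily mask out foreground pixels together with the holes so the
    # inpainting propagates colours from background regions only, then put
    # the original foreground pixels back.
    mask = (((hole_mask > 0) | (foreground_mask > 0)) * 255).astype(np.uint8)
    filled = cv2.inpaint(virtual_view, mask, 5, cv2.INPAINT_TELEA)
    keep = (foreground_mask > 0) & (hole_mask == 0)
    filled[keep] = virtual_view[keep]
    return filled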

Journal ArticleDOI
TL;DR: To enhance compression performance, the synthesized view distortion, which is evaluated by emulating the interpolation and the virtual view synthesis process, is used in the optimization objective function for coding mode selection in the video encoder.
Abstract: In this paper, we propose a depth map down-sampling and coding scheme that minimizes the view synthesis distortion. Moreover, a solution to the optimal depth map down-sampling problem that minimizes the depth-caused distortion in the virtual view is derived, exploiting the depth map and the associated texture information along with the up-sampling method to be used at the decoder side. Furthermore, to enhance compression performance, the synthesized view distortion, which is evaluated by emulating the interpolation and the virtual view synthesis process, is used in the optimization objective function for coding mode selection in the video encoder. Experimental results show that both the proposed depth map down-sampling and encoding methods lead to good performance, with an average bit rate reduction of 2.62% compared with 3D-AVC.

Proceedings ArticleDOI
11 Jul 2016
TL;DR: The experimental results show that view synthesis based on the proposed MRF-based inpainting method systematically improves performance over the state-of-the-art in multiview view synthesis.
Abstract: View synthesis using depth image-based rendering generates virtual viewpoints of a 3D scene based on texture and depth information from a set of available cameras. One of the core components in view synthesis is image inpainting, which performs the reconstruction of areas that were occluded in the available cameras but are visible from the virtual viewpoint. Inpainting methods based on Markov random fields (MRFs) have been shown to be very effective in inpainting large areas in images. In this paper, we propose a novel MRF-based inpainting method for multiview video. The proposed method steers the MRF optimization towards completion from background to foreground and exploits the available depth information in order to avoid bleeding artifacts. The proposed approach allows for efficiently filling in large disocclusion areas and greatly accelerates execution compared to traditional MRF-based inpainting techniques. The experimental results show that view synthesis based on the proposed inpainting method systematically improves performance over the state-of-the-art in multiview view synthesis. Average PSNR gains of up to 1.88 dB compared to the MPEG View Synthesis Reference software were observed.

Journal ArticleDOI
TL;DR: This work studies the reference view selection problem and proposes an algorithm for the optimal selection of reference views in multiview coding systems, formulating an optimization problem for the positioning of the reference views such that both the distortion of the view reconstruction and the coding rate cost are minimized.
Abstract: Augmented reality, interactive navigation in 3D scenes, multiview video, and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared with traditional video services. The significant increase in the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience on resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the data set. In such coding schemes, the two following questions become fundamental: 1) how many reference views have to be chosen for keeping a good reconstruction quality under coding cost constraints? And 2) where to place these key views in the multiview data set? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views, such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view, positioning problem brings significant rate–distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.

Proceedings ArticleDOI
11 Jul 2016
TL;DR: A new hole-filling technique using the number of GMM models, rather than the background image, to identify background/foreground pixels, which provides a 0.9 to 1.7 dB PSNR improvement compared to the state-of-the-art method.
Abstract: View synthesis for 3D video and free viewpoint video (FVV) using existing view(s) can avoid the transmission of a large volume of video data. Existing techniques may suffer poor rendering quality from missing pixel values (i.e., holes) due to occluded regions, rounding errors, and disparity discontinuities. To address these problems, existing techniques use correlations in spatial texture only, or in both spatial texture and the temporal background. The former techniques (e.g., inpainting) suffer quality degradation due to the lack of spatial correlation in foreground-background boundary areas. The latter techniques (e.g., background update with Gaussian mixture modelling (GMM)) can improve quality in some occluded areas; however, due to their dependency on warping of the background image and on spatial correlation, they still suffer quality degradation. In this paper, we propose a new hole-filling technique that uses the number of GMM models, rather than the background image, to identify background/foreground pixels. The missing pixels of the background and foreground are recovered from the background pixels and from the weighted average of warped and foreground-model pixels, respectively. The experimental results show that the proposed approach provides a 0.9 to 1.7 dB PSNR improvement compared to the state-of-the-art method.

Patent
08 Nov 2016
TL;DR: In this paper, an efficient method is used for extracting objects at partially occluded regions as defined by the auxiliary data from the texture videos to facilitate view synthesis with reduced artifacts.
Abstract: Original or compressed Auxiliary Data, including possibly major depth discontinuities in the form of shape images, partial occlusion data, associated tuned and control parameters, and depth information of the original video(s), are used to facilitate the interactive display and generation of new views (view synthesis) of conventional 2D, stereo, and multi-view videos in conventional 2D, 3D (stereo) and multi-view or autostereoscopic displays with reduced artifacts. The partial or full occlusion data includes image, depth and opacity data of possibly partially occluded areas to facilitate the reduction of artifacts in the synthesized view. An efficient method is used for extracting objects at partially occluded regions as defined by the auxiliary data from the texture videos to facilitate view synthesis with reduced artifacts. Further, a method for updating the image background and the depth values uses the auxiliary data after extraction of each object to reduce the artifacts due to limited performance of online inpainting of missing data or holes during view synthesis.

Patent
22 Aug 2016
TL;DR: In this article, an innovative view merging method coupled with an efficient hole filling procedure is described that compensates for depth misregistrations and inaccuracies to produce realistic synthesized views for full parallax light field displays.
Abstract: An innovative method for synthesis of compressed light fields is described. Compressed light fields are commonly generated by sub-sampling light field views. The suppressed views must then be synthesized at the display, utilizing information from the compressed light field. The present invention describes a method for view synthesis that utilizes depth information of the scene to reconstruct the absent views. An innovative view merging method coupled with an efficient hole filling procedure compensates for depth misregistrations and inaccuracies to produce realistic synthesized views for full parallax light field displays.

Proceedings ArticleDOI
10 Oct 2016
TL;DR: This paper proposes a layered image representation which is re-composed for the novel view with a special reconstruction filter, allowing it to produce a typical novel view of 1024×1024 pixels in ca. 25 ms on a current GPU.
Abstract: Novel-view synthesis can be used to hide latency in a real-time remote rendering setup, to increase frame rate, or to produce advanced visual effects such as depth-of-field or motion blur in volumes or stereo and light field imagery. Regrettably, existing real-time solutions are limited to opaque surfaces. Prior art has circumvented the challenge by making volumes opaque, i.e., projecting the volume onto representative surfaces for reprojection, omitting correct volumetric effects. This paper proposes a layered image representation which is re-composed for the novel view with a special reconstruction filter. We propose a view-dependent approximation to the volume that allows producing a typical novel view of 1024×1024 pixels in ca. 25 ms on a current GPU. At the heart of our approach is the idea to compress the complex view-dependent emission-absorption function along original view rays into a layered piecewise-analytic emission-absorption representation that can be efficiently ray-cast from a novel view. It does not assume opaque surfaces or approximate color and opacity, can be re-evaluated very efficiently, results in an image identical to the reference from the original view, has correct volumetric shading for novel views, and works with a low and fixed number of layers per pixel that fits modern GPU architectures.
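
For context, the emission-absorption function that the layered representation compresses along each original view ray is the usual front-to-back compositing over samples (the standard volume-rendering relation, not a formula specific to this paper):

\[ C \;=\; \sum_{i=1}^{N} c_i\,\alpha_i \prod_{j<i} \bigl(1 - \alpha_j\bigr), \qquad A \;=\; 1 - \prod_{i=1}^{N} \bigl(1 - \alpha_i\bigr), \]

with per-sample colours \(c_i\) and opacities \(\alpha_i\); the paper approximates this per pixel by a small, fixed number of piecewise-analytic layers that can be re-composited efficiently from a novel viewpoint.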

Patent
21 Sep 2016
TL;DR: In this article, a panoramic 3D video generation method for virtual reality equipment is presented, which includes the following steps: shooting a scene video using a wide-angle camera array and generating a panoramic video through a stitching algorithm; shooting a scene depth map using a depth camera array and generating a panoramic depth map video through the stitching algorithm; detecting the head position of a person in real time and cutting an image of the corresponding position in the panoramic video frame as the left view; and then generating a right view image through extrapolation based on virtual view synthesis.
Abstract: The invention discloses a panoramic 3D video generation method for virtual reality equipment, comprising the following steps: shooting a scene video using a wide-angle camera array, and generating a panoramic video through a stitching algorithm; shooting a scene depth map using a depth camera array, and generating a panoramic depth map video through the stitching algorithm; by detecting the head position of a person in real time, cutting an image of a corresponding position in the panoramic video frame as a video of the left view; and generating a right view image through extrapolation based on a virtual view synthesis technology according to the left view image and the corresponding depth map, wherein the two images are stitched into a left-right 3D video which is displayed in virtual reality equipment. By adding the view synthesis technology to a panoramic video displayed by virtual reality equipment, viewers can see the 3D effect of the panoramic video, the scene is more lifelike, and the viewer experience is enhanced.

Journal ArticleDOI
TL;DR: A predicted hole mapping (PHM) algorithm is presented which requires no filling priority or smoothing operation, allowing parallel computation that facilitates a real-time 3D conversion system.
Abstract: Three-dimensional (3D) display technologies have made great progress in recent years. View synthesis for 3D content requires hole filling, which is a challenging task. The increase in resolution and in the number of views for view synthesis brings new challenges in memory and processing speed. A predicted hole mapping (PHM) algorithm is presented which requires no filling priority or smoothing operation, allowing parallel computation that facilitates a real-time 3D conversion system. In experiments, the proposed PHM is evaluated and compared with other methods in terms of peak signal-to-noise ratio and structural similarity index measurement, and the results show its advantages. The method can operate on a 32-view display with 4K × 2K resolution in real time on a GPU.

Proceedings ArticleDOI
01 Sep 2016
TL;DR: The results show the correlation between the number of occlusions in the scene and a gain from using camera pairs instead of uniformly distributed cameras.
Abstract: In this article we deal with the problem of camera positioning in sparse multiview systems with applications to free navigation. The limited number of cameras, though it makes the system relatively practical, implies problems with proper depth estimation and virtual view synthesis due to the increased amount of occluded areas. We present experimental results for the optimal positioning of the cameras, depending on two factors: the characteristics of the acquired scene and the multi-camera system (linear or circular camera setup). The results show the correlation between the number of occlusions in the scene and the gain from using camera pairs instead of uniformly distributed cameras.

Journal ArticleDOI
01 Jun 2016
TL;DR: A trilateral depth filter with local texture information, spatial proximity, and color similarity is incorporated to remove the ghost contours by rectifying the misalignment between the depth map and its associated color image.
Abstract: Numerous depth image-based rendering algorithms have been proposed to synthesize the virtual view for free viewpoint television. However, inaccuracies in the depth map cause visual artifacts in the virtual view. In this paper, we propose a novel virtual view synthesis framework to create the virtual view of the scene. Here, we incorporate a trilateral depth filter with local texture information, spatial proximity, and color similarity to remove ghost contours by rectifying the misalignment between the depth map and its associated color image. To further enhance the quality of the synthesized virtual views, we partition the scene into different 3D object segments based on the color image and depth map. Each 3D object segment is warped and blended independently to avoid mixing pixels belonging to different parts of the scene. The evaluation results indicate that the proposed method significantly improves the quality of the synthesized virtual view compared with other methods and that the results are qualitatively very similar to the ground truth. In addition, it also performs well in real-world scenes.

Journal ArticleDOI
TL;DR: A confidence-based depth recovery and high-quality 3D view synthesis method is proposed, and experimental results show that the proposed method yields higher quality recovered depth maps and synthesized image views than other previous methods.

Journal ArticleDOI
TL;DR: A priority patch inpainting algorithm for hole filling in DIBR algorithms by generating multiple virtual views by applying texture-based interpolation method for crack filling and a prioritized method for selecting the critical patch is proposed to reduce computation time.
Abstract: Hole and crack filling is the most important issue in depth-image-based rendering (DIBR) algorithms for generating virtual view images when only one view image and one depth map are available. This paper proposes a priority patch inpainting algorithm for hole filling in DIBR algorithms by generating multiple virtual views. A texture-based interpolation method is applied for crack filling. Then, an inpainting-based algorithm is applied patch by patch for hole filling. A prioritized method for selecting the critical patch is also proposed to reduce computation time. Finally, the proposed method is realized on the Compute Unified Device Architecture (CUDA) parallel computing platform, which runs on a graphics processing unit. Simulation results show that the proposed algorithm is 51-fold faster for virtual view synthesis and achieves better virtual view quality compared to the traditional DIBR algorithm, which contains depth preprocessing, warping, and hole filling.