
Showing papers on "View synthesis published in 2016"


Book ChapterDOI
08 Oct 2016
TL;DR: This work addresses the problem of novel view synthesis, that is, given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints, and shows that for both objects and scenes this approach synthesizes novel views of higher perceptual quality than previous CNN-based techniques.
Abstract: We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows – 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.

660 citations
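
The core operation described in this abstract, reconstructing the target view by sampling input-view pixels at predicted 2-D coordinates, can be illustrated with a short sketch. This is a minimal bilinear-warping illustration and not the authors' implementation; the array shapes, the function name, and the use of scipy.ndimage.map_coordinates are assumptions.

import numpy as np
from scipy.ndimage import map_coordinates

def warp_with_appearance_flow(src, flow):
    # src:  (H, W, 3) input view
    # flow: (H, W, 2) predicted appearance flow, giving for every target pixel
    #       the absolute (x, y) coordinate in the input view to copy from
    h, w, _ = src.shape
    xs, ys = flow[..., 0], flow[..., 1]
    channels = [map_coordinates(src[..., c], [ys.ravel(), xs.ravel()],
                                order=1, mode='nearest').reshape(h, w)
                for c in range(3)]
    return np.stack(channels, axis=-1)

# Toy check: an identity flow reproduces the input view.
src = np.random.rand(64, 64, 3)
ys, xs = np.mgrid[0:64, 0:64].astype(float)
assert np.allclose(warp_with_appearance_flow(src, np.stack([xs, ys], -1)), src)

In the paper's framework a CNN predicts the flow field from the input view and the desired viewpoint transformation; the warp itself is the differentiable sampling step sketched above.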


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work presents a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets, and is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.
Abstract: Deep networks have recently enjoyed enormous success when applied to recognition and classification problems in computer vision [22, 33], but their use in graphics problems has been limited ([23, 7] are notable recent exceptions). In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches, which consist of multiple complex stages of processing, each of which requires careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network, which then directly produces the pixels of the unseen view. The benefits of our approach include generality (we only require posed image sets and can easily apply our method to different domains), and high quality results on traditionally difficult scenes. We believe this is due to the end-to-end nature of our system, which is able to plausibly generate pixels according to color, depth, and texture priors learnt automatically from the training data. We show view interpolation results on imagery from the KITTI dataset [12], from data from [1] as well as on Google Street View images. To our knowledge, our work is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.

551 citations


Journal ArticleDOI
11 Nov 2016
TL;DR: In this paper, a learning-based approach is proposed to synthesize new views from a sparse set of input views, using two sequential convolutional neural networks to model the disparity and color estimation components and training both networks simultaneously by minimizing the error between the synthesized and ground truth images.
Abstract: With the introduction of consumer light field cameras, light field imaging has recently become widespread. However, there is an inherent trade-off between the angular and spatial resolution, and thus, these cameras often sparsely sample in either spatial or angular domain. In this paper, we use machine learning to mitigate this trade-off. Specifically, we propose a novel learning-based approach to synthesize new views from a sparse set of input views. We build upon existing view synthesis techniques and break down the process into disparity and color estimation components. We use two sequential convolutional neural networks to model these two components and train both networks simultaneously by minimizing the error between the synthesized and ground truth images. We show the performance of our approach using only four corner sub-aperture views from the light fields captured by the Lytro Illum camera. Experimental results show that our approach synthesizes high-quality images that are superior to the state-of-the-art techniques on a variety of challenging real-world scenes. We believe our method could potentially decrease the required angular resolution of consumer light field cameras, which allows their spatial resolution to increase.

435 citations
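
As background to the two-stage pipeline described above, the disparity-based warping of a corner view to the target angular position follows the standard Lambertian light-field relation (notation mine, and the sign convention depends on the parameterization; this is not necessarily the paper's exact formulation):

\[ \hat{L}_{u}(\mathbf{s}) \;=\; L_{u_i}\!\bigl(\mathbf{s} + (u_i - u)\, d(\mathbf{s})\bigr), \]

where \(\mathbf{s}\) is the spatial pixel coordinate, \(u_i\) the angular position of corner view \(i\), \(u\) the novel angular position, and \(d\) the disparity map estimated by the first CNN; the second CNN then predicts the final colour of the novel view from the four warped corner views.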


Journal ArticleDOI
TL;DR: This paper provides a fundamental examination of the hole generation mechanism in the DIBR-oriented view synthesis process and proposes utilizing the occluded information to identify and locate the relevant background pixels around a hole.
Abstract: View synthesis with depth-image-based rendering (DIBR) has attracted great interest in that it can provide a virtual image at any arbitrary viewpoint in 3-D video and free-viewpoint TV. An inherent problem in DIBR view synthesis is the occurrence of holes in the synthesized image, also known as the disocclusion problem. The disoccluded regions need to be handled properly in order to generate a synthesized view of good quality. This paper provides a fundamental examination of the hole generation mechanism in the DIBR-oriented view synthesis process. A necessary and sufficient condition for hole generation is first shown, and the corresponding hole location and length are obtained analytically. Furthermore, given that conventional hole filling algorithms may fail to fill a hole correctly when lacking (adequate) visible background information, we propose utilizing the occluded (invisible) information to identify and locate the relevant background pixels around a hole. We then make use of the visible and invisible background information together to perform hole filling. Experimental results validate our hole generation model, demonstrating agreement with our analytical results, while our proposed hole filling approach shows superior performance in terms of the visual quality of the synthesized views.

67 citations
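
A minimal sketch of how such holes arise, under the standard rectified, horizontal-shift DIBR model rather than the paper's exact analytical condition: with the virtual camera displaced horizontally from the reference, a reference pixel at column x warps to

\[ x_v \;=\; x - d(x), \qquad d(x) \;=\; \frac{f\,B}{Z(x)}, \]

so two adjacent reference pixels land a distance \( 1 + d(x) - d(x+1) \) apart in the virtual view. A hole therefore opens exactly when the disparity drops across the pair, i.e. \( d(x+1) < d(x) \), and its length is approximately \( d(x) - d(x+1) \) pixels: disocclusions appear at depth discontinuities, with size proportional to the disparity jump.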


Journal ArticleDOI
TL;DR: This work proposes a new coding scheme for 3-D High Efficiency Video Coding (HEVC) that allows it to take full advantage of temporal correlations in the intermediate view and improve the existing synthesis from adjacent views.
Abstract: Multiview video (MVV) plus depth formats use view synthesis to build intermediate views from existing adjacent views at the receiver side. Traditional view synthesis exploits the disparity information to interpolate an intermediate view by considering inter-view correlations. However, temporal correlation between different frames of the intermediate view can also be used to improve the synthesis. We propose a new coding scheme for 3-D High Efficiency Video Coding (HEVC) that allows us to take full advantage of temporal correlations in the intermediate view and improve the existing synthesis from adjacent views. We use optical flow techniques to derive dense motion vector fields (MVFs) from the adjacent views and then warp them to the level of the intermediate view. This allows us to construct multiple temporal predictions of the synthesized frame. A second contribution is an adaptive fusion method that judiciously selects between temporal and inter-view prediction to eliminate artifacts associated with each prediction type. The proposed system is compared against the state-of-the-art view synthesis reference software 1-D Fast technique used in 3-D HEVC standardization. Three intermediary views are synthesized. Gains of up to 1.21 dB Bjontegaard Delta peak SNR are shown when evaluated on several standard MVV test sequences.

56 citations
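
A rough sketch of the temporal-prediction idea: estimate dense motion on an adjacent coded view and reuse it to warp the previously synthesized intermediate frame forward, giving one temporal candidate that can then be fused with the usual inter-view synthesis. OpenCV's Farneback flow stands in for the paper's dense motion-vector-field derivation, and the per-pixel fusion rule is a simplified assumption.

import cv2
import numpy as np

def temporal_prediction(prev_synth, prev_adj, cur_adj):
    # Backward flow on the adjacent view: current frame -> previous frame.
    g_prev = cv2.cvtColor(prev_adj, cv2.COLOR_BGR2GRAY)
    g_cur = cv2.cvtColor(cur_adj, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g_cur, g_prev, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g_cur.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    # Warp the previously synthesized intermediate frame with that motion.
    return cv2.remap(prev_synth, map_x, map_y, cv2.INTER_LINEAR)

def fuse(temporal_pred, interview_pred, temporal_err, interview_err):
    # Simplified stand-in for the paper's adaptive fusion: pick, per pixel,
    # the prediction with the lower estimated error.
    use_temporal = temporal_err < interview_err
    return np.where(use_temporal[..., None], temporal_pred, interview_pred)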


Proceedings ArticleDOI
01 Jun 2016
TL;DR: A hole filling approach based on background reconstruction is proposed, in which the temporal correlation information in both the 2D video and its corresponding depth map is exploited to construct a background video that is used to eliminate holes in the synthesized video.
Abstract: Depth image based rendering (DIBR) plays a key role in 3D video synthesis, by which other virtual views can be generated from a 2D video and its depth map. However, in the synthesis process, background occluded by foreground objects might be exposed in the new view, resulting in holes in the synthesized video. In this paper, a hole filling approach based on background reconstruction is proposed, in which the temporal correlation information in both the 2D video and its corresponding depth map is exploited to construct a background video. To construct a clean background video, the foreground objects are detected and removed. Motion compensation is also applied to make the background reconstruction model suitable for the moving-camera scenario. Each frame is projected to the current plane, where a modified Gaussian mixture model is performed. The constructed background video is used to eliminate the holes in the synthesized video. Our experimental results indicate that the proposed approach achieves better quality of the synthesized 3D video compared with other methods.

50 citations
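
A condensed sketch of the background-reconstruction step, assuming a static camera for brevity (the paper additionally removes foreground objects and applies motion compensation before modelling, and uses a modified Gaussian mixture model): OpenCV's MOG2 accumulates a background image over the 2D video, which can then patch the disoccluded pixels of a synthesized frame. Variable names and the hole-mask handling are assumptions.

import cv2

def build_background(frames, history=200):
    # Fit a Gaussian-mixture background model over a sequence of BGR frames.
    mog2 = cv2.createBackgroundSubtractorMOG2(history=history,
                                              detectShadows=False)
    for frame in frames:
        mog2.apply(frame)
    return mog2.getBackgroundImage()

def fill_holes(synth_frame, hole_mask, warped_background):
    # Replace hole pixels (hole_mask > 0) in the synthesized view with the
    # reconstructed background, itself warped to the virtual viewpoint first.
    out = synth_frame.copy()
    out[hole_mask > 0] = warped_background[hole_mask > 0]
    return out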


Journal ArticleDOI
TL;DR: A new reference view selection problem is cast that seeks the subset of views minimizing the distortion over a view navigation window defined by the user under bandwidth constraints, and an effective polynomial time algorithm using dynamic programming is proposed to solve the optimization problem.
Abstract: In multiview applications, camera views can be used as reference views to synthesize additional virtual viewpoints, allowing users to freely navigate within a 3D scene. However, bandwidth constraints may restrict the number of reference views sent to clients, limiting the quality of the synthesized viewpoints. In this work, we study the problem of in-network reference view synthesis aimed at improving the navigation quality at the clients. We consider a distributed cloud network architecture, where data stored in a main cloud is delivered to end users with the help of cloudlets, i.e., resource-rich proxies close to the users. We argue that, in case of limited bandwidth from the cloudlet to the users, re-sampling the viewpoints of the 3D scene at the cloudlet (i.e., synthesizing novel virtual views in the cloudlets to be used as new references at the decoder) is beneficial compared to mere subsampling of the original set of camera views. We therefore cast a new reference view selection problem that seeks the subset of views minimizing the distortion over a view navigation window defined by the user under bandwidth constraints. We prove that the problem is NP-hard, and we propose an effective polynomial time algorithm using dynamic programming to solve the optimization problem under general assumptions that cover most of the multiview scenarios in practice. Simulation results confirm the performance gain offered by virtual view synthesis in the network.

32 citations
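
A toy sketch of the flavour of the selection problem: choose which reference views to keep under a rate budget so that the summed navigation distortion of the gaps between consecutive kept views is minimal. This generic dynamic program over (view index, remaining budget) only illustrates the problem structure; the paper's algorithm, cost model, and synthesis-aware distortion function are more elaborate, and the names below are made up.

from functools import lru_cache

def select_reference_views(n, rate, budget, gap_cost):
    # Views are indexed 0..n-1; views 0 and n-1 are always kept as anchors.
    # rate[i] is the transmission cost of view i; gap_cost(i, j) is the
    # navigation distortion when i and j are consecutive kept references.
    @lru_cache(maxsize=None)
    def dp(i, remaining):
        if i == n - 1:
            return 0.0, ()
        best = (float('inf'), ())
        for j in range(i + 1, n):                  # candidate next reference
            if rate[j] <= remaining:
                tail_d, tail_sel = dp(j, remaining - rate[j])
                best = min(best, (gap_cost(i, j) + tail_d, (j,) + tail_sel))
        return best
    total, selected = dp(0, budget - rate[0])
    return total, (0,) + selected

# 5 equally spaced cameras, unit rates, budget for 3 views, distortion of a
# gap growing quadratically with its width -> keeps views 0, 2 and 4.
print(select_reference_views(5, [1] * 5, 3, lambda i, j: (j - i - 1) ** 2))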


Patent
13 May 2016
TL;DR: A system and method of deep learning using deep networks to predict new views from existing images may generate and improve models and representations from large-scale data, and can be used in graphics generation.
Abstract: A system and method of deep learning using deep networks to predict new views from existing images may generate and improve models and representations from large-scale data. This system and method of deep learning may employ a deep architecture performing new view synthesis directly from pixels, trained from large numbers of posed image sets. A system employing this type of deep network may produce pixels of an unseen view based on pixels of neighboring views, lending itself to applications in graphics generation.

31 citations


Journal ArticleDOI
TL;DR: The virtual view synthesis procedure and the distortion propagation from existing views to virtual views are analyzed in detail, and a virtual view distortion/PSNR estimation method is then derived; experimental results demonstrate that the proposed method estimates the PSNRs of virtual views accurately.
Abstract: In three-dimensional videos (3-DVs) with n-view texture videos plus n-view depth maps, virtual views can be synthesized from neighboring texture videos and the associated depth maps. To evaluate the system performance or guide the rate-distortion-optimization process of 3-DV coding, the distortion/PSNR of the virtual view should be calculated by measuring the quality difference between the virtual view synthesized from compressed 3-DVs and one synthesized from uncompressed 3-DVs, which increases the complexity of a 3-DV system. In order to reduce the complexity of the 3-DV system, it is better to estimate virtual view distortions/PSNRs directly without rendering virtual views. In this paper, the virtual view synthesis procedure and the distortion propagation from existing views to virtual views are analyzed in detail, and a virtual view distortion/PSNR estimation method is then derived. Experimental results demonstrate that the proposed method estimates the PSNRs of virtual views accurately. The squared correlation coefficient and the root mean squared error between the PSNRs estimated by the proposed method and the actual PSNRs are 0.998 and 2.012 on average over all the tested sequences. Since the proposed method is implemented row-by-row independently, it is also friendly to parallel design. The execution time for each row of pictures with 1024×768 resolution is only 0.079 s, while for pictures with 1920×1088 resolution it is only 0.155 s.

30 citations


Posted Content
TL;DR: In this article, a CNN-based approach is proposed to synthesize, from an input image, novel views of the same object or scene observed from arbitrary viewpoints. Instead of synthesizing pixels from scratch, the network learns to copy them from the input image by predicting appearance flows.
Abstract: We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows -- 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.

Journal ArticleDOI
TL;DR: An analytical model to estimate the depth-error-induced virtual view synthesis distortion (VVSD) in 3D video, taking the distance between reference and virtual views (virtual view position) into account is proposed.
Abstract: We propose an analytical model to estimate the depth-error-induced virtual view synthesis distortion (VVSD) in 3D video, taking the distance between reference and virtual views (virtual view position) into account. In particular, we start with a comprehensive preanalysis and discussion of several possible VVSD scenarios. Taking the intrinsic characteristics of each scenario into consideration, we specifically classify them into four clusters: 1) overlapping region; 2) disocclusion and boundary region; 3) edge region; and 4) infrequent region. We propose to model VVSD as the linear combination of the distortion under different scenarios (DDS) weighted by the probability under different scenarios (PDS). We show analytically that DDS and PDS can be related to the virtual view position using quadratic/biquadratic models and linear models, respectively. Experimental results verify that the proposed model is capable of estimating the relationship between VVSD and the distance between reference and virtual views. Therefore, our model can be used to inform the reference view setup for capturing, or to estimate the distortion at certain virtual view positions, when depth information is compressed.
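
In symbols, the model described above can be written as (notation mine, following the abstract):

\[ D_{\mathrm{VVSD}}(v) \;=\; \sum_{k} P_k(v)\, D_k(v), \qquad k \in \{\text{overlapping, disocclusion/boundary, edge, infrequent}\}, \]

where \(v\) is the distance between the reference and virtual views, each per-scenario distortion \(D_k(v)\) is fitted with a quadratic or biquadratic model in \(v\), and each scenario probability \(P_k(v)\) with a linear model in \(v\).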

Journal ArticleDOI
TL;DR: In view of supporting future high-quality auto-stereoscopic 3D displays and Free Navigation virtual/augmented reality applications with sparse, arbitrarily arranged camera setups, innovative depth estimation and virtual view synthesis techniques with global optimizations over all camera views should be developed.
Abstract: ISO/IEC MPEG and ITU-T VCEG have recently jointly issued a new multiview video compression standard, called 3D-HEVC, which reaches unprecedented compression performance for linear, dense camera arrangements. In view of supporting future high-quality auto-stereoscopic 3D displays and Free Navigation virtual/augmented reality applications with sparse, arbitrarily arranged camera setups, innovative depth estimation and virtual view synthesis techniques with global optimizations over all camera views should be developed. Preliminary studies in response to the MPEG-FTV (Free viewpoint TV) Call for Evidence suggest these targets are within reach, with at least 6% bitrate gains over 3D-HEVC technology.

Journal ArticleDOI
TL;DR: A layered depth image (LDI) is introduced in the original camera view, in which occluded background is identified and filled so that, when the LDI data is rendered to a virtual view, no disocclusions appear and views with consistent data are produced, also handling translucent disocclusions.

Posted Content
TL;DR: In this article, a recurrent convolutional encoder-decoder network is proposed to synthesize novel views of a 3D object from a single image; its recurrent structure captures long-term dependencies along a sequence of transformations.
Abstract: An important problem for both graphics and vision is to synthesize novel views of a 3D object from a single image. This is particularly challenging due to the partial observability inherent in projecting a 3D object onto the image space, and the ill-posedness of inferring object shape and pose. However, we can train a neural network to address the problem if we restrict our attention to specific object categories (in our case faces and chairs) for which we can gather ample training data. In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image. The recurrent structure allows our model to capture long-term dependencies along a sequence of transformations. We demonstrate the quality of its predictions for human faces on the Multi-PIE dataset and for a dataset of 3D chair models, and also show its ability to disentangle latent factors of variation (e.g., identity and pose) without using full supervision.
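
A compact PyTorch sketch of this kind of recurrent convolutional encoder-decoder (layer sizes, the scalar action encoding, and the 64x64 resolution are assumptions for illustration; the published model's exact architecture and its identity/pose disentangling are not reproduced here):

import torch
import torch.nn as nn

class RecurrentRotator(nn.Module):
    # Encode one image, apply T rotation "actions" through an LSTM cell,
    # and decode an output image after each step.
    def __init__(self, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(                      # 3x64x64 -> hidden
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 16 * 16, hidden))
        self.cell = nn.LSTMCell(hidden + 1, hidden)    # +1 for the action
        self.dec = nn.Sequential(                      # hidden -> 3x64x64
            nn.Linear(hidden, 64 * 16 * 16), nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, img, actions):
        # img: (B, 3, 64, 64); actions: (B, T, 1) rotation increments.
        h = self.enc(img)
        c = torch.zeros_like(h)
        frames = []
        for t in range(actions.size(1)):
            h, c = self.cell(torch.cat([h, actions[:, t]], dim=1), (h, c))
            frames.append(self.dec(h))
        return torch.stack(frames, dim=1)              # (B, T, 3, 64, 64)

out = RecurrentRotator()(torch.rand(2, 3, 64, 64), torch.zeros(2, 4, 1))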

Proceedings ArticleDOI
01 Jan 2016
TL;DR: In this approach, virtual views are synthesized using two neighboring real views, but the disoccluded areas are not inpainted; they are filled with information from further real views, additional views traditionally not included in the view synthesis.
Abstract: In the paper, we propose a new method for virtual view synthesis called Multiview Synthesis. In our approach, virtual views are synthesized using two neighboring real views, but the disoccluded areas are not inpainted; they are filled with information from further real views, additional views traditionally not included in the view synthesis. The whole synthesis is performed with triangles rather than with individual pixels. The proposed Multiview Synthesis also includes additional steps of adaptive color correction, blurred edge removal, and spatial edge blurring. The experimental results show that both the objective and subjective quality of the synthesized views is significantly higher than the quality of views obtained using the state-of-the-art MPEG reference view synthesis software.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that spatio-temporal inconsistencies are significantly reduced using the proposed method and subjective and objective qualities are improved compared to state-of-the-art reference methods.
Abstract: Depth-image-based rendering (DIBR) is a commonly used method for synthesizing additional views using the video-plus-depth (V+D) format. A critical issue with DIBR-based view synthesis is the lack of information behind foreground objects. This lack manifests as disocclusions, i.e., holes next to the foreground objects in rendered virtual views, a consequence of the virtual camera "seeing" behind the foreground object. The disocclusions are larger in the extrapolation case, i.e., the single-camera case. Texture synthesis methods (inpainting methods) aim to fill these disocclusions by producing plausible texture content. However, virtual views inevitably exhibit both spatial and temporal inconsistencies at the filled disocclusion areas, depending on the scene content. In this paper, we propose a layered depth image (LDI) approach that improves the spatio-temporal consistency. In the process of LDI generation, depth information is used to classify the foreground and background in order to form a static scene sprite from a set of neighboring frames. Occlusions in the LDI are then identified and filled using inpainting, such that no disocclusions appear when the LDI data is rendered to a virtual view. In addition to the depth information, optical flow is computed to extract the stationary parts of the scene and to classify the occlusions in the inpainting process. Experimental results demonstrate that spatio-temporal inconsistencies are significantly reduced using the proposed method. Furthermore, subjective and objective qualities are improved compared to state-of-the-art reference methods.
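
The key idea of filling disocclusions from background rather than foreground texture can be approximated in a few lines. This is a heavily simplified stand-in (no layered depth image and no optical-flow sprite) that uses OpenCV's Telea inpainting; the foreground mask is assumed to come from the depth-based classification step.

import cv2
import numpy as np

def fill_disocclusions(virtual_view, hole_mask, foreground_mask):
    # Temporarily mask out foreground pixels together with the holes so the
    # inpainting propagates colours from background regions only, then put
    # the original foreground pixels back.
    mask = (((hole_mask > 0) | (foreground_mask > 0)) * 255).astype(np.uint8)
    filled = cv2.inpaint(virtual_view, mask, 5, cv2.INPAINT_TELEA)
    keep = (foreground_mask > 0) & (hole_mask == 0)
    filled[keep] = virtual_view[keep]
    return filled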

Journal ArticleDOI
TL;DR: To enhance compression performance, the synthesized view distortion, which is evaluated by emulating the interpolation and the virtual view synthesis process, is used in the optimization objective function for coding mode selection in the video encoder.
Abstract: In this paper, we propose a depth map down-sampling and coding scheme that minimizes the view synthesis distortion. Moreover, a solution to the optimal depth map down-sampling problem that minimizes the depth-caused distortion in the virtual view is derived, exploiting the depth map and the associated texture information along with the up-sampling method to be used at the decoder side. Furthermore, to enhance compression performance, the synthesized view distortion, which is evaluated by emulating the interpolation and the virtual view synthesis process, is used in the optimization objective function for coding mode selection in the video encoder. Experimental results show that both the proposed depth map down-sampling and encoding methods lead to good performance, with an average bit rate reduction of 2.62% compared with 3D-AVC.

Proceedings ArticleDOI
11 Jul 2016
TL;DR: The experimental results show that view synthesis based on the proposed MRF-based inpainting method systematically improves performance over the state-of-the-art in multiview view synthesis.
Abstract: View synthesis using depth image-based rendering generates virtual viewpoints of a 3D scene based on texture and depth information from a set of available cameras. One of the core components in view synthesis is image inpainting, which performs the reconstruction of areas that were occluded in the available cameras but are visible from the virtual viewpoint. Inpainting methods based on Markov random fields (MRFs) have been shown to be very effective in inpainting large areas in images. In this paper, we propose a novel MRF-based inpainting method for multiview video. The proposed method steers the MRF optimization towards completion from background to foreground and exploits the available depth information in order to avoid bleeding artifacts. The proposed approach allows for efficiently filling in large disocclusion areas and greatly accelerates execution compared to traditional MRF-based inpainting techniques. The experimental results show that view synthesis based on the proposed inpainting method systematically improves performance over the state-of-the-art in multiview view synthesis. Average PSNR gains of up to 1.88 dB compared to the MPEG View Synthesis Reference software were observed.

Journal ArticleDOI
TL;DR: This work studies the reference view selection problem and proposes an algorithm for the optimal selection of reference views in multiview coding systems, formulating an optimization problem for the positioning of the reference views such that both the distortion of the view reconstruction and the coding rate cost are minimized.
Abstract: Augmented reality, interactive navigation in 3D scenes, multiview video, and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared with traditional video services. The significant increase in the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience on resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the data set. In such coding schemes, the two following questions become fundamental: 1) how many reference views have to be chosen for keeping a good reconstruction quality under coding cost constraints? And 2) where to place these key views in the multiview data set? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views, such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view, positioning problem brings significant rate–distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.

Proceedings ArticleDOI
11 Jul 2016
TL;DR: A new hole-filling technique using the number of GMM models, rather than the background image, to identify background/foreground pixels, which provides a 0.9 to 1.7 dB PSNR improvement compared to the state-of-the-art method.
Abstract: View synthesis for 3D video and free viewpoint video (FVV) using existing view(s) can avoid the transmission of a large volume of video data. Existing techniques may suffer poor rendering quality from missing pixel values (i.e., holes) due to occluded regions, rounding errors, and disparity discontinuities. To address these problems, existing techniques use correlations in spatial texture only, or in both spatial texture and the temporal background. The former techniques (e.g., inpainting) suffer quality degradation due to the lack of spatial correlation in foreground-background boundary areas. The latter techniques (e.g., background update with Gaussian mixture modelling (GMM)) can improve quality in some occluded areas; however, due to their dependency on warping of the background image and on spatial correlation, they still suffer quality degradation. In this paper, we propose a new hole-filling technique that uses the number of GMM models, rather than the background image, to identify background/foreground pixels. The missing pixels of the background and foreground are recovered from the background pixels and from the weighted average of warped and foreground-model pixels, respectively. The experimental results show that the proposed approach provides a 0.9 to 1.7 dB PSNR improvement compared to the state-of-the-art method.

Patent
08 Nov 2016
TL;DR: In this paper, an efficient method is used for extracting objects at partially occluded regions as defined by the auxiliary data from the texture videos to facilitate view synthesis with reduced artifacts.
Abstract: Original or compressed Auxiliary Data, including possibly major depth discontinuities in the form of shape images, partial occlusion data, associated tuned and control parameters, and depth information of the original video(s), are used to facilitate the interactive display and generation of new views (view synthesis) of conventional 2D, stereo, and multi-view videos in conventional 2D, 3D (stereo) and multi-view or autostereoscopic displays with reduced artifacts. The partial or full occlusion data includes image, depth and opacity data of possibly partially occluded areas to facilitate the reduction of artifacts in the synthesized view. An efficient method is used for extracting objects at partially occluded regions as defined by the auxiliary data from the texture videos to facilitate view synthesis with reduced artifacts. Further, a method for updating the image background and the depth values uses the auxiliary data after extraction of each object to reduce the artifacts due to limited performance of online inpainting of missing data or holes during view synthesis.

Patent
22 Aug 2016
TL;DR: In this article, an innovative view merging method coupled with an efficient hole filling procedure is described that compensates for depth misregistrations and inaccuracies to produce realistic synthesized views for full parallax light field displays.
Abstract: An innovative method for synthesis of compressed light fields is described. Compressed light fields are commonly generated by sub-sampling light field views. The suppressed views must then be synthesized at the display, utilizing information from the compressed light field. The present invention describes a method for view synthesis that utilizes depth information of the scene to reconstruct the absent views. An innovative view merging method coupled with an efficient hole filling procedure compensates for depth misregistrations and inaccuracies to produce realistic synthesized views for full parallax light field displays.

Proceedings ArticleDOI
10 Oct 2016
TL;DR: This paper proposes a layered image representation which is re-composed for the novel view with a special reconstruction filter, allowing it to produce a typical novel view of 1024×1024 pixels in ca. 25 ms on a current GPU.
Abstract: Novel-view synthesis can be used to hide latency in a real-time remote rendering setup, to increase frame rate, or to produce advanced visual effects such as depth-of-field or motion blur in volumes or stereo and light field imagery. Regrettably, existing real-time solutions are limited to opaque surfaces. Prior art has circumvented the challenge by making volumes opaque, i.e., projecting the volume onto representative surfaces for reprojection, omitting correct volumetric effects. This paper proposes a layered image representation which is re-composed for the novel view with a special reconstruction filter. We propose a view-dependent approximation to the volume that allows producing a typical novel view of 1024×1024 pixels in ca. 25 ms on a current GPU. At the heart of our approach is the idea to compress the complex view-dependent emission-absorption function along original view rays into a layered piecewise-analytic emission-absorption representation that can be efficiently ray-cast from a novel view. It does not assume opaque surfaces or approximate color and opacity, can be re-evaluated very efficiently, results in an image identical to the reference from the original view, has correct volumetric shading for novel views, and works with a low and fixed number of layers per pixel that fits modern GPU architectures.
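
For context, the emission-absorption function that the layered representation compresses along each original view ray is the usual front-to-back compositing over samples (the standard volume-rendering relation, not a formula specific to this paper):

\[ C \;=\; \sum_{i=1}^{N} c_i\,\alpha_i \prod_{j<i} \bigl(1 - \alpha_j\bigr), \qquad A \;=\; 1 - \prod_{i=1}^{N} \bigl(1 - \alpha_i\bigr), \]

with per-sample colours \(c_i\) and opacities \(\alpha_i\); the paper approximates this per pixel by a small, fixed number of piecewise-analytic layers that can be re-composited efficiently from a novel viewpoint.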

Patent
21 Sep 2016
TL;DR: In this article, a panoramic 3D video generation method for virtual reality equipment is presented, which includes the following steps: shooting a scene video using a wide-angle camera array and generating a panoramic video through a stitching algorithm; shooting a scene depth map using a depth camera array and generating a panoramic depth map video through the stitching algorithm; detecting the head position of a person in real time and cutting an image of the corresponding position in the panoramic video frame as the left view; and then generating a right view image through extrapolation based on virtual view synthesis.
Abstract: The invention discloses a panoramic 3D video generation method for virtual reality equipment, comprising the following steps: shooting a scene video using a wide-angle camera array, and generating a panoramic video through a stitching algorithm; shooting a scene depth map using a depth camera array, and generating a panoramic depth map video through the stitching algorithm; by detecting the head position of a person in real time, cutting an image of a corresponding position in the panoramic video frame as a video of the left view; and generating a right view image through extrapolation based on a virtual view synthesis technology according to the left view image and the corresponding depth map, wherein the two images are stitched into a left-right 3D video which is displayed in virtual reality equipment. By adding the view synthesis technology to a panoramic video displayed by virtual reality equipment, viewers can see the 3D effect of the panoramic video, the scene is more lifelike, and the viewer experience is enhanced.

Journal ArticleDOI
TL;DR: A predicted hole mapping (PHM) algorithm is presented which requires no filling priority or smoothing operation, allowing parallel computation that facilitates a real-time 3D conversion system.
Abstract: Three-dimensional (3D) display technologies have made great progress in recent years. View synthesis for 3D content requires hole filling, which is a challenging task. The increase in resolution and in the number of views for view synthesis brings new challenges in memory and processing speed. A predicted hole mapping (PHM) algorithm is presented which requires no filling priority or smoothing operation, allowing parallel computation that facilitates a real-time 3D conversion system. In experiments, the proposed PHM is evaluated and compared with other methods in terms of peak signal-to-noise ratio and structural similarity index measurement, and the results show its advantages. The method can operate on a 32-view display with 4K × 2K resolution in real time on a GPU.

Proceedings ArticleDOI
01 Sep 2016
TL;DR: The results show the correlation between the number of occlusions in the scene and a gain from using camera pairs instead of uniformly distributed cameras.
Abstract: In this article we deal with the problem of camera positioning in sparse multiview systems with applications to free navigation. The limited number of cameras, though it makes the system relatively practical, implies problems with proper depth estimation and virtual view synthesis due to the increased amount of occluded areas. We present experimental results for the optimal positioning of the cameras, depending on two factors: the characteristics of the acquired scene and the multi-camera system (linear or circular camera setup). The results show the correlation between the number of occlusions in the scene and the gain from using camera pairs instead of uniformly distributed cameras.

Journal ArticleDOI
01 Jun 2016
TL;DR: A trilateral depth filter with local texture information, spatial proximity, and color similarity is incorporated to remove the ghost contours by rectifying the misalignment between the depth map and its associated color image.
Abstract: Numerous depth image-based rendering algorithms have been proposed to synthesize the virtual view for free viewpoint television. However, inaccuracies in the depth map cause visual artifacts in the virtual view. In this paper, we propose a novel virtual view synthesis framework to create the virtual view of the scene. Here, we incorporate a trilateral depth filter with local texture information, spatial proximity, and color similarity to remove ghost contours by rectifying the misalignment between the depth map and its associated color image. To further enhance the quality of the synthesized virtual views, we partition the scene into different 3D object segments based on the color image and depth map. Each 3D object segment is warped and blended independently to avoid mixing pixels belonging to different parts of the scene. The evaluation results indicate that the proposed method significantly improves the quality of the synthesized virtual view compared with other methods and that the results are qualitatively very similar to the ground truth. In addition, it also performs well in real-world scenes.

Journal ArticleDOI
TL;DR: A confidence-based depth recovery and high-quality 3D view synthesis method is proposed, and experimental results show that the proposed method yields higher quality recovered depth maps and synthesized image views than other previous methods.

Journal ArticleDOI
TL;DR: A priority patch inpainting algorithm for hole filling in DIBR algorithms by generating multiple virtual views by applying texture-based interpolation method for crack filling and a prioritized method for selecting the critical patch is proposed to reduce computation time.
Abstract: Hole and crack filling is the most important issue in depth-image-based rendering (DIBR) algorithms for generating virtual view images when only one view image and one depth map are available. This paper proposes a priority patch inpainting algorithm for hole filling in DIBR algorithms by generating multiple virtual views. A texture-based interpolation method is applied for crack filling. Then, an inpainting-based algorithm is applied patch by patch for hole filling. A prioritized method for selecting the critical patch is also proposed to reduce computation time. Finally, the proposed method is realized on the Compute Unified Device Architecture (CUDA) parallel computing platform, which runs on a graphics processing unit. Simulation results show that the proposed algorithm is 51-fold faster for virtual view synthesis and achieves better virtual view quality compared to the traditional DIBR algorithm, which contains depth preprocessing, warping, and hole filling.