
Showing papers on "View synthesis published in 2011"


Journal ArticleDOI
01 Apr 2011
TL;DR: This paper describes efficient coding methods for video and depth data, and presents synthesis methods that mitigate errors from depth estimation and coding in the generation of views.
Abstract: Current 3-D video (3DV) technology is based on stereo systems. These systems use stereo video coding for pictures delivered by two input cameras. Typically, such stereo systems only reproduce these two camera views at the receiver, and stereoscopic displays for multiple viewers require wearing special 3-D glasses. On the other hand, emerging autostereoscopic multiview displays emit a large number of views to enable 3-D viewing for multiple users without requiring 3-D glasses. For representing a large number of views, a multiview extension of stereo video coding is used, typically requiring a bit rate that is proportional to the number of views. However, since the quality improvement of multiview displays will be governed by an increase of emitted views, a format is needed that allows the generation of arbitrary numbers of views with the transmission bit rate being constant. Such a format is the combination of video signals and associated depth maps. The depth maps provide disparities associated with every sample of the video signal that can be used to render arbitrary numbers of additional views via view synthesis. This paper describes efficient coding methods for video and depth data. For the generation of views, synthesis methods are presented which mitigate errors from depth estimation and coding.

420 citations
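The video-plus-depth format described above relies on converting each 8-bit depth sample to a disparity and shifting pixels accordingly. The following minimal Python sketch illustrates that rendering step; the parameters z_near, z_far and baseline_px (which folds focal length and camera baseline into one pixel-scale factor) are illustrative assumptions, and this is a generic DIBR illustration, not the paper's coding or synthesis method.

import numpy as np

def synthesize_view(color, depth8, baseline_px, z_near=0.3, z_far=10.0):
    """Forward-warp a reference view to a horizontally shifted virtual view.

    color:   (H, W, 3) uint8 reference image
    depth8:  (H, W) uint8 depth map (255 = nearest, 0 = farthest)
    returns: warped image and a boolean mask of disoccluded (hole) pixels
    """
    h, w = depth8.shape
    # 8-bit depth level -> metric depth (standard inverse-depth quantization).
    z = 1.0 / (depth8 / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
    disparity = baseline_px / z          # horizontal shift in pixels

    warped = np.zeros_like(color)
    z_buffer = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            xv = int(round(x - disparity[y, x]))        # column in the virtual view
            if 0 <= xv < w and z[y, x] < z_buffer[y, xv]:
                z_buffer[y, xv] = z[y, x]               # keep the nearest sample
                warped[y, xv] = color[y, x]
    holes = ~np.isfinite(z_buffer)       # disocclusions left for hole filling
    return warped, holes

The returned hole mask marks the disoccluded regions that the synthesis methods discussed in the paper are designed to fill and clean up.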


Journal ArticleDOI
TL;DR: This paper considers the DIBR-based synthesized view evaluation problem, and provides hints for a new objective measure for 3DTV quality assessment.
Abstract: 3DTV technology has brought out new challenges such as the question of synthesized view evaluation. Synthesized views are generated through a depth image-based rendering (DIBR) process. This process induces new types of artifacts whose impact on visual quality has to be identified considering various contexts of use. While visual quality assessment has been the subject of many studies in the last 20 years, there are still some unanswered questions regarding new technological improvements. DIBR is bringing new challenges mainly because it deals with geometric distortions. This paper considers the DIBR-based synthesized view evaluation problem. Different experiments have been carried out. They question the protocols of subjective assessment and the reliability of the objective quality metrics in the context of 3DTV, in these specific conditions (DIBR-based synthesized views), and they consist of assessing seven different view synthesis algorithms through subjective and objective measurements. Results show that the usual metrics are not sufficient for assessing 3-D synthesized views, since they do not correlate well with human judgment. Synthesized views contain specific artifacts located around the disoccluded areas, but the usual metrics seem to be unable to express the degree of annoyance perceived in the whole image. This study provides hints for a new objective measure. Two approaches are proposed: the first one is based on the analysis of the shifts of the contours of the synthesized view; the second one is based on the computation of a mean SSIM score of the disoccluded areas.

218 citations
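The second objective approach mentioned above, a mean SSIM score restricted to disoccluded areas, can be prototyped in a few lines. The sketch below assumes grayscale inputs, a precomputed boolean disocclusion mask, and scikit-image for the SSIM map; it is a plausible reading of the idea rather than the authors' exact metric.

import numpy as np
from skimage.metrics import structural_similarity

def disocclusion_ssim(reference_gray, synthesized_gray, disocclusion_mask):
    """Mean SSIM over disoccluded pixels only (mask == True)."""
    _, ssim_map = structural_similarity(reference_gray, synthesized_gray,
                                        data_range=255, full=True)
    return float(ssim_map[disocclusion_mask].mean())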


Journal ArticleDOI
TL;DR: A new hole filling approach for DIBR using texture synthesis is presented and results show that the proposed approach provides improved rendering results in comparison to the latest MPEG view synthesis reference software (VSRS) version 3.6.
Abstract: A depth image-based rendering (DIBR) approach with advanced inpainting methods is presented. The DIBR algorithm can be used in 3-D video applications to synthesize a number of different perspectives of the same scene, e.g., from a multiview-video-plus-depth (MVD) representation. This MVD format consists of video and depth sequences for a limited number of original camera views of the same natural scene. Here, DIBR methods allow the computation of additional new views. An inherent problem of the view synthesis concept is the fact that image information which is occluded in the original views may become visible, especially in extrapolated views beyond the viewing range of the original cameras. The presented algorithm synthesizes these occluded textures. The synthesizer achieves visually satisfying results by taking spatial and temporal consistency measures into account. Detailed experiments show significant objective and subjective gains of the proposed method in comparison to the state-of-the-art methods.

172 citations


Journal ArticleDOI
24 Jan 2011
TL;DR: This paper gives an overview of the state-of-the-art in 3-D video postproduction and processing as well as an outlook to remaining challenges and opportunities.
Abstract: This paper gives an overview of the state-of-the-art in 3-D video postproduction and processing as well as an outlook to remaining challenges and opportunities. First, fundamentals of stereography are outlined that set the rules for proper 3-D content creation. Manipulation of the depth composition of a given stereo pair via view synthesis is identified as the key functionality in this context. Basic algorithms are described to adapt and correct fundamental stereo properties such as geometric distortions, color alignment, and stereo geometry. Then, depth image-based rendering is explained as the widely applied solution for view synthesis in 3-D content creation today. Recent improvements of depth estimation already provide very good results; however, in most cases interactive workflows still dominate. Warping-based methods, which do not rely on dense and accurate depth estimation, may become an alternative for some applications in the future. Finally, 2-D to 3-D conversion is covered, which is an important special area for reuse of existing legacy 2-D content in 3-D. Here various advanced algorithms are combined in interactive workflows.

160 citations


Proceedings ArticleDOI
16 May 2011
TL;DR: A depth-based inpainting algorithm which efficiently handles disocclusion occurring on virtual viewpoint rendering and fills in larger disocclusions in distant synthesized views based on a coherent tensor-based color and geometry structure propagation.
Abstract: This paper describes a depth-based inpainting algorithm which efficiently handles disocclusion occurring on virtual viewpoint rendering. A single reference view and a set of depth maps are used in the proposed approach. The method not only deals with small disocclusion filling related to small camera baseline, but also manages to fill in larger disocclusions in distant synthesized views. This relies on a coherent tensor-based color and geometry structure propagation. The depth is used to drive the filling order, while enforcing the structure diffusion from similar candidate-patches. By acting on patch prioritization, selection and combination, the completion of distant synthesized views allows a consistent and realistic rendering of virtual viewpoints.

89 citations


Journal ArticleDOI
TL;DR: A novel solution of suppression of misalignment and alignment enforcement between texture and depth to reduce background noises and foreground erosion, respectively, among different types of boundary artifacts is proposed.
Abstract: 3D Video (3DV) with depth-image-based view synthesis is a promising candidate for next-generation broadcasting applications. However, the synthesized views in 3DV are often contaminated by annoying artifacts, particularly around object boundaries, due to imperfect depth maps (e.g., produced by state-of-the-art stereo matching algorithms or compressed lossily). In this paper, we first review some representative methods for boundary artifact reduction in view synthesis, and make an in-depth investigation into the underlying mechanisms of boundary artifact generation from a new perspective of texture-depth alignment in boundary regions. Three forms of texture-depth misalignment are identified as the causes of different boundary artifacts, which mainly present themselves as scattered noise on the background and object erosion on the foreground. Based on the insights gained from the analysis, we propose a novel solution of suppression of misalignment and alignment enforcement (denoted as SMART) between texture and depth to reduce background noise and foreground erosion, respectively, among the different types of boundary artifacts. SMART is developed as a three-step pre-processing in view synthesis. Experiments on view synthesis with original and compressed texture/depth data consistently demonstrate the superior performance of the proposed method as compared with other relevant boundary artifact reduction schemes.

85 citations


Journal ArticleDOI
TL;DR: A depth no-synthesis-error (D-NOSE) model is developed to examine the allowable depth distortions in rendering a virtual view without introducing any geometry changes, and shows that a virtual view can be synthesized losslessly if depth distortions follow the D-NOSE specified thresholds.
Abstract: Currently, 3-D video targets the application of disparity-adjustable stereoscopic video, where view synthesis based on depth-image-based rendering (DIBR) is employed to generate virtual views. Distortions in depth information may introduce geometry changes or occlusion variations in the synthesized views. In practice, depth information is stored in 8-bit grayscale format, whereas the disparity range for a visually comfortable stereo pair is usually much less than 256 levels. Thus, several depth levels may correspond to the same integer (or sub-pixel) disparity value in DIBR-based view synthesis, such that some depth distortions may not result in geometry changes in the synthesized view. From this observation, we develop a depth no-synthesis-error (D-NOSE) model to examine the allowable depth distortions in rendering a virtual view without introducing any geometry changes. We further show that the depth distortions prescribed by the proposed D-NOSE profile also do not compromise the occlusion order in view synthesis. Therefore, a virtual view can be synthesized losslessly if depth distortions follow the D-NOSE specified thresholds. Our simulations validate the proposed D-NOSE model in lossless view synthesis and demonstrate the gain with the model in depth coding.

73 citations
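The observation behind D-NOSE is that an 8-bit depth map offers finer granularity than the rendered disparity, so each depth level has a whole interval of neighbours that quantize to the same sub-pixel disparity. The sketch below computes such intervals under an assumed linear depth-to-disparity mapping with an assumed maximum disparity and sub-pixel precision; the paper's exact mapping and thresholds may differ.

import numpy as np

def dnose_intervals(max_disparity=64.0, subpel=4):
    """For each 8-bit depth level, return the [low, high] range of levels
    that map to the same 1/subpel-pixel disparity."""
    levels = np.arange(256)
    disparity = levels / 255.0 * max_disparity             # assumed linear mapping
    quantized = np.round(disparity * subpel) / subpel      # sub-pixel quantization
    low = np.empty(256, dtype=int)
    high = np.empty(256, dtype=int)
    for v in range(256):
        same = np.flatnonzero(quantized == quantized[v])   # levels sharing a disparity
        low[v], high[v] = same.min(), same.max()
    return low, high   # depth errors inside [low[v], high[v]] leave geometry unchanged

A depth encoder can then treat any reconstruction error that stays inside a pixel's interval as free of synthesis error, which is where the reported depth-coding gain comes from.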


Journal ArticleDOI
TL;DR: Experimental results suggest that the bit rate for depth map coding can be reduced up to 78% for the depth maps captured with depth-range cameras and up to 24% with computer vision algorithms, without affecting the 3-D visual quality or the arbitrary view synthesis quality for free-viewpoint video applications.
Abstract: This paper addresses the sensitivity of human vision to spatial depth variations in a 3-D video scene, seen on a stereoscopic display, based on an experimental derivation of a just noticeable depth difference (JNDD) model. The main target is to exploit the depth perception sensitivity of humans in suppressing unnecessary spatial depth details, hence reducing the transmission overhead allocated to depth maps. Based on the derived JNDD model, depth map sequences are preprocessed to suppress the depth details that are not perceivable by the viewers and to minimize the rendering artefacts that arise due to optical noise, where the optical noise is triggered by inaccuracies in the depth estimation process. Theoretical and experimental evidence is provided to illustrate that the proposed depth adaptive preprocessing filter does not alter the 3-D visual quality or the view synthesis quality for free-viewpoint video applications. Experimental results suggest that the bit rate for depth map coding can be reduced by up to 78% for depth maps captured with depth-range cameras and by up to 24% for depth maps estimated with computer vision algorithms, without affecting the 3-D visual quality or the arbitrary view synthesis quality.

55 citations
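The preprocessing step described above amounts to flattening depth variations that fall below the just noticeable depth difference. A minimal sketch of that idea follows, assuming a fixed JNDD threshold and a median-filtered local reference; the paper derives the threshold experimentally and applies a more careful, depth-adaptive filter.

import numpy as np
from scipy.ndimage import median_filter

def jndd_preprocess(depth8, jndd_threshold=5):
    """Suppress depth details smaller than an assumed JNDD threshold."""
    ref = median_filter(depth8.astype(np.int16), size=7)   # smooth local depth reference
    diff = depth8.astype(np.int16) - ref
    perceivable = np.abs(diff) > jndd_threshold            # keep only noticeable variations
    return np.where(perceivable, depth8, ref).astype(np.uint8)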


Journal ArticleDOI
TL;DR: A novel algorithm to generate multiple virtual views from a video-plus-depth sequence for modern autostereoscopic displays with an iterative re-weighted framework to jointly consider intensity and depth consistency in the adjacent frames is proposed.
Abstract: In this paper, we propose a novel algorithm to generate multiple virtual views from a video-plus-depth sequence for modern autostereoscopic displays. Synthesizing realistic content in the disocclusion regions of the virtual views is the main challenge of this task. Spatial coherence and temporal consistency are the two key factors for producing perceptually satisfactory virtual images. The proposed algorithm employs a spatio-temporal consistency constraint to handle the uncertain pixels in the disocclusion regions. On the one hand, regarding spatial coherence, we incorporate the intensity gradient strength with the depth information to determine the filling priority for inpainting the disocclusion regions, so that the continuity of image structures can be preserved. On the other hand, temporal consistency is enforced by estimating the intensities in the disocclusion regions across the adjacent frames with an optimization process. We propose an iterative re-weighted framework to jointly consider intensity and depth consistency in the adjacent frames, which not only imposes temporal consistency but also reduces noise disturbance. Finally, to accelerate the multi-view synthesis process, we apply the proposed view synthesis algorithm to generate the intensity and depth maps at the leftmost and rightmost viewpoints, so that the intermediate views are efficiently interpolated through image warping according to the associated depth maps between the two synthesized images and their corresponding symmetric depths. In the experimental validation, we perform quantitative evaluation on synthetic data as well as subjective assessment on real video data, with comparison to some representative methods, to demonstrate the superior performance of the proposed algorithm.

51 citations


Journal ArticleDOI
TL;DR: This work proposes a novel content adaptive enhancement technique applied to the previously estimated multi-view depth map sequences that enforces consistency across the spatial, temporal and inter-view dimensions of the depth maps so that both the coding efficiency and the quality of the synthesized views are improved.
Abstract: Depth map estimation is an important part of the multi-view video coding and virtual view synthesis within the free viewpoint video applications. However, computing an accurate depth map is a computationally complex process, which makes real-time implementation challenging. Alternatively, a simple estimation, though quick and promising for real-time processing, might result in inconsistent multi-view depth map sequences. To exploit this simplicity and to improve the quality of depth map estimation, we propose a novel content adaptive enhancement technique applied to the previously estimated multi-view depth map sequences. The enhancement method is locally adapted to edges, motion and depth-range of the scene to avoid blurring the synthesized views and to reduce the computational complexity. At the same time, and very importantly, the method enforces consistency across the spatial, temporal and inter-view dimensions of the depth maps so that both the coding efficiency and the quality of the synthesized views are improved. We demonstrate these improvements in the experiments, where the enhancement method is applied to several multi-view test sequences and the obtained synthesized views are compared to the views synthesized using other methods in terms of both numerical and perceived visual quality.

50 citations


Proceedings ArticleDOI
29 Dec 2011
TL;DR: A novel algorithm for depth map compression that explicitly signals the location of discontinuities is proposed; subjective assessment of synthesized views using compressed depth maps confirms superior visual quality resulting from fewer geometric distortions.
Abstract: Emerging video technologies like 3DTV and Free Viewpoint Video require transmitting more information than just 2D color data. To allow rendering of arbitrary viewpoints of a video scene, a new data format including 2D color and an accompanying depth map has been proposed. Consequently, this new technique requires an efficient coding method not only for color information, but also for depth data. Depth maps are characterized by segments describable by piecewise linear functions bounded by sharp edges. Preserving these depth discontinuities is a crucial requirement for high quality view synthesis. To adapt to these characteristics we propose a novel algorithm for depth map compression that explicitly signals the location of discontinuities. Experimental results show that the proposed method yields up to 9 dB PSNR gain compared to a JPEG 2000 encoder in high-bitrate scenarios. Subjective quality assessment of synthesized views using compressed depth maps confirms the superior visual quality resulting from fewer geometric distortions.

Proceedings ArticleDOI
11 Jul 2011
TL;DR: This method applies purely image domain warping to content creation for autostereoscopic display, supporting multiview creation from stereo input, which is the most relevant use case scenario.
Abstract: Content creation for autostereoscopic display is a widely unresolved task. Typical methods rely on view synthesis based on depth image based rendering. Our method applies purely image domain warping instead. Input video is analyzed and information about sparse disparity, vertical edges and saliency is extracted. A constrained energy minimization problem is formulated and efficiently solved. The resulting image warping functions are used to synthesize novel views. Our approach is fully automatic, accurate, and reliable. Disocclusions and related artifacts are avoided due to smooth, saliency-driven warping functions. Our method also works well for extrapolation of views in a limited range, thus supporting multiview creation from stereo input, which is the most relevant use case scenario.

Proceedings ArticleDOI
05 Dec 2011
TL;DR: A spherical ego-centric representation of the environment is proposed that is able to reproduce photo-realistic omnidirectional views of captured environments and is used for real-time model-based localisation and navigation.
Abstract: This paper presents a method and apparatus for building dense visual maps of large scale 3D environments for real-time localisation and navigation. A spherical ego-centric representation of the environment is proposed that is able to reproduce photo-realistic omnidirectional views of captured environments. This representation is novel in that it is composed of a graph of locally accurate augmented spherical panoramas that allows varying viewpoints to be generated through novel view synthesis. The spheres are related by a graph of 6-DoF poses which are estimated through multi-view spherical registration. To acquire these models, a multi-baseline acquisition system has been designed and built, based on an outward facing ring of cameras with diverging views. This configuration allows high resolution spherical images of the environment to be captured and a dense depth map to be computed through a wide baseline dense correspondence algorithm. A calibration procedure is developed for an outward facing camera ring that imposes a loop closing constraint, in order to obtain a consistent set of extrinsic parameters. This spherical sensor is shown to acquire compact, accurate and efficient representations of large environments and is used for real-time model-based localisation.

Proceedings ArticleDOI
29 Dec 2011
TL;DR: This paper explicitly manipulates depth values, without causing severe synthesized view distortion, in order to maximize representation sparsity in the transform domain for compression gain, and shows that the TDS approach gained up to 1.7 dB in rate-distortion performance for the interpolated view over compression of unaltered depth maps.
Abstract: Compression of depth maps is important for the “texture plus depth” format of multiview images, which enables synthesis of novel intermediate views via depth-image-based rendering (DIBR) at the decoder. Previous depth map coding schemes exploit unique depth data characteristics to compactly and faithfully reproduce the original signal. In contrast, since the depth map is only a means to the end of view synthesis and not itself viewed, in this paper we explicitly manipulate depth values, without causing severe synthesized view distortion, in order to maximize representation sparsity in the transform domain for compression gain — we call this process transform domain sparsification (TDS). Specifically, for each pixel in the depth map, we first define a quadratic penalty function, with minimum at the ground truth depth value, based on the synthesized view's distortion sensitivity to the pixel's depth value during DIBR. We then define an objective for a depth signal in a block as a weighted sum of: i) the signal's sparsity in the transform domain, and ii) per-pixel synthesized view distortion penalties for the chosen signal. Given that sparsity (the ℓ0-norm) is non-convex and difficult to optimize, we replace the ℓ0-norm in the objective with a computationally inexpensive weighted ℓ2-norm; the optimization is then an unconstrained quadratic program, solvable via a set of linear equations. For the weighted ℓ2-norm to promote sparsity, we solve the optimization iteratively, where at each iteration the weights are readjusted to mimic the sparsity-promoting ℓτ-norm, 0 < τ < 1. Using JPEG as an example transform codec, we show that our TDS approach gained up to 1.7 dB in rate-distortion performance for the interpolated view over compression of unaltered depth maps.
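The iterative reweighting described in the abstract can be written compactly as one small quadratic solve per iteration. The sketch below works on a 1-D depth block with an orthonormal DCT; the block handling, penalty weights and choice of tau are illustrative assumptions, not the paper's exact settings.

import numpy as np
from scipy.fft import dct

def tds_block(d0, penalty_weights, tau=0.5, iters=10, eps=1e-6):
    """Sparsify the DCT coefficients of a depth block d0 (shape (N,)) while
    staying within per-pixel quadratic penalties (penalty_weights, shape (N,))."""
    n = d0.size
    T = dct(np.eye(n), norm='ortho', axis=0)        # orthonormal DCT matrix
    S = np.diag(penalty_weights)
    d = d0.copy()
    for _ in range(iters):
        c = T @ d
        w = (c ** 2 + eps) ** ((tau - 2) / 2)       # reweighting mimics the l_tau norm
        A = T.T @ np.diag(w) @ T + S                # unconstrained quadratic program
        d = np.linalg.solve(A, S @ d0)              # closed-form minimizer per iteration
    return d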

Proceedings ArticleDOI
16 May 2011
TL;DR: An effective virtual view synthesis approach utilizing depth-image-based rendering (DIBR), shown to be reliable in both subjective and objective evaluations.
Abstract: We propose an effective virtual view synthesis approach which utilizes depth-image-based rendering (DIBR). In our scheme, two reference color images and their associated depth maps are used to generate an arbitrary virtual viewpoint. First, the main and auxiliary viewpoint images are warped to the virtual viewpoint. After that, the cracks and error points are removed to enhance the image quality. Then, we fill the disocclusions of the virtual viewpoint image warped from the main viewpoint with the help of the auxiliary viewpoint. In order to reduce the color discontinuity of the virtual view, the brightness of the two reference viewpoint images is adjusted. Finally, the holes are filled by the depth-assisted asymmetric dilation inpainting method. Simulations show that the view synthesis approach is effective and reliable in both subjective and objective evaluations.

Journal ArticleDOI
TL;DR: A real-time HD1080p view synthesis engine based on the reference algorithm from the 3-D video coding team, which addresses the high computational complexity and high memory cost with bilinear interpolation and a floating-point Z scaling method.
Abstract: This paper presents a real-time HD1080p view synthesis engine based on the reference algorithm from the 3-D video coding team, solving its high computational complexity and high memory cost problems. For the computational complexity, we propose bilinear interpolation to simplify the hole filling process, and a Z scaling method with floating-point format to reduce the cost of homography calculation. For the memory cost, we propose frame-level pipelining to reduce the requirement of warped depth maps, and a column-order warping method to remove the Z-buffer in occlusion handling. With 90 nm complementary metal-oxide-semiconductor technology, our view synthesis engine can achieve a throughput of 32.4 f/s for HD1080p videos with a gate count of 268.5 K and internal memory of 69.4 kbytes. The experimental results show that our implementation has similar synthesis quality to the original reference algorithm.

Proceedings ArticleDOI
29 Dec 2011
TL;DR: Results show that the most commonly used objective metrics can be far from human judgment depending on the artifact to deal with.
Abstract: This paper addresses the problem of evaluating virtual view synthesized images in the multi-view video context. As a matter of fact, view synthesis brings new types of distortion. The question refers to the ability of the traditionally used objective metrics to assess synthesized view quality, considering the new types of artifacts. The experiments conducted to determine their reliability consist of assessing seven different view synthesis algorithms. Subjective and objective measurements have been performed. Results show that the most commonly used objective metrics can be far from human judgment depending on the artifact to deal with.

Proceedings ArticleDOI
22 May 2011
TL;DR: This paper presents a new method for view synthesis that is both fast and accurate and has applications in free-viewpoint television, angular scalability for 3D video coding/decoding, and stereo-to-multiview conversion.
Abstract: From a rectified stereo image pair, the task of view synthesis is to generate images from any viewpoint along the baseline. The main difficulty of the problem is how to fill occluded regions. In this paper, we present a new method for view synthesis that is both fast and accurate. Occlusions are filled using color and disparity information to produce consistent pixel estimates. Results are comparable to current state-of-the-art methods in terms of objective measures while computation time is drastically reduced. This work has applications in free-viewpoint television, angular scalability for 3D video coding/decoding, and stereo-to-multiview conversion.

Proceedings ArticleDOI
11 Jul 2011
TL;DR: A cubic synthesized view distortion model is derived to describe the visual quality of an interpolated view as a function of the view's location, and it is shown how optimal bit allocation can be performed to minimize the maximum view synthesis distortion at any intermediate viewpoint.
Abstract: “Texture-plus-depth” has become a popular coding format for multiview image compression, where a decoder can synthesize images at intermediate viewpoints using encoded texture and depth maps of closest captured view locations via depth-image-based rendering (DIBR). As in other resource-constrained scenarios, limited available bits must be optimally distributed among captured texture and depth maps to minimize the expected signal distortion at the decoder. A specific challenge of multiview image compression for DIBR is that the encoder must allocate bits without the knowledge of how many and which specific virtual views will be synthesized at the decoder for viewing. In this paper, we derive a cubic synthesized view distortion model to describe the visual quality of an interpolated view as a function of the view's location. Given the model, one can easily find the virtual view location between two coded views where the maximum synthesized distortion occurs. Using a multiview image codec based on shape-adaptive wavelet transform, we show how optimal bit allocation can be performed to minimize the maximum view synthesis distortion at any intermediate viewpoint. Our experimental results show that the optimal bit allocation can outperform a common uniform bit allocation scheme by up to 1.0dB in coding efficiency performance, while simultaneously being competitive to a state-of-the-art H.264 codec.
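The usefulness of a cubic model is that the worst-case virtual viewpoint between two coded views can be located analytically. A hedged restatement of its form, with alpha the normalized view position between the two coded views and coefficients a_k produced by the paper's derivation (not reproduced here):

  D_s(\alpha) = a_3 \alpha^3 + a_2 \alpha^2 + a_1 \alpha + a_0,   \alpha \in [0, 1].

The maximum synthesized distortion then lies either at an interval endpoint or at a stationary point satisfying D_s'(\alpha) = 3 a_3 \alpha^2 + 2 a_2 \alpha + a_1 = 0, and it is this worst case that the optimal bit allocation minimizes.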

Proceedings ArticleDOI
TL;DR: This paper examines how to generate new views so that the perceived depth is similar to the original scene depth, and proposes a method to detect and reduce artifacts in the third and last step, these artifacts being created by errors contained in the disparity from the first step.
Abstract: The 3D shape perceived from viewing a stereoscopic movie depends on the viewing conditions, most notably on the screen size and distance, and depth and size distortions appear because of the differences between the shooting and viewing geometries. When the shooting geometry is constrained, or when the same stereoscopic movie must be displayed with different viewing geometries (e.g. in a movie theater and on a 3DTV), these depth distortions may be reduced by novel view synthesis techniques. They usually involve three steps: computing the stereo disparity, computing a disparity-dependent 2D mapping from the original stereo pair to the synthesized views, and finally composing the synthesized views. In this paper, we focus on the second and third step: we examine how to generate new views so that the perceived depth is similar to the original scene depth, and we propose a method to detect and reduce artifacts in the third and last step, these artifacts being created by errors contained in the disparity from the first step.
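For reference, the textbook viewing-geometry relation that makes perceived depth depend on screen size and distance (standard stereoscopy, not the paper's own derivation): with interocular distance b, viewing distance D and on-screen disparity d (positive for uncrossed disparity),

  Z_perceived = b D / (b - d).

Because changing the display rescales d with the image width while D is set by the viewing room, the same stereo pair yields different perceived depths in a movie theater and on a 3DTV, which is exactly the distortion the synthesized views are meant to compensate.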

Proceedings ArticleDOI
29 Dec 2011
TL;DR: This paper proposes a novel object-based LDI representation, improving synthesized virtual view quality in a rate-constrained context; pixels from each LDI layer are reorganised to enhance depth continuity.
Abstract: Layered Depth Image (LDI) representations are attractive compact representations for multi-view videos. Any virtual viewpoint can be rendered from LDI by using view synthesis technique. However, rendering from classical LDI leads to annoying visual artifacts, such as cracks and disocclusions. Visual quality gets even worse after a DCT-based compression of the LDI, because of blurring effects on depth discontinuities. In this paper, we propose a novel object-based LDI representation, improving synthesized virtual views quality, in a rate-constrained context. Pixels from each LDI layer are reorganised to enhance depth continuity.

Proceedings ArticleDOI
26 Oct 2011
TL;DR: It is shown that adding depth blur to the rendered texture can drastically improve the repeatability of the FAST and Harris corner detectors, which can be very helpful, e.g., to make tracking-by-synthesis run on mobile phones.
Abstract: Tracking-by-synthesis is a promising method for markerless vision-based camera tracking, particularly suitable for Augmented Reality applications. In particular, it is drift-free, viewpoint invariant and easy to combine with physical sensors such as GPS and inertial sensors. While edge features have been used successfully within the tracking-by-synthesis framework, point features have, to our knowledge, never been used. We believe that this is due to the fact that real-time corner detectors are generally weakly repeatable between a camera image and a rendered texture. In this paper, we compare the repeatability of the commonly used FAST, Harris and SURF interest point detectors across view synthesis. We show that adding depth blur to the rendered texture can drastically improve the repeatability of the FAST and Harris corner detectors (up to 100% in our experiments), which can be very helpful, e.g., to make tracking-by-synthesis run on mobile phones. We propose a method for simulating depth blur on the rendered images using a pre-calibrated depth response curve. In order to fulfil the performance requirements, a pyramidal approach is used based on the well-known MIP mapping technique. We also propose an original method for calibrating the depth response curve, which is suitable for any kind of focus lens and comes for free in terms of programming effort, once the tracking-by-synthesis algorithm has been implemented.
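A rough sketch of the depth-blur simulation idea: a blur strength is looked up per pixel from a calibrated depth-response curve, and the output is taken from the closest level of a pre-filtered, MIP-map-like pyramid. The fixed sigma ladder and the grayscale input are illustrative assumptions; the paper's calibration procedure and pyramid construction are more involved.

import numpy as np
from scipy.ndimage import gaussian_filter

def render_with_depth_blur(image, depth, response_curve, sigmas=(0.0, 1.0, 2.0, 4.0)):
    """image, depth: (H, W) arrays; response_curve maps depth to a blur sigma."""
    pyramid = [gaussian_filter(image.astype(float), s) if s > 0 else image.astype(float)
               for s in sigmas]
    target = response_curve(depth)                         # desired blur per pixel
    # Pick, per pixel, the pre-filtered level whose sigma is closest to the target.
    idx = np.argmin(np.abs(np.asarray(sigmas)[:, None, None] - target[None]), axis=0)
    return np.choose(idx, pyramid)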

Journal ArticleDOI
TL;DR: Improved projective rectification-based view interpolation and extrapolation methods are developed and applied to view synthesis prediction-based multiview video coding (MVC), and an improved model is proposed to study the rate-distortion performances of various practical MVC schemes.
Abstract: In this paper, we first develop improved projective rectification-based view interpolation and extrapolation methods, and apply them to view synthesis prediction-based multiview video coding (MVC). A geometric model for these view synthesis methods is then developed. We also propose an improved model to study the rate-distortion (R-D) performances of various practical MVC schemes, including the current joint multiview video coding standard. Experimental results show that our schemes achieve superior view synthesis results, and can lead to better R-D performance in MVC. Simulation results with the theoretical models help explain the experimental results.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed depth view synthesis method provides high-quality depth images for the current view and the proposed VSP modes provide high coding gains, especially on the anchor frames.
Abstract: The view synthesis prediction (VSP) method utilizes inter-view correlations between views by generating an additional reference frame in multiview video coding. This paper describes a multiview depth video coding scheme that incorporates depth view synthesis and additional prediction modes. In the proposed scheme, we exploit the reconstructed neighboring depth frame to generate an additional reference depth image for the current viewpoint to be coded, using the depth-image-based rendering technique. In order to generate high-quality reference depth images, we use pre-processing on depth, depth image warping, and two types of hole filling methods depending on the number of available reference views. After synthesizing the additional depth image, we encode the depth video using the proposed additional prediction modes named VSP modes; those additional modes refer to the synthesized depth image. In particular, the VSP_SKIP mode refers to the co-located block of the synthesized frame without coding motion vectors and residual data, which gives most of the coding gains. Experimental results demonstrate that the proposed depth view synthesis method provides high-quality depth images for the current view and the proposed VSP modes provide high coding gains, especially on the anchor frames.

Journal ArticleDOI
TL;DR: An immersive videoconferencing system that enables gaze correction between users in the internet protocol TV (IPTV) environment with parallel programming executed on the GPU for realtime processing is presented.
Abstract: In this paper, we present an immersive videoconferencing system that enables gaze correction between users in the internet protocol TV (IPTV) environment. After we capture the object using stereo cameras, we perform preprocessing techniques, such as camera calibration, color correction, and image rectification. The preprocessed images are down-sampled and disparities are computed using the down-sampled images. The disparity sequence is then filtered to improve temporal consistency. After central view synthesis, occlusion areas are detected and holes are filled. The entire system is implemented with parallel programming executed on the GPU for real-time processing. Finally, the user can observe the gaze-corrected image on the display. From the experimental results, we have verified that the proposed stereo camera system is sufficient to generate a natural gaze-corrected virtual image and realize immersive videoconferencing.

Proceedings ArticleDOI
16 May 2011
TL;DR: This paper proposes a depth up-sampling method which uses the high resolution view as prior in a joint bilateral filter for upscaling, and analyzes the influence of view coding on the result.
Abstract: In 3DTV and free viewpoint imaging systems based on a view-plus-depth representation, depth compression is important for high-quality view synthesis. Several publications have proposed depth down-/up-sampling as part of the depth coding strategy. Recently, we proposed a depth up-sampling method which uses the high resolution view in the process of depth up-sampling. Actually, in 2007 Kopf et al. already proposed a Joint Bilateral Upsampler (JBU), which uses a high resolution input image as prior in a joint bilateral filter for upscaling. In this paper we compare our previous method with the JBU approach, in the context of depth coding. Furthermore, we analyze the influence of view coding on the depth up-sampling result.
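For concreteness, a compact (and deliberately unoptimized) sketch of the joint bilateral upsampling idea discussed above: the low-resolution depth is interpolated with weights combining a spatial Gaussian on the sampling grid and a range Gaussian on the high-resolution color guide. The window radius, the two sigmas and the grayscale guide are illustrative assumptions, not parameters from either paper.

import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, scale, sigma_s=2.0, sigma_r=12.0, radius=2):
    """depth_lr: (h, w) low-res depth; color_hr: (H, W) grayscale guide, H ~ h*scale."""
    H, W = color_hr.shape
    h, w = depth_lr.shape
    out = np.zeros((H, W))
    for Y in range(H):
        for X in range(W):
            yc, xc = Y / scale, X / scale                      # position on the low-res grid
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = int(round(yc)) + dy, int(round(xc)) + dx
                    if not (0 <= y < h and 0 <= x < w):
                        continue
                    Yn = min(H - 1, int(round(y * scale)))     # guide pixel of that depth sample
                    Xn = min(W - 1, int(round(x * scale)))
                    ws = np.exp(-((y - yc) ** 2 + (x - xc) ** 2) / (2 * sigma_s ** 2))
                    wr = np.exp(-(float(color_hr[Y, X]) - float(color_hr[Yn, Xn])) ** 2
                                / (2 * sigma_r ** 2))
                    num += ws * wr * depth_lr[y, x]
                    den += ws * wr
            out[Y, X] = num / den if den > 0 else depth_lr[min(int(yc), h - 1), min(int(xc), w - 1)]
    return out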

Proceedings ArticleDOI
07 Apr 2011
TL;DR: Free-viewpoint view synthesis (FVVS) extends the common two-view stereo 3D vision into virtual reality by generating unlimited views from any desired viewpoint in the next-generation 3DTV systems.
Abstract: 3DTV promises to become the mainstream of next-generation TV systems. High-resolution 3DTV provides users with a vivid viewing experience. Moreover, free-viewpoint view synthesis (FVVS) extends the common two-view stereo 3D vision into virtual reality by generating unlimited views from any desired viewpoint. In the next-generation 3DTV systems, the set-top box (STB) SoC requires both a high-definition (HD) multiview video-coding (MVC) decoder to reconstruct the real camera-captured scenes and a free-viewpoint view synthesizer to generate the virtual scenes [1–2].

Proceedings ArticleDOI
29 Aug 2011
TL;DR: An image-based free-viewpoint system is employed to synthesize the stereoscopic views and is able to match camera path and timing of time lapsed background footage and a live-action foreground video.
Abstract: We present an alternative approach to flexible stereoscopic 3D video content creation. To accomplish a natural image look without the need for expensive hardware or time consuming manual scene modeling, we employ an image-based free-viewpoint system to synthesize the stereoscopic views. By recording the sequence in a sparse multi-view setup, we are able to maintain control over camera position and timing as well as the parameters relevant for stereoscopic content. In particular, we are able to use the system to match camera path and timing of time lapsed background footage and a live-action foreground video.

Proceedings ArticleDOI
09 May 2011
TL;DR: A way to simplify the user's task of understanding the scene by rendering the camera view as if observed from the user’s perspective by estimating his position using a real-time visual SLAM system is provided.
Abstract: Understanding and analysing video data from static or mobile surveillance cameras often requires knowledge of the scene and the camera placement. In this article, we provide a way to simplify the user's task of understanding the scene by rendering the camera view as if observed from the user's perspective by estimating his position using a real-time visual SLAM system. Augmenting the view is referred to as hidden view synthesis. Compared to previous work, the current approach improves by simplifying the setup and requiring minimal user input. This is achieved by building a map of the environment using a visual SLAM system and then registering the surveillance camera in this map. By exploiting the map, a different moving camera can render hidden views in real-time at 30Hz. We discuss some of the challenges remaining for full automation. Results are shown in an indoor environment for surveillance applications and outdoors with application to improved safety in transport.

Proceedings ArticleDOI
29 Dec 2011
TL;DR: An efficient depth map compression method for view rendering, with a novel distortion metric based on view rendering distortions instead of the distortion of the depth map itself, and a region-based video characteristics distortion model for precise estimation of distortion in view synthesis.
Abstract: A depth map represents three-dimensional (3D) scene information and is used to synthesize virtual views in 3D video. Since the quality of synthesized virtual views highly depends on the quality of the depth map, efficient depth compression is crucial to realizing the 3D video system. However, compressing depth maps using existing video coding techniques yields unacceptable distortions when rendering virtual views. To solve this problem, we propose an efficient depth map compression method for view rendering, with a novel distortion metric based on view rendering distortions instead of the distortion of the depth map itself. First, we derive relationships between distortions in the coded depth map and the rendered view. Then, a region-based video characteristics distortion model is proposed for precise estimation of distortion in view synthesis. Finally, experimental results show that a 1.8 dB coding gain in terms of PSNR and subjective quality improvement of synthesized views are achieved by the proposed method.
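A commonly used first-order relation in this line of depth-coding work connects an 8-bit depth coding error to a warping error and then to a rendering distortion through the local texture; the paper's region-based model refines this per local video characteristics rather than using it verbatim. With focal length f, camera baseline l and depth range [Z_near, Z_far], a depth-level error \Delta v translates into a horizontal warping error

  \Delta p = (f l / 255) (1/Z_near - 1/Z_far) \Delta v,

and the resulting synthesis distortion of a region can be approximated by how much the texture I changes over that shift, e.g. E[(I(x + \Delta p, y) - I(x, y))^2], so flat regions tolerate much larger depth errors than highly textured ones.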