
Showing papers on "View synthesis published in 2011"


Journal ArticleDOI
01 Apr 2011
TL;DR: This paper describes efficient coding methods for video and depth data, and presents synthesis methods that mitigate errors from depth estimation and coding in the generation of views.
Abstract: Current 3-D video (3DV) technology is based on stereo systems. These systems use stereo video coding for pictures delivered by two input cameras. Typically, such stereo systems only reproduce these two camera views at the receiver, and stereoscopic displays for multiple viewers require wearing special 3-D glasses. On the other hand, emerging autostereoscopic multiview displays emit a large number of views to enable 3-D viewing for multiple users without requiring 3-D glasses. For representing a large number of views, a multiview extension of stereo video coding is used, typically requiring a bit rate that is proportional to the number of views. However, since the quality improvement of multiview displays will be governed by an increase of emitted views, a format is needed that allows the generation of arbitrary numbers of views with the transmission bit rate being constant. Such a format is the combination of video signals and associated depth maps. The depth maps provide disparities associated with every sample of the video signal that can be used to render arbitrary numbers of additional views via view synthesis. This paper describes efficient coding methods for video and depth data. For the generation of views, synthesis methods are presented which mitigate errors from depth estimation and coding.

420 citations
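The video-plus-depth format described above relies on converting each 8-bit depth sample to a disparity and shifting pixels accordingly. The following minimal Python sketch illustrates that rendering step; the parameters z_near, z_far and baseline_px (which folds focal length and camera baseline into one pixel-scale factor) are illustrative assumptions, and this is a generic DIBR illustration, not the paper's coding or synthesis method.

import numpy as np

def synthesize_view(color, depth8, baseline_px, z_near=0.3, z_far=10.0):
    """Forward-warp a reference view to a horizontally shifted virtual view.

    color:   (H, W, 3) uint8 reference image
    depth8:  (H, W) uint8 depth map (255 = nearest, 0 = farthest)
    returns: warped image and a boolean mask of disoccluded (hole) pixels
    """
    h, w = depth8.shape
    # 8-bit depth level -> metric depth (standard inverse-depth quantization).
    z = 1.0 / (depth8 / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
    disparity = baseline_px / z          # horizontal shift in pixels

    warped = np.zeros_like(color)
    z_buffer = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            xv = int(round(x - disparity[y, x]))        # column in the virtual view
            if 0 <= xv < w and z[y, x] < z_buffer[y, xv]:
                z_buffer[y, xv] = z[y, x]               # keep the nearest sample
                warped[y, xv] = color[y, x]
    holes = ~np.isfinite(z_buffer)       # disocclusions left for hole filling
    return warped, holes

The returned hole mask marks the disoccluded regions that the synthesis methods discussed in the paper are designed to fill and clean up.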


Journal ArticleDOI
TL;DR: This paper considers the DIBR-based synthesized view evaluation problem, and provides hints for a new objective measure for 3DTV quality assessment.
Abstract: 3DTV technology has brought out new challenges such as the question of synthesized view evaluation. Synthesized views are generated through a depth image-based rendering (DIBR) process. This process induces new types of artifacts whose impact on visual quality has to be identified considering various contexts of use. While visual quality assessment has been the subject of many studies in the last 20 years, there are still some unanswered questions regarding new technological improvements. DIBR is bringing new challenges mainly because it deals with geometric distortions. This paper considers the DIBR-based synthesized view evaluation problem. Different experiments have been carried out. They question the protocols of subjective assessment and the reliability of the objective quality metrics in the context of 3DTV, in these specific conditions (DIBR-based synthesized views), and they consist of assessing seven different view synthesis algorithms through subjective and objective measurements. Results show that the usual metrics are not sufficient for assessing 3-D synthesized views, since they do not correlate well with human judgment. Synthesized views contain specific artifacts located around the disoccluded areas, but the usual metrics seem to be unable to express the degree of annoyance perceived in the whole image. This study provides hints for a new objective measure. Two approaches are proposed: the first one is based on the analysis of the shifts of the contours of the synthesized view; the second one is based on the computation of a mean SSIM score of the disoccluded areas.

218 citations
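The second objective approach mentioned above, a mean SSIM score restricted to disoccluded areas, can be prototyped in a few lines. The sketch below assumes grayscale inputs, a precomputed boolean disocclusion mask, and scikit-image for the SSIM map; it is a plausible reading of the idea rather than the authors' exact metric.

import numpy as np
from skimage.metrics import structural_similarity

def disocclusion_ssim(reference_gray, synthesized_gray, disocclusion_mask):
    """Mean SSIM over disoccluded pixels only (mask == True)."""
    _, ssim_map = structural_similarity(reference_gray, synthesized_gray,
                                        data_range=255, full=True)
    return float(ssim_map[disocclusion_mask].mean())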


Journal ArticleDOI
TL;DR: A new hole filling approach for DIBR using texture synthesis is presented and results show that the proposed approach provides improved rendering results in comparison to the latest MPEG view synthesis reference software (VSRS) version 3.6.
Abstract: A depth image-based rendering (DIBR) approach with advanced inpainting methods is presented. The DIBR algorithm can be used in 3-D video applications to synthesize a number of different perspectives of the same scene, e.g., from a multiview-video-plus-depth (MVD) representation. This MVD format consists of video and depth sequences for a limited number of original camera views of the same natural scene. Here, DIBR methods allow the computation of additional new views. An inherent problem of the view synthesis concept is the fact that image information which is occluded in the original views may become visible, especially in extrapolated views beyond the viewing range of the original cameras. The presented algorithm synthesizes these occluded textures. The synthesizer achieves visually satisfying results by taking spatial and temporal consistency measures into account. Detailed experiments show significant objective and subjective gains of the proposed method in comparison to the state-of-the-art methods.

172 citations


Journal ArticleDOI
24 Jan 2011
TL;DR: This paper gives an overview of the state-of-the-art in 3-D video postproduction and processing as well as an outlook to remaining challenges and opportunities.
Abstract: This paper gives an overview of the state-of-the-art in 3-D video postproduction and processing as well as an outlook to remaining challenges and opportunities. First, fundamentals of stereography are outlined that set the rules for proper 3-D content creation. Manipulation of the depth composition of a given stereo pair via view synthesis is identified as the key functionality in this context. Basic algorithms are described to adapt and correct fundamental stereo properties such as geometric distortions, color alignment, and stereo geometry. Then, depth image-based rendering is explained as the widely applied solution for view synthesis in 3-D content creation today. Recent improvements of depth estimation already provide very good results; however, in most cases interactive workflows still dominate. Warping-based methods, which do not rely on dense and accurate depth estimation, may become an alternative for some applications in the future. Finally, 2-D to 3-D conversion is covered, which is an important special area for reuse of existing legacy 2-D content in 3-D. Here various advanced algorithms are combined in interactive workflows.

160 citations


Proceedings ArticleDOI
16 May 2011
TL;DR: A depth-based inpainting algorithm which efficiently handles disocclusion occurring on virtual viewpoint rendering and fills in larger disocclusions in distant synthesized views based on a coherent tensor-based color and geometry structure propagation.
Abstract: This paper describes a depth-based inpainting algorithm which efficiently handles disocclusion occurring on virtual viewpoint rendering. A single reference view and a set of depth maps are used in the proposed approach. The method not only deals with small disocclusion filling related to small camera baseline, but also manages to fill in larger disocclusions in distant synthesized views. This relies on a coherent tensor-based color and geometry structure propagation. The depth is used to drive the filling order, while enforcing the structure diffusion from similar candidate-patches. By acting on patch prioritization, selection and combination, the completion of distant synthesized views allows a consistent and realistic rendering of virtual viewpoints.

89 citations


Journal ArticleDOI
TL;DR: A novel solution of suppression of misalignment and alignment enforcement between texture and depth to reduce background noises and foreground erosion, respectively, among different types of boundary artifacts is proposed.
Abstract: 3D Video (3DV) with depth-image-based view synthesis is a promising candidate for next-generation broadcasting applications. However, the synthesized views in 3DV are often contaminated by annoying artifacts, particularly around object boundaries, due to imperfect depth maps (e.g., produced by state-of-the-art stereo matching algorithms or compressed lossily). In this paper, we first review some representative methods for boundary artifact reduction in view synthesis, and make an in-depth investigation into the underlying mechanisms of boundary artifact generation from a new perspective of texture-depth alignment in boundary regions. Three forms of texture-depth misalignment are identified as the causes of different boundary artifacts, which mainly present themselves as scattered noise on the background and object erosion on the foreground. Based on the insights gained from the analysis, we propose a novel solution of suppression of misalignment and alignment enforcement (denoted as SMART) between texture and depth to reduce background noise and foreground erosion, respectively, among the different types of boundary artifacts. SMART is developed as a three-step pre-processing in view synthesis. Experiments on view synthesis with original and compressed texture/depth data consistently demonstrate the superior performance of the proposed method as compared with other relevant boundary artifact reduction schemes.

85 citations


Journal ArticleDOI
TL;DR: A depth no-synthesis-error (D-NOSE) model is developed to examine the allowable depth distortions in rendering a virtual view without introducing any geometry changes, and shows that a virtual view can be synthesized losslessly if depth distortions follow the D-NOSE specified thresholds.
Abstract: Currently, 3-D video targets the application of disparity-adjustable stereoscopic video, where view synthesis based on depth-image-based rendering (DIBR) is employed to generate virtual views. Distortions in depth information may introduce geometry changes or occlusion variations in the synthesized views. In practice, depth information is stored in 8-bit grayscale format, whereas the disparity range for a visually comfortable stereo pair is usually much less than 256 levels. Thus, several depth levels may correspond to the same integer (or sub-pixel) disparity value in DIBR-based view synthesis, such that some depth distortions may not result in geometry changes in the synthesized view. From this observation, we develop a depth no-synthesis-error (D-NOSE) model to examine the allowable depth distortions in rendering a virtual view without introducing any geometry changes. We further show that the depth distortions prescribed by the proposed D-NOSE profile also do not compromise the occlusion order in view synthesis. Therefore, a virtual view can be synthesized losslessly if depth distortions follow the D-NOSE specified thresholds. Our simulations validate the proposed D-NOSE model in lossless view synthesis and demonstrate the gain with the model in depth coding.

73 citations
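The observation behind D-NOSE is that an 8-bit depth map offers finer granularity than the rendered disparity, so each depth level has a whole interval of neighbours that quantize to the same sub-pixel disparity. The sketch below computes such intervals under an assumed linear depth-to-disparity mapping with an assumed maximum disparity and sub-pixel precision; the paper's exact mapping and thresholds may differ.

import numpy as np

def dnose_intervals(max_disparity=64.0, subpel=4):
    """For each 8-bit depth level, return the [low, high] range of levels
    that map to the same 1/subpel-pixel disparity."""
    levels = np.arange(256)
    disparity = levels / 255.0 * max_disparity             # assumed linear mapping
    quantized = np.round(disparity * subpel) / subpel      # sub-pixel quantization
    low = np.empty(256, dtype=int)
    high = np.empty(256, dtype=int)
    for v in range(256):
        same = np.flatnonzero(quantized == quantized[v])   # levels sharing a disparity
        low[v], high[v] = same.min(), same.max()
    return low, high   # depth errors inside [low[v], high[v]] leave geometry unchanged

A depth encoder can then treat any reconstruction error that stays inside a pixel's interval as free of synthesis error, which is where the reported depth-coding gain comes from.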


Journal ArticleDOI
TL;DR: Experimental results suggest that the bit rate for depth map coding can be reduced up to 78% for the depth maps captured with depth-range cameras and up to 24% with computer vision algorithms, without affecting the 3-D visual quality or the arbitrary view synthesis quality for free-viewpoint video applications.
Abstract: This paper addresses the sensitivity of human vision to spatial depth variations in a 3-D video scene, seen on a stereoscopic display, based on an experimental derivation of a just noticeable depth difference (JNDD) model. The main target is to exploit the depth perception sensitivity of humans in suppressing unnecessary spatial depth details, hence reducing the transmission overhead allocated to depth maps. Based on the derived JNDD model, depth map sequences are preprocessed to suppress the depth details that are not perceivable by the viewers and to minimize the rendering artefacts that arise due to optical noise, where the optical noise is triggered by inaccuracies in the depth estimation process. Theoretical and experimental evidence is provided to illustrate that the proposed depth adaptive preprocessing filter does not alter the 3-D visual quality or the view synthesis quality for free-viewpoint video applications. Experimental results suggest that the bit rate for depth map coding can be reduced by up to 78% for depth maps captured with depth-range cameras and by up to 24% for depth maps estimated with computer vision algorithms, without affecting the 3-D visual quality or the arbitrary view synthesis quality.

55 citations
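The preprocessing step described above amounts to flattening depth variations that fall below the just noticeable depth difference. A minimal sketch of that idea follows, assuming a fixed JNDD threshold and a median-filtered local reference; the paper derives the threshold experimentally and applies a more careful, depth-adaptive filter.

import numpy as np
from scipy.ndimage import median_filter

def jndd_preprocess(depth8, jndd_threshold=5):
    """Suppress depth details smaller than an assumed JNDD threshold."""
    ref = median_filter(depth8.astype(np.int16), size=7)   # smooth local depth reference
    diff = depth8.astype(np.int16) - ref
    perceivable = np.abs(diff) > jndd_threshold            # keep only noticeable variations
    return np.where(perceivable, depth8, ref).astype(np.uint8)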


Journal ArticleDOI
TL;DR: A novel algorithm to generate multiple virtual views from a video-plus-depth sequence for modern autostereoscopic displays with an iterative re-weighted framework to jointly consider intensity and depth consistency in the adjacent frames is proposed.
Abstract: In this paper, we propose a novel algorithm to generate multiple virtual views from a video-plus-depth sequence for modern autostereoscopic displays. Synthesizing realistic content in the disocclusion regions of the virtual views is the main challenge of this task. Spatial coherence and temporal consistency are the two key factors for producing perceptually satisfactory virtual images. The proposed algorithm employs a spatio-temporal consistency constraint to handle the uncertain pixels in the disocclusion regions. On the one hand, regarding spatial coherence, we incorporate the intensity gradient strength with the depth information to determine the filling priority for inpainting the disocclusion regions, so that the continuity of image structures can be preserved. On the other hand, temporal consistency is enforced by estimating the intensities in the disocclusion regions across the adjacent frames with an optimization process. We propose an iterative re-weighted framework to jointly consider intensity and depth consistency in the adjacent frames, which not only imposes temporal consistency but also reduces noise disturbance. Finally, to accelerate the multi-view synthesis process, we apply the proposed view synthesis algorithm to generate the intensity and depth maps at the leftmost and rightmost viewpoints, so that the intermediate views are efficiently interpolated through image warping according to the associated depth maps between the two synthesized images and their corresponding symmetric depths. In the experimental validation, we perform quantitative evaluation on synthetic data as well as subjective assessment on real video data, with comparison to some representative methods, to demonstrate the superior performance of the proposed algorithm.

51 citations


Journal ArticleDOI
TL;DR: This work proposes a novel content adaptive enhancement technique applied to the previously estimated multi-view depth map sequences that enforces consistency across the spatial, temporal and inter-view dimensions of the depth maps so that both the coding efficiency and the quality of the synthesized views are improved.
Abstract: Depth map estimation is an important part of the multi-view video coding and virtual view synthesis within the free viewpoint video applications. However, computing an accurate depth map is a computationally complex process, which makes real-time implementation challenging. Alternatively, a simple estimation, though quick and promising for real-time processing, might result in inconsistent multi-view depth map sequences. To exploit this simplicity and to improve the quality of depth map estimation, we propose a novel content adaptive enhancement technique applied to the previously estimated multi-view depth map sequences. The enhancement method is locally adapted to edges, motion and depth-range of the scene to avoid blurring the synthesized views and to reduce the computational complexity. At the same time, and very importantly, the method enforces consistency across the spatial, temporal and inter-view dimensions of the depth maps so that both the coding efficiency and the quality of the synthesized views are improved. We demonstrate these improvements in the experiments, where the enhancement method is applied to several multi-view test sequences and the obtained synthesized views are compared to the views synthesized using other methods in terms of both numerical and perceived visual quality.

50 citations


Proceedings ArticleDOI
29 Dec 2011
TL;DR: A novel algorithm for depth map compression that explicitly signals the location of discontinuities is proposed; subjective assessment of synthesized views using compressed depth maps confirms superior visual quality resulting from fewer geometric distortions.
Abstract: Emerging video technologies like 3DTV and Free Viewpoint Video require transmitting more information than just 2D color data. To allow rendering of arbitrary viewpoints of a video scene, a new data format including 2D color and an accompanying depth map has been proposed. Consequently, this new technique requires an efficient coding method not only for color information, but also for depth data. Depth maps are characterized by segments describable by piecewise linear functions bounded by sharp edges. Preserving these depth discontinuities is a crucial requirement for high quality view synthesis. To adapt to these characteristics we propose a novel algorithm for depth map compression that explicitly signals the location of discontinuities. Experimental results show that the proposed method yields up to 9 dB PSNR gain compared to a JPEG 2000 encoder in high-bitrate scenarios. Subjective quality assessment of synthesized views using compressed depth maps confirms the superior visual quality resulting from fewer geometric distortions.

Proceedings ArticleDOI
11 Jul 2011
TL;DR: This method applies purely image domain warping to content creation for autostereoscopic display, supporting multiview creation from stereo input, which is the most relevant use case scenario.
Abstract: Content creation for autostereoscopic display is a widely unresolved task. Typical methods rely on view synthesis based on depth image based rendering. Our method applies purely image domain warping instead. Input video is analyzed and information about sparse disparity, vertical edges and saliency is extracted. A constrained energy minimization problem is formulated and efficiently solved. The resulting image warping functions are used to synthesize novel views. Our approach is fully automatic, accurate, and reliable. Disocclusions and related artifacts are avoided due to smooth, saliency-driven warping functions. Our method also works well for extrapolation of views in a limited range, thus supporting multiview creation from stereo input, which is the most relevant use case scenario.

Proceedings ArticleDOI
05 Dec 2011
TL;DR: A spherical ego-centric representation of the environment is proposed that is able to reproduce photo-realistic omnidirectional views of captured environments and is used for real-time model-based localisation and navigation.
Abstract: This paper presents a method and apparatus for building dense visual maps of large scale 3D environments for real-time localisation and navigation. A spherical ego-centric representation of the environment is proposed that is able to reproduce photo-realistic omnidirectional views of captured environments. This representation is novel in that it is composed of a graph of locally accurate augmented spherical panoramas that allows varying viewpoints to be generated through novel view synthesis. The spheres are related by a graph of 6-DoF poses which are estimated through multi-view spherical registration. To acquire these models, a multi-baseline acquisition system has been designed and built, based on an outward facing ring of cameras with diverging views. This configuration allows high resolution spherical images of the environment to be captured and a dense depth map to be computed through a wide baseline dense correspondence algorithm. A calibration procedure is developed for an outward facing camera ring that imposes a loop closing constraint, in order to obtain a consistent set of extrinsic parameters. This spherical sensor is shown to acquire compact, accurate and efficient representations of large environments and is used for real-time model-based localisation.

Proceedings ArticleDOI
29 Dec 2011
TL;DR: This paper explicitly manipulates depth values, without causing severe synthesized view distortion, in order to maximize representation sparsity in the transform domain for compression gain, and shows that the TDS approach gained up to 1.7 dB in rate-distortion performance for the interpolated view over compression of unaltered depth maps.
Abstract: Compression of depth maps is important for the “texture plus depth” format of multiview images, which enables synthesis of novel intermediate views via depth-image-based rendering (DIBR) at the decoder. Previous depth map coding schemes exploit unique depth data characteristics to compactly and faithfully reproduce the original signal. In contrast, since the depth map is only a means to the end of view synthesis and not itself viewed, in this paper we explicitly manipulate depth values, without causing severe synthesized view distortion, in order to maximize representation sparsity in the transform domain for compression gain — we call this process transform domain sparsification (TDS). Specifically, for each pixel in the depth map, we first define a quadratic penalty function, with minimum at the ground truth depth value, based on the synthesized view's distortion sensitivity to the pixel's depth value during DIBR. We then define an objective for a depth signal in a block as a weighted sum of: i) the signal's sparsity in the transform domain, and ii) per-pixel synthesized view distortion penalties for the chosen signal. Given that sparsity (the ℓ0-norm) is non-convex and difficult to optimize, we replace the ℓ0-norm in the objective with a computationally inexpensive weighted ℓ2-norm; the optimization is then an unconstrained quadratic program, solvable via a set of linear equations. For the weighted ℓ2-norm to promote sparsity, we solve the optimization iteratively, where at each iteration the weights are readjusted to mimic the sparsity-promoting ℓτ-norm, 0 < τ < 1. Using JPEG as an example transform codec, we show that our TDS approach gained up to 1.7 dB in rate-distortion performance for the interpolated view over compression of unaltered depth maps.
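The iterative reweighting described in the abstract can be written compactly as one small quadratic solve per iteration. The sketch below works on a 1-D depth block with an orthonormal DCT; the block handling, penalty weights and choice of tau are illustrative assumptions, not the paper's exact settings.

import numpy as np
from scipy.fft import dct

def tds_block(d0, penalty_weights, tau=0.5, iters=10, eps=1e-6):
    """Sparsify the DCT coefficients of a depth block d0 (shape (N,)) while
    staying within per-pixel quadratic penalties (penalty_weights, shape (N,))."""
    n = d0.size
    T = dct(np.eye(n), norm='ortho', axis=0)        # orthonormal DCT matrix
    S = np.diag(penalty_weights)
    d = d0.copy()
    for _ in range(iters):
        c = T @ d
        w = (c ** 2 + eps) ** ((tau - 2) / 2)       # reweighting mimics the l_tau norm
        A = T.T @ np.diag(w) @ T + S                # unconstrained quadratic program
        d = np.linalg.solve(A, S @ d0)              # closed-form minimizer per iteration
    return d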

Proceedings ArticleDOI
16 May 2011
TL;DR: An effective virtual view synthesis approach utilizing depth-image-based rendering (DIBR), shown to be reliable in both subjective and objective evaluations.
Abstract: We propose an effective virtual view synthesis approach which utilizes depth-image-based rendering (DIBR). In our scheme, two reference color images and their associated depth maps are used to generate an arbitrary virtual viewpoint. First, the main and auxiliary viewpoint images are warped to the virtual viewpoint. After that, the cracks and error points are removed to enhance the image quality. Then, we fill the disocclusions of the virtual viewpoint image warped from the main viewpoint with the help of the auxiliary viewpoint. In order to reduce the color discontinuity of the virtual view, the brightness of the two reference viewpoint images is adjusted. Finally, the holes are filled by the depth-assisted asymmetric dilation inpainting method. Simulations show that the view synthesis approach is effective and reliable in both subjective and objective evaluations.

Journal ArticleDOI
TL;DR: A real-time HD1080p view synthesis engine based on the reference algorithm from the 3-D video coding team, which addresses the high computational complexity and high memory cost with bilinear interpolation and a floating-point Z scaling method.
Abstract: This paper presents a real-time HD1080p view synthesis engine based on the reference algorithm from the 3-D video coding team, solving its high computational complexity and high memory cost problems. For the computational complexity, we propose bilinear interpolation to simplify the hole filling process, and a Z scaling method with floating-point format to reduce the cost of homography calculation. For the memory cost, we propose frame-level pipelining to reduce the requirement of warped depth maps, and a column-order warping method to remove the Z-buffer in occlusion handling. With 90 nm complementary metal-oxide-semiconductor technology, our view synthesis engine can achieve a throughput of 32.4 f/s for HD1080p videos with a gate count of 268.5 K and internal memory of 69.4 kbytes. The experimental results show that our implementation has similar synthesis quality to the original reference algorithm.

Proceedings ArticleDOI
29 Dec 2011
TL;DR: Results show that the most commonly used objective metrics can be far from human judgment depending on the artifact to deal with.
Abstract: This paper addresses the problem of evaluating virtual view synthesized images in the multi-view video context. As a matter of fact, view synthesis brings new types of distortion. The question refers to the ability of the traditionally used objective metrics to assess synthesized view quality, considering the new types of artifacts. The experiments conducted to determine their reliability consist of assessing seven different view synthesis algorithms. Subjective and objective measurements have been performed. Results show that the most commonly used objective metrics can be far from human judgment depending on the artifact to deal with.

Proceedings ArticleDOI
22 May 2011
TL;DR: This paper presents a new method for view synthesis that is both fast and accurate and has applications in free-viewpoint television, angular scalability for 3D video coding/decoding, and stereo-to-multiview conversion.
Abstract: From a rectified stereo image pair, the task of view synthesis is to generate images from any viewpoint along the baseline. The main difficulty of the problem is how to fill occluded regions. In this paper, we present a new method for view synthesis that is both fast and accurate. Occlusions are filled using color and disparity information to produce consistent pixel estimates. Results are comparable to current state-of-the-art methods in terms of objective measures while computation time is drastically reduced. This work has applications in free-viewpoint television, angular scalability for 3D video coding/decoding, and stereo-to-multiview conversion.

Proceedings ArticleDOI
11 Jul 2011
TL;DR: A cubic synthesized view distortion model is derived to describe the visual quality of an interpolated view as a function of the view's location, and it is shown how optimal bit allocation can be performed to minimize the maximum view synthesis distortion at any intermediate viewpoint.
Abstract: “Texture-plus-depth” has become a popular coding format for multiview image compression, where a decoder can synthesize images at intermediate viewpoints using encoded texture and depth maps of closest captured view locations via depth-image-based rendering (DIBR). As in other resource-constrained scenarios, limited available bits must be optimally distributed among captured texture and depth maps to minimize the expected signal distortion at the decoder. A specific challenge of multiview image compression for DIBR is that the encoder must allocate bits without the knowledge of how many and which specific virtual views will be synthesized at the decoder for viewing. In this paper, we derive a cubic synthesized view distortion model to describe the visual quality of an interpolated view as a function of the view's location. Given the model, one can easily find the virtual view location between two coded views where the maximum synthesized distortion occurs. Using a multiview image codec based on shape-adaptive wavelet transform, we show how optimal bit allocation can be performed to minimize the maximum view synthesis distortion at any intermediate viewpoint. Our experimental results show that the optimal bit allocation can outperform a common uniform bit allocation scheme by up to 1.0dB in coding efficiency performance, while simultaneously being competitive to a state-of-the-art H.264 codec.
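The usefulness of a cubic model is that the worst-case virtual viewpoint between two coded views can be located analytically. A hedged restatement of its form, with alpha the normalized view position between the two coded views and coefficients a_k produced by the paper's derivation (not reproduced here):

  D_s(\alpha) = a_3 \alpha^3 + a_2 \alpha^2 + a_1 \alpha + a_0,   \alpha \in [0, 1].

The maximum synthesized distortion then lies either at an interval endpoint or at a stationary point satisfying D_s'(\alpha) = 3 a_3 \alpha^2 + 2 a_2 \alpha + a_1 = 0, and it is this worst case that the optimal bit allocation minimizes.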

Proceedings ArticleDOI
TL;DR: This paper examines how to generate new views so that the perceived depth is similar to the original scene depth, and proposes a method to detect and reduce artifacts in the third and last step, these artifacts being created by errors contained in the disparity from the first step.
Abstract: The 3D shape perceived from viewing a stereoscopic movie depends on the viewing conditions, most notably on the screen size and distance, and depth and size distortions appear because of the differences between the shooting and viewing geometries. When the shooting geometry is constrained, or when the same stereoscopic movie must be displayed with different viewing geometries (e.g. in a movie theater and on a 3DTV), these depth distortions may be reduced by novel view synthesis techniques. They usually involve three steps: computing the stereo disparity, computing a disparity-dependent 2D mapping from the original stereo pair to the synthesized views, and finally composing the synthesized views. In this paper, we focus on the second and third step: we examine how to generate new views so that the perceived depth is similar to the original scene depth, and we propose a method to detect and reduce artifacts in the third and last step, these artifacts being created by errors contained in the disparity from the first step.
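For reference, the textbook viewing-geometry relation that makes perceived depth depend on screen size and distance (standard stereoscopy, not the paper's own derivation): with interocular distance b, viewing distance D and on-screen disparity d (positive for uncrossed disparity),

  Z_perceived = b D / (b - d).

Because changing the display rescales d with the image width while D is set by the viewing room, the same stereo pair yields different perceived depths in a movie theater and on a 3DTV, which is exactly the distortion the synthesized views are meant to compensate.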

Proceedings ArticleDOI
29 Dec 2011
TL;DR: This paper proposes a novel object-based LDI representation, improving synthesized virtual view quality in a rate-constrained context; pixels from each LDI layer are reorganised to enhance depth continuity.
Abstract: Layered Depth Image (LDI) representations are attractive compact representations for multi-view videos. Any virtual viewpoint can be rendered from LDI by using view synthesis technique. However, rendering from classical LDI leads to annoying visual artifacts, such as cracks and disocclusions. Visual quality gets even worse after a DCT-based compression of the LDI, because of blurring effects on depth discontinuities. In this paper, we propose a novel object-based LDI representation, improving synthesized virtual views quality, in a rate-constrained context. Pixels from each LDI layer are reorganised to enhance depth continuity.

Proceedings ArticleDOI
26 Oct 2011
TL;DR: It is shown that adding depth blur to the rendered texture can drastically improve the repeatability of the FAST and Harris corner detectors, which can be very helpful, e.g., to make tracking-by-synthesis run on mobile phones.
Abstract: Tracking-by-synthesis is a promising method for markerless vision-based camera tracking, particularly suitable for Augmented Reality applications. In particular, it is drift-free, viewpoint invariant and easy to combine with physical sensors such as GPS and inertial sensors. While edge features have been used successfully within the tracking-by-synthesis framework, point features have, to our knowledge, never been used. We believe that this is due to the fact that real-time corner detectors are generally weakly repeatable between a camera image and a rendered texture. In this paper, we compare the repeatability of the commonly used FAST, Harris and SURF interest point detectors across view synthesis. We show that adding depth blur to the rendered texture can drastically improve the repeatability of the FAST and Harris corner detectors (up to 100% in our experiments), which can be very helpful, e.g., to make tracking-by-synthesis run on mobile phones. We propose a method for simulating depth blur on the rendered images using a pre-calibrated depth response curve. In order to fulfil the performance requirements, a pyramidal approach is used based on the well-known MIP mapping technique. We also propose an original method for calibrating the depth response curve, which is suitable for any kind of focus lens and comes for free in terms of programming effort, once the tracking-by-synthesis algorithm has been implemented.
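A rough sketch of the depth-blur simulation idea: a blur strength is looked up per pixel from a calibrated depth-response curve, and the output is taken from the closest level of a pre-filtered, MIP-map-like pyramid. The fixed sigma ladder and the grayscale input are illustrative assumptions; the paper's calibration procedure and pyramid construction are more involved.

import numpy as np
from scipy.ndimage import gaussian_filter

def render_with_depth_blur(image, depth, response_curve, sigmas=(0.0, 1.0, 2.0, 4.0)):
    """image, depth: (H, W) arrays; response_curve maps depth to a blur sigma."""
    pyramid = [gaussian_filter(image.astype(float), s) if s > 0 else image.astype(float)
               for s in sigmas]
    target = response_curve(depth)                         # desired blur per pixel
    # Pick, per pixel, the pre-filtered level whose sigma is closest to the target.
    idx = np.argmin(np.abs(np.asarray(sigmas)[:, None, None] - target[None]), axis=0)
    return np.choose(idx, pyramid)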

Journal ArticleDOI
TL;DR: Improved projective rectification-based view interpolation and extrapolation methods are developed and applied to view synthesis prediction-based multiview video coding (MVC), and an improved model is proposed to study the rate-distortion performances of various practical MVC schemes.
Abstract: In this paper, we first develop improved projective rectification-based view interpolation and extrapolation methods, and apply them to view synthesis prediction-based multiview video coding (MVC). A geometric model for these view synthesis methods is then developed. We also propose an improved model to study the rate-distortion (R-D) performances of various practical MVC schemes, including the current joint multiview video coding standard. Experimental results show that our schemes achieve superior view synthesis results, and can lead to better R-D performance in MVC. Simulation results with the theoretical models help explain the experimental results.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed depth view synthesis method provides high-quality depth images for the current view and the proposed VSP modes provide high coding gains, especially on the anchor frames.
Abstract: The view synthesis prediction (VSP) method utilizes inter-view correlations between views by generating an additional reference frame in multiview video coding. This paper describes a multiview depth video coding scheme that incorporates depth view synthesis and additional prediction modes. In the proposed scheme, we exploit the reconstructed neighboring depth frame to generate an additional reference depth image for the current viewpoint to be coded, using the depth-image-based rendering technique. In order to generate high-quality reference depth images, we use pre-processing on depth, depth image warping, and two types of hole filling methods depending on the number of available reference views. After synthesizing the additional depth image, we encode the depth video using the proposed additional prediction modes named VSP modes; those additional modes refer to the synthesized depth image. In particular, the VSP_SKIP mode refers to the co-located block of the synthesized frame without coding motion vectors and residual data, which gives most of the coding gains. Experimental results demonstrate that the proposed depth view synthesis method provides high-quality depth images for the current view and the proposed VSP modes provide high coding gains, especially on the anchor frames.

Journal ArticleDOI
TL;DR: An immersive videoconferencing system that enables gaze correction between users in the internet protocol TV (IPTV) environment with parallel programming executed on the GPU for realtime processing is presented.
Abstract: In this paper, we present an immersive videoconferencing system that enables gaze correction between users in the internet protocol TV (IPTV) environment. After we capture the object using stereo cameras, we perform preprocessing techniques, such as camera calibration, color correction, and image rectification. The preprocessed images are down-sampled and disparities are computed using the down-sampled images. The disparity sequence is then filtered to improve temporal consistency. After central view synthesis, occlusion areas are detected and holes are filled. The entire system is implemented with parallel programming executed on the GPU for real-time processing. Finally, the user can observe the gaze-corrected image on the display. From the experimental results, we have verified that the proposed stereo camera system is sufficient to generate a natural gaze-corrected virtual image and realize immersive videoconferencing.

Proceedings ArticleDOI
16 May 2011
TL;DR: This paper proposes a depth up-sampling method which uses the high resolution view as prior in a joint bilateral filter for upscaling, and analyzes the influence of view coding on the result.
Abstract: In 3DTV and free viewpoint imaging systems based on a view-plus-depth representation, depth compression is important for high-quality view synthesis. Several publications have proposed depth down-/up-sampling as part of the depth coding strategy. Recently, we proposed a depth up-sampling method which uses the high resolution view in the process of depth up-sampling. Actually, in 2007 Kopf et al. already proposed a Joint Bilateral Upsampler (JBU), which uses a high resolution input image as prior in a joint bilateral filter for upscaling. In this paper we compare our previous method with the JBU approach, in the context of depth coding. Furthermore, we analyze the influence of view coding on the depth up-sampling result.
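For concreteness, a compact (and deliberately unoptimized) sketch of the joint bilateral upsampling idea discussed above: the low-resolution depth is interpolated with weights combining a spatial Gaussian on the sampling grid and a range Gaussian on the high-resolution color guide. The window radius, the two sigmas and the grayscale guide are illustrative assumptions, not parameters from either paper.

import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, scale, sigma_s=2.0, sigma_r=12.0, radius=2):
    """depth_lr: (h, w) low-res depth; color_hr: (H, W) grayscale guide, H ~ h*scale."""
    H, W = color_hr.shape
    h, w = depth_lr.shape
    out = np.zeros((H, W))
    for Y in range(H):
        for X in range(W):
            yc, xc = Y / scale, X / scale                      # position on the low-res grid
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = int(round(yc)) + dy, int(round(xc)) + dx
                    if not (0 <= y < h and 0 <= x < w):
                        continue
                    Yn = min(H - 1, int(round(y * scale)))     # guide pixel of that depth sample
                    Xn = min(W - 1, int(round(x * scale)))
                    ws = np.exp(-((y - yc) ** 2 + (x - xc) ** 2) / (2 * sigma_s ** 2))
                    wr = np.exp(-(float(color_hr[Y, X]) - float(color_hr[Yn, Xn])) ** 2
                                / (2 * sigma_r ** 2))
                    num += ws * wr * depth_lr[y, x]
                    den += ws * wr
            out[Y, X] = num / den if den > 0 else depth_lr[min(int(yc), h - 1), min(int(xc), w - 1)]
    return out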

Proceedings ArticleDOI
07 Apr 2011
TL;DR: Free-viewpoint view synthesis (FVVS) extends the common two-view stereo 3D vision into virtual reality by generating unlimited views from any desired viewpoint in the next-generation 3DTV systems.
Abstract: 3DTV promises to become the mainstream of next-generation TV systems. High-resolution 3DTV provides users with a vivid viewing experience. Moreover, free-viewpoint view synthesis (FVVS) extends the common two-view stereo 3D vision into virtual reality by generating unlimited views from any desired viewpoint. In the next-generation 3DTV systems, the set-top box (STB) SoC requires both a high-definition (HD) multiview video-coding (MVC) decoder to reconstruct the real camera-captured scenes and a free-viewpoint view synthesizer to generate the virtual scenes [1–2].

Proceedings ArticleDOI
29 Aug 2011
TL;DR: An image-based free-viewpoint system is employed to synthesize the stereoscopic views and is able to match camera path and timing of time lapsed background footage and a live-action foreground video.
Abstract: We present an alternative approach to flexible stereoscopic 3D video content creation. To accomplish a natural image look without the need for expensive hardware or time consuming manual scene modeling, we employ an image-based free-viewpoint system to synthesize the stereoscopic views. By recording the sequence in a sparse multi-view setup, we are able to maintain control over camera position and timing as well as the parameters relevant for stereoscopic content. In particular, we are able to use the system to match camera path and timing of time lapsed background footage and a live-action foreground video.

Proceedings ArticleDOI
09 May 2011
TL;DR: A way to simplify the user's task of understanding the scene by rendering the camera view as if observed from the user’s perspective by estimating his position using a real-time visual SLAM system is provided.
Abstract: Understanding and analysing video data from static or mobile surveillance cameras often requires knowledge of the scene and the camera placement. In this article, we provide a way to simplify the user's task of understanding the scene by rendering the camera view as if observed from the user's perspective by estimating his position using a real-time visual SLAM system. Augmenting the view is referred to as hidden view synthesis. Compared to previous work, the current approach improves by simplifying the setup and requiring minimal user input. This is achieved by building a map of the environment using a visual SLAM system and then registering the surveillance camera in this map. By exploiting the map, a different moving camera can render hidden views in real-time at 30Hz. We discuss some of the challenges remaining for full automation. Results are shown in an indoor environment for surveillance applications and outdoors with application to improved safety in transport.

Proceedings ArticleDOI
29 Dec 2011
TL;DR: An efficient depth map compression method for view rendering, with a novel distortion metric based on view rendering distortions instead of the distortion of the depth map itself, and a region-based video characteristics distortion model for precise estimation of distortion in view synthesis.
Abstract: A depth map represents three-dimensional (3D) scene information and is used to synthesize virtual views in 3D video. Since the quality of synthesized virtual views highly depends on the quality of the depth map, efficient depth compression is crucial to realizing the 3D video system. However, compressing depth maps using existing video coding techniques yields unacceptable distortions when rendering virtual views. To solve this problem, we propose an efficient depth map compression method for view rendering, with a novel distortion metric based on view rendering distortions instead of the distortion of the depth map itself. First, we derive relationships between distortions in the coded depth map and the rendered view. Then, a region-based video characteristics distortion model is proposed for precise estimation of distortion in view synthesis. Finally, experimental results show that a 1.8 dB coding gain in terms of PSNR and subjective quality improvement of synthesized views are achieved by the proposed method.
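A commonly used first-order relation in this line of depth-coding work connects an 8-bit depth coding error to a warping error and then to a rendering distortion through the local texture; the paper's region-based model refines this per local video characteristics rather than using it verbatim. With focal length f, camera baseline l and depth range [Z_near, Z_far], a depth-level error \Delta v translates into a horizontal warping error

  \Delta p = (f l / 255) (1/Z_near - 1/Z_far) \Delta v,

and the resulting synthesis distortion of a region can be approximated by how much the texture I changes over that shift, e.g. E[(I(x + \Delta p, y) - I(x, y))^2], so flat regions tolerate much larger depth errors than highly textured ones.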