
Showing papers on "View synthesis published in 2010"


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work takes point-based real-time structure from motion (SFM) as a starting point, generating accurate 3D camera pose estimates and a sparse point cloud, and warps a base mesh into highly accurate depth maps based on view-predictive optical flow and a constrained scene flow update.
Abstract: We present a method which enables rapid and dense reconstruction of scenes browsed by a single live camera. We take point-based real-time structure from motion (SFM) as our starting point, generating accurate 3D camera pose estimates and a sparse point cloud. Our main novel contribution is to use an approximate but smooth base mesh generated from the SFM to predict the view at a bundle of poses around automatically selected reference frames spanning the scene, and then warp the base mesh into highly accurate depth maps based on view-predictive optical flow and a constrained scene flow update. The quality of the resulting depth maps means that a convincing global scene model can be obtained simply by placing them side by side and removing overlapping regions. We show that a cluttered indoor environment can be reconstructed from a live hand-held camera in a few seconds, with all processing performed by current desktop hardware. Real-time monocular dense reconstruction opens up many application areas, and we demonstrate both real-time novel view synthesis and advanced augmented reality where augmentations interact physically with the 3D scene and are correctly clipped by occlusions.

480 citations


Proceedings ArticleDOI
19 Jul 2010
TL;DR: A depth image-based rendering (DIBR) approach with advanced inpainting methods is presented, showing significant objective and subjective gains of the proposed method in comparison to state-of-the-art methods.
Abstract: In free viewpoint television or 3D video, depth image based rendering (DIBR) is used to generate virtual views based on a textured image and its associated depth information. In doing so, image regions which are occluded in the original view may become visible in the virtual image. One of the main challenges in DIBR is to extrapolate known textures into the disoccluded area without inserting subjective annoyance. In this paper, a new hole filling approach for DIBR using texture synthesis is presented. Initially, the depth map in the virtual view is filled at disoccluded locations. Then, in the textured image, holes of limited spatial extent are closed by solving Laplace equations. Larger disoccluded regions are initialized via median filtering and subsequently refined by patch-based texture synthesis. Experimental results show that the proposed approach provides improved rendering results in comparison to the latest MPEG view synthesis reference software (VSRS) version 3.6 [1].

186 citations
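
The Laplace-equation step above can be illustrated with a minimal sketch: hole pixels are relaxed toward the average of their neighbours until they form a smooth (harmonic) interpolation of the surrounding texture. The function name, mask convention, and iteration count are assumptions for illustration, not the authors' implementation.

import numpy as np

def fill_small_holes_laplace(image, hole_mask, iters=500):
    # Jacobi relaxation of the Laplace equation inside the holes: each hole
    # pixel is repeatedly replaced by the mean of its 4 neighbours, while
    # known pixels stay fixed as boundary conditions. (np.roll wraps at the
    # image borders, which is fine for interior holes.)
    img = image.astype(np.float64).copy()
    img[hole_mask] = img[~hole_mask].mean()  # neutral initialization
    for _ in range(iters):
        avg = 0.25 * (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
                      np.roll(img, 1, 1) + np.roll(img, -1, 1))
        img[hole_mask] = avg[hole_mask]
    return img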


Journal ArticleDOI
TL;DR: A new rendering algorithm is explored that computes a free viewpoint between two reference views from existing cameras, performing forward warping for both texture and depth simultaneously.

136 citations


Proceedings ArticleDOI
03 Dec 2010
TL;DR: A new temporally and spatially consistent hole filling method for DIBR is presented, highlighting that gains in objective and visual quality can be achieved in comparison to the latest MPEG view synthesis reference software (VSRS).
Abstract: Depth-image-based rendering (DIBR) is used to generate additional views of a real-world scene from images or videos and associated per-pixel depth information. An inherent problem of the view synthesis concept is the fact that image information which is occluded in the original view may become visible in the “virtual” image. The resulting question is: how can these disocclusions be covered in a visually plausible manner? In this paper, a new temporally and spatially consistent hole filling method for DIBR is presented. In a first step, disocclusions in the depth map are filled. Then, a background sprite is generated and updated with every frame using the original and synthesized information from previous frames to achieve temporally consistent results. Next, small holes resulting from depth estimation inaccuracies are closed in the textured image, using methods that are based on solving Laplace equations. The residual disoccluded areas are coarsely initialized and subsequently refined by patch-based texture synthesis. Experimental results are presented, highlighting that gains in objective and visual quality can be achieved in comparison to the latest MPEG view synthesis reference software (VSRS).

77 citations
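
A minimal sketch of the background-sprite update described above, assuming depth values grow with distance so that the farthest sample seen at each pixel is treated as background; names and conventions are illustrative, not the paper's code.

import numpy as np

def update_background_sprite(sprite, sprite_depth, frame, frame_depth):
    # Accumulate the farthest sample ever observed at each pixel; over time
    # the sprite collects background texture that can later fill
    # disocclusions in a temporally consistent way.
    farther = frame_depth > sprite_depth   # assumes larger value = farther away
    sprite[farther] = frame[farther]
    sprite_depth[farther] = frame_depth[farther]
    return sprite, sprite_depth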


Proceedings ArticleDOI
01 Jan 2010
TL;DR: This work proposes to render only a single image together with a depth buffer, and to use image-based techniques to generate two individual images for the left and right eye, computing a high-quality stereo pair for roughly half the cost of the traditional methods.
Abstract: Stereo vision is becoming increasingly popular in feature films, visualization and interactive applications such as computer games. However, computation costs are doubled when rendering an individual image for each eye. In this work, we propose to render only a single image, together with a depth buffer, and to use image-based techniques to generate two individual images for the left and right eye. The resulting method computes a high-quality stereo pair for roughly half the cost of the traditional methods. We achieve this result via an adaptive-grid warping that also involves information from previous frames to avoid artifacts.

59 citations
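
The underlying idea of synthesizing both eyes from one image plus depth can be sketched with a per-pixel forward splat and a z-buffer; note the paper's actual method is an adaptive grid warp that also uses previous frames, so this is only a simplified stand-in.

import numpy as np

def warp_to_eye(image, disparity, sign):
    # Forward-warp each pixel horizontally by +/- half its disparity so the
    # original view sits midway between the synthesized left and right eyes.
    h, w = disparity.shape
    out = np.zeros_like(image)
    zbuf = np.full((h, w), -np.inf)
    for y in range(h):
        for x in range(w):
            xt = int(round(x + sign * 0.5 * disparity[y, x]))
            if 0 <= xt < w and disparity[y, x] > zbuf[y, xt]:
                zbuf[y, xt] = disparity[y, x]  # nearer pixels (larger disparity) win
                out[y, xt] = image[y, x]
    return out

# left, right = warp_to_eye(img, disp, +1), warp_to_eye(img, disp, -1)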


Journal ArticleDOI
TL;DR: A new view synthesis method for multiview camera configurations of Free viewpoint TV (FTV) that takes potential depth errors into account and infers the complementarity of the artifacts from the left and right references.

49 citations


Journal ArticleDOI
TL;DR: The efficiency of the proposed view synthesis method has been verified by evaluating the quality of synthesized images with various metrics such as peak signal‐to‐noise ratio, structural similarity, discrete cosine transform (DCT)‐based video quality metric, and the newly proposed metrics.
Abstract: Virtual view synthesis is one of the most important techniques to realize free viewpoint television and three-dimensional (3D) video. In this article, we propose a view synthesis method to generate high-quality intermediate views in such applications, together with new evaluation metrics, named spatial peak signal-to-noise ratio and temporal peak signal-to-noise ratio, to measure spatial and temporal consistency, respectively. The proposed view synthesis method consists of five major steps: depth preprocessing, depth-based 3D warping, depth-based histogram matching, base plus assistant view blending, and depth-based hole-filling. The efficiency of the proposed view synthesis method has been verified by evaluating the quality of synthesized images with various metrics such as peak signal-to-noise ratio, structural similarity, discrete cosine transform (DCT)-based video quality metric, and the newly proposed metrics. We have also confirmed that the synthesized images are objectively and subjectively natural. © 2010 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 20, 378–390, 2010

46 citations
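
The depth-based 3D warping step in this pipeline follows standard pinhole-camera algebra: back-project a pixel with its depth, then reproject it into the virtual camera. The sketch below is textbook DIBR warping with assumed intrinsics K and pose (R, t), not the authors' exact code.

import numpy as np

def warp_pixel(x, y, depth, K_ref, R_ref, t_ref, K_virt, R_virt, t_virt):
    # Back-project (x, y) with its depth into the reference camera frame,
    # move to world coordinates, then project into the virtual camera.
    p_cam = depth * (np.linalg.inv(K_ref) @ np.array([x, y, 1.0]))
    p_world = R_ref.T @ (p_cam - t_ref)
    q = K_virt @ (R_virt @ p_world + t_virt)
    return q[0] / q[2], q[1] / q[2]  # pixel position in the virtual view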


Proceedings ArticleDOI
TL;DR: A new rendering algorithm that computes a free-viewpoint based on depth image warping between two reference views from existing cameras is explored and three quality enhancing techniques that specifically aim at solving the major artifacts are developed.
Abstract: Interactive free-viewpoint selection applied to a 3D multi-view signal is a possible attractive feature of the rapidly developing 3D TV media. This paper explores a new rendering algorithm that computes a free-viewpoint based on depth image warping between two reference views from existing cameras. We have developed three quality enhancing techniques that specifically aim at solving the major artifacts. First, resampling artifacts are filled in by a combination of median filtering and inverse warping. Second, contour artifacts are processed while omitting warping of edges at high discontinuities. Third, we employ a depth signal for more accurate disocclusion inpainting. We obtain an average PSNR gain of 3 dB and 4.5 dB for the 'Breakdancers' and 'Ballet' sequences, respectively, compared to recently published results. While experimenting with synthetic data, we observe that the rendering quality is highly dependent on the complexity of the scene. Moreover, experiments are performed using compressed video from surrounding cameras. The overall system quality is dominated by the rendering quality and not by coding.

45 citations
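
The first of the three enhancements, filling resampling cracks, can be sketched as a median filter applied only at hole pixels (the inverse-warping part is omitted); for the thin one- or two-pixel cracks left by forward warping, a 3x3 window is dominated by valid neighbours. Names are illustrative.

import numpy as np
from scipy.ndimage import median_filter

def fill_resampling_cracks(warped, hole_mask):
    # Close thin resampling cracks in a single-channel warped image with a
    # 3x3 median of the surrounding pixels; only hole positions are changed.
    med = median_filter(warped, size=3)
    out = warped.copy()
    out[hole_mask] = med[hole_mask]
    return out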


Proceedings ArticleDOI
19 Jul 2010
TL;DR: This work proposes a depth adjustment method which controls the amount of parallax of stereoscopic images by using visual fatigue prediction and depth-based view synthesis, and presents a method for extracting disparity characteristics from sparse corresponding features.
Abstract: A well-known problem in stereoscopic images is visual fatigue. In order to reduce visual fatigue, we propose a depth adjustment method which controls the amount of parallax of stereoscopic images by using visual fatigue prediction and depth-based view synthesis. We predict the visual fatigue level by examining the horizontal and vertical disparity characteristics of 3D images, and if the contents are judged to cause visual fatigue, depth adjustment is applied by using depth-based view synthesis. We present a method for extracting disparity characteristics from sparse corresponding features. A depth-based view synthesis algorithm is proposed to handle the hole regions in the rendering process. We measured the correlations between the visual fatigue prediction metrics and the subjective results, obtaining values in the range of 79% to 85%. We then applied depth-based view synthesis to the contents that were predicted to cause visual fatigue. A subjective evaluation showed that the proposed depth adjustment method generated comfortable stereoscopic images.

37 citations
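
Depth adjustment itself can be sketched as a linear remapping of the disparity range into a comfort zone before depth-based view synthesis re-renders the views; the target range below is an illustrative placeholder, not a value derived from the paper's fatigue metrics.

import numpy as np

def adjust_parallax(disparity, target_range=(-10.0, 20.0)):
    # Linearly remap the measured disparity range into a comfortable one
    # (in pixels); the adjusted map then drives depth-based view synthesis.
    d_min, d_max = float(disparity.min()), float(disparity.max())
    span = max(d_max - d_min, 1e-6)       # guard against flat disparity maps
    t_min, t_max = target_range
    return (disparity - d_min) / span * (t_max - t_min) + t_min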


Proceedings ArticleDOI
01 Dec 2010
TL;DR: This paper manipulates depth values themselves, without causing severe synthesized view distortion, in order to maximize sparsity in the transform domain for compression gain, and designs a heuristic to push the resulting LP solution away from constraint boundaries to avoid quantization errors.
Abstract: Compression of depth maps is important for “image plus depth” representation of multiview images, which enables synthesis of novel intermediate views via depth-image-based rendering (DIBR) at the decoder. Previous depth map coding schemes exploit unique depth characteristics to compactly and faithfully reproduce the original signal. In contrast, given that depth maps are not directly viewed but are only used for view synthesis, in this paper we manipulate the depth values themselves, without causing severe synthesized view distortion, in order to maximize sparsity in the transform domain for compression gain. We formulate the sparsity maximization problem as an ℓ0-norm optimization. Given that ℓ0-norm optimization is hard in general, we first find a sparse representation by iteratively solving a weighted ℓ1 minimization via linear programming (LP). We then design a heuristic to push the resulting LP solution away from constraint boundaries to avoid quantization errors. Using JPEG as an example transform codec, we show that our approach gains up to 2.5 dB in rate-distortion performance for the interpolated view.

36 citations
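
The iteratively reweighted ℓ1 step can be sketched on a 1-D depth block with an orthonormal DCT basis; cvxpy stands in for the paper's LP solver, and the block size, tolerance, and reweighting constant are illustrative assumptions (the paper works with JPEG's 2-D transform).

import numpy as np
import cvxpy as cp
from scipy.fftpack import idct

def sparsify_depth_block(d0, tol, n_iter=5, eps=1e-3):
    # Find DCT coefficients c that are as sparse as possible while keeping
    # the reconstructed depth within +/- tol of the original block d0: the
    # weighted-l1 surrogate for l0, re-solved with updated weights.
    n = len(d0)
    T = idct(np.eye(n), axis=0, norm='ortho')   # depth = T @ c
    w = np.ones(n)
    c = cp.Variable(n)
    for _ in range(n_iter):
        prob = cp.Problem(cp.Minimize(cp.sum(cp.multiply(w, cp.abs(c)))),
                          [cp.abs(T @ c - d0) <= tol])
        prob.solve()
        w = 1.0 / (np.abs(c.value) + eps)       # small coefficients get large weights
    return c.value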


Patent
15 Jul 2010
TL;DR: In this article, a hybrid video decoder is provided that supports intermediate view synthesis of an intermediate view video from a first- and a second-view video which are predictively coded into a multi-view data signal, with frames of the second-view video being spatially subdivided into sub-regions and the multi-view data signal having a prediction mode.
Abstract: Hybrid video decoder supporting intermediate view synthesis of an intermediate view video from a first- and a second-view video which are predictively coded into a multi-view data signal with frames of the second-view video being spatially subdivided into sub-regions and the multi-view data signal having a prediction mode is provided, having: an extractor configured to respectively extract, from the multi-view data signal, for sub-regions of the frames of the second-view video, a disparity vector and a prediction residual; a predictive reconstructor configured to reconstruct the sub-regions of the frames of the second-view video, by generating a prediction from a reconstructed version of a portion of frames of the first-view video using the disparity vectors and a prediction residual for the respective sub-regions; and an intermediate view synthesizer configured to reconstruct first portions of the intermediate view video.

Proceedings ArticleDOI
07 Jun 2010
TL;DR: A novel scalable and high-performance 3D acquisition framework for immersive 3D videoconference systems which benefits from both sides and is able to integrate complex computer vision algorithms, such as Visual Hull, multi-view stereo matching, segmentation, image rectification, lens distortion correction and virtual view synthesis, in one real-time framework.
Abstract: The interest in immersive 3D video conference systems has existed for many years from both sides, the commercialization point of view as well as the research perspective. Still, one of the major bottlenecks in this context is the computational complexity of the required algorithmic modules. This paper discusses this problem from a hardware point of view. We use new fast graphics board solutions, which allow high algorithmic parallelization in consumer PC environments, on one hand, and look at state-of-the-art powerful multi-core CPU processing capabilities on the other hand. We propose a novel scalable and high-performance 3D acquisition framework for immersive 3D videoconference systems which benefits from both sides. In this way we are able to integrate complex computer vision algorithms, such as Visual Hull, multi-view stereo matching, segmentation, image rectification, lens distortion correction and virtual view synthesis, as well as data encoding, network signaling and capturing for 16 HD cameras, in one real-time framework. This paper is based on results and experiences of the European FP7 research project 3DPresence, which aims to build a real-time three-party, multi-user 3D videoconferencing system.

Proceedings ArticleDOI
07 Jun 2010
TL;DR: Simulation results and comparisons with previous inpainting works show that both objective evaluation estimates and subjective perceptual vision achieve better results with the system proposed in this paper.
Abstract: Multiview video can provide users with a 3D and virtual reality perception through its multiple viewing angles. To improve the quality of synthesized virtual-view frames and remove disocclusion regions, a hole filling technique is required. By classifying the different image artifacts and applying a proper hybrid motion/depth-oriented inpainting algorithm, the image quality becomes closer to the real image. In addition, simulation results and comparisons with previous inpainting works show that both objective evaluation estimates and subjective perceptual vision achieve better results with the system proposed in this paper.

Proceedings ArticleDOI
03 Dec 2010
TL;DR: This paper presents a complex framework for 3D video, where not only the 3D format and new coding methods are investigated, but also view synthesis and the provision of high-quality depth maps, e.g. via depth estimation.
Abstract: The introduction of the first 3D systems for digital cinema and home entertainment is based on stereo technology. For efficiently supporting new display types, depth-enhanced formats and coding technologies are required, as introduced in this overview paper. First, we discuss the necessity for a generic 3D video format, as the current state-of-the-art in multi-view video coding cannot support different types of multi-view displays at the same time. Therefore, a generic depth-enhanced 3D format is developed, where any number of views can be generated from one bit stream. This, however, requires a complex framework for 3D video, where not only the 3D format and new coding methods are investigated, but also view synthesis and the provision of high-quality depth maps, e.g. via depth estimation. We present this framework and discuss the interdependencies between the different modules.

Proceedings ArticleDOI
03 Dec 2010
TL;DR: A probabilistic framework which constrains the reliability of each pixel of the new view by maximizing likelihood (ML) is introduced, and the virtual view is generated by solving a Maximum a Posteriori (MAP) problem using graph cuts.
Abstract: View synthesis using depth maps is an important application in 3D image processing. In this paper, a novel method is proposed for plausible view synthesis for Free-viewpoint TV (FTV), using two input images and their depth maps. Depth estimation based on stereo matching is known to be error-prone, leading to noticeable artifacts in the synthesized new views. To produce high-quality view synthesis, we introduce a probabilistic framework which constrains the reliability of each pixel of the new view by maximizing likelihood (ML). The spatially adaptive reliability is provided by incorporating a Gamma hyper-prior and the synthesis error approximation. Furthermore, we generate the virtual view by solving a Maximum a Posteriori (MAP) problem using graph cuts. We compare the proposed method with other depth-based view synthesis approaches on MPEG test sequences. The results show that our method outperforms them in both subjective artifact reduction and objective PSNR improvement.

Proceedings ArticleDOI
TL;DR: A new method of DIBR using multi-view images acquired in a linear camera arrangement that improves virtual viewpoint images by predicting the residual errors; in the experiments, PSNR was improved by a few decibels compared with the conventional method.
Abstract: The availability of multi-view images of a scene makes possible new and exciting applications, including Free-viewpoint TV (FTV). FTV allows us to change viewpoint freely in a 3D world, where the virtual viewpoint images are synthesized by Depth-Image-Based Rendering (DIBR). In this paper, we propose a new method of DIBR using multi-view images acquired in a linear camera arrangement. The proposed method improves virtual viewpoint images by predicting the residual errors. For virtual viewpoint image synthesis, it is necessary to estimate depth maps from the multi-view images. Several algorithms to estimate depth maps have been proposed, but it is difficult to estimate an accurate depth map, so rendered virtual viewpoint images contain errors due to the depth errors. Our proposed method therefore takes those depth errors into account and improves the quality of the rendered virtual viewpoint images. In the proposed method, virtual images at each camera position are generated using the real images from the other cameras. The residual errors can then be calculated between the generated images and the real images acquired by the actual cameras. These residual errors are processed and fed back to predict the residual errors that would occur in virtual viewpoint images generated by the conventional method. In the experiments, PSNR was improved by a few decibels compared with the conventional method.

Proceedings ArticleDOI
11 Jul 2010
TL;DR: A semi-automatic depth estimation algorithm whereby the user defines object depth boundaries and disparity initialization, which can significantly improve the depth maps and reduce view synthesis artifacts in Depth Image Based Rendering.
Abstract: In this paper, we propose a semi-automatic depth estimation algorithm whereby the user defines object depth boundaries and disparity initialization. Automatic depth estimation methods generally have difficulty obtaining good depth results around object edges and in areas with low texture. The goal of our method is to improve the depth in these areas and reduce view synthesis artifacts in Depth Image Based Rendering. Good view synthesis quality is very important in applications such as 3DTV and Free-viewpoint Television (FTV). In our proposed method, initial disparity values for smooth areas can be input through a so-called manual disparity map, and depth boundaries are defined by a manually created edge map which can be supplied for one or multiple frames. For evaluation we used MPEG multi-view videos, and we demonstrate that our algorithm can significantly improve the depth maps and reduce view synthesis artifacts.

Proceedings ArticleDOI
03 Dec 2010
TL;DR: This paper compares different disparity-dependent mappings in terms of perceived shape distortion and alteration of the images, and proposes a hybrid mapping which does not distort depth and minimizes modifications of the image content.
Abstract: The 3-D shape perceived from viewing a stereoscopic movie depends on the viewing conditions, most notably on the screen size and distance, and depth and size distortions appear because of the differences between the shooting and viewing geometries. When the shooting geometry is constrained, or when the same stereoscopic movie must be displayed with different viewing geometries (e.g. in a movie theater and on a 3DTV), these depth distortions may be reduced by new view synthesis techniques. They usually involve three steps: computing the stereo disparity, computing a disparity-dependent 2-D mapping from the original stereo pair to the synthesized views, and finally composing the synthesized views. In this paper, we compare different disparity-dependent mappings in terms of perceived shape distortion and alteration of the images, and we propose a hybrid mapping which does not distort depth and minimizes modifications of the image content.
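
The depth distortion discussed above follows from standard stereoscopic viewing geometry (a textbook relation, not the paper's notation): for eye separation b, viewing distance H, and on-screen disparity d expressed in metres, similar triangles give the perceived depth

    Z = H * b / (b - d)

For example, with b = 65 mm, H = 2 m and d = 10 mm, Z = 2 * 0.065 / (0.065 - 0.010) ≈ 2.36 m. Since d scales linearly with screen width, replaying the same pixel disparities on a larger screen changes Z nonlinearly, which is why a disparity-dependent remapping of the synthesized views is needed when the viewing geometry changes.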

Proceedings ArticleDOI
19 Jul 2010
TL;DR: Two methods which use the concept of depth-based 3D warping for efficient multiview video compression are proposed, including a novel method that uses 3D warping to improve conventional disparity-compensated prediction.
Abstract: For future applications like 3DTV and free viewpoint TV, the video data format is likely to comprise multiple views plus pre-estimated geometric information, such as a depth map corresponding to each view. It has been well studied that depth maps are useful from both compression and rendering perspectives. In this paper, we propose two methods which use the concept of depth-based 3D warping for efficient multiview video compression. First, we present a method to synthesize a virtual view based on 3D warping to enhance inter-view prediction. We analyze similar previous works on view-synthesis-based inter-view prediction and present comparative results. Next, we propose a novel method that uses 3D warping to improve conventional disparity-compensated prediction. Experimental results demonstrate improved coding efficiency with our proposed methods when compared with MVC.

Proceedings ArticleDOI
07 Jun 2010
TL;DR: It is shown that the view synthesis reference software yields high distortions that mask those due to depth map compression when the distortion is measured by average luma peak signal-to-noise ratio, a finding that is essential to the reliability of quality evaluation.
Abstract: Several quality evaluation studies have been performed for video-plus-depth coding systems. In these studies, however, the distortions in the synthesized views have been quantified in experimental setups where both the texture and depth videos are compressed. Nevertheless, there are several factors that affect the quality of the synthesized view. Incorporating more than one source of distortion in the study could be misleading; one source of distortion could mask (or be masked by) the effect of other sources of distortion. In this paper, we conduct a quality evaluation study that aims to assess the distortions introduced by the view synthesis procedure and depth map compression in multiview-video-plus-depth coding systems. We report important findings that many of the existing studies have overlooked, yet are essential to the reliability of quality evaluation. In particular, we show that the view synthesis reference software yields high distortions that mask those due to depth map compression, when the distortion is measured by average luma peak signal-to-noise ratio. In addition, we show what quality metric to use in order to reliably quantify the effect of depth map compression on view synthesis quality. Experimental results that support these findings are provided for both synthetic and real multiview-video-plus-depth sequences.

Proceedings ArticleDOI
07 Jun 2010
TL;DR: A novel reliability based view synthesis method using two references and their depth maps that outperforms state-of-the-art view interpolation methods both at eliminating artifacts and improving PSNR.
Abstract: View synthesis using depth maps is a crucial application for Free-viewpoint TV (FTV). In this paper, we propose a novel reliability based view synthesis method using two references and their depth maps. The depth estimation with stereo matching is known to be error-prone, leading to noticeable artifacts in the synthesized new views. In order to provide plausible virtual views for FTV, our focus is on the error suppression for the synthesized view. We innovatively introduce the continuous reliability using error approximation by the reference cross-check. The new view interpolation algorithm is generated with the criterion of Least Sum of Squared Errors (LSSE). Furthermore, the proposed algorithm can be considered as a reliable version of the conventional linear view blending. We experimentally demonstrate the effectiveness of our framework with MPEG standard test sequences. The results show that our method outperforms state-of-the-art view interpolation methods both at eliminating artifacts and improving PSNR.
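
In the spirit of the LSSE criterion above, a minimal sketch of reliability-based blending: per-pixel error estimates obtained by the reference cross-check become inverse-variance weights, so the less reliable reference contributes less. Names and the epsilon guard are illustrative assumptions.

import numpy as np

def reliability_blend(view_l, view_r, err_l, err_r, eps=1e-6):
    # Inverse-variance weighting: the reference whose cross-check error is
    # smaller dominates, which minimizes the expected sum of squared errors.
    w_l = 1.0 / (err_l ** 2 + eps)
    w_r = 1.0 / (err_r ** 2 + eps)
    return (w_l * view_l + w_r * view_r) / (w_l + w_r)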

Proceedings ArticleDOI
03 Dec 2010
TL;DR: A novel method to synthesize intermediate views from two stereo images and disparity maps that is robust to errors in disparity maps and provides an explicit probabilistic model to select the best candidate for each disoccluded pixel efficiently with Conditional Random Fields and graph-cuts is proposed.
Abstract: We propose a novel method to synthesize intermediate views from two stereo images and disparity maps that is robust to errors in disparity maps. The proposed method computes a placement matrix from each disparity map that can be used to correct errors when warping pixels from reference view to virtual view. The second contribution is a new hole filling method that uses depth, edge, and segmentation information to aid the process of filling disoccluded pixels. The proposed method selects pixels from segmented regions that are connected to the disoccluded region as candidates to fill the disoccluded pixels. We also provide an explicit probabilistic model to select the best candidate for each disoccluded pixel efficiently with Conditional Random Fields (CRFs) and graph-cuts.

Proceedings ArticleDOI
29 Oct 2010
TL;DR: This work proposes to add a third phase where 2D or 3D artifacts are detected and removed in each stereoscopic image pair, while keeping the perceived quality of the stereoscopic movie close to the original.
Abstract: Novel view synthesis methods consist in using several images or video sequences of the same scene, and creating new images of this scene, as if they were taken by a camera placed at a different viewpoint. They can be used in stereoscopic cinema to change the camera parameters (baseline, vergence, focal length...) a posteriori, or to adapt a stereoscopic broadcast that was shot for given viewing conditions (such as a movie theater) to a different screen size and distance (such as a 3DTV in a living room). View synthesis from stereoscopic movies usually proceeds in two phases: First, disparity maps and other viewpoint-independent data (such as scene layers and matting information) are extracted from the original sequences, and second, this data and the original images are used to synthesize the new sequence, given geometric information about the synthesized viewpoints. Unfortunately, since no known stereo method gives perfect results in all situations, the results of the first phase will most probably contain errors, which will result in 2D or 3D artifacts in the synthesized stereoscopic movie. We propose to add a third phase where these artifacts are detected and removed in each stereoscopic image pair, while keeping the perceived quality of the stereoscopic movie close to the original.

Proceedings ArticleDOI
01 Dec 2010
TL;DR: This work investigates the cause of boundary noises and proposes a novel solution that removes them by applying restrictions during forward warping to the pixels within the texture-depth misalignment regions.
Abstract: During view synthesis based on depth maps, also known as Depth-Image-Based Rendering (DIBR), annoying artifacts are often generated around foreground objects, yielding the visual effect that slim silhouettes of foreground objects are scattered into the background. These artifacts are referred to as boundary noises. We investigate the cause of boundary noises and find that they result from the misalignment between texture and depth information along object boundaries. Accordingly, we propose a novel solution that removes such boundary noises by applying restrictions during forward warping to the pixels within the texture-depth misalignment regions. Experiments show that this algorithm can effectively eliminate most boundary noises and that it is also robust for view synthesis with compressed depth and texture information.
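
One plausible reading of the proposed restriction, sketched below under assumptions: pixels in a small band around strong depth discontinuities (where texture and depth tend to disagree) are flagged and withheld from forward warping, leaving them for hole filling instead. The gradient threshold and band width are illustrative.

import numpy as np
from scipy.ndimage import binary_dilation

def misalignment_mask(depth, grad_thresh=8.0, dilate=2):
    # Flag pixels within a small band around strong depth edges so they are
    # not forward-warped; this prevents foreground slivers from being
    # scattered into the background, and the band is inpainted afterwards.
    gy, gx = np.gradient(depth.astype(np.float64))
    edges = np.hypot(gx, gy) > grad_thresh
    return binary_dilation(edges, iterations=dilate)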

Proceedings ArticleDOI
10 Dec 2010
TL;DR: A new filtering technique addresses the disocclusion problem arising from the depth-image-based rendering (DIBR) technique within the 3DTV framework by pre-processing the depth video and/or post-processing the warped image through hole-filling techniques.
Abstract: In this paper, a new filtering technique addresses the disocclusion problem arising from the depth image based rendering (DIBR) technique within the 3DTV framework. An inherent problem with DIBR is filling in the newly exposed areas (holes) caused by the image warping process. In contrast to multiview video (MVV) systems, such as free viewpoint television (FTV), where multiple reference views are used for recovering the disocclusions, we consider in this paper a 3DTV system based on a video-plus-depth sequence which provides only one reference view of the scene. To overcome this issue, disocclusion removal can be achieved by pre-processing the depth video and/or post-processing the warped image through hole-filling techniques. Specifically, we propose a pre-processing of the depth video based on bilateral filtering, applied according to the strength of the depth discontinuity. Experimental results illustrate the efficiency of the proposed method compared to traditional methods.
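
For reference, a plain bilateral filter on a depth map looks like the sketch below; the paper additionally adapts the filtering strength to the local depth-discontinuity strength, which this sketch omits. Parameters are illustrative.

import numpy as np

def bilateral_depth(depth, sigma_s=3.0, sigma_r=10.0, radius=4):
    # Edge-preserving smoothing: each output pixel is a Gaussian-weighted
    # mean whose weights fall off with both spatial distance and depth
    # difference, so smoothing stops at depth discontinuities.
    h, w = depth.shape
    out = np.empty((h, w))
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    pad = np.pad(depth.astype(np.float64), radius, mode='edge')
    for y in range(h):
        for x in range(w):
            patch = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            rng = np.exp(-(patch - pad[y + radius, x + radius]) ** 2
                         / (2 * sigma_r ** 2))
            wgt = spatial * rng
            out[y, x] = (wgt * patch).sum() / wgt.sum()
    return out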

Proceedings ArticleDOI
07 Jun 2010
TL;DR: The proposed method synthesizes an initial view using existing depth-based warping, then uses the initial synthesized view as the template needed to derive fine motion vectors, and updates it using the derived motion vectors and temporal reference pictures, which yields the prediction output.
Abstract: This paper proposes a novel method that uses temporal reference pictures to improve the quality of view synthesis prediction. Existing view synthesis prediction schemes generate image signals from inter-view reference pictures only. However, there are many types of signal mismatch across views, such as illumination, color, and focus mismatch, and these mismatches decrease the prediction performance. The proposed method synthesizes an initial view using existing depth-based warping, and then uses the initial synthesized view as the template needed to derive fine motion vectors. The initial synthesized view is then updated by using the derived motion vectors and temporal reference pictures, which yields the prediction output. Experiments show that the proposed method can improve the quality of view synthesis by about 14 dB for Ballet and 4 dB for Breakdancers at high bitrates, and reduces the bitrate by about 2% relative to conventional view synthesis prediction.

Patent
11 Aug 2010
TL;DR: In this article, a view synthesis predictive coding method for multi-view video coding is proposed, aiming to provide a coding method suitable for various camera systems.
Abstract: The invention relates to a digital image processing and video coding/decoding technique, specifically view synthesis predictive coding. The invention aims to solve the technical problem of providing a multi-view video coding method suitable for various camera systems. The multi-view video coding method is characterized in that it comprises the following steps: a, independently coding the video sequences of one or more views by means of motion compensation prediction; b, coding the other video sequences with the minimum-cost prediction mode selected from parallax compensation prediction, view synthesis prediction, and motion compensation prediction. The invention ensures a better video coding effect.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: The proposed algorithm classifies the pixel-wise depth map into two categories, reliable and unreliable, and then applies a depth refinement algorithm to those pixels with unreliable depth values.
Abstract: With the recent progress of display, capture device, and coding technologies, multi-view video applications such as stereoscopic video, free viewpoint TV (FTV), and free viewpoint video (FVV) have been introduced to the world with growing interest. To achieve free navigation in such applications, depth information is required along with the video data. There have been many research activities in the area of depth estimation; however, estimating an accurate depth map still poses a great challenge. In this paper, we propose a depth refinement algorithm for multi-view video synthesis. The proposed algorithm classifies the pixel-wise depth map into two categories, reliable and unreliable, and then applies depth refinement to those pixels with unreliable depth values. In addition to the depth refinement algorithm, we also propose a reliability-weighted view interpolation algorithm. Finally, the refined depth map is evaluated by the quality of the synthesized view.

Proceedings ArticleDOI
03 Dec 2010
TL;DR: It is shown that using the monotonicity assumption, suboptimal solutions can be efficiently pruned from the feasible space during parameter search and the complexity of the scheme can be reduced by up to 66% over full search without loss of optimality.
Abstract: The encoding of both texture and depth maps of a set of multi-view images, captured by a set of spatially correlated cameras, is important for any 3D visual communication systems based on depth-image-based rendering (DIBR). In this paper, we address the problem of efficient bit allocation among texture and depth maps of multi-view images. We pose the following question: for chosen (1) coding tool to encode texture and depth maps at the encoder and (2) view synthesis tool to reconstruct uncoded views at the decoder, how to best select captured views for encoding and distribute available bits among texture and depth maps of selected coded views, such that visual distortion of a “metric” of reconstructed views is minimized. We show that using the monotonicity assumption, suboptimal solutions can be efficiently pruned from the feasible space during parameter search. Our experiments show that optimal selection of coded views and associated quantization levels for texture and depth maps can outperform a heuristic scheme using constant levels for all maps (commonly used in the standard implementations) by up to 2.0dB. Moreover, the complexity of our scheme can be reduced by up to 66% over full search without loss of optimality.
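
One simple pruning rule that monotonicity enables can be sketched as follows, under the assumption that quantizers are ordered coarse-to-fine so that rate is non-decreasing along each row of the (texture, depth) search grid; rate and dist are placeholders for a real codec and a synthesized-view distortion metric.

import numpy as np

def best_allocation(q_levels, rate, dist, budget):
    # Exhaustive search over (texture, depth) quantizer pairs with pruning:
    # q_levels is ordered coarse-to-fine, so rate(qt, qd) is non-decreasing
    # in qd along a row; once the budget is exceeded, all finer qd at this
    # qt can be skipped without loss of optimality.
    best_pair, best_dist = None, np.inf
    for qt in q_levels:
        for qd in q_levels:
            if rate(qt, qd) > budget:
                break                      # monotonicity: finer qd only costs more
            d = dist(qt, qd)
            if d < best_dist:
                best_pair, best_dist = (qt, qd), d
    return best_pair, best_dist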

Journal ArticleDOI
TL;DR: The results presented on two real sequences show that the proposed polygon soup representation provides a good trade-off between rendering quality and data compactness.