
Showing papers on "View synthesis published in 2015"


Proceedings ArticleDOI
07 Jun 2015
TL;DR: A new place recognition approach is developed that combines an efficient synthesis of novel views with a compact indexable image representation and significantly outperforms other large-scale place recognition techniques on this challenging data.
Abstract: We address the problem of large-scale visual place recognition for situations where the scene undergoes a major change in appearance, for example, due to illumination (day/night), change of seasons, aging, or structural modifications over time such as buildings built or destroyed. Such situations represent a major challenge for current large-scale place recognition methods. This work has the following three principal contributions. First, we demonstrate that matching across large changes in the scene appearance becomes much easier when both the query image and the database image depict the scene from approximately the same viewpoint. Second, based on this observation, we develop a new place recognition approach that combines (i) an efficient synthesis of novel views with (ii) a compact indexable image representation. Third, we introduce a new challenging dataset of 1,125 camera-phone query images of Tokyo that contain major changes in illumination (day, sunset, night) as well as structural changes in the scene. We demonstrate that the proposed approach significantly outperforms other large-scale place recognition techniques on this challenging data.

502 citations


Proceedings Article
07 Dec 2015
TL;DR: A novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image and allows the model to capture long-term dependencies along a sequence of transformations.
Abstract: An important problem for both graphics and vision is to synthesize novel views of a 3D object from a single image. This is particularly challenging due to the partial observability inherent in projecting a 3D object onto the image space, and the ill-posedness of inferring object shape and pose. However, we can train a neural network to address the problem if we restrict our attention to specific object categories (in our case faces and chairs) for which we can gather ample training data. In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image. The recurrent structure allows our model to capture long-term dependencies along a sequence of transformations. We demonstrate the quality of its predictions for human faces on the Multi-PIE dataset and for a dataset of 3D chair models, and also show its ability to disentangle latent factors of variation (e.g., identity and pose) without using full supervision.
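
Where the abstract describes the architecture only at a high level, a minimal sketch may help fix ideas. The sketch below assumes PyTorch; the layer sizes, the GRUCell recurrence over the latent code, and the class name `RecurrentViewSynth` are illustrative choices, not the paper's exact design. It shows an encoder that maps a single image to a latent code, a recurrent cell that advances the pose one rotation step at a time, and a decoder that renders each step.

```python
# Illustrative recurrent convolutional encoder-decoder for novel view synthesis
# (layer sizes and the GRU-based recurrence are assumptions, not the paper's design).
import torch
import torch.nn as nn

class RecurrentViewSynth(nn.Module):
    def __init__(self, latent=256):
        super().__init__()
        # Encoder: single 64x64 input image -> latent identity/pose code.
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent), nn.ReLU(),
        )
        # Recurrent cell applies one rotation step in latent space.
        self.cell = nn.GRUCell(latent, latent)
        # Decoder: latent code -> rotated view.
        self.fc = nn.Linear(latent, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, img, n_steps):
        """Render a sequence of n_steps rotated views from one 64x64 image."""
        h = self.enc(img)                    # initial latent state
        x = torch.zeros_like(h)              # constant "rotate one step" input
        outputs = []
        for _ in range(n_steps):
            h = self.cell(x, h)              # advance pose in latent space
            feat = self.fc(h).view(-1, 128, 8, 8)
            outputs.append(self.dec(feat))
        return torch.stack(outputs, dim=1)   # (batch, n_steps, 3, 64, 64)

views = RecurrentViewSynth()(torch.rand(2, 3, 64, 64), n_steps=4)
print(views.shape)  # torch.Size([2, 4, 3, 64, 64])
```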

346 citations


Posted Content
TL;DR: In this article, a deep network is trained end-to-end from a large number of posed image sets and the pixels from neighboring views of a scene are presented to the network which then directly produces the pixels of the unseen view.
Abstract: Deep networks have recently enjoyed enormous success when applied to recognition and classification problems in computer vision, but their use in graphics problems has been limited. In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches which consist of multiple complex stages of processing, each of which require careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network which then directly produces the pixels of the unseen view. The benefits of our approach include generality (we only require posed image sets and can easily apply our method to different domains), and high quality results on traditionally difficult scenes. We believe this is due to the end-to-end nature of our system which is able to plausibly generate pixels according to color, depth, and texture priors learnt automatically from the training data. To verify our method we show that it can convincingly reproduce known test views from nearby imagery. Additionally we show images rendered from novel viewpoints. To our knowledge, our work is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.

180 citations


Journal ArticleDOI
TL;DR: An improved method for tentative correspondence selection, applicable both with and without view synthesis, is introduced; a modification of the standard first-to-second nearest distance rule increases the number of correct matches by 5–20% at no additional computational cost.

158 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: A Disparity Assisted Phase based Synthesis strategy is introduced that integrates disparity information into the phase term of a reference image to warp it to nearby views, solving the disparity-inconsistency and ringing-artifact problems of existing phase-based view synthesis methods.
Abstract: We present a novel phase-based approach for reconstructing 4D light field from a micro-baseline stereo pair. Our approach takes advantage of the unique property of complex steerable pyramid filters in micro-baseline stereo. We first introduce a Disparity Assisted Phase based Synthesis (DAPS) strategy that can integrate disparity information into the phase term of a reference image to warp it to its close neighbor views. Based on the DAPS, an “analysis by synthesis” approach is proposed to warp from one of the input binocular images to the other, and iteratively optimize the disparity map to minimize the phase differences between the warped one and the ground truth input. Finally, the densely and regularly spaced, high quality light field images can be reconstructed using the proposed DAPS according to the refined disparity map. Our approach also solves the problems of disparity inconsistency and ringing artifact in available phase-based view synthesis methods. Experimental results demonstrate that our approach substantially improves both the quality of disparity map and light field, compared with the state-of-the-art stereo matching and image based rendering approaches.
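
The core mechanism, integrating disparity into the phase term, can be illustrated with a toy example. The paper operates on the localized sub-bands of a complex steerable pyramid with per-pixel disparities; the sketch below only shows the underlying principle, that a spatial shift is a phase shift of the frequency coefficients, for a 1-D signal and a constant disparity (`phase_shift_1d` is a hypothetical helper, not from the paper).

```python
import numpy as np

def phase_shift_1d(signal, disparity):
    """Shift `signal` by `disparity` samples by modifying only the phase."""
    n = signal.size
    freqs = np.fft.fftfreq(n)                        # cycles per sample
    spectrum = np.fft.fft(signal)
    shifted = spectrum * np.exp(-2j * np.pi * freqs * disparity)
    return np.fft.ifft(shifted).real

# Band-limited test signal: the phase-shifted result matches an integer roll
# exactly, and non-integer disparities give sub-pixel warps for free.
x = np.sin(2 * np.pi * 5 * np.arange(256) / 256)
print(np.allclose(phase_shift_1d(x, 4), np.roll(x, 4)))   # True
```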

99 citations


Journal ArticleDOI
TL;DR: Simulation results show that the novel algorithm has near-optimal compression efficiency with low computational complexity, so that it offers an effective encoding solution for IMVS applications.
Abstract: Several multiview video coding standards have been developed to efficiently compress images from different camera views capturing the same scene by exploiting the spatial, the temporal and the inter-view correlations. However, the compressed texture and depth data typically have many inter-view coding dependencies, which may not suit interactive multiview video streaming (IMVS) systems, where the user requests only one view at a time. In this context, this paper proposes an algorithm for the effective selection of the inter-view prediction structures (PSs) and associated texture and depth quantization parameters (QPs) for IMVS under relevant constraints. These PSs and QPs are selected such that the visual distortion is minimized, given some storage and point-to-point transmission rate constraints, and a user interaction behavior model. Simulation results show that the novel algorithm has near-optimal compression efficiency with low computational complexity, so that it offers an effective encoding solution for IMVS applications.

44 citations


Journal ArticleDOI
TL;DR: First results are provided showing that improvements in compression efficiency, depth estimation and view synthesis algorithms are required, but that the use of SMV appears realistic according to next generation compression technology requirements.
Abstract: Super Multi-View (SMV) video content is composed of tens or hundreds of views that provide a light-field representation of a scene. This representation allows a glass-free visualization and eliminates many causes of discomfort existing in current available 3D video technologies. Efficient video compression of SMV content is a key factor for enabling future 3D video services. This paper first compares several coding configurations for SMV content and several inter-view prediction structures are also tested and compared. The experiments mainly suggest that large differences in coding efficiency can be observed from one configuration to another. Several ratios for the number of coded and synthesized views are compared, both objectively and subjectively. It is reported that view synthesis significantly affects the coding scheme. The amount of views to skip highly depends on the sequence and on the quality of the associated depth maps. Reported ranges of bitrates required to obtain a good quality for the tested SMV content are realistic and coherent with future 4K/8K needs. The reliability of the PSNR metric for SMV content is also studied. Objective and subjective results show that PSNR is able to reflect increase or decrease in subjective quality even in the presence of synthesized views. However, depending on the ratio of coded and synthesized views, the order of magnitude of the effective quality variation is biased by PSNR. Results indicate that PSNR is less tolerant to view synthesis artifacts than human viewers. Finally, preliminary observations are initiated. First, the light-field conversion step does not seem to alter the objective results for compression. Secondly, the motion parallax does not seem to be impacted by specific compression artifacts. The perception of the motion parallax is only altered by variations of the typical compression artifacts along the viewing angle, in cases where the subjective image quality is already low. To the best of our knowledge, this paper is the first to carry out subjective experiments and to report results of SMV compression for light-field 3D displays. It provides first results showing that improvement of compression efficiency is required, as well as depth estimation and view synthesis algorithms improvement, but that the use of SMV appears realistic according to next generation compression technology requirements.

Highlights: Study of the impact of compression on subjective quality for lightfield SMV content. To the best of our knowledge, this paper is the first to report results of this kind. Several SMV coding configurations are compared both objectively and subjectively. Compression efficiency, depth estimation and view synthesis require improvements. SMV appears realistic according to next generation compression technology requirements.

42 citations


Journal ArticleDOI
TL;DR: By utilizing texture found in temporally adjacent frames, this work proposes to fill disocclusions in a faithful way, i.e., using texture that a real camera would observe in place of the virtual camera, to reduce the amount of artifacts introduced into the filling region.
Abstract: Disocclusion filling is a critical problem in depth-based view synthesis. Exposed regions in the target view that correspond to occluded areas in the reference view have to be filled in a meaningful way. Current approaches aim to do this in a plausible way, mostly inspired by image inpainting techniques. However, disocclusion filling is a video-based problem which offers more information than just the current frame. By utilizing texture found in temporally adjacent frames, we propose to fill disocclusions in a faithful way, i.e., using texture that a real camera would observe in place of the virtual camera. Only if faithful information is not available do we fall back to plausible filling. Our approach is designed for single-view video-plus-depth, where neighboring camera views are not available for disocclusion filling. In contrast to previous approaches, our method uses superpixels instead of square patches as filling entities to reduce the amount of artifacts introduced into the filling region. Despite its importance, faithfulness has not yet obtained due attention. Our experiments show that situations are common where simple plausible filling does not lead to satisfying results; it is therefore important to stress faithful disocclusion filling, and our current work is an attempt in this direction.
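
As a rough illustration of the "faithful first, plausible as a fallback" idea, the sketch below fills disoccluded pixels from temporally adjacent frames that are assumed to have already been warped into the virtual viewpoint, and only then falls back to a plausible fill. All names are hypothetical, and the fallback here is plain nearest-valid-pixel propagation rather than the paper's superpixel-based filling.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_disocclusions(frame, hole_mask, warped_neighbors, neighbor_masks):
    """frame: HxWx3; hole_mask: HxW bool (True = disoccluded pixel);
    warped_neighbors: temporally adjacent frames warped to this viewpoint;
    neighbor_masks: HxW bool maps of where each neighbor has valid texture."""
    out = frame.copy()
    remaining = hole_mask.copy()
    # Faithful pass: copy texture actually observed in adjacent frames.
    for neigh, valid in zip(warped_neighbors, neighbor_masks):
        usable = remaining & valid
        out[usable] = neigh[usable]
        remaining &= ~usable
    # Plausible fallback: propagate the nearest valid pixel into what is left.
    if remaining.any():
        _, inds = distance_transform_edt(remaining, return_indices=True)
        iy, ix = inds
        out[remaining] = out[iy[remaining], ix[remaining]]
    return out
```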

37 citations


Journal ArticleDOI
TL;DR: It is shown that GBR can achieve significant gains in geometry coding rate over depth-based schemes operating at similar quality and compare their respective view synthesis qualities as a function of the compactness of the geometry description.
Abstract: In this paper, we propose a new geometry representation method for multiview image sets. Our approach relies on graphs to describe the multiview geometry information in a compact and controllable way. The links of the graph connect pixels in different images and describe the proximity between pixels in 3D space. These connections are dependent on the geometry of the scene and provide the right amount of information that is necessary for coding and reconstructing multiple views. Our multiview image representation is very compact and adapts the transmitted geometry information as a function of the complexity of the prediction performed at the decoder side. To achieve this, our graph-based representation (GBR) carefully selects the amount of geometry information needed before coding. This is in contrast with depth coding, which directly compresses with losses the original geometry signal, thus making it difficult to quantify the impact of coding errors on geometry-based interpolation. We present the principles of this GBR and we build an efficient coding algorithm to represent it. We compare our GBR approach to classical depth compression methods and compare their respective view synthesis qualities as a function of the compactness of the geometry description. We show that GBR can achieve significant gains in geometry coding rate over depth-based schemes operating at similar quality. Experimental results demonstrate the potential of this new representation.
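
A deliberately simplified reading of the graph idea for a rectified stereo pair: each link connects a pixel of view 1 to the column it maps to in view 2, which is exactly the information the decoder needs to predict view 2 from view 1. The sketch below uses hypothetical helpers `build_links` and `predict_view2`; the actual GBR selects, structures and codes this information far more carefully.

```python
import numpy as np

def build_links(disparity):
    """For every pixel of view 1, the target column in view 2 (-1 if outside)."""
    h, w = disparity.shape
    cols = np.arange(w)[None, :] - np.round(disparity).astype(int)
    return np.where((cols >= 0) & (cols < w), cols, -1)

def predict_view2(view1, disparity, links):
    """Warp view 1 into view 2 by following the graph links (z-buffered)."""
    h, w = links.shape
    view2 = np.zeros_like(view1)
    best = np.full((h, w), -np.inf)            # keep the nearest (largest
    for y in range(h):                         # disparity) pixel per target
        for x in range(w):
            t = links[y, x]
            if t >= 0 and disparity[y, x] > best[y, t]:
                best[y, t] = disparity[y, x]
                view2[y, t] = view1[y, x]
    return view2, np.isfinite(best)            # True where a pixel arrived

# Fronto-parallel toy scene: constant 4-pixel disparity everywhere.
v1 = np.random.default_rng(0).random((32, 48, 3))
d = np.full((32, 48), 4.0)
v2, filled = predict_view2(v1, d, build_links(d))
```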

35 citations


Proceedings ArticleDOI
01 Sep 2015
TL;DR: An approach to inpaint holes in depth maps that appear when synthesizing virtual views from RGB-D scenes is proposed, based on a superpixel oversegmentation of both the original and synthesized views, which makes the algorithm more robust to inaccurate depth maps.
Abstract: In this paper we propose an approach to inpaint holes in depth maps that appear when synthesizing virtual views from RGB-D scenes. Based on a superpixel oversegmentation of both the original and synthesized views, the proposed approach efficiently deals with many occlusion situations where most previous approaches fail. The use of superpixels makes the algorithm more robust to inaccurate depth maps, while giving an efficient way to model the image. Extensive comparisons to relevant state-of-the-art methods show that our approach outperforms these existing approaches both qualitatively and quantitatively.

30 citations


Journal ArticleDOI
TL;DR: This paper considers the two DIBR algorithms used in the Moving Picture Experts Group view synthesis reference software, and develops a scheme for the encoder to estimate the distortion of the synthesized virtual view at the decoder when the reference texture and depth sequences experience transmission errors such as packet loss.
Abstract: Depth-image-based rendering (DIBR) is frequently used in multiview video applications such as free-viewpoint television. In this paper, we consider the two DIBR algorithms used in the Moving Picture Experts Group view synthesis reference software, and develop a scheme for the encoder to estimate the distortion of the synthesized virtual view at the decoder when the reference texture and depth sequences experience transmission errors such as packet loss. We first develop a graphical model to analyze how random errors in the reference depth image affect the synthesized virtual view. The warping competition rule adopted in the DIBR algorithms is explicitly represented by the graphical model. We then consider the case where packet loss occurs to both the encoded texture and depth images during transmission and develop a recursive optimal distribution estimation (RODE) method to calculate the per-pixel texture and depth probability distributions in each frame of the reference views. The RODE is then integrated with the graphical model method to estimate the distortion in the synthesized view caused by packet loss. Experimental results verify the accuracy of the graphical model method, the RODE, and the combined estimation scheme.

Proceedings ArticleDOI
TL;DR: A view selection method inspired by plenoptic sampling followed by transform-based view coding and view synthesis prediction to code residual views is introduced, which has an improved rate-distortion performance and preserves the structure of the perceived light fields better.
Abstract: Full parallax light field displays require high pixel density and huge amounts of data. Compression is a necessary tool used by 3D display systems to cope with the high bandwidth requirements. One of the formats adopted by MPEG for 3D video coding standards is the use of multiple views with associated depth maps. Depth maps enable the coding of a reduced number of views, and are used by compression and synthesis software to reconstruct the light field. However, most of the developed coding and synthesis tools target linearly arranged cameras with small baselines. Here we propose to use the 3D video coding format for full parallax light field coding. We introduce a view selection method inspired by plenoptic sampling followed by transform-based view coding and view synthesis prediction to code residual views. We determine the minimal requirements for view sub-sampling and present the rate-distortion performance of our proposal. We also compare our method with established video compression techniques, such as H.264/AVC, H.264/MVC, and the new 3D video coding algorithm, 3DV-ATM. Our results show that our method not only has an improved rate-distortion performance, it also preserves the structure of the perceived light fields better.

Journal ArticleDOI
TL;DR: An inference-based multiview depth image enhancement algorithm is introduced and investigated, and it is shown that this approach consistently improves the quality of virtual views by 0.2 dB to 1.6 dB, depending on the quality of the input multiview depth imagery.
Abstract: An inference-based multiview depth image enhancement algorithm is introduced and investigated in this paper. Multiview depth imagery plays a pivotal role in free-viewpoint television. This technology requires high-quality virtual view synthesis to enable viewers to move freely in a dynamic real world scene. Depth imagery of different viewpoints is used to synthesize an arbitrary number of novel views. Usually, the depth imagery is estimated individually by stereo-matching algorithms and, hence, shows inter-view inconsistency. This inconsistency affects the quality of view synthesis negatively. This paper enhances the multiview depth imagery at multiple viewpoints by probabilistic weighting of each depth pixel. First, our approach classifies the color pixels in the multiview color imagery. Second, using the resulting color clusters, we classify the corresponding depth values in the multiview depth imagery. Each clustered depth image is subject to further subclustering. Clustering based on generative models is used for assigning probabilistic weights to each depth pixel. Finally, these probabilistic weights are used to enhance the depth imagery at multiple viewpoints. Experiments show that our approach consistently improves the quality of virtual views by 0.2 dB to 1.6 dB, depending on the quality of the input multiview depth imagery.
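
A rough single-view caricature of the idea, assuming that depth values within a color cluster should agree: colors are clustered with plain k-means, and each depth value is softly pulled toward its cluster mean, with the pull strength given by a per-cluster Gaussian weight. The paper clusters across multiple views and uses generative sub-clustering; the function name `enhance_depth` and all parameters here are illustrative.

```python
import numpy as np

def enhance_depth(color, depth, k=8, iters=10, seed=0):
    h, w, _ = color.shape
    pix = color.reshape(-1, 3).astype(float)
    z = depth.reshape(-1).astype(float)
    rng = np.random.default_rng(seed)
    centers = pix[rng.choice(pix.shape[0], k, replace=False)]
    for _ in range(iters):                       # plain k-means on color
        labels = np.argmin(((pix[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pix[labels == c].mean(axis=0)
    out = z.copy()
    for c in range(k):                           # per-cluster Gaussian weight
        idx = labels == c
        if not np.any(idx):
            continue
        mu, sigma = z[idx].mean(), z[idx].std() + 1e-6
        weight = np.exp(-0.5 * ((z[idx] - mu) / sigma) ** 2)
        out[idx] = weight * z[idx] + (1 - weight) * mu   # outliers move to the mean
    return out.reshape(h, w)
```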

Proceedings ArticleDOI
30 Jul 2015
TL;DR: The need for new compression technology capable of efficient compression of sparse convergent views of Free-Viewpoint Television systems is demonstrated.
Abstract: We deal with the processing of multiview video acquired by the use of practical thus relatively simple acquisition systems that have a limited number of cameras located around a scene on independent tripods. The real-camera locations are nearly arbitrary as it would be required in the real-world Free-Viewpoint Television systems. The appropriate test video sequences are also reported. We describe a family of original extensions and adaptations of the multiview video processing algorithms adapted to arbitrary camera positions around a scene. The techniques constitute the video processing chain for Free-Viewpoint Television as they are aimed at estimating the parameters of such a multi-camera system, video correction, depth estimation and virtual view synthesis. Moreover, we demonstrate the need for new compression technology capable of efficient compression of sparse convergent views. The experimental results for processing the proposed test sequences are reported.

Proceedings ArticleDOI
10 Dec 2015
TL;DR: An algorithm to estimate the quality of the synthesized images in the absence of the corresponding reference images is presented based upon the cyclopean eye theory, showing excellent correlation results with respect to state-of-the-art full reference image and video quality metrics.
Abstract: In free-viewpoint television (FTV) framework, due to hardware and bandwidth constraints, only a limited number of viewpoints are generally captured, coded and transmitted; therefore, a large number of views needs to be synthesized at the receiver to grant a really immersive 3D experience. It is thus evident that the estimation of the quality of the synthesized views is of paramount importance. Moreover, quality assessment of the synthesized view is very challenging since the corresponding original views are generally not available either on the encoder (not captured) or the decoder side (not transmitted). To tackle the mentioned issues, this paper presents an algorithm to estimate the quality of the synthesized images in the absence of the corresponding reference images. The algorithm is based upon the cyclopean eye theory. The statistical characteristics of an estimated cyclopean image are compared with the synthesized image to measure its quality. The prediction accuracy and reliability of the proposed technique are tested on standard video dataset compressed with HEVC showing excellent correlation results with respect to state-of-the-art full reference image and video quality metrics.

Proceedings ArticleDOI
28 Dec 2015
TL;DR: A comprehensive set of experiments has been carried out to justify the robustness of the proposed scheme over existing schemes with respect to 3D-HEVC compression and view synthesis attacks.
Abstract: In this paper, a 3D video watermarking scheme is proposed for the depth image based rendering (DIBR) based multi view video plus depth (MVD) encoding technique. To make the scheme invariant to the view synthesis process in the DIBR technique, the watermark is inserted in a center view which is rendered from the left and right views of a 3D video frame. A low-pass center view, obtained from motion compensated temporal filtering over all the frames of a GOP, is used for embedding to reduce temporal flickering artifacts. To make the scheme invariant to the DIBR process, 2D DT-DWT block coefficients of the low-pass center view are used for embedding, exploiting their shift invariance and directional property. A comprehensive set of experiments has been carried out to justify the robustness of the proposed scheme over existing schemes with respect to 3D-HEVC compression and view synthesis attacks.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed novel virtual view rendering method based on DIBR can obtain high-quality virtual view images and achieve satisfactory subjective visual effects.
Abstract: DIBR is a promising technology for rendering new views of scenes from a collection of densely sampled images or videos. It has potential applications in virtual reality, immersive and advanced visualization, and 3D television systems. However, due to imperfect depth maps and the illumination difference between reference images, annoying artifacts appear in the rendered image. To generate high-quality intermediate virtual viewpoint images, this paper proposes a novel virtual view rendering method based on DIBR. The proposed method consists of four main parts: luminance compensation based on histogram matching, isolated depth pixel removal, 3D warping with depth-based pixel interpolation, and background-based hole filling. Experimental results show that our method can obtain high-quality virtual view images and achieve satisfactory subjective visual effects.
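
Of the four stages listed above, the first, luminance compensation by histogram matching, is easy to sketch: the source view's grey-level CDF is mapped onto the reference view's so the two views blend without brightness seams. This is a generic histogram-matching sketch under that reading, not the paper's exact procedure; the other three stages are not shown.

```python
import numpy as np

def match_histogram(source, reference):
    """Return `source` remapped so its histogram matches `reference`."""
    src_vals, src_counts = np.unique(source.ravel(), return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source grey level, find the reference level with the same CDF.
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return np.interp(source.ravel(), src_vals, mapped).reshape(source.shape)

rng = np.random.default_rng(1)
left = rng.integers(0, 200, (64, 64)).astype(float)      # darker view
right = left + 30                                         # brighter view
compensated = match_histogram(left, right)
print(abs(compensated.mean() - right.mean()) < 1.0)       # True
```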

Proceedings ArticleDOI
02 Nov 2015
TL;DR: A depth-aided patch based inpainting method to perform the disocclusion of holes that appear when synthesizing virtual views from RGB-D scenes is proposed, which is efficient compared to state-of-the-art approaches.
Abstract: In this paper we propose a depth-aided patch based inpainting method to perform the disocclusion of holes that appear when synthesizing virtual views from RGB-D scenes. Depth information is added to each key step of the classical patch-based algorithm from [Criminisi et al. 2004] to guide the synthesis of missing structures and textures. These contributions result in a new inpainting method which is efficient compared to state-of-the-art approaches (both in visual quality and computational burden), while requiring only a single easy-to-adjust additional parameter.

Journal ArticleDOI
TL;DR: Simulation results show the good performance of the novel algorithms compared to a baseline algorithm, showing that an effective adaptive IMVS solution should consider the scene content as well as the clients' capabilities and navigation preferences.

Proceedings ArticleDOI
30 Jul 2015
TL;DR: This paper proposes an improved DASH-based IMVS scheme over wireless networks that allows virtual views to be generated at either the cloud-based server or the client, and can adaptively select the optimal approach based on the network condition and the cost of the cloud.
Abstract: Interactive multiview video streaming (IMVS) allows viewers to periodically switch viewpoint. Its user experience can be further enhanced by creating virtual views from neighboring coded views using view synthesis techniques. Dynamic adaptive streaming over HTTP (DASH) is a new standard that can adjust the quality of video streaming according to the network condition. In this paper, we propose an improved DASH-based IMVS scheme over wireless networks. The main contributions are twofold. First, our scheme allows virtual views to be generated at either the cloud-based server or the client, and can adaptively select the optimal approach based on the network condition and the cost of the cloud. Second, scalable video coding is used in our system. Simulations with the NS3 tool demonstrate the advantage of our proposed scheme over the existing approach with client-based view synthesis and single-layer video coding.

Journal ArticleDOI
TL;DR: A novel view synthesis algorithm for three-dimensional video based on segmentation using multi-level thresholding method that achieves an average PSNR gain of 0.98 dB for the multi-view test sequences and improves the subjective quality of the synthesized views.
Abstract: In this paper, we present a novel view synthesis algorithm for three-dimensional video. The proposed algorithm is based on segmentation using a multi-level thresholding method. Recently, numerous techniques have been suggested which use a 2-D color image and the per-pixel depth map of the scene to create virtual views of the scene from any viewing position. However, inaccuracies in the depth maps cause annoying visual artifacts in depth-based view synthesis. In the proposed method, the depth maps are first preprocessed to avoid the errors caused by wrong depth values. Then, the color images are segmented according to the depth values and the regions belonging to different segments are warped independently. To further enhance the quality of the synthesized views, a multi-level thresholding based ghost removal algorithm and a novel hole filling algorithm have been proposed. Experimental results show that the proposed methods achieve an average PSNR gain of 0.98 dB for the multi-view test sequences and also improve the subjective quality of the synthesized views.
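
To make the segmentation-then-warp idea concrete, a toy sketch follows: depth is split into a few layers by thresholds, and each layer is shifted by its own disparity, back to front, so nearer layers overwrite farther ones. The thresholds, the linear depth-to-disparity mapping and the use of np.roll (which wraps around instead of discarding out-of-frame pixels) are simplifying assumptions, and the paper's preprocessing, ghost removal and hole-filling stages are omitted.

```python
import numpy as np

def warp_by_layers(color, depth, thresholds, max_disp=8):
    """Assumes larger depth-map values are nearer (8-bit convention), so a
    higher layer index gets a larger disparity."""
    h, w, _ = color.shape
    layers = np.digitize(depth, thresholds)          # 0 .. len(thresholds)
    n_layers = len(thresholds) + 1
    synth = np.zeros_like(color)
    filled = np.zeros((h, w), dtype=bool)
    for lab in range(n_layers):                      # back to front
        disp = int(round(max_disp * lab / (n_layers - 1)))
        mask = layers == lab
        shifted_mask = np.roll(mask, disp, axis=1)
        shifted_color = np.roll(color, disp, axis=1)
        synth[shifted_mask] = shifted_color[shifted_mask]
        filled |= shifted_mask
    return synth, filled                             # ~filled marks holes to fill

rng = np.random.default_rng(3)
img = rng.random((64, 96, 3))
dep = rng.integers(0, 256, (64, 96))
virtual, filled = warp_by_layers(img, dep, thresholds=[64, 128, 192])
```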

Proceedings ArticleDOI
30 Jul 2015
TL;DR: The experimental results and comparisons show that the proposed view synthesis method allows smoother view reconstruction, while holes due to occlusion and 3D warping are filled with fewer artifacts.
Abstract: Existing virtual view synthesis methods generate images with many annoying artifacts, especially for forward virtual viewpoints and for virtual viewpoints generated from reference views with a large baseline, due to occlusions and the limited sampling density. In this paper, we propose a new view synthesis method, robust to the above-mentioned problems, that consists of three steps and uses stereo content. Firstly, the view plus depth data of each viewpoint is 3D warped to the virtual viewpoint. We determine which neighboring pixels should be connected or kept isolated; polygons enclosed by the connected pixels, i.e., superpixels, are interpolated. Secondly, we blend the warped images by comparing each pixel's depth value to obtain the virtual view, in which non-occlusion holes have already been interpolated by the process in the first step. Thirdly, the remaining holes are filled by inpainting. Our experimental results and comparisons show that the proposed view synthesis method allows smoother view reconstruction, while holes due to occlusion and 3D warping are filled with fewer artifacts.

Proceedings ArticleDOI
08 Jul 2015
TL;DR: A novel, fully automatic method to obtain accurate view synthesis for soccer games that solely relies on feature detection and utilizes the structures visible in a 3D light field to limit the search range of traditional view synthesis methods.
Abstract: In this paper, we propose a novel, fully automatic method to obtain accurate view synthesis for soccer games. Existing methods often make assumptions about the scene. This usually requires manual input and introduces artifacts in situations not handled by those assumptions. Our method does not make assumptions about the scene; it solely relies on feature detection and utilizes the structures visible in a 3D light field to limit the search range of traditional view synthesis methods. A visual comparison between a standard plane sweep, a depth-aware plane sweep and our method is provided, showing that our method provides more accurate results in most cases.
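
For reference, below is a compact sketch of the standard plane sweep that the paper compares against (not the proposed method): each candidate depth plane induces a homography, the neighbour view is warped into the reference view with cv2.warpPerspective, and per pixel the colour of the best-matching plane is kept. The camera calibration (K, R, t), the sign conventions for the plane-induced homography, the fronto-parallel plane normal and the absolute-difference cost are assumptions of this sketch.

```python
import cv2
import numpy as np

def plane_sweep(ref, neigh, K, R, t, depths):
    """ref, neigh: HxWx3 uint8 views; (R, t) maps reference-camera points to
    neighbour-camera points; depths: candidate fronto-parallel plane depths."""
    h, w = ref.shape[:2]
    n = np.array([0.0, 0.0, 1.0])                   # plane normal in ref frame
    best_cost = np.full((h, w), np.inf)
    synth = np.zeros_like(ref)
    for d in depths:
        # Plane-induced homography from reference pixels to neighbour pixels.
        H = K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)
        warped = cv2.warpPerspective(neigh, H, (w, h),
                                     flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        cost = np.abs(ref.astype(float) - warped.astype(float)).sum(axis=2)
        better = cost < best_cost
        best_cost[better] = cost[better]
        synth[better] = warped[better]
    return synth, best_cost
```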

Proceedings ArticleDOI
19 Apr 2015
TL;DR: This paper presents view synthesis optimization for 3D-HEVC based on a new texture smoothness process where lines of pixels are skipped based on the analysis of pixel regularity from smooth texture regions to reduce coding complexity.
Abstract: This paper presents view synthesis optimization for 3D-HEVC based on a new texture smoothness process. In the original method, all pixels are exhaustively rendered to get distortions from synthesized views. Since not all pixels of the distorted depth map cause distortion in the synthesized view, this exhaustive rendering brings unnecessary coding complexity. In this paper, lines of pixels are skipped based on the analysis of pixel regularity in smooth texture regions. This is because the distorted disparity may not have much effect on the synthesized view in smooth texture regions. The proposed method reduces the coding complexity of view synthesis optimization without significant performance loss.

Journal ArticleDOI
TL;DR: A view synthesis distortion model is first proposed to indicate the importance of each frame in the depth video; to achieve a balance between virtual view image quality and the buffer constraint, the model is incorporated into a bargaining game theoretic model to handle the frame-level bit allocation problem for Hierarchical B-pictures (HBP).

Journal ArticleDOI
TL;DR: A fast quality metric for depth maps, called fast depth quality metric (FDQM), which efficiently evaluates the impacts of depth map errors on the qualities of synthesized intermediate views in multiview video plus depth applications, without performing the actual view synthesis.
Abstract: We propose a fast quality metric for depth maps, called fast depth quality metric (FDQM), which efficiently evaluates the impacts of depth map errors on the qualities of synthesized intermediate views in multiview video plus depth applications. In other words, the proposed FDQM assesses view synthesis distortions in the depth map domain, without performing the actual view synthesis. First, we estimate the distortions at pixel positions, which are specified by reference disparities and distorted disparities, respectively. Then, we integrate those pixel-wise distortions into an FDQM score by employing a spatial pooling scheme, which considers occlusion effects and the characteristics of human visual attention. As a benchmark of depth map quality assessment, we perform a subjective evaluation test for intermediate views, which are synthesized from compressed depth maps at various bitrates. We compare the subjective results with objective metric scores. Experimental results demonstrate that the proposed FDQM yields scores highly correlated with the subjective ones. Moreover, FDQM requires at least 10 times fewer computations than conventional quality metrics, since it does not perform the actual view synthesis.
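
The "distortion in the depth map domain, without actual synthesis" idea can be sketched very compactly: for every pixel, compare the texture the renderer would fetch using the reference disparity with the texture fetched using the distorted disparity, and pool the differences. The plain mean used below stands in for FDQM's occlusion- and attention-aware pooling, and `depth_domain_distortion` is a hypothetical name.

```python
import numpy as np

def depth_domain_distortion(texture, disp_ref, disp_dist):
    """texture: HxW luminance; disp_*: HxW horizontal disparities (pixels)."""
    h, w = texture.shape
    cols = np.arange(w)[None, :]
    ref_cols = np.clip(cols + np.round(disp_ref).astype(int), 0, w - 1)
    dist_cols = np.clip(cols + np.round(disp_dist).astype(int), 0, w - 1)
    rows = np.arange(h)[:, None]
    per_pixel = np.abs(texture[rows, ref_cols].astype(float)
                       - texture[rows, dist_cols].astype(float))
    return per_pixel.mean()          # lower = less synthesis distortion

rng = np.random.default_rng(2)
tex = rng.integers(0, 256, (48, 64)).astype(float)
d_ref = np.full((48, 64), 3.0)
print(depth_domain_distortion(tex, d_ref, d_ref))          # 0.0: no depth error
print(depth_domain_distortion(tex, d_ref, d_ref + 1.0))    # > 0
```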

Journal ArticleDOI
TL;DR: This paper introduces a novel and efficient depth-based texture coding scheme that includes depth-based motion vector prediction, block-based view synthesis prediction, and adaptive luminance compensation, which were adopted in an AVC-compatible 3D video coding standard.
Abstract: The target of 3D video coding is to compress Multiview Video plus Depth (MVD) format data, which consist of a texture image and its corresponding depth map. In the MVD format, the depth map plays an important role for successful services in 3D video applications, because it enables the user to experience 3D by generating arbitrary intermediate views. The depth map has a strong correlation with its associated texture data, so it can be utilized to improve texture coding efficiency. This paper introduces a novel and efficient depth-based texture coding scheme. It includes depth-based motion vector prediction, block-based view synthesis prediction, and adaptive luminance compensation, which were adopted in an AVC-compatible 3D video coding standard. Simulation results demonstrate that the proposed scheme reduces the total coding bitrates of texture and depth by 19.06% for the coded PSNR and 17.01% for the synthesized PSNR in a P-I-P view prediction structure, respectively.

Patent
29 Jul 2015
TL;DR: In this article, the authors proposed a virtual view synthesis method based on homographic matrix partition, which consists of calibrating left and right neighboring view cameras to obtain the internal reference matrixes of the left- and right-view cameras and deriving an essential matrix from the basis matrix, performing singular value decomposition on the essential matrix, and computing the motion parameters including a rotation matrix and a translation matrix.
Abstract: The invention discloses a virtual view synthesis method based on homographic matrix partition, comprising the following steps: 1) calibrating the left and right neighboring view cameras to obtain their internal reference matrixes and the basis matrix between them, deriving an essential matrix from the basis matrix, performing singular value decomposition on the essential matrix, and computing the motion parameters, including a rotation matrix and a translation matrix, between the two cameras; 2) performing interpolation division on the rotation matrix and the translation matrix to obtain sub homographic matrixes from the left and right neighboring views to a middle virtual view; 3) applying the forward mapping technique to map the two view images to the middle virtual view through the sub homographic matrixes, taking the mapping of one of the images as the reference coordinate system, and performing interpolation fusion on the two mapped images to synthesize the middle virtual view image. The method has the advantages of high synthesis speed, a simple and effective process, and high practical engineering value.
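
A sketch of the pipeline the abstract describes, using OpenCV primitives: essential matrix from the fundamental (basis) matrix, R and t by decomposition, the motion halved to reach the middle viewpoint, plane-induced homographies to map both views there, and a simple averaged fusion. The scene-plane depth, the fronto-parallel normal, the fifty-fifty blend, the sign conventions and the choice among the ambiguous decompositions are assumptions of this sketch; a full pipeline would disambiguate R and t with a cheirality check such as cv2.recoverPose, and t is only recovered up to scale.

```python
import cv2
import numpy as np

def synthesize_middle_view(img_left, img_right, K, F, plane_depth=10.0):
    h, w = img_left.shape[:2]
    # 1) Essential matrix from the fundamental matrix, then R, t by SVD-based
    #    decomposition (one of the ambiguous solutions is taken here).
    E = K.T @ F @ K
    R1, _, t = cv2.decomposeEssentialMat(E)
    R, t = R1, t.ravel()
    # 2) Halve the motion to reach the middle virtual viewpoint.
    rvec, _ = cv2.Rodrigues(R)
    R_half, _ = cv2.Rodrigues(rvec * 0.5)
    t_half = 0.5 * t
    # 3) Plane-induced homographies to the middle view for a single scene
    #    plane, warp both views there and fuse them by simple averaging.
    n = np.array([0.0, 0.0, 1.0])
    Kinv = np.linalg.inv(K)
    H_lm = K @ (R_half - np.outer(t_half, n) / plane_depth) @ Kinv   # left  -> middle
    H_lr = K @ (R - np.outer(t, n) / plane_depth) @ Kinv             # left  -> right
    H_rm = H_lm @ np.linalg.inv(H_lr)                                # right -> middle
    warp_l = cv2.warpPerspective(img_left, H_lm, (w, h))
    warp_r = cv2.warpPerspective(img_right, H_rm, (w, h))
    return cv2.addWeighted(warp_l, 0.5, warp_r, 0.5, 0.0)
```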

Proceedings ArticleDOI
01 Oct 2015
TL;DR: This paper proposes an EPI based view rendering framework for 3D video coding solutions and identifies the major benefits of such a framework, notably in comparison with the traditional local synthesis approach.
Abstract: In current 3D video coding solutions, such as the 3D-HEVC standard, depth data is instrumental to have a continuum of views synthesized at the decoder based on a limited set of coded views. So that view synthesis may be performed at the decoder, depth data is currently directly acquired or estimated at the encoder based on very few neighboring views and transmitted to the decoder after appropriate compression. At the decoder, further views than those decoded are synthesized, again using very few neighboring decoded views, thus following a local synthesis approach. A promising alternative may consider not just a few but rather all the views available at the decoder, thus offering a scene-global approach to synthesis. One way to implement this approach involves cutting the views cube along the viewpoint direction, creating the so-called epipolar plane images (EPIs), which provide a rather compact representation of the scene. In this context, this paper proposes an EPI based view rendering framework for 3D video coding solutions and identifies the major benefits of such a framework, notably in comparison with the traditional local synthesis approach.
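
An epipolar plane image is simply a slice of the view stack: fix one image row and stack that row across all horizontally aligned views, so each scene point traces a line whose slope is its disparity. That global structure is what the proposed rendering framework exploits; the tiny sketch below (hypothetical `extract_epi`) only shows the construction.

```python
import numpy as np

def extract_epi(view_stack, row):
    """view_stack: (n_views, H, W) luminance images -> EPI of shape (n_views, W)."""
    return view_stack[:, row, :]

# A single scene point with 2-pixel disparity per view step traces a slanted
# line in the EPI.
n_views, H, W = 8, 32, 64
stack = np.zeros((n_views, H, W))
for v in range(n_views):
    stack[v, 16, 30 + 2 * v] = 1.0
epi = extract_epi(stack, row=16)
print(np.argwhere(epi == 1.0))    # column grows linearly with the view index
```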

Proceedings ArticleDOI
01 Oct 2015
TL;DR: By maintaining sharp but slightly inaccurate object contours, the resulting quality of virtual views synthesized via DIBR exceeds those synthesized using depth images compressed with edge-adaptive codecs that losslessly encode object contour as SI, in particular when the total coding rate budget is low.
Abstract: A depth image provides geometric information of a 3D scene, namely the shapes of physical objects captured from a particular viewpoint. This information is important for synthesizing images corresponding to different virtual camera viewpoints via depth-image-based rendering (DIBR). It has been shown that blurring of object contours in depth images leads to bleeding artefacts in virtual images; the most effective way to compress depth images therefore relies on edge-adaptive image codecs that preserve contours, which are losslessly coded as side information (SI). However, lossless coding of the exact object contours can be expensive. In this paper, we argue that the contours themselves can be suitably approximated to save bits, while the depth image's piecewise smooth (PWS) characteristic stays preserved. Specifically, we first propose a metric that estimates contour coding rate based on edge statistics. Given an initial rate estimate, we then pro-actively approximate object contours in a way that guarantees rate reduction when coded using arithmetic edge coding (AEC) as SI. Given the sharp but approximated contours, we finally encode the image using an edge-adaptive image codec with a graph Fourier transform (GFT) for edge preservation. We show in our experiments that by maintaining sharp but slightly inaccurate object contours, the resulting quality of virtual views synthesized via DIBR exceeds that of views synthesized using depth images compressed with edge-adaptive codecs that losslessly encode object contours as SI, in particular when the total coding rate budget is low. This confirms that optimized coding of depth images results in an effective tradeoff in the representation of contour and respective depth information.
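
To see why preserving (even approximated) contours pays off, consider a small sketch of an edge-adaptive graph Fourier transform on one depth block: a 4-connected pixel graph whose links are cut across the contour, with the transform given by the eigenvectors of the graph Laplacian. A piecewise-constant block then compacts into a couple of coefficients. The construction below is a generic illustration under those assumptions, not the codec used in the paper.

```python
import numpy as np

def gft_basis(edge_mask, size):
    """Build the GFT basis for a size x size block; edge_mask[y, x] = True
    marks pixels on the far side of the contour (links across it are cut)."""
    n = size * size
    W = np.zeros((n, n))
    idx = lambda y, x: y * size + x
    for y in range(size):
        for x in range(size):
            for dy, dx in ((0, 1), (1, 0)):          # right and down neighbours
                yy, xx = y + dy, x + dx
                if yy < size and xx < size and edge_mask[y, x] == edge_mask[yy, xx]:
                    W[idx(y, x), idx(yy, xx)] = W[idx(yy, xx), idx(y, x)] = 1.0
    L = np.diag(W.sum(axis=1)) - W                   # combinatorial Laplacian
    _, basis = np.linalg.eigh(L)
    return basis                                     # columns = GFT basis vectors

size = 8
edge_mask = np.tri(size, size, k=-1, dtype=bool)     # diagonal contour
block = np.where(edge_mask, 100.0, 20.0)              # piecewise-constant depth
coeffs = gft_basis(edge_mask, size).T @ block.ravel()
print((np.abs(coeffs) > 1e-6).sum())                  # only a couple of nonzeros
```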