
Showing papers on "View synthesis published in 2015"


Proceedings ArticleDOI
07 Jun 2015
TL;DR: A new place recognition approach is developed that combines an efficient synthesis of novel views with a compact indexable image representation and significantly outperforms other large-scale place recognition techniques on this challenging data.
Abstract: We address the problem of large-scale visual place recognition for situations where the scene undergoes a major change in appearance, for example, due to illumination (day/night), change of seasons, aging, or structural modifications over time such as buildings built or destroyed. Such situations represent a major challenge for current large-scale place recognition methods. This work has the following three principal contributions. First, we demonstrate that matching across large changes in the scene appearance becomes much easier when both the query image and the database image depict the scene from approximately the same viewpoint. Second, based on this observation, we develop a new place recognition approach that combines (i) an efficient synthesis of novel views with (ii) a compact indexable image representation. Third, we introduce a new challenging dataset of 1,125 camera-phone query images of Tokyo that contain major changes in illumination (day, sunset, night) as well as structural changes in the scene. We demonstrate that the proposed approach significantly outperforms other large-scale place recognition techniques on this challenging data.

502 citations


Proceedings Article
07 Dec 2015
TL;DR: A novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image and allows the model to capture long-term dependencies along a sequence of transformations.
Abstract: An important problem for both graphics and vision is to synthesize novel views of a 3D object from a single image. This is particularly challenging due to the partial observability inherent in projecting a 3D object onto the image space, and the ill-posedness of inferring object shape and pose. However, we can train a neural network to address the problem if we restrict our attention to specific object categories (in our case faces and chairs) for which we can gather ample training data. In this paper, we propose a novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image. The recurrent structure allows our model to capture long-term dependencies along a sequence of transformations. We demonstrate the quality of its predictions for human faces on the Multi-PIE dataset and for a dataset of 3D chair models, and also show its ability to disentangle latent factors of variation (e.g., identity and pose) without using full supervision.
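
Where the abstract describes the architecture only at a high level, a minimal sketch may help fix ideas. The sketch below assumes PyTorch; the layer sizes, the GRUCell recurrence over the latent code, and the class name `RecurrentViewSynth` are illustrative choices, not the paper's exact design. It shows an encoder that maps a single image to a latent code, a recurrent cell that advances the pose one rotation step at a time, and a decoder that renders each step.

```python
# Illustrative recurrent convolutional encoder-decoder for novel view synthesis
# (layer sizes and the GRU-based recurrence are assumptions, not the paper's design).
import torch
import torch.nn as nn

class RecurrentViewSynth(nn.Module):
    def __init__(self, latent=256):
        super().__init__()
        # Encoder: single 64x64 input image -> latent identity/pose code.
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent), nn.ReLU(),
        )
        # Recurrent cell applies one rotation step in latent space.
        self.cell = nn.GRUCell(latent, latent)
        # Decoder: latent code -> rotated view.
        self.fc = nn.Linear(latent, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, img, n_steps):
        """Render a sequence of n_steps rotated views from one 64x64 image."""
        h = self.enc(img)                    # initial latent state
        x = torch.zeros_like(h)              # constant "rotate one step" input
        outputs = []
        for _ in range(n_steps):
            h = self.cell(x, h)              # advance pose in latent space
            feat = self.fc(h).view(-1, 128, 8, 8)
            outputs.append(self.dec(feat))
        return torch.stack(outputs, dim=1)   # (batch, n_steps, 3, 64, 64)

views = RecurrentViewSynth()(torch.rand(2, 3, 64, 64), n_steps=4)
print(views.shape)  # torch.Size([2, 4, 3, 64, 64])
```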

346 citations


Posted Content
TL;DR: In this article, a deep network is trained end-to-end from a large number of posed image sets and the pixels from neighboring views of a scene are presented to the network which then directly produces the pixels of the unseen view.
Abstract: Deep networks have recently enjoyed enormous success when applied to recognition and classification problems in computer vision, but their use in graphics problems has been limited. In this work, we present a novel deep architecture that performs new view synthesis directly from pixels, trained from a large number of posed image sets. In contrast to traditional approaches which consist of multiple complex stages of processing, each of which require careful tuning and can fail in unexpected ways, our system is trained end-to-end. The pixels from neighboring views of a scene are presented to the network which then directly produces the pixels of the unseen view. The benefits of our approach include generality (we only require posed image sets and can easily apply our method to different domains), and high quality results on traditionally difficult scenes. We believe this is due to the end-to-end nature of our system which is able to plausibly generate pixels according to color, depth, and texture priors learnt automatically from the training data. To verify our method we show that it can convincingly reproduce known test views from nearby imagery. Additionally we show images rendered from novel viewpoints. To our knowledge, our work is the first to apply deep learning to the problem of new view synthesis from sets of real-world, natural imagery.

180 citations


Journal ArticleDOI
TL;DR: An improved method for tentative correspondence selection, applicable both with and without view synthesis, is introduced; a modification of the standard first-to-second nearest distance rule increases the number of correct matches by 5–20% at no additional computational cost.

158 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: A Disparity Assisted Phase based Synthesis strategy is introduced that integrates disparity information into the phase term of a reference image to warp it to nearby views, solving the disparity-inconsistency and ringing-artifact problems of existing phase-based view synthesis methods.
Abstract: We present a novel phase-based approach for reconstructing 4D light field from a micro-baseline stereo pair. Our approach takes advantage of the unique property of complex steerable pyramid filters in micro-baseline stereo. We first introduce a Disparity Assisted Phase based Synthesis (DAPS) strategy that can integrate disparity information into the phase term of a reference image to warp it to its close neighbor views. Based on the DAPS, an “analysis by synthesis” approach is proposed to warp from one of the input binocular images to the other, and iteratively optimize the disparity map to minimize the phase differences between the warped one and the ground truth input. Finally, the densely and regularly spaced, high quality light field images can be reconstructed using the proposed DAPS according to the refined disparity map. Our approach also solves the problems of disparity inconsistency and ringing artifact in available phase-based view synthesis methods. Experimental results demonstrate that our approach substantially improves both the quality of disparity map and light field, compared with the state-of-the-art stereo matching and image based rendering approaches.
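
The core mechanism, integrating disparity into the phase term, can be illustrated with a toy example. The paper operates on the localized sub-bands of a complex steerable pyramid with per-pixel disparities; the sketch below only shows the underlying principle, that a spatial shift is a phase shift of the frequency coefficients, for a 1-D signal and a constant disparity (`phase_shift_1d` is a hypothetical helper, not from the paper).

```python
import numpy as np

def phase_shift_1d(signal, disparity):
    """Shift `signal` by `disparity` samples by modifying only the phase."""
    n = signal.size
    freqs = np.fft.fftfreq(n)                        # cycles per sample
    spectrum = np.fft.fft(signal)
    shifted = spectrum * np.exp(-2j * np.pi * freqs * disparity)
    return np.fft.ifft(shifted).real

# Band-limited test signal: the phase-shifted result matches an integer roll
# exactly, and non-integer disparities give sub-pixel warps for free.
x = np.sin(2 * np.pi * 5 * np.arange(256) / 256)
print(np.allclose(phase_shift_1d(x, 4), np.roll(x, 4)))   # True
```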

99 citations


Journal ArticleDOI
TL;DR: Simulation results show that the novel algorithm has near-optimal compression efficiency with low computational complexity, so that it offers an effective encoding solution for IMVS applications.
Abstract: Several multiview video coding standards have been developed to efficiently compress images from different camera views capturing the same scene by exploiting the spatial, the temporal and the inter-view correlations. However, the compressed texture and depth data typically have many inter-view coding dependencies, which may not suit interactive multiview video streaming (IMVS) systems, where the user requests only one view at a time. In this context, this paper proposes an algorithm for the effective selection of the inter-view prediction structures (PSs) and associated texture and depth quantization parameters (QPs) for IMVS under relevant constraints. These PSs and QPs are selected such that the visual distortion is minimized, given some storage and point-to-point transmission rate constraints, and a user interaction behavior model. Simulation results show that the novel algorithm has near-optimal compression efficiency with low computational complexity, so that it offers an effective encoding solution for IMVS applications.

44 citations


Journal ArticleDOI
TL;DR: First results are provided showing that improvements in compression efficiency, depth estimation and view synthesis algorithms are required, but that the use of SMV appears realistic according to next generation compression technology requirements.
Abstract: Super Multi-View (SMV) video content is composed of tens or hundreds of views that provide a light-field representation of a scene. This representation allows a glass-free visualization and eliminates many causes of discomfort existing in current available 3D video technologies. Efficient video compression of SMV content is a key factor for enabling future 3D video services. This paper first compares several coding configurations for SMV content and several inter-view prediction structures are also tested and compared. The experiments mainly suggest that large differences in coding efficiency can be observed from one configuration to another. Several ratios for the number of coded and synthesized views are compared, both objectively and subjectively. It is reported that view synthesis significantly affects the coding scheme. The amount of views to skip highly depends on the sequence and on the quality of the associated depth maps. Reported ranges of bitrates required to obtain a good quality for the tested SMV content are realistic and coherent with future 4K/8K needs. The reliability of the PSNR metric for SMV content is also studied. Objective and subjective results show that PSNR is able to reflect increase or decrease in subjective quality even in the presence of synthesized views. However, depending on the ratio of coded and synthesized views, the order of magnitude of the effective quality variation is biased by PSNR. Results indicate that PSNR is less tolerant to view synthesis artifacts than human viewers. Finally, preliminary observations are initiated. First, the light-field conversion step does not seem to alter the objective results for compression. Secondly, the motion parallax does not seem to be impacted by specific compression artifacts. The perception of the motion parallax is only altered by variations of the typical compression artifacts along the viewing angle, in cases where the subjective image quality is already low. To the best of our knowledge, this paper is the first to carry out subjective experiments and to report results of SMV compression for light-field 3D displays. It provides first results showing that improvement of compression efficiency is required, as well as depth estimation and view synthesis algorithms improvement, but that the use of SMV appears realistic according to next generation compression technology requirements.

Highlights: Study of the impact of compression on subjective quality for lightfield SMV content. To the best of our knowledge, this paper is the first to report results of this kind. Several SMV coding configurations are compared both objectively and subjectively. Compression efficiency, depth estimation and view synthesis require improvements. SMV appears realistic according to next generation compression technology requirements.

42 citations


Journal ArticleDOI
TL;DR: By utilizing texture found in temporally adjacent frames, this work proposes to fill disocclusions in a faithful way, i.e., using texture that a real camera would observe in place of the virtual camera, to reduce the amount of artifacts introduced into the filling region.
Abstract: Disocclusion filling is a critical problem in depth-based view synthesis. Exposed regions in the target view that correspond to occluded areas in the reference view have to be filled in a meaningful way. Current approaches aim to do this in a plausible way, mostly inspired by image inpainting techniques. However, disocclusion filling is a video-based problem which offers more information than just the current frame. By utilizing texture found in temporally adjacent frames, we propose to fill disocclusions in a faithful way, i.e., using texture that a real camera would observe in place of the virtual camera. Only if faithful information is not available do we fall back to plausible filling. Our approach is designed for single-view video-plus-depth, where neighboring camera views are not available for disocclusion filling. In contrast to previous approaches, our method uses superpixels instead of square patches as filling entities to reduce the amount of artifacts introduced into the filling region. Despite its importance, faithfulness has not yet obtained due attention. Our experiments show that situations are common where simple plausible filling does not lead to satisfying results; it is therefore important to stress faithful disocclusion filling, and our current work is an attempt in this direction.
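
As a rough illustration of the "faithful first, plausible as a fallback" idea, the sketch below fills disoccluded pixels from temporally adjacent frames that are assumed to have already been warped into the virtual viewpoint, and only then falls back to a plausible fill. All names are hypothetical, and the fallback here is plain nearest-valid-pixel propagation rather than the paper's superpixel-based filling.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_disocclusions(frame, hole_mask, warped_neighbors, neighbor_masks):
    """frame: HxWx3; hole_mask: HxW bool (True = disoccluded pixel);
    warped_neighbors: temporally adjacent frames warped to this viewpoint;
    neighbor_masks: HxW bool maps of where each neighbor has valid texture."""
    out = frame.copy()
    remaining = hole_mask.copy()
    # Faithful pass: copy texture actually observed in adjacent frames.
    for neigh, valid in zip(warped_neighbors, neighbor_masks):
        usable = remaining & valid
        out[usable] = neigh[usable]
        remaining &= ~usable
    # Plausible fallback: propagate the nearest valid pixel into what is left.
    if remaining.any():
        _, inds = distance_transform_edt(remaining, return_indices=True)
        iy, ix = inds
        out[remaining] = out[iy[remaining], ix[remaining]]
    return out
```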

37 citations


Journal ArticleDOI
TL;DR: It is shown that GBR can achieve significant gains in geometry coding rate over depth-based schemes operating at similar quality and compare their respective view synthesis qualities as a function of the compactness of the geometry description.
Abstract: In this paper, we propose a new geometry representation method for multiview image sets. Our approach relies on graphs to describe the multiview geometry information in a compact and controllable way. The links of the graph connect pixels in different images and describe the proximity between pixels in 3D space. These connections are dependent on the geometry of the scene and provide the right amount of information that is necessary for coding and reconstructing multiple views. Our multiview image representation is very compact and adapts the transmitted geometry information as a function of the complexity of the prediction performed at the decoder side. To achieve this, our graph-based representation (GBR) carefully selects the amount of geometry information needed before coding. This is in contrast with depth coding, which directly compresses with losses the original geometry signal, thus making it difficult to quantify the impact of coding errors on geometry-based interpolation. We present the principles of this GBR and we build an efficient coding algorithm to represent it. We compare our GBR approach to classical depth compression methods and compare their respective view synthesis qualities as a function of the compactness of the geometry description. We show that GBR can achieve significant gains in geometry coding rate over depth-based schemes operating at similar quality. Experimental results demonstrate the potential of this new representation.
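
A deliberately simplified reading of the graph idea for a rectified stereo pair: each link connects a pixel of view 1 to the column it maps to in view 2, which is exactly the information the decoder needs to predict view 2 from view 1. The sketch below uses hypothetical helpers `build_links` and `predict_view2`; the actual GBR selects, structures and codes this information far more carefully.

```python
import numpy as np

def build_links(disparity):
    """For every pixel of view 1, the target column in view 2 (-1 if outside)."""
    h, w = disparity.shape
    cols = np.arange(w)[None, :] - np.round(disparity).astype(int)
    return np.where((cols >= 0) & (cols < w), cols, -1)

def predict_view2(view1, disparity, links):
    """Warp view 1 into view 2 by following the graph links (z-buffered)."""
    h, w = links.shape
    view2 = np.zeros_like(view1)
    best = np.full((h, w), -np.inf)            # keep the nearest (largest
    for y in range(h):                         # disparity) pixel per target
        for x in range(w):
            t = links[y, x]
            if t >= 0 and disparity[y, x] > best[y, t]:
                best[y, t] = disparity[y, x]
                view2[y, t] = view1[y, x]
    return view2, np.isfinite(best)            # True where a pixel arrived

# Fronto-parallel toy scene: constant 4-pixel disparity everywhere.
v1 = np.random.default_rng(0).random((32, 48, 3))
d = np.full((32, 48), 4.0)
v2, filled = predict_view2(v1, d, build_links(d))
```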

35 citations


Proceedings ArticleDOI
01 Sep 2015
TL;DR: An approach to inpaint holes in depth maps that appear when synthesizing virtual views from RGB-D scenes is proposed, based on a superpixel oversegmentation of both the original and synthesized views, which makes the algorithm more robust to inaccurate depth maps.
Abstract: In this paper we propose an approach to inpaint holes in depth maps that appear when synthesizing virtual views from RGB-D scenes. Based on a superpixel oversegmentation of both the original and synthesized views, the proposed approach efficiently deals with many occlusion situations where most previous approaches fail. The use of superpixels makes the algorithm more robust to inaccurate depth maps, while giving an efficient way to model the image. Extensive comparisons to relevant state-of-the-art methods show that our approach outperforms these existing approaches both qualitatively and quantitatively.

30 citations


Journal ArticleDOI
TL;DR: This paper considers the two DIBR algorithms used in the Moving Picture Experts Group view synthesis reference software, and develops a scheme for the encoder to estimate the distortion of the synthesized virtual view at the decoder when the reference texture and depth sequences experience transmission errors such as packet loss.
Abstract: Depth-image-based rendering (DIBR) is frequently used in multiview video applications such as free-viewpoint television. In this paper, we consider the two DIBR algorithms used in the Moving Picture Experts Group view synthesis reference software, and develop a scheme for the encoder to estimate the distortion of the synthesized virtual view at the decoder when the reference texture and depth sequences experience transmission errors such as packet loss. We first develop a graphical model to analyze how random errors in the reference depth image affect the synthesized virtual view. The warping competition rule adopted in the DIBR algorithms is explicitly represented by the graphical model. We then consider the case where packet loss occurs to both the encoded texture and depth images during transmission and develop a recursive optimal distribution estimation (RODE) method to calculate the per-pixel texture and depth probability distributions in each frame of the reference views. The RODE is then integrated with the graphical model method to estimate the distortion in the synthesized view caused by packet loss. Experimental results verify the accuracy of the graphical model method, the RODE, and the combined estimation scheme.

Proceedings ArticleDOI
TL;DR: A view selection method inspired by plenoptic sampling followed by transform-based view coding and view synthesis prediction to code residual views is introduced, which has an improved rate-distortion performance and preserves the structure of the perceived light fields better.
Abstract: Full parallax light field displays require high pixel density and huge amounts of data. Compression is a necessary tool used by 3D display systems to cope with the high bandwidth requirements. One of the formats adopted by MPEG for 3D video coding standards is the use of multiple views with associated depth maps. Depth maps enable the coding of a reduced number of views, and are used by compression and synthesis software to reconstruct the light field. However, most of the developed coding and synthesis tools target linearly arranged cameras with small baselines. Here we propose to use the 3D video coding format for full parallax light field coding. We introduce a view selection method inspired by plenoptic sampling followed by transform-based view coding and view synthesis prediction to code residual views. We determine the minimal requirements for view sub-sampling and present the rate-distortion performance of our proposal. We also compare our method with established video compression techniques, such as H.264/AVC, H.264/MVC, and the new 3D video coding algorithm, 3DV-ATM. Our results show that our method not only has an improved rate-distortion performance, it also preserves the structure of the perceived light fields better.

Journal ArticleDOI
TL;DR: An inference-based multiview depth image enhancement algorithm is introduced and investigated, and it is shown that this approach consistently improves the quality of virtual views by 0.2 dB to 1.6 dB, depending on the quality of the input multiview depth imagery.
Abstract: An inference-based multiview depth image enhancement algorithm is introduced and investigated in this paper. Multiview depth imagery plays a pivotal role in free-viewpoint television. This technology requires high-quality virtual view synthesis to enable viewers to move freely in a dynamic real world scene. Depth imagery of different viewpoints is used to synthesize an arbitrary number of novel views. Usually, the depth imagery is estimated individually by stereo-matching algorithms and, hence, shows inter-view inconsistency. This inconsistency affects the quality of view synthesis negatively. This paper enhances the multiview depth imagery at multiple viewpoints by probabilistic weighting of each depth pixel. First, our approach classifies the color pixels in the multiview color imagery. Second, using the resulting color clusters, we classify the corresponding depth values in the multiview depth imagery. Each clustered depth image is subject to further subclustering. Clustering based on generative models is used for assigning probabilistic weights to each depth pixel. Finally, these probabilistic weights are used to enhance the depth imagery at multiple viewpoints. Experiments show that our approach consistently improves the quality of virtual views by 0.2 dB to 1.6 dB, depending on the quality of the input multiview depth imagery.
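
A rough single-view caricature of the idea, assuming that depth values within a color cluster should agree: colors are clustered with plain k-means, and each depth value is softly pulled toward its cluster mean, with the pull strength given by a per-cluster Gaussian weight. The paper clusters across multiple views and uses generative sub-clustering; the function name `enhance_depth` and all parameters here are illustrative.

```python
import numpy as np

def enhance_depth(color, depth, k=8, iters=10, seed=0):
    h, w, _ = color.shape
    pix = color.reshape(-1, 3).astype(float)
    z = depth.reshape(-1).astype(float)
    rng = np.random.default_rng(seed)
    centers = pix[rng.choice(pix.shape[0], k, replace=False)]
    for _ in range(iters):                       # plain k-means on color
        labels = np.argmin(((pix[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pix[labels == c].mean(axis=0)
    out = z.copy()
    for c in range(k):                           # per-cluster Gaussian weight
        idx = labels == c
        if not np.any(idx):
            continue
        mu, sigma = z[idx].mean(), z[idx].std() + 1e-6
        weight = np.exp(-0.5 * ((z[idx] - mu) / sigma) ** 2)
        out[idx] = weight * z[idx] + (1 - weight) * mu   # outliers move to the mean
    return out.reshape(h, w)
```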

Proceedings ArticleDOI
30 Jul 2015
TL;DR: The need for new compression technology capable of efficient compression of sparse convergent views of Free-Viewpoint Television systems is demonstrated.
Abstract: We deal with the processing of multiview video acquired by the use of practical thus relatively simple acquisition systems that have a limited number of cameras located around a scene on independent tripods. The real-camera locations are nearly arbitrary as it would be required in the real-world Free-Viewpoint Television systems. The appropriate test video sequences are also reported. We describe a family of original extensions and adaptations of the multiview video processing algorithms adapted to arbitrary camera positions around a scene. The techniques constitute the video processing chain for Free-Viewpoint Television as they are aimed at estimating the parameters of such a multi-camera system, video correction, depth estimation and virtual view synthesis. Moreover, we demonstrate the need for new compression technology capable of efficient compression of sparse convergent views. The experimental results for processing the proposed test sequences are reported.

Proceedings ArticleDOI
10 Dec 2015
TL;DR: An algorithm to estimate the quality of the synthesized images in the absence of the corresponding reference images is presented based upon the cyclopean eye theory, showing excellent correlation results with respect to state-of-the-art full reference image and video quality metrics.
Abstract: In free-viewpoint television (FTV) framework, due to hardware and bandwidth constraints, only a limited number of viewpoints are generally captured, coded and transmitted; therefore, a large number of views needs to be synthesized at the receiver to grant a really immersive 3D experience. It is thus evident that the estimation of the quality of the synthesized views is of paramount importance. Moreover, quality assessment of the synthesized view is very challenging since the corresponding original views are generally not available either on the encoder (not captured) or the decoder side (not transmitted). To tackle the mentioned issues, this paper presents an algorithm to estimate the quality of the synthesized images in the absence of the corresponding reference images. The algorithm is based upon the cyclopean eye theory. The statistical characteristics of an estimated cyclopean image are compared with the synthesized image to measure its quality. The prediction accuracy and reliability of the proposed technique are tested on standard video dataset compressed with HEVC showing excellent correlation results with respect to state-of-the-art full reference image and video quality metrics.

Proceedings ArticleDOI
28 Dec 2015
TL;DR: A comprehensive set of experiments has been carried out to justify the robustness of the proposed scheme over existing schemes with respect to 3D-HEVC compression and view synthesis attacks.
Abstract: In this paper, a 3D video watermarking scheme is proposed for the depth image based rendering (DIBR) based multi view video plus depth (MVD) encoding technique. To make the scheme invariant to the view synthesis process in the DIBR technique, the watermark is inserted in a center view which is rendered from the left and right views of a 3D video frame. A low-pass center view, obtained from motion compensated temporal filtering over all the frames of a GOP, is used for embedding to reduce temporal flickering artifacts. To make the scheme invariant to the DIBR process, 2D DT-DWT block coefficients of the low-pass center view are used for embedding, exploiting their shift invariance and directional property. A comprehensive set of experiments has been carried out to justify the robustness of the proposed scheme over existing schemes with respect to 3D-HEVC compression and view synthesis attacks.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed novel virtual view rendering method based on DIBR can obtain high-quality virtual view images and achieve satisfactory subjective visual effects.
Abstract: DIBR is a promising technology for rendering new views of scenes from a collection of densely sampled images or videos. It has potential applications in virtual reality, immersive and advanced visualization, and 3D television systems. However, due to imperfect depth maps and the illumination difference between reference images, annoying artifacts appear in the rendered image. To generate high-quality intermediate virtual viewpoint images, this paper proposes a novel virtual view rendering method based on DIBR. The proposed method consists of four main parts: luminance compensation based on histogram matching, isolated depth pixel removal, 3D warping with depth-based pixel interpolation, and background-based hole filling. Experimental results show that our method can obtain high-quality virtual view images and achieve satisfactory subjective visual effects.
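
Of the four stages listed above, the first, luminance compensation by histogram matching, is easy to sketch: the source view's grey-level CDF is mapped onto the reference view's so the two views blend without brightness seams. This is a generic histogram-matching sketch under that reading, not the paper's exact procedure; the other three stages are not shown.

```python
import numpy as np

def match_histogram(source, reference):
    """Return `source` remapped so its histogram matches `reference`."""
    src_vals, src_counts = np.unique(source.ravel(), return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source grey level, find the reference level with the same CDF.
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return np.interp(source.ravel(), src_vals, mapped).reshape(source.shape)

rng = np.random.default_rng(1)
left = rng.integers(0, 200, (64, 64)).astype(float)      # darker view
right = left + 30                                         # brighter view
compensated = match_histogram(left, right)
print(abs(compensated.mean() - right.mean()) < 1.0)       # True
```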

Proceedings ArticleDOI
02 Nov 2015
TL;DR: A depth-aided patch based inpainting method to perform the disocclusion of holes that appear when synthesizing virtual views from RGB-D scenes is proposed, which is efficient compared to state-of-the-art approaches.
Abstract: In this paper we propose a depth-aided patch based inpainting method to perform the disocclusion of holes that appear when synthesizing virtual views from RGB-D scenes. Depth information is added to each key step of the classical patch-based algorithm from [Criminisi et al. 2004] to guide the synthesis of missing structures and textures. These contributions result in a new inpainting method which is efficient compared to state-of-the-art approaches (both in visual quality and computational burden), while requiring only a single easy-to-adjust additional parameter.

Journal ArticleDOI
TL;DR: Simulation results show the good performance of the novel algorithms compared to a baseline algorithm, showing that an effective adaptive IMVS solution should consider the scene content as well as the clients' capabilities and navigation preferences.

Proceedings ArticleDOI
30 Jul 2015
TL;DR: This paper proposes an improved DASH-based IMVS scheme over wireless networks that allows virtual views to be generated at either the cloud-based server or the client, and can adaptively select the optimal approach based on the network condition and the cost of the cloud.
Abstract: Interactive multiview video streaming (IMVS) allows viewers to periodically switch viewpoint. Its user experience can be further enhanced by creating virtual views from neighboring coded views using view synthesis techniques. Dynamic adaptive streaming over HTTP (DASH) is a new standard that can adjust the quality of video streaming according to the network condition. In this paper, we propose an improved DASH-based IMVS scheme over wireless networks. The main contributions are twofold. First, our scheme allows virtual views to be generated at either the cloud-based server or the client, and can adaptively select the optimal approach based on the network condition and the cost of the cloud. Second, scalable video coding is used in our system. Simulations with the NS3 tool demonstrate the advantage of our proposed scheme over the existing approach with client-based view synthesis and single-layer video coding.

Journal ArticleDOI
TL;DR: A novel view synthesis algorithm for three-dimensional video based on segmentation using multi-level thresholding method that achieves an average PSNR gain of 0.98 dB for the multi-view test sequences and improves the subjective quality of the synthesized views.
Abstract: In this paper, we present a novel view synthesis algorithm for three-dimensional video. The proposed algorithm is based on segmentation using a multi-level thresholding method. Recently, numerous techniques have been suggested which use a 2-D color image and the per-pixel depth map of the scene to create virtual views of the scene from any viewing position. However, inaccuracies in the depth maps cause annoying visual artifacts in depth-based view synthesis. In the proposed method, the depth maps are first preprocessed to avoid the errors caused by wrong depth values. Then, the color images are segmented according to the depth values and the regions belonging to different segments are warped independently. To further enhance the quality of the synthesized views, a multi-level thresholding based ghost removal algorithm and a novel hole filling algorithm have been proposed. Experimental results show that the proposed methods achieve an average PSNR gain of 0.98 dB for the multi-view test sequences and also improve the subjective quality of the synthesized views.
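
To make the segmentation-then-warp idea concrete, a toy sketch follows: depth is split into a few layers by thresholds, and each layer is shifted by its own disparity, back to front, so nearer layers overwrite farther ones. The thresholds, the linear depth-to-disparity mapping and the use of np.roll (which wraps around instead of discarding out-of-frame pixels) are simplifying assumptions, and the paper's preprocessing, ghost removal and hole-filling stages are omitted.

```python
import numpy as np

def warp_by_layers(color, depth, thresholds, max_disp=8):
    """Assumes larger depth-map values are nearer (8-bit convention), so a
    higher layer index gets a larger disparity."""
    h, w, _ = color.shape
    layers = np.digitize(depth, thresholds)          # 0 .. len(thresholds)
    n_layers = len(thresholds) + 1
    synth = np.zeros_like(color)
    filled = np.zeros((h, w), dtype=bool)
    for lab in range(n_layers):                      # back to front
        disp = int(round(max_disp * lab / (n_layers - 1)))
        mask = layers == lab
        shifted_mask = np.roll(mask, disp, axis=1)
        shifted_color = np.roll(color, disp, axis=1)
        synth[shifted_mask] = shifted_color[shifted_mask]
        filled |= shifted_mask
    return synth, filled                             # ~filled marks holes to fill

rng = np.random.default_rng(3)
img = rng.random((64, 96, 3))
dep = rng.integers(0, 256, (64, 96))
virtual, filled = warp_by_layers(img, dep, thresholds=[64, 128, 192])
```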

Proceedings ArticleDOI
30 Jul 2015
TL;DR: The experimental results and comparisons show that the proposed view synthesis method allows smoother view reconstruction, while holes due to occlusion and 3D warping are filled with fewer artifacts.
Abstract: Existing virtual view synthesis methods generate images with many annoying artifacts, especially for forward virtual viewpoints and for virtual viewpoints generated from reference views with a large baseline, due to occlusions and the limited sampling density. In this paper, we propose a new view synthesis method, robust to the above-mentioned problems, that consists of three steps and uses stereo content. Firstly, the view plus depth data of each viewpoint is 3D warped to the virtual viewpoint. We determine which neighboring pixels should be connected or kept isolated; polygons enclosed by the connected pixels, i.e., superpixels, are interpolated. Secondly, we blend the warped images by comparing each pixel's depth value to obtain the virtual view, in which non-occlusion holes have already been interpolated by the process in the first step. Thirdly, the remaining holes are filled by inpainting. Our experimental results and comparisons show that the proposed view synthesis method allows smoother view reconstruction, while holes due to occlusion and 3D warping are filled with fewer artifacts.

Proceedings ArticleDOI
08 Jul 2015
TL;DR: A novel, fully automatic method to obtain accurate view synthesis for soccer games that solely relies on feature detection and utilizes the structures visible in a 3D light field to limit the search range of traditional view synthesis methods.
Abstract: In this paper, we propose a novel, fully automatic method to obtain accurate view synthesis for soccer games. Existing methods often make assumptions about the scene. This usually requires manual input and introduces artifacts in situations not handled by those assumptions. Our method does not make assumptions about the scene; it solely relies on feature detection and utilizes the structures visible in a 3D light field to limit the search range of traditional view synthesis methods. A visual comparison between a standard plane sweep, a depth-aware plane sweep and our method is provided, showing that our method provides more accurate results in most cases.
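
For reference, below is a compact sketch of the standard plane sweep that the paper compares against (not the proposed method): each candidate depth plane induces a homography, the neighbour view is warped into the reference view with cv2.warpPerspective, and per pixel the colour of the best-matching plane is kept. The camera calibration (K, R, t), the sign conventions for the plane-induced homography, the fronto-parallel plane normal and the absolute-difference cost are assumptions of this sketch.

```python
import cv2
import numpy as np

def plane_sweep(ref, neigh, K, R, t, depths):
    """ref, neigh: HxWx3 uint8 views; (R, t) maps reference-camera points to
    neighbour-camera points; depths: candidate fronto-parallel plane depths."""
    h, w = ref.shape[:2]
    n = np.array([0.0, 0.0, 1.0])                   # plane normal in ref frame
    best_cost = np.full((h, w), np.inf)
    synth = np.zeros_like(ref)
    for d in depths:
        # Plane-induced homography from reference pixels to neighbour pixels.
        H = K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)
        warped = cv2.warpPerspective(neigh, H, (w, h),
                                     flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        cost = np.abs(ref.astype(float) - warped.astype(float)).sum(axis=2)
        better = cost < best_cost
        best_cost[better] = cost[better]
        synth[better] = warped[better]
    return synth, best_cost
```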

Proceedings ArticleDOI
19 Apr 2015
TL;DR: This paper presents view synthesis optimization for 3D-HEVC based on a new texture smoothness process where lines of pixels are skipped based on the analysis of pixel regularity from smooth texture regions to reduce coding complexity.
Abstract: This paper presents view synthesis optimization for 3D-HEVC based on a new texture smoothness process. In the original method, all pixels are exhaustively rendered to get distortions from synthesized views. Since not all pixels of the distorted depth map cause distortion in the synthesized view, this exhaustive rendering brings unnecessary coding complexity. In this paper, lines of pixels are skipped based on the analysis of pixel regularity in smooth texture regions. This is because the distorted disparity may not have much effect on the synthesized view in smooth texture regions. The proposed method reduces the coding complexity of view synthesis optimization without significant performance loss.

Journal ArticleDOI
TL;DR: A view synthesis distortion model is first proposed to indicate the importance of each frame in the depth video; to achieve a balance between virtual view image quality and the buffer constraint, the model is incorporated into a bargaining game theoretic model to handle the frame-level bit allocation problem for Hierarchical B-pictures (HBP).

Journal ArticleDOI
TL;DR: A fast quality metric for depth maps, called fast depth quality metric (FDQM), which efficiently evaluates the impacts of depth map errors on the qualities of synthesized intermediate views in multiview video plus depth applications, without performing the actual view synthesis.
Abstract: We propose a fast quality metric for depth maps, called fast depth quality metric (FDQM), which efficiently evaluates the impacts of depth map errors on the qualities of synthesized intermediate views in multiview video plus depth applications. In other words, the proposed FDQM assesses view synthesis distortions in the depth map domain, without performing the actual view synthesis. First, we estimate the distortions at pixel positions, which are specified by reference disparities and distorted disparities, respectively. Then, we integrate those pixel-wise distortions into an FDQM score by employing a spatial pooling scheme, which considers occlusion effects and the characteristics of human visual attention. As a benchmark of depth map quality assessment, we perform a subjective evaluation test for intermediate views, which are synthesized from compressed depth maps at various bitrates. We compare the subjective results with objective metric scores. Experimental results demonstrate that the proposed FDQM yields scores highly correlated with the subjective ones. Moreover, FDQM requires at least 10 times fewer computations than conventional quality metrics, since it does not perform the actual view synthesis.
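
The "distortion in the depth map domain, without actual synthesis" idea can be sketched very compactly: for every pixel, compare the texture the renderer would fetch using the reference disparity with the texture fetched using the distorted disparity, and pool the differences. The plain mean used below stands in for FDQM's occlusion- and attention-aware pooling, and `depth_domain_distortion` is a hypothetical name.

```python
import numpy as np

def depth_domain_distortion(texture, disp_ref, disp_dist):
    """texture: HxW luminance; disp_*: HxW horizontal disparities (pixels)."""
    h, w = texture.shape
    cols = np.arange(w)[None, :]
    ref_cols = np.clip(cols + np.round(disp_ref).astype(int), 0, w - 1)
    dist_cols = np.clip(cols + np.round(disp_dist).astype(int), 0, w - 1)
    rows = np.arange(h)[:, None]
    per_pixel = np.abs(texture[rows, ref_cols].astype(float)
                       - texture[rows, dist_cols].astype(float))
    return per_pixel.mean()          # lower = less synthesis distortion

rng = np.random.default_rng(2)
tex = rng.integers(0, 256, (48, 64)).astype(float)
d_ref = np.full((48, 64), 3.0)
print(depth_domain_distortion(tex, d_ref, d_ref))          # 0.0: no depth error
print(depth_domain_distortion(tex, d_ref, d_ref + 1.0))    # > 0
```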

Journal ArticleDOI
TL;DR: This paper introduces a novel and efficient depth-based texture coding scheme that includes depth-based motion vector prediction, block-based view synthesis prediction, and adaptive luminance compensation, which were adopted in an AVC-compatible 3D video coding standard.
Abstract: The target of 3D video coding is to compress Multiview Video plus Depth (MVD) format data, which consist of a texture image and its corresponding depth map. In the MVD format, the depth map plays an important role for successful services in 3D video applications, because it enables the user to experience 3D by generating arbitrary intermediate views. The depth map has a strong correlation with its associated texture data, so it can be utilized to improve texture coding efficiency. This paper introduces a novel and efficient depth-based texture coding scheme. It includes depth-based motion vector prediction, block-based view synthesis prediction, and adaptive luminance compensation, which were adopted in an AVC-compatible 3D video coding standard. Simulation results demonstrate that the proposed scheme reduces the total coding bitrates of texture and depth by 19.06% for the coded PSNR and 17.01% for the synthesized PSNR in a P-I-P view prediction structure, respectively.

Patent
29 Jul 2015
TL;DR: In this article, the authors proposed a virtual view synthesis method based on homographic matrix partition, which consists of calibrating left and right neighboring view cameras to obtain the internal reference matrixes of the left- and right-view cameras and deriving an essential matrix from the basis matrix, performing singular value decomposition on the essential matrix, and computing the motion parameters including a rotation matrix and a translation matrix.
Abstract: The invention discloses a virtual view synthesis method based on homographic matrix partition, comprising the following steps: 1) calibrating the left and right neighboring view cameras to obtain their internal reference matrixes and the basis matrix between them, deriving an essential matrix from the basis matrix, performing singular value decomposition on the essential matrix, and computing the motion parameters, including a rotation matrix and a translation matrix, between the two cameras; 2) performing interpolation division on the rotation matrix and the translation matrix to obtain sub homographic matrixes from the left and right neighboring views to a middle virtual view; 3) applying the forward mapping technique to map the two view images to the middle virtual view through the sub homographic matrixes, taking the mapping of one of the images as the reference coordinate system, and performing interpolation fusion on the two mapped images to synthesize the middle virtual view image. The method has the advantages of high synthesis speed, a simple and effective process, and high practical engineering value.
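
A sketch of the pipeline the abstract describes, using OpenCV primitives: essential matrix from the fundamental (basis) matrix, R and t by decomposition, the motion halved to reach the middle viewpoint, plane-induced homographies to map both views there, and a simple averaged fusion. The scene-plane depth, the fronto-parallel normal, the fifty-fifty blend, the sign conventions and the choice among the ambiguous decompositions are assumptions of this sketch; a full pipeline would disambiguate R and t with a cheirality check such as cv2.recoverPose, and t is only recovered up to scale.

```python
import cv2
import numpy as np

def synthesize_middle_view(img_left, img_right, K, F, plane_depth=10.0):
    h, w = img_left.shape[:2]
    # 1) Essential matrix from the fundamental matrix, then R, t by SVD-based
    #    decomposition (one of the ambiguous solutions is taken here).
    E = K.T @ F @ K
    R1, _, t = cv2.decomposeEssentialMat(E)
    R, t = R1, t.ravel()
    # 2) Halve the motion to reach the middle virtual viewpoint.
    rvec, _ = cv2.Rodrigues(R)
    R_half, _ = cv2.Rodrigues(rvec * 0.5)
    t_half = 0.5 * t
    # 3) Plane-induced homographies to the middle view for a single scene
    #    plane, warp both views there and fuse them by simple averaging.
    n = np.array([0.0, 0.0, 1.0])
    Kinv = np.linalg.inv(K)
    H_lm = K @ (R_half - np.outer(t_half, n) / plane_depth) @ Kinv   # left  -> middle
    H_lr = K @ (R - np.outer(t, n) / plane_depth) @ Kinv             # left  -> right
    H_rm = H_lm @ np.linalg.inv(H_lr)                                # right -> middle
    warp_l = cv2.warpPerspective(img_left, H_lm, (w, h))
    warp_r = cv2.warpPerspective(img_right, H_rm, (w, h))
    return cv2.addWeighted(warp_l, 0.5, warp_r, 0.5, 0.0)
```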

Proceedings ArticleDOI
01 Oct 2015
TL;DR: This paper proposes an EPI based view rendering framework for 3D video coding solutions and identifies the major benefits of such a framework, notably in comparison with the traditional local synthesis approach.
Abstract: In current 3D video coding solutions, such as the 3D-HEVC standard, depth data is instrumental to have a continuum of views synthesized at the decoder based on a limited set of coded views. So that view synthesis may be performed at the decoder, depth data is currently directly acquired or estimated at the encoder based on very few neighboring views and transmitted to the decoder after appropriate compression. At the decoder, further views than those decoded are synthesized, again using very few neighboring decoded views, thus following a local synthesis approach. A promising alternative may consider not just a few but rather all the views available at the decoder, thus offering a scene-global approach to synthesis. One way to implement this approach involves cutting the views cube along the viewpoint direction, creating the so-called epipolar plane images (EPIs), which provide a rather compact representation of the scene. In this context, this paper proposes an EPI based view rendering framework for 3D video coding solutions and identifies the major benefits of such a framework, notably in comparison with the traditional local synthesis approach.
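
An epipolar plane image is simply a slice of the view stack: fix one image row and stack that row across all horizontally aligned views, so each scene point traces a line whose slope is its disparity. That global structure is what the proposed rendering framework exploits; the tiny sketch below (hypothetical `extract_epi`) only shows the construction.

```python
import numpy as np

def extract_epi(view_stack, row):
    """view_stack: (n_views, H, W) luminance images -> EPI of shape (n_views, W)."""
    return view_stack[:, row, :]

# A single scene point with 2-pixel disparity per view step traces a slanted
# line in the EPI.
n_views, H, W = 8, 32, 64
stack = np.zeros((n_views, H, W))
for v in range(n_views):
    stack[v, 16, 30 + 2 * v] = 1.0
epi = extract_epi(stack, row=16)
print(np.argwhere(epi == 1.0))    # column grows linearly with the view index
```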

Proceedings ArticleDOI
01 Oct 2015
TL;DR: By maintaining sharp but slightly inaccurate object contours, the resulting quality of virtual views synthesized via DIBR exceeds those synthesized using depth images compressed with edge-adaptive codecs that losslessly encode object contour as SI, in particular when the total coding rate budget is low.
Abstract: A depth image provides geometric information of a 3D scene, namely the shapes of physical objects captured from a particular viewpoint. This information is important for synthesizing images corresponding to different virtual camera viewpoints via depth-image-based rendering (DIBR). It has been shown that blurring of object contours in depth images leads to bleeding artefacts in virtual images; the most effective way to compress depth images therefore relies on edge-adaptive image codecs that preserve contours, which are losslessly coded as side information (SI). However, lossless coding of the exact object contours can be expensive. In this paper, we argue that the contours themselves can be suitably approximated to save bits, while the depth image's piecewise smooth (PWS) characteristic stays preserved. Specifically, we first propose a metric that estimates contour coding rate based on edge statistics. Given an initial rate estimate, we then pro-actively approximate object contours in a way that guarantees rate reduction when coded using arithmetic edge coding (AEC) as SI. Given the sharp but approximated contours, we finally encode the image using an edge-adaptive image codec with a graph Fourier transform (GFT) for edge preservation. We show in our experiments that by maintaining sharp but slightly inaccurate object contours, the resulting quality of virtual views synthesized via DIBR exceeds that of views synthesized using depth images compressed with edge-adaptive codecs that losslessly encode object contours as SI, in particular when the total coding rate budget is low. This confirms that optimized coding of depth images results in an effective tradeoff in the representation of contour and respective depth information.
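
To see why preserving (even approximated) contours pays off, consider a small sketch of an edge-adaptive graph Fourier transform on one depth block: a 4-connected pixel graph whose links are cut across the contour, with the transform given by the eigenvectors of the graph Laplacian. A piecewise-constant block then compacts into a couple of coefficients. The construction below is a generic illustration under those assumptions, not the codec used in the paper.

```python
import numpy as np

def gft_basis(edge_mask, size):
    """Build the GFT basis for a size x size block; edge_mask[y, x] = True
    marks pixels on the far side of the contour (links across it are cut)."""
    n = size * size
    W = np.zeros((n, n))
    idx = lambda y, x: y * size + x
    for y in range(size):
        for x in range(size):
            for dy, dx in ((0, 1), (1, 0)):          # right and down neighbours
                yy, xx = y + dy, x + dx
                if yy < size and xx < size and edge_mask[y, x] == edge_mask[yy, xx]:
                    W[idx(y, x), idx(yy, xx)] = W[idx(yy, xx), idx(y, x)] = 1.0
    L = np.diag(W.sum(axis=1)) - W                   # combinatorial Laplacian
    _, basis = np.linalg.eigh(L)
    return basis                                     # columns = GFT basis vectors

size = 8
edge_mask = np.tri(size, size, k=-1, dtype=bool)     # diagonal contour
block = np.where(edge_mask, 100.0, 20.0)              # piecewise-constant depth
coeffs = gft_basis(edge_mask, size).T @ block.ravel()
print((np.abs(coeffs) > 1e-6).sum())                  # only a couple of nonzeros
```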