Author

Yannick Morvan

Other affiliations: Philips
Bio: Yannick Morvan is an academic researcher from Eindhoven University of Technology. The author has contributed to research in topics: Rendering (computer graphics) & Image compression. The author has an h-index of 13 and has co-authored 34 publications receiving 1060 citations. Previous affiliations of Yannick Morvan include Philips.

Papers
Journal ArticleDOI
TL;DR: The results show that, although its rate-distortion (R-D) performance is worse, platelet-based depth coding outperforms H.264 due to improved sharp-edge preservation.
Abstract: This article investigates the interaction between different techniques for depth compression and view-synthesis rendering with multiview video plus scene depth data. Two different approaches for depth coding are compared, namely H.264/MVC, using temporal and inter-view reference images for efficient prediction, and the novel platelet-based coding algorithm, characterized by being adapted to the special characteristics of depth images. Since depth images are a 2D representation of the 3D scene geometry, depth-image errors lead to geometry distortions. Therefore, the influence of geometry distortions resulting from coding artifacts is evaluated for both coding approaches in two different ways. First, the variation of 3D surface meshes is analyzed using the Hausdorff distance; second, the distortion is evaluated for 2D view-synthesis rendering, where color and depth information are used together to render virtual intermediate camera views of the scene. The results show that, although its rate-distortion (R-D) performance is worse, platelet-based depth coding outperforms H.264 due to improved sharp-edge preservation. Therefore, depth coding needs to be evaluated with respect to geometry distortions.
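
The Hausdorff-distance evaluation lends itself to a compact illustration. Below is a minimal sketch (not the authors' tooling) of the symmetric Hausdorff distance between two sets of mesh vertices, the measure used here to quantify how depth-coding artifacts distort the reconstructed 3D surface; the brute-force pairwise computation and the synthetic point sets are simplifying assumptions.

```python
import numpy as np

def directed_hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Max over points in a of the distance to the nearest point in b."""
    # Pairwise Euclidean distances between the (N, 3) and (M, 3) point sets.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).max()

def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two vertex sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Stand-in vertices: a mesh from the original depth map and a perturbed
# version mimicking coding artifacts.
original = np.random.rand(500, 3)
decoded = original + 0.01 * np.random.randn(500, 3)
print(f"Hausdorff distance: {hausdorff(original, decoded):.4f}")
```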

287 citations

Proceedings ArticleDOI
12 Nov 2007
TL;DR: A novel depth-image coding algorithm that concentrates on the special characteristics of depth images (smooth regions delineated by sharp edges) improves the quality of compressed depth images by 1.5-4 dB compared to a JPEG-2000 encoder.
Abstract: This paper presents a novel depth-image coding algorithm that concentrates on the special characteristics of depth images: smooth regions delineated by sharp edges. The algorithm models the smooth regions using piecewise-linear functions and the sharp edges using straight lines. To define the area of support for each modeling function, we employ a quadtree decomposition that divides the image into blocks of variable size, each block being approximated by one modeling function containing one or two surfaces. The subdivision of the quadtree and the selection of the type of modeling function are optimized such that a global rate-distortion trade-off is realized. Additionally, we present a predictive coding scheme that improves the coding performance of the quadtree decomposition by exploiting the correlation between the blocks of the quadtree. Experimental results show that the described technique improves the resulting quality of compressed depth images by 1.5-4 dB when compared to a JPEG-2000 encoder.
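
To make the rate-distortion-optimized quadtree concrete, here is a minimal sketch, under assumed rate and distortion models, of the subdivision decision: each block is approximated by one least-squares plane, and a block is split into four quadrants only when the children's total Lagrangian cost J = D + λR beats the leaf's. The per-leaf bit cost, λ, and the plane-only model (the paper's two-surface, edge-aware functions are omitted) are illustrative stand-ins.

```python
import numpy as np

LAMBDA = 50.0
BITS_PER_LEAF = 40.0   # assumed cost of coding one modeling function
MIN_SIZE = 4

def fit_plane(block: np.ndarray) -> np.ndarray:
    """Least-squares fit of z = a*x + b*y + c over the block."""
    h, w = block.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeff, *_ = np.linalg.lstsq(A, block.ravel(), rcond=None)
    return (A @ coeff).reshape(h, w)

def encode(block: np.ndarray) -> tuple[float, list]:
    """Return (Lagrangian cost, leaf approximations) for the best quadtree."""
    approx = fit_plane(block)
    leaf_cost = np.sum((block - approx) ** 2) + LAMBDA * BITS_PER_LEAF
    h, w = block.shape
    if h <= MIN_SIZE or w <= MIN_SIZE:
        return leaf_cost, [approx]
    # Cost when splitting into four quadrants, each optimized recursively.
    quads = [block[:h//2, :w//2], block[:h//2, w//2:],
             block[h//2:, :w//2], block[h//2:, w//2:]]
    split = [encode(q) for q in quads]
    split_cost = sum(c for c, _ in split)
    if leaf_cost <= split_cost:
        return leaf_cost, [approx]
    return split_cost, [leaf for _, leaves in split for leaf in leaves]

# Toy depth map: a ramp with a sharp depth discontinuity.
depth = np.fromfunction(lambda y, x: 0.5 * x + (x > 32) * 40.0, (64, 64))
cost, leaves = encode(depth)
print(f"cost={cost:.1f}, leaves={len(leaves)}")
```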

140 citations

Proceedings ArticleDOI
TL;DR: An algorithm that models depth maps using piecewise-linear functions (platelets) adapts to varying scene detail and can improve picture quality after depth-map compression by 1-3 dB compared to a JPEG-2000 encoder.
Abstract: Emerging 3-D displays show several views of the scene simultaneously. A direct transmission of a selection of these views is impractical, because various types of displays support different numbers of views and the decoder has to interpolate the intermediate views. The transmission of multiview image information can be simplified by transmitting only the texture data for the central view and a corresponding depth map. In addition to the coding of the texture data, this technique requires the efficient coding of depth maps. Since the depth map represents the scene geometry and thereby determines the 3-D perception of the scene, sharp edges, which correspond to object boundaries, should be preserved. We propose an algorithm that models depth maps using piecewise-linear functions (platelets). To adapt to varying scene detail, we employ a quadtree decomposition that divides the image into blocks of variable size, each block being approximated by one platelet. In order to preserve sharp object boundaries, the support area of each platelet is adapted to the object boundary. The subdivision of the quadtree and the selection of the platelet type are optimized such that a global rate-distortion trade-off is realized. Experimental results show that the described method can improve the resulting picture quality after compression of depth maps by 1-3 dB when compared to a JPEG-2000 encoder.
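
As an illustration of adapting a platelet's support to an object boundary, the sketch below fits a single block with a straight edge separating two regions, each approximated by its mean depth (a constant, the simplest of the piecewise-linear models). The exhaustive search over edges through pairs of border pixels is a toy stand-in, not the paper's encoder.

```python
import itertools
import numpy as np

def border_pixels(n: int):
    """All pixel positions on the border of an n x n block."""
    for i in range(n):
        yield (0, i)
        yield (n - 1, i)
        yield (i, 0)
        yield (i, n - 1)

def fit_wedge(block: np.ndarray):
    """Best straight-edge split of the block into two constant regions."""
    n = block.shape[0]
    ys, xs = np.mgrid[0:n, 0:n]
    best = (np.inf, None)
    for (y0, x0), (y1, x1) in itertools.combinations(set(border_pixels(n)), 2):
        # The sign of the cross product decides which side of the edge
        # each pixel lies on.
        side = (xs - x0) * (y1 - y0) - (ys - y0) * (x1 - x0) > 0
        if side.all() or (~side).all():
            continue  # degenerate split, block not divided
        approx = np.where(side, block[side].mean(), block[~side].mean())
        sse = np.sum((block - approx) ** 2)
        if sse < best[0]:
            best = (sse, approx)
    return best

# Toy block: two depth levels separated by a diagonal object boundary.
block = np.fromfunction(lambda y, x: np.where(x > y, 80.0, 20.0), (8, 8))
sse, approx = fit_wedge(block)
print(f"SSE of best wedge: {sse:.2f}")
```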

127 citations

DOI
01 Jan 2009
TL;DR: A new multi-view depth-estimation technique is proposed, employing a one-dimensional optimization strategy that reduces the noise level in the estimated depth images and enforces consistent depth images across the views; the accompanying rendering technique is suitable for execution on a standard graphics processing unit (GPU).
Abstract: Three-dimensional (3D) video and imaging technologies are an emerging trend in the development of digital video systems, as we presently witness the appearance of 3D displays, coding systems, and 3D camera setups. Three-dimensional multi-view video is typically obtained from a set of synchronized cameras, which capture the same scene from different viewpoints. This technique notably enables applications such as free-viewpoint video or 3D-TV. Free-viewpoint video applications provide the feature to interactively select and render a virtual viewpoint of the scene. A 3D experience such as, for example, in 3D-TV is obtained if the data representation and display make it possible to distinguish the relief of the scene, i.e., the depth within the scene. With 3D-TV, the depth of the scene can be perceived using a multi-view display that simultaneously renders several views of the same scene. To render these multiple views on a remote display, efficient transmission, and thus compression, of the multi-view video is necessary. However, a major problem when dealing with multi-view video is the intrinsically large amount of data to be compressed, decompressed and rendered. We aim at an efficient and flexible multi-view video system, and explore three different aspects. First, we develop an algorithm for acquiring a depth signal from a multi-view setup. Second, we present efficient 3D rendering algorithms for a multi-view signal. Third, we propose coding techniques for 3D multi-view signals, based on the use of an explicit depth signal. The thesis is accordingly divided into three parts.

The first part (Chapter 3) addresses the problem of 3D multi-view video acquisition. Multi-view video acquisition refers to the task of estimating and recording a 3D geometric description of the scene. A 3D description of the scene can be represented by a so-called depth image, which can be estimated by triangulation of the corresponding pixels in the multiple views. Initially, we focus on the problem of depth estimation using two views, and present the basic geometric model that enables the triangulation of corresponding pixels across the views. Next, we review two calculation/optimization strategies for determining corresponding pixels: a local and a one-dimensional optimization strategy. Then, to generalize from the two-view case, we introduce a simple geometric model for estimating the depth using multiple views simultaneously. Based on this geometric model, we propose a new multi-view depth-estimation technique, employing a one-dimensional optimization strategy that (1) reduces the noise level in the estimated depth images and (2) enforces consistent depth images across the views.

The second part (Chapter 4) details the problem of multi-view image rendering. Multi-view image rendering refers to the process of generating synthetic images using multiple views. Two different rendering techniques are initially explored: a 3D image-warping and a mesh-based rendering technique. Each of these methods has its limitations and suffers from either high computational complexity or low image-rendering quality. As a consequence, we present two image-based rendering algorithms that improve the balance between these issues. First, we derive an alternative formulation of the relief-texture algorithm, extended to the geometry of multiple views. The proposed technique features two advantages: it avoids rendering artifacts ("holes") in the synthetic image and it is suitable for execution on a standard graphics processing unit (GPU). Second, we propose an inverse-mapping rendering technique that allows a simple and accurate re-sampling of synthetic pixels. Experimental comparisons with 3D image warping show an improvement of rendering quality of 3.8 dB for the relief-texture mapping and 3.0 dB for the inverse-mapping rendering technique.

The third part concentrates on the compression problem of multi-view texture and depth video (Chapters 5-7). In Chapter 5, we extend the standard H.264/MPEG-4 AVC video-compression algorithm to handle the compression of multi-view video. As opposed to the Multi-view Video Coding (MVC) standard, which encodes only the multi-view texture data, the proposed encoder performs the compression of both the texture and the depth multi-view sequences. The proposed extension is based on exploiting the correlation between the multiple camera views. To this end, two different approaches for predictive coding of views have been investigated: a block-based disparity-compensated prediction technique and a View Synthesis Prediction (VSP) scheme. Whereas VSP relies on an accurate depth image, the block-based disparity-compensated prediction scheme can be performed without any geometry information. Our encoder adaptively selects the most appropriate prediction scheme using a rate-distortion criterion for an optimal prediction-mode selection. We present experimental results for several texture and depth multi-view sequences, yielding a quality improvement of up to 0.6 dB for the texture and 3.2 dB for the depth, when compared to solely performing H.264/MPEG-4 AVC disparity-compensated prediction. Additionally, we discuss the trade-off between random access to a user-selected view and coding efficiency, and provide experimental results illustrating and quantifying this trade-off.

In Chapter 6, we focus on the compression of a depth signal. We present a novel depth-image coding algorithm that concentrates on the special characteristics of depth images: smooth regions delineated by sharp edges. The algorithm models the smooth regions using parameterized piecewise-linear functions and the sharp edges by straight lines, so that it is more efficient than a conventional transform-based encoder. To optimize the quality of the coding system for a given bit rate, a special global rate-distortion optimization balances the rate against the accuracy of the signal representation. For typical bit rates, i.e., between 0.01 and 0.25 bit/pixel, experiments have revealed that the coder outperforms a standard JPEG-2000 encoder by 0.6-3.0 dB. Preliminary results were published in the Proceedings of the 26th Symposium on Information Theory in the Benelux.

In Chapter 7, we propose a novel joint depth-texture bit-allocation algorithm for the joint compression of texture and depth images. The described algorithm combines the depth and texture rate-distortion (R-D) curves to obtain a single R-D surface that allows the optimization of the joint bit allocation in relation to the obtained rendering quality. Experimental results show an estimated gain of 1 dB compared to a compression performed without joint bit-allocation optimization. Moreover, our joint R-D model can be readily integrated into a multi-view H.264/MPEG-4 AVC coder because it yields the optimal compression setting with a limited computation effort.
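
The joint bit-allocation idea of Chapter 7 can be sketched compactly: measure rendering distortion on a grid of (texture rate, depth rate) operating points, treat the grid as an R-D surface, and pick the admissible point under a total rate budget. The rate grid, the synthetic distortion surface and the exhaustive search below are illustrative assumptions, not the thesis' model.

```python
import numpy as np

texture_rates = np.array([0.05, 0.10, 0.20, 0.40])   # bit/pixel
depth_rates = np.array([0.01, 0.05, 0.10, 0.25])     # bit/pixel

# distortion[i, j]: rendered-view distortion (e.g. MSE) when encoding
# texture at rate i and depth at rate j -- in practice measured by
# encoding and rendering; here a synthetic stand-in surface.
distortion = (1.0 / texture_rates[:, None]) + (0.5 / depth_rates[None, :])

def allocate(budget: float):
    """Exhaustive search over the R-D surface under a total rate budget."""
    best = (np.inf, None)
    for i, rt in enumerate(texture_rates):
        for j, rd in enumerate(depth_rates):
            if rt + rd <= budget and distortion[i, j] < best[0]:
                best = (distortion[i, j], (rt, rd))
    return best

d, (rt, rd) = allocate(budget=0.35)
print(f"texture={rt} bpp, depth={rd} bpp, distortion={d:.2f}")
```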

83 citations

Proceedings ArticleDOI
28 May 2008
TL;DR: The results of this evaluation show that the coding algorithm specialized in the characteristics of depth images outperforms H.264 intra-coding, although its R-D performance is worse.
Abstract: This paper presents a comparative study of different techniques for depth-image compression and their implications for the quality of multiview video-plus-depth virtual-view rendering. A novel coding algorithm for depth images, which concentrates on their special characteristics, namely smooth regions delineated by sharp edges, is compared to H.264 intra-coding of depth images. These two coding techniques are evaluated in the context of multiview video-plus-depth representations, where depth information is used to render virtual intermediate camera views of the scene. It is therefore important to evaluate the influence of depth-image coding artifacts on the quality of the rendered virtual views. The results of this evaluation show that the coding algorithm specialized in the characteristics of depth images outperforms H.264 intra-coding, although its R-D performance is worse.
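
A minimal sketch of the evaluation methodology: rather than scoring the decoded depth map itself, the virtual view rendered with the decoded depth is compared, in PSNR, against a reference rendering. The rendering step is abstracted away here, and the stand-in images only illustrate the metric.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# rendered_ref: view synthesized with the original depth map;
# rendered_test: the same view synthesized with the decoded depth map.
# Random stand-in images illustrate the call.
rng = np.random.default_rng(1)
rendered_ref = rng.integers(0, 256, (288, 352))
rendered_test = np.clip(rendered_ref + rng.normal(0, 3, (288, 352)), 0, 255)
print(f"rendered-view PSNR: {psnr(rendered_ref, rendered_test):.2f} dB")
```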

78 citations


Cited by
TL;DR: This special issue aims at gathering recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis, and especially encourages papers addressing real-world computer vision and multimedia applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that some classes contain a large amount of training data while many classes contain only a small amount. How to use frequent classes to help learn rare classes, for which training data is harder to collect, is therefore an open question. Learning with shared information is an emerging topic in machine learning, computer vision and multimedia analysis. Different levels of components can be shared during the concept-modeling and machine-learning stages, such as generic object parts, attributes, transformations, regularization parameters and training examples. Regarding specific methods, multi-task learning, transfer learning and deep learning can be seen as different strategies to share information. These learning-with-shared-information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged. Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approaches for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g., sharing generic object parts, attributes, transformations, regularization parameters and training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head-pose estimation, object tracking, location-based services, semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem
• Survey papers regarding the topic of learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations

Journal ArticleDOI
01 Apr 2011
TL;DR: This paper describes efficient coding methods for video and depth data, and presents synthesis methods for view generation that mitigate errors from depth estimation and coding.
Abstract: Current 3-D video (3DV) technology is based on stereo systems. These systems use stereo video coding for pictures delivered by two input cameras. Typically, such stereo systems only reproduce these two camera views at the receiver, and stereoscopic displays for multiple viewers require wearing special 3-D glasses. On the other hand, emerging autostereoscopic multiview displays emit a large number of views to enable 3-D viewing for multiple users without requiring 3-D glasses. For representing a large number of views, a multiview extension of stereo video coding is used, typically requiring a bit rate that is proportional to the number of views. However, since the quality improvement of multiview displays will be governed by an increase of emitted views, a format is needed that allows the generation of arbitrary numbers of views with the transmission bit rate being constant. Such a format is the combination of video signals and associated depth maps. The depth maps provide disparities associated with every sample of the video signal that can be used to render arbitrary numbers of additional views via view synthesis. This paper describes efficient coding methods for video and depth data. For the generation of views, synthesis methods are presented, which mitigate errors from depth estimation and coding.
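
The role of the depth map in view synthesis can be sketched as follows: for rectified cameras, each depth sample Z maps to a disparity d = f·B/Z, and an intermediate view at fraction α of the baseline is obtained by shifting pixels by α·d. The camera parameters below, the rounding, and the absence of hole filling and occlusion-ordered writes are simplifying assumptions.

```python
import numpy as np

def synthesize_view(texture, depth, f, baseline, alpha):
    """Forward-warp `texture` using per-pixel depth to a virtual view."""
    h, w = depth.shape
    disparity = f * baseline / depth             # pixels, per sample
    shift = np.round(alpha * disparity).astype(int)
    out = np.zeros_like(texture)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = xs + shift
    valid = (xt >= 0) & (xt < w)
    # Forward mapping; write order is arbitrary, so occlusions and holes
    # are not handled here.
    out[ys[valid], xt[valid]] = texture[valid]
    return out

# Toy scene: a horizontal gradient texture with one near object.
texture = np.tile(np.arange(64, dtype=np.uint8) * 4, (48, 1))
depth = np.full((48, 64), 100.0)
depth[16:32, 24:40] = 50.0                       # near object
virtual = synthesize_view(texture, depth, f=200.0, baseline=1.0, alpha=0.5)
```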

420 citations

Journal ArticleDOI
TL;DR: This review provides critical insights into the recent development of texture analysis for quantifying heterogeneity in PET/CT images, identifies issues and challenges, and offers a list of correct formulae for commonly used features together with recommendations regarding implementation.
Abstract: After seminal papers over the period 2009-2011, the use of texture analysis of PET/CT images for quantification of intratumour uptake heterogeneity has received increasing attention in the last 4 years. Results are difficult to compare due to the heterogeneity of studies and lack of standardization. There are also numerous challenges to address. In this review we provide critical insights into the recent development of texture analysis for quantifying the heterogeneity in PET/CT images, identify issues and challenges, and offer recommendations for the use of texture analysis in clinical research. Numerous potentially confounding issues have been identified, related to the complex workflow for the calculation of textural features, and the dependency of features on various factors such as acquisition, image reconstruction, preprocessing, functional volume segmentation, and methods of establishing and quantifying correspondences with genomic and clinical metrics of interest. A lack of understanding of what the features may represent in terms of the underlying pathophysiological processes and the variability of technical implementation practices makes comparing results in the literature challenging, if not impossible. Since progress as a field requires pooling results, there is an urgent need for standardization and recommendations/guidelines to enable the field to move forward. We provide a list of correct formulae for usual features and recommendations regarding implementation. Studies on larger cohorts with robust statistical analysis and machine learning approaches are promising directions to evaluate the potential of this approach.
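
To give a flavor of the "usual features" whose formulae the review standardizes, the sketch below computes one textural feature, GLCM entropy, from a gray-level co-occurrence matrix for a single offset. The quantization depth, the offset, and the stand-in uptake map are assumptions for illustration only, not the review's recommended workflow.

```python
import numpy as np

def glcm(image: np.ndarray, levels: int = 8, dx: int = 1, dy: int = 0):
    """Normalized co-occurrence matrix for one non-negative pixel offset."""
    q = np.minimum((image / image.max() * levels).astype(int), levels - 1)
    a = q[: q.shape[0] - dy, : q.shape[1] - dx]   # reference pixels
    b = q[dy:, dx:]                               # neighbors at (dy, dx)
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)       # count level pairs
    return m / m.sum()

def glcm_entropy(p: np.ndarray) -> float:
    """Entropy of the co-occurrence distribution, in bits."""
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

rng = np.random.default_rng(3)
uptake = rng.gamma(2.0, 2.0, (64, 64))            # stand-in uptake map
print(f"GLCM entropy: {glcm_entropy(glcm(uptake)):.3f} bits")
```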

379 citations

Journal ArticleDOI
TL;DR: This paper describes an extension of the High Efficiency Video Coding (HEVC) standard for coding of multi-view video and depth data, and develops and integrates a novel encoder control that guarantees that high-quality intermediate views can be generated based on the decoded data.
Abstract: This paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data. In addition to the known concept of disparity-compensated prediction, inter-view motion parameter prediction and inter-view residual prediction for coding of the dependent video views are developed and integrated. Furthermore, for depth coding, new intra coding modes, a modified motion compensation and motion vector coding, as well as the concept of motion parameter inheritance, are part of the HEVC extension. A novel encoder control uses view synthesis optimization, which guarantees that high-quality intermediate views can be generated based on the decoded data. The bitstream format supports the extraction of partial bitstreams, so that conventional 2D video, stereo video, and the full multi-view video plus depth format can be decoded from a single bitstream. Objective and subjective results are presented, demonstrating that the proposed approach provides 50% bit rate savings in comparison with HEVC simulcast and 20% in comparison with a straightforward multi-view extension of HEVC without the newly developed coding tools.

365 citations

Patent
25 Sep 2013
TL;DR: A method for pixel-wise joint filtering of depth maps projected from a plurality of viewing angles is described; it suppresses noise in the depth-map data and provides improved performance for view synthesis.
Abstract: There is disclosed a method, an apparatus, a server, a client and a non-transitory computer-readable medium comprising a computer program stored therein, for video coding and decoding. Depth pictures from a plurality of viewing angles are projected into a single viewing angle, making it possible to apply pixel-wise joint filtering to all projected depth values. This approach suppresses noise in the depth-map data and provides improved performance for view synthesis.
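
A minimal sketch of the pixel-wise joint filtering described in the abstract: once depth maps from several viewing angles have been projected into a common view (the projection step is abstracted away here), a per-pixel median over the stack suppresses estimation noise. The use of NaN for uncovered pixels and the choice of the median as the filter are assumptions for illustration.

```python
import numpy as np

def joint_filter(projected_depths: np.ndarray) -> np.ndarray:
    """Pixel-wise median over (n_views, H, W); NaN marks uncovered pixels."""
    return np.nanmedian(projected_depths, axis=0)

rng = np.random.default_rng(2)
clean = np.full((32, 32), 100.0)
stack = clean + rng.normal(0, 5, (4, 32, 32))     # four noisy projections
stack[0, :8, :8] = np.nan                          # a region one view misses
fused = joint_filter(stack)
print(f"noise std before: {np.nanstd(stack - clean):.2f}, "
      f"after: {np.std(fused - clean):.2f}")
```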

354 citations