Author

Yunwen He

Bio: Yunwen He is an academic researcher from InterDigital, Inc. The author has contributed to research in the topics Filter bank and Filter (signal processing), has an h-index of 1, and has co-authored 1 publication receiving 16 citations.

Papers
Journal ArticleDOI
TL;DR: The design of ACT is described from several points of view, from theoretical analysis to implementation details, and experimental results demonstrate its significant coding gains in the HEVC SCC Extensions.
Abstract: The Screen Content Coding (SCC) Extensions of High Efficiency Video Coding (HEVC) employ the in-loop adaptive color-space transform (ACT) technique to exploit the inter-color-component redundancy, i.e., statistical redundancy among different color components. In ACT, the prediction residual signal is adaptively converted into a different color space, i.e., YCgCo. A rate-distortion criterion is employed to decide whether to code the residual signal in the original color space or in the YCgCo color space. The inter-color-component correlation is typically reduced when ACT is enabled. The residual signal, after possible color-space conversion, is then coded following the existing HEVC framework, i.e., transform (if necessary), quantization, and entropy coding. This paper describes the design of ACT from several points of view, from theoretical analysis to implementation details. Experimental results are also provided to demonstrate the significant coding gains of ACT in the HEVC SCC Extensions.
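
As an illustration of the idea described above, the sketch below applies a forward/inverse YCgCo conversion to an RGB prediction residual and picks the color space with the lower rate-distortion cost. This is a minimal, hedged sketch and not the HEVC SCC reference design; the `code_fn` stand-in for the residual coding path and the `lambda_rd` value are assumptions for illustration.

```python
# Illustrative sketch only: forward/inverse YCgCo residual conversion and a
# rate-distortion style choice between color spaces, loosely mirroring the
# ACT idea described above. Not the HEVC SCC reference implementation;
# lambda_rd and the cost model are assumptions for illustration.
import numpy as np

def rgb_to_ycgco(res_rgb):
    """Forward color-space transform applied to an RGB residual block (H, W, 3)."""
    r, g, b = res_rgb[..., 0], res_rgb[..., 1], res_rgb[..., 2]
    y  =  r / 4 + g / 2 + b / 4
    cg = -r / 4 + g / 2 - b / 4
    co =  r / 2         - b / 2
    return np.stack([y, cg, co], axis=-1)

def ycgco_to_rgb(res_ycgco):
    """Inverse transform back to the original RGB residual space."""
    y, cg, co = res_ycgco[..., 0], res_ycgco[..., 1], res_ycgco[..., 2]
    g = y + cg
    r = y - cg + co
    b = y - cg - co
    return np.stack([r, g, b], axis=-1)

def choose_color_space(res_rgb, code_fn, lambda_rd=0.5):
    """Pick the residual color space with the lower D + lambda * R cost.

    code_fn is a stand-in for the residual coding path (transform,
    quantization, entropy coding); it must return (distortion, bits).
    """
    d0, r0 = code_fn(res_rgb)
    d1, r1 = code_fn(rgb_to_ycgco(res_rgb))
    cost0, cost1 = d0 + lambda_rd * r0, d1 + lambda_rd * r1
    return ("original", cost0) if cost0 <= cost1 else ("YCgCo", cost1)
```

A caller would supply its own distortion/rate estimator as `code_fn`, so the same decision logic can sit on top of any residual coding back end.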

18 citations

Journal ArticleDOI
01 Mar 2022-Optik
TL;DR: In this paper, a new fusion framework based on the Quaternion Non-Subsampled Contourlet Transform (QNSCT) and guided-filter detail enhancement is designed to address the problems of inconspicuous infrared targets and poor background texture in infrared and visible image fusion.
Abstract: Image fusion is the process of fusing multiple images of the same scene to obtain a single, more informative image for human visual perception. In this paper, a new fusion framework based on the Quaternion Non-Subsampled Contourlet Transform (QNSCT) and guided-filter detail enhancement is designed to address the problems of inconspicuous infrared targets and poor background texture in infrared and visible image fusion. The proposed method uses the quaternion wavelet transform for the first time in place of the traditional Non-Subsampled Pyramid Filter Bank structure in the Non-Subsampled Contourlet Transform (NSCT). The flexible multi-resolution of the quaternion wavelet and the multi-directionality of the NSCT are fully utilized to refine the multi-scale decomposition scheme. On the other hand, the coefficient matrices obtained from the proposed QNSCT algorithm are fused using a weight-refinement algorithm based on the guided filter. The fusion scheme is divided into four steps; in the first, the infrared and visible images are decomposed into multi-directional and multi-scale coefficient matrices using QNSCT. The experimental results show that the proposed algorithm not only extracts important visual information from the source images, but also better preserves the texture information in the scene. Meanwhile, the scheme outperforms state-of-the-art methods in both subjective and objective evaluations.
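
To make the guided-filter weight-refinement step more concrete, the sketch below fuses an infrared and a visible image with a binary saliency weight map that is then smoothed by a guided filter. It is only an illustrative sketch: the QNSCT decomposition is not reproduced, and the saliency measure, filter radius, and `eps` are assumed values rather than the paper's settings.

```python
# Minimal sketch of guided-filter weight refinement for two-image fusion,
# in the spirit of the scheme above. The QNSCT decomposition itself is not
# reproduced here; the saliency measure and filter parameters are assumptions.
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=8, eps=1e-3):
    """Edge-preserving smoothing of src steered by guide (box-filter guided filter)."""
    box = lambda x: uniform_filter(x, size=2 * radius + 1)
    mean_g, mean_s = box(guide), box(src)
    cov_gs = box(guide * src) - mean_g * mean_s
    var_g = box(guide * guide) - mean_g ** 2
    a = cov_gs / (var_g + eps)
    b = mean_s - a * mean_g
    return box(a) * guide + box(b)

def fuse(ir, vis, radius=8, eps=1e-3):
    """Fuse infrared and visible images (float arrays) with refined weights."""
    # Initial weight: pick the source with the larger local detail energy.
    saliency_ir = uniform_filter(np.abs(ir - uniform_filter(ir, 9)), 9)
    saliency_vis = uniform_filter(np.abs(vis - uniform_filter(vis, 9)), 9)
    w = (saliency_ir >= saliency_vis).astype(np.float64)
    # Refine the binary map so the weights follow object boundaries in the guide.
    w = np.clip(guided_filter(ir, w, radius, eps), 0.0, 1.0)
    return w * ir + (1.0 - w) * vis
```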

4 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, screen content, which is often computer-generated, has many characteristics distinctly different from conventional camera-captured natural scene content, and such characteristic differences impose major...
Abstract: Screen content, which is often computer-generated, has many characteristics distinctly different from conventional camera-captured natural scene content. Such characteristic differences impose major...

54 citations

Journal ArticleDOI
TL;DR: This paper describes the screen content support and the five main low-level screen content coding tools in VVC: transform skip residual coding (TSRC), block-based differential pulse-code modulation (BDPCM), intra block copy (IBC), adaptive color transform (ACT), and the palette mode.
Abstract: In an increasingly connected world, consumer video experiences have diversified away from traditional broadcast video into new applications with increased use of non-camera-captured content such as computer screen desktop recordings or animations created by computer rendering, collectively referred to as screen content. There has also been increased use of graphics and character content that is rendered and mixed or overlaid together with camera-generated content. The emerging Versatile Video Coding (VVC) standard, in its first version, addresses this market change by the specification of low-level coding tools suitable for screen content. This is in contrast to its predecessor, the High Efficiency Video Coding (HEVC) standard, where highly efficient screen content support is only available in extension profiles of its version 4. This paper describes the screen content support and the five main low-level screen content coding tools in VVC: transform skip residual coding (TSRC), block-based differential pulse-code modulation (BDPCM), intra block copy (IBC), adaptive color transform (ACT), and the palette mode. The specification of these coding tools in the first version of VVC enables the VVC reference software implementation (VTM) to achieve average bit-rate savings of about 41% to 61% relative to the HEVC test model (HM) reference software implementation using the Main 10 profile for 4:2:0 screen content test sequences. Compared to the HM using the Screen-Extended Main 10 profile and the same 4:2:0 test sequences, the VTM provides about 19% to 25% bit-rate savings. The same comparison with 4:4:4 test sequences revealed bit-rate savings of about 13% to 27% for $Y'C_{B}C_{R}$ and of about 6% to 14% for $R'G'B'$ screen content. Relative to the HM without the HEVC version 4 screen content coding extensions, the bit-rate savings for 4:4:4 test sequences are about 33% to 64% for $Y'C_{B}C_{R}$ and 43% to 66% for $R'G'B'$ screen content.
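
Of the five tools, the palette mode is perhaps the easiest to illustrate. The toy sketch below represents a screen-content block by a small palette, an index map, and explicitly coded escape samples; the palette-size limit and the exact-match rule are simplifications for illustration and do not follow the VVC normative process.

```python
# Toy sketch of the palette-mode idea for a screen-content block: represent
# a block by a small palette plus an index map, with rare colors signalled
# as escape samples. Simplified; not the VVC normative palette derivation.
import numpy as np
from collections import Counter

def palette_code(block, max_palette=8):
    """Return (palette, index_map, escapes) for an integer-valued block (H, W)."""
    counts = Counter(block.ravel().tolist())
    palette = [v for v, _ in counts.most_common(max_palette)]
    lut = {v: i for i, v in enumerate(palette)}
    index_map = np.full(block.shape, -1, dtype=np.int32)   # -1 marks an escape sample
    escapes = []
    for (y, x), v in np.ndenumerate(block):
        if v in lut:
            index_map[y, x] = lut[v]
        else:
            escapes.append((y, x, int(v)))                  # coded explicitly
    return palette, index_map, escapes

def palette_decode(palette, index_map, escapes):
    """Rebuild the block from the palette, the index map, and the escape samples."""
    out = np.array(palette, dtype=np.int32)[np.maximum(index_map, 0)]
    for y, x, v in escapes:
        out[y, x] = v
    return out
```

Because screen-content blocks often contain only a handful of distinct colors, the index map plus a short palette is usually far cheaper to code than transform coefficients, which is the intuition behind the tool.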

30 citations

Journal ArticleDOI
TL;DR: A novel intra prediction algorithm, called in-loop residual coding with scalar quantization, improves the coding performance of screen content for the emerging Versatile Video Coding (VVC) standard by employing in-block pixels as reference rather than the regular out-block ones.
Abstract: A novel intra prediction algorithm is proposed to improve the coding performance of screen content for the emerging Versatile Video Coding (VVC) standard. The algorithm, called in-loop residual coding with scalar quantization, employs in-block pixels as reference rather than the regular out-block ones. To this end, an additional in-loop residual signal is used to partially reconstruct the block at the pixel level during prediction. The proposed algorithm is essentially designed to target highly detailed textures, where a deep block-partitioning structure is required. Therefore, it is implemented to operate only on $4\times 4$ blocks, where further block splitting is not allowed and the standard algorithm is still unable to properly predict the texture. Experiments in the Joint Exploration Model (JEM) reference software show that the proposed algorithm brings a Bjontegaard Delta (BD)-rate gain of 13% on synthetic content, with a negligible computational complexity overhead at both the encoder and decoder sides.
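
A hedged toy of in-block, pixel-level prediction with a scalar-quantized in-loop residual on a $4\times 4$ block is sketched below. The neighbor-averaging predictor and the uniform quantizer step `qstep` are illustrative assumptions, not the exact design proposed in the paper.

```python
# Hedged toy of in-block, pixel-level prediction with a scalar-quantized
# in-loop residual on a 4x4 block. The averaging predictor and the uniform
# quantizer are assumptions for illustration only.
import numpy as np

def code_4x4_inloop(block, top_ref, left_ref, qstep=8):
    """block: (4, 4) original samples; top_ref: 4 samples above; left_ref: 4 samples to the left."""
    rec = np.zeros((4, 4), dtype=np.int32)
    levels = np.zeros((4, 4), dtype=np.int32)              # quantized residual to transmit
    for y in range(4):
        for x in range(4):
            above = rec[y - 1, x] if y > 0 else top_ref[x]
            left = rec[y, x - 1] if x > 0 else left_ref[y]
            pred = (int(above) + int(left) + 1) // 2       # in-block reference where available
            resid = int(block[y, x]) - pred
            levels[y, x] = int(np.round(resid / qstep))    # scalar quantization
            rec[y, x] = pred + levels[y, x] * qstep        # in-loop reconstruction feeds later pixels
    return levels, rec
```

The key point the sketch tries to capture is that each reconstructed pixel immediately becomes a reference for the next one inside the same block, which is what lets the method track fine synthetic textures that out-block references miss.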

14 citations

Proceedings ArticleDOI
01 Sep 2019
TL;DR: An intra coding algorithm with layer separation is proposed; it is designed on top of a tool adopted in VVC, called block DPCM (BDPCM), and benefits from texture information in a neighborhood to derive the intensity levels of the background and foreground layers.
Abstract: An intra coding algorithm with layer separation is proposed. This algorithm is designed on top of a tool adopted in VVC, called block DPCM (BDPCM), and benefits from texture information in a neighborhood to derive the intensity levels of the background and foreground layers. This information is used to reduce the large residual rate that arises when BDPCM predicts the layer incorrectly. For this purpose, three inter-layer transition states are defined that are conveyed to the decoder either implicitly or explicitly. Once a transition is signaled, the decoder corrects the prediction value using the derived layer information. Experiments on screen content show a BD-rate gain of about 10% over the VVC Test Model (VTM) and of about 1% over regular BDPCM, at the cost of increased computational complexity.
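
For context, the sketch below shows the block-based DPCM residual differencing that the layer-separation scheme builds on, in its vertical form: quantized residuals are transmitted as row-to-row differences, and the decoder accumulates them back. The layer derivation and the three transition states are not modelled, and the uniform quantizer step `qstep` is an assumption.

```python
# Minimal sketch of vertical block-based DPCM (BDPCM) residual differencing.
# Layer separation and transition-state signalling are not modelled here.
import numpy as np

def bdpcm_vertical_encode(block, top_ref, qstep=8):
    """block: (H, W) samples; top_ref: (W,) reference row above the block."""
    pred = np.tile(top_ref, (block.shape[0], 1))            # vertical intra prediction
    q = np.round((block - pred) / qstep).astype(np.int32)   # quantized residual
    diff = q.copy()
    diff[1:, :] -= q[:-1, :]                                # send row-to-row differences
    return diff

def bdpcm_vertical_decode(diff, top_ref, qstep=8):
    q = np.cumsum(diff, axis=0)                             # undo the differencing
    pred = np.tile(top_ref, (diff.shape[0], 1))
    return pred + q * qstep
```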

12 citations

Journal ArticleDOI
TL;DR: Experimental results verify that the proposed general DNN architecture achieves higher prediction accuracy and a lower computational load than conventional DNN architectures and is well suited to multi-view videos with small correlations and large disparities.
Abstract: Classical video prediction methods exploit the intra-frame, inter-frame, and multi-view similarities within video sequences directly and shallowly; the proposed video prediction methods instead indirectly and intensively transform the frame correlations into nonlinear mappings by using a general deep neural network (DNN) with a single output node. Traditional DNN-based video prediction algorithms forecast the next frame as a whole and coarsely, whereas the proposed algorithms predict each pixel of the future frame separately and precisely in order to achieve high prediction accuracy and low computation cost. First, general DNN-based prediction algorithms for intra-frame coding, inter-frame coding, and multi-view coding are presented. Then, a general DNN-based prediction algorithm for unified video coding is proposed, which builds on the preceding three prediction algorithms. Simulation experiments show that the proposed methods outperform the state-of-the-art High Efficiency Video Coding (HEVC) in peak signal-to-noise ratio (PSNR) and bits per pixel (BPP) for low-bitrate transmission. Experimental results also verify that the proposed general DNN architecture achieves higher prediction accuracy and a lower computational load than conventional DNN architectures, and further show that the proposed methods are well suited to multi-view videos with small correlations and large disparities.
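
The "single output node" idea can be illustrated with a small fully connected network that predicts one pixel of the next frame from a local patch of the previous frame, as in the hedged sketch below. The patch size, layer widths, and the absence of any training loop are assumptions for illustration and do not reflect the paper's exact architecture.

```python
# Illustrative sketch of per-pixel prediction with a single-output DNN:
# every pixel of the next frame is predicted from its co-located patch in
# the previous frame. Architecture details are assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelPredictor(nn.Module):
    def __init__(self, patch=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(patch * patch, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                      # single output node: one pixel
        )

    def forward(self, patches):                        # patches: (N, patch*patch)
        return self.net(patches)

def predict_next_frame(model, prev_frame, patch=5):
    """Predict every pixel of the next frame from its co-located patch in prev_frame (H, W)."""
    pad = patch // 2
    padded = F.pad(prev_frame[None, None], (pad, pad, pad, pad), mode="replicate")
    windows = padded.unfold(2, patch, 1).unfold(3, patch, 1)   # (1, 1, H, W, patch, patch)
    h, w = prev_frame.shape
    flat = windows.reshape(h * w, patch * patch)
    with torch.no_grad():
        return model(flat).reshape(h, w)

# Example use: pred = predict_next_frame(PixelPredictor(), torch.rand(64, 64))
```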

10 citations