Author

Shanshe Wang

Bio: Shanshe Wang is an academic researcher at Peking University. The author has contributed to research in the topics of computer science and coding. The author has an h-index of 18 and has co-authored 136 publications receiving 1,188 citations. Previous affiliations of Shanshe Wang include City University of Hong Kong and Harbin Institute of Technology.

Papers published on a yearly basis

Papers
Journal ArticleDOI
TL;DR: The evolution and development of neural network-based compression methodologies are introduced for images and video, respectively, and the joint compression of semantic and visual information is tentatively explored to formulate a high-efficiency signal representation structure for both human vision and machine vision.
Abstract: In recent years, image and video coding technologies have advanced by leaps and bounds. However, due to the popularization of image and video acquisition devices, the growth rate of image and video data far exceeds the improvement in compression ratio. In particular, it has been widely recognized that pursuing further coding performance improvement within the traditional hybrid coding framework is increasingly challenging. Deep convolutional neural networks, which have driven the resurgence of neural networks in recent years and achieved great success in both the artificial intelligence and signal processing fields, also provide a novel and promising solution for image and video compression. In this paper, we provide a systematic, comprehensive, and up-to-date review of neural network-based image and video compression techniques. The evolution and development of neural network-based compression methodologies are introduced for images and video, respectively. More specifically, the cutting-edge video coding techniques that leverage deep learning within the HEVC framework are presented and discussed, which substantially advance the state-of-the-art video coding performance. Moreover, end-to-end image and video coding frameworks based on neural networks are also reviewed, revealing interesting explorations toward next-generation image and video coding frameworks/standards. The most significant research works on image and video coding using neural networks are highlighted, and future trends are envisioned. In particular, the joint compression of semantic and visual information is tentatively explored to formulate a high-efficiency signal representation structure for both human vision and machine vision, which are the two dominant signal receptors in the age of artificial intelligence.

235 citations
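
For readers unfamiliar with the end-to-end coding frameworks the survey reviews, the following is a minimal sketch of a learned image codec of that flavor: an analysis transform, a quantization stand-in, a synthesis transform, and a rate-distortion training loss. The layer widths, the uniform-noise quantization proxy, and the rate term are illustrative assumptions, not the architecture of any specific method discussed in the paper.

# Minimal sketch of an end-to-end learned image codec of the kind the survey
# reviews: analysis transform -> quantization -> synthesis transform, trained
# with a rate-distortion loss. Layer widths and the rate proxy are illustrative
# assumptions, not the design of any particular method cited in the paper.
import torch
import torch.nn as nn

class TinyImageCodec(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        # Analysis transform: downsample the image into a compact latent.
        self.encode = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
        # Synthesis transform: reconstruct the image from the latent.
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, x):
        y = self.encode(x)
        # Additive uniform noise as a differentiable stand-in for quantization.
        y_hat = y + torch.rand_like(y) - 0.5 if self.training else torch.round(y)
        x_hat = self.decode(y_hat)
        return x_hat, y_hat

def rd_loss(x, x_hat, y_hat, lam=0.01):
    distortion = torch.mean((x - x_hat) ** 2)   # D: reconstruction error
    rate_proxy = torch.mean(torch.abs(y_hat))   # R: crude sparsity proxy for bit cost
    return distortion + lam * rate_proxy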

Journal ArticleDOI
TL;DR: This paper quantitatively analyzes the structure of the proposed CNN model from multiple dimensions to make the model interpretable and optimal for CNN-based in-loop filtering in High Efficiency Video Coding (HEVC).
Abstract: Recently, the convolutional neural network (CNN) has attracted tremendous attention and achieved great success in many image processing tasks. In this paper, we focus on combining CNN technology with image restoration to improve video coding performance, and we propose content-aware CNN-based in-loop filtering for High Efficiency Video Coding (HEVC). In particular, we quantitatively analyze the structure of the proposed CNN model from multiple dimensions to make the model interpretable and optimal for CNN-based loop filtering. More specifically, each coding tree unit (CTU) is treated as an independent region for processing, so that the proposed content-aware multi-model filtering mechanism restores different regions with different CNN models under the guidance of a discriminative network. To adapt to the image content, the discriminative neural network is trained to analyze the content characteristics of each region and adaptively select the deep learning model. CTU-level control is also enabled in the sense of rate-distortion optimization. To learn the CNN model, an iterative training method is proposed that simultaneously labels filter categories at the CTU level and fine-tunes the CNN model parameters. The CNN-based in-loop filter is placed after sample adaptive offset in HEVC, and extensive experiments show that the proposed approach significantly improves coding performance, achieving up to 10.0% bit-rate reduction. On average, 4.1%, 6.0%, 4.7%, and 6.0% bit-rate reductions are obtained under the all-intra, low-delay, low-delay-P, and random-access configurations, respectively.

171 citations
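
The per-CTU, content-aware selection among multiple filtering CNNs described above can be sketched roughly as follows; the 64x64 CTU size, the single-channel luma tensors, and the simple distortion-plus-flag-cost decision are illustrative assumptions rather than the paper's exact HEVC integration.

# Sketch of per-CTU, content-aware filter selection in the spirit of the paper:
# a discriminative network picks one restoration CNN per coding tree unit, and
# an RD-style check can turn the filter off at the CTU level. CTU size, model
# count, and the flag cost below are illustrative assumptions.
import torch

def filter_frame(recon, orig, candidate_cnns, selector, ctu=64, flag_cost=1e-4):
    """Encoder-side pass: recon/orig are (1, 1, H, W) tensors, candidate_cnns is
    a list of restoration nn.Modules, selector returns logits over the models."""
    out = recon.clone()
    _, _, H, W = recon.shape
    for y0 in range(0, H, ctu):
        for x0 in range(0, W, ctu):
            rec = recon[:, :, y0:y0 + ctu, x0:x0 + ctu]
            ref = orig[:, :, y0:y0 + ctu, x0:x0 + ctu]
            # The discriminative network selects the restoration model for this CTU.
            idx = selector(rec).argmax(dim=1).item()
            filtered = candidate_cnns[idx](rec)
            # CTU-level on/off in an RD sense: apply the filter only if it lowers
            # distortion by more than the (constant) cost of signalling the flag.
            d_off = torch.mean((rec - ref) ** 2)
            d_on = torch.mean((filtered - ref) ** 2)
            if d_on + flag_cost < d_off:
                out[:, :, y0:y0 + ctu, x0:x0 + ctu] = filtered
    return out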

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed Rate-GOP-based rate control achieves much better R-D performance than two state-of-the-art rate control schemes for HEVC.
Abstract: In this paper, a Rate-GOP-based frame-level rate control scheme is proposed for High Efficiency Video Coding (HEVC). The proposed scheme is developed in consideration of the new coding tools adopted in HEVC, including the quad-tree coding structure and the new reference frame selection mechanism, the reference picture set (RPS). The contributions of this paper mainly include the following three aspects. First, an RPS-based hierarchical rate control structure is designed to maintain high video quality for the key frames. Second, inter-frame-dependency-based distortion and bit-rate models are proposed, considering the dependency between a coding frame and its reference frame; thus, the distortion and bit rate of the coding frame can be represented by those of its reference frame. Accordingly, the Rate-GOP-based distortion and rate models can be derived from the inter-frame-dependency-based distortion and bit-rate models. Third, based on these models and a mixed Laplacian distribution of the residual information, a new ρ-domain Rate-GOP-based rate control is proposed. Experimental results demonstrate that the proposed Rate-GOP-based rate control achieves much better R-D performance. Compared with two state-of-the-art rate control schemes for HEVC, the coding gain in BD-PSNR is 0.87 dB and 0.13 dB on average, respectively, across all testing configurations. In particular, for the random access low complexity testing configuration, the BD-PSNR gain reaches 1.30 dB and 0.23 dB, respectively.

98 citations
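
To make the frame-level allocation idea concrete, here is a deliberately simplified sketch: the Rate-GOP bit budget is split across frames by hierarchy level, and a generic power-law rate model is inverted to obtain a quantization step. The weights and the model R(Q) = a * Q**(-b) are placeholders for the paper's inter-frame-dependency and ρ-domain models, which are considerably more elaborate.

# Simplified illustration of frame-level bit allocation inside one Rate-GOP and
# a generic rate model mapping a bit budget to a quantization step. The
# hierarchical weights and the power-law model are placeholder assumptions.

def allocate_bits(gop_budget, level_weights=(4.0, 2.0, 1.0, 1.0)):
    """Split a Rate-GOP bit budget over frames by hierarchy level (key frame first)."""
    total = sum(level_weights)
    return [gop_budget * w / total for w in level_weights]

def qstep_for_target(target_bits, a=5e5, b=1.2):
    """Invert the assumed rate model R(Q) = a * Q**(-b) to get a quantization step."""
    return (a / target_bits) ** (1.0 / b)

if __name__ == "__main__":
    frame_bits = allocate_bits(gop_budget=200_000)
    for i, r in enumerate(frame_bits):
        print(f"frame {i}: target {r:.0f} bits -> Qstep {qstep_for_target(r):.2f}")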

Proceedings ArticleDOI
01 Nov 2019
TL;DR: A systematic and comprehensive overview of the third generation of the Audio Video Standard (AVS) in China is presented; AVS3 has adopted many novel coding techniques, including block partitioning structures and intra/inter and transform coding tools.
Abstract: This paper presents a systematic and comprehensive overview of the third generation of the Audio Video Standard (AVS) in China. The AVS standards continue to attract extensive attention both domestically and worldwide, along with the industrialization of the AVS2 standard and the broadcasting of the China Central Television (CCTV) 4K channel. Currently, AVS3 has adopted many novel coding techniques, including block partitioning structures and intra/inter and transform coding tools. Compression performance has been significantly improved: the latest version of AVS3 obtains about 24.3% and 26.88% bit-rate reductions against AVS2 and HEVC, respectively, on 4K-resolution sequences.

96 citations
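
Bit-rate reductions of this kind are conventionally reported as Bjøntegaard delta rate (BD-rate), computed from a few rate-PSNR points per codec. The sketch below shows that calculation in generic form; the four R-D points are made-up placeholder values, and the code is not taken from the AVS3 work.

# Generic Bjontegaard delta-rate (BD-rate) calculation: fit log(bitrate) as a
# cubic function of PSNR for both codecs and integrate the horizontal gap
# between the fitted curves over the overlapping PSNR range.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    # Integrate both fits over the overlapping quality range.
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    # Average log-rate difference, converted to a percentage rate change.
    return (np.exp((int_test - int_ref) / (hi - lo)) - 1) * 100

# Placeholder R-D points (kbps, dB), purely to show the call signature.
print(bd_rate([1000, 2000, 4000, 8000], [34.0, 36.5, 38.8, 40.6],
              [800, 1600, 3200, 6400], [34.1, 36.6, 38.9, 40.7]))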

Journal ArticleDOI
TL;DR: An LF image compression framework driven by generative adversarial network (GAN)-based sub-aperture image (SAI) generation and a cascaded hierarchical coding structure, which outperforms the state-of-the-art learning-based LF image compression approach with, on average, 4.9% BD-rate reduction over multiple LF datasets.
Abstract: Light field (LF) has become an attractive representation of immersive multimedia content, simultaneously capturing both the spatial and angular information of the light rays. In this paper, we present an LF image compression framework driven by generative adversarial network (GAN)-based sub-aperture image (SAI) generation and a cascaded hierarchical coding structure. Specifically, we sparsely sample the SAIs of the LF and propose the GAN of LF (LF-GAN) to generate the unsampled SAIs by adversarial learning conditioned on their surrounding contexts. In particular, the LF-GAN learns to interpret both the angular and spatial context of the LF structure and, meanwhile, generates intermediate hypotheses for the unsampled SAIs at a given position. Subsequently, the sampled SAIs and the residues of the generated unsampled SAIs are re-organized as pseudo-sequences and compressed by standard video codecs. Finally, a hierarchical coding structure is adopted for the sampled SAIs to effectively remove the inter-view redundancies. During the training of the LF-GAN, a pixel-wise Euclidean loss and an adversarial loss are chosen as the optimization objective, such that sharp textures with less blurring in details can be produced. Extensive experimental results show that the proposed LF-GAN-based LF image compression framework outperforms the state-of-the-art learning-based LF image compression approach with, on average, 4.9% BD-rate reduction over multiple LF datasets.

64 citations
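
The training objective mentioned above, a pixel-wise Euclidean term plus an adversarial term for the generated SAIs, might look roughly like the sketch below; the loss weighting and the non-saturating GAN formulation are assumptions, and the paper's exact formulation may differ.

# Sketch of the generator objective: a pixel-wise Euclidean (L2) term on the
# synthesized sub-aperture image plus an adversarial term from the discriminator.
# The weight and the non-saturating GAN form are illustrative assumptions.
import torch
import torch.nn.functional as F

def lf_gan_generator_loss(generated_sai, target_sai, disc_logits_fake, adv_weight=1e-3):
    # Pixel-wise Euclidean loss keeps the generated SAI close to the ground truth.
    l2_loss = F.mse_loss(generated_sai, target_sai)
    # Adversarial loss pushes the generator toward sharper, more realistic textures.
    adv_loss = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return l2_loss + adv_weight * adv_loss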


Cited by
Journal ArticleDOI

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one; it seemed an odd beast at the time, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal ArticleDOI
TL;DR: A generative adversarial network (GAN)-based edge-enhancement network (EEGAN) is proposed for robust satellite image SR reconstruction, along with an adversarial learning strategy that is insensitive to noise.
Abstract: Current super-resolution (SR) methods based on deep learning have shown remarkable comparative advantages but remain unsatisfactory at recovering the high-frequency edge details of images under noise-contaminated imaging conditions, e.g., remote sensing satellite imaging. In this paper, we propose a generative adversarial network (GAN)-based edge-enhancement network (EEGAN) for robust satellite image SR reconstruction, along with an adversarial learning strategy that is insensitive to noise. In particular, EEGAN consists of two main subnetworks: an ultradense subnetwork (UDSN) and an edge-enhancement subnetwork (EESN). In the UDSN, a group of 2-D dense blocks is assembled for feature extraction to obtain an intermediate high-resolution result that looks sharp but is corrupted by artifacts and noise, as in previous GAN-based methods. The EESN is then constructed to extract and enhance the image contours by purifying the noise-contaminated components with mask processing. The recovered intermediate image and the enhanced edges are combined to generate a result with high credibility and clear contents. Extensive experiments on the Kaggle Open Source Dataset, Jilin-1 video satellite images, and DigitalGlobe imagery show superior reconstruction performance compared to state-of-the-art SR approaches.

305 citations
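
The UDSN/EESN split described above can be pictured with the following sketch, in which an intermediate SR image is combined with mask-purified edges; the fixed Laplacian edge extractor, the sigmoid mask, and the additive merge are illustrative stand-ins for the paper's learned components.

# Sketch of the two-stage combination: an ultradense subnetwork (UDSN) produces
# an intermediate high-resolution image, an edge-enhancement subnetwork (EESN)
# cleans up its contours, and the two are merged. The Laplacian kernel and the
# learned mask are illustrative stand-ins, not the paper's exact components.
import torch
import torch.nn.functional as F

LAPLACIAN = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)

def eegan_forward(lr_image, udsn, eesn_mask_net):
    sr_base = udsn(lr_image)  # intermediate SR result, sharp but noisy
    # Extract contours channel-wise with a fixed Laplacian (illustrative choice).
    k = LAPLACIAN.repeat(sr_base.shape[1], 1, 1, 1).to(sr_base.device)
    edges = F.conv2d(sr_base, k, padding=1, groups=sr_base.shape[1])
    # A learned mask suppresses noise-contaminated edge responses.
    mask = torch.sigmoid(eesn_mask_net(sr_base))
    enhanced_edges = edges * mask
    # Merge the base reconstruction with the purified edges.
    return sr_base + enhanced_edges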

Journal ArticleDOI
TL;DR: The evolution and development of neural network-based compression methodologies are introduced for images and video, respectively, and the joint compression of semantic and visual information is tentatively explored to formulate a high-efficiency signal representation structure for both human vision and machine vision.
Abstract: In recent years, image and video coding technologies have advanced by leaps and bounds. However, due to the popularization of image and video acquisition devices, the growth rate of image and video data far exceeds the improvement in compression ratio. In particular, it has been widely recognized that pursuing further coding performance improvement within the traditional hybrid coding framework is increasingly challenging. Deep convolutional neural networks, which have driven the resurgence of neural networks in recent years and achieved great success in both the artificial intelligence and signal processing fields, also provide a novel and promising solution for image and video compression. In this paper, we provide a systematic, comprehensive, and up-to-date review of neural network-based image and video compression techniques. The evolution and development of neural network-based compression methodologies are introduced for images and video, respectively. More specifically, the cutting-edge video coding techniques that leverage deep learning within the HEVC framework are presented and discussed, which substantially advance the state-of-the-art video coding performance. Moreover, end-to-end image and video coding frameworks based on neural networks are also reviewed, revealing interesting explorations toward next-generation image and video coding frameworks/standards. The most significant research works on image and video coding using neural networks are highlighted, and future trends are envisioned. In particular, the joint compression of semantic and visual information is tentatively explored to formulate a high-efficiency signal representation structure for both human vision and machine vision, which are the two dominant signal receptors in the age of artificial intelligence.

235 citations