Journal Article

Compressed Domain Deep Video Super-Resolution

09 Aug 2021, IEEE Transactions on Image Processing, Vol. 30, pp. 7156-7169
TL;DR: In this article, a guided spatial feature transform (GSFT) layer is proposed to modulate features of the prior with the guidance of the video information, making the prior features more fine-grained and content-adaptive.
Abstract: Real-world video processing algorithms are often faced with the great challenge of processing compressed videos instead of pristine videos. Despite the tremendous successes achieved in deep-learning-based video super-resolution (SR), much less work has been dedicated to the SR of compressed videos. Herein, we propose a novel approach for compressed domain deep video SR by jointly leveraging the coding priors and deep priors. By exploiting the diverse and ready-made spatial and temporal coding priors (e.g., partition maps and motion vectors) extracted directly from the video bitstream in an effortless way, video SR in the compressed domain allows us to accurately reconstruct the high-resolution video with high flexibility and substantially economized computational complexity. More specifically, to incorporate the spatial coding prior, the Guided Spatial Feature Transform (GSFT) layer is proposed to modulate features of the prior with the guidance of the video information, making the prior features more fine-grained and content-adaptive. To incorporate the temporal coding prior, a guided soft alignment scheme is designed to generate local attention offsets to compensate for decoded motion vectors. Our soft alignment scheme combines the merits of explicit and implicit motion modeling methods, rendering the alignment of features more effective for SR in terms of computational complexity and robustness to inaccurate motion fields. Furthermore, to fully exploit the deep priors, multi-scale fused features are generated from a scale-wise convolution reconstruction network for the final SR video reconstruction. To promote compressed domain video SR research, we build an extensive Compressed Videos with Coding Prior (CVCP) dataset, including compressed videos of diverse content and various coding priors extracted from the bitstream. Extensive experimental results show the effectiveness of coding priors in compressed domain video SR.
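To make the GSFT idea concrete, here is a minimal PyTorch sketch of a guided spatial feature transform: features extracted from the decoded video predict per-pixel affine parameters that modulate the coding-prior features (e.g., from partition maps). The channel sizes, the two-convolution condition branches, and the class name GuidedSFT are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class GuidedSFT(nn.Module):
    """Sketch of a guided spatial feature transform (GSFT)-style layer.

    Video features guide an affine (scale, shift) modulation of the
    coding-prior features, making them content-adaptive. The channel
    sizes and condition branches are illustrative assumptions.
    """

    def __init__(self, prior_ch=64, guide_ch=64):
        super().__init__()
        # Predict per-pixel scale (gamma) and shift (beta) from the guide.
        self.gamma = nn.Sequential(
            nn.Conv2d(guide_ch, prior_ch, 3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(prior_ch, prior_ch, 3, padding=1),
        )
        self.beta = nn.Sequential(
            nn.Conv2d(guide_ch, prior_ch, 3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(prior_ch, prior_ch, 3, padding=1),
        )

    def forward(self, prior_feat, guide_feat):
        # Affine modulation of prior features under video-feature guidance.
        return prior_feat * (1 + self.gamma(guide_feat)) + self.beta(guide_feat)

# Hypothetical usage: modulate partition-map features with frame features.
layer = GuidedSFT()
prior = torch.randn(1, 64, 32, 32)   # features of the coding prior
guide = torch.randn(1, 64, 32, 32)   # features of the decoded frame
out = layer(prior, guide)            # shape (1, 64, 32, 32)
```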
Citations
Journal ArticleDOI
20 Apr 2022
TL;DR: This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video and proposes the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos.
Abstract: This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. In this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. This challenge includes three tracks. Track 1 aims at enhancing videos compressed by HEVC at a fixed QP. Tracks 2 and 3 target both the super-resolution and quality enhancement of HEVC-compressed video, requiring x2 and x4 super-resolution, respectively. The three tracks attracted a total of more than 600 registrations. In the test phase, 8, 8, and 12 teams submitted final results to Tracks 1, 2, and 3, respectively. The proposed methods and solutions gauge the state of the art of super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-sourced codes) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.

22 citations

Journal Article
23 Aug 2022
TL;DR: The AdamW optimizer with a learning rate of 2 × 10⁻⁴ is used to train the model for 1,000,000 iterations; the learning rate is decayed with the cosine strategy, and the weight decay is 10⁻⁴ throughout training.
Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.
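The training recipe summarized in the TL;DR maps directly onto standard PyTorch APIs. A minimal sketch under that reading follows, with a placeholder model standing in for a challenge entry's SR network and a dummy loss standing in for the SR objective:

```python
import torch

# Placeholder model; challenge entries use their own SR architectures.
model = torch.nn.Conv2d(3, 3, 3, padding=1)

# AdamW with the reported learning rate and weight decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-4)

# Cosine decay of the learning rate over the full 1,000,000-iteration run.
total_iters = 1_000_000
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_iters)

for step in range(total_iters):
    optimizer.zero_grad()
    out = model(torch.randn(1, 3, 8, 8))  # dummy batch for illustration
    loss = out.pow(2).mean()              # stand-in for the SR loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    break  # remove to run the full schedule
```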

11 citations

Journal Article
TL;DR: Wang et al. present a learning-based FH algorithm called Motion Blur Embedded Nearest Proximate Patch Representation (MBENPPR), which estimates the motion blur kernel from a motion-blurred LR test face.
Abstract: Face hallucination (FH) techniques have received a lot of attention in recent years for generating high-resolution (HR) face images from captured low-resolution (LR), noisy, and blurry images. However, existing FH techniques are incapable of dealing with motion blur, which is commonly introduced in captured images due to camera defocussing and other factors. Therefore, to make the FH process more resistant to motion blur, in this article, we present a novel learning-based FH algorithm called Motion Blur Embedded Nearest Proximate Patch Representation (MBENPPR). The MBENPPR algorithm begins by estimating the motion blur kernel from a motion-blurred LR test face. The estimated kernel is then embedded in training images to make them compatible with test images. It assists in reducing the effect of motion blur in the reconstruction process. Furthermore, the nearest proximate patches are selected from the training space to represent the test image patches as a weighted linear combination of selected patches. It facilitates the proposed algorithm in preserving sharp edges and texture information in the resulting faces. The results of simulations on standard datasets and locally captured real-life faces show that the MBENPPR algorithm outperforms the compared existing algorithms.
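The core reconstruction step described above, representing a test patch as a weighted linear combination of its nearest training patches, resembles classic neighbor-embedding face hallucination. A sketch under that assumption, using locally-linear-embedding-style weights; this is illustrative, not the authors' exact MBENPPR selection rule:

```python
import numpy as np

def represent_patch(test_patch, train_patches, k=5, eps=1e-6):
    """Represent a vectorized LR test patch as a weighted linear
    combination of its k nearest LR training patches (neighbor-embedding
    style; illustrative of, not identical to, the MBENPPR rule)."""
    # Select the k most proximate training patches by Euclidean distance.
    dists = np.linalg.norm(train_patches - test_patch, axis=1)
    idx = np.argsort(dists)[:k]
    neighbors = train_patches[idx]        # shape (k, d)

    # Solve for combination weights that best reconstruct the test patch
    # (LLE-style solve on the local Gram matrix), then normalize so the
    # weights sum to one.
    diffs = neighbors - test_patch        # (k, d)
    C = diffs @ diffs.T + eps * np.eye(k)
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()
    return w, idx

# The HR face patch is then formed with the same weights applied to the
# corresponding (blur-embedded) HR training patches:
# hr_patch = w @ hr_train_patches[idx]
```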

6 citations

References
Journal Article
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and is validated against subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
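For reference, the index combines luminance, contrast, and structure comparisons into a single expression. The sketch below evaluates the standard SSIM formula once over a whole grayscale image; the published method applies it per local window (an 11x11 Gaussian) and averages the resulting map:

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Single-window SSIM between two grayscale images x and y.

    Evaluating the formula once over the whole image is a
    simplification; the published index computes these statistics per
    local window and averages the resulting quality map.
    """
    c1 = (0.01 * data_range) ** 2   # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizer for the contrast term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```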

40,609 citations

28 Oct 2017
TL;DR: The automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models, is described; it differentiates purely imperative programs, with a focus on extensibility and low overhead.
Abstract: In this article, we describe an automatic differentiation module of PyTorch — a library designed to enable rapid research on machine learning models. It builds upon a few projects, most notably Lua Torch, Chainer, and HIPS Autograd [4], and provides a high-performance environment with easy access to automatic differentiation of models executed on different devices (CPU and GPU). To make prototyping easier, PyTorch does not follow the symbolic approach used in many other deep learning frameworks, but focuses on differentiation of purely imperative programs, with a focus on extensibility and low overhead. Note that this preprint is a draft of certain sections from an upcoming paper covering all PyTorch features.
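The define-by-run design described here means gradients are obtained by executing ordinary Python and then differentiating the result, for example:

```python
import torch

# Imperative ("define-by-run") differentiation: the graph is recorded
# as ordinary Python executes, then differentiated in reverse mode.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x0^2 + x1^2
y.backward()         # reverse-mode automatic differentiation
print(x.grad)        # tensor([4., 6.]), i.e. dy/dx = 2x
```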

13,268 citations

Journal Article
TL;DR: An overview of the technical features of H.264/AVC is provided, profiles and applications for the standard are described, and the history of the standardization process is outlined.
Abstract: H.264/AVC is the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to existing standards. This article provides an overview of the technical features of H.264/AVC, describes profiles and applications for the standard, and outlines the history of the standardization process.

8,646 citations

Journal Article
TL;DR: The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of 50% bit-rate reduction for equal perceptual video quality.
Abstract: High Efficiency Video Coding (HEVC) is currently being prepared as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of 50% bit-rate reduction for equal perceptual video quality. This paper provides an overview of the technical features and characteristics of the HEVC standard.

7,383 citations

Proceedings Article
21 Jul 2017
TL;DR: SRGAN proposes a perceptual loss function consisting of an adversarial loss and a content loss; the adversarial loss pushes the solution to the natural image manifold using a discriminator network trained to differentiate between super-resolved images and original photo-realistic images.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.
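The generator objective described above, a VGG-feature content loss plus a weighted adversarial term, can be sketched as follows. The exact VGG truncation point and the 1e-3 adversarial weight follow common SRGAN reimplementations and should be treated as assumptions rather than the paper's verbatim configuration:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Frozen VGG19 feature extractor for the content loss; truncation up to
# relu5_4 is a common choice in SRGAN reimplementations. Inputs are
# assumed normalized to ImageNet statistics.
vgg = vgg19(weights="IMAGENET1K_V1").features[:36].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

mse = nn.MSELoss()
bce = nn.BCEWithLogitsLoss()

def generator_loss(sr, hr, disc_logits_on_sr, adv_weight=1e-3):
    """SRGAN-style perceptual loss: VGG-feature content loss plus a
    weighted adversarial term that rewards fooling the discriminator."""
    content = mse(vgg(sr), vgg(hr))
    adversarial = bce(disc_logits_on_sr, torch.ones_like(disc_logits_on_sr))
    return content + adv_weight * adversarial
```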

6,884 citations