Author

Long Sun

Bio: Long Sun is an academic researcher from Huawei. The author has contributed to research in topics: Bitstream & Motion vector. The author has an h-index of 1, co-authored 2 publications receiving 2 citations.

Papers
Journal ArticleDOI
TL;DR: In this article, a guided spatial feature transform (GSFT) layer is proposed to modulate features of the prior with the guidance of the video information, making the prior features more fine-grained and content-adaptive.
Abstract: Real-world video processing algorithms are often faced with the great challenge of processing compressed videos instead of pristine videos. Despite the tremendous successes achieved in deep-learning-based video super-resolution (SR), much less work has been dedicated to the SR of compressed videos. Herein, we propose a novel approach for compressed-domain deep video SR by jointly leveraging the coding priors and deep priors. By exploiting the diverse and ready-made spatial and temporal coding priors (e.g., partition maps and motion vectors) extracted directly from the video bitstream in an effortless way, video SR in the compressed domain allows us to accurately reconstruct the high-resolution video with high flexibility and substantially economized computational complexity. More specifically, to incorporate the spatial coding prior, the Guided Spatial Feature Transform (GSFT) layer is proposed to modulate features of the prior with the guidance of the video information, making the prior features more fine-grained and content-adaptive. To incorporate the temporal coding prior, a guided soft alignment scheme is designed to generate local attention offsets to compensate for decoded motion vectors. Our soft alignment scheme combines the merits of explicit and implicit motion modeling methods, rendering the alignment of features more effective for SR in terms of computational complexity and robustness to inaccurate motion fields. Furthermore, to fully make use of the deep priors, multi-scale fused features are generated from a scale-wise convolution reconstruction network for final SR video reconstruction. To promote compressed-domain video SR research, we build an extensive Compressed Videos with Coding Prior (CVCP) dataset, including compressed videos of diverse content and various coding priors extracted from the bitstream. Extensive experimental results show the effectiveness of coding priors in compressed-domain video SR.
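For intuition, here is a minimal PyTorch sketch of an SFT-style modulation layer in the spirit of the GSFT described above: per-pixel scale and shift maps are predicted from the video features and applied to the coding-prior features. The channel widths, layer depths, and module name are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed design, not the paper's code): predict per-pixel
# scale/shift from video features and modulate the coding-prior features.
import torch
import torch.nn as nn


class GuidedSFT(nn.Module):
    def __init__(self, prior_ch: int = 64, video_ch: int = 64):
        super().__init__()
        # Small conv heads mapping video-frame features to scale (gamma) and shift (beta).
        self.to_gamma = nn.Sequential(
            nn.Conv2d(video_ch, prior_ch, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(prior_ch, prior_ch, 3, padding=1),
        )
        self.to_beta = nn.Sequential(
            nn.Conv2d(video_ch, prior_ch, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(prior_ch, prior_ch, 3, padding=1),
        )

    def forward(self, prior_feat: torch.Tensor, video_feat: torch.Tensor) -> torch.Tensor:
        # Modulate the prior features (e.g. from a partition map) with the
        # guidance of the video features: out = gamma * prior + beta.
        gamma = self.to_gamma(video_feat)
        beta = self.to_beta(video_feat)
        return prior_feat * gamma + beta


if __name__ == "__main__":
    layer = GuidedSFT()
    prior = torch.randn(1, 64, 96, 96)   # features of the spatial coding prior
    video = torch.randn(1, 64, 96, 96)   # features of the decoded video frame
    print(layer(prior, video).shape)     # torch.Size([1, 64, 96, 96])
```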

17 citations

Proceedings ArticleDOI
12 Oct 2020
TL;DR: This paper proposes a motion vector guided multi-scale local attention module that explicitly exploits the temporal dependency and suppresses coding artifacts with substantially economized computational complexity; the resulting model offers competitive performance compared to sequential combinations of state-of-the-art methods for compressed video artifacts removal and SR.
Abstract: The standard paradigm of video super-resolution (SR) is to generate a spatially and temporally coherent high-resolution (HR) sequence from the corresponding low-resolution (LR) version that has already been decoded from the bitstream. However, a highly practical yet relatively under-studied alternative is to enable built-in SR functionality in the decoder, given that almost all videos are compactly represented. In this paper, we systematically investigate the SR of compressed LR videos by leveraging the interaction between the decoding prior and the deep prior. By fully exploiting the compact video stream information, the proposed bitstream-prior-embedded SR framework achieves compressed video SR and quality enhancement simultaneously in a single feed-forward process. More specifically, we propose a motion vector guided multi-scale local attention module that explicitly exploits the temporal dependency and suppresses coding artifacts with substantially economized computational complexity. Moreover, a scale-wise deep residual-in-residual network is learned to reconstruct the SR frames from the multi-scale fused features. To facilitate research on compressed video SR, we also build a large-scale dataset of compressed videos with diverse content, including diverse kinds of ready-made side information extracted from the bitstream. Both quantitative and qualitative evaluations show that our model achieves superior performance for compressed video SR, and offers competitive performance compared to sequential combinations of state-of-the-art methods for compressed video artifacts removal and SR.
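As a rough illustration of motion-vector-guided local attention, the PyTorch sketch below first aligns reference-frame features coarsely with the decoded motion vectors, then applies attention over a small local window to absorb MV inaccuracy at far lower cost than global attention. The window size, feature widths, and helper names are assumptions for the sketch, not the paper's actual module.

```python
# Assumed sketch: coarse MV-based warping followed by attention over a 3x3
# local window; this approximates the idea, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mv_warp(feat: torch.Tensor, mv: torch.Tensor) -> torch.Tensor:
    """Warp reference features toward the current frame with decoded MVs.

    feat: (B, C, H, W) reference-frame features
    mv:   (B, 2, H, W) motion vectors in pixel units (dx, dy)
    """
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    # Sampling grid = base coordinates + motion vector, normalised to [-1, 1].
    x = (xs + mv[:, 0]) / (w - 1) * 2 - 1
    y = (ys + mv[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((x, y), dim=-1)          # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)


class MVGuidedLocalAttention(nn.Module):
    def __init__(self, channels: int = 64, window: int = 3):
        super().__init__()
        self.window = window
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)

    def forward(self, cur: torch.Tensor, ref: torch.Tensor, mv: torch.Tensor) -> torch.Tensor:
        # 1) Coarse alignment with the decoded motion vectors.
        ref = mv_warp(ref, mv)
        b, c, h, w = ref.shape
        k = self.window
        # 2) Local attention within a small window compensates for MV inaccuracy
        #    at a fraction of the cost of global attention.
        q = self.query(cur).view(b, c, 1, h * w)
        keys = F.unfold(self.key(ref), k, padding=k // 2).view(b, c, k * k, h * w)
        vals = F.unfold(ref, k, padding=k // 2).view(b, c, k * k, h * w)
        attn = torch.softmax((q * keys).sum(1, keepdim=True) / c ** 0.5, dim=2)
        return (attn * vals).sum(2).view(b, c, h, w)
```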

8 citations


Cited by
Journal ArticleDOI
20 Apr 2022
TL;DR: This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video and proposes the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos.
Abstract: This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. In this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. This challenge includes three tracks. Track 1 aims at enhancing videos compressed by HEVC at a fixed QP. Track 2 and Track 3 target both the super-resolution and quality enhancement of HEVC-compressed video, requiring ×2 and ×4 super-resolution, respectively. The three tracks attracted more than 600 registrations in total. In the test phase, 8 teams, 8 teams, and 12 teams submitted final results to Tracks 1, 2, and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-sourced codes) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.

22 citations

Journal ArticleDOI
TL;DR: In this article, a guided spatial feature transform (GSFT) layer is proposed to modulate features of the prior with the guidance of the video information, making the prior features more fine-grained and content-adaptive.
Abstract: Real-world video processing algorithms are often faced with the great challenge of processing compressed videos instead of pristine videos. Despite the tremendous successes achieved in deep-learning-based video super-resolution (SR), much less work has been dedicated to the SR of compressed videos. Herein, we propose a novel approach for compressed-domain deep video SR by jointly leveraging the coding priors and deep priors. By exploiting the diverse and ready-made spatial and temporal coding priors (e.g., partition maps and motion vectors) extracted directly from the video bitstream in an effortless way, video SR in the compressed domain allows us to accurately reconstruct the high-resolution video with high flexibility and substantially economized computational complexity. More specifically, to incorporate the spatial coding prior, the Guided Spatial Feature Transform (GSFT) layer is proposed to modulate features of the prior with the guidance of the video information, making the prior features more fine-grained and content-adaptive. To incorporate the temporal coding prior, a guided soft alignment scheme is designed to generate local attention offsets to compensate for decoded motion vectors. Our soft alignment scheme combines the merits of explicit and implicit motion modeling methods, rendering the alignment of features more effective for SR in terms of computational complexity and robustness to inaccurate motion fields. Furthermore, to fully make use of the deep priors, multi-scale fused features are generated from a scale-wise convolution reconstruction network for final SR video reconstruction. To promote compressed-domain video SR research, we build an extensive Compressed Videos with Coding Prior (CVCP) dataset, including compressed videos of diverse content and various coding priors extracted from the bitstream. Extensive experimental results show the effectiveness of coding priors in compressed-domain video SR.

17 citations

Proceedings ArticleDOI
01 Jun 2022
TL;DR: This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video (VEnh_SR), whose submitted methods and solutions gauge the state-of-the-art of super-resolution and quality enhancement of compressed video.
Abstract: This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. In this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. This challenge includes three tracks. Track 1 aims at enhancing videos compressed by HEVC at a fixed QP. Track 2 and Track 3 target both the super-resolution and quality enhancement of HEVC-compressed video, requiring ×2 and ×4 super-resolution, respectively. The three tracks attracted more than 600 registrations in total. In the test phase, 8 teams, 8 teams, and 12 teams submitted final results to Tracks 1, 2, and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-sourced codes) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.

14 citations

Journal ArticleDOI
23 Aug 2022
TL;DR: The AdamW optimizer with a learning rate of 2 × 10⁻⁴ is used to train the model for 1,000,000 iterations; the learning rate is decayed with a cosine strategy, and the weight decay is 10⁻⁴ throughout training.
Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular DIV2K dataset as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.
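The configuration summarised in the TL;DR maps directly onto a standard PyTorch setup. The snippet below is a minimal sketch of that schedule (AdamW at 2 × 10⁻⁴ with weight decay 10⁻⁴ and cosine decay over 1,000,000 iterations); the model is a placeholder standing in for the actual SR network, and the training loop body is elided.

```python
# Minimal sketch of the stated schedule; the network and loss are placeholders.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the SR network
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000_000)

for step in range(1_000_000):
    # ... forward pass, loss computation, and loss.backward() go here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # cosine-decayed learning rate, one step per iteration
```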

11 citations

Journal ArticleDOI
TL;DR: In this article, a learning-based FH algorithm called Motion Blur Embedded Nearest Proximate Patch Representation (MBENPPR) is presented, which estimates the motion blur kernel from a motion-blurred LR test face.
Abstract: Face hallucination (FH) techniques have received a lot of attention in recent years for generating high-resolution (HR) face images from captured low-resolution (LR), noisy, and blurry images. However, existing FH techniques are incapable of dealing with motion blur, which is commonly introduced in captured images due to camera defocussing and other factors. Therefore, to make the FH process more resistant to motion blur, in this article, we present a novel learning-based FH algorithm called Motion Blur Embedded Nearest Proximate Patch Representation (MBENPPR). The MBENPPR algorithm begins by estimating the motion blur kernel from a motion-blurred LR test face. The estimated kernel is then embedded in training images to make them compatible with test images. It assists in reducing the effect of motion blur in the reconstruction process. Furthermore, the nearest proximate patches are selected from the training space to represent the test image patches as a weighted linear combination of selected patches. It facilitates the proposed algorithm in preserving sharp edges and texture information in the resulting faces. The results of simulations on standard datasets and locally captured real-life faces show that the MBENPPR algorithm outperforms the compared existing algorithms.
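To make the patch-representation step concrete, the NumPy sketch below shows how a test LR patch can be expressed as a weighted linear combination of its nearest training LR patches, with the same weights transferred to the corresponding HR patches. The neighbourhood size, ridge term, and function name are illustrative assumptions, and the kernel estimation/embedding steps of MBENPPR are omitted.

```python
# Assumed sketch of neighbour-embedding patch reconstruction, not the
# authors' code: weights from the LR domain are reused in the HR domain.
import numpy as np


def reconstruct_patch(x_lr, train_lr, train_hr, k=5, lam=1e-3):
    """x_lr: (d,) test LR patch; train_lr: (N, d) LR patches; train_hr: (N, D) HR patches."""
    # 1) Pick the K nearest ("most proximate") LR training patches.
    dists = np.linalg.norm(train_lr - x_lr, axis=1)
    idx = np.argsort(dists)[:k]
    A = train_lr[idx]                          # (k, d)
    # 2) Ridge-regularised least squares for the combination weights:
    #    minimise ||x - A^T w||^2 + lam * ||w||^2.
    g = A @ A.T + lam * np.eye(k)
    w = np.linalg.solve(g, A @ x_lr)
    w = w / (w.sum() + 1e-8)                   # normalise weights to sum to one
    # 3) Transfer the weights to the corresponding HR patches.
    return train_hr[idx].T @ w                 # (D,) reconstructed HR patch
```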

6 citations