Proceedings Article

Overfitting the Data: Compact Neural Video Delivery via Content-Aware Feature Modulation

TL;DR: In this paper, a joint training framework together with a Content-aware Feature Modulation (CaFM) layer is proposed to compress content-aware models for neural video delivery, achieving better video quality than the commercial H.264 and H.265 standards.
Abstract: Internet video delivery has undergone a tremendous explosion of growth over the past few years. However, the quality of a video delivery system greatly depends on the available Internet bandwidth. Recently, Deep Neural Networks (DNNs) have been utilized to improve the quality of video delivery. These methods divide a video into chunks and stream low-resolution (LR) video chunks together with corresponding content-aware models to the client. The client runs inference with these models to super-resolve the LR chunks. Consequently, a large number of models must be streamed to deliver a single video. In this paper, we first carefully study the relations among the models of different chunks, and then design a joint training framework along with the Content-aware Feature Modulation (CaFM) layer to compress these models for neural video delivery. With our method, each video chunk requires less than 1% of the original parameters to be streamed, while achieving even better SR performance. We conduct extensive experiments across various SR backbones, video lengths, and scaling factors to demonstrate the advantages of our method. Our method can also be viewed as a new approach to video coding: our preliminary experiments achieve better video quality than the commercial H.264 and H.265 standards under the same storage cost, showing the great potential of the proposed method. Code is available at: this https URL
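
The core idea lends itself to a small sketch. Below is a minimal, hedged illustration of content-aware feature modulation as a per-chunk channel-wise affine layer attached to a shared SR backbone; the class name, shapes, and placement are illustrative assumptions rather than the authors' exact implementation.

```python
# Minimal sketch of a content-aware feature modulation layer, assuming the
# per-chunk parameters reduce to a channel-wise scale and bias applied to
# features of a shared SR backbone (names and shapes are illustrative).
import torch
import torch.nn as nn

class ChannelModulation(nn.Module):
    """Per-chunk channel-wise affine modulation of backbone features."""
    def __init__(self, num_channels: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return feat * self.scale + self.bias

# Usage: the heavy backbone weights are shared across all chunks, while each
# chunk streams only its tiny modulation parameters (a small fraction of the
# backbone size in this toy example).
backbone_conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # shared
chunk_cafm = ChannelModulation(64)                           # per chunk
feat = torch.randn(1, 64, 48, 48)
out = chunk_cafm(backbone_conv(feat))
```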


Citations
Journal ArticleDOI
TL;DR: Wang et al. propose a cloud-computing-based deep compression framework named Pearl, which utilizes the power of deep learning and cloud computing to compress UHD videos; an optimal compact representation of the original UHD video is learned with two deep convolutional neural networks (DCNNs): a super-resolution CNN and a colorization CNN.
Abstract: Ultra-high-definition (UHD) video has grown increasingly popular. However, the data size of UHD videos is 4-16 times that of HD videos. This brings many challenges to existing video delivery systems, such as a shortage of network bandwidth resources and longer network transmission latency. In this paper, we propose a cloud-computing-based deep compression framework named Pearl, which utilizes the power of deep learning and cloud computing to compress UHD videos. Pearl compresses UHD videos in two respects: the frame resolution and the color information. In Pearl, an optimal compact representation of the original UHD video is learned with two deep convolutional neural networks (DCNNs): a super-resolution CNN (SR-CNN) and a colorization CNN (CL-CNN). SR-CNN is used to reconstruct a high-resolution video from a low-resolution video, while CL-CNN is adopted to preserve the color information of the video. Pearl focuses on video content compression in two new directions, so it can be integrated with any existing video compression system. We evaluate the performance of Pearl under a wide variety of network conditions, quality-of-experience metrics, and video properties. In all considered scenarios, Pearl can further compress video size by 84% and reduce network transmission latency by 73%.
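
As a rough illustration of the two-network reconstruction described above, the sketch below pairs a stand-in SR network with a stand-in colorization network on the client side; the module names, layer sizes, and YUV-style output are assumptions, not Pearl's actual architecture.

```python
# Hypothetical sketch of a Pearl-style client: one network restores spatial
# resolution from a low-resolution luma frame, another predicts the chroma
# channels. Both networks here are tiny placeholders for SR-CNN and CL-CNN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySRCNN(nn.Module):
    """Stand-in for SR-CNN: upscale a low-resolution grayscale frame."""
    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale, mode="bicubic",
                          align_corners=False)
        return self.body(x)

class TinyCLCNN(nn.Module):
    """Stand-in for CL-CNN: predict two chroma channels from luma."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 2, 3, padding=1))

    def forward(self, luma):
        return self.body(luma)

lr_gray = torch.randn(1, 1, 270, 480)          # compact representation
hr_luma = TinySRCNN()(lr_gray)                 # restore resolution
hr_chroma = TinyCLCNN()(hr_luma)               # restore color
hr_frame = torch.cat([hr_luma, hr_chroma], 1)  # e.g., a YUV-style output
```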

3 citations

Journal ArticleDOI
TL;DR: In this paper, a mutual modulation self-supervised cross-modal super-resolution (MMSR) model is proposed to overcome the difficulty of acquiring paired training data; the task is challenging because only low-resolution source and high-resolution guide images from different modalities are available.
Abstract: Self-supervised cross-modal super-resolution (SR) can overcome the difficulty of acquiring paired training data, but is challenging because only low-resolution (LR) source and high-resolution (HR) guide images from different modalities are available. Existing methods utilize pseudo or weak supervision in LR space and thus deliver results that are blurry or not faithful to the source modality. To address this issue, we present a mutual modulation SR (MMSR) model, which tackles the task by a mutual modulation strategy, including a source-to-guide modulation and a guide-to-source modulation. In these modulations, we develop cross-domain adaptive filters to fully exploit cross-modal spatial dependency and help induce the source to emulate the resolution of the guide and induce the guide to mimic the modality characteristics of the source. Moreover, we adopt a cycle consistency constraint to train MMSR in a fully self-supervised manner. Experiments on various tasks demonstrate the state-of-the-art performance of our MMSR.
Keywords: Mutual modulation, Self-supervised super-resolution, Cross-modal, Multi-modal, Remote sensing
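
A minimal sketch of the mutual modulation idea described above follows: each branch is modulated by a signal predicted from the other modality. The plain convolutions used as filter predictors are placeholders, not the paper's cross-domain adaptive filters.

```python
# Hedged sketch of mutual modulation: source features steer the guide branch
# (source-to-guide) and guide features steer the source branch
# (guide-to-source). Filter predictors here are simple convolutions.
import torch
import torch.nn as nn

class MutualModulation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.to_guide = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_source = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, source_feat, guide_feat):
        # source-to-guide: modulate the guide with a source-derived signal
        guide_mod = guide_feat * torch.sigmoid(self.to_guide(source_feat))
        # guide-to-source: modulate the source with a guide-derived signal
        source_mod = source_feat * torch.sigmoid(self.to_source(guide_feat))
        return source_mod, guide_mod

src = torch.randn(1, 32, 64, 64)   # upsampled LR source features
gde = torch.randn(1, 32, 64, 64)   # HR guide features
src_out, gde_out = MutualModulation(32)(src, gde)
```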

1 citation

Proceedings ArticleDOI
04 Jun 2023
TL;DR: In this article, a super-resolution framework guided by reconstructed high-quality luma components is proposed to improve the coding efficiency of computer-generated gaming videos at low bitrates.
Abstract: Due to the increasing demand for game-streaming services, efficient compression of computer-generated video is more critical than ever, especially when the available bandwidth is low. This paper proposes a super-resolution framework that improves the coding efficiency of computer-generated gaming videos at low bitrates. Most state-of-the-art super-resolution networks generalize over a variety of RGB inputs and use a unified network architecture for frames with different levels of degradation, leading to high complexity and redundancy. Since games usually consist of a limited number of fixed scenarios, we specialize one model for each scenario and assign appropriate network capacities for different QPs to perform super-resolution under the guidance of reconstructed high-quality luma components. Experimental results show that our framework achieves a superior quality-complexity trade-off compared to the ESRnet baseline, saving up to 93.59% of parameters while maintaining comparable performance. Compression efficiency is also improved over HEVC by more than a 17% BD-rate gain.
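
The per-scenario, QP-aware specialization described above can be pictured as a small model bank keyed by scenario and QP range, as in the hypothetical sketch below; the scenario names, QP buckets, and model widths are illustrative assumptions, not the paper's configuration.

```python
# Illustrative sketch of per-scenario, QP-aware model selection: one compact
# SR model is kept per (scenario, QP bucket) and picked at decode time.
import torch.nn as nn

def qp_bucket(qp: int) -> str:
    """Coarse QP bucketing; the threshold is an assumption."""
    return "high_qp" if qp >= 32 else "low_qp"

def make_sr_model(width: int) -> nn.Module:
    """Toy SR model; the width stands in for per-QP network capacity."""
    return nn.Sequential(nn.Conv2d(1, width, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(width, 1, 3, padding=1))

# One specialized model per (scenario, QP bucket); widths are illustrative.
model_bank = {
    ("racing_track", "low_qp"): make_sr_model(48),
    ("racing_track", "high_qp"): make_sr_model(32),
    ("lobby_menu", "low_qp"): make_sr_model(32),
    ("lobby_menu", "high_qp"): make_sr_model(16),
}

def select_model(scenario: str, qp: int) -> nn.Module:
    return model_bank[(scenario, qp_bucket(qp))]

model = select_model("racing_track", qp=37)
```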

1 citation

Proceedings ArticleDOI
10 Oct 2022
TL;DR: This work presents Sophon, a buffer-based and neural-enhanced streaming framework that exploits a double-buffer design, super-resolution, and a viewport-aware strategy to improve user experience, and proposes two well-suited ideas, visual-saliency-aware prefetch and a super-resolution model selection scheme, to address the challenges of insufficient computing resources and dynamic user preferences.
Abstract: 360° video streaming requires ultra-high bandwidth to provide an excellent immersive experience. Traditional viewport-aware streaming methods are theoretically effective but unreliable in practice due to the adverse effects of time-varying available bandwidth on the small playback buffer. To this end, we ponder the complementarity between the large buffer-based approach and the viewport-aware strategy for 360° video streaming. In this work, we present Sophon, a buffer-based and neural-enhanced streaming framework, which exploits the double buffer design, super-resolution technique, and viewport-aware strategy to improve user experience. Furthermore, we propose two well-suited ideas: a visual saliency-aware prefetch and a super-resolution model selection scheme to address the challenges of insufficient computing resources and dynamic user preferences. Correspondingly, we introduce the prefetch and model selection metrics, and develop a lightweight buffer occupancy-based prefetch algorithm and a deep reinforcement learning method to trade off bandwidth consumption, computing resource utilization, and content quality enhancement. We implement a prototype of Sophon, and extensive evaluations corroborate its superior performance over state-of-the-art works.
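
To make the buffer-occupancy-based prefetch concrete, the sketch below shows one plausible decision rule: prefetch and enhance the most salient tiles only while the playback buffer is above a safety threshold. The threshold, budget, and saliency scores are assumptions, not Sophon's actual algorithm.

```python
# Hedged sketch of a buffer-occupancy-based prefetch decision for tiled
# 360-degree streaming; values and tile ids are illustrative only.
from typing import List, Tuple

def prefetch_plan(buffer_seconds: float,
                  tile_saliency: List[Tuple[str, float]],
                  low_water: float = 4.0,
                  budget: int = 3) -> List[str]:
    """Return tile ids to prefetch/super-resolve for a future segment."""
    if buffer_seconds < low_water:
        # Buffer is at risk: spend no extra bandwidth/compute on prefetch.
        return []
    # Otherwise spend the budget on the most salient tiles first.
    ranked = sorted(tile_saliency, key=lambda t: t[1], reverse=True)
    return [tile_id for tile_id, _ in ranked[:budget]]

# Example: with 8 s buffered, prefetch the three most salient tiles.
plan = prefetch_plan(8.0, [("t3", 0.9), ("t1", 0.4), ("t7", 0.7), ("t2", 0.2)])
```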

1 citation

Journal ArticleDOI
TL;DR: In Masked360, instead of transmitting complete video frames, the video server transmits only a masked low-resolution version of each frame, significantly reducing bandwidth.
Abstract: 360-degree video streaming has seen tremendous growth over the past few years. However, the delivery of 360-degree videos over the Internet still suffers from scarce network bandwidth and adverse network conditions (e.g., packet loss, delay). In this paper, we propose a practical neural-enhanced 360-degree video streaming framework called Masked360, which can significantly reduce bandwidth consumption and achieve robustness against packet loss. In Masked360, instead of transmitting the complete video frame, the video server transmits only a masked low-resolution version of each frame, reducing bandwidth significantly. When delivering masked video frames, the video server also sends a lightweight neural network model called MaskedEncoder to clients. Upon receiving masked frames, the client can reconstruct the original 360-degree video frames and start playback. To further improve the quality of video streaming, we also propose a set of optimization techniques, such as complexity-based patch selection, a quarter masking strategy, redundant patch transmission, and enhanced model training methods. In addition to bandwidth savings, Masked360 is also robust to packet loss during transmission, because losses can be concealed by the reconstruction performed by the MaskedEncoder. Finally, we implement the whole Masked360 framework and evaluate its performance on real datasets. The experimental results show that Masked360 can achieve 4K 360-degree video streaming with bandwidth as low as 2.4 Mbps. Moreover, the video quality of Masked360 improves significantly, by 5.24-16.61% in PSNR and 4.74-16.15% in SSIM compared with other baselines.
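
A hedged sketch of the masking idea follows: the server drops a reproducible subset of patches from a downscaled frame, and a small client-side network reconstructs the frame from what survived. The patch size, mask ratio, and tiny reconstruction network are placeholders, not the paper's MaskedEncoder.

```python
# Illustrative sketch of patch masking plus client-side reconstruction for a
# downscaled 360-degree frame; all hyperparameters here are assumptions.
import torch
import torch.nn as nn

def mask_patches(frame: torch.Tensor, patch: int = 16, ratio: float = 0.25,
                 seed: int = 0) -> torch.Tensor:
    """Zero out a random subset of non-overlapping patches (N, C, H, W)."""
    g = torch.Generator().manual_seed(seed)  # client can reproduce the mask
    n, c, h, w = frame.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(n, 1, gh, gw, generator=g) > ratio).float()
    keep = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return frame * keep

reconstructor = nn.Sequential(            # stand-in for the MaskedEncoder
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)

lr_frame = torch.rand(1, 3, 256, 512)     # masked low-resolution frame
masked = mask_patches(lr_frame)
restored = reconstructor(masked)
```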