Showing papers on "Data compression" published in 2021

Journal Article•DOI•

Overview of the Versatile Video Coding (VVC) Standard and its Applications

Benjamin Bross¹, Ye-Kui Wang, Yan Ye², Shan Liu³, Jianle Chen⁴, Gary J. Sullivan⁵, Jens-Rainer Ohm⁶•Institutions (6)

Heinrich Hertz Institute¹, Alibaba Group², Tencent³, Qualcomm⁴, Microsoft⁵, RWTH Aachen University⁶

02 Aug 2021-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: Versatile Video Coding (VVC) was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression and to support a wider variety of today's media content and emerging applications.

Abstract: Versatile Video Coding (VVC) was finalized in July 2020 as the most recent international video coding standard. It was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today’s media content and emerging applications. This paper provides an overview of the novel technical features for new applications and the core compression technologies for achieving significant bit rate reductions in the neighborhood of 50% over its predecessor for equal video quality, the High Efficiency Video Coding (HEVC) standard, and 75% over the currently most-used format, the Advanced Video Coding (AVC) standard. It is explained how these new features in VVC provide greater versatility for applications. Highlighted applications include video with resolutions beyond standard- and high-definition, video with high dynamic range and wide color gamut, adaptive streaming with resolution changes, computer-generated and screen-captured video, ultralow-delay streaming, 360° immersive video, and multilayer coding e.g., for scalability. Furthermore, early implementations are presented to show that the new VVC standard is implementable and ready for real-world deployment.

250 citations

Journal Article•DOI•

Developments in International Video Coding Standardization After AVC, With an Overview of Versatile Video Coding (VVC)

Benjamin Bross¹, Jianle Chen², Jens-Rainer Ohm³, Gary J. Sullivan⁴, Ye-Kui Wang•Institutions (4)

Heinrich Hertz Institute¹, Qualcomm², RWTH Aachen University³, Microsoft⁴

19 Jan 2021

TL;DR: This article summarizes these developments in video coding standardization after AVC, and focuses on providing an overview of the first version of VVC, including comparisons against HEVC.

Abstract: In the last 17 years, since the finalization of the first version of the now-dominant H.264/Moving Picture Experts Group-4 (MPEG-4) Advanced Video Coding (AVC) standard in 2003, two major new generations of video coding standards have been developed. These include the standards known as High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC). HEVC was finalized in 2013, repeating the ten-year cycle time set by its predecessor and providing about 50% bit-rate reduction over AVC. The cycle was shortened by three years for the VVC project, which was finalized in July 2020, yet again achieving about a 50% bit-rate reduction over its predecessor (HEVC). This article summarizes these developments in video coding standardization after AVC. It especially focuses on providing an overview of the first version of VVC, including comparisons against HEVC. Besides further advances in hybrid video compression, as in previous development cycles, the broad versatility of the application domain that is highlighted in the title of VVC is explained. Included in VVC is the support for a wide range of applications beyond the typical standard- and high-definition camera-captured content codings, including features to support computer-generated/screen content, high dynamic range content, multilayer and multiview coding, and support for immersive media such as 360° video.
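The two "about 50%" figures in this abstract compound multiplicatively rather than add. The sketch below is only illustrative arithmetic (the function name is an assumption, not something from the article): halving the bit rate twice leaves a quarter of the original rate, which is consistent with the roughly 75% saving over AVC quoted for VVC in the first entry of this listing.

```python
def cumulative_reduction(*savings):
    """Combine per-generation bit-rate savings (fractions in [0, 1)) into one overall saving."""
    remaining = 1.0  # fraction of the original bit rate still needed
    for s in savings:
        if not 0.0 <= s < 1.0:
            raise ValueError("each saving must be a fraction in [0, 1)")
        remaining *= 1.0 - s
    return 1.0 - remaining


# AVC -> HEVC (about 50%), then HEVC -> VVC (about 50%)
overall = cumulative_reduction(0.5, 0.5)
print(f"overall saving vs. AVC: {overall:.0%}")
```

The quoted percentages are approximate averages at equal quality, so the result should be read as a rough consistency check rather than a measured value.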

246 citations

Journal Article•DOI•

A survey on data compression techniques: From the perspective of data quality, coding schemes, data type and applications

Uthayakumar Jayasankar¹, Vengattaraman Thirumal¹, Dhavachelvan Ponnurangam¹•Institutions (1)

Pondicherry University¹

01 Feb 2021-Journal of King Saud University - Computer and Information Sciences

TL;DR: This survey offers insight into open issues and research directions, pointing to promising areas for future developments in data compression techniques and their applications.

136 citations

Journal Article•DOI•

An End-to-End Learning Framework for Video Compression

Guo Lu¹, Xiaoyun Zhang¹, Wanli Ouyang², Li Chen¹, Zhiyong Gao¹, Dong Xu²•Institutions (2)

Shanghai Jiao Tong University¹, University of Sydney²

01 Oct 2021-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes the first end-to-end deep video compression framework that can outperform the widely used video coding standard H.264 and even be on par with the latest standard H.265.

Abstract: Traditional video compression approaches build upon the hybrid coding framework with motion-compensated prediction and residual transform coding. In this paper, we propose the first end-to-end deep video compression framework to take advantage of both the classical compression architecture and the powerful non-linear representation ability of neural networks. Our framework employs pixel-wise motion information, which is learned from an optical flow network and further compressed by an auto-encoder network to save bits. The other compression components are also implemented by the well-designed networks for high efficiency. All the modules are jointly optimized by using the rate-distortion trade-off and can collaborate with each other. More importantly, the proposed deep video compression framework is very flexible and can be easily extended by using lightweight or advanced networks for higher speed or better efficiency. We also propose to introduce the adaptive quantization layer to reduce the number of parameters for variable bitrate coding. Comprehensive experimental results demonstrate the effectiveness of the proposed framework on the benchmark datasets.

123 citations

Proceedings Article•DOI•

FVC: A New Framework towards Deep Video Compression in Feature Space

Zhihao Hu¹, Guo Lu², Dong Xu³•Institutions (3)

Beihang University¹, Beijing Institute of Technology², University of Sydney³

01 Jun 2021

TL;DR: This paper proposes a feature-space video coding network (FVC) that performs all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space.

Abstract: Learning-based video compression has attracted increasing attention in the past few years. The previous hybrid coding approaches rely on pixel-space operations to reduce spatial and temporal redundancy, which may suffer from inaccurate motion estimation or less effective motion compensation. In this work, we propose a feature-space video coding network (FVC) by performing all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space. Specifically, in the proposed deformable compensation module, we first apply motion estimation in the feature space to produce motion information (i.e., the offset maps), which will be compressed by using an auto-encoder style network. Then we perform motion compensation by using deformable convolution and generate the predicted feature. After that, we compress the residual feature between the feature from the current frame and the predicted feature from our deformable compensation module. For better frame reconstruction, the reference features from multiple previous reconstructed frames are also fused by using the nonlocal attention mechanism in the multi-frame feature fusion module. Comprehensive experimental results demonstrate that the proposed framework achieves state-of-the-art performance on four benchmark datasets including HEVC, UVG, VTL and MCL-JCV.

120 citations

Journal Article•DOI•

Deep Residual Learning-Based Enhanced JPEG Compression in the Internet of Things

Han Qiu¹, Qinkai Zheng¹, Gerard Memmi¹, Jialiang Lu², Meikang Qiu³, Bhavani Thuraisingham⁴•Institutions (4)

Télécom ParisTech¹, Shanghai Jiao Tong University², Texas A&M University–Commerce³, University of Texas at Dallas⁴

01 Mar 2021-IEEE Transactions on Industrial Informatics

TL;DR: This article proposes a novel method to significantly enhance transformation-based compression standards like JPEG by transmitting much less data per image at the sender's end, together with a two-step method that combines a state-of-the-art signal-processing-based recovery method with a deep residual learning model to recover the original data at the receiver's end.

Abstract: With the development of big data and network technology, there are more use cases, such as edge computing, that require more secure and efficient multimedia big data transmission. Data compression methods can help achieve many tasks, such as providing data integrity and protection as well as efficient transmission. Classical multimedia big data compression relies on methods like the spatial-frequency transformation for lossy compression. Recent approaches use deep learning to further explore the limit of the data compression methods in communication-constrained use cases like the Internet of Things (IoT). In this article, we propose a novel method to significantly enhance the transformation-based compression standards like JPEG by transmitting much less data of one image at the sender's end. At the receiver's end, we propose a two-step method by combining the state-of-the-art signal processing based recovery method with a deep residual learning model to recover the original data. Therefore, in IoT use cases, a sender such as an edge device can transmit only 60% of the data of the original JPEG image without any additional calculation steps, while the image quality can still be recovered at the receiver's end, such as a cloud server, with a peak signal-to-noise ratio over 31 dB.
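For readers unfamiliar with the quality metric quoted above, the following is a minimal sketch of how peak signal-to-noise ratio (PSNR) is conventionally computed for 8-bit images. The function name and the flat-list pixel representation are assumptions made for illustration; the sketch is not the authors' evaluation code.

```python
import math


def psnr(original, recovered, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    peak is the maximum possible pixel value (255 for 8-bit samples).
    """
    if len(original) != len(recovered):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, recovered)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: every pixel is off by exactly one grey level (MSE = 1).
print(round(psnr([10, 20, 30], [11, 21, 31]), 2))
```

Higher values mean the recovered image is closer to the original; because the scale is logarithmic, each additional 10 dB corresponds to a tenfold drop in mean squared error.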

104 citations

Journal Article•DOI•

A Technical Overview of AV1

Jingning Han¹, Bohan Li¹, Debargha Mukherjee¹, Ching-Han Chiang¹, Adrian Grange¹, Cheng Chen¹, Hui Su¹, Sarah Parker¹, Sai Deng¹, Urvang Joshi¹, Yue Chen¹, Yunqing Wang¹, Paul Wilkins¹, Yaowu Xu¹, James Bankoski¹•Institutions (1)

Google¹

26 Feb 2021

TL;DR: A technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility is provided.

Abstract: The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than a 30% reduction in bit rate compared to its predecessor VP9 for the same decoded video quality. This article provides a technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility.

95 citations

Journal Article•DOI•

Learning for Video Compression With Recurrent Auto-Encoder and Recurrent Probability Model

Ren Yang¹, Fabian Mentzer¹, Luc Van Gool¹, Radu Timofte¹•Institutions (1)

ETH Zurich¹

01 Feb 2021-IEEE Journal of Selected Topics in Signal Processing

TL;DR: This paper proposes a Recurrent Learned Video Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent Probability Model (RPM), which achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM.

Abstract: The past few years have witnessed increasing interest in applying deep learning to video compression. However, the existing approaches compress a video frame with only a small number of reference frames, which limits their ability to fully exploit the temporal correlation among video frames. To overcome this shortcoming, this paper proposes a Recurrent Learned Video Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent Probability Model (RPM). Specifically, the RAE employs recurrent cells in both the encoder and decoder. As such, the temporal information in a large range of frames can be used for generating latent representations and reconstructing compressed outputs. Furthermore, the proposed RPM network recurrently estimates the Probability Mass Function (PMF) of the latent representation, conditioned on the distribution of previous latent representations. Due to the correlation among consecutive frames, the conditional cross entropy can be lower than the independent cross entropy, thus reducing the bit-rate. The experiments show that our approach achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM. Moreover, our approach outperforms the default Low-Delay P (LDP) setting of x265 on PSNR, and also has better performance on MS-SSIM than the SSIM-tuned x265 and the slowest setting of x265. The codes are available at https://github.com/RenYang-home/RLVC.git .
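The bit-rate argument in this abstract rests on the fact that conditioning on correlated context cannot increase entropy. The sketch below illustrates that fact with a made-up 2x2 joint distribution standing in for "previous latent" and "current latent" symbols; it assumes an ideal probability model (so cross entropy equals entropy) and is not the RPM network itself.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def conditional_entropy(joint):
    """H(X | Y) in bits, where joint[y][x] = P(Y = y, X = x)."""
    h = 0.0
    for row in joint:
        p_y = sum(row)
        if p_y > 0:
            h += p_y * entropy([v / p_y for v in row])
    return h


# Hypothetical joint distribution: consecutive symbols agree 80% of the time.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
marginal_x = [sum(row[x] for row in joint) for x in range(2)]

print("independent coding:", entropy(marginal_x), "bits/symbol")
print("conditional coding:", round(conditional_entropy(joint), 4), "bits/symbol")
```

With this toy distribution the independent cost is 1 bit per symbol, while the conditional cost is about 0.72 bits, which is the kind of saving a well-trained conditional model can approach when consecutive frames are correlated.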

87 citations

Journal Article•DOI•

MPEG Immersive Video Coding Standard

Jill Macdonald Boyce¹, Renaud Dore¹, Adrian Dziembowski², Julien Fleureau³, Joel Jung⁴, Bart Kroon⁵, Basel Salahieh¹, Vinod Kumar Malamal Vadakital⁶, Lu Yu⁷•Institutions (7)

Intel¹, Poznań University of Technology², InterDigital, Inc.³, Tencent⁴, Philips⁵, Nokia⁶, Zhejiang University⁷

10 Mar 2021

TL;DR: This article introduces the ISO/IEC MPEG Immersive Video (MIV) standard, MPEG-I Part 12, which is undergoing standardization and provides support for viewing immersive volumetric content captured by multiple cameras with six degrees of freedom within a viewing space that is determined by the camera arrangement in the capture rig.

Abstract: This article introduces the ISO/IEC MPEG Immersive Video (MIV) standard, MPEG-I Part 12, which is undergoing standardization. The draft MIV standard provides support for viewing immersive volumetric content captured by multiple cameras with six degrees of freedom (6DoF) within a viewing space that is determined by the camera arrangement in the capture rig. The bitstream format and decoding processes of the draft specification along with aspects of the Test Model for Immersive Video (TMIV) reference software encoder, decoder, and renderer are described. The use cases, test conditions, quality assessment methods, and experimental results are provided. In the TMIV, multiple texture and geometry views are coded as atlases of patches using a legacy 2-D video codec, while optimizing for bitrate, pixel rate, and quality. The design of the bitstream format and decoder is based on the visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC) standard, MPEG-I Part 5.

74 citations

Proceedings Article•

COIN: COmpression with Implicit Neural representations

Emilien Dupont¹, Adam Golinski¹, Milad Alizadeh², Yee Whye Teh¹, Arnaud Doucet¹•Institutions (2)

University of Oxford¹, Qualcomm²

03 Mar 2021

TL;DR: A new simple approach for image compression: instead of storing the RGB values for each pixel of an image, the weights of a neural network overfitted to the image are stored, and this approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights.

Abstract: We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.
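Because the code for an image is just the quantized MLP weights, the bit rate follows directly from the parameter count. The sketch below works through that arithmetic; the layer sizes, the 16-bit weight precision, and the 256x256 image size are hypothetical values chosen for illustration and are not taken from the paper.

```python
def mlp_param_count(in_dim, hidden, n_hidden_layers, out_dim):
    """Number of weights and biases in a fully connected MLP."""
    dims = [in_dim] + [hidden] * n_hidden_layers + [out_dim]
    return sum(a * b + b for a, b in zip(dims, dims[1:]))


def bits_per_pixel(n_params, bits_per_weight, height, width):
    """Bit rate when the stored code is only the quantized weights."""
    return n_params * bits_per_weight / (height * width)


# Hypothetical network: (x, y) -> 3 hidden layers of width 32 -> (R, G, B)
n_params = mlp_param_count(in_dim=2, hidden=32, n_hidden_layers=3, out_dim=3)
bpp = bits_per_pixel(n_params, bits_per_weight=16, height=256, width=256)
print(n_params, "parameters,", round(bpp, 3), "bits per pixel")
```

For comparison, uncompressed 8-bit RGB costs 24 bits per pixel, so the rate is controlled entirely by how small the network is and how coarsely its weights are quantized.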

73 citations

Journal Article•DOI•

Detecting Compressed Deepfake Videos in Social Networks Using Frame-Temporality Two-Stream Convolutional Network

Juan Hu¹, Xin Liao², Wei Wang², Zheng Qin¹•Institutions (2)

Hunan University¹, Chinese Academy of Sciences²

20 Apr 2021-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: This paper proposes a two-stream method that analyzes the frame level and temporality level of compressed Deepfake videos; the frame-level stream gradually prunes the network to prevent the model from fitting the compression noise.

Abstract: The development of technologies that can generate Deepfake videos is expanding rapidly. These videos are easily synthesized without leaving obvious traces of manipulation. Though forensic detection in high-definition video datasets has achieved remarkable results, the forensics of compressed videos is worth further exploring. In fact, compressed videos are common in social networks, such as videos from Instagram, WeChat, and TikTok. Therefore, how to identify compressed Deepfake videos becomes a fundamental issue. In this paper, we propose a two-stream method by analyzing the frame-level and temporality-level of compressed Deepfake videos. Since the video compression brings lots of redundant information to frames, the proposed frame-level stream gradually prunes the network to prevent the model from fitting the compression noise. Aiming at the problem that the temporal consistency in Deepfake videos might be ignored, we apply a temporality-level stream to extract temporal correlation features. When the scores from the two streams are combined, our proposed method performs better than the state-of-the-art methods in compressed Deepfake video detection.

Journal Article•DOI•

Hiding Data Using Efficient Combination of RSA Cryptography, and Compression Steganography Techniques

Osama F. Abdel Wahab¹, Ashraf A. M. Khalaf¹, Aziza I. Hussein², Hesham F. A. Hamed¹•Institutions (2)

Minia University¹, Effat University²

18 Feb 2021-IEEE Access

TL;DR: In this paper, a hybrid data compression algorithm was proposed to increase the security level of the compressed data by using RSA (Rivest-Shamir-Adleman) cryptography.

Abstract: Data compression is an important part of information security because compressed data is more secure and easy to handle. Effective data compression technology creates efficient, secure, and easy-to-connect data. There are two types of compression algorithm techniques, lossy and lossless. These technologies can be used in any data format such as text, audio, video, or image file. The main objective of this study was to reduce the physical space on the various storage media and reduce the time of sending data over the Internet with a complete guarantee of encrypting this data and hiding it from intruders. Two techniques are implemented, with data loss (lossy) and without data loss (lossless). In this paper, a hybrid data compression algorithm increases the input data to be encrypted by the RSA (Rivest–Shamir–Adleman) cryptography method to enhance the security level, and it can be used in executing lossy and lossless compacting steganography methods. This technique can be used to decrease the amount of transmitted data, aiding fast transmission over slow internet connections, or to take up less space on different storage media. The plain text is compressed by the Huffman coding algorithm, and the cover image is compressed by a Discrete Wavelet Transform (DWT)-based method that compacts the cover image through lossy compression in order to reduce the cover image’s dimensions. The least significant bit (LSB) will then be used to implant the encrypted data in the compacted cover image. We evaluated the system on criteria such as Savings Percentage, Compression Time, Compression Ratio, Bits per Pixel, Mean Squared Error, Peak Signal-to-Noise Ratio, Structural Similarity Index, and Compression Speed. This system shows a high-level performance and system methodology compared to other systems that use the same methodology.
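The final embedding step described above, hiding bits in the least significant bit (LSB) of cover-image samples, can be sketched as follows. This is a generic LSB scheme under simplifying assumptions: the cover is a flat byte string of pixel values, the payload is an arbitrary byte string standing in for the RSA ciphertext, and the Huffman, RSA, and DWT stages are omitted. The function names are hypothetical and the paper's exact embedding order is not specified in the abstract.

```python
def embed_lsb(cover, payload):
    """Hide payload bytes in the least significant bits of cover bytes (one bit per byte)."""
    bits = [(byte >> (7 - k)) & 1 for byte in payload for k in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this payload")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the payload bit
    return bytes(stego)


def extract_lsb(stego, n_bytes):
    """Recover n_bytes of payload from the LSBs of the stego bytes."""
    out = bytearray()
    for j in range(n_bytes):
        value = 0
        for k in range(8):
            value = (value << 1) | (stego[8 * j + k] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(64))          # stand-in for 64 greyscale pixel values
secret = b"hi"                    # stand-in for the encrypted message
stego = embed_lsb(cover, secret)
print(extract_lsb(stego, len(secret)))
```

Each cover sample changes by at most one intensity level, which is why LSB embedding tends to have only a small effect on distortion metrics such as MSE and PSNR.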

Journal Article•DOI•

Deep Compression for Dense Point Cloud Maps

Louis Wiesmann¹, Andres Milioto¹, Xieyuanli Chen¹, Cyrill Stachniss¹, Jens Behley¹•Institutions (1)

University of Bonn¹

16 Feb 2021

TL;DR: In this article, a deep convolutional autoencoder architecture is proposed to learn a set of local feature descriptors from which the point cloud can be reconstructed efficiently and effectively.

Abstract: Many modern robotics applications rely on 3D maps of the environment. Due to the large memory requirements of dense 3D maps, compression techniques are often necessary to store or transmit 3D maps efficiently. In this work, we investigate the problem of compressing dense 3D point cloud maps such as those obtained from an autonomous vehicle in large outdoor environments. We tackle the problem by learning a set of local feature descriptors from which the point cloud can be reconstructed efficiently and effectively. We propose a novel deep convolutional autoencoder architecture that directly operates on the points themselves so that we avoid voxelization. Additionally, we propose a deconvolution operator to upsample point clouds, which allows us to decompress to an arbitrary density. Our experiments show that our learned compression achieves better reconstructions at the same bit rate compared to other state-of-the-art compression algorithms. We furthermore demonstrate that our approach generalizes well to different LiDAR sensors. For example, networks learned on maps generated from KITTI point clouds still achieve state-of-the-art compression results for maps generated from nuScenes point clouds.

Proceedings Article•DOI•

Optimizing Error-Bounded Lossy Compression for Scientific Data by Dynamic Spline Interpolation

Kai Zhao¹, Sheng Di², Maxim Dmitriev³, Thierry-Laurent Tonellot³, Zizhong Chen¹, Franck Cappello²•Institutions (3)

University of California, Riverside¹, Argonne National Laboratory², Saudi Aramco³

01 Apr 2021

TL;DR: In this article, the authors present a novel error-bounded lossy compressor, based on a state-of-the-art prediction-based compression framework, that achieves substantially better compression quality than existing error-bounded lossy compressors at comparable compression speed.

Abstract: Today’s scientific simulations are producing vast volumes of data that cannot be stored and transferred efficiently because of limited storage capacity, parallel I/O bandwidth, and network bandwidth. The situation is getting worse over time because of the ever-increasing gap between relatively slow data transfer speed and fast-growing computation power in modern supercomputers. Error-bounded lossy compression is becoming one of the most critical techniques for resolving the big scientific data issue, in that it can significantly reduce the scientific data volume while guaranteeing that the reconstructed data is valid for users because of its compression-error-bounding feature. In this paper, we present a novel error-bounded lossy compressor based on a state-of-the-art prediction-based compression framework. Our solution exhibits substantially better compression quality than all of the existing error-bounded lossy compressors, with comparable compression speed. Specifically, our contribution is threefold. (1) We provide an in-depth analysis of why the best-existing prediction-based lossy compressor can only minimally improve the compression quality. (2) We propose a dynamic spline interpolation approach with a series of optimization strategies that can significantly improve the data prediction accuracy, substantially improving the compression quality in turn. (3) We perform a thorough evaluation using six real-world scientific simulation datasets across different science domains to evaluate our solution vs. all other related works. Experiments show that the compression ratio of our solution is higher than that of the second-best lossy compressor by 20%∼460% with the same error bound in most of the cases.
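To make the idea of prediction-based, error-bounded compression concrete, here is a deliberately simplified 1-D sketch. It is not the authors' compressor: it swaps their dynamic spline interpolation for a two-pass scheme (even samples predicted from the previous even sample, odd samples predicted by linear interpolation of reconstructed neighbours) and leaves out the entropy-coding stage. All function names are hypothetical. What it does show is the key mechanism: quantizing the prediction residual with bin width 2*eb keeps every reconstructed value within eb of the original.

```python
import math


def _order(n):
    """Coarse (even) samples first, then the odd samples between them."""
    return list(range(0, n, 2)) + list(range(1, n, 2))


def _predict(recon, i, n):
    """Predict sample i from samples that are already reconstructed."""
    if i % 2 == 0:
        return recon[i - 2] if i >= 2 else 0.0
    if i + 1 < n:
        return 0.5 * (recon[i - 1] + recon[i + 1])  # linear interpolation
    return recon[i - 1]


def compress(data, eb):
    """Return integer quantization codes; reconstruction error stays within eb."""
    n = len(data)
    recon, codes = [0.0] * n, [0] * n
    for i in _order(n):
        pred = _predict(recon, i, n)
        codes[i] = round((data[i] - pred) / (2 * eb))
        recon[i] = pred + 2 * eb * codes[i]  # mirror the decoder exactly
    return codes


def decompress(codes, eb):
    n = len(codes)
    recon = [0.0] * n
    for i in _order(n):
        recon[i] = _predict(recon, i, n) + 2 * eb * codes[i]
    return recon


signal = [math.sin(0.1 * k) for k in range(50)]
eb = 0.01
restored = decompress(compress(signal, eb), eb)
print(max(abs(a - b) for a, b in zip(signal, restored)) <= eb + 1e-12)
```

The better the predictor, the more the codes cluster around zero and the better they compress in a later entropy-coding stage, which is why improving prediction accuracy translates into a higher compression ratio at the same error bound.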

Journal Article•DOI•

Compressed Sensing Framework for Heart Sound Acquisition in Internet of Medical Things

Junxin Chen¹, Shuang Sun, Li-bo Zhang, Ben-qiang Yang, Wei Wang²•Institutions (2)

Northeastern University (China)¹, Macau University of Science and Technology²

11 Jun 2021-IEEE Transactions on Industrial Informatics

TL;DR: The proposed approach uses compressed sensing for signal sampling and a two-stage reconstruction: a peak detection technique applied to the tentatively recovered signal identifies whether the current segment contains a peak and, if so, its location.

Abstract: For continuous monitoring of cardiovascular diseases, this paper presents a novel framework for heart sound acquisition. The proposed approach uses compressed sensing for signal sampling, and a two-stage reconstruction is developed for reconstruction. The first stage aims to give a tentative recovered signal, on which a peak detection technique is developed to identify whether there is a peak in current segment and, if so, its location. With such information, an adaptive dictionary is selected for the second round reconstruction. Because the selected dictionary is adaptive to the morphology of current frame, the signal reconstruction performance is consequently promoted. Experiment results indicate that a satisfactory performance can be obtained when the frame length is 256 and the signal morphology is divided into 16 categories. Furthermore, the proposed algorithm is compared with a series of counterparts and the results well demonstrate the advantages of our proposal, especially at high compression ratios.

Journal Article•DOI•

Empirical Mode Decomposition and Wavelet Transform Based ECG Data Compression Scheme

Chandan Kumar Jha¹,², Maheshkumar H. Kolekar²•Institutions (2)

KIIT University¹, Indian Institute of Technology Patna²

01 Feb 2021-IRBM

TL;DR: A new electrocardiogram (ECG) data compression scheme that employs sifting-function-based empirical mode decomposition (EMD) and the discrete wavelet transform, offering better compression performance while preserving the key features of the signal.

Abstract: Objective: In health-care systems, compression is an essential tool to solve the storage and transmission problems. In this regard, this paper reports a new electrocardiogram (ECG) data compression scheme which employs sifting function based empirical mode decomposition (EMD) and discrete wavelet transform. Method: EMD based on sifting function is utilized to get the first intrinsic mode function (IMF). After EMD, the first IMF and four significant sifting functions are combined together. This combination is free from many irrelevant components of the signal. Discrete wavelet transform (DWT) with mother wavelet ‘bior4.4’ is applied to this combination. The transform coefficients obtained after DWT are passed through dead-zone quantization. It discards small transform coefficients lying around zero. Further, integer conversion of coefficients and run-length encoding are utilized to achieve a compressed form of ECG data. Results: Compression performance of the proposed scheme is evaluated using 48 ECG records of the MIT-BIH arrhythmia database. In the comparison of compression results, it is observed that the proposed method exhibits better performance than many recent ECG compressors. A mean opinion score test is also conducted to evaluate the true quality of the reconstructed ECG signals. Conclusion: The proposed scheme offers better compression performance while preserving the key features of the signal very well.
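The last three stages named in the Method section (dead-zone quantization, integer conversion, run-length encoding) can be sketched generically as below. The threshold, step size, and sample coefficients are made-up illustration values, the function names are hypothetical, and the EMD and DWT stages that would produce the coefficients are omitted; the paper's exact quantizer design is not given in the abstract.

```python
def dead_zone_quantize(coeffs, threshold, step):
    """Zero every coefficient with |c| <= threshold; round the rest to integer multiples of step."""
    return [0 if abs(c) <= threshold else int(round(c / step)) for c in coeffs]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def run_length_decode(runs):
    return [v for v, n in runs for _ in range(n)]


# Hypothetical transform coefficients: a few large values among many near zero.
coeffs = [5.2, 0.1, -0.05, 0.02, -3.9, 0.0, 0.03]
quantized = dead_zone_quantize(coeffs, threshold=0.2, step=1.0)
encoded = run_length_encode(quantized)
print(quantized)
print(encoded)
```

The dead zone turns the many near-zero coefficients into long runs of zeros, which is what makes the subsequent run-length stage effective.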

Proceedings Article•DOI•

Deep Learning in Latent Space for Video Prediction and Compression

Bowen Liu¹, Yu Chen¹, Shiyu Liu¹, Hun-Seok Kim¹•Institutions (1)

University of Michigan¹

20 Jun 2021

TL;DR: In this paper, a deep autoencoder trained with a generative adversarial network (GAN) learns a latent representation of each video frame, and a convolutional long short-term memory (ConvLSTM) network predicts the latent vector representation of the future frame.

Abstract: Learning-based video compression has achieved substantial progress during recent years. The most influential approaches adopt deep neural networks (DNNs) to remove spatial and temporal redundancies by finding the appropriate lower-dimensional representations of frames in the video. We propose a novel DNN-based framework that predicts and compresses video sequences in the latent vector space. The proposed method first learns the efficient lower-dimensional latent space representation of each video frame and then performs inter-frame prediction in that latent domain. The proposed latent domain compression of individual frames is obtained by a deep autoencoder trained with a generative adversarial network (GAN). To exploit the temporal correlation within the video frame sequence, we employ a convolutional long short-term memory (ConvLSTM) network to predict the latent vector representation of the future frame. We demonstrate our method with two applications, video compression and abnormal event detection, that share the identical latent frame prediction network. The proposed method exhibits superior or competitive performance compared to the state-of-the-art algorithms specifically designed for either video compression or anomaly detection.

Journal Article•DOI•

VVC Complexity and Software Implementation Analysis

Frank Bossen, Karsten Suhring¹, Adam Wieckowski¹, Shan Liu²•Institutions (2)

Heinrich Hertz Institute¹, Tencent²

09 Apr 2021-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: A complexity analysis of VVC and its VTM reference software is provided, showing that most of the compression gains of VVC over HEVC can be obtained at a small fraction of the resources needed by the VTM encoder under common test conditions.

Abstract: A steady increase in available processing power continues to drive advances in video compression technology. The recently completed Versatile Video Coding (VVC) standard aims to double the compression efficiency of HEVC and deliver the same quality of video at half the bitrate. To achieve this goal, VVC includes several new methods that improve coding efficiency at the cost of increased complexity. This paper provides a complexity analysis of VVC and its VTM reference software. Whereas VVC is more complex than HEVC, it remains readily implementable in software on current generation processors. Performance of practical decoders is reported, showing that real-time decoding of 8K content is feasible. An encoder is also presented, showing that most of the compression gains of VVC over HEVC can be obtained at a small fraction of the resources needed by the VTM encoder under common test conditions.

Proceedings Article•DOI•

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search

Jin Xu¹, Xu Tan², Renqian Luo³, Kaitao Song⁴, Jian Li¹, Tao Qin², Tie-Yan Liu²•Institutions (4)

Tsinghua University¹, Microsoft², University of Science and Technology of China³, Nanjing University of Science and Technology⁴

14 Aug 2021

TL;DR: This work uses techniques in neural architecture search (NAS) and proposes NAS-BERT, an efficient method for BERT compression that can find lightweight models with better accuracy than previous approaches, and can be directly applied to different downstream tasks with adaptive model sizes for different requirements of memory or latency.

Abstract: While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them difficult for real-world deployment. Therefore, model compression is necessary to reduce the computation and memory cost of pre-trained models. In this work, we aim to compress BERT and address the following two challenging practical issues: (1) The compression algorithm should be able to output multiple compressed models with different sizes and latencies, in order to support devices with different memory and latency limitations; (2) The algorithm should be downstream task agnostic, so that the compressed models are generally applicable for different downstream tasks. We leverage techniques in neural architecture search (NAS) and propose NAS-BERT, an efficient method for BERT compression. NAS-BERT trains a big supernet on a carefully designed search space containing a variety of architectures and outputs multiple compressed models with adaptive sizes and latency. Furthermore, the training of NAS-BERT is conducted on standard self-supervised pre-training tasks (e.g., masked language model) and does not depend on specific downstream tasks. Thus, the compressed models can be used across various downstream tasks. The technical challenge of NAS-BERT is that training a big supernet on the pre-training task is extremely costly. We employ several techniques including block-wise search, search space pruning, and performance approximation to improve search efficiency and accuracy. Extensive experiments on GLUE and SQuAD benchmark datasets demonstrate that NAS-BERT can find lightweight models with better accuracy than previous approaches, and can be directly applied to different downstream tasks with adaptive model sizes for different requirements of memory or latency.

Journal Article•DOI•

Advances in Video Compression System Using Deep Neural Network: A Review and Case Studies

Dandan Ding¹, Zhan Ma², Di Chen³, Qingshuang Chen⁴, Zoe Liu, Fengqing Zhu⁴•Institutions (4)

Hangzhou Normal University¹, Nanjing University², Google³, Purdue University⁴

16 Jan 2021

TL;DR: This article presents an end-to-end neural video coding framework that takes advantage of the stacked DNNs to efficiently and compactly code input raw videos via fully data-driven learning.

Abstract: Significant advances in video compression systems have been made in the past several decades to satisfy the near-exponential growth of Internet-scale video traffic. From the application perspective, we have identified three major functional blocks, including preprocessing, coding, and postprocessing, which have been continuously investigated to maximize the end-user quality of experience (QoE) under a limited bit rate budget. Recently, artificial intelligence (AI)-powered techniques have shown great potential to further increase the efficiency of the aforementioned functional blocks, both individually and jointly. In this article, we review recent technical advances in video compression systems extensively, with an emphasis on deep neural network (DNN)-based approaches, and then present three comprehensive case studies. On preprocessing, we show a switchable texture-based video coding example that leverages DNN-based scene understanding to extract semantic areas for the improvement of a subsequent video coder. On coding, we present an end-to-end neural video coding framework that takes advantage of the stacked DNNs to efficiently and compactly code input raw videos via fully data-driven learning. On postprocessing, we demonstrate two neural adaptive filters to, respectively, facilitate the in-loop and postfiltering for the enhancement of compressed frames. Finally, a companion website hosting the contents developed in this work can be accessed publicly at https://purdueviper.github.io/dnn-coding/ .

Journal Article•DOI•

Efficient Projected Frame Padding for Video-Based Point Cloud Compression

Li Li¹, Zhu Li², Shan Liu³, Houqiang Li¹•Institutions (3)

University of Science and Technology of China¹, University of Missouri–Kansas City², Tencent³

01 Jan 2021-IEEE Transactions on Multimedia

TL;DR: This paper designs padding algorithms tailored to each group of unoccupied pixels and proposes padding the residue of these pixels using the average residue of the occupied pixels in order to reduce the residue bitrate as much as possible.

Abstract: The state-of-the-art 2D-based dynamic point cloud (DPC) compression algorithm is the video-based point cloud compression (V-PCC) developed by the Moving Picture Experts Group (MPEG). It first projects the DPC patch by patch from 3D to 2D and organizes the projected patches into a video. The video is then efficiently compressed by High Efficiency Video Coding. However, there are many unoccupied pixels that may have a significant influence on the coding efficiency. These unoccupied pixels are currently padded using either the average of 4-neighbors for the geometry or the push-pull algorithm for the color attribute. While these algorithms are simple, the unoccupied pixels are not handled in the most efficient way. In this paper, we divide the unoccupied pixels into two groups: those that should be occupied and those that should not be occupied according to the occupancy map. We then design padding algorithms tailored to each group to improve the rate-distortion performance of the V-PCC reference software, for both the geometry and the color attribute. The first group is the unoccupied pixels that should be occupied according to the block-based occupancy map. We attempt to pad those pixels using the real points in the original DPC to improve the quality of the reconstructed DPC. Additionally, we attempt to maintain the smoothness of each block so as not to negatively influence the video compression efficiency. The second group is the unoccupied pixels that were correctly identified as unoccupied according to the block-based occupancy map. These pixels are useless for improving the reconstructed quality of the DPC. Therefore, we attempt to minimize the bit cost of these pixels without considering their reconstruction quality. The bit cost is determined by the residue of these pixels obtained by subtracting the prediction pixels from the original pixels. Therefore, we propose padding the residue using the average residue of the occupied pixels in order to minimize the bit cost. The proposed algorithms are implemented in the V-PCC and the corresponding HEVC reference software. The experimental results show the proposed algorithms can bring significant bitrate savings compared with the V-PCC.
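
The baseline padding mentioned in the abstract (filling unoccupied pixels with the average of their 4-neighbors) can be illustrated with a small sketch. This is not the V-PCC reference software and not the authors' proposed method; the function name `pad_unoccupied`, the list-of-lists image layout, and the pass-by-pass dilation order are all assumptions made for illustration.

```python
def pad_unoccupied(image, occupancy):
    """Fill unoccupied pixels with the mean of their already-filled 4-neighbors.

    image:     2D list of pixel values (hypothetical layout, one list per row)
    occupancy: 2D list of 0/1 flags, 1 meaning the pixel carries real data
    Filling proceeds in passes, growing outward from the occupied pixels.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    filled = [[bool(v) for v in row] for row in occupancy]
    while True:
        updates = []
        for y in range(h):
            for x in range(w):
                if filled[y][x]:
                    continue
                # Collect values of the up/down/left/right neighbors that are filled.
                vals = [out[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and filled[ny][nx]]
                if vals:
                    updates.append((y, x, sum(vals) / len(vals)))
        if not updates:
            break  # everything is filled, or no occupied pixel exists at all
        # Apply after the scan so each pass only sees values from earlier passes.
        for y, x, v in updates:
            out[y][x] = v
            filled[y][x] = True
    return out


if __name__ == "__main__":
    img = [[10, 0, 0], [0, 0, 0], [0, 0, 30]]
    occ = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
    for row in pad_unoccupied(img, occ):
        print(row)
```

Occupied pixels are never modified, so the padding only affects values that the decoder discards according to the occupancy map; the smooth fill is what keeps the padded region cheap to code.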

Journal Article•DOI•

Adaptive Multivariate Data Compression in Smart Metering Internet of Things

Mayukh Roy Chowdhury¹, Sharda Tripathi², Swades De¹•Institutions (2)

Indian Institute of Technology Delhi¹, Polytechnic University of Turin²

01 Feb 2021-IEEE Transactions on Industrial Informatics

TL;DR: Performance studies indicate that, compared to the state-of-the-art, the proposed technique achieves substantial bandwidth savings for data transmission over a communication network without compromising faithful reconstruction of the data at the receiver.

Abstract: Recent advances in electric metering infrastructure have given rise to the generation of very large volumes of data. Transmitting all of these data poses a significant challenge in the bandwidth- and storage-constrained Internet of Things (IoT), where smart meters act as sensors. In this work, a novel multivariate data compression scheme is proposed for smart metering IoT. The proposed algorithm exploits the cross correlation between different variables sensed by smart meters to reduce the dimension of data. Subsequently, sparsity in each of the decorrelated streams is utilized for temporal compression. To examine the quality of compression, the multivariate data is characterized using multivariate normal–autoregressive integrated moving average modeling before compression as well as after reconstruction of the compressed data. Our performance studies indicate that, compared to the state-of-the-art, the proposed technique achieves substantial bandwidth savings for data transmission over a communication network without compromising faithful reconstruction of the data at the receiver. The proposed algorithm is tested in a real smart metering setup and its time complexity is also analyzed.

Journal Article•DOI•

A Reconfigurable Neural Network ASIC for Detector Front-End Data Compression at the HL-LHC

Giuseppe Di Guglielmo¹, Farah Fahim², Christian Herwig², Manuel Blanco Valentin³, Javier Duarte⁴, Cristian V. Gingu², Philip Harris⁵, J. Hirschauer², Martin Kwok⁶, Vladimir Loncar⁷, Yingyi Luo³, Llovizna Miranda², Jennifer Ngadiuba⁸, D. Noonan⁹, Seda Ogrenci-Memik³, Maurizio Pierini⁷, Sioni Summers⁷, Nhan Tran²•Institutions (9)

Columbia University¹, Fermilab², Northwestern University³, University of California, San Diego⁴, Massachusetts Institute of Technology⁵, Brown University⁶, CERN⁷, California Institute of Technology⁸, Florida Institute of Technology⁹

04 May 2021-IEEE Transactions on Nuclear Science

TL;DR: It is demonstrated that a neural network autoencoder model can be implemented in a radiation tolerant ASIC to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile.

Abstract: Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic where trigger decisions are made. We demonstrate that a neural network (NN) autoencoder model can be implemented in a radiation-tolerant application-specific integrated circuit (ASIC) to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile. For our application, we consider the high-granularity calorimeter from the Compact Muon Solenoid (CMS) experiment at the CERN Large Hadron Collider. The advantage of the machine learning approach is in the flexibility and configurability of the algorithm. By changing the NN weights, a unique data compression algorithm can be deployed for each sensor in different detector regions and changing detector or collider conditions. To meet area, performance, and power constraints, we perform quantization-aware training to create an optimized NN hardware implementation. The design is achieved through the use of high-level synthesis tools and the hls4ml framework and was processed through synthesis and physical layout flows based on a low-power (LP)-CMOS 65-nm technology node. The flow anticipates 200 Mrad of ionizing radiation to select gates and reports a total area of 3.6 mm² and consumes 95 mW of power. The simulated energy consumption per inference is 2.4 nJ. This is the first radiation-tolerant on-detector ASIC implementation of an NN that has been designed for particle physics applications.

Journal Article•DOI•

Compression of Sparse and Dense Dynamic Point Clouds—Methods and Standards

Chao Cao¹, Marius Preda¹, Vladyslav Zakharchenko, Euee S. Jang², Titus Zaharia¹•Institutions (2)

Telecom SudParis¹, Hanyang University²

16 Jun 2021

TL;DR: A survey of the point cloud compression methods by organizing them with respect to the data structure, coding representation space, and prediction strategies is presented, providing guidance for potential standard implementors.

Abstract: In this article, a survey of point cloud compression (PCC) methods is presented, organized with respect to the data structure, coding representation space, and prediction strategies. Two paramount families of approaches reported in the literature, the projection-based and octree-based methods, are proven to be efficient for encoding dense and sparse point clouds, respectively. These approaches are the pillars on which the Moving Picture Experts Group Committee developed two PCC standards published as final international standards in 2020 and early 2021, respectively, under the names video-based PCC and geometry-based PCC. After surveying the current approaches for PCC, the technologies underlying the two standards are described in detail from an encoder perspective, providing guidance for potential standard implementors. In addition, experimental evaluations of the compression performance of both solutions are provided.

Journal Article•DOI•

Fast Multi-Type Tree Partitioning for Versatile Video Coding Using a Lightweight Neural Network

Sang-hyo Park¹, Je-Won Kang²•Institutions (2)

Kyungpook National University¹, Ewha Womans University²

01 Jan 2021-IEEE Transactions on Multimedia

TL;DR: A fast decision scheme using a lightweight neural network (LNN) is proposed to avoid redundant block partitioning in versatile video coding (VVC); it substantially decreases the encoding complexity of VVC with a slight coding loss under the All Intra configuration.

Abstract: In this paper, we propose a fast decision scheme using a lightweight neural network (LNN) to avoid redundant block partitioning in versatile video coding (VVC). A more versatile block structure, named the multi-type tree (MTT) structure, which includes binary trees (BTs) and ternary trees (TTs), is adopted by VVC, in addition to the traditional quadtree structure. The MTT improved the coding efficiency compared with previous video coding standards. However, the new tree structures, mainly TT, significantly increased the complexity of the VVC encoder. Although widespread application of VVC has been inhibited, this problem has not yet been investigated thoroughly in the literature. In this study, we first determine the statistical characteristics of coded parameters that exhibit correlation with the TT and develop two useful types of features, explicit VVC features (EVFs) and derived VVC features (DVFs), to facilitate the intra coding of VVC. These features can be obtained efficiently during the intra prediction before the determination of the best block partitioning during rate-distortion optimization in VVC encoding. Our LNN model decides whether to terminate the nested TT block structures subsequent to a quadtree based on the features. The experimental results confirm that the proposed method substantially decreases the encoding complexity of VVC with a slight coding loss under the All Intra configuration. Our code, models, and dataset are available at https://github.com/foriamweak/MTTPartitioning_LNN .

Journal Article•DOI•

Just Noticeable Distortion Profile Inference: A Patch-Level Structural Visibility Learning Approach

Xuelin Shen¹, Zhangkai Ni¹, Wenhan Yang¹, Xinfeng Zhang², Shiqi Wang¹, Sam Kwong¹•Institutions (2)

City University of Hong Kong¹, Chinese Academy of Sciences²

01 Jan 2021-IEEE Transactions on Image Processing

TL;DR: An effective approach to infer the just noticeable distortion (JND) profile based on patch-level structural visibility learning is proposed, with extensive experimental results showing the superiority of the proposed approach over the state-of-the-art.

Abstract: In this paper, we propose an effective approach to infer the just noticeable distortion (JND) profile based on patch-level structural visibility learning. Instead of pixel-level JND profile estimation, the image patch, which is regarded as the basic processing unit to better correlate with the human perception, can be further decomposed into three conceptually independent components for visibility estimation. In particular, to incorporate the structural degradation into the patch-level JND model, a deep learning-based structural degradation estimation model is trained to approximate the masking of structural visibility. In order to facilitate the learning process, a JND dataset is further established, including 202 pristine images and 7878 distorted images generated by advanced compression algorithms based on the upcoming Versatile Video Coding (VVC) standard. Extensive experimental results further show the superiority of the proposed approach over the state-of-the-art. Our dataset is available at: https://github.com/ShenXuelin-CityU/PWJNDInfer .

Journal Article•DOI•

UAV Anti-Jamming Video Transmissions With QoE Guarantee: A Reinforcement Learning-Based Approach

Liang Xiao¹, Yuzhen Ding¹, Jinhao Huang¹, Sicong Liu¹, Yuliang Tang¹, Huaiyu Dai²•Institutions (2)

Xiamen University¹, North Carolina State University²

09 Jun 2021-IEEE Transactions on Communications

TL;DR: A reinforcement learning (RL)-based UAV anti-jamming video transmission scheme to choose the video compression quantization parameter, the channel coding rate, the modulation and power control strategies against jamming attacks is proposed.

Abstract: Unmanned aerial vehicles (UAVs) that are widely utilized for video capturing, processing and transmission have to address jamming attacks with dynamic topology and limited energy. In this paper, we propose a reinforcement learning (RL)-based UAV anti-jamming video transmission scheme to choose the video compression quantization parameter, the channel coding rate, the modulation and power control strategies against jamming attacks. More specifically, this scheme applies RL to choose the UAV video compression and transmission policy based on the observed video task priority, the UAV-controller channel state and the received jamming power. This scheme enables the UAV to guarantee the video quality-of-experience (QoE) and reduce the energy consumption without relying on the jamming model or the video service model. A safe RL-based approach is further proposed, which uses deep learning to accelerate the UAV learning process and reduce the video transmission outage probability. The computational complexity is provided and the optimal utility of the UAV is derived and verified via simulations. Simulation results show that the proposed schemes significantly improve the video quality and reduce the transmission latency and energy consumption of the UAV compared with existing schemes.

Proceedings Article•DOI•

DCT based Enhanced Tchebichef Moment using Huffman Encoding Algorithm (ETMH)

S. Anantha Babu, R. Joshua Samuel Raj¹, Arul Xavier V M², N. Muthukumaran•Institutions (2)

CMR Institute of Technology¹, Karunya University²

04 Feb 2021

TL;DR: In this paper, the authors propose an Enhanced Tchebichef Moment using Huffman Encoding (ETMH) technique, which takes the original image and converts it into a matrix form.

Abstract: Data Compression includes enhancing a flood of images and dynamically rearranging codes. The consequent arrangement of compressed codes will be greater and simpler than the initial set of images. The choice is to produce a clear-cut code for a definite image or group of images. The proposed algorithm is close to an assortment of data and rules that are consumed to be shared with the input images and the numeric value to figure out which repeated code(s) can be sustained. The current Huffman coding supports only binary images and fails to give a reasonable compression ratio for the actual output of the encoder, which is controlled by a set of probabilities. With this kind of coding, a symbol that has a very high probability of occurrence produces a code with few bits, while a symbol with a low probability produces a code with a larger number of bits. The source images are then reduced by applying an improved Huffman encoding algorithm to obtain a high compression ratio. The proposed Enhanced Tchebichef Moment using Huffman Encoding (ETMH) technique takes the original image and converts it into a matrix form. The maximum number of network sizes is separated into various non-overlapping small-sized block matrices. The lesser the number of pixel size, grouping blocks are done to achieve the compression ratio. The proposed ETMH algorithm is useful for different image formats in finding strategies that provide improved results and quality. The proposed ETMH algorithm is implemented for gray-scale images and, consequently, a compression ratio table is produced. The proposed ETMH algorithm is tested and implemented using different metrics, such as MSE, SNR and PSNR, in MATLAB.
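
The relationship the abstract describes between symbol probability and code length is the defining property of Huffman coding. The sketch below builds a standard textbook Huffman code with Python's standard library; it is not the ETMH method from the paper, and the function name `huffman_code` and the sample input are assumptions chosen for illustration.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Return a dict mapping each symbol in `data` to its Huffman bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)  # least frequent subtree
        c2, _, t2 = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    sample = "aaaaaaaabbbc"
    codes = huffman_code(sample)
    total_bits = sum(len(codes[s]) for s in sample)
    print(codes, total_bits)
```

For the sample string, the frequent symbol `a` receives a 1-bit code while `b` and `c` receive 2-bit codes, so the 12 symbols need 16 bits instead of the 24 bits a fixed 2-bit code would use.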

Journal Article•DOI•

A 3.3 Gbps CCSDS 123.0-B-1 Multispectral & Hyperspectral Image Compression Hardware Accelerator on a Space-Grade SRAM FPGA

Antonis Tsigkanos¹, N. Kranitis¹, George Theodorou¹, Antonis Paschalis¹•Institutions (1)

National and Kapodistrian University of Athens¹

01 Jan 2021-IEEE Transactions on Emerging Topics in Computing

TL;DR: A very high data-rate performance hardware accelerator is presented implementing the CCSDS-123.0-B-1 algorithm as an IP core targeting a space-grade FPGA, achieving high throughput performance.

Abstract: The explosive growth of data volume from next generation high-resolution and high-speed hyperspectral remote sensing systems will compete with the limited on-board storage resources and bandwidth available for the transmission of data to ground stations making hyperspectral image compression a mission critical and challenging on-board payload data processing task. The Consultative Committee for Space Data Systems (CCSDS) has issued recommended standard CCSDS-123.0-B-1 for lossless multispectral and hyperspectral image compression. In this paper, a very high data-rate performance hardware accelerator is presented implementing the CCSDS-123.0-B-1 algorithm as an IP core targeting a space-grade FPGA. For the first time, the introduced architecture based on the principles of C-slow retiming, exploits the inherent task-level parallelism of the algorithm under BIP ordering and implements a reconfigurable fine-grained pipeline in critical feedback loops, achieving high throughput performance. The CCSDS-123.0-B-1 IP core achieves beyond the current state-of-the-art data-rate performance with a maximum throughput of 213 MSamples/s (3.3 Gbps @ 16-bits) using 11 percent of LUTs and 27 percent of BRAMs of the Virtex-5QV FPGA resources for a typical hyperspectral image, leveraging the full throughput of a single SpaceFibre lane. To the best of our knowledge, it is the fastest implementation of CCSDS-123.0-B-1 targeting a space-grade FPGA to date.

Journal Article•DOI•

Occupancy Map Guided Fast Video-based Dynamic Point Cloud Coding

Jian Xiong¹, Hao Gao¹, Miaohui Wang², Hongliang Li³, Weisi Lin⁴•Institutions (4)

Nanjing University of Posts and Telecommunications¹, Shenzhen University², University of Electronic Science and Technology of China³, Nanyang Technological University⁴

02 Mar 2021-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: This paper proposes an occupancy map guided fast V-PCC method, in which coding is performed on the different types of blocks, by taking advantage of the fact that occupancy maps can explicitly indicate the block types.

Abstract: In video-based dynamic point cloud compression (V-PCC), 3D point clouds are projected into patches, and then the patches are padded into 2D images suitable for the video compression framework. However, the patch projection-based method produces a large number of empty pixels, and the far and near components are projected to generate different 2D images (video frames). As a result, the generated video has a high resolution and a doubled frame rate, so V-PCC has huge computational complexity. This paper proposes an occupancy map guided fast V-PCC method. Firstly, the relationship between the prediction coding and block complexity is studied based on a local linear image gradient model. Secondly, according to the V-PCC strategies of patch projection and block generation, we investigate the differences of rate-distortion characteristics between different types of blocks, and the temporal correlations between the far and near layers. Finally, by taking advantage of the fact that occupancy maps can explicitly indicate the block types, we propose an occupancy map guided fast coding method, in which coding is performed on the different types of blocks. Experiments on typical dynamic point clouds show that the proposed method achieves an average time saving of 43.66% at the cost of only 0.27% and 0.16% Bjontegaard Delta (BD) rate increment under the geometry Point-to-Point (D1) error and attribute Luma Peak Signal-to-Noise Ratio (PSNR), respectively.
