
Showing papers on "Data compression" published in 2021


Journal ArticleDOI
TL;DR: Versatile Video Coding (VVC) was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression and to support a wider variety of today's media content and emerging applications.
Abstract: Versatile Video Coding (VVC) was finalized in July 2020 as the most recent international video coding standard. It was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today’s media content and emerging applications. This paper provides an overview of the novel technical features for new applications and the core compression technologies for achieving significant bit rate reductions in the neighborhood of 50% over its predecessor for equal video quality, the High Efficiency Video Coding (HEVC) standard, and 75% over the currently most-used format, the Advanced Video Coding (AVC) standard. It is explained how these new features in VVC provide greater versatility for applications. Highlighted applications include video with resolutions beyond standard- and high-definition, video with high dynamic range and wide color gamut, adaptive streaming with resolution changes, computer-generated and screen-captured video, ultralow-delay streaming, 360° immersive video, and multilayer coding, e.g., for scalability. Furthermore, early implementations are presented to show that the new VVC standard is implementable and ready for real-world deployment.

250 citations


Journal ArticleDOI
19 Jan 2021
TL;DR: This article summarizes these developments in video coding standardization after AVC, and focuses on providing an overview of the first version of VVC, including comparisons against HEVC.
Abstract: In the last 17 years, since the finalization of the first version of the now-dominant H.264/Moving Picture Experts Group-4 (MPEG-4) Advanced Video Coding (AVC) standard in 2003, two major new generations of video coding standards have been developed. These include the standards known as High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC). HEVC was finalized in 2013, repeating the ten-year cycle time set by its predecessor and providing about 50% bit-rate reduction over AVC. The cycle was shortened by three years for the VVC project, which was finalized in July 2020, yet again achieving about a 50% bit-rate reduction over its predecessor (HEVC). This article summarizes these developments in video coding standardization after AVC. It especially focuses on providing an overview of the first version of VVC, including comparisons against HEVC. Besides further advances in hybrid video compression, as in previous development cycles, the broad versatility of the application domain that is highlighted in the title of VVC is explained. Included in VVC is the support for a wide range of applications beyond the typical standard- and high-definition camera-captured content codings, including features to support computer-generated/screen content, high dynamic range content, multilayer and multiview coding, and support for immersive media such as 360° video.

246 citations
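
The bit-rate figures quoted in the two VVC abstracts above are consistent with each other: two successive reductions of about 50% multiply rather than add. The short sketch below works through that arithmetic; the function name `combined_reduction` is a hypothetical helper introduced only for this illustration, and the 50% values are the approximate figures from the abstracts, not measured results.

```python
def combined_reduction(*reductions):
    """Overall bit-rate reduction after applying several reductions in sequence.

    Each argument is a fractional reduction relative to the previous
    generation (0.5 means "half the bit rate of the predecessor").
    """
    remaining = 1.0
    for r in reductions:
        # Each generation keeps (1 - r) of the predecessor's bit rate.
        remaining *= 1.0 - r
    return 1.0 - remaining


# HEVC: about 50% below AVC; VVC: about 50% below HEVC.
vvc_vs_avc = combined_reduction(0.5, 0.5)
print(f"VVC vs. AVC: {vvc_vs_avc:.0%} reduction")  # prints "VVC vs. AVC: 75% reduction"
```

This matches the "75% over AVC" figure stated in the first abstract.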


Journal ArticleDOI
TL;DR: Insight is gained into various open issues and research directions to explore the promising areas for future developments in data compression techniques and their applications.

136 citations


Journal ArticleDOI
TL;DR: This paper proposes the first end-to-end deep video compression framework that can outperform the widely used video coding standard H.264 and be even on par with the latest standard H.265.
Abstract: Traditional video compression approaches build upon the hybrid coding framework with motion-compensated prediction and residual transform coding. In this paper, we propose the first end-to-end deep video compression framework to take advantage of both the classical compression architecture and the powerful non-linear representation ability of neural networks. Our framework employs pixel-wise motion information, which is learned from an optical flow network and further compressed by an auto-encoder network to save bits. The other compression components are also implemented by the well-designed networks for high efficiency. All the modules are jointly optimized by using the rate-distortion trade-off and can collaborate with each other. More importantly, the proposed deep video compression framework is very flexible and can be easily extended by using lightweight or advanced networks for higher speed or better efficiency. We also propose to introduce the adaptive quantization layer to reduce the number of parameters for variable bitrate coding. Comprehensive experimental results demonstrate the effectiveness of the proposed framework on the benchmark datasets.

123 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper proposes a feature-space video coding network (FVC) that performs all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space.
Abstract: Learning-based video compression has attracted increasing attention in the past few years. The previous hybrid coding approaches rely on pixel space operations to reduce spatial and temporal redundancy, which may suffer from inaccurate motion estimation or less effective motion compensation. In this work, we propose a feature-space video coding network (FVC) by performing all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space. Specifically, in the proposed deformable compensation module, we first apply motion estimation in the feature space to produce motion information (i.e., the offset maps), which will be compressed by an auto-encoder style network. Then we perform motion compensation by using deformable convolution and generate the predicted feature. After that, we compress the residual feature between the feature from the current frame and the predicted feature from our deformable compensation module. For better frame reconstruction, the reference features from multiple previous reconstructed frames are also fused by using the nonlocal attention mechanism in the multi-frame feature fusion module. Comprehensive experimental results demonstrate that the proposed framework achieves state-of-the-art performance on four benchmark datasets including HEVC, UVG, VTL and MCL-JCV.

120 citations


Journal ArticleDOI
TL;DR: A novel method is proposed to significantly enhance transformation-based compression standards like JPEG by transmitting much less data of one image at the sender's end, together with a two-step method at the receiver's end that combines a state-of-the-art signal-processing-based recovery method with a deep residual learning model to recover the original data.
Abstract: With the development of big data and network technology, there are more use cases, such as edge computing, that require more secure and efficient multimedia big data transmission. Data compression methods can help achieve many tasks like providing data integrity, protection, as well as efficient transmission. Classical multimedia big data compression relies on methods like the spatial-frequency transformation for compressing with loss. Recent approaches use deep learning to further explore the limit of the data compression methods in communication constrained use cases like the Internet of Things (IoT). In this article, we propose a novel method to significantly enhance the transformation-based compression standards like JPEG by transmitting much less data of one image at the sender's end. At the receiver's end, we propose a two-step method by combining the state-of-the-art signal processing based recovery method with a deep residual learning model to recover the original data. Therefore, in the IoT use cases, a sender such as an edge device can transmit only 60% of the data of the original JPEG image without any additional calculation steps, but the image quality can still be recovered at the receiver's end, such as a cloud server, with a peak signal-to-noise ratio over 31 dB.

104 citations


Journal ArticleDOI
26 Feb 2021
TL;DR: A technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility is provided.
Abstract: The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than a 30% reduction in bit rate compared to its predecessor VP9 for the same decoded video quality. This article provides a technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility.

95 citations


Journal ArticleDOI
TL;DR: This paper proposes a Recurrent Learned Video Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent Probability Model (RPM), which achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM.
Abstract: The past few years have witnessed increasing interest in applying deep learning to video compression. However, the existing approaches compress a video frame with only a small number of reference frames, which limits their ability to fully exploit the temporal correlation among video frames. To overcome this shortcoming, this paper proposes a Recurrent Learned Video Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent Probability Model (RPM). Specifically, the RAE employs recurrent cells in both the encoder and decoder. As such, the temporal information in a large range of frames can be used for generating latent representations and reconstructing compressed outputs. Furthermore, the proposed RPM network recurrently estimates the Probability Mass Function (PMF) of the latent representation, conditioned on the distribution of previous latent representations. Due to the correlation among consecutive frames, the conditional cross entropy can be lower than the independent cross entropy, thus reducing the bit-rate. The experiments show that our approach achieves the state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM. Moreover, our approach outperforms the default Low-Delay P (LDP) setting of x265 on PSNR, and also has better performance on MS-SSIM than the SSIM-tuned x265 and the slowest setting of x265. The codes are available at https://github.com/RenYang-home/RLVC.git .

87 citations
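
The RLVC abstract above argues that, because consecutive frames are correlated, coding a latent conditioned on previous latents can cost fewer bits than coding it independently. The toy calculation below illustrates that claim with a hand-picked joint distribution over two binary symbols. It assumes ideal probability models, in which case cross entropy equals entropy; the distribution and all function names are hypothetical and not taken from the paper.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal_entropy_bits(joint):
    """H(Y): cost of coding the current symbol without any context."""
    n_cols = len(joint[0])
    p_y = [sum(row[y] for row in joint) for y in range(n_cols)]
    return entropy_bits(p_y)


def conditional_entropy_bits(joint):
    """H(Y | X): cost of coding the current symbol given the previous one."""
    total = 0.0
    for row in joint:
        p_x = sum(row)
        if p_x > 0:
            total += p_x * entropy_bits([v / p_x for v in row])
    return total


# joint[x][y] = P(previous symbol = x, current symbol = y).
# Hypothetical numbers: the current symbol repeats the previous one 80% of the time.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

independent = marginal_entropy_bits(joint)
conditional = conditional_entropy_bits(joint)
print(f"independent: {independent:.3f} bits, conditional: {conditional:.3f} bits")
```

With these numbers the independent cost is 1 bit per symbol and the conditional cost is about 0.722 bits. The gap shrinks to zero as the two symbols become independent.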


Journal ArticleDOI
10 Mar 2021
TL;DR: This article introduces the ISO/IEC MPEG Immersive Video (MIV) standard, MPEG-I Part 12, which is undergoing standardization and provides support for viewing immersive volumetric content captured by multiple cameras with six degrees of freedom within a viewing space that is determined by the camera arrangement in the capture rig.
Abstract: This article introduces the ISO/IEC MPEG Immersive Video (MIV) standard, MPEG-I Part 12, which is undergoing standardization. The draft MIV standard provides support for viewing immersive volumetric content captured by multiple cameras with six degrees of freedom (6DoF) within a viewing space that is determined by the camera arrangement in the capture rig. The bitstream format and decoding processes of the draft specification along with aspects of the Test Model for Immersive Video (TMIV) reference software encoder, decoder, and renderer are described. The use cases, test conditions, quality assessment methods, and experimental results are provided. In the TMIV, multiple texture and geometry views are coded as atlases of patches using a legacy 2-D video codec, while optimizing for bitrate, pixel rate, and quality. The design of the bitstream format and decoder is based on the visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC) standard, MPEG-I Part 5.

74 citations


Proceedings Article
03 Mar 2021
TL;DR: A new simple approach for image compression: instead of storing the RGB values for each pixel of an image, the weights of a neural network overfitted to the image are stored, and this approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights.
Abstract: We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.

73 citations


Journal ArticleDOI
TL;DR: This paper proposes a two-stream method that analyzes the frame level and temporality level of compressed Deepfake videos; the frame-level stream gradually prunes the network to prevent the model from fitting the compression noise.
Abstract: The development of technologies that can generate Deepfake videos is expanding rapidly. These videos are easily synthesized without leaving obvious traces of manipulation. Though forensic detection in high-definition video datasets has achieved remarkable results, the forensics of compressed videos is worth further exploring. In fact, compressed videos are common in social networks, such as videos from Instagram, WeChat, and TikTok. Therefore, how to identify compressed Deepfake videos becomes a fundamental issue. In this paper, we propose a two-stream method by analyzing the frame-level and temporality-level of compressed Deepfake videos. Since the video compression brings lots of redundant information to frames, the proposed frame-level stream gradually prunes the network to prevent the model from fitting the compression noise. Aiming at the problem that the temporal consistency in Deepfake videos might be ignored, we apply a temporality-level stream to extract temporal correlation features. When the scores from the two streams are combined, our proposed method performs better than the state-of-the-art methods in compressed Deepfake video detection.

Journal ArticleDOI
TL;DR: In this paper, a hybrid data compression algorithm was proposed to increase the security level of the compressed data by using RSA (Rivest-Shamir-Adleman) cryptography.
Abstract: Data compression is an important part of information security because compressed data is more secure and easier to handle. Effective data compression technology creates efficient, secure, and easy-to-connect data. There are two types of compression algorithm techniques, lossy and lossless. These technologies can be used in any data format such as text, audio, video, or image file. The main objective of this study was to reduce the physical space on the various storage media and reduce the time of sending data over the Internet with a complete guarantee of encrypting this data and hiding it from intruders. Two techniques are implemented, with data loss (lossy) and without data loss (lossless). In this paper, a hybrid data compression algorithm increases the input data to be encrypted by the RSA (Rivest–Shamir–Adleman) cryptography method to enhance the security level, and it can be used in executing lossy and lossless compacting steganography methods. This technique can be used to decrease the amount of transmitted data, aiding fast transmission over a slow internet connection, or to take up less space on different storage media. The plain text is compressed by the Huffman coding algorithm, and the cover image is compressed by a discrete wavelet transform (DWT) based method that compacts the cover image through lossy compression in order to reduce the cover image’s dimensions. The least significant bit (LSB) is then used to embed the encrypted data in the compacted cover image. We evaluated the system on criteria such as Saving Percentage, Compression Time, Compression Ratio, Bits per Pixel, Mean Squared Error, Peak Signal to Noise Ratio, Structural Similarity Index, and Compression Speed. This system shows a high-level performance and system methodology compared to other systems that use the same methodology.
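
Two of the building blocks named in the abstract above, Huffman coding of the plain text and least-significant-bit (LSB) embedding into a cover image, can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the authors' system: the RSA encryption and DWT steps are omitted, the cover image is a flat list of 8-bit pixel values, and every function name is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_decode(bits, code):
    """Decode a bitstring produced with the given prefix code."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


def embed_lsb(pixels, bits):
    """Write one payload bit into the least significant bit of each pixel."""
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for payload")
    stego = list(pixels)
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & ~1) | int(b)
    return stego


def extract_lsb(pixels, n_bits):
    """Read back the first n_bits least significant bits."""
    return "".join(str(p & 1) for p in pixels[:n_bits])


message = "abracadabra"
code = huffman_code(message)
payload = "".join(code[ch] for ch in message)   # 23 bits instead of 88
cover = [(37 * i) % 256 for i in range(32)]     # stand-in for image pixels
stego = embed_lsb(cover, payload)
recovered = huffman_decode(extract_lsb(stego, len(payload)), code)
print(recovered)  # prints "abracadabra"
```

Each pixel changes by at most one intensity level, which is why LSB embedding is hard to see; the receiver needs the code table and the payload length to decode.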

Journal ArticleDOI
16 Feb 2021
TL;DR: In this article, a deep convolutional autoencoder architecture is proposed to learn a set of local feature descriptors from which the point cloud can be reconstructed efficiently and effectively.
Abstract: Many modern robotics applications rely on 3D maps of the environment. Due to the large memory requirements of dense 3D maps, compression techniques are often necessary to store or transmit 3D maps efficiently. In this work, we investigate the problem of compressing dense 3D point cloud maps such as those obtained from an autonomous vehicle in large outdoor environments. We tackle the problem by learning a set of local feature descriptors from which the point cloud can be reconstructed efficiently and effectively. We propose a novel deep convolutional autoencoder architecture that directly operates on the points themselves so that we avoid voxelization. Additionally, we propose a deconvolution operator to upsample point clouds, which allows us to decompress to an arbitrary density. Our experiments show that our learned compression achieves better reconstructions at the same bit rate compared to other state-of-the-art compression algorithms. We furthermore demonstrate that our approach generalizes well to different LiDAR sensors. For example, networks learned on maps generated from KITTI point clouds still achieve state-of-the-art compression results for maps generated from nuScenes point clouds.

Proceedings ArticleDOI
01 Apr 2021
TL;DR: This paper presents a novel error-bounded lossy compressor, built on a state-of-the-art prediction-based compression framework, that achieves substantially better compression quality than existing error-bounded lossy compressors at comparable compression speed.
Abstract: Today’s scientific simulations are producing vast volumes of data that cannot be stored and transferred efficiently because of limited storage capacity, parallel I/O bandwidth, and network bandwidth. The situation is getting worse over time because of the ever-increasing gap between relatively slow data transfer speed and fast-growing computation power in modern supercomputers. Error-bounded lossy compression is becoming one of the most critical techniques for resolving the big scientific data issue, in that it can significantly reduce the scientific data volume while guaranteeing that the reconstructed data is valid for users because of its compression-error-bounding feature. In this paper, we present a novel error-bounded lossy compressor based on a state-of-the-art prediction-based compression framework. Our solution exhibits substantially better compression quality than all of the existing error-bounded lossy compressors, with comparable compression speed. Specifically, our contribution is threefold. (1) We provide an in-depth analysis of why the best-existing prediction-based lossy compressor can only minimally improve the compression quality. (2) We propose a dynamic spline interpolation approach with a series of optimization strategies that can significantly improve the data prediction accuracy, substantially improving the compression quality in turn. (3) We perform a thorough evaluation using six real-world scientific simulation datasets across different science domains to evaluate our solution vs. all other related works. Experiments show that the compression ratio of our solution is higher than that of the second-best lossy compressor by 20%∼460% with the same error bound in most of the cases.
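
To make the idea of prediction-based, error-bounded lossy compression in the abstract above concrete, here is a minimal sketch. It deliberately swaps the paper's dynamic spline interpolation for the simplest possible predictor (the previously reconstructed value) and assumes uniform quantization of the prediction residual with bin width 2 * eb; the function names and the parameter `eb` are hypothetical. The integer codes would normally be passed to an entropy coder, which is omitted here.

```python
def compress(data, eb):
    """Quantize prediction residuals so every value is recovered within +/- eb."""
    codes = []
    prev = 0.0  # last *reconstructed* value, so the decoder can mirror the prediction
    for x in data:
        # Nearest multiple of the bin width 2*eb to the prediction residual.
        q = round((x - prev) / (2 * eb))
        codes.append(q)
        prev = prev + q * 2 * eb
    return codes


def decompress(codes, eb):
    """Rebuild the signal by replaying the same predictor on the integer codes."""
    out = []
    prev = 0.0
    for q in codes:
        prev = prev + q * 2 * eb
        out.append(prev)
    return out


signal = [0.0, 0.11, 0.25, 0.31, 0.9, 1.2, 1.21, 1.19]
eb = 0.05
codes = compress(signal, eb)
restored = decompress(codes, eb)
max_err = max(abs(a - b) for a, b in zip(signal, restored))
print(codes, max_err <= eb + 1e-12)
```

Because the encoder predicts from reconstructed values rather than original ones, quantization errors do not accumulate, and rounding to the nearest bin keeps each pointwise error within the bound (up to floating-point rounding). A better predictor yields smaller, more repetitive codes, which is where the compression gain comes from.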

Journal ArticleDOI
TL;DR: The proposed approach uses compressed sensing for signal sampling and a two-stage reconstruction: the first stage produces a tentative recovered signal, on which a peak detection technique identifies whether there is a peak in the current segment and, if so, its location.
Abstract: For continuous monitoring of cardiovascular diseases, this paper presents a novel framework for heart sound acquisition. The proposed approach uses compressed sensing for signal sampling, and a two-stage reconstruction is developed for reconstruction. The first stage aims to give a tentative recovered signal, on which a peak detection technique is developed to identify whether there is a peak in current segment and, if so, its location. With such information, an adaptive dictionary is selected for the second round reconstruction. Because the selected dictionary is adaptive to the morphology of current frame, the signal reconstruction performance is consequently promoted. Experiment results indicate that a satisfactory performance can be obtained when the frame length is 256 and the signal morphology is divided into 16 categories. Furthermore, the proposed algorithm is compared with a series of counterparts and the results well demonstrate the advantages of our proposal, especially at high compression ratios.

Journal ArticleDOI
01 Feb 2021 - IRBM
TL;DR: A new electrocardiogram (ECG) data compression scheme which employs sifting function based empirical mode decomposition (EMD) and discrete wavelet transform and offers better compression performance with preserving the key features of the signal very well.
Abstract: Objective: In health-care systems, compression is an essential tool to solve the storage and transmission problems. In this regard, this paper reports a new electrocardiogram (ECG) data compression scheme which employs sifting function based empirical mode decomposition (EMD) and discrete wavelet transform. Method: EMD based on sifting function is utilized to get the first intrinsic mode function (IMF). After EMD, the first IMF and four significant sifting functions are combined together. This combination is free from many irrelevant components of the signal. Discrete wavelet transform (DWT) with mother wavelet ‘bior4.4’ is applied to this combination. The transform coefficients obtained after DWT are passed through dead-zone quantization. It discards small transform coefficients lying around zero. Further, integer conversion of coefficients and run-length encoding are utilized to achieve a compressed form of ECG data. Results: Compression performance of the proposed scheme is evaluated using 48 ECG records of the MIT-BIH arrhythmia database. In the comparison of compression results, it is observed that the proposed method exhibits better performance than many recent ECG compressors. A mean opinion score test is also conducted to evaluate the true quality of the reconstructed ECG signals. Conclusion: The proposed scheme offers better compression performance with preserving the key features of the signal very well.
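
The last stages of the ECG pipeline described above (dead-zone quantization, integer conversion, and run-length encoding) can be sketched as follows. This is an illustrative sketch only: the EMD and DWT stages are skipped, the input list stands in for transform coefficients, and the threshold, step size, and function names are hypothetical choices rather than values from the paper.

```python
def dead_zone_quantize(coeffs, step, dead_zone):
    """Zero out coefficients inside the dead zone and round the rest to integers."""
    return [0 if abs(c) < dead_zone else int(round(c / step)) for c in coeffs]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def run_length_decode(runs):
    """Expand (value, run length) pairs back into the integer sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


# Stand-in for wavelet coefficients: a few large values among many near-zero ones.
coeffs = [5.2, 0.1, -0.2, 0.05, 0.0, -3.9, 0.3, 0.1]
quantized = dead_zone_quantize(coeffs, step=1.0, dead_zone=0.5)
runs = run_length_encode(quantized)
print(quantized)  # prints [5, 0, 0, 0, 0, -4, 0, 0]
print(runs)       # prints [(5, 1), (0, 4), (-4, 1), (0, 2)]
```

The dead zone turns the many small coefficients into long runs of zeros, which is what makes run-length encoding effective. The quantization step is the lossy part; the run-length stage is exactly invertible.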

Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this paper, a deep autoencoder trained with a generative adversarial network (GAN) compresses each video frame into a latent vector, and a convolutional long short-term memory (ConvLSTM) network predicts the latent vector representation of the future frame.
Abstract: Learning-based video compression has achieved substantial progress during recent years. The most influential approaches adopt deep neural networks (DNNs) to remove spatial and temporal redundancies by finding the appropriate lower-dimensional representations of frames in the video. We propose a novel DNN based framework that predicts and compresses video sequences in the latent vector space. The proposed method first learns the efficient lower-dimensional latent space representation of each video frame and then performs inter-frame prediction in that latent domain. The proposed latent domain compression of individual frames is obtained by a deep autoencoder trained with a generative adversarial network (GAN). To exploit the temporal correlation within the video frame sequence, we employ a convolutional long short-term memory (ConvLSTM) network to predict the latent vector representation of the future frame. We demonstrate our method with two applications: video compression and abnormal event detection that share the identical latent frame prediction network. The proposed method exhibits superior or competitive performance compared to the state-of-the-art algorithms specifically designed for either video compression or anomaly detection.

Journal ArticleDOI
TL;DR: A complexity analysis of VVC and its VTM reference software is provided, showing that most of the compression gains of VVC over HEVC can be obtained at a small fraction of the resources needed by the VTM encoder under common test conditions.
Abstract: A steady increase in available processing power continues to drive advances in video compression technology. The recently completed Versatile Video Coding (VVC) standard aims to double the compression efficiency of HEVC and deliver the same quality of video at half the bitrate. To achieve this goal, VVC includes several new methods that improve coding efficiency at the cost of increased complexity. This paper provides a complexity analysis of VVC and its VTM reference software. Whereas VVC is more complex than HEVC, it remains readily implementable in software on current generation processors. Performance of practical decoders is reported, showing that real-time decoding of 8K content is feasible. An encoder is also presented, showing that most of the compression gains of VVC over HEVC can be obtained at a small fraction of the resources needed by the VTM encoder under common test conditions.

Proceedings ArticleDOI
14 Aug 2021
TL;DR: This work uses techniques in neural architecture search (NAS) and proposes NAS-BERT, an efficient method for BERT compression that can find lightweight models with better accuracy than previous approaches, and can be directly applied to different downstream tasks with adaptive model sizes for different requirements of memory or latency.
Abstract: While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them difficult for real-world deployment. Therefore, model compression is necessary to reduce the computation and memory cost of pre-trained models. In this work, we aim to compress BERT and address the following two challenging practical issues: (1) The compression algorithm should be able to output multiple compressed models with different sizes and latencies, in order to support devices with different memory and latency limitations; (2) The algorithm should be downstream task agnostic, so that the compressed models are generally applicable for different downstream tasks. We leverage techniques in neural architecture search (NAS) and propose NAS-BERT, an efficient method for BERT compression. NAS-BERT trains a big supernet on a carefully designed search space containing a variety of architectures and outputs multiple compressed models with adaptive sizes and latency. Furthermore, the training of NAS-BERT is conducted on standard self-supervised pre-training tasks (e.g., masked language model) and does not depend on specific downstream tasks. Thus, the compressed models can be used across various downstream tasks. The technical challenge of NAS-BERT is that training a big supernet on the pre-training task is extremely costly. We employ several techniques including block-wise search, search space pruning, and performance approximation to improve search efficiency and accuracy. Extensive experiments on GLUE and SQuAD benchmark datasets demonstrate that NAS-BERT can find lightweight models with better accuracy than previous approaches, and can be directly applied to different downstream tasks with adaptive model sizes for different requirements of memory or latency.

Journal ArticleDOI
16 Jan 2021
TL;DR: This article presents an end-to-end neural video coding framework that takes advantage of the stacked DNNs to efficiently and compactly code input raw videos via fully data-driven learning.
Abstract: Significant advances in video compression systems have been made in the past several decades to satisfy the near-exponential growth of Internet-scale video traffic. From the application perspective, we have identified three major functional blocks, including preprocessing, coding, and postprocessing, which have been continuously investigated to maximize the end-user quality of experience (QoE) under a limited bit rate budget. Recently, artificial intelligence (AI)-powered techniques have shown great potential to further increase the efficiency of the aforementioned functional blocks, both individually and jointly. In this article, we review recent technical advances in video compression systems extensively, with an emphasis on deep neural network (DNN)-based approaches, and then present three comprehensive case studies. On preprocessing, we show a switchable texture-based video coding example that leverages DNN-based scene understanding to extract semantic areas for the improvement of a subsequent video coder. On coding, we present an end-to-end neural video coding framework that takes advantage of the stacked DNNs to efficiently and compactly code input raw videos via fully data-driven learning. On postprocessing, we demonstrate two neural adaptive filters to, respectively, facilitate the in-loop and postfiltering for the enhancement of compressed frames. Finally, a companion website hosting the contents developed in this work can be accessed publicly at https://purdueviper.github.io/dnn-coding/ .

Journal ArticleDOI
TL;DR: This paper designs padding algorithms tailored to each group of unoccupied pixels and proposes padding the residue of these pixels using the average residue of the occupied pixels in order to reduce the residue bitrate as much as possible.
Abstract: The state-of-the-art 2D-based dynamic point cloud (DPC) compression algorithm is the video-based point cloud compression (V-PCC) developed by the Moving Picture Experts Group (MPEG). It first projects the DPC patch by patch from 3D to 2D and organizes the projected patches into a video. The video is then efficiently compressed by High Efficiency Video Coding. However, there are many unoccupied pixels that may have a significant influence on the coding efficiency. These unoccupied pixels are currently padded using either the average of 4-neighbors for the geometry or the push-pull algorithm for the color attribute. While these algorithms are simple, the unoccupied pixels are not handled in the most efficient way. In this paper, we divide the unoccupied pixels into two groups: those that should be occupied and those that should not be occupied according to the occupancy map. We then design padding algorithms tailored to each group to improve the rate-distortion performance of the V-PCC reference software, for both the geometry and the color attribute. The first group is the unoccupied pixels that should be occupied according to the block-based occupancy map. We attempt to pad those pixels using the real points in the original DPC to improve the quality of the reconstructed DPC. Additionally, we attempt to maintain the smoothness of each block so as not to negatively influence the video compression efficiency. The second group is the unoccupied pixels that were correctly identified as unoccupied according to the block-based occupancy map. These pixels are useless for improving the reconstructed quality of the DPC. Therefore, we attempt to minimize the bit cost of these pixels without considering their reconstruction qualities. The bit cost is determined by the residue of these pixels obtained by subtracting the prediction pixels from the original pixels. Therefore, we propose padding the residue using the average residue of the occupied pixels in order to minimize the bit cost. The proposed algorithms are implemented in the V-PCC and the corresponding HEVC reference software. The experimental results show the proposed algorithms can bring significant bitrate savings compared with the V-PCC.
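
The residue padding described for the second group of pixels can be illustrated with a minimal sketch. It assumes one block stored as flat lists, a boolean occupancy mask, and a prediction that is already available. The function name `pad_unoccupied`, the data layout, and the zero-residue fallback for fully unoccupied blocks are hypothetical and are not taken from the V-PCC or HEVC reference software.

```python
def pad_unoccupied(original, prediction, occupied):
    """Pad unoccupied pixels so their residue equals the mean occupied residue.

    original, prediction: lists of pixel values for one block (same length).
    occupied: list of bools, True where the occupancy map marks a real point.
    Returns the padded block as a new list; occupied pixels are left unchanged.
    """
    occ_residues = [o - p for o, p, m in zip(original, prediction, occupied) if m]
    if not occ_residues:
        # Assumption: with no occupied pixels, fall back to a zero residue.
        mean_res = 0
    else:
        mean_res = round(sum(occ_residues) / len(occ_residues))
    # Unoccupied pixels become prediction + mean residue.
    return [o if m else p + mean_res
            for o, p, m in zip(original, prediction, occupied)]


block = [52, 0, 56, 0]            # unoccupied positions hold placeholder zeros
pred = [50, 50, 50, 50]
mask = [True, False, True, False]
padded = pad_unoccupied(block, pred, mask)
```

In this toy block the occupied residues are 2 and 6, so both unoccupied pixels are set to the prediction plus 4, which makes every residue in the block close to the same value.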

Journal ArticleDOI
TL;DR: Performance studies indicate that compared to the state-of-the-art, the proposed technique is able to achieve impressive bandwidth saving for transmission of data over communication network without compromising faithful reconstruction of data at the receiver.
Abstract: Recent advances in electric metering infrastructure have given rise to the generation of gigantic chunks of data. Transmission of all of these data certainly poses a significant challenge in bandwidth and storage constrained Internet of Things (IoT), where smart meters act as sensors. In this work, a novel multivariate data compression scheme is proposed for smart metering IoT. The proposed algorithm exploits the cross correlation between different variables sensed by smart meters to reduce the dimension of data. Subsequently, sparsity in each of the decorrelated streams is utilized for temporal compression. To examine the quality of compression, the multivariate data is characterized using multivariate normal–autoregressive integrated moving average modeling before compression as well as after reconstruction of the compressed data. Our performance studies indicate that compared to the state-of-the-art, the proposed technique is able to achieve impressive bandwidth saving for transmission of data over communication network without compromising faithful reconstruction of data at the receiver. The proposed algorithm is tested in a real smart metering setup and its time complexity is also analyzed.

Journal ArticleDOI
TL;DR: It is demonstrated that a neural network autoencoder model can be implemented in a radiation tolerant ASIC to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile.
Abstract: Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic where trigger decisions are made. We demonstrate that a neural network (NN) autoencoder model can be implemented in a radiation-tolerant application-specific integrated circuit (ASIC) to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile. For our application, we consider the high-granularity calorimeter from the Compact Muon Solenoid (CMS) experiment at the CERN Large Hadron Collider. The advantage of the machine learning approach is in the flexibility and configurability of the algorithm. By changing the NN weights, a unique data compression algorithm can be deployed for each sensor in different detector regions and changing detector or collider conditions. To meet area, performance, and power constraints, we perform quantization-aware training to create an optimized NN hardware implementation. The design is achieved through the use of high-level synthesis tools and the hls4ml framework and was processed through synthesis and physical layout flows based on a low-power (LP)-CMOS 65-nm technology node. The flow anticipates 200 Mrad of ionizing radiation to select gates and reports a total area of 3.6 mm² and consumes 95 mW of power. The simulated energy consumption per inference is 2.4 nJ. This is the first radiation-tolerant on-detector ASIC implementation of an NN that has been designed for particle physics applications.

Journal ArticleDOI
16 Jun 2021
TL;DR: A survey of the point cloud compression methods by organizing them with respect to the data structure, coding representation space, and prediction strategies is presented, providing guidance for potential standard implementors.
Abstract: In this article, a survey of the point cloud compression (PCC) methods by organizing them with respect to the data structure, coding representation space, and prediction strategies is presented. Two paramount families of approaches reported in the literature—the projection- and octree-based methods—are proven to be efficient for encoding dense and sparse point clouds, respectively. These approaches are the pillars on which the Moving Picture Experts Group Committee developed two PCC standards published as final international standards in 2020 and early 2021, respectively, under the names: video-based PCC and geometry-based PCC. After surveying the current approaches for PCC, the technologies underlying the two standards are described in detail from an encoder perspective, providing guidance for potential standard implementors. In addition, experiment evaluations in terms of compression performances for both solutions are provided.
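
As an illustration of the octree-based family mentioned above, the following minimal sketch computes the 8-bit child occupancy code of a single octree node, the basic symbol that octree geometry coders typically signal. The function name, the half-open cube convention, and the bit ordering are assumptions made for this example and are not taken from the geometry-based PCC standard.

```python
def occupancy_byte(points, origin, size):
    """Return the 8-bit child occupancy code of one octree node.

    points: iterable of (x, y, z) tuples inside the cube
            [origin, origin + size) along each axis.
    Bit i is set when child octant i holds at least one point, with
    i = 4 * (x in upper half) + 2 * (y in upper half) + (z in upper half).
    This bit ordering is an assumption for illustration only.
    """
    half = size / 2
    code = 0
    for x, y, z in points:
        index = (
            (4 if x >= origin[0] + half else 0)
            + (2 if y >= origin[1] + half else 0)
            + (1 if z >= origin[2] + half else 0)
        )
        code |= 1 << index
    return code


pts = [(0, 0, 0), (1, 1, 1), (7, 0, 0), (7, 7, 7)]
code = occupancy_byte(pts, origin=(0, 0, 0), size=8)
```

A full coder would recurse into each occupied child and entropy-code the resulting bytes; sparse clouds produce many nodes with few set bits, which is one reason this representation suits them.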

Journal ArticleDOI
TL;DR: A fast decision scheme that uses a lightweight neural network (LNN) to avoid redundant block partitioning in versatile video coding (VVC) is proposed; it substantially decreases the encoding complexity of VVC with a slight coding loss under the All Intra configuration.
Abstract: In this paper, we propose a fast decision scheme using a lightweight neural network (LNN) to avoid redundant block partitioning in versatile video coding (VVC). A more versatile block structure, named the multi-type tree (MTT) structure, which includes binary trees (BTs) and ternary trees (TTs), is adopted by VVC, in addition to the traditional quadtree structure. The MTT improved the coding efficiency compared with previous video coding standards. However, the new tree structures, mainly TT, significantly increased the complexity of the VVC encoder. Although widespread application of VVC has been inhibited, this problem has not yet been investigated thoroughly in the literature. In this study, we first determine the statistical characteristics of coded parameters that exhibit correlation with the TT and develop two useful types of features, explicit VVC features (EVFs) and derived VVC features (DVFs), to facilitate the intra coding of VVC. These features can be obtained efficiently during the intra prediction before the determination of the best block partitioning during rate-distortion optimization in VVC encoding. Our LNN model decides whether to terminate the nested TT block structures subsequent to a quadtree based on the features. The experimental results confirm that the proposed method substantially decreases the encoding complexity of VVC with a slight coding loss under the All Intra configuration. Our code, models, and dataset are available at https://github.com/foriamweak/MTTPartitioning_LNN .

Journal ArticleDOI
TL;DR: An effective approach to infer the just noticeable distortion (JND) profile based on patch-level structural visibility learning is proposed, with extensive experimental results showing the superiority of the proposed approach over the state-of-the-art.
Abstract: In this paper, we propose an effective approach to infer the just noticeable distortion (JND) profile based on patch-level structural visibility learning. Instead of pixel-level JND profile estimation, the image patch, which is regarded as the basic processing unit to better correlate with the human perception, can be further decomposed into three conceptually independent components for visibility estimation. In particular, to incorporate the structural degradation into the patch-level JND model, a deep learning-based structural degradation estimation model is trained to approximate the masking of structural visibility. In order to facilitate the learning process, a JND dataset is further established, including 202 pristine images and 7878 distorted images generated by advanced compression algorithms based on the upcoming Versatile Video Coding (VVC) standard. Extensive experimental results further show the superiority of the proposed approach over the state-of-the-art. Our dataset is available at: https://github.com/ShenXuelin-CityU/PWJNDInfer .

Journal ArticleDOI
TL;DR: A reinforcement learning (RL)-based UAV anti-jamming video transmission scheme to choose the video compression quantization parameter, the channel coding rate, the modulation and power control strategies against jamming attacks is proposed.
Abstract: Unmanned aerial vehicles (UAVs) that are widely utilized for video capturing, processing and transmission have to address jamming attacks with dynamic topology and limited energy. In this paper, we propose a reinforcement learning (RL)-based UAV anti-jamming video transmission scheme to choose the video compression quantization parameter, the channel coding rate, the modulation and power control strategies against jamming attacks. More specifically, this scheme applies RL to choose the UAV video compression and transmission policy based on the observed video task priority, the UAV-controller channel state and the received jamming power. This scheme enables the UAV to guarantee the video quality-of-experience (QoE) and reduce the energy consumption without relying on the jamming model or the video service model. A safe RL-based approach is further proposed, which uses deep learning to accelerate the UAV learning process and reduce the video transmission outage probability. The computational complexity is provided and the optimal utility of the UAV is derived and verified via simulations. Simulation results show that the proposed schemes significantly improve the video quality and reduce the transmission latency and energy consumption of the UAV compared with existing schemes.
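
The abstract does not name a specific RL algorithm, so the following is only a generic tabular Q-learning sketch of how a policy over transmission settings could be learned from an observed state. The state and action tuples, the reward value, and the function names are all hypothetical; they are not the authors' scheme.

```python
import random


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy selection over a tabular Q-function stored in a dict."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    return max(actions, key=lambda a: q.get((state, a), 0.0))  # exploit


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', .)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


# Hypothetical actions: (quantization parameter, coding rate, modulation, power level).
actions = [(22, 0.5, "QPSK", 1), (32, 0.75, "16QAM", 2)]
# Hypothetical states: (task priority, channel state, jamming power level).
state, next_state = ("high", "good", "low"), ("high", "bad", "high")

q = {}
rng = random.Random(0)
q_update(q, state, actions[1], reward=1.0, next_state=next_state, actions=actions)
best = choose_action(q, state, actions, epsilon=0.0, rng=rng)
```

After a single rewarded step, the greedy choice in that state is the action that earned the reward. In practice the reward would combine video quality, latency, and energy terms, which the abstract describes only at a high level.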

Proceedings ArticleDOI
04 Feb 2021
TL;DR: In this paper, the authors propose an Enhanced Tchebichef Moment utilizing Huffman Encoding (ETMH) technique, which takes the original image and transforms it into matrix form.
Abstract: Data compression involves taking a stream of symbols and transforming it into codes. The resulting stream of compressed codes should be smaller and simpler than the initial set of symbols. The decision to output a particular code for a given symbol or group of symbols is based on a model: a collection of data and rules used to process the input symbols and determine which code(s) to output. Existing Huffman coding supports binary images but fails to give a reasonable compression ratio; the actual output of the encoder is determined by a set of probabilities. With this kind of coding, a symbol that has a very high probability of occurrence produces a code with few bits, while a symbol with a low probability produces a code with a larger number of bits. The source symbols are then reduced by applying an improved Huffman encoding algorithm to obtain a high compression ratio. The proposed Enhanced Tchebichef Moment utilizing Huffman Encoding (ETMH) technique takes the original image and transforms it into matrix form. The full-size matrix is divided into several non-overlapping small block matrices. The smaller the block size in pixels, the more block grouping is done to achieve the compression ratio. The proposed ETMH algorithm is applicable to different image formats and aims to provide improved results and quality. It is implemented for grayscale images, and a compression ratio table is generated. The algorithm is tested and evaluated using several parameters, such as MSE, SNR and PSNR, in MATLAB.
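
The relationship between symbol probability and code length described above can be seen in a minimal sketch of classic Huffman coding. This is plain Huffman coding over pixel values, not the proposed ETMH technique; the function name and the sample data are made up for illustration.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Build a Huffman code table {symbol: bitstring} for the symbols in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


pixels = [0] * 8 + [255] * 4 + [128] * 2 + [64] * 2
table = huffman_code(pixels)
encoded = "".join(table[p] for p in pixels)
```

Here the most frequent value gets a 1-bit code and the two rarest values get 3-bit codes, so the 16 pixels take 28 bits instead of the 32 bits a fixed 2-bit code would need.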

Journal ArticleDOI
TL;DR: A very high data-rate performance hardware accelerator is presented implementing the CCSDS-123.0-B-1 algorithm as an IP core targeting a space-grade FPGA, achieving high throughput performance.
Abstract: The explosive growth of data volume from next generation high-resolution and high-speed hyperspectral remote sensing systems will compete with the limited on-board storage resources and bandwidth available for the transmission of data to ground stations making hyperspectral image compression a mission critical and challenging on-board payload data processing task. The Consultative Committee for Space Data Systems (CCSDS) has issued recommended standard CCSDS-123.0-B-1 for lossless multispectral and hyperspectral image compression. In this paper, a very high data-rate performance hardware accelerator is presented implementing the CCSDS-123.0-B-1 algorithm as an IP core targeting a space-grade FPGA. For the first time, the introduced architecture based on the principles of C-slow retiming, exploits the inherent task-level parallelism of the algorithm under BIP ordering and implements a reconfigurable fine-grained pipeline in critical feedback loops, achieving high throughput performance. The CCSDS-123.0-B-1 IP core achieves beyond the current state-of-the-art data-rate performance with a maximum throughput of 213 MSamples/s (3.3 Gbps @ 16-bits) using 11 percent of LUTs and 27 percent of BRAMs of the Virtex-5QV FPGA resources for a typical hyperspectral image, leveraging the full throughput of a single SpaceFibre lane. To the best of our knowledge, it is the fastest implementation of CCSDS-123.0-B-1 targeting a space-grade FPGA to date.

Journal ArticleDOI
TL;DR: This paper proposes an occupancy map guided fast V-PCC method, in which coding is performed on the different types of blocks, by taking advantage of the fact that occupancy maps can explicitly indicate the block types.
Abstract: In video-based dynamic point cloud compression (V-PCC), 3D point clouds are projected into patches, and then the patches are padded into 2D images suitable for the video compression framework. However, the patch projection-based method produces a large number of empty pixels, and the far and near components are projected to generate different 2D images (video frames). As a result, the generated video has a high resolution and a doubled frame rate, so V-PCC has huge computational complexity. This paper proposes an occupancy map guided fast V-PCC method. Firstly, the relationship between the prediction coding and block complexity is studied based on a local linear image gradient model. Secondly, according to the V-PCC strategies of patch projection and block generation, we investigate the differences in rate-distortion characteristics between different types of blocks, and the temporal correlations between the far and near layers. Finally, by taking advantage of the fact that occupancy maps can explicitly indicate the block types, we propose an occupancy map guided fast coding method, in which coding is performed on the different types of blocks. Experiments on typical dynamic point clouds show that the proposed method achieves an average 43.66% time saving at the cost of only 0.27% and 0.16% Bjøntegaard Delta (BD) rate increment under the geometry Point-to-Point (D1) error and attribute Luma Peak Signal-to-Noise Ratio (PSNR), respectively.
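
The idea that an occupancy map can directly indicate block types can be sketched as follows. The abstract does not list the block types, so the three labels used here ("empty", "full", "boundary"), the function name, and the toy map are assumptions for illustration rather than the authors' classification.

```python
def classify_blocks(occupancy, block):
    """Label each block of a binary occupancy map.

    occupancy: 2D list of 0/1 values; height and width are multiples of block.
    Returns a 2D list of labels: "empty", "full", or "boundary".
    """
    labels = []
    for by in range(0, len(occupancy), block):
        row = []
        for bx in range(0, len(occupancy[0]), block):
            # Count occupied pixels inside the current block.
            filled = sum(
                occupancy[y][x]
                for y in range(by, by + block)
                for x in range(bx, bx + block)
            )
            if filled == 0:
                row.append("empty")
            elif filled == block * block:
                row.append("full")
            else:
                row.append("boundary")
        labels.append(row)
    return labels


occ = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
labels = classify_blocks(occ, 2)
```

An encoder could then apply a cheaper mode decision to blocks whose label suggests that distortion does not matter, which is the general kind of shortcut a block-type-guided fast method relies on.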