
Showing papers on "Data compression published in 2020"


Posted Content
TL;DR: This paper proposes to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which achieves a more accurate and flexible entropy model and yields state-of-the-art performance compared with existing learned compression methods.
Abstract: Image compression is a fundamental research field, and many well-known compression standards have been developed over the past decades. Recently, learned compression methods have developed rapidly and shown promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We find that accurate entropy models for rate estimation largely affect the optimization of network parameters and thus the rate-distortion performance. Therefore, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which achieves a more accurate and flexible entropy model. In addition, we take advantage of recent attention modules and incorporate them into the network architecture to enhance performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared with existing learned compression methods on both the Kodak and high-resolution datasets. To our knowledge, our approach is the first to achieve performance comparable to the latest compression standard, Versatile Video Coding (VVC), in terms of PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM. The project page is at this https URL.

310 citations
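To make the entropy model concrete: the rate of an integer-quantized latent is estimated from the probability mass a Gaussian mixture assigns to its quantization bin. The sketch below is a minimal NumPy/SciPy illustration of that computation, not the authors' code; in the actual model the mixture parameters are predicted per-latent by the network.

```python
# Probability of an integer-quantized latent y under a discretized Gaussian
# mixture: the mass the mixture assigns to the bin [y - 0.5, y + 0.5].
import numpy as np
from scipy.stats import norm

def discretized_gmm_likelihood(y, weights, means, scales):
    """y: integer-quantized latent value(s); weights sum to 1."""
    upper = sum(w * norm.cdf(y + 0.5, loc=m, scale=s)
                for w, m, s in zip(weights, means, scales))
    lower = sum(w * norm.cdf(y - 0.5, loc=m, scale=s)
                for w, m, s in zip(weights, means, scales))
    return upper - lower

# Example with a hypothetical 3-component mixture; -log2(p) is the code
# length that enters the rate term of the rate-distortion loss.
p = discretized_gmm_likelihood(np.array([0.0, 1.0, -2.0]),
                               weights=[0.6, 0.3, 0.1],
                               means=[0.2, 1.1, -1.8],
                               scales=[0.5, 0.7, 1.0])
bits = -np.log2(p)
```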


Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this paper, discretized Gaussian mixture model (GMM) likelihoods are used to parameterize the distributions of latent codes, which achieves a more accurate and flexible entropy model.
Abstract: Image compression is a fundamental research field, and many well-known compression standards have been developed over the past decades. Recently, learned compression methods have developed rapidly and shown promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We find that accurate entropy models for rate estimation largely affect the optimization of network parameters and thus the rate-distortion performance. Therefore, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which achieves a more accurate and flexible entropy model. In addition, we take advantage of recent attention modules and incorporate them into the network architecture to enhance performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared with existing learned compression methods on both the Kodak and high-resolution datasets. To our knowledge, our approach is the first to achieve performance comparable to the latest compression standard, Versatile Video Coding (VVC), in terms of PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM.

240 citations


Journal ArticleDOI
TL;DR: The evolution and development of neural network-based compression methodologies are reviewed for images and video, respectively, and the joint compression of semantic and visual information is tentatively explored to formulate a high-efficiency signal representation structure for both human vision and machine vision.
Abstract: In recent years, image and video coding technologies have advanced by leaps and bounds. However, due to the popularization of image and video acquisition devices, the growth rate of image and video data far outpaces the improvement in compression ratio. In particular, it has been widely recognized that there are increasing challenges in pursuing further coding performance improvement within the traditional hybrid coding framework. Deep convolutional neural networks, which have driven the resurgence of neural networks in recent years and achieved great success in both the artificial intelligence and signal processing fields, also provide a novel and promising solution for image and video compression. In this paper, we provide a systematic, comprehensive and up-to-date review of neural network-based image and video compression techniques. The evolution and development of neural network-based compression methodologies are introduced for images and video, respectively. More specifically, the cutting-edge video coding techniques that leverage deep learning within the HEVC framework are presented and discussed, as they promote the state-of-the-art video coding performance substantially. Moreover, end-to-end image and video coding frameworks based on neural networks are also reviewed, revealing interesting explorations toward next-generation image and video coding frameworks/standards. The most significant research works on image and video coding related topics using neural networks are highlighted, and future trends are envisioned. In particular, the joint compression of semantic and visual information is tentatively explored to formulate a high-efficiency signal representation structure for both human vision and machine vision, the two dominant signal receptors in the age of artificial intelligence.

235 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper proposes scale-space flow, an intuitive generalization of optical flow that adds a scale parameter to allow the network to better model uncertainty, and shows that it outperforms analogous state-of-the-art learned video compression models while being trained with a much simpler procedure and without any pre-trained optical flow networks.
Abstract: Despite considerable progress on end-to-end optimized deep networks for image compression, video coding remains a challenging task. Recently proposed methods for learned video compression use optical flow and bilinear warping for motion compensation and show competitive rate-distortion performance relative to hand-engineered codecs like H.264 and HEVC. However, these learning-based methods rely on complex architectures and training schemes, including the use of pre-trained optical flow networks, sequential training of sub-networks, adaptive rate control, and buffering intermediate reconstructions to disk during training. In this paper, we show that a generalized warping operator that better handles common failure cases, e.g. disocclusions and fast motion, can provide competitive compression results with a greatly simplified model and training procedure. Specifically, we propose scale-space flow, an intuitive generalization of optical flow that adds a scale parameter to allow the network to better model uncertainty. Our experiments show that a low-latency video compression model (no B-frames) using scale-space flow for motion compensation can outperform analogous state-of-the-art learned video compression models while being trained using a much simpler procedure and without any pre-trained optical flow networks.

197 citations
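The warping operator itself is easy to state: blur the reference frame into a small Gaussian scale-space volume, then sample that volume trilinearly at flow-displaced positions, with a third per-pixel coordinate selecting the blur level. The following is a hedged NumPy/SciPy sketch of this reading of scale-space warping, not the authors' implementation; the blur levels and interpolation details are illustrative.

```python
# Scale-space warping sketch: uncertain regions can point at a blurrier copy
# of the reference instead of producing a sharp but wrong prediction.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def scale_space_warp(ref, flow_x, flow_y, scale, sigmas=(0.0, 1.0, 2.0, 4.0)):
    """ref: (H, W) frame; flow_*: per-pixel displacements; scale: per-pixel
    fractional index into the blur levels 0..len(sigmas)-1."""
    volume = np.stack([gaussian_filter(ref, s) for s in sigmas])  # (S, H, W)
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([scale, ys + flow_y, xs + flow_x])          # (3, H, W)
    return map_coordinates(volume, coords, order=1, mode='nearest')

# Zero flow at the coarsest scale index simply returns a blurred reference.
ref = np.random.rand(64, 64)
zeros = np.zeros_like(ref)
pred = scale_space_warp(ref, zeros, zeros, np.full_like(ref, 3.0))
```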


Journal ArticleDOI
TL;DR: Advancing over previous work, this system is able to reproduce challenging content such as view-dependent reflections, semi-transparent surfaces, and near-field objects as close as 34 cm to the surface of the camera rig.
Abstract: We present a system for capturing, reconstructing, compressing, and rendering high quality immersive light field video. We accomplish this by leveraging the recently introduced DeepView view interpolation algorithm, replacing its underlying multi-plane image (MPI) scene representation with a collection of spherical shells that are better suited for representing panoramic light field content. We further process this data to reduce the large number of shell layers to a small, fixed number of RGBA+depth layers without significant loss in visual quality. The resulting RGB, alpha, and depth channels in these layers are then compressed using conventional texture atlasing and video compression techniques. The final compressed representation is lightweight and can be rendered on mobile VR/AR platforms or in a web browser. We demonstrate light field video results using data from the 16-camera rig of [Pozo et al. 2019] as well as a new low-cost hemispherical array made from 46 synchronized action sports cameras. From this data we produce 6-degree-of-freedom volumetric videos with a wide 70 cm viewing baseline, 10 pixels per degree angular resolution, and a wide field of view, at 30 frames per second. Advancing over previous work, we show that our system is able to reproduce challenging content such as view-dependent reflections, semi-transparent surfaces, and near-field objects as close as 34 cm to the surface of the camera rig.

179 citations


Posted Content
TL;DR: CompressAI is presented, a platform that provides custom operations, layers, models and tools to research, develop and evaluate end-to-end image and video compression codecs; while it currently implements still-picture models, it is intended to be extended to the video compression domain soon.
Abstract: This paper presents CompressAI, a platform that provides custom operations, layers, models and tools to research, develop and evaluate end-to-end image and video compression codecs. In particular, CompressAI includes pre-trained models and evaluation tools to compare learned methods with traditional codecs. Multiple state-of-the-art learned end-to-end compression models have thus been reimplemented in PyTorch and trained from scratch. We also report objective comparison results, in terms of PSNR and MS-SSIM versus bit-rate, on the Kodak image dataset as the test set. Although this framework currently implements models for still-picture compression, it is intended to be extended to the video compression domain soon.

175 citations
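For orientation, a typical CompressAI evaluation loop looks like the sketch below; it follows the usage documented in the project repository (the model name and output keys come from the library's zoo and may change between versions, so treat this as a hedged example rather than a stable API reference).

```python
# Load a pre-trained learned image codec and measure rate and distortion.
import torch
from compressai.zoo import bmshj2018_factorized

net = bmshj2018_factorized(quality=2, pretrained=True).eval()
x = torch.rand(1, 3, 256, 256)   # stand-in for a test image in [0, 1]

with torch.no_grad():
    out = net(x)                  # dict with "x_hat" and "likelihoods"

num_pixels = x.shape[2] * x.shape[3]
bpp = sum((-torch.log2(l)).sum() for l in out["likelihoods"].values()) / num_pixels
mse = torch.mean((x - out["x_hat"]) ** 2)
print(f"rate: {bpp.item():.3f} bpp, mse: {mse.item():.6f}")
```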


Journal ArticleDOI
TL;DR: The proposed PixelMotionCNN (PMCNN), which includes motion extension and hybrid prediction networks, can model spatiotemporal coherence to effectively perform predictive coding inside the learning network, and provides a possible new direction for further improving the compression efficiency and functionality of future video coding.
Abstract: One key challenge to learning-based video compression is that motion predictive coding, a very effective tool for video compression, can hardly be trained into a neural network. In this paper, we propose the concept of PixelMotionCNN (PMCNN), which includes motion extension and hybrid prediction networks. PMCNN can model spatiotemporal coherence to effectively perform predictive coding inside the learning network. On the basis of PMCNN, we further explore a learning-based framework for video compression with additional components of iterative analysis/synthesis and binarization. The experimental results demonstrate the effectiveness of the proposed scheme. Although entropy coding and complex configurations are not employed in this paper, we still demonstrate performance superior to MPEG-2 and comparable to the H.264 codec. The proposed learning-based scheme provides a possible new direction to further improve the compression efficiency and functionalities of future video coding.

116 citations


Journal ArticleDOI
Ling-Yu Duan, Jiaying Liu, Wenhan Yang, Tiejun Huang, Wen Gao
TL;DR: In this paper, a new area, Video Coding for Machines (VCM), is proposed to bridge the gap between feature coding for machine vision and video coding for human vision, and the preliminary results have demonstrated the performance and efficiency gains.
Abstract: Video coding, which aims to compress and reconstruct whole frames, and feature compression, which only preserves and transmits the most critical information, stand at two ends of a scale: one offers compactness and efficiency to serve machine vision, and the other retains full fidelity, bowing to human perception. Recent endeavors in the imminent trends of video compression, e.g. deep learning based coding tools and end-to-end image/video coding, and in the MPEG-7 compact feature descriptor standards, i.e. Compact Descriptors for Visual Search and Compact Descriptors for Video Analysis, promote sustained and fast development in their respective directions. In this article, thanks to booming AI technology, e.g. prediction and generation models, we carry out an exploration of a new area, Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts. 1 Towards collaborative compression and intelligent analytics, VCM attempts to bridge the gap between feature coding for machine vision and video coding for human vision. Aligning with Digital Retina, a rising instance of the Analyze-then-Compress paradigm, the definition, formulation, and paradigm of VCM are given first. Meanwhile, we systematically review state-of-the-art techniques in video compression and feature compression from the unique perspective of MPEG standardization, which provides academic and industrial evidence for realizing the collaborative compression of video and feature streams in a broad range of AI applications. Finally, we present potential VCM solutions, whose preliminary results have demonstrated performance and efficiency gains. Further directions are discussed as well. 1 https://lists.aau.at/mailman/listinfo/mpeg-vcm

113 citations


Journal ArticleDOI
TL;DR: A deep network compression algorithm is proposed that performs weight pruning and quantization jointly, and in parallel with fine-tuning, and improves the state-of-the-art in network compression on AlexNet, VGGNet, GoogLeNet, and ResNet.
Abstract: Deep neural networks enable state-of-the-art accuracy on visual recognition tasks such as image classification and object detection. However, modern networks contain millions of learned connections, and the current trend is towards deeper and more densely connected architectures. This poses a challenge to the deployment of state-of-the-art networks on resource-constrained systems, such as smartphones or mobile robots. In general, a more efficient utilization of computation resources would assist in deployment scenarios from embedded platforms to computing clusters running ensembles of networks. In this paper, we propose a deep network compression algorithm that performs weight pruning and quantization jointly, and in parallel with fine-tuning. Our approach takes advantage of the complementary nature of pruning and quantization and recovers from premature pruning errors, which is not possible with two-stage approaches. In experiments on ImageNet, CLIP-Q (Compression Learning by In-Parallel Pruning-Quantization) improves the state-of-the-art in network compression on AlexNet, VGGNet, GoogLeNet, and ResNet. We additionally demonstrate that CLIP-Q is complementary to efficient network architecture design by compressing MobileNet and ShuffleNet, and that CLIP-Q generalizes beyond convolutional networks by compressing a memory network for visual question answering.

100 citations
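To ground the two operations CLIP-Q couples, here is a deliberately simplified, offline sketch of magnitude pruning followed by k-means weight sharing on a single tensor; it illustrates the ingredients only, not the paper's in-parallel, fine-tuning-aware procedure.

```python
import numpy as np

def prune_and_quantize(weights, prune_frac=0.5, n_clusters=16, iters=10):
    """Zero the smallest-magnitude weights, then quantize the survivors to a
    shared codebook learned by plain 1-D k-means."""
    w = weights.copy().ravel()
    thresh = np.quantile(np.abs(w), prune_frac)
    mask = np.abs(w) > thresh                      # pruning mask
    kept = w[mask]
    centers = np.linspace(kept.min(), kept.max(), n_clusters)
    for _ in range(iters):                         # 1-D k-means updates
        assign = np.argmin(np.abs(kept[:, None] - centers[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centers[k] = kept[assign == k].mean()
    w[mask] = centers[assign]                      # weight sharing
    w[~mask] = 0.0
    return w.reshape(weights.shape), centers

w_hat, codebook = prune_and_quantize(np.random.randn(256, 256))
```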


Proceedings ArticleDOI
14 Jun 2020
TL;DR: An end-to-end learned video compression scheme for low-latency scenarios that introduces the use of multiple previous frames as references and designs an MV refinement network and a residual refinement network, both of which also make use of the multiple reference frames.
Abstract: We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited to using only the single previous frame as reference. Our method introduces the use of multiple previous frames as references. In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one. With multiple reference frames and their associated MV fields, our designed network can generate a more accurate prediction of the current frame, yielding less residual. Multiple reference frames also help generate the MV prediction, which reduces the coding cost of the MV field. We use two deep auto-encoders to compress the residual and the MV, respectively. To compensate for the compression error of the auto-encoders, we further design an MV refinement network and a residual refinement network, again making use of the multiple reference frames. All the modules in our scheme are jointly optimized through a single rate-distortion loss function, and we use a step-by-step training strategy to optimize the entire scheme. Experimental results show that the proposed method outperforms existing learned video compression methods in low-latency mode, and also performs better than H.265 in both PSNR and MS-SSIM. Our code and models are publicly available.

99 citations


Proceedings ArticleDOI
26 May 2020
TL;DR: This paper introduces PCQM, a full-reference objective metric for visual quality assessment of 3D point clouds, an optimally-weighted linear combination of geometry-based and color-based features that outperforms all previous metrics in terms of correlation with mean opinion scores.
Abstract: 3D point clouds constitute an emerging multimedia content, now used in a wide range of applications. The main drawback of this representation is the size of the data since typical point clouds may contain millions of points, usually associated with both geometry and color information. Consequently, a significant amount of work has been devoted to the efficient compression of this representation. Lossy compression leads to a degradation of the data and thus impacts the visual quality of the displayed content. In that context, predicting perceived visual quality computationally is essential for the optimization and evaluation of compression algorithms. In this paper, we introduce PCQM, a full-reference objective metric for visual quality assessment of 3D point clouds. The metric is an optimally-weighted linear combination of geometry-based and color-based features. We evaluate its performance on an open subjective dataset of colored point clouds compressed by several algorithms; the proposed quality assessment approach outperforms all previous metrics in terms of correlation with mean opinion scores.
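The final scoring step described above reduces to a weighted sum of features. A minimal sketch follows, with placeholder feature names and weights (the published feature set and optimal weights differ; see the paper):

```python
# PCQM-style score: an optimally-weighted linear combination of
# geometry-based and color-based features computed on the degraded cloud.
def linear_quality_score(features, weights):
    """features/weights: dicts keyed by feature name."""
    assert set(features) == set(weights)
    return sum(weights[name] * features[name] for name in features)

score = linear_quality_score(
    {"curvature_similarity": 0.91, "lightness_similarity": 0.88},  # placeholders
    {"curvature_similarity": 0.55, "lightness_similarity": 0.45},
)
```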

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A Hierarchical Learned Video Compression (HLVC) method is proposed with three hierarchical quality layers and a recurrent enhancement network, where the frames in the first layer are compressed by an image compression method with the highest quality.
Abstract: In this paper, we propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network. The frames in the first layer are compressed by an image compression method with the highest quality. Using these frames as references, we propose the Bi-Directional Deep Compression (BDDC) network to compress the second layer with relatively high quality. Then, the third layer frames are compressed with the lowest quality, by the proposed Single Motion Deep Compression (SMDC) network, which adopts a single motion map to estimate the motions of multiple frames, thus saving bits for motion information. In our deep decoder, we develop the Weighted Recurrent Quality Enhancement (WRQE) network, which takes both compressed frames and the bit stream as inputs. In the recurrent cell of WRQE, the memory and update signal are weighted by quality features to reasonably leverage multi-frame information for enhancement. In our HLVC approach, the hierarchical quality benefits the coding efficiency, since the high quality information facilitates the compression and enhancement of low quality frames at encoder and decoder sides, respectively. Finally, the experiments validate that our HLVC approach advances the state-of-the-art of deep video compression methods, and outperforms the "Low-Delay P (LDP) very fast" mode of x265 in terms of both PSNR and MS-SSIM. The project page is at https://github.com/RenYang-home/HLVC.

Journal ArticleDOI
TL;DR: In this article, a novel lossy compression algorithm for multidimensional data over regular grids is proposed, which leverages the higher-order singular value decomposition (HOSVD), a generalization of the SVD to three dimensions and higher, together with bit-plane, run-length and arithmetic coding to compress the HOSVD transform coefficients.
Abstract: Memory and network bandwidth are decisive bottlenecks when handling high-resolution multidimensional data sets in visualization applications, and they increasingly demand suitable data compression strategies. We introduce a novel lossy compression algorithm for multidimensional data over regular grids. It leverages the higher-order singular value decomposition (HOSVD), a generalization of the SVD to three dimensions and higher, together with bit-plane, run-length and arithmetic coding to compress the HOSVD transform coefficients. Our scheme degrades the data particularly smoothly and achieves lower mean squared error than other state-of-the-art algorithms at low-to-medium bit rates, as required in data archiving and management for visualization purposes. Further advantages of the proposed algorithm include very fine bit rate selection granularity and the ability to manipulate data at very small cost in the compression domain, for example to reconstruct filtered and/or subsampled versions of all (or selected parts) of the data set.
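The transform at the heart of the scheme can be reproduced in a few lines of NumPy: compute a basis per mode from the SVD of the corresponding unfolding, truncate, and project. The sketch below covers only this HOSVD truncation; the paper's codec additionally bit-plane/run-length/arithmetic-codes the core coefficients, which is omitted here.

```python
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_mult(t, m, mode):
    """Multiply tensor t by matrix m along the given mode."""
    return np.moveaxis(np.tensordot(m, t, axes=(1, mode)), 0, mode)

def hosvd_truncate(t, ranks):
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :r])               # leading left singular vectors
    core = t
    for mode, u in enumerate(factors):
        core = mode_mult(core, u.T, mode)      # project onto the mode bases
    recon = core
    for mode, u in enumerate(factors):
        recon = mode_mult(recon, u, mode)      # expand back to full size
    return recon, core, factors

vol = np.random.rand(32, 32, 32)               # stand-in volume data
recon, core, factors = hosvd_truncate(vol, ranks=(8, 8, 8))
```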

Journal ArticleDOI
TL;DR: An efficient and secure multi-user multi-task computation offloading model for mobile-edge computing, with guaranteed performance in latency, energy, and security, that also scales well to large-scale IoT networks.
Abstract: Mobile edge computing (MEC) is a new paradigm for alleviating the resource limitations of mobile IoT networks through low-latency computation offloading. This article presents an efficient and secure multi-user multi-task computation offloading model with guaranteed performance in latency, energy, and security for mobile-edge computing. It not only investigates the offloading strategy but also considers resource allocation, compression, and security issues. First, to guarantee efficient utilization of the shared resource in multi-user scenarios, radio and computation resources are jointly addressed. In addition, JPEG and MPEG4 compression algorithms are used to reduce the transfer overhead. To fulfill security requirements, a security layer is introduced to protect the transmitted data from cyber-attacks. Furthermore, an integrated model of resource allocation, compression, and security is formulated as an integer nonlinear problem with the objective of minimizing the weighted sum of energy under a latency constraint. As this problem is NP-hard, linearization and relaxation approaches are applied to transform it into a convex one. Finally, an efficient offloading algorithm is designed, with detailed procedures for making the computation offloading decision for the computation tasks of mobile users. Simulation results show that our model not only saves about 46% of system overhead in comparison with local execution but also scales well to large-scale IoT networks.

Journal ArticleDOI
TL;DR: Experimental results illustrated that: 1) the proposed GPU-based parallel implementation frameworks could significantly reduce the computational time for both trajectory compression and visualization; 2) the influence of compressed vessel trajectories on trajectory visualization could be negligible if the compression threshold was selected suitably; and 3) the Gaussian kernel was capable of generating more appropriate KDE-based visualization performance compared with seven other kernel functions.
Abstract: The automatic identification system (AIS), an automatic vessel-tracking system, has been widely adopted to perform intelligent traffic management and collision avoidance services in maritime Internet-of-Things (IoT) industries. With the rapid development of maritime transportation, tremendous numbers of AIS-based vessel trajectory data have been collected, which makes trajectory data compression imperative and challenging. This article mainly focuses on the compression and visualization of large-scale vessel trajectories and their graphics processing unit (GPU)-accelerated implementations. The visualization was implemented to investigate the influence of compression on vessel trajectory data quality. In particular, the Douglas-Peucker (DP) and kernel density estimation (KDE) algorithms, utilized for trajectory compression and visualization respectively, were significantly accelerated through the massively parallel computation capabilities of the GPU architecture. Comprehensive experiments on trajectory compression and visualization were conducted on large-scale AIS data recording ship movements collected from three different water areas, i.e., the South Channel of Yangtze River Estuary, the Chengshan Jiao Promontory, and the Zhoushan Islands. Experimental results illustrated that: 1) the proposed GPU-based parallel implementation frameworks could significantly reduce the computational time for both trajectory compression and visualization; 2) the influence of compressed vessel trajectories on trajectory visualization could be negligible if the compression threshold was selected suitably; and 3) the Gaussian kernel was capable of generating more appropriate KDE-based visualization performance compared with seven other kernel functions.
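For reference, the sequential form of the Douglas-Peucker algorithm used for trajectory compression is shown below; the paper's contribution is a GPU-parallel formulation of this (and of KDE), which this plain CPU sketch does not attempt.

```python
import numpy as np

def point_segment_dist(pts, a, b):
    """Perpendicular distance from each point in pts to the line through a, b."""
    d = b - a
    norm = np.hypot(d[0], d[1])
    if norm == 0.0:
        return np.hypot(pts[:, 0] - a[0], pts[:, 1] - a[1])
    return np.abs(d[0] * (pts[:, 1] - a[1]) - d[1] * (pts[:, 0] - a[0])) / norm

def douglas_peucker(traj, eps):
    """traj: (N, 2) positions; keep points deviating more than eps."""
    if len(traj) < 3:
        return traj
    dists = point_segment_dist(traj[1:-1], traj[0], traj[-1])
    i = int(np.argmax(dists)) + 1
    if dists[i - 1] <= eps:
        return np.vstack([traj[0], traj[-1]])     # segment is flat enough
    left = douglas_peucker(traj[: i + 1], eps)
    right = douglas_peucker(traj[i:], eps)
    return np.vstack([left[:-1], right])          # avoid duplicating the pivot

traj = np.cumsum(np.random.randn(1000, 2), axis=0)  # stand-in AIS track
simplified = douglas_peucker(traj, eps=2.0)
```

The compression threshold eps is exactly the knob the experiments vary: larger values drop more points but, past some level, visibly distort the KDE visualization.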

Posted Content
TL;DR: The experiments validate that the HLVC approach advances the state-of-the-art of deep video compression methods, and outperforms the "Low-Delay P (LDP) very fast" mode of x265 in terms of both PSNR and MS-SSIM.
Abstract: In this paper, we propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network. The frames in the first layer are compressed by an image compression method with the highest quality. Using these frames as references, we propose the Bi-Directional Deep Compression (BDDC) network to compress the second layer with relatively high quality. Then, the third layer frames are compressed with the lowest quality, by the proposed Single Motion Deep Compression (SMDC) network, which adopts a single motion map to estimate the motions of multiple frames, thus saving bits for motion information. In our deep decoder, we develop the Weighted Recurrent Quality Enhancement (WRQE) network, which takes both compressed frames and the bit stream as inputs. In the recurrent cell of WRQE, the memory and update signal are weighted by quality features to reasonably leverage multi-frame information for enhancement. In our HLVC approach, the hierarchical quality benefits the coding efficiency, since the high quality information facilitates the compression and enhancement of low quality frames at encoder and decoder sides, respectively. Finally, the experiments validate that our HLVC approach advances the state-of-the-art of deep video compression methods, and outperforms the "Low-Delay P (LDP) very fast" mode of x265 in terms of both PSNR and MS-SSIM. The project page is at this https URL.

Journal ArticleDOI
TL;DR: A lightweight and tunable QTBT partitioning scheme based on a Machine Learning (ML) approach that uses Random Forest classifiers to determine for each coding block the most probable partition modes and to minimize the encoding loss induced by misclassification is proposed.
Abstract: The block partition structure is a critical module in video coding schemes for achieving significant compression gains. In the exploration of the future video coding standard, named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure has been introduced. In addition to the QT block partitioning defined in the High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a lightweight and tunable QTBT partitioning scheme based on a Machine Learning (ML) approach. The proposed solution uses Random Forest classifiers to determine, for each coding block, the most probable partition modes. To minimize the encoding loss induced by misclassification, risk intervals for classifier decisions are introduced. By varying the size of the risk intervals, a tunable trade-off between encoding complexity reduction and coding loss is achieved. Implemented in the JEM-7.0 software, the proposed solution offers encoding complexity reductions ranging from 30% to 70% on average for only 0.7% to 3.0% Bjontegaard Delta Rate (BD-BR) increase in the Random Access (RA) coding configuration, with very slight overhead induced by the Random Forest. The solution is also effective at reducing the complexity of the Multi-Type Tree (MTT) partitioning scheme in the VTM-5.0 software, with complexity reductions ranging from 25% to 61% on average for only 0.4% to 2.2% BD-BR increase.
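The risk-interval rule can be phrased in a few lines on top of any probabilistic classifier. The sketch below uses scikit-learn with synthetic stand-in features; it illustrates only the decision rule, not the authors' JEM/VTM integration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 12))        # stand-in per-block features
y = rng.integers(0, 5, size=2000)      # stand-in labels: 5 partition modes

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def candidate_modes(features, margin=0.15):
    """Keep every mode whose probability is within `margin` of the best one;
    the encoder then RD-tests only this reduced set. A larger margin trades
    encoder speed-up for a smaller misclassification (BD-BR) loss."""
    proba = clf.predict_proba(features.reshape(1, -1))[0]
    return np.flatnonzero(proba >= proba.max() - margin)

modes = candidate_modes(X[0])
```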

Journal ArticleDOI
TL;DR: All technology proposed in the responses to the CfP was based on the classic block-based hybrid video coding design, extending it by new elements of partitioning, intra- and inter-picture prediction, prediction signal filtering, transforms, quantization/scaling, entropy coding, and in-loop filtering.
Abstract: After the development of the High-Efficiency Video Coding Standard (HEVC), ITU-T VCEG and ISO/IEC MPEG formed the Joint Video Exploration Team (JVET), which started exploring video coding technology with higher coding efficiency, including development of a Joint Exploration Model (JEM) algorithm and a corresponding software implementation. The technology explored in the last version of the JEM further increases the compression capabilities of the hybrid video coding approach by adding new tools, reaching up to 30% bit rate reduction compared to HEVC based on the Bjontegaard delta bit rate (BD-rate) metric, and further improvement beyond that in terms of subjective visual quality. This provided enough evidence to issue a joint Call for Proposals (CfP) for a new standardization activity now known as Versatile Video Coding (VVC). All technology proposed in the responses to the CfP was based on the classic block-based hybrid video coding design, extending it by new elements of partitioning, intra- and inter-picture prediction, prediction signal filtering, transforms, quantization/scaling, entropy coding, and in-loop filtering. This article provides an overview of technology that was proposed in the responses to the CfP, with a focus on techniques that were not already explored in the JEM context.

Posted Content
Lila Huang, Shenlong Wang, Kelvin Wong, Jerry Liu, Raquel Urtasun
TL;DR: In this article, a tree-structured conditional entropy model was proposed to encode the LiDAR points into an octree, a data-efficient structure suitable for sparse point clouds.
Abstract: We present a novel deep compression algorithm to reduce the memory footprint of LiDAR point clouds. Our method exploits the sparsity and structural redundancy between points to reduce the bitrate. Towards this goal, we first encode the LiDAR points into an octree, a data-efficient structure suitable for sparse point clouds. We then design a tree-structured conditional entropy model that models the probabilities of the octree symbols to encode the octree into a compact bitstream. We validate the effectiveness of our method over two large-scale datasets. The results demonstrate that our approach reduces the bitrate by 10-20% at the same reconstruction quality, compared to the previous state-of-the-art. Importantly, we also show that for the same bitrate, our approach outperforms other compression algorithms when performing downstream 3D segmentation and detection tasks using compressed representations. Our algorithm can be used to reduce the onboard and offboard storage of LiDAR points for applications such as self-driving cars, where a single vehicle captures 84 billion points per day.
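To make the octree representation concrete, the sketch below serializes a point cloud into the per-node 8-bit occupancy symbols that such a codec entropy-codes; replacing a uniform code over these symbols with a learned, tree-conditioned probability model is where the paper's entropy model comes in. This is an illustrative re-implementation, not the authors' code.

```python
import numpy as np

def octree_symbols(points, depth, lo, size):
    """Emit one occupancy byte per internal node (bit i set = child octant i
    non-empty), recursing into occupied children until `depth` runs out."""
    center = lo + size / 2.0
    octant = ((points >= center) * np.array([1, 2, 4])).sum(axis=1)
    occupancy, children = 0, []
    for o in range(8):
        sel = points[octant == o]
        if len(sel):
            occupancy |= 1 << o
            children.append((o, sel))
    symbols = [occupancy]
    if depth > 1:
        for o, sel in children:
            off = np.array([o & 1, (o >> 1) & 1, (o >> 2) & 1]) * size / 2.0
            symbols += octree_symbols(sel, depth - 1, lo + off, size / 2.0)
    return symbols

pts = np.random.rand(10000, 3) * 100.0          # stand-in LiDAR sweep (meters)
syms = octree_symbols(pts, depth=8, lo=np.zeros(3), size=100.0)
```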

Journal ArticleDOI
TL;DR: This paper proposes a deep learning-based YLP feature extraction that jointly captures daily and seasonal variations by leveraging a convolutional autoencoder (CAE), and confirms that year-round characteristics are well captured during the clustering process and clearly visualized with load images.
Abstract: As the number of smart meters increases, compression of metering data becomes essential from the data transmission, storage and processing perspectives. Specifically, feature extraction can be used for the compression of metering data and further be utilized for smart grid applications such as customer clustering. So far, many studies have addressed compression and clustering based on daily load profiles. However, in order to account for the long-term characteristics of electricity load, utilizing yearly load profiles (YLPs) is vital for customer load clustering and analysis. In this paper, we propose a deep learning-based YLP feature extraction that jointly captures daily and seasonal variations. By leveraging a convolutional autoencoder (CAE), YLPs in 8,640-dimensional space are compressed to 100-dimensional vectors. We apply the proposed CAE framework to the YLPs of 1,405 residential customers and verify that it outperforms other dimensionality reduction methods in terms of reconstruction error, e.g., by 19-40%; equivalently, the compression ratio is increased by 130% or more over other methods at the same reconstruction error. In addition, clustering analysis is performed on the encoded YLPs. Our results confirm that year-round characteristics are well captured during the clustering process and are also clearly visualized with load images.
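As a rough picture of the architecture class involved, here is a minimal 1-D convolutional autoencoder in PyTorch that maps an 8,640-sample yearly load profile to a 100-dimensional code; the layer sizes are illustrative choices, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class LoadCAE(nn.Module):
    def __init__(self, code_dim=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 540, code_dim),       # 8640 -> 2160 -> 540 samples
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 32 * 540), nn.ReLU(),
            nn.Unflatten(1, (32, 540)),
            nn.ConvTranspose1d(32, 16, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=8, stride=4, padding=2),
        )

    def forward(self, x):                        # x: (batch, 1, 8640)
        code = self.encoder(x)
        return self.decoder(code), code

model = LoadCAE()
ylp = torch.rand(4, 1, 8640)                     # stand-in yearly load profiles
recon, code = model(ylp)
loss = nn.functional.mse_loss(recon, ylp)        # reconstruction objective
```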

Journal ArticleDOI
TL;DR: A new extensive and representative video database, BVI-DVC, is presented for training CNN-based video compression systems, with specific emphasis on machine learning tools that enhance conventional coding architectures, including spatial resolution and bit depth up-sampling, post-processing and in-loop filtering.
Abstract: Deep learning methods are increasingly being applied in the optimisation of video compression algorithms and can achieve significantly enhanced coding gains, compared to conventional approaches. Such approaches often employ Convolutional Neural Networks (CNNs) which are trained on databases with relatively limited content coverage. In this paper, a new extensive and representative video database, BVI-DVC, is presented for training CNN-based video compression systems, with specific emphasis on machine learning tools that enhance conventional coding architectures, including spatial resolution and bit depth up-sampling, post-processing and in-loop filtering. BVI-DVC contains 800 sequences at various spatial resolutions from 270p to 2160p and has been evaluated on ten existing network architectures for four different coding tools. Experimental results show that this database produces significant improvements in terms of coding gains over three existing (commonly used) image/video training databases under the same training and evaluation configurations. The overall additional coding improvements by using the proposed database for all tested coding modules and CNN architectures are up to 10.3% based on the assessment of PSNR and 8.1% based on VMAF.

Posted Content
TL;DR: This work proposes a content adaptive and error propagation aware video compression system that outperforms the state-of-the-art learning based video codecs on benchmark datasets without increasing the model size or decreasing the decoding speed.
Abstract: Recently, learning-based video compression methods have attracted increasing attention. However, previous works suffer from error propagation due to the accumulation of reconstruction error in inter-predictive coding. Meanwhile, previous learning-based video codecs are also not adaptive to different video contents. To address these two problems, we propose a content adaptive and error propagation aware video compression system. Specifically, our method employs a joint training strategy that considers the compression performance of multiple consecutive frames instead of a single frame. Based on the learned long-term temporal information, our approach effectively alleviates error propagation in reconstructed frames. More importantly, instead of using the hand-crafted coding modes of traditional compression systems, we design an online encoder updating scheme in our system. The proposed approach updates the encoder parameters according to the rate-distortion criterion but keeps the decoder unchanged in the inference stage. Therefore, the encoder is adaptive to different video contents and achieves better compression performance by reducing the domain gap between the training and testing datasets. Our method is simple yet effective and outperforms the state-of-the-art learning-based video codecs on benchmark datasets without increasing the model size or decreasing the decoding speed.
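The online updating scheme boils down to a handful of gradient steps at encode time. Below is a hedged sketch under the assumption of a learned codec object exposing encoder/decoder submodules and a differentiable rate estimate (the names codec, x_hat, and bpp are placeholders, not the paper's API):

```python
import torch

def online_update_encoder(codec, frames, lmbda=0.01, steps=10, lr=1e-5):
    """Fine-tune only the encoder on the current video's R-D loss; the
    decoder stays frozen so the bitstream remains decodable."""
    for p in codec.decoder.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(codec.encoder.parameters(), lr=lr)
    for _ in range(steps):
        out = codec(frames)                       # assumed: {"x_hat", "bpp"}
        distortion = torch.mean((frames - out["x_hat"]) ** 2)
        loss = out["bpp"] + lmbda * distortion    # rate-distortion criterion
        opt.zero_grad()
        loss.backward()
        opt.step()
    return codec
```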

Journal ArticleDOI
TL;DR: Experimental results show that the proposed compression scheme for the attributes of voxelized 3D point clouds is able to achieve better rate-distortion performance and visual quality, compared with state-of-the-art methods.
Abstract: 3D point clouds associated with attributes are considered a promising paradigm for immersive communication. However, the corresponding compression schemes for this medium are still in their infant stage. Moreover, in contrast to conventional image/video compression, compressing 3D point cloud data is a more challenging task owing to its irregular structure. In this paper, we propose a novel and effective compression scheme for the attributes of voxelized 3D point clouds. In the first stage, an input voxelized 3D point cloud is divided into blocks of equal size. Then, to deal with the irregular structure of 3D point clouds, a geometry-guided sparse representation (GSR) is proposed to eliminate the redundancy within each block, formulated as an $\ell_0$-norm regularized optimization problem. Also, an inter-block prediction scheme is applied to remove the redundancy between blocks. Finally, by quantitatively analyzing the characteristics of the transform coefficients produced by the GSR, an effective entropy coding strategy tailored to our GSR is developed to generate the bitstream. Experimental results over various benchmark datasets show that the proposed compression scheme achieves better rate-distortion performance and visual quality compared with state-of-the-art methods.

Posted Content
TL;DR: A higher-order convolutional LSTM model is proposed that can efficiently learn long-term spatio-temporal correlations in video sequences, along with a succinct representation of the history, and outperforms existing approaches while using only a fraction of the parameters of the baseline models.
Abstract: Learning from spatio-temporal data has numerous applications such as human-behavior analysis, object tracking, video compression, and physics simulation. However, existing methods still perform poorly on challenging video tasks such as long-term forecasting. This is because such tasks require learning long-term spatio-temporal correlations in the video sequence. In this paper, we propose a higher-order convolutional LSTM model that can efficiently learn these correlations, along with a succinct representation of the history. This is accomplished through a novel tensor-train module that performs prediction by combining convolutional features across time. To make this feasible in terms of computation and memory requirements, we propose a novel convolutional tensor-train decomposition of the higher-order model. This decomposition reduces the model complexity by jointly approximating a sequence of convolutional kernels as a low-rank tensor-train factorization. As a result, our model outperforms existing approaches while using only a fraction of the parameters, including those of the baseline models. Our results achieve state-of-the-art performance in a wide range of applications and datasets, including multi-step video prediction on the Moving-MNIST-2 and KTH action datasets as well as early activity recognition on the Something-Something V2 dataset.

Journal ArticleDOI
TL;DR: This paper describes technologies relevant to 360° video for VVC, including projection formats, pre- and post-processing methods, and 360°-video specific coding tool modifications in these proposals.
Abstract: Augmented reality (AR) and virtual reality (VR) applications have seen rising popularity in recent years. Omnidirectional 360° video is a video format often used in AR and VR applications. To address the industry needs, a new HEVC edition recently published includes several supplemental enhancement information (SEI) messages to enable the carriage of omnidirectional video using HEVC. However, further improvement in 360° video compression efficiency is needed. In order to address this challenge, the Joint Video Exploration Team (JVET) of ITU-T VCEG and ISO/IEC MPEG has been investigating 360° video coding technologies, including projection formats, pre- and post-processing technologies, as well as 360°-video-specific coding tools since 2016. The joint call for proposals (CfP) recently issued by ITU-T VCEG and ISO/IEC MPEG on video compression technologies beyond HEVC included a category on 360° video. Twelve CfP responses in the 360° video category were received. This paper describes technologies relevant to 360° video for VVC. A summary of projection formats, pre- and post-processing methods, and 360°-video specific coding tool modifications in these proposals is provided.

Journal ArticleDOI
TL;DR: This paper provides an overview of the coding algorithms of the Joint Exploration Model (JEM) for video compression with capability beyond HEVC, which was developed by the Joint Video Exploration Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts group (MPEG).
Abstract: This paper provides an overview of the coding algorithms of the Joint Exploration Model (JEM) for video compression with capability beyond HEVC, which was developed by the Joint Video Exploration Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The goal of the JEM development and experimentation was to provide evidence that sufficient coding efficiency improvement over the High Efficiency Video Coding (HEVC) standard can be achieved, which would justify the need for a new video coding standard with a compression capability significantly exceeding that of HEVC. The development of the JEM provided an ability to conduct studies toward that goal in a verifiable and collaborative manner and led to the launching of the project to develop the new Versatile Video Coding (VVC) standard. Objective metric gains exceeding 30% were measured for most of the tested high-resolution video content that represents current demanding new applications, and subjective testing using human observers showed even more benefit.

Proceedings ArticleDOI
TL;DR: In this paper, the authors analyzed the complexity of Versatile Video Coding (VVC) encoder and decoder for six video sequences of 720p, 1080p, and 2160p under Low-Delay (LD), Random Access (RA), and All-Intra (AI) conditions.
Abstract: While the next-generation video compression standard, Versatile Video Coding (VVC), provides superior compression efficiency, its computational complexity increases dramatically. This paper thoroughly analyzes this complexity for both the encoder and decoder of VVC Test Model 6, by quantifying the complexity break-down for each coding tool and measuring the complexity and memory requirements for VVC encoding/decoding. These extensive analyses are performed for six video sequences of 720p, 1080p, and 2160p, under Low-Delay (LD), Random-Access (RA), and All-Intra (AI) conditions (a total of 320 encodings/decodings). Results indicate that the VVC encoder and decoder are 5x and 1.5x more complex than HEVC in LD, and 31x and 1.8x in AI, respectively. Detailed analysis of the coding tools reveals that in LD, on average, motion estimation tools at 53%, transformation and quantization at 22%, and entropy coding at 7% dominate the encoding complexity. In decoding, loop filters at 30%, motion compensation at 20%, and entropy decoding at 16% are the most complex modules. Moreover, the required memory bandwidths for VVC encoding and decoding, measured through memory profiling, are 30x and 3x those of HEVC. The reported results and insights are a guide for future research and implementations of energy-efficient VVC encoders/decoders.

Journal ArticleDOI
TL;DR: DeepCABAC applies a novel quantization scheme that minimizes a rate-distortion function while simultaneously taking the impact of quantization on the DNN performance into account, attaining higher compression rates than previously proposed coding techniques for DNN compression.
Abstract: In the past decade deep neural networks (DNNs) have shown state-of-the-art performance on a wide range of complex machine learning tasks. Many of these results have been achieved while growing the size of DNNs, creating a demand for their efficient compression and transmission. In this work we present DeepCABAC, a universal compression algorithm for DNNs that is based on applying the Context-based Adaptive Binary Arithmetic Coder (CABAC) to the DNN parameters. CABAC was originally designed for the H.264/AVC video coding standard and became the state of the art for the lossless entropy-coding part of video compression. DeepCABAC applies a novel quantization scheme that minimizes a rate-distortion function while simultaneously taking the impact of quantization on the DNN performance into account. Experimental results show that DeepCABAC consistently attains higher compression rates than previously proposed coding techniques for DNN compression. For instance, it is able to compress the VGG16 ImageNet model by a factor of 63.6 with no loss of accuracy, thus representing the entire network with merely 9 MB. The source code for encoding and decoding can be found at https://github.com/fraunhoferhhi/DeepCABAC .
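The quantization idea can be illustrated with a toy scalar version: for each weight, pick the grid level whose squared error plus lambda times its estimated code length is smallest. This simplification fixes a probability table per level, whereas DeepCABAC estimates rates with CABAC's adaptive binary contexts and weighs distortion by its impact on accuracy; treat it as a sketch of the rate-distortion criterion only.

```python
import numpy as np

def rd_quantize(weights, step, probs, lam=0.1):
    """Per weight, minimize |w - q*step|^2 + lam * R(q), where
    R(q) = -log2(probs[q]) is the estimated bit cost of level q."""
    levels = np.arange(len(probs)) - len(probs) // 2   # symmetric integer grid
    rate = -np.log2(np.asarray(probs))
    cost = (weights[:, None] - levels[None, :] * step) ** 2 + lam * rate[None, :]
    idx = np.argmin(cost, axis=1)
    return levels[idx] * step, levels[idx]

w = np.random.randn(1000) * 0.05                          # stand-in layer weights
p = np.array([0.02, 0.08, 0.15, 0.50, 0.15, 0.08, 0.02])  # peaked at zero
w_hat, q = rd_quantize(w, step=0.02, probs=p)
```

Raising lam pushes more weights onto the high-probability zero level, shrinking the bitstream at the cost of distortion.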

Proceedings ArticleDOI
Lila Huang, Shenlong Wang, Kelvin Wong, Jerry Liu, Raquel Urtasun
14 Jun 2020
TL;DR: A novel deep compression algorithm to reduce the memory footprint of LiDAR point clouds and designs a tree-structured conditional entropy model that can be directly applied to octree structures to predict the probability of a symbol’s occurrence.
Abstract: We present a novel deep compression algorithm to reduce the memory footprint of LiDAR point clouds. Our method exploits the sparsity and structural redundancy between points to reduce the bitrate. Towards this goal, we first encode the point cloud into an octree, a data-efficient structure suitable for sparse point clouds. We then design a tree-structured conditional entropy model that can be directly applied to octree structures to predict the probability of a symbol's occurrence. We validate the effectiveness of our method over two large-scale datasets. The results demonstrate that our approach reduces the bitrate by 10-20% at the same reconstruction quality, compared to the previous state-of-the-art. Importantly, we also show that for the same bitrate, our approach outperforms other compression algorithms when performing downstream 3D segmentation and detection tasks using compressed representations. This helps advance the feasibility of using point cloud compression to reduce the onboard and offboard storage for safety-critical applications such as self-driving cars, where a single vehicle captures 84 billion points per day.

Posted Content
TL;DR: Inspired by the recent success of AutoML in deep compression, AutoML is introduced to GAN compression and an AutoGAN-Distiller (AGD) framework is developed, yielding remarkably lightweight yet more competitive compressed models that largely outperform existing alternatives.
Abstract: The compression of Generative Adversarial Networks (GANs) has lately drawn attention, due to the increasing demand for deploying GANs on mobile devices for numerous applications such as image translation, enhancement and editing. However, compared to the substantial efforts to compress other deep models, the research on compressing GANs (usually the generators) remains in its infancy. Existing GAN compression algorithms are limited to handling specific GAN architectures and losses. Inspired by the recent success of AutoML in deep compression, we introduce AutoML to GAN compression and develop an AutoGAN-Distiller (AGD) framework. Starting with a specifically designed efficient search space, AGD performs an end-to-end discovery of new efficient generators, given the target computational resource constraints. The search is guided by the original GAN model via knowledge distillation, thereby fulfilling the compression. AGD is fully automatic, standalone (i.e., needing no trained discriminators), and generically applicable to various GAN models. We evaluate AGD on two representative GAN tasks: image translation and super resolution. Without bells and whistles, AGD yields remarkably lightweight yet more competitive compressed models that largely outperform existing alternatives. Our codes and pretrained models are available at this https URL.