
Showing papers on "Image compression published in 2023"


Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper exploited the relationship between the code maps produced by deep neural networks and introduced proxy similarity functions as a workaround, so that the global similarity within the context can be taken into account for more accurate entropy estimation.
Abstract: The entropy of the codes usually serves as the rate loss in recent learned lossy image compression methods. Precise estimation of the probabilistic distribution of the codes plays a vital role in reducing the entropy and boosting the joint rate-distortion performance. However, existing deep-learning-based entropy models generally assume the latent codes are statistically independent or depend on some side information or local context, which fails to take the global similarity within the context into account and thus hinders accurate entropy estimation. To address this issue, we propose a special nonlocal operation for context modeling by employing the global similarity within the context. Specifically, due to the constraint of the context, the nonlocal operation cannot be computed directly in context modeling. We exploit the relationship between the code maps produced by deep neural networks and introduce proxy similarity functions as a workaround. Then, we combine the local and global context via a nonlocal attention block and employ it in masked convolutional networks for entropy modeling. Considering that the width of the transforms is essential in training low-distortion models, we finally introduce a U-net block into the transforms to increase the width with manageable memory consumption and time complexity. Experiments on the Kodak and Tecnick datasets demonstrate the superiority of the proposed context-based nonlocal attention block in entropy modeling and the U-net block in low-distortion situations. On the whole, our model performs favorably against the existing image compression standards and recent deep image compression models.
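
To make the masked (context-constrained) convolution that the entropy model builds on concrete, here is a minimal PyTorch sketch of a type-'A' masked convolution; it is a generic autoregressive building block, not the authors' exact layer, and the nonlocal attention, proxy similarity functions, and U-net block are omitted.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Causal (type-'A') masked convolution: each output position may only
    see latent codes decoded before it, which is the context constraint
    discussed in the abstract."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_buffer("mask", torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        self.mask[:, :, kh // 2, kw // 2:] = 0  # current and future columns in the centre row
        self.mask[:, :, kh // 2 + 1:, :] = 0    # all future rows

    def forward(self, x):
        self.weight.data *= self.mask           # enforce causality at every call
        return super().forward(x)

# Usage: a 5x5 causal context layer over 192 latent channels.
ctx = MaskedConv2d(192, 384, kernel_size=5, padding=2)
params = ctx(torch.randn(1, 192, 16, 16))       # features for entropy parameter estimation
```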

6 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed Adaptive Block Compressed Sensing (ABCS) for compressing different medical images with a high compression ratio, achieving 40% to 70% compression.

6 citations


Journal ArticleDOI
TL;DR: In this paper, a wavelet packet transformation (WPT)-based convolutional autoencoder (WPTCAE) was proposed for massive ultrasonic data compression on the order of megabytes, which achieved a compression accuracy of 98% by using only 6% of the original data.
Abstract: Ultrasonic signal acquisition platforms generate considerable amounts of data to be stored and processed, especially when multichannel scanning or beamforming is employed. Reducing the mass storage and allowing high-speed data transmissions necessitate the compression of ultrasonic data into a representation with fewer bits. High compression accuracy is crucial in many applications, such as ultrasonic medical imaging and nondestructive testing (NDT). In this study, we present learning models for massive ultrasonic data compression on the order of megabytes. A common and highly efficient compression method for ultrasonic data is signal decomposition and subband elimination using wavelet packet transformation (WPT). We designed an algorithm for finding the wavelet kernel that provides maximum energy compaction and the optimal subband decomposition tree structure for a given ultrasonic signal. Furthermore, the WPT convolutional autoencoder (WPTCAE) compression algorithm is proposed based on the WPT compression tree structure and the use of machine learning for estimating the optimal kernel. To further improve the compression accuracy, an autoencoder (AE) is incorporated into the WPTCAE model to build a hybrid model. The performance of the WPTCAE compression model is examined and benchmarked against other compression algorithms using ultrasonic radio frequency (RF) datasets acquired in NDT and medical imaging applications. The experimental results clearly show that the WPTCAE compression model provides improved compression ratios while maintaining high signal fidelity. The proposed learning models can achieve a compression accuracy of 98% by using only 6% of the original data.
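
The subband-elimination idea behind the WPT stage can be illustrated with PyWavelets; this is a minimal 1-D sketch that keeps only the highest-energy subbands for a fixed wavelet, whereas the paper additionally learns the kernel, the decomposition tree, and a convolutional autoencoder refinement.

```python
import numpy as np
import pywt

def wpt_compress(signal, wavelet="db4", level=4, keep_ratio=0.06):
    """Decompose a 1-D ultrasonic signal into wavelet packet subbands and
    reconstruct it from only the most energetic ones."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="natural")
    ranked = sorted(((np.sum(n.data ** 2), n.path) for n in nodes), reverse=True)
    kept = {path for _, path in ranked[:max(1, int(keep_ratio * len(nodes)))]}

    rec = pywt.WaveletPacket(data=None, wavelet=wavelet,
                             mode="symmetric", maxlevel=level)
    for node in nodes:
        if node.path in kept:
            rec[node.path] = node.data           # retained subband coefficients
    return rec.reconstruct(update=False)

# Usage: keep ~6% of the subbands of a synthetic RF trace.
rf = np.random.randn(4096)
approx = wpt_compress(rf)[:len(rf)]
```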

4 citations


Journal ArticleDOI
TL;DR: In this paper, a residual-enhanced mask-based progressive generative coding (RMPGC) framework is proposed for image compression in wireless communications, multi-user broadcasting, and multi-tasking applications.
Abstract: Progressive deep image compression (DIC) with hybrid contexts is an under-investigated problem that aims to jointly maximize the utility of a compressed image for multiple contexts or tasks under variable rates. In this paper, we consider the contexts of image reconstruction and classification. We propose a DIC framework, called residual-enhanced mask-based progressive generative coding (RMPGC), designed for explicit control of the performance within the rate-distortion-classification-perception (RDCP) trade-off. Three independent mechanisms are introduced to yield a semantically structured latent representation that can support parameterized control of rate and context adaptation. Experimental results show that the proposed RMPGC outperforms a benchmark DIC scheme using the same generative adversarial nets (GANs) backbone in all six metrics related to classification, distortion, and perception. Moreover, RMPGC is a flexible framework that can be applied to different neural network backbones. Some typical implementations are given and shown to outperform the classic BPG codec and four state-of-the-art DIC schemes in classification and perception metrics, with a slight degradation in distortion metrics. Our proposal of a nonlinear-neural-coded and richly structured latent space makes the proposed DIC scheme well suited for image compression in wireless communications, multi-user broadcasting, and multi-tasking applications.

3 citations


Journal ArticleDOI
TL;DR: In this article, an image compression-encryption system is proposed to address security, low-bandwidth, and image de-noising issues during image transmission, where a 3D chaotic logistic map with DNA encoding and Tuna Swarm Optimization is employed for innovative image encryption.
Abstract: Images and video-based multimedia data are growing rapidly due to communication network technology. During image compression and transmission, images are inevitably corrupted by noise due to the influence of the environment, transmission channels, and other factors, resulting in the damage and degradation of digital images. Numerous real-time applications, such as digital photography, traffic monitoring, obstacle detection, surveillance, and automated character recognition, are affected by this information loss. Therefore, the efficient and safe transmission of data has become a vital study area. In this research, an image compression–encryption system is proposed to address security, low-bandwidth, and image de-noising issues during image transmission. The Chevrolet transformation is proposed to improve image compression quality, reduce storage space, and enhance de-noising. A 3D chaotic logistic map with DNA encoding and Tuna Swarm Optimization is employed for innovative image encryption. This optimization approach may significantly increase the image's encryption speed and transmission security. The proposed system is built using the Xilinx system generator tool on a field-programmable gate array (FPGA). Experimental analysis and findings show the reliability and scalability of the designed image compression and encryption technique. For different images, the security analysis is performed using several metrics and attains 32.33 dB PSNR, 0.98 SSIM, and 7.99721 information entropy. According to the simulation results, the implemented work is more secure and reduces image redundancy more than existing methods.
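
For illustration, the chaotic key-stream generation step could look like the following sketch of a coupled 3-D logistic map (one form found in the image-encryption literature; the paper's exact map, parameter ranges, DNA-encoding rules, and Tuna Swarm Optimization step may differ, and the FPGA implementation is not reflected here).

```python
import numpy as np

def logistic_3d(n, x=0.2350, y=0.3500, z=0.7350,
                lam=3.77, beta=0.0157, alpha=0.0125):
    """Iterate a coupled 3-D logistic map and return n chaotic triples."""
    seq = np.empty((n, 3))
    for i in range(n):
        x, y, z = (lam * x * (1 - x) + beta * y * y * x + alpha * z ** 3,
                   lam * y * (1 - y) + beta * z * z * y + alpha * x ** 3,
                   lam * z * (1 - z) + beta * x * x * z + alpha * y ** 3)
        seq[i] = (x, y, z)
    return seq

# Usage: turn one chaotic channel into a byte key-stream for XOR masking of pixels.
keystream = (logistic_3d(4096)[:, 0] * 1e6 % 256).astype(np.uint8)
```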

2 citations


Journal ArticleDOI
TL;DR: In this article, a complete image processing system is presented, including an improved median filter with better PSNR and operating frequency, an image compression module with improved PSNR and compression ratio, and an image encryption module using the Advanced Encryption Standard.

2 citations


Journal ArticleDOI
TL;DR: In this paper, a curvelet transform based hyperspectral image compression algorithm (HSICA) was proposed, which has high coding gain, low coding complexity, and on-par coding memory requirements, and works for both lossy and lossless compression.
Abstract: The wavelet transform is widely used in the task of hyperspectral image compression (HSIC). It has achieved outstanding performance in the compression of hyperspectral (HS) images, which has attracted great interest. However, wavelet-transform-based hyperspectral image compression algorithms (HSICA) have lower coding gain than other state-of-the-art HSIC algorithms. To solve this problem, this manuscript proposes a curvelet transform based HSIC algorithm. The curvelet transform is a multiscale mathematical transform that represents the curves and edges of the HS image more efficiently than the wavelet transform. The experimental results show that the proposed compression algorithm has high coding gain, low coding complexity, and on-par coding memory requirements, and works for both lossy and lossless compression. Thus, it is a suitable contender for the compression process in HS image sensors.

2 citations



Journal ArticleDOI
23 Jan 2023-Sensors
TL;DR: In this paper, subjective and objective image quality assessment (IQA) methods are integrated to evaluate the range of image quality metric (IQM) values that guarantee visually or near-visually lossless compression performed by the JPEG 1 standard (ISO/IEC 10918).
Abstract: The usage of media such as images and videos has increased extensively in recent years. It has become impractical to store images and videos acquired by camera sensors in their raw form due to their huge storage size. Generally, image data is compressed with a compression algorithm and then stored or transmitted to another platform. Thus, image compression helps to reduce the storage size and transmission cost of images and videos. However, image compression might cause visual artifacts, depending on the compression level. In this regard, performance evaluation of compression algorithms is an essential task for reconstructing images with visually or near-visually lossless quality in the case of lossy compression. The performance of compression algorithms is assessed by both subjective and objective image quality assessment (IQA) methodologies. In this paper, subjective and objective IQA methods are integrated to evaluate the range of image quality metric (IQM) values that guarantee visually or near-visually lossless compression performed by the JPEG 1 standard (ISO/IEC 10918). A novel “Flicker Test Software” is developed for conducting the proposed subjective and objective evaluation study. In the flicker test, the selected test images are subjectively analyzed by subjects at different compression levels. The IQMs are calculated at the previous compression level, at which the images were visually lossless for each subject. The results analysis shows that the objective IQMs whose values are most closely packed (i.e., have the least standard deviation) while guaranteeing visually lossless JPEG 1 compression are the feature similarity index measure (FSIM), the multiscale structural similarity index measure (MS-SSIM), and the information content weighted SSIM (IW-SSIM), with average values of 0.9997, 0.9970, and 0.9970, respectively.
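
As a rough objective analogue of the flicker-test logic described above, the sketch below lowers the JPEG quality factor until an objective IQM (plain SSIM here, via scikit-image and Pillow) drops below a threshold and reports the previous level; the paper itself relies on human observers and on FSIM, MS-SSIM, and IW-SSIM rather than this simplified criterion.

```python
from io import BytesIO
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

def lowest_acceptable_quality(path, threshold=0.997):
    """Decrease the JPEG quality factor in steps of 5 and return the last
    quality level whose SSIM against the original stays above `threshold`."""
    ref = np.array(Image.open(path).convert("L"))
    previous_q = 100
    for q in range(100, 0, -5):
        buf = BytesIO()
        Image.fromarray(ref).save(buf, format="JPEG", quality=q)
        buf.seek(0)
        rec = np.array(Image.open(buf))
        if structural_similarity(ref, rec, data_range=255) < threshold:
            return previous_q                    # the previous level was still acceptable
        previous_q = q
    return previous_q

# Usage (hypothetical file name):
# q = lowest_acceptable_quality("test_image.png")
```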

1 citation


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper designed two optimized JPEG XT and JPEG (JPEG_OPT) approaches by amplifying discrete cosine transform coefficients and using the entire anatomical region as the region of interest (ROI).

1 citation


Journal ArticleDOI
TL;DR: In this article, a procedure for two coders based on the discrete cosine transform (DCT) is proposed, which relies on predicting the mean square error for a given quantization step using a simple analysis of image complexity.
Abstract: Since the number of acquired images and their size tend to increase, lossy compression is widely applied for their storage, transfer, and dissemination. While providing a relatively large compression ratio, lossy compression inevitably introduces distortions that have to be controlled. The properties of these distortions depend on several factors such as image properties, the coder used, and a parameter that controls compression, which is different for particular coders. One then has to set the parameter that controls compression individually for each image to be compressed to provide image quality appropriate for a given application, and it is often desirable to do this quickly. Iterative procedures are usually not fast enough, and therefore fast and accurate procedures for providing a desired quality are needed. In the paper, such a procedure for two coders based on the discrete cosine transform is proposed. This procedure is based on a prediction of the mean square error for a given quantization step using a simple analysis of image complexity (local activity in blocks). The statistical and spatial–spectral characteristics of distortions introduced by DCT-based coders are analyzed, and it is shown that they depend on the quantization step and local content. Generalizing the data for sets of grayscale test images and quantization step values, it is shown that the MSE can be easily predicted. These predictions are accurate enough and can be used to set the quantization step properly, as verified by experiments performed using more than 300 remote sensing and conventional optical images. The proposed approach is applicable to the lossy compression of grayscale images and the component-wise compression of multichannel data.
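
A worked illustration of the underlying relation between the quantization step and the MSE is given below: 8x8 blocks are DCT-transformed with SciPy, uniformly quantized with step q, and the measured MSE is compared with the classical high-rate estimate q^2/12; the paper refines such a prediction with a local-activity analysis rather than using this flat estimate.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_quantization_mse(image, q):
    """Quantize 8x8 DCT blocks with a uniform step q and return the
    measured MSE together with the simple prediction q**2 / 12."""
    h, w = image.shape
    h8, w8 = h - h % 8, w - w % 8
    img = image[:h8, :w8].astype(np.float64)
    rec = np.empty_like(img)
    for i in range(0, h8, 8):
        for j in range(0, w8, 8):
            coeffs = dctn(img[i:i+8, j:j+8], norm="ortho")
            coeffs = q * np.round(coeffs / q)        # uniform scalar quantization
            rec[i:i+8, j:j+8] = idctn(coeffs, norm="ortho")
    measured = float(np.mean((img - rec) ** 2))
    predicted = q * q / 12.0
    return measured, predicted

# On smooth images the measured MSE falls below q**2/12, while on complex
# ones it approaches it, which is why block activity matters for prediction.
```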

Journal ArticleDOI
TL;DR: Li et al. as discussed by the authors designed a 3D trained wavelet-like transform to enable a signal-dependent and non-separable transform, and introduced an affine wavelet basis to capture the various local correlations in different regions of volumetric images.
Abstract: Volumetric image compression has become an urgent task to effectively transmit and store images produced in biological research and clinical practice. At present, the most commonly used volumetric image compression methods are based on wavelet transform, such as JP3D. However, JP3D employs an ideal, separable, global, and fixed wavelet basis to convert input images from pixel domain to frequency domain, which seriously limits its performance. In this paper, we first design a 3-D trained wavelet-like transform to enable signal-dependent and non-separable transform. Then, an affine wavelet basis is introduced to capture the various local correlations in different regions of volumetric images. Furthermore, we embed the proposed wavelet-like transform to an end-to-end compression framework called aiWave to enable an adaptive compression scheme for various datasets. Last but not least, we introduce the weight sharing strategies of the affine wavelet-like transform according to the volumetric data characteristics in the axial direction to reduce the number of parameters. The experimental results show that: 1) when cooperating our trained 3-D affine wavelet-like transform with a simple factorized entropy coding module, aiWave performs better than JP3D and is comparable in terms of encoding and decoding complexities; 2) when adding a context module to remove signal redundancy further, aiWave can achieve a much better performance than HEVC.
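
Trainable wavelet-like transforms are commonly built on the lifting scheme; the following 1-D PyTorch sketch shows one lifting step with learned predict/update filters, purely as an illustration of the idea (aiWave uses 3-D, affine, non-separable versions with axial weight sharing).

```python
import torch
import torch.nn as nn

class LiftingStep1D(nn.Module):
    """One lifting step: split into even/odd samples, predict the odd part
    from the even part, then update the even part with the residual."""
    def __init__(self, channels=1, k=3):
        super().__init__()
        self.predict = nn.Conv1d(channels, channels, k, padding=k // 2)
        self.update = nn.Conv1d(channels, channels, k, padding=k // 2)

    def forward(self, x):                      # x: (batch, channels, length)
        even, odd = x[..., 0::2], x[..., 1::2]
        detail = odd - self.predict(even)      # high-pass-like subband
        approx = even + self.update(detail)    # low-pass-like subband
        return approx, detail

# Usage: one decomposition level of a 1-D signal.
approx, detail = LiftingStep1D()(torch.randn(1, 1, 256))
```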

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed an end-to-end robust data hiding scheme for JPEG images, where the embedding and extraction of secret messages in the quantized discrete cosine transform (DCT) coefficients are implemented by the bi-directional processes of an invertible neural network (INN).

Journal ArticleDOI
TL;DR: In this article, the use of aggressive compression algorithms is discussed to cut wasted transmission and resources for selected land cover classification problems, where the satellite image patches are compressed by two methods.
Abstract: In the last decades, the domain of spatial computing has become more and more data-driven, especially when using remote sensing-based images. Furthermore, satellites provide huge numbers of images, so the number of available datasets is increasing. This leads to large storage requirements and high computational costs when estimating labels in scene classification using deep learning. This consumes and blocks important hardware resources, energy, and time. In this article, the use of aggressive compression algorithms is discussed to cut wasted transmission and resources for selected land cover classification problems. To compare the different compression methods and the classification performance, the satellite image patches are compressed by two methods. The first method is the quantization of the data to reduce the bit depth. The second is the lossy and lossless compression of images with the use of image file formats, such as JPEG and TIFF. The performance of the classification is evaluated with the use of convolutional neural networks (CNNs) like VGG16. The experiments indicated that not all remote sensing image classification problems improve their performance when taking the full available information into account. Moreover, compression can set the focus on specific image features, leading to fewer storage needs and a reduction in computing time with comparably small costs in terms of quality and accuracy. All in all, quantization and embedding into file formats do support CNNs in estimating the labels of images by strengthening the features.
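
The two data-reduction routes compared in the article, bit-depth quantization and re-encoding into a lossy file format, can be sketched in a few lines with NumPy and Pillow; the file names and parameter values below are placeholders, not those of the study.

```python
import numpy as np
from PIL import Image

def quantize_bit_depth(patch, bits):
    """Keep only the `bits` most significant bits of an 8-bit patch."""
    shift = 8 - bits
    return (patch.astype(np.uint8) >> shift) << shift

patch = np.array(Image.open("patch.png").convert("RGB"))              # hypothetical input patch
Image.fromarray(quantize_bit_depth(patch, 4)).save("patch_4bit.png")  # route 1: bit-depth quantization
Image.fromarray(patch).save("patch_q30.jpg", quality=30)              # route 2: lossy file format
```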

Journal ArticleDOI
TL;DR: In this paper, a deep learning-based super-resolution model is proposed to recover high-quality decompressed images at the application server level.
Abstract: Multimedia Internet of Things (MIoT) devices and networks will face many power and communication overhead constraints given the volume of multimedia sensed data. One classic approach to overcoming the difficulty of large-scale data is to use lossy compression. However, current lossy compression algorithms require a limited compression rate to maintain acceptable perceived image quality. This is commonly referred to as the image quality-compression ratio trade-off. Motivated by current breakthroughs in computer vision, this article proposes recovering high-quality decompressed images at the application server level using a deep learning-based super-resolution model. As a result, this paper proposes ignoring the trade-off between image quality and size and increasing the reduction size further by using a lossy compressor with downscaling to conserve energy. The experimental study demonstrates that the proposed technique effectively improves the visual quality of compressed and downscaled images. The proposed solution was evaluated on resource-constrained microcontrollers. The obtained results show that the transmission latency and energy consumption can be decreased by up to 10% compared to conventional lossy compression techniques.
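
A minimal sketch of the device/server split described above is given below with Pillow: the constrained device downscales and JPEG-compresses, and the server upscales the decoded image, here with bicubic interpolation standing in for the paper's learned super-resolution model.

```python
from io import BytesIO
from PIL import Image

def sensor_encode(img, scale=0.5, quality=50):
    """Downscale then JPEG-compress on the resource-constrained device."""
    small = img.resize((int(img.width * scale), int(img.height * scale)),
                       Image.BICUBIC)
    buf = BytesIO()
    small.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()

def server_decode(payload, out_size):
    """Decode and upscale on the server (bicubic here; the paper uses a
    deep super-resolution network instead)."""
    return Image.open(BytesIO(payload)).resize(out_size, Image.BICUBIC)
```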



Journal ArticleDOI
TL;DR: Li et al. as discussed by the authors proposed a self-attention approach for image compression by integrating graph attention and an asymmetric convolutional neural network (ACNN) to enhance the effect of local key features and reduce the cost of model training.
Abstract: Recent deep image compression methods have achieved prominent progress by using nonlinear modeling and powerful representation capabilities of neural networks. However, most existing learning-based image compression approaches employ customized convolutional neural network (CNN) to utilize visual features by treating all pixels equally, neglecting the effect of local key features. Meanwhile, the convolutional filters in CNN usually express the local spatial relationship within the receptive field and seldom consider the long-range dependencies from distant locations. This results in the long-range dependencies of latent representations not being fully compressed. To address these issues, an end-to-end image compression method is proposed by integrating graph attention and asymmetric convolutional neural network (ACNN). Specifically, ACNN is used to strengthen the effect of local key features and reduce the cost of model training. Graph attention is introduced into image compression to address the bottleneck problem of CNN in modeling long-range dependencies. Meanwhile, regarding the limitation that existing attention mechanisms for image compression hardly share information, we propose a self-attention approach which allows information flow to achieve reasonable bit allocation. The proposed self-attention approach is in compliance with the perceptual characteristics of human visual system, as information can interact with each other via attention modules. Moreover, the proposed self-attention approach takes into account channel-level relationship and positional information to promote the compression effect of rich-texture regions. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performances after being optimized by MS-SSIM compared to recent deep compression models on the benchmark datasets of Kodak and Tecnick. The project page with the source code can be found in https://mic.tongji.edu.cn .
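
For reference, the asymmetric convolution idea mentioned in the abstract can be sketched as a generic ACNN block in PyTorch, with parallel square, horizontal, and vertical kernels whose outputs are summed; the graph-attention and self-attention modules of the actual model are not shown.

```python
import torch
import torch.nn as nn

class AsymmetricConvBlock(nn.Module):
    """Parallel k x k, 1 x k and k x 1 convolutions; the extra horizontal
    and vertical branches strengthen the 'skeleton' of the square kernel."""
    def __init__(self, channels, k=3):
        super().__init__()
        p = k // 2
        self.square = nn.Conv2d(channels, channels, (k, k), padding=(p, p))
        self.horizontal = nn.Conv2d(channels, channels, (1, k), padding=(0, p))
        self.vertical = nn.Conv2d(channels, channels, (k, 1), padding=(p, 0))

    def forward(self, x):
        return self.square(x) + self.horizontal(x) + self.vertical(x)

# Usage: y keeps the same spatial size and channel count as x.
y = AsymmetricConvBlock(128)(torch.randn(1, 128, 32, 32))
```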

Journal ArticleDOI
TL;DR: In this paper, a framework combining image compression techniques with DIC methods is proposed, which effectively reduces the memory footprint of images while preserving the primary speckle features with subpixel errors.

Journal ArticleDOI
TL;DR: In this article, an efficient autoencoder-style network-based image compression method was introduced, which contains three novel blocks, i.e., an adjacent attention block, a Gaussian merge block, and a decoded image refinement block, to improve the overall image compression performance.
Abstract: Recently, learned image compression algorithms have shown incredible performance compared to classic hand-crafted image codecs. Despite their considerable achievements, their fundamental disadvantage is that they are not optimized for retaining local redundancies, particularly non-repetitive patterns, which has a detrimental influence on the reconstruction quality. This paper introduces an autoencoder-style network-based efficient image compression method, which contains three novel blocks, i.e., an adjacent attention block, a Gaussian merge block, and a decoded image refinement block, to improve the overall image compression performance. The adjacent attention block allocates the additional bits required to capture spatial correlations (both vertical and horizontal) and effectively removes worthless information. The Gaussian merge block assists the rate-distortion optimization performance, while the decoded image refinement block improves the defects in low-resolution reconstructed images. A comprehensive ablation study analyzes and evaluates the qualitative and quantitative capabilities of the proposed model. Experimental results on two publicly available datasets reveal that our method outperforms the state-of-the-art methods on the KODAK dataset (by around 4 dB and 5 dB) and the CLIC dataset (by about 4 dB and 3 dB) in terms of PSNR and MS-SSIM.

Journal ArticleDOI
TL;DR: In this article, a long-range convolution compression network (LRCompNet) was proposed for remote sensing images, and an improved non-local attention model was introduced to reduce the computational complexity in order to accommodate remote sensing image compression.

Journal ArticleDOI
TL;DR: In this article , a novel compression method based on partial differential equations complemented by block sorting and symbol prediction is presented, which is compared with the current standards, JPEG and JPEG 2000.
Abstract: In this paper, we present a novel compression method based on partial differential equations complemented by block sorting and symbol prediction. Block sorting is performed using the Burrows–Wheeler transform, while symbol prediction is performed using the context mixing method. With these transformations, a range coder is used as the lossless compression method. The objective and subjective quality evaluation of the reconstructed images illustrates the efficiency of this new compression method, which is compared with the current standards, JPEG and JPEG 2000.
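
The block-sorting stage mentioned above is the Burrows–Wheeler transform; a naive reference sketch (sorting all rotations, suitable only for small blocks) is shown below, while production coders use suffix-array constructions and follow it with the context mixing and range coding stages, which are not reproduced here.

```python
def bwt(block: bytes) -> bytes:
    """Naive Burrows-Wheeler transform of a small block: append a sentinel,
    sort all rotations, and return the last column."""
    s = block + b"\x00"                          # sentinel assumed absent from the data
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(rot[-1] for rot in rotations)

# Usage: similar symbols cluster together, which helps the later entropy coder.
print(bwt(b"banana"))
```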

Journal ArticleDOI
TL;DR: Stable Diffusion as discussed by the authors is a text-to-image technology that can enable billions of individuals to produce beautiful works of art in a few seconds, and is an improvement in both performance and quality.
Abstract: A text-to-image technology called Stable Diffusion will enable billions of individuals to produce beautiful works of art in a few seconds. It can run on GPUs available in consumer gadgets and is an improvement in both performance and quality. It will democratize image generation by enabling both researchers and the general public to use it under various conditions. We anticipate the open ecosystem that will grow up around it, as well as new models to really probe the limits of latent space. Keywords— LAION-5B, Perceptual Image Compression, Latent Diffusion Models, Conditioning Mechanisms, Text-to-image generation, Image Modification

Journal ArticleDOI
23 Mar 2023-Sensors
TL;DR: In this article, the authors evaluated the effects of JPEG compression on image classification using the Vision Transformer (ViT) and showed that the classification accuracy can be maintained at over 98% with a more than 90% reduction in the amount of image data.
Abstract: This paper evaluates the effects of JPEG compression on image classification using the Vision Transformer (ViT). In recent years, many studies have been carried out to classify images in the encrypted domain for privacy preservation. Previously, the authors proposed an image classification method that encrypts both a trained ViT model and test images. Here, an encryption-then-compression system was employed to encrypt the test images, and the ViT model was preliminarily trained by plain images. The classification accuracy in the previous method was exactly equal to that without any encryption for the trained ViT model and test images. However, even though the encrypted test images can be compressible, the practical effects of JPEG, which is a typical lossy compression method, have not been investigated so far. In this paper, we extend our previous method by compressing the encrypted test images with JPEG and verify the classification accuracy for the compressed encrypted-images. Through our experiments, we confirm that the amount of data in the encrypted images can be significantly reduced by JPEG compression, while the classification accuracy of the compressed encrypted-images is highly preserved. For example, when the quality factor is set to 85, this paper shows that the classification accuracy can be maintained at over 98% with a more than 90% reduction in the amount of image data. Additionally, the effectiveness of JPEG compression is demonstrated through comparison with linear quantization. To the best of our knowledge, this is the first study to classify JPEG-compressed encrypted images without sacrificing high accuracy. Through our study, we have come to the conclusion that we can classify compressed encrypted-images without degradation to accuracy.
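
A much-simplified stand-in for the encryption-then-compression pipeline is sketched below: fixed-size blocks are permuted with a secret seed and the result is saved as a JPEG at quality 85; the actual scheme also applies per-block rotation/flip and negative-positive transforms, and the file names here are hypothetical.

```python
import numpy as np
from PIL import Image

def block_scramble(img, block=16, seed=0):
    """Permute non-overlapping blocks of an RGB image with a keyed RNG."""
    rng = np.random.default_rng(seed)
    h, w, c = img.shape
    bh, bw = h // block, w // block
    blocks = img[:bh * block, :bw * block].reshape(bh, block, bw, block, c)
    blocks = blocks.transpose(0, 2, 1, 3, 4).reshape(bh * bw, block, block, c)
    blocks = blocks[rng.permutation(bh * bw)]            # keyed block permutation
    out = blocks.reshape(bh, bw, block, block, c).transpose(0, 2, 1, 3, 4)
    return out.reshape(bh * block, bw * block, c)

scrambled = block_scramble(np.array(Image.open("test.png").convert("RGB")))
Image.fromarray(scrambled.astype(np.uint8)).save("scrambled_q85.jpg", quality=85)
```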

Proceedings ArticleDOI
04 Jun 2023
TL;DR: In this article, an end-to-end trainable neural network for joint compression and demosaicking of satellite images is proposed, combining a perceptual loss with the classical mean square error, which is shown to better preserve the high-frequency details present in satellite images.
Abstract: Image sensors used in real camera systems are equipped with colour filter arrays which sample the light rays in different spectral bands. Each colour channel can thus be obtained separately by considering the corresponding colour filter. While existing compression solutions mostly assume that the captured raw data has been demosaicked prior to compression, in this paper, we describe an end-to-end trainable neural network for joint compression and demosaicking of satellite images. We first introduce a training loss combining a perceptual loss with the classical mean square error, which is shown to better preserve the high-frequency details present in satellite images. We then present a multi-loss balancing strategy which significantly improves the performance of the proposed joint demosaicking-compression solution.
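
The combined training loss described above can be sketched in PyTorch as an MSE term plus a VGG-feature ("perceptual") term; the layer choice, weighting, and missing input normalization below are placeholders rather than the authors' balanced configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualPlusMSE(nn.Module):
    """loss = MSE(recon, target) + alpha * MSE(VGG_feat(recon), VGG_feat(target))."""
    def __init__(self, alpha=0.1):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)              # frozen feature extractor
        self.alpha = alpha

    def forward(self, recon, target):            # both assumed in [0, 1], shape (B, 3, H, W)
        mse = torch.mean((recon - target) ** 2)
        perceptual = torch.mean((self.features(recon) - self.features(target)) ** 2)
        return mse + self.alpha * perceptual
```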

Journal ArticleDOI
01 Jan 2023
TL;DR: In this paper, a hybrid coding framework for the lossless recompression of JPEG images (LLJPEG) using transform-domain intra prediction is proposed, including block partition and intra prediction, transform and quantization, and entropy coding.
Abstract: JPEG, which was developed 30 years ago, is the most widely used image coding format, especially favored by resource-deficient devices due to its simplicity and efficiency. With the evolution of the Internet and the popularity of mobile devices, a huge number of user-generated JPEG images are uploaded to social media sites like Facebook and Flickr or stored on personal computers or notebooks, which leads to an increase in storage cost. However, the performance of JPEG is far from that of state-of-the-art coding methods. Therefore, the lossless recompression of JPEG images urgently needs to be studied, which will further reduce the storage cost while maintaining the image fidelity. In this paper, a hybrid coding framework for the lossless recompression of JPEG images (LLJPEG) using transform-domain intra prediction is proposed, including block partition and intra prediction, transform and quantization, and entropy coding. Specifically, in LLJPEG, intra prediction is first used to obtain a predicted block. Then the predicted block is transformed by the DCT and quantized to obtain the predicted coefficients. After that, the predicted coefficients are subtracted from the original coefficients to get the DCT coefficient residuals. Finally, the DCT residuals are entropy coded. In LLJPEG, some new coding tools are proposed for intra prediction, and the entropy coding is redesigned. The experiments show that LLJPEG can reduce the storage space by 29.43% and 26.40% on the Kodak and DIV2K datasets respectively without any loss for JPEG images, while maintaining low decoding complexity.
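
The DCT-domain residual formation at the heart of this data flow can be sketched as follows with SciPy, under the assumption that JPEG's level-shifted, orthonormal 8x8 DCT and its quantization table are used; the intra-prediction tools and redesigned entropy coder are not shown.

```python
import numpy as np
from scipy.fft import dctn

def coefficient_residual(jpeg_coeffs, predicted_block, qtable):
    """Quantize the DCT of an intra-predicted 8x8 block with the JPEG
    quantization table and subtract it from the stored JPEG coefficients;
    the residual is what gets entropy coded."""
    pred = np.round(dctn(predicted_block.astype(np.float64) - 128.0,
                         norm="ortho") / qtable)
    return jpeg_coeffs - pred                    # small residual if prediction is good
```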

Journal ArticleDOI
TL;DR: In this paper, the robustness of learned image compression models is examined by injecting negligible adversarial perturbations into the original source image, which reveals the general vulnerability of existing methods regardless of their settings (e.g., network architecture, loss function, quality scale).
Abstract: Deep neural network-based image compression has been extensively studied. However, the model robustness which is crucial to practical application is largely overlooked. We propose to examine the robustness of prevailing learned image compression models by injecting negligible adversarial perturbation into the original source image. Severe distortion in decoded reconstruction reveals the general vulnerability in existing methods regardless of their settings (e.g., network architecture, loss function, quality scale). A variety of defense strategies including geometric self-ensemble based pre-processing, and adversarial training, are investigated against the adversarial attack to improve the model's robustness. Later the defense efficiency is further exemplified in real-life image recompression case studies. Overall, our methodology is simple, effective, and generalizable, making it attractive for developing robust learned image compression solutions. All materials are made publicly accessible at https://njuvision.github.io/RobustNIC for reproducible research.
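
One simple instantiation of such an attack is a single FGSM-style step that maximizes the reconstruction distortion of a differentiable encode-decode model, sketched below in PyTorch; the paper's attacks and defenses (geometric self-ensemble, adversarial training) go beyond this minimal example.

```python
import torch

def fgsm_distortion_attack(codec, x, eps=1.0 / 255):
    """Add an eps-bounded perturbation in the direction that increases the
    codec's reconstruction MSE; `codec` is assumed to map images to images."""
    x = x.clone().requires_grad_(True)
    reconstruction = codec(x)
    distortion = torch.mean((reconstruction - x) ** 2)
    distortion.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```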

Posted ContentDOI
12 Apr 2023
TL;DR: Zhang et al. as discussed by the authors analyzed the performance of various types of neural networks in image compression tasks, and summarized the advantages and disadvantages of different types of CNNs in compression tasks.
Abstract: Images can carry more information than words, but the data size of the image format is much larger than that of the text format when they contain the same information. Therefore, how to efficiently compress images to improve their storability and transmissibility is one of the key research issues in the field of computer vision. By consulting the relevant literature, this paper analyzes the development of current image compression technology and introduces both traditional compression methods and deep learning compression methods, while focusing on the compression methods based on deep learning. Through comparative experiments, this paper analyzes the performance of various types of neural networks in image compression tasks and summarizes their advantages and disadvantages in compression tasks.

Proceedings ArticleDOI
04 Jun 2023
TL;DR: Li et al. as discussed by the authors proposed an end-to-end mutual compression framework for image coding for machines (ICM), such that the compression efficiency can be significantly improved by removing the cross-scale redundancy.
Abstract: Recently, image coding for machines (ICM) has been playing an important role in facilitating intelligent vision tasks. Unfortunately, the existing ICM methods separately compress features at each scale, neglecting the redundancy across multi-scale features. To address this issue, this paper proposes an end-to-end mutual compression framework for ICM, such that the compression efficiency can be significantly improved by removing the cross-scale redundancy. Specifically, the proposed framework consists of a mutual feature compression network (MFCNet) and a basic feature compression network (BFCNet). The MFCNet predicts large-scale features from basic small-scale features, such that the large amount of bits assigned to compress large-scale features can be saved. Moreover, the BFCNet is proposed to compress small-scale features of high quality by removing spatial and channel-wise redundancy. This guarantees superior performance whilst consuming an extremely small number of bits. The experimental results show that our method achieves 90.10% and 74.97% BD-rate savings against the VVC feature anchor and VVC image anchor that have been recently accepted by the Moving Picture Experts Group (MPEG).

Journal ArticleDOI
TL;DR: In this paper, a tensor product mixed transform (TPMM) is proposed for image compression, and quality metrics such as compression ratio (CR), rate-distortion (RD), peak signal-to-noise ratio (PSNR), and structural content (SC) are utilized to evaluate the hybrid techniques.
Abstract: Image quality plays a vital role in improving and assessing image compression performance. Image compression represents large image data as a new image with a smaller size suitable for storage and transmission. This paper aims to evaluate the implementation of hybrid techniques based on the tensor product mixed transform. Compression and quality metrics such as compression ratio (CR), rate-distortion (RD), peak signal-to-noise ratio (PSNR), and structural content (SC) are utilized for evaluating the hybrid techniques. Then, a comparison between the techniques is carried out according to these metrics to determine the best technique. The main contribution is to improve the hybrid techniques. The proposed hybrid techniques consist of the discrete wavelet transform (W), the multi-wavelet transform (M), and the tensor product mixed transform (T) as the 1-level W, M, and T techniques. WT and MT are the 2-level techniques, while WWT, WMT, MWT, and MMT are the 3-level techniques. For each level of each technique, a reconstruction process is applied. The simulation results using MATLAB 2019a indicated that MMT is the best technique, with CR = 1024, R(D) = 4.154, and PSNR = 81.9085. It is also faster than the other techniques in previous works. Further research might investigate whether this technique can benefit image and speech recognition.
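
For reference, the two headline metrics used above can be computed as follows; this is a generic sketch of CR and PSNR, not the paper's MATLAB implementation.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)

def compression_ratio(original_size_bytes, compressed_size_bytes):
    """CR = original size / compressed size (e.g. 1024 for the MMT technique)."""
    return original_size_bytes / compressed_size_bytes
```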