
Showing papers on "Data compression published in 2014"


Patent
01 Dec 2014
TL;DR: A system and method for data storage by shredding and deshredding of the data allows for various combinations of processing of the data to provide various resultant forms of storage.
Abstract: A system and method for data storage by shredding and deshredding of the data allows for various combinations of processing of the data to provide various resultant storage of the data. Data storage and retrieval functions include various combinations of data redundancy generation, data compression and decompression, data encryption and decryption, and data integrity by signature generation and verification. Data shredding is performed by shredders and data deshredding is performed by deshredders that have some implementations that allocate processing internally in the shredder and deshredder either in parallel to multiple processors or sequentially to a single processor. Other implementations use multiple processing through multi-level shredders and deshredders. Redundancy generation includes implementations using non-systematic encoding, systematic encoding, or a hybrid combination. Shredder based tag generators and deshredder based tag readers are used in some implementations to allow the deshredders to adapt to various versions of the shredders.

901 citations


Book
26 Jan 2014
TL;DR: An introduction to the algorithms and architectures that form the underpinnings of the image and video compression standards, including JPEG, H.261, and H.263, while fully addressing the architectural considerations involved in implementing these standards.
Abstract: From the Publisher: Image and Video Compression Standards: Algorithms and Architectures, Second Edition presents an introduction to the algorithms and architectures that form the underpinnings of the image and video compression standards, including JPEG (compression of still-images), H.261 and H.263 (video teleconferencing), and MPEG-1 and MPEG-2 (video storage and broadcasting). The next generation of audiovisual coding standards, such as MPEG-4 and MPEG-7, are also briefly described. In addition, the book covers the MPEG and Dolby AC-3 audio coding standards and emerging techniques for image and video compression, such as those based on wavelets and vector quantization. Image and Video Compression Standards: Algorithms and Architectures, Second Edition emphasizes the foundations of these standards; namely, techniques such as predictive coding, transform-based coding such as the discrete cosine transform (DCT), motion estimation, motion compensation, and entropy coding, as well as how they are applied in the standards. The implementation details of each standard are avoided; however, the book provides all the material necessary to understand the workings of each of the compression standards, including information that can be used by the reader to evaluate the efficiency of various software and hardware implementations conforming to these standards. Particular emphasis is placed on those algorithms and architectures that have been found to be useful in practical software or hardware implementations. Image and Video Compression Standards: Algorithms and Architectures, Second Edition uniquely covers all major standards (JPEG, MPEG-1, MPEG-2, MPEG-4, H.261, H.263) in a simple and tutorial manner, while fully addressing the architectural considerations involved when implementing these standards. As such, it serves as a valuable reference for the graduate student, researcher, or engineer. The book is also used frequently as a text for courses on the subject, in both academic and professional settings.

726 citations


Journal ArticleDOI
TL;DR: This article considers product graphs as a graph model that helps extend the application of DSPG methods to large data sets through efficient implementation based on parallelization and vectorization and relates the presented framework to existing methods for large-scale data processing.
Abstract: Analysis and processing of very large data sets, or big data, poses a significant challenge. Massive data sets are collected and studied in numerous domains, from engineering sciences to social networks, biomolecular research, commerce, and security. Extracting valuable information from big data requires innovative approaches that efficiently process large amounts of data as well as handle and, moreover, utilize their structure. This article discusses a paradigm for large-scale data analysis based on the discrete signal processing (DSP) on graphs (DSPG). DSPG extends signal processing concepts and methodologies from the classical signal processing theory to data indexed by general graphs. Big data analysis presents several challenges to DSPG, in particular, in filtering and frequency analysis of very large data sets. We review fundamental concepts of DSPG, including graph signals and graph filters, graph Fourier transform, graph frequency, and spectrum ordering, and compare them with their counterparts from the classical signal processing theory. We then consider product graphs as a graph model that helps extend the application of DSPG methods to large data sets through efficient implementation based on parallelization and vectorization. We relate the presented framework to existing methods for large-scale data processing and illustrate it with an application to data compression.

713 citations
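
The core DSPG machinery described in the abstract above (graph signals, the graph Fourier transform, spectral filtering) can be illustrated in a few lines. The sketch below is a minimal, illustrative NumPy example on a small undirected cycle graph using an adjacency-matrix eigenbasis; the graph, the signal, and the "keep the k largest spectral coefficients" step are toy choices, not the paper's experiments. The Kronecker product at the end only hints at the product-graph construction the article uses for scalability.

```python
import numpy as np

# Toy undirected graph: a 6-node cycle, described by its adjacency matrix A.
N = 6
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0

# DSPG-style graph Fourier transform: for an undirected graph, the
# eigenvectors of the (symmetric) adjacency matrix serve as the Fourier basis.
eigvals, V = np.linalg.eigh(A)
order = np.argsort(-eigvals)             # order the graph frequencies
eigvals, V = eigvals[order], V[:, order]

s = np.array([1.0, 2.0, 1.5, -0.5, 0.0, 1.0])   # a graph signal (one value per node)
s_hat = V.T @ s                          # forward graph Fourier transform

# Toy "compression": keep only the k largest-magnitude spectral coefficients.
k = 3
keep = np.argsort(-np.abs(s_hat))[:k]
s_hat_c = np.zeros_like(s_hat)
s_hat_c[keep] = s_hat[keep]
s_rec = V @ s_hat_c                      # inverse transform

print("reconstruction error:", np.linalg.norm(s - s_rec))

# Product graphs (e.g. the Kronecker product below) yield large graphs whose
# Fourier basis factorizes, which is what enables parallel/vectorized DSPG.
A_big = np.kron(A, A)                    # a 36-node Kronecker product graph
```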


Journal ArticleDOI
TL;DR: A fixed-rate, near-lossless compression scheme that maps small blocks of 4^d values in d dimensions to a fixed, user-specified number of bits per block, thereby allowing read and write random access to compressed floating-point data at block granularity.
Abstract: Current compression schemes for floating-point data commonly take fixed-precision values and compress them to a variable-length bit stream, complicating memory management and random access. We present a fixed-rate, near-lossless compression scheme that maps small blocks of 4^d values in d dimensions to a fixed, user-specified number of bits per block, thereby allowing read and write random access to compressed floating-point data at block granularity. Our approach is inspired by fixed-rate texture compression methods widely adopted in graphics hardware, but has been tailored to the high dynamic range and precision demands of scientific applications. Our compressor is based on a new, lifted, orthogonal block transform and embedded coding, allowing each per-block bit stream to be truncated at any point if desired, thus facilitating bit rate selection using a single compression scheme. To avoid compression or decompression upon every data access, we employ a software write-back cache of uncompressed blocks. Our compressor has been designed with computational simplicity and speed in mind to allow for the possibility of a hardware implementation, and uses only a small number of fixed-point arithmetic operations per compressed value. We demonstrate the viability and benefits of lossy compression in several applications, including visualization, quantitative data analysis, and numerical simulation.

449 citations
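
As a rough illustration of the fixed-rate idea (a constant, user-chosen bit budget for every block of 4^d values), the toy sketch below partitions a 2-D array into 4x4 blocks and stores each block as a per-block scale plus fixed-width integers. It deliberately omits the paper's lifted orthogonal block transform and embedded coding; the block size, bit budget, and data are illustrative assumptions only.

```python
import numpy as np

def compress_block(block, bits=12):
    # Toy fixed-rate coder: one scale factor plus fixed-width integers per
    # block. This captures only the "constant budget per block" idea, not
    # the paper's lifted block transform or embedded coding.
    peak = np.max(np.abs(block))
    if peak == 0.0:
        return 0.0, np.zeros(block.shape, dtype=np.int64)
    scale = (2 ** (bits - 1) - 1) / peak
    return scale, np.round(block * scale).astype(np.int64)

def decompress_block(scale, q):
    return np.zeros(q.shape) if scale == 0.0 else q / scale

rng = np.random.default_rng(0)
data = rng.normal(size=(16, 16))          # d = 2, so each block holds 4**2 values

recon = np.empty_like(data)
for i in range(0, data.shape[0], 4):
    for j in range(0, data.shape[1], 4):
        scale, q = compress_block(data[i:i+4, j:j+4], bits=12)
        recon[i:i+4, j:j+4] = decompress_block(scale, q)

print("max abs error:", np.max(np.abs(data - recon)))
```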


Book
24 Aug 2014
TL;DR: This book provides a detailed explanation of the various parts of the HEVC standard, insight into how it was developed, and in-depth discussion of algorithms and architectures for its implementation.
Abstract: This book provides developers, engineers, researchers, and students with detailed knowledge about the High Efficiency Video Coding (HEVC) standard. HEVC is the successor to the widely successful H.264/AVC video compression standard, and it provides around twice as much compression as H.264/AVC for the same level of quality. The applications for HEVC will not only cover the space of the well-known current uses and capabilities of digital video; they will also include the deployment of new services and the delivery of enhanced video quality, such as ultra-high-definition television (UHDTV) and video with higher dynamic range, a wider range of representable color, and greater representation precision than what is typically found today. HEVC is the next major generation of video coding design: a flexible, reliable, and robust solution that will support the next decade of video applications and ease the burden of video on world-wide network traffic. This book provides a detailed explanation of the various parts of the standard, insight into how it was developed, and in-depth discussion of algorithms and architectures for its implementation.

356 citations


Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed saliency-aware video compression method is able to improve the visual quality of encoded video relative to conventional rate-distortion-optimized video coding, as well as two state-of-the-art perceptual video coding methods.
Abstract: In region-of-interest (ROI)-based video coding, ROI parts of the frame are encoded with higher quality than non-ROI parts. At low bit rates, such encoding may produce attention-grabbing coding artifacts, which may draw the viewer's attention away from the ROI, thereby degrading visual quality. In this paper, we present a saliency-aware video compression method for ROI-based video coding. The proposed method aims at reducing salient coding artifacts in non-ROI parts of the frame in order to keep the user's attention on the ROI. Further, the method allows saliency to increase in high-quality parts of the frame, and allows saliency to be reduced in non-ROI parts. Experimental results indicate that the proposed method is able to improve the visual quality of encoded video relative to conventional rate-distortion-optimized video coding, as well as two state-of-the-art perceptual video coding methods.

307 citations


Journal ArticleDOI
TL;DR: By compressing the size of the dictionary in the time domain, this work is able to speed up the pattern recognition algorithm by a factor of between 3.4 and 4.8, without sacrificing the high signal-to-noise ratio of the original scheme presented previously.
Abstract: Magnetic resonance (MR) fingerprinting is a technique for acquiring and processing MR data that simultaneously provides quantitative maps of different tissue parameters through a pattern recognition algorithm. A predefined dictionary models the possible signal evolutions simulated using the Bloch equations with different combinations of various MR parameters, and pattern recognition is completed by computing the inner product between the observed signal and each of the predicted signals within the dictionary. Though this matching algorithm has been shown to accurately predict the MR parameters of interest, a more efficient method is desired to obtain the quantitative images. We propose to compress the dictionary using the singular value decomposition, which provides a low-rank approximation. By compressing the size of the dictionary in the time domain, we are able to speed up the pattern recognition algorithm by a factor of between 3.4 and 4.8, without sacrificing the high signal-to-noise ratio of the original scheme presented previously.

253 citations
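
A minimal NumPy sketch of the compression step described above: truncate the dictionary's SVD in the time dimension and perform the inner-product matching in the resulting low-rank subspace. The synthetic low-rank dictionary, noise level, and rank are illustrative stand-ins for Bloch-simulated signal evolutions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 1000, 2000, 20          # time points, dictionary entries, latent rank

# Stand-in dictionary: a low-rank synthetic matrix plays the role of the
# Bloch-simulated signal evolutions (columns normalized before matching).
D = rng.standard_normal((T, r)) @ rng.standard_normal((r, N))
D += 0.01 * rng.standard_normal((T, N))
D /= np.linalg.norm(D, axis=0)

# Compress the dictionary in the time domain with a truncated SVD.
k = 25
U, s, Vt = np.linalg.svd(D, full_matrices=False)
Uk = U[:, :k]                     # k-dimensional temporal subspace
Dk = Uk.T @ D                     # compressed dictionary: k x N instead of T x N

# Matching: project the acquired signal into the subspace, then take inner
# products with the compressed dictionary (same argmax as full matching,
# up to the truncation error).
truth = 1234
y = D[:, truth] + 0.02 * rng.standard_normal(T)   # noisy "acquired" signal
match = int(np.argmax(np.abs(Dk.T @ (Uk.T @ y))))
print("matched entry:", match, "(expected", truth, ")")
```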


Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work develops highly efficient video features using motion information in video compression and improves the speed of video feature extraction, feature encoding and action classification by two orders of magnitude at the cost of minor reduction in recognition accuracy.
Abstract: Local video features provide state-of-the-art performance for action recognition. While the accuracy of action recognition has been continuously improved over the recent years, the low speed of feature extraction and subsequent recognition prevents current methods from scaling up to real-size problems. We address this issue and first develop highly efficient video features using motion information in video compression. We next explore feature encoding by Fisher vectors and demonstrate accurate action recognition using fast linear classifiers. Our method improves the speed of video feature extraction, feature encoding and action classification by two orders of magnitude at the cost of minor reduction in recognition accuracy. We validate our approach and compare it to the state of the art on four recent action recognition datasets.

246 citations


Journal ArticleDOI
TL;DR: A novel video saliency detection model based on feature contrast in the compressed domain is proposed that can predict the salient regions of video frames efficiently and shows superior performance on a public database.
Abstract: Saliency detection is widely used to extract regions of interest in images for various image processing applications. Recently, many saliency detection models have been proposed for video in the uncompressed (pixel) domain. However, video over the Internet is always stored in compressed domains, such as MPEG2, H.264, and MPEG4 Visual. In this paper, we propose a novel video saliency detection model based on feature contrast in the compressed domain. Four types of features, including luminance, color, texture, and motion, are extracted from the discrete cosine transform coefficients and motion vectors in the video bitstream. The static saliency map of unpredicted frames (I frames) is calculated on the basis of the luminance, color, and texture features, while the motion saliency map of predicted frames (P and B frames) is computed from the motion feature. A new fusion method is designed to combine the static saliency and motion saliency maps to obtain the final saliency map for each video frame. Because the features are derived directly in the compressed domain, the proposed model can predict the salient regions of video frames efficiently. Experimental results on a public database show the superior performance of the proposed video saliency detection model in the compressed domain.

217 citations


Journal ArticleDOI
TL;DR: A low-storage method for performing dynamic mode decomposition that can be updated inexpensively as new data become available and introduces a compression step that maintains computational efficiency, while enhancing the ability to isolate pertinent dynamical information from noisy measurements.
Abstract: We formulate a low-storage method for performing dynamic mode decomposition that can be updated inexpensively as new data become available; this formulation allows dynamical information to be extracted from large datasets and data streams. We present two algorithms: the first is mathematically equivalent to a standard “batch-processed” formulation; the second introduces a compression step that maintains computational efficiency, while enhancing the ability to isolate pertinent dynamical information from noisy measurements. Both algorithms reliably capture dominant fluid dynamic behaviors, as demonstrated on cylinder wake data collected from both direct numerical simulations and particle image velocimetry experiments.

211 citations
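
For reference, the batch "exact DMD" that the streaming algorithm reproduces can be written compactly with an SVD. The sketch below uses a synthetic two-frequency dataset; it shows only the standard batch formulation, not the paper's incremental updating or compression step.

```python
import numpy as np

def exact_dmd(X, Y, r):
    """Batch exact DMD on snapshot pairs (X = x_0..x_{m-1}, Y = x_1..x_m)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :r], s[:r], Vt[:r]                 # rank-r truncation
    Atilde = U.conj().T @ Y @ Vt.conj().T @ np.diag(1.0 / s)
    eigvals, W = np.linalg.eig(Atilde)                 # DMD eigenvalues
    modes = Y @ Vt.conj().T @ np.diag(1.0 / s) @ W     # DMD modes
    return eigvals, modes

# Synthetic data: two oscillation frequencies observed in a 64-dimensional state.
rng = np.random.default_rng(0)
n, m, dt = 64, 200, 0.1
t = np.arange(m + 1) * dt
temporal = np.vstack([np.cos(2 * t), np.sin(2 * t), np.cos(5 * t), np.sin(5 * t)])
states = rng.standard_normal((n, 4)) @ temporal
X, Y = states[:, :-1], states[:, 1:]

eigvals, _ = exact_dmd(X, Y, r=4)
freqs = np.abs(np.imag(np.log(eigvals) / dt))
print("recovered frequencies:", np.round(np.sort(freqs), 3))   # ~ [2, 2, 5, 5]
```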


Journal ArticleDOI
TL;DR: The binocular integration behaviors, namely the binocular combination and the binocular frequency integration, are utilized as the bases for measuring the quality of stereoscopic 3D images, and it is found that the proposed metrics can also address the quality assessment of synthesized color-plus-depth 3D images well.
Abstract: The objective approaches of 3D image quality assessment play a key role for the development of compression standards and various 3D multimedia applications. The quality assessment of 3D images faces more new challenges, such as asymmetric stereo compression, depth perception, and virtual view synthesis, than its 2D counterparts. In addition, the widely used 2D image quality metrics (e.g., PSNR and SSIM) cannot be directly applied to deal with these newly introduced challenges. This statement can be verified by the low correlation between the computed objective measures and the subjectively measured mean opinion scores (MOSs), when 3D images are the tested targets. In order to meet these newly introduced challenges, in this paper, besides traditional 2D image metrics, the binocular integration behaviors, namely the binocular combination and the binocular frequency integration, are utilized as the bases for measuring the quality of stereoscopic 3D images. The effectiveness of the proposed metrics is verified by conducting subjective evaluations on publicly available stereoscopic image databases. Experimental results show that significant consistency could be reached between the measured MOS and the proposed metrics, in which the correlation coefficient between them can go up to 0.88. Furthermore, we found that the proposed metrics can also address the quality assessment of the synthesized color-plus-depth 3D images well. Therefore, it is our belief that the binocular integration behaviors are important factors in the development of objective quality assessment for 3D images.

Journal ArticleDOI
TL;DR: This work presents a VQA algorithm that estimates quality via separate estimates of perceived degradation due to spatial distortion and joint spatial and temporal distortion, and demonstrates that this algorithm performs well in predicting video quality and is competitive with current state-of-the-art V QA algorithms.
Abstract: Algorithms for video quality assessment (VQA) aim to estimate the qualities of videos in a manner that agrees with human judgments of quality. Modern VQA algorithms often estimate video quality by comparing localized space-time regions or groups of frames from the reference and distorted videos, using comparisons based on visual features, statistics, and/or perceptual models. We present a VQA algorithm that estimates quality via separate estimates of perceived degradation due to (1) spatial distortion and (2) joint spatial and temporal distortion. The first stage of the algorithm estimates perceived quality degradation due to spatial distortion; this stage operates by adaptively applying to groups of spatial video frames the two strategies from the most apparent distortion algorithm with an extension to account for temporal masking. The second stage of the algorithm estimates perceived quality degradation due to joint spatial and temporal distortion; this stage operates by measuring the dissimilarity between the reference and distorted videos represented in terms of two-dimensional spatiotemporal slices. Finally, the estimates obtained from the two stages are combined to yield an overall estimate of perceived quality degradation. Testing on various video-quality databases demonstrates that our algorithm performs well in predicting video quality and is competitive with current state-of-the-art VQA algorithms.

Journal ArticleDOI
TL;DR: A novel artifact reducing approach for the JPEG decompression is proposed via sparse and redundant representations over a learned dictionary, and an effective two-step algorithm is developed that outperforms the total variation and weighted total variation decompression methods.
Abstract: The JPEG compression method is among the most successful compression schemes since it readily provides good compressed results at a rather high compression ratio. However, the decompressed result of the standard JPEG decompression scheme usually contains some visible artifacts, such as blocking artifacts and Gibbs artifacts (ringing), especially when the compression ratio is rather high. In this paper, a novel artifact-reducing approach for JPEG decompression is proposed via sparse and redundant representations over a learned dictionary. Indeed, an effective two-step algorithm is developed. The first step involves dictionary learning and the second step involves total variation regularization for the decompressed images. Numerical experiments are performed to demonstrate that the proposed method outperforms the total variation and weighted total variation decompression methods in terms of peak signal-to-noise ratio and structural similarity.

Journal ArticleDOI
TL;DR: A highly efficient image encryption-then-compression (ETC) system, where both lossless and lossy compression are considered, and the proposed image encryption scheme operated in the prediction error domain is shown to be able to provide a reasonably high level of security.
Abstract: In many practical scenarios, image encryption has to be conducted prior to image compression. This has led to the problem of how to design a pair of image encryption and compression algorithms such that compressing the encrypted images can still be efficiently performed. In this paper, we design a highly efficient image encryption-then-compression (ETC) system, where both lossless and lossy compression are considered. The proposed image encryption scheme operated in the prediction error domain is shown to be able to provide a reasonably high level of security. We also demonstrate that an arithmetic coding-based approach can be exploited to efficiently compress the encrypted images. More notably, the proposed compression approach applied to encrypted images is only slightly worse, in terms of compression efficiency, than the state-of-the-art lossless/lossy image coders, which take original, unencrypted images as inputs. In contrast, most of the existing ETC solutions induce significant penalty on the compression efficiency.
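
The reason the prediction-error domain remains compressible after encryption can be illustrated with a toy experiment: a key-driven permutation preserves the histogram of prediction errors, so an entropy coder still squeezes them, whereas scrambled raw samples barely compress. The sketch below uses a permutation as a stand-in for the paper's encryption and zlib as a stand-in for its arithmetic coder; the signal and parameters are illustrative assumptions, not the proposed ETC system.

```python
import numpy as np, zlib

rng = np.random.default_rng(42)

# Toy "image" row: a smooth ramp plus a little noise (8-bit samples).
x = (np.linspace(0, 255, 4096) + rng.normal(0, 2, 4096)).clip(0, 255).astype(np.uint8)

# Prediction errors of a previous-sample predictor, stored modulo 256.
xi = x.astype(np.int16)
pred_err = ((xi - np.concatenate(([0], xi[:-1]))) % 256).astype(np.uint8)

# "Encryption" stand-in: a key-driven permutation of the prediction errors.
keyed_rng = np.random.default_rng(2014)   # the seed plays the role of the key
perm = keyed_rng.permutation(pred_err.size)

def ratio(a):                              # crude compressibility measure
    return a.size / len(zlib.compress(a.tobytes(), 9))

print("raw samples            :", round(ratio(x), 2))
print("scrambled raw samples  :", round(ratio(x[perm]), 2))
print("scrambled pred. errors :", round(ratio(pred_err[perm]), 2))
```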

Patent
11 Apr 2014
TL;DR: A method for compressing data comprises the steps of: analyzing a data block of an input data stream to identify the data type of the data block, the input data stream comprising a plurality of disparate data types; performing content-dependent data compression on the data block if its data type is identified; and performing content-independent data compression on the data block if its data type is not identified.
Abstract: Systems and methods for providing fast and efficient data compression using a combination of content independent data compression and content dependent data compression. In one aspect, a method for compressing data comprises the steps of: analyzing a data block of an input data stream to identify a data type of the data block, the input data stream comprising a plurality of disparate data types; performing content dependent data compression on the data block, if the data type of the data block is identified; performing content independent data compression on the data block, if the data type of the data block is not identified.
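
The claimed control flow reduces to a simple dispatch: try to identify the block's data type and, if successful, hand it to a type-specific codec, otherwise fall back to a generic one. The sketch below is a hedged illustration of that flow; the magic-byte table, the per-type handling, and the use of zlib as the content-independent codec are assumptions for demonstration, not the patent's implementation.

```python
import zlib

# Illustrative magic-byte table; a real analyzer would recognize many more types.
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG": "png",
    b"%PDF": "pdf",
}

def identify(block: bytes):
    """Analyze the data block and return its data type, or None if unknown."""
    for magic, name in MAGIC.items():
        if block.startswith(magic):
            return name
    return None

def content_dependent_compress(block: bytes, data_type: str) -> bytes:
    # Placeholder for type-specific handling, e.g. passing through data that
    # is already compressed (JPEG/PNG) instead of recompressing it.
    if data_type in ("jpeg", "png"):
        return block
    return zlib.compress(block, 9)

def content_independent_compress(block: bytes) -> bytes:
    return zlib.compress(block, 6)          # generic, type-agnostic codec

def compress_block(block: bytes) -> bytes:
    data_type = identify(block)
    if data_type is not None:
        return content_dependent_compress(block, data_type)
    return content_independent_compress(block)

print(len(compress_block(b"hello world " * 100)))   # unidentified -> generic path
```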

Journal ArticleDOI
TL;DR: Experimental results show that the proposed scheme significantly outperforms previous approaches to reversible data hiding in encrypted images based on lossless compression of encrypted data.

Journal ArticleDOI
TL;DR: In this paper, information hiding methods in the H.264/AVC compressed video domain are surveyed and perspectives and recommendations are presented to provide a better understanding of the current trend of information hiding and to identify new opportunities for information hiding in compressed video.
Abstract: Information hiding refers to the process of inserting information into a host to serve specific purpose(s). In this paper, information hiding methods in the H.264/AVC compressed video domain are surveyed. First, the general framework of information hiding is conceptualized by relating the state of an entity to a meaning (i.e., sequences of bits). This concept is illustrated by using various data representation schemes such as bit plane replacement, spread spectrum, histogram manipulation, divisibility, mapping rules, and matrix encoding. Venues at which information hiding takes place are then identified, including prediction process, transformation, quantization, and entropy coding. Related information hiding methods at each venue are briefly reviewed, along with the presentation of the targeted applications, appropriate diagrams, and references. A timeline diagram is constructed to chronologically summarize the invention of information hiding methods in the compressed still image and video domains since 1992. A comparison among the considered information hiding methods is also conducted in terms of venue, payload, bitstream size overhead, video quality, computational complexity, and video criteria. Further perspectives and recommendations are presented to provide a better understanding of the current trend of information hiding and to identify new opportunities for information hiding in compressed video.
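
One of the data-representation schemes listed above, bit-plane replacement, is easy to show on raw sample values. The sketch below embeds and recovers a bit string in the least-significant bit plane of a toy sample array; real H.264-domain methods instead operate on syntax elements such as quantized coefficients or motion vectors, so this is only a generic illustration.

```python
import numpy as np

def embed_lsb(samples: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Replace the least-significant bit plane of `samples` with `bits`."""
    return (samples & np.uint8(0xFE)) | bits.astype(np.uint8)

def extract_lsb(samples: np.ndarray) -> np.ndarray:
    return samples & np.uint8(1)

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=64, dtype=np.uint8)    # e.g. luma samples
message = rng.integers(0, 2, size=64, dtype=np.uint8)    # payload bits

stego = embed_lsb(cover, message)
assert np.array_equal(extract_lsb(stego), message)
print("max per-sample change:", int(np.max(np.abs(cover.astype(int) - stego.astype(int)))))
```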

Journal ArticleDOI
08 May 2014-PLOS ONE
TL;DR: In this article, the authors present a numerical approach to the problem of approximating the Kolmogorov-Chaitin complexity of short strings, motivated by the notion of algorithmic probability.
Abstract: Drawing on various notions from theoretical computer science, we present a novel numerical approach, motivated by the notion of algorithmic probability, to the problem of approximating the Kolmogorov-Chaitin complexity of short strings. The method is an alternative to the traditional lossless compression algorithms, which it may complement, the two being serviceable for different string lengths. We provide a thorough analysis for all binary strings of length and for most strings of length by running all Turing machines with 5 states and 2 symbols (with reduction techniques), using the most standard formalism of Turing machines, used, for example, in the Busy Beaver problem. We address the question of stability and error estimation, the sensitivity of the continued application of the method for wider coverage and better accuracy, and provide statistical evidence suggesting robustness. As with compression algorithms, this work promises to deliver a range of applications, and to provide insight into the question of complexity calculation of finite (and short) strings. Additional material can be found at the Algorithmic Nature Group website at http://www.algorithmicnature.org. An Online Algorithmic Complexity Calculator implementing this technique and making the data available to the research community is accessible at http://www.complexitycalculator.com.
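
The method rests on the standard coding-theorem connection between algorithmic probability and Kolmogorov-Chaitin complexity: output frequencies of small Turing machines estimate the algorithmic probability of a string, which is then converted into a complexity value. In the usual notation (stated here for a universal prefix machine U; the constant term is independent of the string):

```latex
% Algorithmic probability m(s) of a string s, summed over programs p that
% halt on a universal prefix machine U with output s, and the coding-theorem
% relation used to turn an estimated probability into a complexity value.
m(s) = \sum_{p \,:\, U(p) = s} 2^{-|p|},
\qquad
K(s) = -\log_2 m(s) + O(1).
```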

Journal ArticleDOI
TL;DR: A new lossy image compression technique which uses singular value decomposition (SVD) and wavelet difference reduction (WDR), with the SVD compression used to boost the performance of the WDR compression.

Journal ArticleDOI
TL;DR: A novel 8-point DCT approximation that requires only 14 addition operations and no multiplications is introduced and is compared to state-of-the-art DCT approximations in terms of both algorithm complexity and peak signal-to-noise ratio.
Abstract: Video processing systems such as HEVC, which require the low energy consumption needed for the multimedia market, have led to extensive development of fast algorithms for the efficient approximation of 2-D DCT transforms. The DCT is employed in a multitude of compression standards due to its remarkable energy compaction properties. Multiplier-free approximate DCT transforms have been proposed that offer superior compression performance at very low circuit complexity. Such approximations can be realized in digital VLSI hardware using additions and subtractions only, leading to significant reductions in chip area and power consumption compared to conventional DCTs and integer transforms. In this paper, we introduce a novel 8-point DCT approximation that requires only 14 addition operations and no multiplications. The proposed transform possesses low computational complexity and is compared to state-of-the-art DCT approximations in terms of both algorithm complexity and peak signal-to-noise ratio. The proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC. The proposed transform and several other DCT approximations are mapped to systolic-array digital architectures and physically realized as digital prototype circuits using FPGA technology and mapped to 45 nm CMOS technology.
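
To give a concrete feel for multiplier-free DCT approximation, the sketch below uses the classic "signed DCT" (the entrywise signs of the exact 8-point DCT matrix), whose forward pass needs only additions and subtractions. It is not the paper's 14-addition transform; the smooth test row, the crude coefficient truncation, and the pseudo-inverse reconstruction are illustrative choices for comparing it against the exact DCT.

```python
import numpy as np
from scipy.fft import dct

C = dct(np.eye(8), type=2, norm="ortho", axis=0)   # exact orthonormal 8-point DCT-II

# "Signed DCT": keep only the signs of C, so the forward transform needs
# additions/subtractions only. A classic multiplier-free approximation used
# here to illustrate the idea; it is NOT the paper's 14-addition transform.
T = np.sign(C)

def keep_largest(y, k=4):
    out = np.zeros_like(y)
    idx = np.argsort(-np.abs(y))[:k]
    out[idx] = y[idx]
    return out

x = np.linspace(10, 200, 8)                        # a smooth row of samples

for name, F in (("exact DCT ", C), ("signed DCT", T)):
    y = keep_largest(F @ x, k=4)                   # crude coefficient truncation
    x_rec = np.linalg.pinv(F) @ y                  # illustrative reconstruction
    mse = max(np.mean((x - x_rec) ** 2), 1e-12)
    print(f"{name}: round-trip PSNR = {10 * np.log10(255.0 ** 2 / mse):.1f} dB")
```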

Journal ArticleDOI
TL;DR: A prospective review of wavelet-based ECG compression methods and their performances based upon findings obtained from various experiments conducted using both clean and noisy ECG signals is presented.

Proceedings ArticleDOI
12 May 2014
TL;DR: This work uses the Open Computing Language (OpenCL) to implement high-speed data compression (Gzip) on a field-programmable gate-arrays (FPGA) to achieve the high throughput of 3 GB/s with more than 2x compression ratio over standard compression benchmarks.
Abstract: Hardware implementation of lossless data compression is important for optimizing the capacity/cost/power of storage devices in data centers, as well as communication channels in high-speed networks. In this work we use the Open Computing Language (OpenCL) to implement high-speed data compression (Gzip) on a field-programmable gate array (FPGA). We show how we make use of a heavily pipelined custom hardware implementation to achieve a high throughput of ~3 GB/s with more than 2x compression ratio over standard compression benchmarks. When compared against a highly tuned CPU implementation, the performance-per-watt of our OpenCL FPGA implementation is 12x better and the compression ratio is on par. Additionally, we compare our implementation to a hand-coded commercial implementation of Gzip to quantify the gap between a high-level language like OpenCL and a hardware description language like Verilog. OpenCL performance is 5.3% lower than Verilog, and the area is 2% more logic and 25% more of the FPGA's available memory resources, but the productivity gains are significant.

Journal ArticleDOI
TL;DR: A background-modeling-based adaptive prediction (BMAP) method that can achieve at least twice the compression ratio of AVC (MPEG-4 Advanced Video Coding) high profile on surveillance videos, yet with only slightly higher encoding complexity.
Abstract: The exponential growth of surveillance videos presents an unprecedented challenge for high-efficiency surveillance video coding technology. Compared with the existing coding standards that were basically developed for generic videos, surveillance video coding should be designed to make the best use of the special characteristics of surveillance videos (e.g., the relatively static background). To do so, this paper first conducts two analyses on how to improve the background and foreground prediction efficiencies in surveillance video coding. Following the analysis results, we propose a background-modeling-based adaptive prediction (BMAP) method. In this method, all blocks to be encoded are first classified into three categories. Then, according to the category of each block, two novel inter predictions are selectively utilized, namely, the background reference prediction (BRP), which uses the background modeled from the original input frames as the long-term reference, and the background difference prediction (BDP), which predicts the current data in the background difference domain. For background blocks, the BRP can effectively improve the prediction efficiency by using the higher-quality background as the reference, whereas for foreground-background-hybrid blocks, the BDP can provide a better reference after subtracting its background pixels. Experimental results show that BMAP can achieve at least twice the compression ratio of AVC (MPEG-4 Advanced Video Coding) high profile on surveillance videos, yet with only slightly higher encoding complexity. Moreover, for the foreground coding performance, which is crucial to the subjective quality of moving objects in surveillance videos, BMAP also obtains remarkable gains over several state-of-the-art methods.
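
A toy sketch of the block classification behind BRP and BDP: model the background (here just a temporal mean), then flag blocks whose residual against that model is small as background blocks (predict directly from the background reference) and the rest as hybrid/foreground blocks (code in the background-difference domain). The synthetic clip, block size, and energy threshold are illustrative assumptions, not the paper's three-category classifier or encoder.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic surveillance clip: a static background plus a bright moving patch.
T, H, W, B = 30, 32, 32, 8
background = rng.integers(60, 80, size=(H, W)).astype(float)
frames = np.repeat(background[None], T, axis=0)
for t in range(T):
    c = (2 * t) % (W - 4)
    frames[t, 10:14, c:c + 4] = 200.0              # the moving foreground object

# Background model: here simply the temporal mean (the paper builds it from
# the original input frames with a dedicated modelling step).
bg_model = frames.mean(axis=0)

cur = frames[-1]
modes = {"BRP": 0, "BDP": 0}
for by in range(0, H, B):
    for bx in range(0, W, B):
        blk = cur[by:by + B, bx:bx + B]
        ref = bg_model[by:by + B, bx:bx + B]
        energy = np.mean((blk - ref) ** 2)
        # Low residual against the background -> background block: predict it
        # from the background reference (BRP). Otherwise treat it as a
        # hybrid/foreground block and code it in the background-difference
        # domain (BDP). The threshold is purely illustrative.
        modes["BRP" if energy < 500.0 else "BDP"] += 1

print(modes)                                       # e.g. {'BRP': 15, 'BDP': 1}
```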

Journal ArticleDOI
TL;DR: This letter presents a no-reference quality assessment algorithm for JPEG compressed images (NJQA); testing on various image-quality databases demonstrates that NJQA is either competitive with or outperforms modern competing methods on JPEG images.
Abstract: This letter presents a no-reference quality assessment algorithm for JPEG compressed images (NJQA). Our method does not specifically aim to measure blockiness. Instead, quality is estimated by first counting the number of zero-valued DCT coefficients within each block, and then using a map, which we call the quality relevance map, to weight these counts. The quality relevance map for an image is a map that indicates which blocks are naturally uniform (or near-uniform) vs. which blocks have been made uniform (or near-uniform) via JPEG compression. Testing on various image-quality databases demonstrates that NJQA is either competitive with or outperforms modern competing methods on JPEG images.
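
The first step described above, counting zero-valued DCT coefficients per 8x8 block, is easy to reproduce approximately. The sketch below uses a uniform quantization step as a stand-in for JPEG's quantization tables and omits the quality-relevance weighting map entirely; the images and parameters are illustrative.

```python
import numpy as np
from scipy.fft import dctn

def zero_dct_counts(img: np.ndarray, q_step: float = 16.0) -> np.ndarray:
    """Per-8x8-block count of (near-)zero DCT coefficients after a uniform
    quantizer with step q_step (a stand-in for JPEG's quantization tables)."""
    H, W = img.shape
    counts = np.zeros((H // 8, W // 8), dtype=int)
    for by in range(H // 8):
        for bx in range(W // 8):
            block = img[by*8:(by+1)*8, bx*8:(bx+1)*8].astype(float)
            coeffs = dctn(block, type=2, norm="ortho")
            counts[by, bx] = int(np.sum(np.round(coeffs / q_step) == 0))
    return counts

rng = np.random.default_rng(0)
smooth = np.tile(np.linspace(0, 255, 64), (64, 1))            # naturally uniform-ish
textured = rng.integers(0, 256, size=(64, 64)).astype(float)  # busy content

print("smooth image, mean zero count per block  :", zero_dct_counts(smooth).mean())
print("textured image, mean zero count per block:", zero_dct_counts(textured).mean())
```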

Journal ArticleDOI
TL;DR: This article carries out a performance evaluation of existing and new compression schemes, considering linear, autoregressive, FFT-/DCT- and wavelet-based models, by looking at their performance as a function of relevant signal statistics; the results reveal that the DCT-based schemes are the best option in terms of compression efficiency but are inefficient in terms of energy consumption.
Abstract: Lossy temporal compression is key for energy-constrained wireless sensor networks (WSNs), where the imperfect reconstruction of the signal is often acceptable at the data collector, subject to some maximum error tolerance. In this article, we evaluate a number of selected lossy compression methods from the literature and extensively analyze their performance in terms of compression efficiency, computational complexity, and energy consumption. Specifically, we first carry out a performance evaluation of existing and new compression schemes, considering linear, autoregressive, FFT-/DCT- and wavelet-based models, by looking at their performance as a function of relevant signal statistics. Second, we obtain formulas through numerical fittings to gauge their overall energy consumption and signal representation accuracy. Third, we evaluate the benefits that lossy compression methods bring about in interference-limited multihop networks, where the channel access is a source of inefficiency due to collisions and transmission scheduling. Our results reveal that the DCT-based schemes are the best option in terms of compression efficiency but are inefficient in terms of energy consumption. Instead, linear methods lead to substantial savings in terms of energy expenditure while, at the same time, achieving satisfactory compression ratios, reduced network delay, and increased reliability performance.
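
A minimal example of one DCT-based lossy temporal compression scheme of the kind evaluated: keep only the K largest DCT coefficients of a window of sensor samples and reconstruct at the collector. The signal, window length, and K are illustrative; the article's energy models and its linear, autoregressive, and wavelet alternatives are not reproduced here.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(7)

# A slowly varying "sensor" signal (e.g. temperature) with measurement noise.
n = 512
t = np.arange(n)
x = 20 + 3 * np.sin(2 * np.pi * t / 300) + 0.05 * rng.standard_normal(n)

# DCT-based lossy compression: transmit only the K largest coefficients
# (index + value); everything else is treated as zero at the collector.
K = 16
c = dct(x, type=2, norm="ortho")
idx = np.argsort(-np.abs(c))[:K]
c_sparse = np.zeros_like(c)
c_sparse[idx] = c[idx]
x_rec = idct(c_sparse, type=2, norm="ortho")

ratio = n / (2 * K)                      # K (index, value) pairs vs. n samples
rmse = np.sqrt(np.mean((x - x_rec) ** 2))
print(f"compression ratio ~ {ratio:.0f}:1, RMSE = {rmse:.3f}")
```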

Proceedings ArticleDOI
23 Jun 2014
TL;DR: It is found that the diversity of the climate data requires the individual treatment of variables, and, in doing so, the reconstructed data can fall within the natural variability of the system, while achieving compression rates of up to 5:1.
Abstract: High-resolution climate simulations require tremendous computing resources and can generate massive datasets. At present, preserving the data from these simulations consumes vast storage resources at institutions such as the National Center for Atmospheric Research (NCAR). The historical data generation trends are economically unsustainable, and storage resources are already beginning to limit science objectives. To mitigate this problem, we investigate the use of data compression techniques on climate simulation data from the Community Earth System Model. Ultimately, to convince climate scientists to compress their simulation data, we must be able to demonstrate that the reconstructed data reveals the same mean climate as the original data, and this paper is a first step toward that goal. To that end, we develop an approach for verifying the climate data and use it to evaluate several compression algorithms. We find that the diversity of the climate data requires the individual treatment of variables, and, in doing so, the reconstructed data can fall within the natural variability of the system, while achieving compression rates of up to 5:1.

Journal ArticleDOI
TL;DR: MFCompress is described, specially designed for the compression of FASTA and multi-FASTA files, which can provide additional average compression gains of almost 50%, and potentially doubles the available storage, although at the cost of some more computation time.
Abstract: Motivation: The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce as much as possible the data, for example, for medium- and long-term storage. A number of algorithms have been proposed for the compression of genomics data, but unfortunately only a few of them have been made available as usable and reliable compression tools. Results: In this article, we describe one such tool, MFCompress, specially designed for the compression of FASTA and multi-FASTA files. In comparison to gzip and applied to multi-FASTA files, MFCompress can provide additional average compression gains of almost 50%, i.e. it potentially doubles the available storage, although at the cost of some more computation time. On highly redundant datasets, and in comparison with gzip, 8-fold size reductions have been obtained. Availability: Both source code and binaries for several operating systems are freely available for non-commercial use at http://bioinformatics.ua.pt/software/mfcompress/. Contact: ap@ua.pt Supplementary information: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: This article proposes GoKrimp, an algorithm that directly mines compressing patterns by greedily extending a pattern until adding the extension to the dictionary yields no additional compression benefit, and proposes a dependency test which only chooses related events for extending a given pattern.
Abstract: Pattern mining based on data compression has been successfully applied in many data mining tasks. For itemset data, the Krimp algorithm, based on the minimum description length (MDL) principle, was shown to be very effective in solving the redundancy issue in descriptive pattern mining. However, for sequence data, the redundancy issue of the set of frequent sequential patterns is not fully addressed in the literature. In this article, we study MDL-based algorithms for mining non-redundant sets of sequential patterns from a sequence database. First, we propose an encoding scheme for compressing sequence data with sequential patterns. Second, we formulate the problem of mining the most compressing sequential patterns from a sequence database. We show that this problem is intractable and belongs to the class of inapproximable problems. Therefore, we propose two heuristic algorithms. The first of these uses a two-phase approach similar to Krimp for itemset data. To overcome performance issues in candidate generation, we also propose GoKrimp, an algorithm that directly mines compressing patterns by greedily extending a pattern until adding the extension to the dictionary yields no additional compression benefit. Since checks for the additional compression benefit of an extension are computationally expensive, we propose a dependency test that only chooses related events for extending a given pattern. This technique improves the efficiency of the GoKrimp algorithm significantly while still preserving the quality of the set of patterns. We conduct an empirical study on eight datasets to show the effectiveness of our approach in comparison to the state-of-the-art algorithms in terms of interpretability of the extracted patterns, run time, compression ratio, and classification accuracy using the discovered patterns as features for different classifiers.

Journal ArticleDOI
TL;DR: In this paper, a Bayesian compressive sensing (BCS) method is investigated that uses sparse Bayesian learning to reconstruct signals from a compressive sensor, which can achieve perfect loss-less compression performance with quite high compression ratio.
Abstract: In structural health monitoring (SHM) systems for civil structures, massive amounts of data are often generated that need data compression techniques to reduce the cost of signal transfer and storage, while offering a simple sensing system. Compressive sensing (CS) is a novel data acquisition method whereby the compression is done in a sensor simultaneously with the sampling. If the original sensed signal is sufficiently sparse in terms of some orthogonal basis (e.g., a sufficient number of wavelet coefficients are zero or negligibly small), the decompression can be done essentially perfectly up to some critical compression ratio; otherwise there is a trade-off between the reconstruction error and how much compression occurs. In this article, a Bayesian compressive sensing (BCS) method is investigated that uses sparse Bayesian learning to reconstruct signals from a compressive sensor. By explicitly quantifying the uncertainty in the reconstructed signal from compressed data, the BCS technique exhibits an obvious benefit over existing regularized norm-minimization CS methods that provide a single signal estimate. However, current BCS algorithms suffer from a robustness problem: sometimes the reconstruction errors are very large when the number of measurements K is much smaller than the number of signal degrees of freedom N that are needed to capture the signal accurately in a directly sampled form. In this article, we present improvements to the BCS reconstruction method to enhance its robustness so that even higher compression ratios N/K can be used, and we examine the trade-off between efficiently compressing data and accurately decompressing it. Synthetic data and actual acceleration data collected from a bridge SHM system are used as examples. Compared with the state-of-the-art BCS reconstruction algorithms, the improved BCS algorithm demonstrates superior performance. With the same acceptable error rate based on a specified threshold of reconstruction error, the proposed BCS algorithm works with relatively large compression ratios, and it can achieve perfect lossless compression performance with quite high compression ratios. Furthermore, the error bars for the signal reconstruction are also quantified effectively.
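
Sparse Bayesian learning itself is beyond a short sketch, but the compressive-sensing setup it operates in is simple to demonstrate: random projections at the sensor, sparse recovery at the collector. The sketch below uses orthogonal matching pursuit as a stand-in for the norm-minimization and Bayesian reconstructions discussed above, with canonical (rather than wavelet-domain) sparsity and illustrative sizes.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: recover a k-sparse x from y = Phi @ x."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # most correlated column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
N, K, s = 256, 64, 8                      # signal length, measurements, sparsity
x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)

Phi = rng.standard_normal((K, N)) / np.sqrt(K)   # random sensing matrix
y = Phi @ x                                       # K << N compressed measurements

x_hat = omp(Phi, y, s)
print("relative reconstruction error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```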

Proceedings ArticleDOI
04 May 2014
TL;DR: The proposed method is applicable even when different codecs are used for the first and second compression, and performs well even when the second encoding is as strong as the first one.
Abstract: We propose a method for detecting insertion and deletion of whole frames in digital videos. We start by strengthening and extending a state of the art method for double encoding detection, and propose a system that is able to locate the point in time where frames have been deleted or inserted, discerning between the two cases. The proposed method is applicable even when different codecs are used for the first and second compression, and performs well even when the second encoding is as strong as the first one.