
Showing papers on "Lossless compression published in 2002"


Proceedings ArticleDOI
10 Dec 2002
TL;DR: A prediction-based conditional entropy coder which utilizes static portions of the host as side-information improves the compression efficiency, and thus the lossless data embedding capacity.
Abstract: We present a novel reversible (lossless) data hiding (embedding) technique, which enables the exact recovery of the original host signal upon extraction of the embedded information. A generalization of the well-known LSB (least significant bit) modification is proposed as the data embedding method, which introduces additional operating points on the capacity-distortion curve. Lossless recovery of the original is achieved by compressing portions of the signal that are susceptible to embedding distortion, and transmitting these compressed descriptions as a part of the embedded payload. A prediction-based conditional entropy coder which utilizes static portions of the host as side-information improves the compression efficiency, and thus the lossless data embedding capacity.
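
As a minimal sketch of the generalized-LSB mechanics described above (not the authors' implementation): the original LSB plane is compressed, here with zlib standing in for the paper's prediction-based conditional entropy coder, and carried inside the embedded bitstream together with the payload, so the exact host can be restored. Natural-image LSB planes barely compress, which is precisely why the paper conditions on static portions of the host; the synthetic image below is chosen so the toy version has capacity.

```python
import zlib
import numpy as np

def embed_lsb_lossless(pixels: np.ndarray, payload: bytes) -> np.ndarray:
    """Toy reversible LSB embedding: the original LSB plane is compressed and
    carried inside the embedded bitstream so the host can be restored exactly."""
    flat = pixels.flatten()
    lsb_plane = np.packbits((flat & 1).astype(np.uint8)).tobytes()
    compressed = zlib.compress(lsb_plane, 9)          # stand-in for the paper's entropy coder
    header = len(compressed).to_bytes(4, "big") + len(payload).to_bytes(4, "big")
    bits = np.unpackbits(np.frombuffer(header + compressed + payload, dtype=np.uint8))
    if bits.size > flat.size:
        raise ValueError("compressed LSB plane + payload exceeds embedding capacity")
    out = flat.copy()
    out[:bits.size] = (out[:bits.size] & 0xFE) | bits
    return out.reshape(pixels.shape)

def extract_and_restore(marked: np.ndarray):
    """Read back the payload and rebuild the exact original image."""
    flat = marked.flatten()
    header = np.packbits((flat[:64] & 1).astype(np.uint8)).tobytes()
    clen = int.from_bytes(header[:4], "big")
    plen = int.from_bytes(header[4:8], "big")
    data = np.packbits((flat[:64 + 8 * (clen + plen)] & 1).astype(np.uint8)).tobytes()
    compressed, payload = data[8:8 + clen], data[8 + clen:8 + clen + plen]
    original_lsb = np.unpackbits(
        np.frombuffer(zlib.decompress(compressed), dtype=np.uint8))[:flat.size]
    restored = (flat & 0xFE) | original_lsb
    return payload, restored.reshape(marked.shape)

# Smooth synthetic image whose LSB plane compresses well (natural-image LSB
# planes usually do not, hence the paper's conditional coder).
img = np.tile(np.arange(256, dtype=np.uint8), (256, 1))
marked = embed_lsb_lossless(img, b"secret payload")
payload, restored = extract_and_restore(marked)
assert payload == b"secret payload" and np.array_equal(restored, img)
```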

1,126 citations


Journal ArticleDOI
TL;DR: This paper introduces a new paradigm for data embedding in images (lossless data embedding) that has the property that the distortion due to embedding can be completely removed from the watermarked image after the embedded data has been extracted.
Abstract: One common drawback of virtually all current data embedding methods is the fact that the original image is inevitably distorted due to data embedding itself. This distortion typically cannot be removed completely due to quantization, bit-replacement, or truncation at the grayscales 0 and 255. Although the distortion is often quite small and perceptual models are used to minimize its visibility, the distortion may not be acceptable for medical imagery (for legal reasons) or for military images inspected under nonstandard viewing conditions (after enhancement or extreme zoom). In this paper, we introduce a new paradigm for data embedding in images (lossless data embedding) that has the property that the distortion due to embedding can be completely removed from the watermarked image after the embedded data has been extracted. We present lossless embedding methods for the uncompressed formats (BMP, TIFF) and for the JPEG format. We also show how the concept of lossless data embedding can be used as a powerful tool to achieve a variety of nontrivial tasks, including lossless authentication using fragile watermarks, steganalysis of LSB embedding, and distortion-free robust watermarking.

702 citations


Proceedings ArticleDOI
02 Apr 2002
TL;DR: It is shown that turbo codes can come close to the Slepian-Wolf bound in lossless distributed source coding in the asymmetric scenario considered, and the scheme also performs well for joint source-channel coding.
Abstract: We show that turbo codes can come close to the Slepian-Wolf bound in lossless distributed source coding. In the asymmetric scenario considered, X and Y are statistically dependent signals and X is encoded with no knowledge of Y. However, Y is known as side information at the decoder. We use a system based on turbo codes to send X at a rate close to H(X|Y). We apply our system to binary sequences and simulations show performance close to the information-theoretic limit. For distributed source coding of Gaussian sequences, our results show significant improvement over previous work. The scheme also performs well for joint source-channel coding.
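
The turbo-code construction itself is involved, so the sketch below illustrates only the underlying binning idea with a Hamming(7,4) code instead: the encoder transmits just the 3-bit syndrome of X, and the decoder recovers X from the syndrome plus the correlated side information Y, provided X and Y differ in at most one position. The code choice and variable names are illustrative, not the paper's scheme.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code over GF(2).
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

def encode(x: np.ndarray) -> np.ndarray:
    """Compress 7 bits of X into its 3-bit syndrome (rate 3/7 < 1)."""
    return H @ x % 2

def decode(syndrome: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Recover X from its syndrome and side information Y, assuming X and Y
    differ in at most one bit position."""
    s_err = (syndrome + H @ y) % 2          # syndrome of the error pattern X xor Y
    e = np.zeros(7, dtype=np.uint8)
    if s_err.any():
        # Columns of H are the binary representations of 1..7, so the syndrome
        # directly indexes the single flipped position.
        pos = int(s_err[0]) + 2 * int(s_err[1]) + 4 * int(s_err[2]) - 1
        e[pos] = 1
    return (y + e) % 2

# Example: X and Y are correlated, differing in exactly one bit.
x = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.uint8)
y = x.copy(); y[4] ^= 1
assert np.array_equal(decode(encode(x), y), x)
```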

455 citations


Proceedings ArticleDOI
TL;DR: In this article, the authors formulate two general methodologies for lossless embedding that can be applied to images as well as any other digital objects, including video, audio, and other structures with redundancy.
Abstract: Lossless data embedding has the property that the distortion due to embedding can be completely removed from the watermarked image without accessing any side channel. This can be a very important property whenever serious concerns over the image quality and artifacts visibility arise, such as for medical images, due to legal reasons, for military images or images used as evidence in court that may be viewed after enhancement and zooming. We formulate two general methodologies for lossless embedding that can be applied to images as well as any other digital objects, including video, audio, and other structures with redundancy. We use the general principles as guidelines for designing efficient, simple, and high-capacity lossless embedding methods for the three most common image format paradigms - raw, uncompressed formats (BMP), lossy or transform formats (JPEG), and palette formats (GIF, PNG). We close the paper with examples of how the concept of lossless data embedding can be used as a powerful tool to achieve a variety of non-trivial tasks, including elegant lossless authentication using fragile watermarks. Note on terminology: some authors coined the terms erasable, removable, reversible, invertible, and distortion-free for the same concept.

338 citations


Journal ArticleDOI
TL;DR: This work quantifies the number of Fourier coefficients that can be removed from the hologram domain, and the lowest level of quantization achievable, without incurring significant loss in correlation performance or significant error in the reconstructed object domain.
Abstract: We present the results of applying lossless and lossy data compression to a three-dimensional object reconstruction and recognition technique based on phase-shift digital holography. We find that the best lossless (Lempel-Ziv, Lempel-Ziv-Welch, Huffman, Burrows-Wheeler) compression rates can be expected when the digital hologram is stored in an intermediate coding of separate data streams for real and imaginary components. The lossy techniques are based on subsampling, quantization, and discrete Fourier transformation. For various degrees of speckle reduction, we quantify the number of Fourier coefficients that can be removed from the hologram domain, and the lowest level of quantization achievable, without incurring significant loss in correlation performance or significant error in the reconstructed object domain.
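
The lossless half of the study compares intermediate codings before handing the data to generic coders. The harness below mirrors that comparison with zlib standing in for the Lempel-Ziv/Huffman/Burrows-Wheeler coders and a synthetic complex field standing in for real phase-shift hologram data; whether the separated streams win, as they did on the paper's holograms, depends entirely on the data fed in.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data standing in for a phase-shift digital hologram:
# a smooth complex field with additive speckle-like noise.
n = 256
phase = rng.normal(scale=0.2, size=(n, n)).cumsum(axis=1)
holo = (np.exp(1j * phase)
        + 0.05 * (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))).astype(np.complex64)

def ratio(raw: bytes) -> float:
    """Lossless compression ratio under zlib (generic stand-in coder)."""
    return len(raw) / len(zlib.compress(raw, 9))

# Intermediate coding 1: interleaved complex samples, as stored on disk.
interleaved = holo.tobytes()
# Intermediate coding 2: separate streams for the real and imaginary components.
separated = holo.real.tobytes() + holo.imag.tobytes()

print(f"interleaved complex stream: {ratio(interleaved):.3f}:1")
print(f"real/imaginary separated  : {ratio(separated):.3f}:1")
```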

240 citations


Journal ArticleDOI
TL;DR: By building and maintaining a dictionary of individual user's path updates, the proposed adaptive on-line algorithm can learn subscribers' profiles, and the compressibility of the variable-to-fixed length encoding of the acclaimed Lempel-Ziv family of algorithms reduces the update cost.
Abstract: The complexity of the mobility tracking problem in a cellular environment has been characterized under an information-theoretic framework. Shannon's entropy measure is identified as a basis for comparing user mobility models. By building and maintaining a dictionary of individual user's path updates (as opposed to the widely used location updates), the proposed adaptive on-line algorithm can learn subscribers' profiles. This technique evolves out of the concepts of lossless compression. The compressibility of the variable-to-fixed length encoding of the acclaimed Lempel-Ziv family of algorithms reduces the update cost, whereas their built-in predictive power can be effectively used to reduce paging cost.
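
The update/paging trade-off rests on an LZ78-style incremental parse of the user's movement string. The toy predictor below builds the same kind of dictionary (a trie of parsed phrases) and reuses its counts to guess the next cell; the cell labels, prediction rule, and class name are illustrative choices, not the paper's exact algorithm.

```python
from collections import defaultdict

class LZ78Predictor:
    """LZ78 incremental parsing over a symbol stream (e.g., cell IDs),
    with per-node child counts reused for next-symbol prediction."""

    def __init__(self):
        self.root = {}           # trie: node -> {symbol: child node}
        self.counts = defaultdict(lambda: defaultdict(int))
        self.node = self.root    # current parsing position
        self.path = ()           # key identifying the current node

    def update(self, symbol):
        self.counts[self.path][symbol] += 1
        if symbol in self.node:              # extend the current phrase
            self.node = self.node[symbol]
            self.path = self.path + (symbol,)
        else:                                # phrase complete: add a leaf, restart
            self.node[symbol] = {}
            self.node, self.path = self.root, ()

    def predict(self):
        """Most frequent symbol seen after the current phrase, if any."""
        cand = self.counts.get(self.path)
        return max(cand, key=cand.get) if cand else None

# Example: a commuter bouncing between a few cells.
history = "ABCABCABDABCABC"
p = LZ78Predictor()
hits = 0
for cell in history:
    if p.predict() == cell:
        hits += 1
    p.update(cell)
print(f"correct guesses: {hits}/{len(history)}")
```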

227 citations


Patent
22 Aug 2002
TL;DR: In this paper, a reversible wavelet filter is used to generate coefficients from input data, such as image data, and an entropy coder performs entropy coding on the embedded codestream to produce the compressed data stream.
Abstract: A compression and decompression system in which a reversible wavelet filter is used to generate coefficients from input data, such as image data. The reversible wavelet filter is an efficient transform implemented with integer arithmetic that has exact reconstruction. The present invention uses the reversible wavelet filter in a lossless system (or lossy system) in which an embedded codestream is generated from the coefficients produced by the filter. An entropy coder performs entropy coding on the embedded codestream to produce the compressed data stream.
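
As a concrete instance of "an efficient transform implemented with integer arithmetic that has exact reconstruction", here is the lifting form of the 1-D integer Haar (S) transform; the patent's particular reversible filter may differ, but the integer-in/integer-out round trip is the property being claimed.

```python
import numpy as np

def fwd_integer_haar(x: np.ndarray):
    """One level of the reversible Haar (S) transform via lifting: integers in,
    integers out, exactly invertible (floor division is the only rounding)."""
    even, odd = x[0::2].astype(np.int64), x[1::2].astype(np.int64)
    d = odd - even                  # detail (high-pass)
    s = even + d // 2               # approximation (low-pass), s = floor((even + odd) / 2)
    return s, d

def inv_integer_haar(s: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Undo the lifting steps in reverse order."""
    even = s - d // 2
    odd = d + even
    x = np.empty(2 * s.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([12, 15, 7, 7, 200, 198, 0, 255], dtype=np.int64)
s, d = fwd_integer_haar(x)
assert np.array_equal(inv_integer_haar(s, d), x)   # lossless round trip
```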

218 citations


Proceedings ArticleDOI
01 Jul 2002
TL;DR: The main contributions of this paper are the idea of using the kd-tree encoding of the geometry to drive the construction of a sequence of meshes, an improved coding of the edge expansion and vertex split since the vertices to split are implicitly defined, a prediction scheme which reduces the code for simplices incident to the split vertex, and a new generalization of the edge expansion operation to tetrahedral meshes.
Abstract: Efficient algorithms for compressing geometric data have been widely developed in recent years, but they are mainly designed for closed polyhedral surfaces which are manifold or "nearly manifold". We propose here a progressive geometry compression scheme which can handle manifold models as well as "triangle soups" and 3D tetrahedral meshes. The method is lossless when the decompression is complete, which is extremely important in some domains such as medical imaging or finite element analysis. While most existing methods enumerate the vertices of the mesh in an order depending on the connectivity, we use a kd-tree technique [Devillers and Gandoin 2000] which does not depend on the connectivity. Then we compute a compatible sequence of meshes which can be encoded using edge expansion [Hoppe et al. 1993] and vertex split [Popovic and Hoppe 1997]. The main contributions of this paper are: the idea of using the kd-tree encoding of the geometry to drive the construction of a sequence of meshes, an improved coding of the edge expansion and vertex split since the vertices to split are implicitly defined, a prediction scheme which reduces the code for simplices incident to the split vertex, and a new generalization of the edge expansion operation to tetrahedral meshes.
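
A rough sketch of the kd-tree geometry coder the scheme builds on [Devillers and Gandoin 2000]: each cell is halved and only the number of points falling in one half is sent, costing ceil(log2(n+1)) bits for a cell holding n points. The code below merely tallies that bit budget for a quantized point cloud; the mesh-sequence construction and connectivity coding of the paper are not modeled.

```python
import math
import numpy as np

def kdtree_bits(points: np.ndarray, lo: np.ndarray, hi: np.ndarray, axis: int = 0) -> float:
    """Bits needed to encode the positions of integer-coordinate `points` inside
    the axis-aligned cell [lo, hi) by recursive halving, kd-tree style."""
    n = len(points)
    if n == 0 or np.all(hi - lo <= 1):
        return 0.0                               # cell resolved: nothing left to send
    mid = (lo[axis] + hi[axis]) // 2
    left = points[points[:, axis] < mid]
    right = points[points[:, axis] >= mid]
    bits = math.ceil(math.log2(n + 1))           # transmit the count in the left half
    next_axis = (axis + 1) % points.shape[1]
    lo_r, hi_l = lo.copy(), hi.copy()
    hi_l[axis], lo_r[axis] = mid, mid
    return (bits
            + kdtree_bits(left, lo, hi_l, next_axis)
            + kdtree_bits(right, lo_r, hi, next_axis))

rng = np.random.default_rng(1)
pts = rng.integers(0, 1024, size=(500, 3))                  # 10-bit quantized vertices
raw = 500 * 3 * 10                                          # naive cost: 10 bits per coordinate
coded = kdtree_bits(pts, np.zeros(3, dtype=int), np.full(3, 1024))
print(f"raw {raw} bits vs kd-tree {coded:.0f} bits")
```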

208 citations


Proceedings ArticleDOI
02 Apr 2002
TL;DR: This paper is devoted to a PPM algorithm implementation that has a complexity comparable with widespread practical compression schemes based on the LZ77, LZ78 and BWT algorithms.
Abstract: PPM is one of the most promising lossless data compression algorithms using a Markov source model of order D. Its main essence is the coding of a new (in the given context) symbol in one of the inner nodes of the context tree; a sequence of special escape symbols is used to describe this node. In reality, the majority of symbols are encoded in inner nodes and the Markov model becomes rather conventional. In spite of the fact that the PPM algorithm achieves the best results in comparison with others, it is rarely used in practical applications due to its high computational complexity. This paper is devoted to a PPM algorithm implementation that has a complexity comparable with widespread practical compression schemes based on the LZ77, LZ78 and BWT algorithms. This scheme has been proposed by Shkarin (see Problems of Information Transmission, vol.34, no.3, p.44-54, 2001) and is named PPM with information inheritance (PPMII).
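
To make the escape mechanism concrete, here is a deliberately simplified order-2 PPM model with PPMC-style escape estimation that reports ideal code length in bits. It has no exclusions, no real arithmetic coder, and none of PPMII's information inheritance or complexity tricks, so treat it purely as an illustration of the context/escape idea.

```python
import math
from collections import defaultdict

class TinyPPM:
    """Order-2 PPM with PPMC-style escapes, measured as ideal code length in
    bits. No exclusions and no real arithmetic coder -- an illustration only."""

    def __init__(self, max_order: int = 2):
        self.max_order = max_order
        self.ctx = defaultdict(lambda: defaultdict(int))   # context tuple -> symbol counts

    def cost_and_update(self, history: list, symbol: int) -> float:
        bits, coded = 0.0, False
        for order in range(self.max_order, -1, -1):
            if order > len(history):
                continue
            key = tuple(history[len(history) - order:])
            counts = self.ctx[key]
            if not coded and counts:
                total, distinct = sum(counts.values()), len(counts)
                if symbol in counts:                       # coded in this node
                    bits -= math.log2(counts[symbol] / (total + distinct))
                    coded = True
                else:                                      # emit an escape, drop an order
                    bits -= math.log2(distinct / (total + distinct))
            counts[symbol] += 1                            # full updates (no update exclusion)
        if not coded:
            bits += 8.0                                    # order(-1): uniform over 256 bytes
        return bits

data = b"abracadabra abracadabra abracadabra"
model, history, total = TinyPPM(), [], 0.0
for byte in data:
    total += model.cost_and_update(history, byte)
    history.append(byte)
print(f"ideal PPM cost: {total / 8:.1f} bytes for {len(data)} input bytes")
```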

179 citations


Proceedings ArticleDOI
07 Oct 2002
TL;DR: A new lossless test vector compression scheme is presented which combines linear feedback shift register (LFSR) reseeding and statistical coding in a powerful way to improve the encoding efficiency.
Abstract: A new lossless test vector compression scheme is presented which combines linear feedback shift register (LFSR) reseeding and statistical coding in a powerful way. Test vectors can be encoded as LFSR seeds by solving a system of linear equations. The solution space of the linear equations can be quite large. The proposed method takes advantage of this large solution space to find seeds that can be efficiently encoded using a statistical code. Two architectures for implementing LFSR reseeding with seed compression are described. One configures the scan cells themselves to perform the LFSR functionality while the other uses a new idea of "scan windows" to allow the use of a small separate LFSR whose size is independent of the number of scan cells. The proposed scheme can be used either for applying a fully deterministic test set or for mixed-mode built-in self-test (BIST), and it can be used in conjunction with other variations of LFSR reseeding that have been previously proposed to further improve the encoding efficiency.
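
The encoding step the abstract refers to, solving a GF(2) linear system so that an LFSR loaded with the seed reproduces the care bits of a test cube, can be sketched as follows. The Fibonacci LFSR convention, the tap positions, and the example cube are arbitrary choices for illustration; the selection of seeds that also compress well under a statistical code, which is the paper's contribution, is not shown.

```python
import numpy as np

def output_equations(num_bits: int, seed_len: int, taps) -> np.ndarray:
    """Row t is the GF(2) vector a_t with output_bit[t] = a_t . seed (mod 2),
    for a Fibonacci LFSR whose feedback XORs the tapped stages."""
    ident = np.eye(seed_len, dtype=np.uint8)
    state = [ident[i] for i in range(seed_len)]      # symbolic state: one vector per stage
    rows = []
    for _ in range(num_bits):
        rows.append(state[-1].copy())                # bit shifted out feeds the scan chain
        feedback = np.bitwise_xor.reduce([state[t] for t in taps])
        state = [feedback] + state[:-1]
    return np.array(rows, dtype=np.uint8)

def solve_gf2(A: np.ndarray, b: np.ndarray):
    """Gauss-Jordan elimination over GF(2); returns one solution or None."""
    A, b = A.copy() % 2, b.copy() % 2
    m, n = A.shape
    pivots, row = [], 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]], b[[row, pivot]] = A[[pivot, row]], b[[pivot, row]]
        for r in range(m):
            if r != row and A[r, col]:
                A[r] ^= A[row]
                b[r] ^= b[row]
        pivots.append(col)
        row += 1
    if any(b[r] and not A[r].any() for r in range(row, m)):
        return None                                  # inconsistent: no seed exists
    x = np.zeros(n, dtype=np.uint8)
    for r, col in enumerate(pivots):
        x[col] = b[r]
    return x

# Test cube: care bits at a few scan positions, don't-cares everywhere else.
taps = (0, 13, 17, 19)              # feedback taps of a 20-bit LFSR (example choice)
seed_len, chain_len = 20, 100
cube = {3: 1, 17: 0, 41: 1, 42: 1, 77: 0, 90: 1}
eqs = output_equations(chain_len, seed_len, taps)
seed = solve_gf2(eqs[list(cube.keys())], np.array(list(cube.values()), dtype=np.uint8))
if seed is None:
    print("no seed satisfies this cube with this LFSR")
else:
    # Verify by clocking a concrete LFSR loaded with the computed seed.
    state, out = [int(s) for s in seed], []
    for _ in range(chain_len):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = [fb] + state[:-1]
    assert all(out[pos] == val for pos, val in cube.items())
    print("seed bits:", seed)
```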

175 citations


Patent
09 Apr 2002
TL;DR: An integrated memory controller (IMC) which includes data compression and decompression engines for improved performance is presented; however, it is limited to the use of a single memory controller.
Abstract: An integrated memory controller (IMC) which includes data compression and decompression engines for improved performance. The memory controller (IMC) of the present invention preferably sits on the main CPU bus or a high speed system peripheral bus such as the PCI bus and couples to system memory. The IMC preferably uses a lossless data compression and decompression scheme. Data transfers to and from the integrated memory controller of the present invention can thus be in either of two formats, these being compressed or normal (non-compressed). The IMC also preferably includes microcode for specific decompression of particular data formats such as digital video and digital audio. Compressed data from system I/O peripherals such as the hard drive, floppy drive, or local area network (LAN) are decompressed in the IMC and stored into system memory or saved in the system memory in compressed format. Thus, data can be saved in either a normal or compressed format, retrieved from the system memory for CPU usage in a normal or compressed format, or transmitted and stored on a medium in a normal or compressed format. Internal memory mapping allows for format definition spaces which define the format of the data and the data type to be read or written. Software overrides may be placed in applications software in systems that desire to control data decompression at the software application level. The integrated data compression and decompression capabilities of the IMC remove system bottlenecks and increase performance. This allows lower cost systems due to smaller data storage requirements and reduced bandwidth requirements. This also increases system bandwidth and hence increases system performance. Thus the IMC of the present invention is a significant advance over the operation of current memory controllers.

Journal ArticleDOI
TL;DR: The results show that the choice of the “best” standard depends strongly on the application at hand, but that JPEG 2000 supports the widest set of features among the evaluated standards, while providing superior rate-distortion performance in most cases.
Abstract: JPEG 2000, the new ISO/ITU-T standard for still image coding, has recently reached the International Standard (IS) status. Other new standards have been recently introduced, namely JPEG-LS and MPEG-4 VTC. This paper provides a comparison of JPEG 2000 with JPEG-LS and MPEG-4 VTC, in addition to older but widely used solutions, such as JPEG and PNG, and well established algorithms, such as SPIHT. Lossless compression efficiency, fixed and progressive lossy rate-distortion performance, as well as complexity and robustness to transmission errors, are evaluated. Region of Interest coding is also discussed and its behavior evaluated. Finally, the set of provided functionalities of each standard is also evaluated. In addition, the principles behind each algorithm are briefly described. The results show that the choice of the “best” standard depends strongly on the application at hand, but that JPEG 2000 supports the widest set of features among the evaluated standards, while providing superior rate-distortion performance in most cases.

Proceedings ArticleDOI
TL;DR: A reversible watermarking method based on an integer wavelet transform that enables the recovery of the original, unwatermarked content after the watermarked content has been detected to be authentic.
Abstract: In the digital information age, digital content (audio, image, and video) can be easily copied, manipulated, and distributed. Copyright protection and content authentication of digital content have become an urgent problem for content owners and distributors. Digital watermarking has provided a valuable solution to this problem. Based on its application scenario, most digital watermarking methods can be divided into two categories: robust watermarking and fragile watermarking. As a special subset of fragile watermarking, a reversible watermark (also called a lossless, invertible, or erasable watermark) enables the recovery of the original, unwatermarked content after the watermarked content has been detected to be authentic. Such reversibility to get back unwatermarked content is highly desired in sensitive imagery, such as military data and medical data. In this paper we present a reversible watermarking method based on an integer wavelet transform. We look into the binary representation of each wavelet coefficient and embed an extra bit into each expandable wavelet coefficient. The location map of all expanded coefficients will be coded by JBIG2 compression and these coefficient values will be losslessly compressed by arithmetic coding. Besides these two compressed bit streams, an SHA-256 hash of the original image will also be embedded for authentication purposes.

Journal ArticleDOI
TL;DR: This work presents new differencing algorithms that operate at a fine granularity (the atomic unit of change), make no assumptions about the format or alignment of input data, and in practice use linear time, use constant space, and give good compression.
Abstract: The subject of this article is differential compression, the algorithmic task of finding common strings between versions of data and using them to encode one version compactly by describing it as a set of changes from its companion. A main goal of this work is to present new differencing algorithms that (i) operate at a fine granularity (the atomic unit of change), (ii) make no assumptions about the format or alignment of input data, and (iii) in practice use linear time, use constant space, and give good compression. We present new algorithms, which do not always compress optimally but use considerably less time or space than existing algorithms. One new algorithm runs in O(n) time and O(1) space in the worst case (where each unit of space contains ⌈log n⌉ bits), as compared to algorithms that run in O(n) time and O(n) space or in O(n²) time and O(1) space. We introduce two new techniques for differential compression and apply these to give additional algorithms that improve compression and time performance. We experimentally explore the properties of our algorithms by running them on actual versioned data. Finally, we present theoretical results that limit the compression power of differencing algorithms that are restricted to making only a single pass over the data.
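
Not the authors' algorithms, but a small illustration of the copy/insert differencing they study: hash fixed-size blocks of the old version, then scan the new version, emitting COPY references for matched (and greedily extended) blocks and ADD literals for everything else.

```python
def make_delta(old: bytes, new: bytes, block: int = 16):
    """Encode `new` as COPY (offset, length) references into `old` plus ADD literals."""
    index = {old[i:i + block]: i for i in range(0, len(old) - block + 1)}
    ops, i, pending = [], 0, bytearray()
    while i < len(new):
        j = index.get(new[i:i + block])
        if j is not None:
            if pending:
                ops.append(("ADD", bytes(pending)))
                pending = bytearray()
            length = block                       # extend the match past the block boundary
            while (i + length < len(new) and j + length < len(old)
                   and new[i + length] == old[j + length]):
                length += 1
            ops.append(("COPY", j, length))
            i += length
        else:
            pending.append(new[i])
            i += 1
    if pending:
        ops.append(("ADD", bytes(pending)))
    return ops

def apply_delta(old: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "COPY":
            _, off, length = op
            out += old[off:off + length]
        else:
            out += op[1]
    return bytes(out)

old = b"The quick brown fox jumps over the lazy dog. " * 3
new = old.replace(b"lazy", b"sleepy", 1) + b" (v2)"
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
print(delta)
```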

Journal ArticleDOI
TL;DR: A subjective listening test of the combined pre-filter/lossless coder and a state-of-the-art perceptual audio coder (PAC) shows that the new method achieves a comparable compression ratio and audio quality with a lower delay.
Abstract: This paper proposes a versatile perceptual audio coding method that achieves high compression ratios and is capable of low encoding/decoding delay. It accommodates a variety of source signals (including both music and speech) with different sampling rates. It is based on separating irrelevance and redundancy reductions into independent functional units. This contrasts traditional audio coding where both are integrated within the same subband decomposition. The separation allows for the independent optimization of the irrelevance and redundancy reduction units. For both reductions, we rely on adaptive filtering and predictive coding as much as possible to minimize the delay. A psycho-acoustically controlled adaptive linear filter is used for the irrelevance reduction, and the redundancy reduction is carried out by a predictive lossless coding scheme, which is termed weighted cascaded least mean squared (WCLMS) method. Experiments are carried out on a database of moderate size which contains mono-signals of different sampling rates and varying nature (music, speech, or mixed). They show that the proposed WCLMS lossless coder outperforms other competing lossless coders in terms of compression ratios and delay, as applied to the pre-filtered signal. Moreover, a subjective listening test of the combined pre-filter/lossless coder and a state-of-the-art perceptual audio coder (PAC) shows that the new method achieves a comparable compression ratio and audio quality with a lower delay.
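
The redundancy-reduction half of such a coder can be illustrated with a bare-bones normalized-LMS predictor that produces integer residuals; because the decoder runs the identical predictor on already-reconstructed samples, the round trip is exact. The cascading and weighting of WCLMS and the psychoacoustic pre-filter are not modeled, and the constants below are example values.

```python
import numpy as np

ORDER, MU = 8, 0.5          # predictor length and NLMS step size (example values)

def lms_residuals(samples: np.ndarray) -> np.ndarray:
    """Encoder side: integer residuals of an adaptive NLMS predictor."""
    w, hist = np.zeros(ORDER), np.zeros(ORDER)
    res = np.empty_like(samples)
    for n, x in enumerate(samples):
        pred = int(round(float(w @ hist)))
        e = int(x) - pred                     # integer residual to be entropy coded
        res[n] = e
        w += (MU / (hist @ hist + 1e-9)) * e * hist
        hist = np.roll(hist, 1); hist[0] = x
    return res

def lms_reconstruct(res: np.ndarray) -> np.ndarray:
    """Decoder side: the identical predictor, driven only by the residuals."""
    w, hist = np.zeros(ORDER), np.zeros(ORDER)
    out = np.empty_like(res)
    for n, e in enumerate(res):
        pred = int(round(float(w @ hist)))
        x = pred + int(e)                     # exact reconstruction
        out[n] = x
        w += (MU / (hist @ hist + 1e-9)) * int(e) * hist
        hist = np.roll(hist, 1); hist[0] = x
    return out

# Toy "audio": a 16-bit-range sine with a little noise.
t = np.arange(4000)
rng = np.random.default_rng(0)
audio = (3000 * np.sin(2 * np.pi * t / 64)).astype(np.int64) + rng.integers(-20, 21, t.size)
residual = lms_residuals(audio)
assert np.array_equal(lms_reconstruct(residual), audio)
print(f"mean |sample| = {np.abs(audio).mean():.0f}, mean |residual| = {np.abs(residual).mean():.0f}")
```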

Patent
11 Jan 2002
TL;DR: In this article, a plurality of different versions of compressed data generated by the different compression algorithms may be examined to determine an optimal version of the compressed data according to one or more predetermined criteria.
Abstract: Embodiments of a compression/decompression (codec) system may include a plurality of data compression engines each implementing a different data compression algorithm. A codec system may be designed for the reduction of data bandwidth and storage requirements and for compressing/decompressing data. Uncompressed data may be compressed using a plurality of compression engines in parallel, with each engine compressing the data using a different lossless data compression algorithm. At least one of the data compression engines may implement a parallel lossless data compression algorithm designed to process stream data at more than a single byte or symbol at one time. The plurality of different versions of compressed data generated by the different compression algorithms may be examined to determine an optimal version of the compressed data according to one or more predetermined criteria. A codec system may be integrated in a processor, a system memory controller or elsewhere within a system.
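
In software, the "several engines in parallel, keep the best output" idea collapses to a few lines; here zlib, bz2, and lzma stand in for the hardware engines, and smallest output size is the only selection criterion used.

```python
import bz2, lzma, zlib
from concurrent.futures import ThreadPoolExecutor

ENGINES = {
    "deflate": zlib.compress,
    "bzip2":   bz2.compress,
    "lzma":    lzma.compress,
}

def best_compression(data: bytes):
    """Run every engine on the same input and keep the smallest output."""
    with ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
        results = {name: fut.result()
                   for name, fut in [(n, pool.submit(f, data)) for n, f in ENGINES.items()]}
    name = min(results, key=lambda n: len(results[n]))
    return name, results[name]

sample = (b"lossless " * 500) + bytes(range(256)) * 4
name, blob = best_compression(sample)
print(f"{name}: {len(sample)} -> {len(blob)} bytes")
```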

Journal ArticleDOI
TL;DR: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rates of convergence for finite-memory sources.
Abstract: The Burrows-Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
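
For reference, a compact quadratic-time implementation of the transform being analyzed, together with its inverse; production coders build the sorted rotations with suffix arrays and follow the BWT with MTF/RLE and an entropy coder.

```python
def bwt(s: bytes) -> tuple[bytes, int]:
    """Burrows-Wheeler transform via explicit rotation sorting (illustration only)."""
    n = len(s)
    rotations = sorted(range(n), key=lambda i: s[i:] + s[:i])
    last_column = bytes(s[(i - 1) % n] for i in rotations)
    return last_column, rotations.index(0)          # index of the original string's row

def ibwt(last: bytes, idx: int) -> bytes:
    """Invert the BWT with the standard last-to-first index chaining."""
    n = len(last)
    # order[k] = position in the last column of the k-th smallest character (stable).
    order = sorted(range(n), key=lambda i: (last[i], i))
    out, p = bytearray(), order[idx]
    for _ in range(n):
        out.append(last[p])
        p = order[p]
    return bytes(out)

data = b"banana_bandana"
transformed, idx = bwt(data)
assert ibwt(transformed, idx) == data
print(transformed)
```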

Proceedings ArticleDOI
06 Jan 2002
TL;DR: The approximation ratio is analyzed, that is, the maximum ratio between the size of the generated grammar and the smallest possible grammar over all inputs, for four previously-proposed grammar-based compression algorithms.
Abstract: Several recently-proposed data compression algorithms are based on the idea of representing a string by a context-free grammar. Most of these algorithms are known to be asymptotically optimal with respect to a stationary ergodic source and to achieve a low redundancy rate. However, such results do not reveal how effectively these algorithms exploit the grammar-model itself; that is, are the compressed strings produced as small as possible? We address this issue by analyzing the approximation ratio of several algorithms, that is, the maximum ratio between the size of the generated grammar and the smallest possible grammar over all inputs. On the negative side, we show that every polynomial-time grammar-compression algorithm has approximation ratio at least 8569/8568 unless P = NP. Moreover, achieving an approximation ratio of o(log n/log log n) would require progress on an algebraic problem in a well-studied area. We then upper and lower bound approximation ratios for the following four previously-proposed grammar-based compression algorithms: SEQUENTIAL, BISECTION, GREEDY, and LZ78, each of which employs a distinct approach to compression. These results seem to indicate that there is much room to improve grammar-based compression algorithms.
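
To make "representing a string by a context-free grammar" concrete, here is a small Re-Pair-style constructor (repeatedly replace the most frequent adjacent pair by a new nonterminal) together with the resulting grammar size |G|. Re-Pair belongs to the same family but is not one of the four algorithms bounded in the paper.

```python
from collections import Counter

def build_grammar(s: str):
    """Greedy pair replacement (Re-Pair style): returns (start sequence, rules)."""
    seq, rules, next_id = list(s), {}, 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break                                  # no repeated pair left to factor out
        nonterminal = f"R{next_id}"; next_id += 1
        rules[nonterminal] = pair
        new_seq, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                new_seq.append(nonterminal); i += 2
            else:
                new_seq.append(seq[i]); i += 1
        seq = new_seq
    return seq, rules

def expand(symbol, rules):
    if symbol in rules:
        return "".join(expand(x, rules) for x in rules[symbol])
    return symbol

text = "abcabcabcabcab"
start, rules = build_grammar(text)
size = len(start) + sum(len(rhs) for rhs in rules.values())   # grammar size |G|
assert "".join(expand(x, rules) for x in start) == text
print(f"|input| = {len(text)}, |grammar| = {size}, rules = {rules}")
```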

Journal ArticleDOI
TL;DR: Angle-Analyzer, a new single-rate compression algorithm for triangle-quad hybrid meshes, uses a carefully-designed geometry-driven mesh traversal and an efficient encoding of intrinsic mesh properties to produce compression ratios 40% better in connectivity and 20% better in geometry than the leading Touma and Gotsman technique for the same level of geometric distortion.
Abstract: We present Angle-Analyzer, a new single-rate compression algorithm for triangle-quad hybrid meshes. Using a carefully-designed geometry-driven mesh traversal and an efficient encoding of intrinsic mesh properties, Angle-Analyzer produces compression ratios 40% better in connectivity and 20% better in geometry than the leading Touma and Gotsman technique for the same level of geometric distortion. The simplicity and performance of this new technique are demonstrated, and we provide extensive comparative tests to contrast our results with the current state-of-the-art techniques. Key-words: compression algorithms, hybrid meshes, surface mesh compression, connectivity coding, geometry coding.

Patent
11 Jan 2002
TL;DR: In this article, a plurality of compression/decompression engines are used to compress or decompress a particular part of the data and then merge the portions of compressed or uncompressed data output from the plurality of engines.
Abstract: Embodiments of a compression/decompression (codec) system may include a plurality of parallel data compression and/or parallel data decompression engines designed for the reduction of data bandwidth and storage requirements and for compressing/decompressing data. The plurality of compression/decompression engines may each implement a parallel lossless data compression/decompression algorithm. The codec system may split incoming uncompressed or compressed data up among the plurality of compression/decompression engines. Each of the plurality of compression/decompression engines may compress or decompress a particular part of the data. The codec system may then merge the portions of compressed or uncompressed data output from the plurality of compression/decompression engines. The codec system may implement a method for performing parallel data compression and/or decompression designed to process stream data at more than a single byte or symbol at one time. A codec system may be integrated in a processor, a system memory controller or elsewhere within a system.

Journal ArticleDOI
TL;DR: A VLSI architecture is proposed for the IWT implementation, capable of achieving very high frame rates with moderate gate complexity, and the effects of finite-precision representation of the lifting coefficients on the compression performance are analyzed.
Abstract: This paper deals with the design and implementation of an image transform coding algorithm based on the integer wavelet transform (IWT). First of all, criteria are proposed for the selection of optimal factorizations of the wavelet filter polyphase matrix to be employed within the lifting scheme. The obtained results lead to the IWT implementations with very satisfactory lossless and lossy compression performance. Then, the effects of finite precision representation of the lifting coefficients on the compression performance are analyzed, showing that, in most cases, a very small number of bits can be employed for the mantissa keeping the performance degradation very limited. Stemming from these results, a VLSI architecture is proposed for the IWT implementation, capable of achieving very high frame rates with moderate gate complexity.

Proceedings ArticleDOI
04 Mar 2002
TL;DR: A novel and efficient architecture for on-the-fly data compression and decompression whose field of operation is the cache-to-memory path of a core-based system running standard benchmark programs is proposed.
Abstract: In this paper, we suggest hardware-assisted data compression as a tool for reducing energy consumption of core-based embedded systems. We propose a novel and efficient architecture for on-the-fly data compression and decompression whose field of operation is the cache-to-memory path. Uncompressed cache lines are compressed before they are written back to main memory, and decompressed when cache refills take place. We explore two classes of compression methods, profile-driven and differential, since they are characterized by compact HW implementations, and we compare their performance to those provided by some state-of-the-art compression methods (e.g., we have considered a few variants of the Lempel-Ziv encoder). We present experimental results about memory traffic and energy consumption in the cache-to-memory path of a core-based system running standard benchmark programs. The achieved average energy savings range from 4.2% to 35.2%, depending on the selected compression algorithm.

Journal ArticleDOI
TL;DR: Efficient algorithms for compressing geometric data have been widely developed in recent years, but they are mainly designed for closed polyhedral surfaces which are manifold or "nearly manifold".
Abstract: Efficient algorithms for compressing geometric data have been widely developed in recent years, but they are mainly designed for closed polyhedral surfaces which are manifold or "nearly manifold"...

Patent
02 Jul 2002
TL;DR: In this paper, a method of losslessly compressing and encoding signals representing image information is claimed, where a lossy compressed data file and a residual compressed file are generated, and a lossless data file that is substantially identical to the original data file is created.
Abstract: A method of losslessly compressing and encoding signals representing image information is claimed. A lossy compressed data file and a residual compressed data file are generated. When the lossy compressed data file and the residual compressed data file are combined, a lossless data file that is substantially identical to the original data file is created.
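
The construction is easy to mimic with any lossy stage: below, coarse quantization plays the role of the lossy codec, zlib compresses both the lossy part and the residual, and adding the two back reproduces the original exactly. The patent's specific codecs are not implied.

```python
import zlib
import numpy as np

def encode(image: np.ndarray, step: int = 16):
    """Split an 8-bit image into a lossy compressed file and a residual file."""
    lossy = (image // step) * step                     # stand-in lossy codec
    residual = image.astype(np.int16) - lossy          # small values, compresses well
    lossy_file = zlib.compress(lossy.tobytes(), 9)
    residual_file = zlib.compress(residual.tobytes(), 9)
    return lossy_file, residual_file

def decode(lossy_file: bytes, residual_file: bytes, shape) -> np.ndarray:
    """Combine the two files into a bit-exact copy of the original."""
    lossy = np.frombuffer(zlib.decompress(lossy_file), dtype=np.uint8).reshape(shape)
    residual = np.frombuffer(zlib.decompress(residual_file), dtype=np.int16).reshape(shape)
    return (lossy.astype(np.int16) + residual).astype(np.uint8)

rng = np.random.default_rng(2)
img = rng.normal(128, 20, size=(128, 128)).clip(0, 255).astype(np.uint8)
lossy_file, residual_file = encode(img)
assert np.array_equal(decode(lossy_file, residual_file, img.shape), img)
print(len(lossy_file), len(residual_file), "bytes vs", img.nbytes, "raw")
```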

Journal ArticleDOI
TL;DR: This paper discusses the second-step algorithm of the Burrows-Wheeler compression scheme, which in the original version is the Move-To-Front transform, and proposes a new algorithm that yields a better compression ratio than the previous ones.
Abstract: In this paper we fix our attention on the second-step algorithm of the Burrows-Wheeler compression scheme, which in the original version is the Move-To-Front transform. We discuss many of its replacements presented so far and compare the compression results obtained using them. Then we propose a new algorithm that yields a better compression ratio than the previous ones.
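
For context, here is the baseline second-stage transform being replaced, Move-To-Front, in a few lines; the replacements discussed in the paper plug into this same slot of the BWT pipeline.

```python
def mtf_encode(data: bytes) -> list[int]:
    """Move-To-Front: recently used byte values get small indices."""
    alphabet = list(range(256))
    out = []
    for b in data:
        idx = alphabet.index(b)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))   # move the symbol to the front
    return out

def mtf_decode(indices: list[int]) -> bytes:
    alphabet = list(range(256))
    out = bytearray()
    for idx in indices:
        b = alphabet[idx]
        out.append(b)
        alphabet.insert(0, alphabet.pop(idx))
    return bytes(out)

bwt_output = b"nnbaaa"                      # e.g. the BWT last column of "banana"
codes = mtf_encode(bwt_output)
assert mtf_decode(codes) == bwt_output
print(codes)                                # runs in the BWT output become runs of zeros
```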

Journal ArticleDOI
TL;DR: An online preprocessing technique is proposed, which, although very simple, is able to provide significant improvements in the compression ratio of the images that it targets and shows a good robustness on other images.
Abstract: This article addresses the problem of improving the efficiency of lossless compression of images with sparse histograms. An online preprocessing technique is proposed, which, although very simple, is able to provide significant improvements in the compression ratio of the images that it targets and shows a good robustness on other images.
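
The kind of preprocessing meant is histogram packing: remap the sparse set of used gray levels onto a contiguous range and keep the small table so the step is reversible. The sketch below uses zlib only as a generic downstream coder to show the mechanics; the gains reported in such work come from pairing the remapping with predictive lossless image coders.

```python
import zlib
import numpy as np

def pack_histogram(img: np.ndarray):
    """Remap the used gray levels of a sparse-histogram image to 0..k-1."""
    levels = np.unique(img)                       # the k gray levels actually used
    lut = np.zeros(256, dtype=np.uint8)
    lut[levels] = np.arange(levels.size, dtype=np.uint8)
    return lut[img], levels                       # packed image + inverse table

def unpack_histogram(packed: np.ndarray, levels: np.ndarray) -> np.ndarray:
    return levels[packed]

# Synthetic sparse-histogram image: only 7 widely spaced gray levels occur.
rng = np.random.default_rng(3)
sparse_levels = np.array([0, 37, 80, 81, 150, 200, 255], dtype=np.uint8)
img = sparse_levels[rng.integers(0, 7, size=(256, 256))]

packed, levels = pack_histogram(img)
assert np.array_equal(unpack_histogram(packed, levels), img)
before = len(zlib.compress(img.tobytes(), 9))
after = len(zlib.compress(packed.tobytes(), 9)) + levels.size   # + the stored table
print(f"plain: {before} bytes, packed: {after} bytes")
```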

Proceedings ArticleDOI
12 Dec 2002
TL;DR: This work proposes a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings and demonstrates the efficacy of this approach on collections of Web pages.
Abstract: Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. We study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for cluster-based delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of Web pages. Our experiments show that cluster-based delta compression of collections provides significant improvements in compression ratio as compared to individually compressing each file or using tar+gzip, at a moderate cost in efficiency.

Proceedings ArticleDOI
02 Oct 2002
TL;DR: This paper proposes code compression schemes that use variable-to-fixed (V2F) length coding and an instruction bus encoding scheme, which can effectively reduce the bus power consumption.
Abstract: Memory has been one of the most restricted resources in the embedded computing system domain. Code compression has been proposed as a solution to this problem. Previous work used fixed-to-variable coding algorithms that translate fixed-length bit sequences into variable-length bit sequences. In this paper, we propose code compression schemes that use variable-to-fixed (V2F) length coding. We also propose an instruction bus encoding scheme, which can effectively reduce the bus power consumption. Though the code compression algorithm can be applied to any embedded processor, it favors VLIW architectures because VLIW architectures require a high-bandwidth instruction pre-fetch mechanism to supply multiple operations per cycle. Experiments show that the compression ratios using memoryless V2F coding for IA-64 and TMS320C6x are around 72.7% and 82.5%, respectively. Markov V2F coding can achieve better compression ratios of up to 56% and 70% for IA-64 and TMS320C6x, respectively. A greedy algorithm for codeword assignment can reduce the bus power consumption, and the reduction depends on the probability model used.
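
The variable-to-fixed primitive underlying such schemes is the Tunstall code: grow a parse tree by repeatedly expanding the most probable leaf until the leaf count fits the fixed codeword length, so that each variable-length source string maps to one fixed-length index. The memoryless binary source below is only an illustration; the paper's instruction-stream and Markov variants build on the same construction.

```python
import heapq
from itertools import count

def tunstall(probs: dict[str, float], codeword_bits: int):
    """Build a Tunstall dictionary: variable-length strings -> fixed-length codes."""
    max_leaves = 2 ** codeword_bits
    tie = count()                                     # tie-breaker for equal probabilities
    heap = [(-p, next(tie), sym) for sym, p in probs.items()]   # one leaf per source symbol
    heapq.heapify(heap)
    while len(heap) + len(probs) - 1 <= max_leaves:
        neg_p, _, word = heapq.heappop(heap)          # most probable leaf
        for sym, p in probs.items():                  # expand it into |alphabet| children
            heapq.heappush(heap, (neg_p * p, next(tie), word + sym))
    leaves = sorted(word for _, _, word in heap)
    return {word: format(i, f"0{codeword_bits}b") for i, word in enumerate(leaves)}

book = tunstall({"0": 0.9, "1": 0.1}, codeword_bits=3)
for word, code in sorted(book.items(), key=lambda kv: len(kv[0])):
    print(f"{word:>8} -> {code}")
# Long runs of the likely symbol '0' map to a single 3-bit codeword.
```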

Proceedings ArticleDOI
10 Dec 2002
TL;DR: Two new lossless watermarking methods for authentication of digital videos are presented: in the first, each frame contains its hash embedded losslessly, while in the second, a hash for a group of frames is embedded in B frames only.
Abstract: In authentication using watermarking, the original media needs to be slightly modified in order to embed a short media digest in the media itself. Lossless authentication watermarking achieves the same goal with the advantage that the distortion can be erased if media authenticity is positively verified. We extend lossless data embedding methods, originally developed for the JPEG image format, to digital video in the MPEG-2 format. Two new lossless watermarking methods for authentication of digital videos are presented. In the first, each frame contains its hash embedded losslessly, while in the second, we embed a hash for a group of frames in B frames only. Implementation issues, such as real-time performance, are also addressed.

Patent
Francis A. Kampf
21 Mar 2002
TL;DR: In this article, a data segment is compressed utilizing a history buffer to identify repeated character sequences within the data segment, and the history buffer is updated to include a pre-selected data set and reset data from the next data segment.
Abstract: A method and system for increasing compression efficiency of a lossless data compression utility. The data compression utility compresses a segmented input data stream into independently decompressible data blocks, and includes a history buffer that maintains a history of matching character sequences. In accordance with the method of the present invention, a data segment is compressed utilizing a history buffer to identify repeated character sequences within the data segment. Upon receipt of a next data segment to be compressed, the history buffer is updated to include a pre-selected data set and reset data from the next data segment. As part of the compression, an adaptable cache is searched for non-repeating bytes within a next data segment. Matching bytes are coded as cache references. Further efficiency is obtained by processing the next data segment as two-byte pairs.