Patent

Adaptive data compression apparatus including run length encoding for a tape drive system

TL;DR: This patent describes an adaptive data compression apparatus that efficiently compresses a user data file received from a host computer into a bit-oriented compressed format for storage on the magnetic tape loaded in the tape transport.
Abstract: The adaptive data compression apparatus is located within a tape drive control unit that is interposed between one or more host computers and one or more tape transports. The adaptive data compression apparatus efficiently compresses a user data file received from a host computer into a bit-oriented compressed format for storage on the magnetic tape loaded in the tape transport. The data compression apparatus divides each block of an incoming user data file into segments of a predetermined size, each of which is compressed independently without reference to any other segment in the user data file. The data compression apparatus concurrently uses a plurality of data compression algorithms to adapt the compression operation to the particular data stored in the user data file. A cyclic redundancy check (CRC) circuit computes a CRC code of predetermined length from all of the incoming user data bytes before they are compressed, and the computed CRC code is appended to the end of the compressed data block. The data compression apparatus operates by converting bytes and strings of bytes into shorter bit string codes called reference values, which replace those bytes and byte strings when recorded on the magnetic tape. The byte strings take two forms: a run length form for characters that are repeated three or more times, and a string form that recognizes character patterns of two or more characters.
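A minimal Python sketch of three ideas from the abstract above: each block is split into fixed-size segments that are encoded independently, bytes repeated three or more times use a run length form, and a CRC is computed over the uncompressed bytes and kept with the compressed block. The segment size (512), the use of CRC-32 via zlib.crc32, and the tuple token format are assumptions for illustration; the string form and the bit-oriented reference-value encoding are not modeled.

```python
import zlib

SEGMENT_SIZE = 512  # assumption: the patent only says "predetermined sized segments"
MIN_RUN = 3         # run length form applies to characters repeated three or more times


def tokenize_segment(segment: bytes) -> list[tuple]:
    """Encode one segment as ('run', byte, count) and ('lit', byte) tokens."""
    tokens = []
    i = 0
    while i < len(segment):
        j = i
        while j < len(segment) and segment[j] == segment[i]:
            j += 1
        run = j - i
        if run >= MIN_RUN:
            tokens.append(("run", segment[i], run))
        else:
            tokens.extend(("lit", segment[i]) for _ in range(run))
        i = j
    return tokens


def compress_block(block: bytes):
    # Each segment is tokenized with no reference to any other segment.
    segments = [block[k:k + SEGMENT_SIZE] for k in range(0, len(block), SEGMENT_SIZE)]
    encoded = [tokenize_segment(s) for s in segments]
    crc = zlib.crc32(block)  # CRC over the uncompressed user bytes (CRC-32 assumed)
    return encoded, crc


def decompress_block(encoded, crc) -> bytes:
    out = bytearray()
    for tokens in encoded:
        for tok in tokens:
            if tok[0] == "run":
                out += bytes([tok[1]]) * tok[2]
            else:
                out.append(tok[1])
    if zlib.crc32(bytes(out)) != crc:
        raise ValueError("CRC mismatch: data corrupted")
    return bytes(out)
```

Because segments are independent, a run that crosses a segment boundary is split into two run tokens; that is the cost of being able to decode any segment on its own.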
Citations
Patent
15 Feb 1996
TL;DR: This patent presents a method and apparatus for detecting common spans within one or more data blocks by partitioning the blocks into subblocks and searching the group of subblocks (or their corresponding hashes) for duplicates.
Abstract: This invention provides a method and apparatus for detecting common spans within one or more data blocks by partitioning the blocks (figure 4) into subblocks and searching the group of subblocks (figure 12) (or their corresponding hashes (figure 13)) for duplicates. Blocks can be partitioned into subblocks using a variety of methods, including methods that place subblock boundaries at fixed positions (figure 3), methods that place subblock boundaries at data-dependent positions (figure 3), and methods that yield multiple overlapping subblocks (figure 6). By comparing the hashes of subblocks, common spans of one or more blocks can be identified without ever having to compare the blocks or subblocks themselves (figure 13). This leads to several applications including an incremental backup system that backs up changes rather than changed files (figure 25), a utility that determines the similarities and differences between two files (figure 13), a file system that stores each unique subblock at most once (figure 26), and a communications system that eliminates the need to transmit subblocks already possessed by the receiver (figure 19).

385 citations
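A minimal sketch of the subblock approach in the abstract above, assuming fixed-position boundaries and SHA-256 from hashlib as the hash (the patent does not name a hash here). Common spans are found by comparing subblock hashes only, never the subblock bytes. Data-dependent and overlapping partitioning are not shown.

```python
import hashlib


def fixed_subblocks(block: bytes, size: int) -> list[bytes]:
    """Partition a block at fixed positions (every `size` bytes)."""
    return [block[i:i + size] for i in range(0, len(block), size)]


def common_subblocks(a: bytes, b: bytes, size: int = 4) -> list[tuple[int, int]]:
    """Return (offset_in_a, offset_in_b) pairs whose subblock hashes match."""
    index: dict[bytes, list[int]] = {}
    for k, sub in enumerate(fixed_subblocks(a, size)):
        index.setdefault(hashlib.sha256(sub).digest(), []).append(k * size)
    matches = []
    for k, sub in enumerate(fixed_subblocks(b, size)):
        for off_a in index.get(hashlib.sha256(sub).digest(), []):
            matches.append((off_a, k * size))
    return matches
```

For example, `common_subblocks(b"ABCDEFGHIJKL", b"XXXXEFGHYYYY")` reports the shared `EFGH` span at offset 4 in both blocks. Fixed boundaries are simple but miss matches after an insertion shifts the data, which is the usual motivation for data-dependent boundaries.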

Patent
08 Apr 2006
TL;DR: This patent describes a method for compressing data comprising the steps of: analyzing a data block of an input data stream to identify its data type, the input data stream comprising a plurality of disparate data types; performing content dependent data compression on the data block if its data type is identified; and performing content independent data compression if the data type is not identified.
Abstract: Systems and methods for providing fast and efficient data compression using a combination of content independent data compression and content dependent data compression. In one aspect, a method for compressing data comprises the steps of: analyzing a data block of an input data stream to identify a data type of the data block, the input data stream comprising a plurality of disparate data types; performing content dependent data compression on the data block, if the data type of the data block is identified; performing content independent data compression on the data block, if the data type of the data block is not identified.

304 citations
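A hedged sketch of the dispatch described in the abstract above: identify a block's data type, use a type-specific codec when identification succeeds, and fall back to a general-purpose codec otherwise. The detector (`identify_type`, printable ASCII counts as "text"), the one-byte tag, and the codec choices are hypothetical; bz2 merely stands in for a codec selected because the type is known, and zlib stands in for the content independent path.

```python
import bz2
import zlib
from typing import Optional


def identify_type(block: bytes) -> Optional[str]:
    """Hypothetical detector: printable ASCII (plus tab/CR/LF) is 'text'."""
    if block and all(32 <= b < 127 or b in (9, 10, 13) for b in block):
        return "text"
    return None  # data type not identified


def compress(block: bytes) -> bytes:
    if identify_type(block) == "text":
        return b"T" + bz2.compress(block)  # content dependent path (stand-in codec)
    return b"G" + zlib.compress(block)     # content independent fallback


def decompress(payload: bytes) -> bytes:
    tag, body = payload[:1], payload[1:]
    if tag == b"T":
        return bz2.decompress(body)
    if tag == b"G":
        return zlib.decompress(body)
    raise ValueError("unknown codec tag")
```

The tag byte records which path was taken so the decompressor does not need to repeat the type analysis.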

Patent
14 Feb 2001
TL;DR: The hash file system of this patent uses hash values for computer files or file pieces, which may be produced by a checksum generating program, engine or algorithm such as the industry-standard MD4, MD5, SHA or SHA-1 algorithms.
Abstract: A system and method for a computer file system that is based and organized upon hashes and/or strings of digits of certain, different, or changing lengths and which is capable of eliminating or screening redundant copies of aggregate blocks of data (or parts of data blocks) from the system. The hash file system of the present invention utilizes hash values for computer files or file pieces which may be produced by a checksum generating program, engine or algorithm such as industry standard MD4, MD5, SHA or SHA-1 algorithms. Alternatively, the hash values may be generated by a checksum program, engine, algorithm or other means that produces an effectively unique hash value for a block of data of indeterminate size based upon a non-linear probabilistic mathematical algorithm.

297 citations
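A minimal content-addressed store illustrating the idea in the abstract above: file pieces are keyed by a hash (SHA-1 from hashlib, one of the algorithms the abstract lists), so a redundant copy of a piece is stored only once. The `HashStore` class, the fixed piece size, and the in-memory dictionaries are hypothetical simplifications.

```python
import hashlib


class HashStore:
    """Keep each unique piece once, keyed by its SHA-1 digest."""

    def __init__(self, piece_size: int = 8):
        self.piece_size = piece_size
        self.pieces: dict[str, bytes] = {}      # digest -> piece bytes
        self.files: dict[str, list[str]] = {}   # file name -> ordered digests

    def put(self, name: str, data: bytes) -> None:
        keys = []
        for i in range(0, len(data), self.piece_size):
            piece = data[i:i + self.piece_size]
            key = hashlib.sha1(piece).hexdigest()
            self.pieces.setdefault(key, piece)  # redundant copies are not stored again
            keys.append(key)
        self.files[name] = keys

    def get(self, name: str) -> bytes:
        return b"".join(self.pieces[k] for k in self.files[name])
```

Storing two files that share pieces adds only new digests to `files`; `pieces` grows only when genuinely new content appears.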

Patent
22 Aug 2002
TL;DR: This patent describes a system in which a reversible wavelet filter generates coefficients from input data, such as image data, and an entropy coder performs entropy coding on the embedded codestream to produce the compressed data stream.
Abstract: A compression and decompression system in which a reversible wavelet filter is used to generate coefficients from input data, such as image data. The reversible wavelet filter is an efficient transform implemented with integer arithmetic that has exact reconstruction. The present invention uses the reversible wavelet filter in a lossless system (or lossy system) in which an embedded codestream is generated from the coefficients produced by the filter. An entropy coder performs entropy coding on the embedded codestream to produce the compressed data stream.

218 citations
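To show what "integer arithmetic with exact reconstruction" means in the abstract above, here is one level of the S-transform (an integer Haar wavelet). The abstract does not name its filter, so the S-transform is a stand-in chosen because it is the simplest reversible integer wavelet; an even-length input is assumed.

```python
def s_forward(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the integer S-transform on an even-length signal."""
    evens, odds = x[0::2], x[1::2]
    smooth = [(a + b) // 2 for a, b in zip(evens, odds)]  # floor of the pair mean
    detail = [a - b for a, b in zip(evens, odds)]         # exact integer difference
    return smooth, detail


def s_inverse(smooth: list[int], detail: list[int]) -> list[int]:
    """Exact inverse: floor(d/2) + floor((d+1)/2) == d recovers each pair."""
    out = []
    for s, d in zip(smooth, detail):
        a = s + (d + 1) // 2
        out.extend([a, a - d])
    return out
```

For `x = [5, 2, 3, 4, -3, 0, 7, 7]` the forward step gives smooth `[3, 3, -2, 7]` and detail `[3, -1, -3, 0]`, and the inverse returns `x` exactly. Rounding in the smooth band loses nothing because the detail band retains the information needed to undo it.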

Patent
19 Oct 2006
TL;DR: This patent presents methods for accelerated loading of operating system and application programs upon system boot or application launch; for an application program, the method comprises maintaining a list of application data associated with the program, preloading the application data upon launching the program, and servicing requests for application data from a computer system using the preloaded application data.
Abstract: Systems and methods for providing accelerated loading of operating system and application programs upon system boot or application launch are disclosed. In one aspect, a method for providing accelerated loading of an operating system comprises the steps of: maintaining a list of boot data used for booting a computer system; preloading the boot data upon initialization of the computer system; and servicing requests for boot data from the computer system using the preloaded boot data. In another aspect, a method for providing accelerated launching of an application program comprises the steps of: maintaining a list of application data associated with an application program; preloading the application data upon launching the application program; and servicing requests for application data from a computer system using the preloaded application data.

207 citations
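A hedged sketch of the three steps in the abstract above: keep a list of the data needed at boot or launch, preload that data, then serve requests from the preloaded copy. `PreloadCache`, the block-ID interface, and the `read_from_disk` callback are hypothetical; the patent's actual data structures are not described in the abstract.

```python
from typing import Callable


class PreloadCache:
    """Preload a recorded list of block IDs, then serve reads from memory."""

    def __init__(self, read_from_disk: Callable[[int], bytes], boot_list: list[int]):
        self.read_from_disk = read_from_disk
        # Preloading step: fetch every listed block once, up front.
        self.cache = {blk: read_from_disk(blk) for blk in boot_list}
        self.misses = 0

    def read(self, blk: int) -> bytes:
        # Servicing step: preloaded data is returned without touching the device.
        if blk in self.cache:
            return self.cache[blk]
        self.misses += 1
        return self.read_from_disk(blk)
```

The benefit comes from the preload being issued as one batch (which a real device can reorder and stream) instead of as scattered on-demand reads during boot.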

References
Patent
18 Jun 1984
TL;DR: This patent describes a data compressor that compresses an input stream of data character signals by storing the strings encountered in the input stream in a string table, where each string comprises a prefix string and an extension character; the extension character is the last character in the string and the prefix string comprises all but the extension character.
Abstract: A data compressor compresses an input stream of data character signals by storing in a string table strings of data character signals encountered in the input stream. The compressor searches the input stream to determine the longest match to a stored string. Each stored string comprises a prefix string and an extension character where the extension character is the last character in the string and the prefix string comprises all but the extension character. Each string has a code signal associated therewith and a string is stored in the string table by, at least implicitly, storing the code signal for the string, the code signal for the string prefix and the extension character. When the longest match between the input data character stream and the stored strings is determined, the code signal for the longest match is transmitted as the compressed code signal for the encountered string of characters and an extension string is stored in the string table. The prefix of the extended string is the longest match and the extension character of the extended string is the next input data character signal following the longest match. Searching through the string table and entering extended strings therein is effected by a limited search hashing procedure. Decompression is effected by a decompressor that receives the compressed code signals and generates a string table similar to that constructed by the compressor to effect lookup of received code signals so as to recover the data character signals comprising a stored string. The decompressor string table is updated by storing a string having a prefix in accordance with a prior received code signal and an extension character in accordance with the first character of the currently recovered string.

356 citations
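The string-table scheme in the abstract above is the LZW algorithm. A compact Python sketch follows; a dict replaces the patent's limited-search hashing, and the table grows without a code-width limit, both simplifications for clarity.

```python
def lzw_compress(data: bytes) -> list[int]:
    table = {bytes([i]): i for i in range(256)}
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                    # keep extending the longest match
        else:
            out.append(table[w])      # emit code for the longest match
            table[wc] = len(table)    # new string = longest match + next character
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = prev + prev[:1]   # code not yet in table (the "KwKwK" case)
        else:
            raise ValueError("invalid code")
        out += entry
        # Prefix = previously recovered string, extension = first char of current one.
        table[len(table)] = prev + entry[:1]
        prev = entry
    return bytes(out)
```

For example, `lzw_compress(b"ABABABA")` yields `[65, 66, 256, 258]`; the final code is emitted before the decompressor has defined it, which is the case the `elif` branch handles.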

Patent
19 Aug 1985
TL;DR: This patent describes a compression device combining a run length encoding scheme, in which a flag byte symbol is placed between a character signal and a run length symbol, with statistical encoding that selects among multiple encoding tables based on previously occurring data.
Abstract: A compression device which uses both run length encoding and statistical encoding. The run length encoding scheme uses a flag byte symbol which is disposed between a character signal and a run length symbol. The statistical encoding process uses multiple statistical encoding tables which are selected based upon previously occurring data.

189 citations
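A minimal sketch of the run length part of the abstract above, using the layout character, flag byte, run length. The flag value (0x90), the run threshold (3), and the 255 cap on a single run are assumptions. To keep decoding unambiguous, this sketch also uses the three-byte form for the flag byte itself and for any run immediately followed by a flag-byte run, so a literal is never followed by the flag. The statistical encoding stage is not modeled.

```python
FLAG = 0x90    # assumption: hypothetical flag byte value
MIN_RUN = 3    # assumption: runs this long or longer use the flag form


def rle_encode(data: bytes) -> bytes:
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        runs.append((data[i], j - i))
        i = j
    out = bytearray()
    for k, (ch, n) in enumerate(runs):
        next_ch = runs[k + 1][0] if k + 1 < len(runs) else None
        # Flag form for long runs, for FLAG itself, and just before a FLAG run.
        if n >= MIN_RUN or ch == FLAG or next_ch == FLAG:
            out += bytes([ch, FLAG, n])
        else:
            out += bytes([ch]) * n
    return bytes(out)


def rle_decode(enc: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(enc):
        if i + 1 < len(enc) and enc[i + 1] == FLAG:
            out += bytes([enc[i]]) * enc[i + 2]   # character, FLAG, count
            i += 3
        else:
            out.append(enc[i])                    # plain literal
            i += 1
    return bytes(out)
```

For example, `rle_encode(b"AAAAB")` produces `bytes([65, FLAG, 4, 66])`.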

Patent
30 Apr 1975
TL;DR: This patent describes a data compaction system and apparatus that, in the preferred embodiment, includes a high speed compaction controller utilizing both read only storage and read-write storage.
Abstract: A data compaction system and apparatus is disclosed which, in the preferred embodiment, includes a high speed compaction controller utilizing both read only storage and read-write storage. A compaction device according to the present invention could then be placed upon both ends of a transmission line, the data received by a compaction unit at one end of the line from whatever apparatus wished to transmit data on the line, the data compacted within the compaction unit according to the present invention, the data transmitted on the line to a compaction unit on the other end of the line, the data decompacted, and the data provided to whatever apparatus wished to receive the data. Data received by the compaction device according to the present invention in a fixed length, fixed number base, coded manner would then be compacted by altering the expression of the data, as by altering the number bases in which the data is expressed and by switching between number bases. Thus, it has been found that expressing the data as a string of characters of varying lengths and varying number bases, that is characters not all expressed in the same number base, shortens the overall length of the data transmitted. This is true even if the length of certain characters may be increased by the techniques according to the present invention. Also, prior character and prior record comparisons according to the present invention significantly enhance the compaction ability of the present invention.

81 citations
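The abstract above claims that re-expressing fixed-length coded data in a different number base shortens it. As an illustration of that general idea only (not the patent's base-switching scheme), decimal digits stored as 8-bit characters can be packed three at a time into 10-bit fields, because 1000 values fit in 2^10 = 1024. The function names and the multiple-of-three restriction are assumptions of this sketch.

```python
DIGITS = "0123456789"


def pack_digits(digits: str) -> tuple[int, int]:
    """Pack decimal digits, three per 10-bit field, into one integer.

    Returns (packed_value, bit_length). This simplified sketch requires
    a non-empty string whose length is a multiple of three.
    """
    if not digits or len(digits) % 3 or any(c not in DIGITS for c in digits):
        raise ValueError("expected a multiple of three decimal digits")
    value = 0
    for k in range(0, len(digits), 3):
        value = (value << 10) | int(digits[k:k + 3])  # 000..999 fits in 10 bits
    return value, 10 * len(digits) // 3


def unpack_digits(value: int, bit_length: int) -> str:
    groups = []
    for _ in range(bit_length // 10):
        groups.append(f"{value & 0x3FF:03d}")  # low 10 bits, zero-padded to 3 digits
        value >>= 10
    return "".join(reversed(groups))
```

Six digit characters occupy 48 bits as bytes but 20 bits when packed, and leading zeros survive because each field is restored to exactly three digits.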

Patent
06 Aug 1984
TL;DR: This patent describes a method in which an input data string containing more repetitive data than a specified value is transformed into a data string whose format comprises a first region holding non-compressed data and a second region holding a datum representative of a compressed data string section together with information indicating the number of repetitive data.
Abstract: Method of data compression and restoration wherein an input data string including repetitive data more in number than the specified value is transformed into a data string having a format that includes a first region where non-compressed data are placed, a second region including a datum representative of a data string section which has undergone the compression process and information indicative of the number of repetitive data, i.e., the length of the data string section, and control information inserted at the front and back of the first region indicative of the number of data included in the first region. The transformed data string is recorded on the recording medium, and, for data reproduction, the first and second regions are identified on the basis of the control information read out from the recording medium so that the compressed data string section is transformed back to the original data string in the form of repetitive data.

71 citations
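A hedged sketch of the two-region format in the abstract above. Runs longer than a threshold become run regions (datum, length), and everything else is gathered into literal regions whose count appears both before and after the payload, mirroring the control information at the front and back of the first region. Regions are kept as Python tuples rather than a byte layout, and the threshold of 3 is an assumption; the abstract only says "the specified value".

```python
def encode_regions(data: bytes, threshold: int = 3) -> list[tuple]:
    """Literal region: ("literal", n, payload, n); run region: ("run", datum, length)."""
    regions = []
    literal = bytearray()

    def flush() -> None:
        if literal:
            n = len(literal)
            regions.append(("literal", n, bytes(literal), n))  # count at front and back
            literal.clear()

    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        if j - i > threshold:          # "more in number than the specified value"
            flush()
            regions.append(("run", data[i], j - i))
        else:
            literal += data[i:j]
        i = j
    flush()
    return regions


def decode_regions(regions: list[tuple]) -> bytes:
    out = bytearray()
    for region in regions:
        if region[0] == "literal":
            _, n_front, payload, n_back = region
            if not n_front == n_back == len(payload):
                raise ValueError("inconsistent literal region counts")
            out += payload
        else:
            _, datum, length = region
            out += bytes([datum]) * length  # restore the repetitive data
    return bytes(out)
```

For example, `encode_regions(b"ABCCCCCD")` gives a literal region for `AB`, a run region for the five `C` bytes, and a literal region for `D`.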