scispace - formally typeset
Search or ask a question

Showing papers on "Approximate string matching published in 1984"


Patent
18 Jun 1984
TL;DR: In this paper, a data compressor compresses an input stream of data character signals by storing in a string table strings encountered in the input stream, where each string comprises a prefix string and an extension character where the extension character is the last character in the string and the prefix string comprises all but the extension characters.
Abstract: A data compressor compresses an input stream of data character signals by storing in a string table strings of data character signals encountered in the input stream. The compressor searches the input stream to determine the longest match to a stored string. Each stored string comprises a prefix string and an extension character where the extension character is the last character in the string and the prefix string comprises all but the extension character. Each string has a code signal associated therewith and a string is stored in the string table by, at least implicitly, storing the code signal for the string, the code signal for the string prefix and the extension character. When the longest match between the input data character stream and the stored strings is determined, the code signal for the longest match is transmitted as the compressed code signal for the encountered string of characters and an extension string is stored in the string table. The prefix of the extended string is the longest match and the extension character of the extended string is the next input data character signal following the longest match. Searching through the string table and entering extended strings therein is effected by a limited search hashing procedure. Decompression is effected by a decompressor that receives the compressed code signals and generates a string table similar to that constructed by the compressor to effect lookup of received code signals so as to recover the data character signals comprising a stored string. The decompressor string table is updated by storing a string having a prefix in accordance with a prior received code signal and an extension character in accordance with the first character of the currently recovered string.

356 citations


Journal ArticleDOI
TL;DR: An algorithm that produces the shortest edit sequence transforming one string into another is presented and is optimal in the sense that it generates a minimal covering set of common substrings of one string with respect to another.
Abstract: The string-to-string correction problem is to find a minimal sequence of edit operations for changing a given string into another given string. Extant algorithms compute a longest common subsequence (LCS) of the two strings and then regard the characters not included in the LCS as the differences. However, an LCS does not necessarily include all possible matches, and therefore does not produce the shortest edit sequence. An algorithm that produces the shortest edit sequence transforming one string into another is presented. The algorithm is optimal in the sense that it generates a minimal covering set of common substrings of one string with respect to another. Two improvements of the basic algorithm are developed. The first improvement performs well on strings with few replicated symbols. The second improvement runs in time and space linear to the size of the input. Efficient algorithms for regenerating a string from an edit sequence are also presented.

239 citations


Journal ArticleDOI
TL;DR: This correspondence describes an efficient string pattern matching machine to locate all occurrences of any of a finite number of keywords and phrases in an arbitrary text string.
Abstract: This correspondence describes an efficient string pattern matching machine to locate all occurrences of any of a finite number of keywords and phrases in an arbitrary text string. Some conditions are defined on the states of the machine in order to improve the speed and size of the machine by Aho and Corasick [1]. The pattern matching algorithm is partitioned into various cases by combining these conditions. Finally, the correspondence illustrates the proposed approach by applying it to the analysis of the machines for a simple search.

33 citations


Journal ArticleDOI
TL;DR: One-and two-dimensional pattern matching oriented systolic array processors are presented that support the detection of all repetitions in a string x and the statistics of all substrings of x with and without overlap.
Abstract: One-and two-dimensional pattern matching oriented systolic array processors are presented that support, respectively, the detection of all repetitions in a string x and the statistics of all substrings of x with and without overlap. The time is linear in the length of x in both applications, whereas the number of processors is linear and quadratic, respectively.

13 citations