Top 30 papers published on the topic of Approximate string matching in 1992

TL;DR: This paper presents a new algorithmic technique for two-dimensional matching, that of periodicity analysis, and introduces a new pattern matching paradigm - Compressed Matching

Abstract: String matching is rich with a variety of algorithmic tools. In contrast, multidimensional matching has a rather sparse set of techniques. This paper presents a new algorithmic technique for two-dimensional matching, that of periodicity analysis. Periodicity in strings has been used to solve string matching problems. The success of these algorithms suggests that periodicity can be as important a tool in multidimensional matching. However, multidimensional periodicity is not as simple as it is in strings and was not formally studied or used in pattern matching. This paper's main contribution is defining and analysing two-dimensional periodicity in rectangular arrays. In addition, we introduce a new pattern matching paradigm - Compressed Matching. A text array T and a pattern array P are given in compressed forms c(T) and c(P). We seek all appearances of P in T, without decompressing T. By using periodicity analysis, we show that for the two-dimensional run-length compression there is an O(|c(T)| log|P| + |P|), or almost optimal, algorithm that can achieve a search time that is sublinear in the size of the text |T|.

87 citations

Journal Article•DOI•

An efficient algorithm for the All Pairs Suffix-Prefix Problem

Dan Gusfield¹, Gad M. Landau², Baruch Schieber•Institutions (2)

University of California, Davis¹, New York University²

18 Mar 1992-Information Processing Letters

TL;DR: An algorithm is presented that solves the problem of finding the suffix-prefix match for each of the k(k - 1) ordered pairs of strings in O(m + k^2) time, for any fixed alphabet.

83 citations
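
To make the problem statement concrete, here is a minimal brute-force sketch in Python of what "suffix-prefix match for each ordered pair" means. It is not the algorithm from the paper and is far slower than the bound quoted above; the function names `longest_suffix_prefix` and `all_pairs_suffix_prefix` are illustrative only.

```python
def longest_suffix_prefix(s: str, t: str) -> int:
    """Length of the longest suffix of s that is also a prefix of t."""
    for length in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:length]):
            return length
    return 0


def all_pairs_suffix_prefix(strings: list[str]) -> dict[tuple[int, int], int]:
    """Brute-force overlap length for every ordered pair (i, j) with i != j."""
    return {
        (i, j): longest_suffix_prefix(s, t)
        for i, s in enumerate(strings)
        for j, t in enumerate(strings)
        if i != j
    }


if __name__ == "__main__":
    # Prints the overlap length for each of the k(k - 1) ordered pairs.
    for pair, overlap in all_pairs_suffix_prefix(["abcab", "cabxy", "xyabc"]).items():
        print(pair, overlap)
```

The brute force compares every candidate overlap length directly, which is only practical for small inputs.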

Proceedings Article•DOI•

Alphabet independent two dimensional matching

Amihood Amir¹, Gary Benson², Martin Farach³•Institutions (3)

Georgia Institute of Technology¹, University of Maryland, College Park², Rutgers University³

01 Jul 1992

TL;DR: An algorithm for two dimensional matching with an O(n^2) text scanning phase that runs on the same model as standard linear time string matching algorithms and requires no special assumptions about the alphabet.

Abstract: There are many solutions to the string matching problem which are strictly linear in the input size and independent of alphabet size. Furthermore, the model of computation for these algorithms is very weak: they allow only simple arithmetic and comparisons of equality between characters of the input. In contrast, algorithms for two dimensional matching have needed stronger models of computation, most notably assuming a totally ordered alphabet. The fastest algorithms for two dimensional matching have therefore had a logarithmic dependence on the alphabet size. In the worst case, this gives an algorithm that runs in O(n^2 log m) with O(m^2 log m) preprocessing. We show an algorithm for two dimensional matching with an O(n^2) text scanning phase. Furthermore, the text scan requires no special assumptions about the alphabet, i.e. it runs on the same model as standard linear time string matching algorithms.
Published in the proceedings of the 24th Annual ACM STOC, May 1992, Victoria, B.C., Canada. Amir and Benson were partially supported by NSF grant IRI-90-13055; Farach was supported by DIMACS under NSF contract STC-88-09648.

58 citations

Patent•

Associative CAM apparatus and method for variable length string matching

Brian Ta-Cheng Hou¹, Craig D. Cohen², James Pasco-Anderson², Michael Gutman²•Institutions (2)

General Electric¹, Codex Corporation²

13 Nov 1992

TL;DR: A variable length string matcher finds the longest string in a stored sequence of data elements (e.g., in a history buffer) that matches a string in a given sequence of input data elements.

Abstract: A variable length string matcher finds the longest string in a stored sequence of data elements (e.g., in a history buffer) that matches a string in a given sequence of data elements. The matcher includes circuitry that operates iteratively to compare data elements of the strings and determine the longest matching string based on when an iteration does not result in issuance of a match signal. In another aspect, the history buffer is an associative content addressable memory (CAM), and the string matcher uses absolute addressing of the CAM to determine the longest matching string.

48 citations

Journal Article•DOI•

Two-dimensional dictionary matching

Amihood Amir¹, Martin Farach²•Institutions (2)

Georgia Institute of Technology¹, Rutgers University²

21 Dec 1992-Information Processing Letters

TL;DR: This paper presents an algorithm for the Two-Dimensional Dictionary Problem, that of finding each occurrence of a set of two-dimensional patterns in a text.

47 citations

Book Chapter•DOI•

Efficient Randomized Dictionary Matching Algorithms (Extended Abstract)

Amihood Amir¹, Martin Farach², Yossi Matias³•Institutions (3)

Georgia Institute of Technology¹, Rutgers University², University of Maryland, College Park³

29 Apr 1992

TL;DR: The standard string matching problem involves finding all occurrences of a single pattern in a single text, while there are some domains in which it is more appropriate to deal with dictionaries of patterns.

Abstract: The standard string matching problem involves finding all occurrences of a single pattern in a single text. While this approach works well in many application areas, there are some domains in which it is more appropriate to deal with dictionaries of patterns. A dictionary is a set of patterns; the goal of dictionary matching is to find all dictionary patterns in a given text, simultaneously.

34 citations
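
As an illustration of dictionary matching as defined in the abstract above, the following Python sketch finds all patterns of a dictionary in one pass over the text. It uses the classical deterministic Aho-Corasick automaton, which is a swapped-in technique and not the randomized algorithms this entry refers to; `build_automaton` and `find_all` are illustrative names.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link per state
    out = [[]]    # indices of patterns recognised at each state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every dictionary occurrence."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return hits


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
```

The text is scanned once regardless of how many patterns the dictionary holds, which is the point of treating the dictionary as a single object rather than searching pattern by pattern.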

Journal Article•DOI•

An approximate string-matching algorithm

Jong Yong Kim¹, John Shawe-Taylor¹•Institutions (1)

University of London¹

06 Jan 1992

TL;DR: An approximate string-matching algorithm is described based on earlier attribute-matching algorithms; it involves building a trie from the text string, which takes time O(N log2 N) for a text string of length N.

Abstract: An approximate string-matching algorithm is described based on earlier attribute-matching algorithms. The algorithm involves building a trie from the text string which takes time O(N log2 N), for a text string of length N. Once this data structure has been built any number of approximate searches can be made for pattern strings of length m. The expected complexity analysis is given for the look-up phase of the algorithm based on certain regularity assumptions about the background language. The expected look-up time for each pattern is O(m log2 N). The ideas employed in the algorithm have been shown effective in practice before, but have not previously received any theoretical analysis.

Efficient string algorithmics

Dany Breslauer

01 Jan 1992

TL;DR: This work considers several problems from a theoretical perspective and provides efficient algorithms and lower bounds for these problems in sequential and parallel models of computation for the string matching problem.

Abstract: Problems involving strings arise in many areas of computer science and have numerous practical applications. We consider several problems from a theoretical perspective and provide efficient algorithms and lower bounds for these problems in sequential and parallel models of computation. In the sequential setting, we present new algorithms for the string matching problem improving the previous bounds on the number of comparisons performed by such algorithms. In parallel computation, we present tight algorithms and lower bounds for the string matching problem, for finding the periods of a string, for detecting squares and for finding initial palindromes.

Proceedings Article•

Tighter Bounds on the Exact Complexity of String Matching (Extended Abstract)

Richard Cole, Ramesh Hariharan

01 Jan 1992

TL;DR: The authors show an upper bound of n + (8/(3(m+1)))(n-m) character comparisons, achieved by an online algorithm which performs O(n) work in total and requires O(m) space and O(m^2) time for preprocessing.

Efficient comparison based string matching

Dany Breslauer, Zvi Galil

01 Jan 1992

TL;DR: In this article, the exact number of symbol comparisons required to solve the string matching problem is studied and a family of efficient algorithms is presented. Unlike previous string matching algorithms, the algorithms in this family do not "forget" results of comparisons, which makes their analysis much simpler.

Abstract: We study the exact number of symbol comparisons that are required to solve the string matching problem and present a family of efficient algorithms. Unlike previous string matching algorithms, the algorithms in this family do not "forget" results of comparisons, which makes their analysis much simpler. In particular, we give a linear-time algorithm that finds all occurrences of a pattern of length m in a text of length n in [formula] comparisons. The pattern preprocessing takes linear time and makes at most 2m comparisons. This algorithm establishes that, in general, searching for a long pattern is easier than searching for a short one. We also show that any algorithm in the family of the algorithms presented must make at least [formula] symbol comparisons, for m = 2 k − 1 and any integer k ≥ 1.

Proceedings Article•DOI•

Inference of edit costs using parametric string matching

Horst Bunke¹, János Csirik•Institutions (1)

University of Bern¹

30 Aug 1992

TL;DR: The authors propose a generalized version of the string matching algorithm by Wagner and Fischer (1974) based on a parametrization of the edit cost, which computes the edit distance of A and B in terms of the parameter r.

Abstract: String matching is a useful concept in pattern recognition that is constantly receiving attention from both theoretical and practical points of view. The authors propose a generalized version of the string matching algorithm by Wagner and Fischer (1974). It is based on a parametrization of the edit cost. The authors assume constant cost for any delete and insert operation, but the cost for replacing a symbol is given as a parameter r. For any two given strings A and B, the algorithm computes the edit distance of A and B in terms of the parameter r. The authors give the new algorithm and study some of its properties. Its time complexity is O(n^2 · m), where n and m are the lengths of the two strings to be compared and n >
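
The cost model described above can be sketched with the standard Wagner-Fischer dynamic program in Python. This is a simplified illustration, not the authors' parametric algorithm: it evaluates the distance for one fixed numeric value of r rather than producing the distance as a function of r. The unit cost for insert and delete and the name `edit_distance` are assumptions made for the example.

```python
def edit_distance(a: str, b: str, r: float = 1.0) -> float:
    """Wagner-Fischer edit distance with insert/delete cost 1 and
    substitution cost r (assumed cost model for illustration)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if ca == cb else r)
            delete = prev[j] + 1.0
            insert = curr[j - 1] + 1.0
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Sampling a few values of r shows how the distance depends on it.
    for r in (0.5, 1.0, 2.0, 3.0):
        print(r, edit_distance("kitten", "sitting", r))
```

Once r reaches 2, a substitution is never cheaper than a delete followed by an insert, so the distance stops growing with r under this cost model.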

Book Chapter•DOI•

Approximate Matching of Network Expressions with Spacers

Gene Myers¹•Institutions (1)

University of Arizona¹

06 Apr 1992

TL;DR: A threshold-sensitive algorithm for approximately matching both network and regular expressions and a backtracking procedure whose order of evaluation is optimal in the sense that its expected time is minimal over all such procedures are presented.

Abstract: We present two algorithmic results pertinent to the matching of patterns of interest in macromolecular sequences. The first result is an output sensitive algorithm for approximately matching network expressions, i.e., regular expressions without Kleene closure. This result generalizes the O(kn) expected-time algorithm of Ukkonen for approximately matching keywords [Ukk85]. The second result concerns the problem of matching a pattern that is a network expression whose elements are approximate matches to network expressions interspersed with specifiable distance ranges. For this class of patterns, it is shown how to determine a backtracking procedure whose order of evaluation is optimal in the sense that its expected time is minimal over all such procedures.

Proceedings Article•

(Un)expected behavior of typical suffix trees

Wojciech Szpankowski

01 Sep 1992

TL;DR: A novel technique called the string ruler approach is used to characterize several basic parameters of suffix trees (dependencies among symbols are allowed) and to provide new insights into and generalizations of string matching algorithms, particularly the one by Chang and Lawler.

Abstract: The suffix tree is a data structure widely used in algorithms on words and data compression. Despite this, very little is known about its typical behavior. Recently, Chang and Lawler have designed a sublinear expected time algorithm for approximate string matching using simple estimates of some parameters of suffix trees. It seems that any further advances in such an endeavor are subject to a better understanding of suffix tree behavior. In this paper, we use a novel technique called the string ruler approach to provide a characterization of several basic parameters of suffix trees (dependencies among symbols are allowed). These findings are used to: (i) settle in the negative the conjecture of Wyner and Ziv regarding the typical behavior of the universal data compression scheme of Lempel and Ziv; (ii) prove an open problem regarding the length of a block in the Lempel-Ziv parsing algorithm; (iii) provide new insights and generalizations of string matching algorithms, particularly the one by Chang and Lawler.

Journal Article•DOI•

An O(1) time algorithm for string matching

Gen-Huey Chen

01 Jan 1992-International Journal of Computer Mathematics

TL;DR: An O(1) time algorithm for string matching is designed on a two-dimensional (n-m+1) × n processor array with a reconfigurable bus system, where n and m are the lengths of the text and pattern, respectively.

Abstract: An O(1) time algorithm for string matching is designed on a two-dimensional (n-m+1) × n processor array with a reconfigurable bus system, where n and m are the lengths of the text and pattern, respectively.

Journal Article•DOI•

A parallel solution to the approximate string matching problem

Alan A. Bertossi¹, Fabrizio Luccio¹, Linda Pagli¹, Elena Lodi²•Institutions (2)

University of Pisa¹, University of Siena²

01 Oct 1992

TL;DR: A parallelisation scheme for this algorithm is proposed, which applies to a very general set of errors and allows ASMP to be solved in time T with N processors, where NT is O(mn), thereby achieving optimal speedup.

Abstract: The approximate string matching problem (ASMP) consists of finding all the occurrences of a string of characters X of length m in another string Y of length n, m<

Book Chapter•DOI•

String Matching Under a General Matching Relation

S. Muthukrishnan¹, H. Ramesh¹•Institutions (1)

New York University¹

18 Dec 1992

TL;DR: This work considers a general string matching problem in which an arbitrary many-to-many matching relation is specified and those text positions are sought at which the pattern matches under this relation.

Abstract: In standard string matching, each symbol matches only itself. In other string matching problems, e.g., the string matching with “don't-cares” problem, a symbol may match several symbols. In general, an arbitrary many-to-many matching relation might hold between symbols. We consider a general string matching problem in which such a matching relation is specified and those text positions are sought at which the pattern matches under this relation.
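
The problem statement above can be made concrete with a naive Python sketch that checks every alignment against a caller-supplied matching relation. This is only a brute-force illustration of the definition, not the algorithm of the paper; `match_positions` and `dont_care` are illustrative names, and "?" as the don't-care symbol is an assumption.

```python
def match_positions(text, pattern, matches):
    """Start positions where the pattern matches the text under `matches`.

    `matches(p, t)` is a caller-supplied predicate saying whether pattern
    symbol p may be aligned with text symbol t.
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(matches(pattern[j], text[i + j]) for j in range(m))
    ]


def dont_care(p, t):
    """Example relation: '?' on either side matches any symbol."""
    return p == "?" or t == "?" or p == t


if __name__ == "__main__":
    print(match_positions("abcabd", "a?d", dont_care))
    print(match_positions("abcabd", "ab", lambda p, t: p == t))
```

Passing plain equality as the relation recovers standard string matching, and `dont_care` gives the "don't-cares" variant mentioned in the abstract.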

Journal Article•DOI•

A practical method for implementing string pattern matching machines

Jun-ichi Aoe¹•Institutions (1)

University of Tokushima¹

01 Oct 1992-Information Sciences

TL;DR: It is shown by theoretical and empirical observations that the pattern matching machine built with the presented structure is about 33% smaller and about 1.3 times faster than one built with the triple array.

Journal Article•DOI•

A fast VLSI solution for approximate string matching

Roberto Grossi¹•Institutions (1)

University of Pisa¹

01 Jun 1992-Integration

TL;DR: A simple hardware algorithm is proposed for the approximate string matching problem, where a string is searched in a large, flat text with a bounded number of insertions, deletions and substitutions.

Approximate pattern matching and its applications

Sun Wu

01 Jan 1992

TL;DR: A new algorithm for approximate regular expression matching is presented, which is the first to achieve a subquadratic asymptotic time for this problem, and a new software tool called 'agrep' is developed, which is the first general purpose approximate pattern matching tool in the UNIX system.

Abstract: In this thesis, we study approximate pattern matching problems. Our study is based on the Levenshtein distance model, where errors considered are 'insertions', 'deletions', and 'substitutions'. In general, given a text string, a pattern, and an integer k, we want to find substrings in the text that match the pattern with no more than k errors. The pattern can be a fixed string, a limited expression, or a regular expression. The problem has different variations with different levels of difficulties depending on the types of the pattern as well as the constraint imposed on the matching. We present new results both of theoretical interest and practical value. We present a new algorithm for approximate regular expression matching, which is the first to achieve a subquadratic asymptotic time for this problem. For the practical side, we present new algorithms for approximate pattern matching that are very efficient and flexible. Based on these algorithms, we developed a new software tool called 'agrep', which is the first general purpose approximate pattern matching tool in the UNIX system. 'agrep' is not only usually faster than the UNIX 'grep/egrep/fgrep' family, it also provides many new features such as searching with errors allowed, record-oriented search, AND/OR combined patterns, and mixed exact/approximate matching. 'agrep' has been made publicly available through anonymous ftp from cs.arizona.edu since June 1991.

Book•

Combinatorial pattern matching : Third Annual Symposium, Tucson, Arizona, USA, April 29-May 1, 1992 : proceedings

Alberto Apostolico

01 Jan 1992

TL;DR: This paper presents a probabilistic analysis of generalized suffix trees and two algorithms for the longest common subsequence of three (or more) strings.

Abstract: Probabilistic analysis of generalized suffix trees.- A language approach to string searching evaluation.- Pattern matching with mismatches: A probabilistic analysis and a randomized algorithm.- Fast multiple keyword searching.- Heaviest increasing/common subsequence problems.- Approximate regular expression pattern matching with concave gap penalties.- Matrix longest common subsequence problem, duality and Hilbert bases.- From regular expressions to DFA's using compressed NFA's.- Identifying periodic occurrences of a template with applications to protein structure.- Edit distance for genome comparison based on non-local operations.- 3-D substructure matching in protein molecules.- Fast serial and parallel algorithms for approximate tree matching with VLDC's (Extended Abstract).- Grammatical tree matching.- Theoretical and empirical comparisons of approximate string matching algorithms.- Fast and practical approximate string matching.- DZ: A text compression algorithm for natural languages.- Multiple alignment with guaranteed error bounds and communication cost.- Two algorithms for the longest common subsequence of three (or more) strings.- Color Set Size problem with applications to string matching.- Computing display conflicts in string and circular string visualization.- Efficient randomized dictionary matching algorithms.- Dynamic dictionary matching with failure functions.

Dissertation•

Algorithms for string matching with applications in molecular biology

James Lee Holloway

01 Jan 1992

TL;DR: Presents an optimal parallel algorithm to find the edit distance, a metric frequently used to measure distance between two sequences, and introduces a new problem, the string to string rearrangement problem, that allows movement and inversion of substrings.

Abstract: As the volume of genetic sequence data increases due to improved sequencing techniques and increased interest, the computational tools available to analyze the data are becoming inadequate. This thesis seeks to improve a few of the computational methods available to access and analyze data in the genetic sequence databases. The first two results are parallel algorithms based on previously known sequential algorithms. The third result is a new approach, based on assumptions that we believe make sense in the biological context of the problem, to approximating an ${\cal NP}$-complete problem. The final result is a fundamentally new approach to approximate string matching using the divide and conquer paradigm instead of the dynamic programming approach that has been used almost exclusively in the past. Dynamic programming algorithms to measure the distance between sequences have been known since at least 1972. Recently there has been interest in developing parallel algorithms to measure the distance between two sequences. We have developed an optimal parallel algorithm to find the edit distance, a metric frequently used to measure distance, between two sequences. It is often interesting to find the substrings of length k that appear most frequently in a given string. We give a simple sequential algorithm to solve this problem and an efficient parallel version of the algorithm. The parallel algorithm uses an efficient novel parallel bucket sort. When sequencing a large segment of DNA, the original DNA sequence is reconstructed using the results of sequencing fragments, that may or may not contain errors, of many copies of the original DNA. New algorithms are given to solve the problem of reconstructing the original DNA sequence with and without errors introduced into the fragments. A program based on this algorithm is used to reconstruct the human beta globin region (HUMHBB) when given a set of 300 to 500 mers drawn randomly from the HUMHBB region. 
Approximate string matching is used in a biological context to model the steps of evolution. While such evolution may proceed base by base using the change, insert, or delete operators, there is also evidence that whole genes may be moved or inverted. We introduce a new problem, the string to string rearrangement problem, that allows movement and inversion of substrings. We give a divide and conquer algorithm for finding a rearrangement of one string within another.

Book•

Clustering of Thesaurus terms using adaptive resonance theory, fuzzy cognitive maps and approximate string-matching techniques

Michael P. Oakes, Malcolm J. Taylor

01 Aug 1992

Proceedings Article•DOI•

“Tuning” an ASM metric: a case study in metric ASM optimization

Hal Berghel, David Roach, George Balogh, Carroll Hyatt

01 Apr 1992

TL;DR: An optimized version of the edit distance algorithm is described which has proven more accurate for a particular commercial application than the existing (benchmark) algorithm.

Abstract: We present an approximate string matching case study. An optimized version of the edit distance algorithm is described which has proven more accurate for a particular commercial application than the existing (benchmark) algorithm. The evolution and nature of the optimization are detailed and test results are presented.

Fast parallel algorithms for approximate string matching

Yi Jiang¹•Institutions (1)

University of Montana¹

01 Jan 1992

TL;DR: A real-time parallel algorithm, which could be implemented on a systolic array using m (the length of the pattern string) very simple processing elements, is proposed, which is well-suited for real-time searching of text databases or biological nucleic acid sequence databases.

Abstract: Given a text string, a much shorter pattern string, and an integer k, parallel algorithms for finding all occurrences of the pattern string in the text string with at most k differences (as defined by edit distance) are discussed. First, a real-time parallel algorithm, which could be implemented on a systolic array using m (the length of the pattern string) very simple processing elements, is proposed. After the algorithm gets started, it outputs the minimum edit distance from the pattern string to a substring of the text string at each time step. Thus, the algorithm is well-suited for real-time searching of text databases or biological nucleic acid sequence databases. Second, several different ways for solving the same problem with different CRCW-PRAM assumptions (priority model, combination model, and common-value model) are developed. This class of algorithms uses O(m × n) or O(m × m × n) processors and achieves a time complexity of O(k). Key words: approximate string matching, edit distance, systolic computation, CRCW-PRAM models.
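
For orientation, the per-position output described above (the minimum edit distance from the pattern to some substring ending at each text position) can be computed sequentially with the well-known column-by-column dynamic program. The Python sketch below is that sequential baseline only, not the systolic or CRCW-PRAM algorithms of this work; the name `approximate_matches` is illustrative.

```python
def approximate_matches(text: str, pattern: str, k: int) -> list[tuple[int, int]]:
    """Return (end_index, distance) for every text position at which some
    substring ending there is within edit distance k of the pattern."""
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring of the
    # text ending at the current position. col[0] stays 0 because a match may
    # start anywhere in the text.
    col = list(range(m + 1))
    hits = []
    for end, ch in enumerate(text):
        prev_diag = col[0]  # col[i - 1] as it was in the previous column
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost, old + 1, col[i - 1] + 1)
            prev_diag = old
        if col[m] <= k:
            hits.append((end, col[m]))
    return hits


if __name__ == "__main__":
    print(approximate_matches("abcdefg", "cde", 1))
```

Each text character updates one column of m + 1 cells, so the sequential cost is proportional to the product of the text and pattern lengths.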

'Tuning' an ASM Metric: A Case Study in Metric ASM Optimization

David Roach, Carroll Hyatt

01 Jan 1992

TL;DR: An optimized version of the edit distance algorithm is described which has proven more accurate for a particular commercial application than the existing (benchmark) algorithm.

Abstract: We present an approximate string matching case study. An optimized version of the edit distance algorithm is described which has proven more accurate for a particular commercial application than the existing (benchmark) algorithm. The evolution and nature of the optimization are detailed and test results are presented.

Proceedings Article•DOI•

Pattern recognition algorithm based on cyclic codes

C. Lazar

30 Aug 1992

TL;DR: Presents some new results on using the theory of error control codes in pattern recognition, using the polynomial cyclic code classification to obtain invariance with respect to the position of the starting point when creating the string representation.

Abstract: Presents some new results on using the theory of error control codes in pattern recognition. For shapes described by means of primitive strings, a recognition algorithm based on string matching is proposed. By using the polynomial cyclic code classification, invariance with respect to the position of the starting point when creating the string representation is obtained.
