
Showing papers on "Approximate string matching" published in 1988


Journal ArticleDOI
01 Aug 1988
TL;DR: This work presents an algorithm for finding all occurrences of the pattern in the text, each with at most k differences, given a text of length n, a pattern of length m, and an integer k.
Abstract: Consider the string matching problem where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern or a superfluous character in the text or a superfluous character in the pattern. Given a text of length n, a pattern of length m, and an integer k, we present an algorithm for finding all occurrences of the pattern in the text, each with at most k differences. It runs in O(m + nk^2) time for an alphabet whose size is fixed. For general input the algorithm requires O(m log m + nk^2) time. In both cases the space requirement is O(m).

203 citations
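
The k-differences problem stated in the entry above can be illustrated with the classic O(mn) dynamic-programming scan. This is a minimal sketch for orientation only: it is not the O(m + nk^2) algorithm the abstract describes, and the function name `k_differences_end_positions` is a hypothetical choice made here.

```python
def k_differences_end_positions(pattern, text, k):
    """Return 1-based positions j such that some substring of text ending
    at j matches pattern with at most k differences (mismatch, extra text
    character, or extra pattern character)."""
    m = len(pattern)
    # column[i] holds the minimum number of differences between pattern[:i]
    # and any substring of text ending at the current text position.
    column = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = column[0]  # value for (i-1, j-1); row 0 is always 0
        for i in range(1, m + 1):
            mismatch = 0 if pattern[i - 1] == ch else 1
            best = min(
                prev_diag + mismatch,  # match or mismatch
                column[i] + 1,         # superfluous character in the text
                column[i - 1] + 1,     # superfluous character in the pattern
            )
            prev_diag = column[i]
            column[i] = best
        if column[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_differences_end_positions("abc", "xxabcxx", 1))
```

Row 0 stays at zero so that an occurrence may start anywhere in the text; this is the only change from the ordinary edit-distance recurrence.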


Journal ArticleDOI
TL;DR: This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to methods for constructing data structures that efficiently support the primitive operations needed in approximate string matching.

153 citations


Journal ArticleDOI
TL;DR: This paper presents a CRCW parallel RAM algorithm that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors and requires Θ(n^2) space.
Abstract: Many string manipulations can be performed efficiently on suffix trees. In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors. The algorithm requires Θ(n^2) space. However, the space needed can be reduced to O(n^(1+ε)) for any 0 < ε ≤ 1, with a corresponding slow-down proportional to 1/ε. Efficient parallel procedures are also given for some string problems that can be solved with suffix trees.

152 citations


Journal ArticleDOI
TL;DR: A new similarity measure based on the Levenshtein metric is defined for this comparison and the resulting method is both computationally fast and storage‐efficient.
Abstract: Approximate string matching is an important operation in information systems because an input string is often an inexact match to the strings already stored. Commonly known accurate methods are computationally expensive as they compare the input string to every entry in the stored dictionary. This paper describes a two-stage process. The first uses a very compact n-gram table to preselect sets of roughly similar strings. The second stage compares these with the input string using an accurate method to give an accurately matched set of strings. A new similarity measure based on the Levenshtein metric is defined for this comparison. The resulting method is both computationally fast and storage-efficient.

70 citations
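
The two-stage process described in the entry above (n-gram preselection followed by an accurate comparison) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the index is a plain dictionary rather than the compact n-gram table, the second stage uses ordinary Levenshtein distance rather than the new similarity measure, and the names `build_index`, `lookup`, `min_shared`, and `max_distance` are hypothetical.

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Set of character n-grams of word, padded with '#' at both ends."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(dictionary, n=2):
    """Map each n-gram to the set of dictionary words containing it."""
    index = defaultdict(set)
    for word in dictionary:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def levenshtein(a, b):
    """Plain Levenshtein distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookup(query, index, n=2, min_shared=2, max_distance=2):
    """Stage 1: keep words sharing at least min_shared n-grams with query.
    Stage 2: keep candidates within max_distance, sorted by distance."""
    shared = defaultdict(int)
    for gram in ngrams(query, n):
        for word in index.get(gram, ()):
            shared[word] += 1
    candidates = [w for w, count in shared.items() if count >= min_shared]
    scored = ((levenshtein(query, w), w) for w in candidates)
    return sorted((d, w) for d, w in scored if d <= max_distance)


if __name__ == "__main__":
    idx = build_index(["matching", "patching", "string", "strong"])
    print(lookup("macthing", idx))
```

The preselection step only counts shared n-grams, so the expensive comparison runs on a small candidate set instead of the whole dictionary.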



Book ChapterDOI
29 Aug 1988
TL;DR: Two string-matching algorithms belonging to the second family (fixed word, variable text) are presented; one is designed to meet a time constraint and the other a space constraint.
Abstract: Pattern recognition is a constantly growing field of research. Identification of patterns in images, for instance, is a first step towards their interpretation. More generally, all formal systems handling strings of symbols involve parsing phases to recognize certain patterns. Regular expressions are one of the techniques used to specify simple patterns [26]. They lead to practicable algorithms available under most operating systems or editing tools, especially with Unix. String-matching is a particular case of pattern recognition. It consists in locating a word inside another word, called the text. Solutions to this problem can be divided into two families. In the first one the text is considered as fixed while the word is variable. This situation occurs when the text is a dictionary, for example. The basic solution of that sort is due to Weiner, who introduced the notion of position trees [29]. It is a kind of index which has been improved in different ways (see [21], [5], [10]). For the second family of solutions to string-matching, it is the word that is fixed. The two most famous and efficient string-matching algorithms of this family have been designed by Knuth, Morris & Pratt [18] and Boyer & Moore [7]. They have been subject to several studies, improvements or extensions (see [1], [11], [13-16], [22], [23], [25], [28]). A variation on the initial problem arises when approximate patterns are considered (see [20], [27]). String-matching is close to detection of repetitions in strings (see [3], [10], [17], [25]). In fact, the study of regularities in strings is a part of the analysis of string-matching algorithms. In this paper, two string-matching algorithms belonging to the second family are presented. They respectively obey time and space constraints. Both algorithms start with a first phase during which the word alone is processed. Then, the search is done during a second phase which essentially supports the constraints.

32 citations


01 Jan 1988
TL;DR: An algorithm for the string edit distance problem is presented that runs in O(log m log n) time and uses mn processors on a CRCW PRAM, where m and n are the lengths of the strings. The problem of finding the largest common submatrix of two matrices is also considered and shown to be NP-hard.
Abstract: We consider the problem of determining in parallel the cost of converting a source string to a destination string by a sequence of insert, delete and transform operations. Each operation has an integer cost in some fixed range. We present an algorithm that runs in O(log m log n) time and uses mn processors on a CRCW PRAM, where m and n are the lengths of the strings. The best known sequential algorithm [MP83] runs in time O(n^2 / log n) for strings of length n, indicating that our parallel algorithm (with time-processor product equal to O(mn log m log n)) is nearly optimal. An instance of the edit distance problem is represented as a graph. The algorithm finds the shortest path in the graph using a path doubling method with efficient pruning due to the structure of the problem. Extensions of the algorithm solve approximate string matching and local best fit problems. The problem of finding the largest common submatrix of two matrices is considered and shown to be NP-hard. Finally we present an algorithm for exact two-dimensional pattern matching that runs in O(log n) time using n processors for an n × n search matrix.

20 citations
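
The graph view of edit distance mentioned in the entry above can be sketched sequentially: each grid node (i, j) means "i source characters consumed, j destination characters produced", and edges correspond to delete, insert, and transform operations. This is a minimal sketch assuming non-negative costs and using Dijkstra's algorithm; it is not the parallel path-doubling method of the paper, and the function and parameter names are hypothetical.

```python
import heapq


def edit_cost_shortest_path(source, dest, insert_cost=1, delete_cost=1,
                            transform_cost=1):
    """Cost of converting source to dest, computed as a shortest path from
    node (0, 0) to node (len(source), len(dest)) in the edit graph.
    Assumes all costs are non-negative (required by Dijkstra)."""
    m, n = len(source), len(dest)
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]  # entries are (distance, i, j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == (m, n):
            return d
        if d > dist.get((i, j), float("inf")):
            continue  # stale heap entry
        edges = []
        if i < m:
            edges.append((i + 1, j, delete_cost))      # delete source[i]
        if j < n:
            edges.append((i, j + 1, insert_cost))      # insert dest[j]
        if i < m and j < n:
            step = 0 if source[i] == dest[j] else transform_cost
            edges.append((i + 1, j + 1, step))         # keep or transform
        for ni, nj, weight in edges:
            nd = d + weight
            if nd < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))
    return dist[(m, n)]


if __name__ == "__main__":
    print(edit_cost_shortest_path("kitten", "sitting"))
```

Because the edit graph is a grid-shaped DAG, a row-by-row dynamic program would give the same result; the shortest-path form is used here only because it mirrors the representation named in the abstract.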


Book ChapterDOI
21 Dec 1988
TL;DR: A string-matching algorithm with the following properties: it is linear in time with a small multiplicative constant during all its phases; it preprocesses the string and scans the searched text with constant memory space in addition to the strings.
Abstract: We present a string-matching algorithm with the following properties: it is linear in time with a small multiplicative constant during all its phases; it preprocesses the string and scans the searched text with constant memory space in addition to the strings.

6 citations


Journal ArticleDOI
TL;DR: A quantitative analysis of the widely recognized inefficiency of the SNOBOL4 pattern matching algorithm is presented, the possibility of increasing the efficiency of pattern matching by special case processing is discussed, and a new approach to the design of string processing languages along this line is proposed.

1 citation


Book ChapterDOI
28 Mar 1988
TL;DR: An algorithm is developed for determining relative string similarity and an architecture for comparing strings using this algorithm is also developed.
Abstract: Approximate string matching attempts to determine how similar two strings are. An algorithm is developed for determining relative string similarity. An architecture for comparing strings using this algorithm is also developed.

Journal ArticleDOI
TL;DR: The paper compares the performance of the dynamic programming algorithm and the Proximity processor, and highlights the speed and recall advantages of the latter.

Journal Article
TL;DR: In this article, an approximate string matching algorithm was developed to determine how similar two strings are, and an architecture for comparing strings using this algorithm was also developed, which can be used to determine relative string similarity.