scispace - formally typeset

Showing papers on "Approximate string matching" published in 1990


Journal Article
TL;DR: Given a text string, a pattern string, and an integer k, a new algorithm for finding all occurrences of the pattern string in the text string with at most k differences is presented.
Abstract: Given a text string, a pattern string, and an integer k, a new algorithm for finding all occurrences of the pattern string in the text string with at most k differences is presented. Both its theoretical and practical variants improve upon the known algorithms.

186 citations


Proceedings Article
22 Oct 1990
TL;DR: The authors have devised an algorithm that, for k …
Abstract: The k differences approximate string matching problem specifies a text string of length n, a pattern string of length m, and the number k of differences (insertions, deletions, substitutions) allowed in a match, and asks for every location in the text where a match occurs. Previous algorithms required at least O(nk) time. When k is as large as a fraction of m, no substantial progress has been made over O(nm) dynamic programming. The authors have investigated much faster algorithms for restricted cases of the problem, such as when the text string is random and errors are not too frequent. They have devised an algorithm that, for k …

117 citations
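For reference, the O(nm) dynamic-programming baseline mentioned in the abstract above can be sketched as follows. This is the classic column-by-column formulation, not the authors' faster algorithm, and the function name is our own. Each column records, for every pattern prefix, the fewest differences needed to match it against some substring ending at the current text position; setting the top cell to 0 lets a match start anywhere.

```python
def k_differences_dp(text: str, pattern: str, k: int) -> list[int]:
    """Return every 0-based text index j at which some substring ending at j
    matches `pattern` with at most k insertions, deletions or substitutions."""
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and a substring ending
    # at the current text position (before any text is read, col[i] = i)
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        new = [0] * (m + 1)  # new[0] = 0: a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(col[i - 1] + cost,  # match or substitution
                         col[i] + 1,         # extra character in the text
                         new[i - 1] + 1)     # pattern character missing from the text
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_differences_dp("abcdefg", "bxd", 1))  # [3]: "bcd" needs one substitution
```

Every (text character, pattern prefix) pair is visited once, which is exactly the O(nm) cost that the faster restricted-case algorithms aim to beat.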


Journal Article
TL;DR: This method of spelling correction, consisting of two steps (selection of candidate words, and approximate string matching between the input word and each candidate word), is applied to the post-processing of a printed alphanumeric OCR on a personal computer, making the OCR more reliable and user-friendly.

64 citations
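The two-step scheme in the TL;DR above can be illustrated with a small sketch. The listing does not say how candidates are selected, so the length-based filter in `select_candidates` is a hypothetical stand-in; the second step ranks candidates by plain Levenshtein distance, which may differ from the matching measure the paper actually uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match or substitution
                           prev[j] + 1,               # delete ca
                           cur[j - 1] + 1))           # insert cb
        prev = cur
    return prev[-1]


def select_candidates(word: str, lexicon: list[str], max_len_diff: int = 1) -> list[str]:
    # Hypothetical step 1: keep dictionary words of similar length.
    return [w for w in lexicon if abs(len(w) - len(word)) <= max_len_diff]


def correct(word: str, lexicon: list[str]) -> str:
    # Step 2: approximate matching of the input word against each candidate;
    # ties are broken alphabetically so the result is deterministic.
    candidates = select_candidates(word, lexicon)
    if not candidates:
        return word
    return min(candidates, key=lambda w: (levenshtein(word, w), w))


if __name__ == "__main__":
    print(correct("rnodel", ["model", "modem", "motel"]))  # model
```

The example mimics a typical OCR confusion ("rn" read for "m"): "rnodel" is 2 edits from "model" and 3 from the other candidates.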


Journal Article
TL;DR: An average-case analysis of the Karp-Rabin string matching algorithm, a probabilistic algorithm that adapts hashing techniques to string searching, and an efficient implementation of this algorithm are presented.

39 citations
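The hashing idea behind Karp-Rabin can be sketched as a rolling polynomial hash over a sliding window. This is a minimal illustration, not the implementation analysed in the paper: the base and the fixed modulus are arbitrary choices of ours (the original scheme draws a random prime), and every hash hit is verified character by character so the output is always exact.

```python
def karp_rabin(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return start indices of exact occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    hp = ht = 0
    for i in range(m):
        hp = (hp * base + ord(pattern[i])) % mod
        ht = (ht * base + ord(text[i])) % mod
    hits = []
    for s in range(n - m + 1):
        # Equal hashes are only probable evidence of a match, so confirm directly.
        if hp == ht and text[s:s + m] == pattern:
            hits.append(s)
        if s + m < n:
            # Roll the window: remove text[s], shift, append text[s + m].
            ht = ((ht - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return hits


if __name__ == "__main__":
    print(karp_rabin("abracadabra", "abra"))  # [0, 7]
```

Each window update costs O(1), so the expected running time is O(n + m) when spurious hash collisions are rare, which is the regime an average-case analysis quantifies.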


Book Chapter
11 Jul 1990
TL;DR: A generalized Boyer-Moore algorithm is proposed for approximate string matching with k mismatches and with k differences; in the k differences problem, the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).
Abstract: The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm solves the problem in expected time O(kn(1/(m − k)+k / c)) where c is the size of the alphabet. A related algorithm is developed for the k differences problem where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).

36 citations


Book Chapter
01 Jan 1990

25 citations


Proceedings Article
01 Jan 1990
TL;DR: The main contribution is a linear-time algorithm for the problem of pattern matching with scaling, based on a new algorithmic approach to two-dimensional string matching; the authors also show how to enhance it so that its running time may become sublinear with respect to the original redundant input representation.
Abstract: The problem of pattern matching with scaling is defined. The input for the two-dimensional version of the problem consists of an n × n “text” matrix and an m × m “pattern” matrix. We want to find all occurrences of the pattern in the text, scaled to all natural multiples. That is, for every natural number i, 1 ≤ i ≤ ⌊n/m⌋, we seek all occurrences of the pattern in the text, where each character of the pattern corresponds to an i × i square in the text. This problem is useful for some tasks in computer vision. Our main contribution is a linear time algorithm for the problem. We also consider situations where the text is provided in a less redundant form. For instance, suppose that a repeating character is compressed into one character, along with the number of repetitions. We show how to enhance our algorithm so that its running time may become sublinear with respect to the original redundant input representation. Our algorithms are based on a new algorithmic approach to two-dimensional string matching. Unlike existing approaches, the new approach does not work by reducing a two-dimensional problem to a one-dimensional problem.

19 citations
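To make the problem definition concrete, here is a brute-force sketch that checks every scale i from 1 to ⌊n/m⌋ and every placement directly. It is deliberately naive (far slower than the linear-time algorithm described above), and the function name and the list-of-strings matrix representation are our own assumptions.

```python
def scaled_occurrences(text: list[str], pattern: list[str]) -> list[tuple[int, int, int]]:
    """Return (scale i, row, col) for every occurrence of the m x m pattern in the
    n x n text, where each pattern cell is expanded to an i x i square."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(1, n // m + 1):  # scales 1 .. floor(n/m)
        size = i * m                # side of the scaled pattern
        for r in range(n - size + 1):
            for c in range(n - size + 1):
                # text cell (r+y, c+x) must equal the pattern cell it falls into
                if all(text[r + y][c + x] == pattern[y // i][x // i]
                       for y in range(size) for x in range(size)):
                    hits.append((i, r, c))
    return hits


if __name__ == "__main__":
    text = ["aabb", "aabb", "bbaa", "bbaa"]
    print(scaled_occurrences(text, ["ab", "ba"]))  # [(1, 1, 1), (2, 0, 0)]
```

In the example the 2 × 2 pattern appears once at its original size and once scaled by 2, where each pattern character covers a 2 × 2 block of the text.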


Proceedings Article
11 Jul 1990
TL;DR: The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching and solves the problem in expected time O(kn(1/(m − k)+k / c)) where c is the size of the alphabet.
Abstract: The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm solves the problem in expected time O(kn(1/(m − k)+k / c)) where c is the size of the alphabet. A related algorithm is developed for the k differences problem where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).

10 citations



Journal Article
TL;DR: A set of theoretical results makes it possible to extend known algorithms to solve the approximate string matching problem in O(kn) sequential time and O(k + log m) parallel time on a PRAM model with max{n + k + 1, m²} processors.

5 citations


Journal Article
J. H. Bradford
TL;DR: An algorithm is introduced that encodes pairs of strings as binary numbers such that the Hamming distance between the binary codewords is equal to the Levenshtein distance between the original strings.

Proceedings Article
12 Aug 1990
TL;DR: An algorithm is developed for determining relative string similarity using parallelism and iterative techniques and an architecture for comparing strings using this algorithm is also developed.
Abstract: Approximate string matching attempts to determine how similar two strings are. An algorithm is developed for determining relative string similarity. An architecture for comparing strings using this algorithm is also developed. Using parallelism and iterative techniques, the similarity value is calculated. The length and number of matching substrings determine the amount of similarity.
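The abstract does not give the paper's similarity formula, so as a stand-in from the same family, the sketch below scores two strings by their matching substrings using Python's standard `difflib.SequenceMatcher`: it collects the non-overlapping matching blocks and returns 2 × (matched characters) / (total length), the same quantity `SequenceMatcher.ratio()` reports. This is a sequential illustration of the general idea, not the authors' parallel algorithm or architecture.

```python
from difflib import SequenceMatcher


def substring_similarity(a: str, b: str) -> tuple[float, list[str]]:
    """Score two strings by their matching substrings: 2 * matched / (len(a) + len(b))."""
    # get_matching_blocks() ends with a zero-size sentinel, which is filtered out
    blocks = [blk for blk in SequenceMatcher(None, a, b).get_matching_blocks() if blk.size]
    matched = sum(blk.size for blk in blocks)
    total = len(a) + len(b)
    score = 2 * matched / total if total else 1.0
    return score, [a[blk.a:blk.a + blk.size] for blk in blocks]


if __name__ == "__main__":
    print(substring_similarity("abcd", "bcde"))  # (0.75, ['bcd'])
```

Returning the matched substrings alongside the score shows both factors the abstract names: how many matching substrings there are and how long they are.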

Book Chapter
01 Jan 1990-Sequence
TL;DR: The concept of context-dependent errors is introduced, where the differences between P and T depend on errors in T that are functions of the context of P and/or T, and sequential and parallel algorithms for some relevant sets of weighted errors are developed.
Abstract: The approximate string matching problem consists of finding all the occurrences of a pattern P of length m in a text T of length n, m≪n, where at most k differences are allowed between P and each of its occurrences in T. Various types of differences have been studied in the literature. We introduce here the concept of context-dependent errors, where the differences between P and T depend on errors in T that are functions of the context of P and/or T, and develop sequential and parallel algorithms for some relevant sets of weighted errors. In particular, we allow: context-dependent mismatches, extra and missing characters (Problem I); the same errors plus transpositions of two consecutive characters (Problem II); and a more general choice of errors (Problem III). A set of theoretical results makes it possible to solve Problems I and II with O(kn)-time sequential algorithms and in O(k + log m) parallel time on a PRAM model with max{n + k + 1, m²} processors. Problem III is instead solved in O(mn) sequential time, and O(m + log n) parallel time with n + 1 processors on a feasible bounded-degree network.
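The error set of Problem II (mismatches, extra and missing characters, plus transpositions of two consecutive characters) can be illustrated by extending the standard column-by-column dynamic program with one extra transposition case. This sketch assumes unit costs everywhere, so it does not model the context-dependent weights that are the paper's actual contribution, and it runs in O(nm) rather than O(kn) time; the function name is hypothetical.

```python
def k_differences_transpose(text: str, pattern: str, k: int) -> list[int]:
    """Return 0-based text indices j where some substring ending at j matches the
    pattern with <= k unit-cost mismatches, extra/missing characters or
    transpositions of two adjacent characters."""
    m = len(pattern)
    prev2 = None               # column after len(processed) - 1 text characters
    prev = list(range(m + 1))  # column after all processed text characters
    ends = []
    for j, ch in enumerate(text):
        cur = [0] * (m + 1)    # cur[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            best = min(prev[i - 1] + cost,  # match or mismatch
                       prev[i] + 1,         # extra character in the text
                       cur[i - 1] + 1)      # missing character in the text
            # transposition: pattern[i-2:i] equals text[j-1:j+1] reversed
            if (prev2 is not None and i > 1 and pattern[i - 1] == text[j - 1]
                    and pattern[i - 2] == ch):
                best = min(best, prev2[i - 2] + 1)
            cur[i] = best
        prev2, prev = prev, cur
        if cur[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_differences_transpose("xacbdx", "abcd", 1))  # [4]
```

In the example, "acbd" differs from "abcd" by one swap of adjacent characters; without the transposition case it would count as two substitutions and would not be reported for k = 1.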

Proceedings Article
05 Apr 1990
TL;DR: It is shown how approximate string matching can be involved in the automation of various aspects of word game construction or solution, and that the two main issues which arise have to do with lexical processing and heuristics.
Abstract: It is shown how approximate string matching can be involved in the automation of various aspects of word game construction or solution. In the discussion, the authors try to identify and explicate the underlying issues in computational linguistics and suggest techniques which have been used to address these issues. It is shown that the two main issues which arise in this context have to do with lexical processing and heuristics, issues which arise in more practical contexts as well.

Journal Article
TL;DR: This work addresses the question of augmenting the pattern matching machine constructed by the Aho-Corasick algorithm with a new pattern string, both on-line and off-line, and shows that augmenting a machine of N nodes with a new pattern string of length m takes Θ(mN) time on-line and Θ(N) time off-line.
Abstract: The Aho-Corasick algorithm is a well-known method of determining the occurrences of one of several given pattern strings in a given text string. We address the question of augmenting the pattern matching machine constructed by this algorithm with a new pattern string, both on-line and off-line. We show that augmenting a machine of N nodes with a new pattern string of length m takes Θ(mN) time on-line and Θ(N) time off-line.
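For orientation, the sketch below builds a basic Aho-Corasick machine (trie, failure links computed breadth-first, output sets) and implements augmentation in the simplest off-line way: insert the new pattern into the trie, then recompute every failure link and output set from scratch. This rebuild strategy is our own illustration and makes no claim to match the paper's procedures or bounds; the class and method names are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton with a rebuild-based add_pattern (a sketch)."""

    def __init__(self, patterns=()):
        self.goto = [{}]     # goto[node][ch] -> child node (the trie)
        self.term = [set()]  # patterns ending exactly at each node
        for p in patterns:
            self._insert(p)
        self._build_failure_links()

    def _insert(self, pattern):
        node = 0
        for ch in pattern:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.term.append(set())
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.term[node].add(pattern)

    def _build_failure_links(self):
        # fail[v] = node for the longest proper suffix of v's string that is in the trie
        self.fail = [0] * len(self.goto)
        self.out = [set(t) for t in self.term]
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # inherit patterns recognised via the failure link
                self.out[child] |= self.out[self.fail[child]]
                queue.append(child)

    def add_pattern(self, pattern):
        # Off-line augmentation by full rebuild of the failure function.
        self._insert(pattern)
        self._build_failure_links()

    def search(self, text):
        node, hits = 0, []
        for j, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            hits.extend((j - len(p) + 1, p) for p in self.out[node])
        return sorted(hits)


if __name__ == "__main__":
    ac = AhoCorasick(["he", "she", "his", "hers"])
    print(ac.search("ushers"))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
    ac.add_pattern("us")
    print(ac.search("ushers"))  # [(0, 'us'), (1, 'she'), (2, 'he'), (2, 'hers')]
```

Rebuilding touches every node of the trie, so its cost grows with the machine size rather than with the length of the added pattern; the abstract's contrast between on-line and off-line augmentation concerns how that cost can be bounded.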