Topic
Approximate string matching
About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published on this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.
Papers
11 Jul 1990
TL;DR: In this article, the Boyer-Moore idea is generalized to approximate string matching with k mismatches, and a related algorithm is given for k differences, where the task is to find all approximate occurrences of a pattern in a text with at most k differences (insertions, deletions, changes).
Abstract: The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm solves the problem in expected time O(kn(1/(m − k) + k/c)), where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).
36 citations
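The two problems defined in the 1990 abstract above can be made concrete with straightforward baselines. The sketch below is not the generalized Boyer-Moore algorithm from the paper; it is a minimal illustration, assuming plain Python strings, of what each problem asks: a brute-force O(mn) scan for k mismatches, and the classic Sellers dynamic-programming scan for k differences. The function names are hypothetical.

```python
def k_mismatch_occurrences(pattern: str, text: str, k: int) -> list[int]:
    """Start positions i where text[i:i+m] differs from pattern in at most k positions."""
    m, n = len(pattern), len(text)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for a, b in zip(pattern, text[i:i + m]):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # stop early once the mismatch budget is exceeded
        if mismatches <= k:
            hits.append(i)
    return hits


def k_difference_end_positions(pattern: str, text: str, k: int) -> list[int]:
    """End positions j (0-based, inclusive) such that some substring of text ending
    at j is within edit distance k of pattern (Sellers' dynamic programming)."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and a substring of text
    # ending at the current text position; before any text is read, col[i] = i.
    col = list(range(m + 1))
    hits = []
    for j, t in enumerate(text):
        prev_diag = col[0]  # col[i - 1] from the previous text position
        col[0] = 0          # an empty pattern prefix matches anywhere at no cost
        for i in range(1, m + 1):
            cur = min(
                col[i] + 1,                         # text char t left unmatched
                col[i - 1] + 1,                     # pattern[i - 1] left unmatched
                prev_diag + (pattern[i - 1] != t),  # match (0) or change (1)
            )
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            hits.append(j)
    return hits
```

For example, with pattern "abc" and text "xabcxabdx", k_mismatch_occurrences returns start positions [1, 5] for k = 1. The k differences version reports end positions instead, because an occurrence that involves insertions or deletions has no single fixed length.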
06 May 2004
TL;DR: This paper proposes using approximate string matching techniques to normalize names, so that links between two pieces of text in which the same name is spelt differently are not missed.
Abstract: In this paper we highlight the problems that arise from spelling variations of names in text: links between two pieces of text in which the same name is spelt differently may be missed. The problem is particularly pronounced in ASR (automatic speech recognition) text. We propose the use of approximate string matching techniques to normalize names in order to overcome the problem. We show that an improvement could be achieved if names could be tagged with reasonable accuracy in ASR text.
36 citations
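As a rough illustration of name normalization, the sketch below maps each observed spelling to the closest entry in a list of canonical names by Levenshtein (edit) distance, and keeps the original spelling when nothing is close enough. The canonical list, the max_distance threshold of 2 and the function names are assumptions for illustration; the paper's actual matching technique and its ASR name-tagging step are not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def normalize_name(name: str, canonical_names: list[str], max_distance: int = 2) -> str:
    """Return the closest canonical name (case-insensitive), or name unchanged
    if no canonical name is within max_distance edits (hypothetical threshold)."""
    best = min(canonical_names, key=lambda c: levenshtein(name.lower(), c.lower()))
    if levenshtein(name.lower(), best.lower()) <= max_distance:
        return best
    return name


canonical = ["Tchaikovsky", "Dostoevsky"]
print(normalize_name("Tschaikowsky", canonical))
print(normalize_name("Dostoyevsky", canonical))
print(normalize_name("Smith", canonical))
```

Here "Tschaikowsky" and "Dostoyevsky" are normalized to their canonical spellings, while "Smith" is left alone because it is far from every canonical name. A fixed absolute threshold is the simplest choice; a length-relative threshold may suit names of very different lengths better.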
01 Apr 1997
TL;DR: This paper presents a learning-automaton based solution to string taxonomy that utilizes the Object Migrating Automaton, whose power in clustering objects and images has been reported previously.
Abstract: A typical syntactic pattern recognition (PR) problem involves comparing a noisy string with every element of a dictionary, X. The problem of classification can be greatly simplified if the dictionary is partitioned into a set of subdictionaries. In this case, the classification can be hierarchical: the noisy string is first compared to a representative element of each subdictionary, and the closest match within the subdictionary is subsequently located. Indeed, the entire problem of subdividing a set of strings into subsets where each subset contains "similar" strings has been referred to as the "String Taxonomy Problem". To our knowledge there is no reported solution to this problem. In this paper we present a learning-automaton based solution to string taxonomy. The solution utilizes the Object Migrating Automaton, whose power in clustering objects and images has been reported. The power of the scheme for string taxonomy has been demonstrated using random strings and garbled versions of string representations of fragments of macromolecules.
36 citations
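The hierarchical classification described in the 1997 abstract can be sketched as a two-stage lookup: compare the noisy string with one representative per subdictionary, then search only inside the closest subdictionary. In this minimal sketch the partition is supplied by hand as a dict; building that partition automatically is what the paper's Object Migrating Automaton does, and that learning step is not shown. Edit distance as the similarity measure, the example dictionary and the function names are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def classify_hierarchically(noisy: str, subdictionaries: dict[str, list[str]]) -> str:
    """subdictionaries maps a representative string to the members of its subset
    (hypothetical, hand-built partition)."""
    # Stage 1: compare only against the representatives.
    rep = min(subdictionaries, key=lambda r: levenshtein(noisy, r))
    # Stage 2: locate the closest match inside the chosen subdictionary.
    return min(subdictionaries[rep], key=lambda w: levenshtein(noisy, w))


subdicts = {
    "cat": ["cat", "cart", "cast"],
    "house": ["house", "mouse", "horse"],
}
print(classify_hierarchically("mousse", subdicts))
print(classify_hierarchically("cqrt", subdicts))
```

The noisy "mousse" is routed to the "house" subset and resolved to "mouse", and "cqrt" is routed to the "cat" subset and resolved to "cart". The saving comes from stage 1 touching one string per subset instead of the whole dictionary; the price is that a poor partition can route a string to the wrong subset.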
TL;DR: A technique for two-dimensional substring indexing is presented, based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points; it yields a family of secondary-memory index structures that trade space for time with no loss of accuracy.
36 citations
15 Jul 1985
TL;DR: Given a text of length n and a pattern, this work presents a parallel linear algorithm for finding all occurrences of the pattern in the text in O(n/p) time using any number p ≤ n/log n of processors on a concurrent-read concurrent-write parallel random-access machine (CRCW PRAM).
Abstract: Given a text of length n and a pattern, we present a parallel linear algorithm for finding all occurrences of the pattern in the text. The algorithm runs in O(n/p) time using any number p ≤ n/log n of processors on a concurrent-read concurrent-write parallel random-access machine.
36 citations
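The O(n/p) bound in the 1985 abstract rests on dividing the text among p processors. The sketch below only illustrates that division of work, simulated sequentially: each loop iteration stands in for one processor that reports matches starting inside its own block, reading up to m − 1 characters past the block so that matches straddling a boundary are not lost. It is not the paper's CRCW PRAM algorithm, and because each position is checked naively, a worker does O((n/p)·m) work rather than O(n/p). The function names are hypothetical.

```python
def find_in_block(pattern: str, text: str, start: int, end: int) -> list[int]:
    """Occurrences of a non-empty pattern whose start position lies in [start, end).
    The comparison may read up to m - 1 characters beyond end."""
    m = len(pattern)
    last_start = len(text) - m + 1  # no occurrence can start at or after this index
    return [i for i in range(start, min(end, last_start))
            if text[i:i + m] == pattern]


def partitioned_search(pattern: str, text: str, p: int) -> list[int]:
    """Split text into p contiguous blocks and search each one independently."""
    n = len(text)
    size = -(-n // p)  # ceil(n / p) characters per block
    hits = []
    for w in range(p):  # each iteration plays the role of one processor
        hits.extend(find_in_block(pattern, text, w * size, (w + 1) * size))
    return hits


print(partitioned_search("aba", "abababa", 3))
```

Because each worker owns the start positions in its block, overlapping occurrences such as those of "aba" in "abababa" are each reported exactly once, and the result does not depend on p.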