Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published on this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
Book Chapter
11 Jul 1990
TL;DR: This chapter generalizes the Boyer-Moore algorithm to approximate string matching. It covers both the k mismatches problem and the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).
Abstract: The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm solves the problem in expected time O(kn(1/(m − k) + k/c)), where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes).

36 citations
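
The two problem statements in the entry above can be made concrete with a small sketch. This is not the generalized Boyer-Moore algorithm from the chapter: it is a naive O(mn) baseline for k mismatches and a standard dynamic-programming scan for k differences, shown only to pin down what each problem asks for. The function names `k_mismatches` and `k_differences` are illustrative assumptions.

```python
def k_mismatches(pattern: str, text: str, k: int) -> list[int]:
    """Start positions where pattern aligns with text with at most k mismatches.

    Naive baseline (assumption: chosen for clarity, not the chapter's algorithm).
    """
    m, n = len(pattern), len(text)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            hits.append(i)
    return hits


def k_differences(pattern: str, text: str, k: int) -> list[int]:
    """Indices j such that some substring of text ending at text[j]
    is within edit distance k of pattern (insertions, deletions, changes).
    """
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and a substring of
    # text ending at the current position; before any text is read, col[i] = i.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        prev_diag = col[0]
        col[0] = 0  # a match may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or change
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # pattern character missing from the text
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits
```

For instance, `k_mismatches("abc", "abcabdxbc", 1)` reports alignments at positions 0, 3 and 6, while `k_differences("abc", "zzacz", 1)` reports a match ending at index 3, since "ac" is one deletion away from "abc".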

Proceedings Article
06 May 2004
TL;DR: This paper proposes using approximate string matching techniques to normalize names, so that links between two pieces of text are not missed when the same name is spelt differently.
Abstract: In this paper we highlight the problems that arise from variations in the spelling of names in text, as a result of which links between two pieces of text where the same name is spelt differently may be missed. The problem is particularly pronounced in the case of automatic speech recognition (ASR) text. We propose the use of approximate string matching techniques to normalize names in order to overcome the problem. We show how we could achieve an improvement if we could tag names with reasonable accuracy in ASR.

36 citations
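
The name-normalization idea in the entry above can be sketched with Python's standard `difflib` module. The paper's own matching technique is not described in this listing, so the similarity measure here (`difflib`'s ratio), the `normalize_name` helper, and the 0.75 cutoff are all assumptions made for illustration.

```python
import difflib


def normalize_name(name: str, canonical_names: list[str], cutoff: float = 0.75) -> str:
    """Map a spelling variant to the closest canonical name, if one is close enough.

    Hypothetical helper: returns the input unchanged when nothing clears the cutoff.
    """
    # Compare case-insensitively but return the canonical spelling.
    by_lower = {c.lower(): c for c in canonical_names}
    matches = difflib.get_close_matches(name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else name


canonical = ["Gorbachev", "Yeltsin"]
print(normalize_name("Gorbachov", canonical))
print(normalize_name("Smith", canonical))
```

Once variants map to one canonical form, two pieces of text that mention "Gorbachov" and "Gorbachev" can be linked by an exact comparison of the normalized names. The cutoff trades missed links against false merges and would need tuning on real data.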

Journal Article
01 Apr 1997
TL;DR: This paper presents a learning-automaton based solution to string taxonomy that uses the Object Migrating Automaton, whose power in clustering objects and images has been reported.
Abstract: A typical syntactic pattern recognition (PR) problem involves comparing a noisy string with every element of a dictionary, X. The problem of classification can be greatly simplified if the dictionary is partitioned into a set of subdictionaries. In this case, the classification can be hierarchical: the noisy string is first compared to a representative element of each subdictionary, and the closest match within the subdictionary is subsequently located. Indeed, the entire problem of subdividing a set of strings into subsets where each subset contains "similar" strings has been referred to as the "String Taxonomy Problem". To our knowledge there is no reported solution to this problem. In this paper we present a learning-automaton based solution to string taxonomy. The solution utilizes the Object Migrating Automaton, the power of which in clustering objects and images has been reported. The power of the scheme for string taxonomy has been demonstrated using random strings and garbled versions of string representations of fragments of macromolecules.

36 citations
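
The hierarchical classification step described in the entry above can be sketched as a two-stage lookup. The sketch assumes the partition into subdictionaries is already given (it does not implement the Object Migrating Automaton that the paper uses to learn the partition), uses Levenshtein distance as the similarity measure, and keys each subdictionary by its representative; all three choices are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def classify(noisy: str, subdictionaries: dict[str, list[str]]) -> str:
    """Two-stage lookup: nearest representative, then nearest word within it.

    `subdictionaries` maps a representative string to the members of its
    subdictionary (hypothetical layout, not taken from the paper).
    """
    best_rep = min(subdictionaries, key=lambda rep: edit_distance(noisy, rep))
    return min(subdictionaries[best_rep], key=lambda w: edit_distance(noisy, w))


subdicts = {
    "cat": ["cat", "cart", "cast"],
    "dog": ["dog", "dot", "dig"],
}
print(classify("cartt", subdicts))
```

With r subdictionaries of roughly equal size, the lookup compares the noisy string against r representatives plus one subdictionary instead of the whole dictionary, at the risk of a wrong answer when the nearest representative does not lead to the globally nearest word.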

Journal Article
TL;DR: A technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points is presented and results in a family of secondary memory index structures that trade space for time, with no loss of accuracy.

36 citations

Book Chapter
Uzi Vishkin
15 Jul 1985
TL;DR: Given a text of length n and a pattern, this work presents a parallel linear algorithm for finding all occurrences of the pattern in the text in O(n/p) time using any number of p ≤ n/log n processors on a concurrent-read concurrent-write parallel random-access-machine.
Abstract: Given a text of length n and a pattern, we present a parallel linear algorithm for finding all occurrences of the pattern in the text. The algorithm runs in O(n/p) time using any number of p ≤ n/log n processors on a concurrent-read concurrent-write parallel random-access-machine.

36 citations
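
The entry above concerns exact matching with p processors. The sketch below is not Vishkin's PRAM algorithm; it only illustrates the general idea of dividing the text among p workers, with each worker reporting the occurrences that start inside its own slice. The names `find_in_chunk` and `parallel_find` are hypothetical, and Python threads are used for simplicity, so no real speedup should be expected from this version.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_chunk(text: str, pattern: str, start: int, end: int) -> list[int]:
    """Occurrences of pattern whose start index lies in [start, end)."""
    m = len(pattern)
    limit = end + m - 1  # allow a match to run past the chunk boundary
    hits = []
    pos = text.find(pattern, start, limit)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1, limit)
    return hits


def parallel_find(text: str, pattern: str, p: int = 4) -> list[int]:
    """All start positions of pattern in text, searched in about p chunks."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n = len(text)
    chunk = max(1, -(-n // p))  # ceiling of n / p
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = list(pool.map(lambda b: find_in_chunk(text, pattern, *b), bounds))
    # Chunks are disjoint and ordered, so concatenation is already sorted.
    return [pos for part in parts for pos in part]
```

Each chunk reads m − 1 characters beyond its right boundary, so a match that straddles two chunks is found exactly once, by the chunk in which it starts.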


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39