scispace - formally typeset
Search or ask a question
Topic

Approximate string matching

About: Approximate string matching is a research topic. Over the lifetime, 1903 publications have been published within this topic receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
More filters
Journal ArticleDOI
TL;DR: The stochastic model allows us to learn a string-edit distance function from a corpus of examples and is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
Abstract: In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string-edit distance. Our stochastic model allows us to learn a string-edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string-edit distance with nearly one-fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.

897 citations

Journal ArticleDOI
TL;DR: An algorithm is described for computing the edit distance between two strings of length n and m, n ⪖ m, which requires O(n · max(1, mlog n) steps whenever the costs of edit operations are integral multiples of a single positive real number and the alphabet for the strings is finite.

739 citations

Journal ArticleDOI
TL;DR: This approach allows an efficient and natural way to construct iconic indexes for pictures and proves the necessary and sufficient conditions to characterize ambiguous pictures for reduced 2D strings as well as normal 2-D strings.
Abstract: In this paper, we describe a new way of representing a symbolic picture by a two-dimensional string. A picture query can also be specified as a 2-D string. The problem of pictorial information retrieval then becomes a problem of 2-D subsequence matching. We present algorithms for encoding a symbolic picture into its 2-D string representation, reconstructing a picture from its 2-D string representation, and matching a 2-D string with another 2-D string. We also prove the necessary and sufficient conditions to characterize ambiguous pictures for reduced 2-D strings as well as normal 2-D strings. This approach thus allows an efficient and natural way to construct iconic indexes for pictures.

674 citations

Journal ArticleDOI
TL;DR: Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword.
Abstract: Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The methods found are classified as either equivalence or similarity problems. Equivalence problems are seen to be readily solved using canonical forms. For sinuiarity problems difference measures are surveyed, with a full description of the wellestablmhed dynamic programming method relating this to the approach using probabilities and likelihoods. Searches for approximate matches in large sets using a difference function are seen to be an open problem still, though several promising ideas have been suggested. Approximate matching (error correction) during parsing is briefly reviewed.

672 citations

Journal ArticleDOI
06 Jan 1992
TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for the edited distance based string matching.
Abstract: We study approximate string matching in connection with two string distance functions that are computable in linear time. The first function is based on the so-called $q$-grams. An algorithm is given for the associated string matching problem that finds the locally best approximate occurences of pattern $P$, $|P|=m$, in text $T$, $|T|=n$, in time $O(n\log (m-q))$. The occurences with distance $\leq k$ can be found in time $O(n\log k)$. The other distance function is based on finding maximal common substrings and allows a form of approximate string matching in time $O(n)$. Both distances give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for the edit distance based string matching.

665 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
80% related
Scheduling (computing)
78.6K papers, 1.3M citations
79% related
Network packet
159.7K papers, 2.2M citations
78% related
Optimization problem
96.4K papers, 2.1M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20238
202230
202132
202030
201948
201839