Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
01 Jan 1992
TL;DR: In this article, the exact number of symbol comparisons required to solve the string matching problem is studied and a family of efficient algorithms is presented. Unlike previous string matching algorithms, the algorithms in this family do not "forget" results of comparisons, which makes their analysis much simpler.
Abstract: We study the exact number of symbol comparisons that are required to solve the string matching problem and present a family of efficient algorithms. Unlike previous string matching algorithms, the algorithms in this family do not "forget" results of comparisons, which makes their analysis much simpler. In particular, we give a linear-time algorithm that finds all occurrences of a pattern of length m in a text of length n in [formula] comparisons. The pattern preprocessing takes linear time and makes at most 2m comparisons. This algorithm establishes that, in general, searching for a long pattern is easier than searching for a short one. We also show that any algorithm in the family of algorithms presented must make at least [formula] symbol comparisons, for m = 2^k − 1 and any integer k ≥ 1.

15 citations

Journal ArticleDOI
TL;DR: This paper provides an algorithm that also runs in deterministic time O(kN log M) but achieves a lower variance of min(M/k, M-c)(M-c)/k, which is essentially a factor of k smaller than in previous work.

15 citations

Proceedings Article
01 Jan 2017
TL;DR: A new approach for handling errors in coverable phenomena is used and the approximate cover problem is defined, in which a text is given that is a sequence of some cover repetitions with possible mismatch errors, and a string is sought that covers the text with the minimum number of errors.
Abstract: Regularities in strings arise in various areas of science, including coding and automata theory, formal language theory, combinatorics, molecular biology and many others. A common notion to describe regularity in a string T is a cover, which is a string C for which every letter of T lies within some occurrence of C. The alignment of the cover repetitions in the given text is called a tiling. In many applications finding exact repetitions is not sufficient, due to the presence of errors. In this paper, we use a new approach for handling errors in coverable phenomena and define the approximate cover problem (ACP), in which we are given a text that is a sequence of some cover repetitions with possible mismatch errors, and we seek a string that covers the text with the minimum number of errors. We first show that the ACP is NP-hard, by studying the cover-length relaxation of the ACP, in which the requested length of the approximate cover is also given with the input string. We show that this relaxation is already NP-hard. We also study two further relaxations of the ACP, which we call the partial-tiling relaxation of the ACP and the full-tiling relaxation of the ACP, in which a tiling of the requested cover is also given with the input string. A given full tiling retains all the occurrences of the cover before the errors, while in a partial tiling there can be additional occurrences of the cover that are not marked by the tiling. We show that the partial-tiling relaxation has a polynomial time complexity and give experimental evidence that the full-tiling relaxation also has polynomial time complexity. The study of these relaxations, besides shedding further light on the complexity of the ACP, also involves a deep understanding of the properties of covers, yielding some key lemmas and observations that may be helpful for a future study of regularities in the presence of errors.

15 citations
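
To make the definition of an exact cover in the abstract above concrete, here is a minimal Python sketch that checks whether a candidate string C covers a text T, meaning every letter of T lies within some occurrence of C. The function name `is_cover` is a hypothetical choice for illustration and does not come from the paper, and the sketch handles only the error-free case, not the approximate cover problem itself.

```python
def is_cover(text: str, cover: str) -> bool:
    """Return True if every position of `text` lies inside some occurrence of `cover`.

    Illustrative helper (name is an assumption, not from the paper).
    """
    if not cover or len(cover) > len(text):
        return False

    covered_up_to = 0  # index of the first position not yet covered
    start = text.find(cover)
    while start != -1:
        if start > covered_up_to:
            # There is a gap between the previous occurrence and this one.
            return False
        covered_up_to = start + len(cover)
        # Occurrences may overlap, so resume the search one position later.
        start = text.find(cover, start + 1)

    return covered_up_to == len(text)


if __name__ == "__main__":
    print(is_cover("abaababaaba", "aba"))  # occurrences at 0, 3, 5, 8
    print(is_cover("abaabab", "aba"))      # final letter is left uncovered
```

The positions at which the occurrences start (0, 3, 5 and 8 in the first call) correspond to what the abstract calls a tiling.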

Patent
30 Jul 2003
TL;DR: A method is proposed for determining the similarity of two strings by calculating a Levenshtein matrix of a first string and a second string and then determining the largest common substring.
Abstract: Embodiments of the present invention provide a method of determining the similarity of two strings. The method comprises calculating a Levenshtein matrix of a first string and a second string. A Levenshtein distance is determined from the Levenshtein matrix. A largest common substring is also determined from the Levenshtein matrix. The method may further comprise determining a numerical score as a function of the Levenshtein distance and the largest common substring.

15 citations
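
The two quantities named in the patent abstract above can be illustrated with a short Python sketch. It builds the standard Levenshtein matrix and reads the distance from its bottom-right cell. The abstract does not say how the largest common substring is extracted from that matrix, so this sketch swaps in a separate, textbook dynamic program for the longest common substring; that substitution is an assumption, as are the function names. The scoring function is not specified in the abstract and is not shown.

```python
def levenshtein_matrix(a: str, b: str) -> list[list[int]]:
    """Build the (len(a)+1) x (len(b)+1) edit-distance matrix for a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete i characters of a
    for j in range(cols):
        d[0][j] = j  # insert j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d


def longest_common_substring(a: str, b: str) -> str:
    """Textbook DP (an assumption; not the patent's matrix-based procedure)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    matrix = levenshtein_matrix("kitten", "sitting")
    print(matrix[-1][-1])                                 # Levenshtein distance: 3
    print(longest_common_substring("kitten", "sitting"))  # "itt"
```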


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39