Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Book Chapter
29 Oct 2007
TL;DR: This paper presents an ASM algorithm that works on top of a Lempel-Ziv self-index, and shows experimentally that the algorithm has a competitive performance and provides a useful space-time tradeoff compared to classical indexes.
Abstract: A compressed full-text self-index for a text T is a data structure that requires reduced space and is capable of searching for patterns P in T. Furthermore, the structure can reproduce any substring of T, so it actually replaces T. Despite the explosion of interest in self-indexes in recent years, there has not been much progress on search functionalities beyond basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest in, for example, computational biology applications. We present an ASM algorithm that works on top of a Lempel-Ziv self-index. We consider the so-called hybrid indexes, which are the best in practice for this problem. We show that a Lempel-Ziv index can be seen as an extension of the classical q-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to the Lempel-Ziv index. We show experimentally that our algorithm has competitive performance and provides a useful space-time tradeoff compared to classical indexes.
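
For orientation, the underlying ASM problem (find every place in T where P occurs with at most k errors) can be sketched with the classical online dynamic-programming scan. This is not the indexed Lempel-Ziv algorithm the paper describes; it is only a baseline illustration, and the function name `approx_find` and its 1-based end-position convention are assumptions made here.

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Return 1-based end positions in `text` where some substring
    ending there is within edit distance `k` of `pattern`.

    Baseline O(len(pattern) * len(text)) scan; an illustration of the
    problem statement only, not an indexed search.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character skipped
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_find("survey", "surgery", 2))
```

An index-based method aims to avoid scanning the whole text like this, which is where the space-time tradeoff mentioned in the abstract comes in.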

9 citations

Book Chapter
18 Jun 2008
TL;DR: The longest common parameterized subsequence problem, which combines the LCS measure with parameterized matching, is considered; the problem is proved to be NP-hard, and a couple of approximation algorithms for it are shown.
Abstract: The well-known problem of the longest common subsequence (LCS) of two strings of lengths n and m, respectively, is O(nm)-time solvable and is a classical distance measure for strings. Another well-studied string comparison measure is that of parameterized matching, where two equal-length strings are a parameterized match if there exists a bijection on the alphabets such that one string matches the other under the bijection. All works associated with parameterized pattern matching present polynomial-time algorithms. There have been several attempts to accommodate parameterized matching along with other distance measures, as these turn out to be natural problems, e.g., Hamming distance and a bounded version of edit distance. Several algorithms have been proposed for these problems. In this paper we consider the longest common parameterized subsequence problem, which combines the LCS measure with parameterized matching. We prove that the problem is NP-hard, and then show a couple of approximation algorithms for the problem.
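
The two building blocks named in the abstract can be sketched separately: the O(nm) dynamic program for LCS length, and a check for a parameterized match between two equal-length strings. This is a minimal sketch and does not implement the NP-hard combined problem or the paper's approximation algorithms. The function names are hypothetical, and the match check assumes every symbol may be renamed (no fixed symbols).

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def is_parameterized_match(s: str, t: str) -> bool:
    """True if a one-to-one renaming of symbols turns s into t.

    Assumption: all symbols are renameable parameters.
    """
    if len(s) != len(t):
        return False
    fwd, bwd = {}, {}
    for x, y in zip(s, t):
        # setdefault records the first pairing and returns it afterwards,
        # so any later conflicting pairing is detected in either direction.
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True


print(lcs_length("AGGTAB", "GXTXAYB"))
print(is_parameterized_match("abab", "cdcd"))
```

Tracking the mapping in both directions is what enforces the bijection: "ab" does not match "cc", because two different symbols would map to the same one.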

9 citations

Journal Article
TL;DR: It is concluded that the suffix automata and hybrid algorithms are the faster ones, with the lowest number of attempts, and that the hashing approaches have a lower number of character comparisons.
Abstract: Exact string matching algorithms have been very significant in many applications in the last two decades. This is due to advances in technology that produce large volumes of data. The main factors in string matching algorithms are the number of attempts, the number of character comparisons, and the running time. These factors are influenced by the type of algorithm, the type of data, the data size, and the length of the pattern used. In this article, we review the advantages and disadvantages of exact string matching algorithms. We conclude that the suffix automata and hybrid algorithms are the faster ones, with the lowest number of attempts, and that the hashing approaches have a lower number of character comparisons. The bit-parallel algorithms have similar limitations. Keywords: exact string matching, character comparison, number of attempts, limitations.

9 citations

Journal Article
TL;DR: This paper proposes a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching that is more efficient than the original BPD, and that takes over and extends the role of the original BPD as one of the most practical approximate string matching algorithms under moderate values of k and m.
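
To give a feel for what bit-parallel simulation of the approximate-matching NFA looks like, here is a minimal sketch of a simpler, related technique: the row-wise simulation in the style of Wu and Manber (Shift-And extended with k error rows). It is not BPD, which packs the automaton along diagonals, and it is not the variant proposed in the paper. The function name and the 1-based end-position convention are assumptions made here.

```python
def bitparallel_find(pattern: str, text: str, k: int) -> list[int]:
    """Return 1-based end positions in `text` where `pattern` matches
    with at most `k` errors (insertions, deletions, substitutions).

    Row-wise bit-parallel NFA simulation; Python integers stand in for
    machine words, so the pattern length is not limited here.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # R[d] has bit i set iff pattern[:i+1] matches a suffix of the text
    # read so far with at most d errors.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    ends = []
    for j, ch in enumerate(text, start=1):
        b = masks.get(ch, 0)
        prev_old = R[0]
        R[0] = ((R[0] << 1) | 1) & b
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = ((((cur_old << 1) | 1) & b)  # matching character
                    | prev_old                  # extra character in the text
                    | (prev_old << 1) | 1       # substitution
                    | (R[d - 1] << 1)           # pattern character skipped
                    ) & full
            prev_old = cur_old
        if R[k] & accept:
            ends.append(j)
    return ends


print(bitparallel_find("survey", "surgery", 2))
```

Each text character costs O(k) word operations when the pattern fits in a machine word, which is why this family of algorithms is practical for moderate k and m.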

9 citations

Journal Article
Lloyd Allison
TL;DR: Ukkonen's (pair-wise) string alignment technique is extended to the problem of finding an optimal alignment for three strings. The resulting algorithm has worst-case time complexity O(nd^2) and space complexity O(d^3), where the string lengths are approximately n and d is the three-way edit distance based on tree costs.

9 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39