Topic
Approximate string matching
About: Approximate string matching is a research topic. To date, 1903 publications have appeared within this topic, receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.
Papers
29 Oct 2007
TL;DR: This paper presents an ASM algorithm that works on top of a Lempel-Ziv self-index, and shows experimentally that the algorithm has a competitive performance and provides a useful space-time tradeoff compared to classical indexes.
Abstract: A compressed full-text self-index for a text T is a data structure that requires reduced space and is capable of searching for patterns P in T. Furthermore, the structure can reproduce any substring of T, thus it actually replaces T. Despite the explosion of interest in self-indexes in recent years, there has not been much progress on search functionalities beyond the basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest, for example, in computational biology applications. We present an ASM algorithm that works on top of a Lempel-Ziv self-index. We consider the so-called hybrid indexes, which are the best in practice for this problem. We show that a Lempel-Ziv index can be seen as an extension of the classical q-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to the Lempel-Ziv index. We show experimentally that our algorithm has a competitive performance and provides a useful space-time tradeoff compared to classical indexes.
9 citations
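For readers new to the problem addressed in the entry above, the following is a minimal sketch of online approximate string matching with at most k edit errors, using the classical column-wise dynamic programming scan (often attributed to Sellers). It is not the indexed Lempel-Ziv algorithm of the paper; the function name `approx_find` and the convention of reporting 1-based end positions are assumptions made for illustration.

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Return the 1-based end positions in `text` where some substring
    ending there is within edit distance k of `pattern`."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character skipped
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_find("abc", "xxabcxx", 1))
```

This scan takes O(mn) time for a pattern of length m and a text of length n, which is the cost that indexed approaches such as the one above try to avoid by not reading the whole text.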
18 Jun 2008
TL;DR: The longest common parameterized subsequence problem, which combines the LCS measure with parameterized matching, is considered; the problem is proved to be NP-hard, and a couple of approximation algorithms for it are shown.
Abstract: The well-known problem of the longest common subsequence (LCS) of two strings of lengths n and m, respectively, is O(nm)-time solvable and is a classical distance measure for strings. Another well-studied string comparison measure is that of parameterized matching, where two equal-length strings are a parameterized match if there exists a bijection on the alphabets such that one string matches the other under the bijection. All works associated with parameterized pattern matching present polynomial time algorithms.
There have been several attempts to accommodate parameterized matching along with other distance measures, as these turn out to be natural problems, e.g., Hamming distance, and a bounded version of edit-distance. Several algorithms have been proposed for these problems.
In this paper we consider the longest common parameterized subsequence problem which combines the LCS measure with parameterized matching. We prove that the problem is NP-hard, and then show a couple of approximation algorithms for the problem.
9 citations
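The two measures combined in the entry above can each be sketched in a few lines. The code below is an illustrative sketch, not the paper's method: `lcs_length` is the textbook O(nm) dynamic program, and `is_parameterized_match` follows the simplified definition given in the abstract (a bijection between the symbols of two equal-length strings). Both function names are assumptions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, in O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def is_parameterized_match(s: str, t: str) -> bool:
    """True if a one-to-one renaming of symbols turns s into t."""
    if len(s) != len(t):
        return False
    fwd: dict[str, str] = {}  # symbol of s -> symbol of t
    bwd: dict[str, str] = {}  # symbol of t -> symbol of s
    for x, y in zip(s, t):
        # setdefault records the first pairing and returns it afterwards,
        # so any later conflicting pairing is detected here.
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True


print(lcs_length("AGGTAB", "GXTXAYB"))
print(is_parameterized_match("abab", "cdcd"))
```

Each measure alone is polynomial, as the abstract notes; the hardness result concerns the combined problem, where the renaming and the subsequence have to be chosen together.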
TL;DR: It is concluded that the suffix automata and hybrid algorithms are the faster ones, with the lowest number of attempts, and that the hashing approaches have the lower number of comparisons.
Abstract: Exact string matching algorithms have been very significant in many applications over the last two decades, owing to advances in technology that produce large volumes of data. The main factors in string matching algorithms are the number of attempts, the number of character comparisons, and the running time. These factors are influenced by the type of algorithm, the type of data, the data size, and the length of the pattern used. In this article, we review the advantages and disadvantages of exact string matching algorithms. We conclude that the suffix automata and hybrid algorithms are the faster ones, with the lowest number of attempts, and that the hashing approaches have the lower number of comparisons. The bit-parallelism algorithms have similar limitations.
Keywords: exact string matching, character comparison, number of attempts, limitations
9 citations
TL;DR: This paper proposes a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching that is more efficient than the original BPD, and takes over/extends the role of the original BPD as one of the most practical approximate string matching algorithms under moderate values of k and m.
9 citations
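As background for the entry above, here is a minimal sketch of bit-parallel approximate matching. It uses the simpler row-wise Shift-And extension in the style of Wu and Manber, not the diagonal-wise BPD automaton or the variant the paper proposes; it is shown only to illustrate how an NFA for k errors can be simulated with bitwise operations. The function name `bitparallel_find` is an assumption, and Python's unbounded integers stand in for machine words.

```python
def bitparallel_find(pattern: str, text: str, k: int) -> list[int]:
    """Return 1-based end positions in `text` where `pattern` matches
    with at most k edit errors (row-wise Shift-And simulation)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1
    accept = 1 << (m - 1)
    # B[c] has bit i set when pattern[i] == c.
    B: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << i)
    # R[d] has bit i set when pattern[:i+1] matches a suffix of the text
    # read so far with at most d errors. Initially d skipped pattern chars.
    R = [(1 << d) - 1 for d in range(k + 1)]
    ends = []
    for j, c in enumerate(text, start=1):
        bc = B.get(c, 0)
        old = R[0]                        # previous value of row d-1
        R[0] = ((R[0] << 1) | 1) & bc
        for d in range(1, k + 1):
            tmp = R[d]
            R[d] = ((((tmp << 1) | 1) & bc)    # character matches
                    | old                      # extra character in the text
                    | ((old | R[d - 1]) << 1)  # substitution / skipped pattern char
                    | 1) & mask
            old = tmp
        if R[k] & accept:
            ends.append(j)
    return ends


print(bitparallel_find("abc", "xxabcxx", 1))
```

Each text character costs O(k) word operations when m fits in a machine word, which is why this family of algorithms is practical for moderate values of k and m.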
TL;DR: Ukkonen's (pair-wise) string alignment technique is extended to the problem of finding an optimal alignment for three strings; the resulting algorithm has worst-case time complexity O(nd^2) and space complexity O(d^3), where the string lengths are ~n and d is the three-way edit distance based on tree-costs.
9 citations