scispace - formally typeset
Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have been published within this topic, receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Journal Article
TL;DR: A suboptimal average-case algorithm for exact circular string matching requiring time O(n), an extension to the approximate case for moderate values of k, that is k = O(m / log m), and a note on how the same results can be easily obtained under the edit distance model.
Abstract: Background Circular string matching is a problem which naturally arises in many biological contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal average-case algorithms for exact circular string matching. Approximate circular string matching is a rather undeveloped area.

34 citations
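
The problem statement in the abstract above (find all occurrences of the rotations of a pattern of length m in a text of length n) can be illustrated with a naive baseline. This is only a sketch of the exact version of the problem, not the average-case algorithm the paper describes; the function name `circular_occurrences` is hypothetical.

```python
def circular_occurrences(pattern, text):
    """Return every start index i where text[i:i+m] is a rotation of pattern.

    Naive baseline: build the set of all m rotations, then test each
    length-m window of the text against that set.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # All rotations of the pattern, e.g. "abc" -> {"abc", "bca", "cab"}.
    rotations = {pattern[r:] + pattern[:r] for r in range(m)}
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in rotations]


if __name__ == "__main__":
    print(circular_occurrences("abc", "xbcaabc"))
```

The approximate variant would replace the set-membership test with a distance check against each rotation, which is where the cost grows and where faster algorithms become relevant.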

Journal Article
TL;DR: This paper shows how multiple patterns can be packed into a single computer word so as to search for all of them simultaneously, and how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences.
Abstract: Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of a pattern of length m in a text of length n in time O(⌈m/w⌉n), where w is the number of bits in the computer word. Although this is asymptotically the optimal bit-parallel speedup over the basic O(mn) time algorithm, it wastes bit-parallelism's power in the common case where m is much smaller than w, since w−m bits in the computer words are unused. In this paper, we explore different ways to increase the bit-parallelism when the search pattern is short. First, we show how multiple patterns can be packed into a single computer word so as to search for all of them simultaneously. Instead of spending O(rn) time to search for r patterns of length m≤w/2, we need O(⌈rm/w⌉n) time. Second, we show how the mechanism permits boosting the search for a single pattern of length m≤w/2, which can be searched for in O(⌈n/⌊w/m⌋⌉) bit-parallel steps instead of O(n). Third, we show how to extend these algorithms so that the time bounds essentially depend on k instead of m, where k is the maximum number of differences permitted. Finally, we show how the ideas can be applied to other problems such as multiple exact string matching and one-against-all computation of edit distance and longest common subsequences. Our experimental results show that the new algorithms work well in practice, obtaining significant speedups over the best existing alternatives, especially on short patterns and a moderate number of differences allowed. This work fills an important gap in the field, where little work has focused on very short patterns.

34 citations
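
To make the notion of bit-parallelism in the abstract above concrete, here is a minimal sketch of a classic Shift-And search extended to k differences, in the style usually attributed to Wu and Manber. It is not the algorithm of this paper, and it does not pack several patterns into one word. Python integers stand in for the machine word (so m is not limited to w bits here), and the function name `shift_and_with_errors` is hypothetical.

```python
def shift_and_with_errors(pattern, text, k):
    """Return end positions j such that some substring of text ending at j
    is within edit distance k of pattern."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1
    # masks[c] has bit i set iff pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # R[d] has bit i set iff pattern[:i+1] matches some suffix of the text
    # read so far with at most d differences. Before any text is read, a
    # prefix of length <= d matches the empty string via deletions.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    ends = []
    for j, ch in enumerate(text):
        b = masks.get(ch, 0)
        old_prev = R[0]                      # R[d-1] before this step
        R[0] = ((R[0] << 1) | 1) & b         # exact Shift-And update
        for d in range(1, k + 1):
            old_cur = R[d]
            match = ((old_cur << 1) | 1) & b
            insertion = old_prev             # extra character in the text
            substitution = (old_prev << 1) | 1
            deletion = (R[d - 1] << 1) | 1   # uses the already updated R[d-1]
            R[d] = (match | insertion | substitution | deletion) & full
            old_prev = old_cur
        if R[k] & (1 << (m - 1)):
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(shift_and_with_errors("abc", "xabdx", 1))
```

Each text character costs a constant number of word operations per error level, so all m pattern positions are updated at once. When m is much smaller than the word size, most bits of each word stay idle, which is the waste the abstract sets out to reduce.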

Proceedings Article
20 Jul 1998
TL;DR: In this paper, a small amount of germanium or gallium was added to the ferrite and an atmosphere, such as air, was used during the sintering and cooling steps.
Abstract: Desirable properties of manganese zinc ferrites are obtained without the need for controlling or changing the oxygen partial pressure during the sintering and cooling steps by adding a small amount of germanium or gallium to the ferrite and using an atmosphere, such as air, during the sintering and cooling steps, that has at least 1 percent oxygen by volume.

34 citations

Proceedings Article
06 Jan 2007
TL;DR: This work introduces a new filtering method for approximate string matching called the suffix filter, which has some similarity with well-known filtration algorithms (factor filters), and demonstrates experimentally that suffix filters are faster in practice than factor filters.
Abstract: We introduce a new filtering method for approximate string matching called the suffix filter. It has some similarity with well-known filtration algorithms, which we call factor filters, and which are among the best practical algorithms for approximate string matching using a text index. Suffix filters are stronger, i.e., produce fewer false matches than factor filters. We demonstrate experimentally that suffix filters are faster in practice, too.

33 citations
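
As background for the filtration idea mentioned in the abstract above, the following sketch shows one well-known filter, assumed here to be representative of what the abstract calls factor filters: split the pattern into k + 1 pieces, note that any occurrence with at most k differences must contain at least one piece unchanged, and verify only the text around exact piece hits. It is not the suffix filter of the paper and it scans the text directly instead of using a text index. All function names are hypothetical.

```python
def end_distances(pattern, text):
    """For each end position in text, the minimum edit distance between
    pattern and any substring of text ending there (column-wise DP)."""
    m = len(pattern)
    col = list(range(m + 1))
    out = []
    for ch in text:
        diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cur = min(diag + (pattern[i - 1] != ch), col[i] + 1, col[i - 1] + 1)
            diag = col[i]
            col[i] = cur
        out.append(col[m])
    return out


def factor_filter_search(pattern, text, k):
    """End positions of substrings within edit distance k of pattern."""
    m = len(pattern)
    if m <= k:
        raise ValueError("need len(pattern) > k so every piece is non-empty")
    pieces = k + 1
    bounds = [i * m // pieces for i in range(pieces + 1)]
    ends = set()
    for p in range(pieces):
        offset, piece = bounds[p], pattern[bounds[p]:bounds[p + 1]]
        start = text.find(piece)
        while start != -1:
            # Any occurrence containing this exact hit lies in this window.
            lo = max(0, start - offset - k)
            hi = min(len(text), start - offset + m + k)
            for j, d in enumerate(end_distances(pattern, text[lo:hi])):
                if d <= k:
                    ends.add(lo + j)
            start = text.find(piece, start + 1)
    return sorted(ends)


if __name__ == "__main__":
    print(factor_filter_search("abc", "xabdx", 1))
```

The cost of such a filter is driven by how many candidate windows survive to verification, which is why a stronger filter that produces fewer false matches can be faster overall.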

Proceedings Article
22 Jan 2011
TL;DR: For the CSSP, a new formulation is given that is polytope-wise stronger than a straightforward extension of the CSP formulation and a strengthening constraint class is proposed that speeds up the running time.
Abstract: Let S be a set of k strings over an alphabet Σ, where each string has a length between e and n. The Closest Substring Problem (CSSP) is to find a minimal integer d (and a corresponding string t of length e) such that each string s ∈ S has a substring of length e with Hamming distance at most d to t. We say t is the closest substring to S. For e = n, this problem is known as the Closest String Problem (CSP). Particularly in computational biology, the CSP and CSSP have found numerous practical applications, such as identifying regulatory motifs and approximate gene clusters, and in degenerate primer design. We study ILP formulations for both problems. Our experiments show that a position-based formulation for the CSP performs very well on real-world instances emerging from biology. Even on randomly generated instances that are hard to solve to optimality, solving the root relaxation leads to solutions very close to the optimum. For the CSSP, we give a new formulation that is polytope-wise stronger than a straightforward extension of the CSP formulation. Furthermore, we propose a strengthening constraint class that speeds up the running time.

33 citations
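
The CSSP definition in the abstract above can be checked on tiny inputs with an exhaustive search. This sketch enumerates every candidate string t of the given length over the alphabet, so it is exponential in that length and is in no way the ILP formulation the paper studies; it only mirrors the definition. The function names and the tie-breaking rule (smallest t in lexicographic order) are assumptions.

```python
from itertools import product


def hamming(a, b):
    """Hamming distance between two strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


def distance_to_string(t, s):
    """Smallest Hamming distance from t to any substring of s of length len(t)."""
    e = len(t)
    return min(hamming(t, s[i:i + e]) for i in range(len(s) - e + 1))


def closest_substring(strings, length, alphabet):
    """Return (d, t): the minimal d and a string t of the given length such
    that every input string has a length-`length` substring within Hamming
    distance d of t. Ties are broken by the smallest t."""
    if any(len(s) < length for s in strings):
        raise ValueError("every string must be at least `length` long")
    candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
    return min((max(distance_to_string(t, s) for s in strings), t)
               for t in candidates)


if __name__ == "__main__":
    # With length equal to the common string length this is the CSP case.
    print(closest_substring(["aab", "abb", "bab"], 3, "ab"))
```

Setting `length` to the common length of the input strings gives the Closest String Problem as a special case, matching the e = n remark in the abstract.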


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39