Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Book Chapter
25 Jun 2003
TL;DR: This work answers the question of whether the MEDIAN STRING problem is NP-complete for finite and even binary alphabets, and gives the complexity of the related CENTRE STRING problem.
Abstract: Given a finite set of strings, the MEDIAN STRING problem consists in finding a string that minimizes the sum of the distances to the strings in the set. Approximations of the median string are used in a very broad range of applications where one needs a representative string that summarizes common information to the strings of the set. It is the case in Classification, in Speech and Pattern Recognition, and in Computational Biology. In the latter, MEDIAN STRING is related to the key problem of Multiple Alignment. In the recent literature, one finds a theorem stating the NP-completeness of the MEDIAN STRING for unbounded alphabets. However, in the above mentioned areas, the alphabet is often finite. Thus, it remains a crucial question whether the MEDIAN STRING problem is NP-complete for finite and even binary alphabets. In this work, we provide an answer to this question and also give the complexity of the related CENTRE STRING problem. Moreover, we study the parametrized complexity of both problems with respect to the number of input strings.

35 citations
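
To make the MEDIAN STRING objective from the entry above concrete, the following is a minimal Python sketch, not the paper's method. It assumes Levenshtein (edit) distance as the distance function, which the abstract does not specify, and all function names are hypothetical. `set_median` restricts the search to the input strings, which is a common approximation; `brute_force_median` enumerates every string up to a length bound and is exponential in that bound, so it is only usable on tiny inputs.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sum_of_distances(candidate: str, strings: list[str]) -> int:
    """The MEDIAN STRING objective: total distance to every input string."""
    return sum(levenshtein(candidate, s) for s in strings)


def set_median(strings: list[str]) -> str:
    """Approximation: the best candidate among the input strings themselves."""
    return min(strings, key=lambda c: sum_of_distances(c, strings))


def brute_force_median(strings: list[str], alphabet: str, max_len: int) -> str:
    """Exhaustive search over all strings of length <= max_len (assumed bound)."""
    best, best_cost = "", sum_of_distances("", strings)
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            cand = "".join(chars)
            cost = sum_of_distances(cand, strings)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best


if __name__ == "__main__":
    data = ["0101", "0111", "0001"]
    print(set_median(data))
    print(brute_force_median(data, "01", 4))
```

The CENTRE STRING variant mentioned in the abstract could be sketched the same way by replacing the sum in `sum_of_distances` with a maximum.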

Book Chapter
29 Apr 1992
TL;DR: The standard string matching problem involves finding all occurrences of a single pattern in a single text, while there are some domains in which it is more appropriate to deal with dictionaries of patterns.
Abstract: The standard string matching problem involves finding all occurrences of a single pattern in a single text. While this approach works well in many application areas, there are some domains in which it is more appropriate to deal with dictionaries of patterns. A dictionary is a set of patterns; the goal of dictionary matching is to find all dictionary patterns in a given text, simultaneously.

34 citations
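
The dictionary matching task described in the entry above (report every dictionary pattern occurring in a text, in one pass) can be illustrated with the classic Aho-Corasick automaton. This is a minimal Python sketch of that well-known technique, not necessarily the approach of the chapter itself; the function names are hypothetical and patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link per state
    output = [[]]    # patterns that end at each state
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(p)

    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target (proper suffixes).
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence of every pattern."""
    goto, fail, output = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in output[state]:
            matches.append((i - len(p) + 1, p))
    return matches


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
```

The text is scanned once regardless of how many patterns the dictionary holds, which is the point of treating the patterns as a set rather than running a single-pattern search per pattern.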

Patent
10 Jun 2009
TL;DR: Indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds, which are determined at little cost compared with updating the indexes themselves.
Abstract: In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy.

34 citations

Book Chapter
TL;DR: It turns out that this new variant of the Boyer-Moore string matching algorithm achieves very good results in terms of both time efficiency and number of character inspections, especially when the patterns are very short.
Abstract: We present a new variant of the Boyer-Moore string matching algorithm which, though not linear, is very fast in practice. We compare our algorithm with the Horspool, Quick Search, Tuned Boyer-Moore, and Reverse Factor algorithms, which are among the fastest string matching algorithms for practical uses. It turns out that our algorithm achieves very good results in terms of both time efficiency and number of character inspections, especially in the cases in which the patterns are very short.

34 citations
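
For readers unfamiliar with the Boyer-Moore family discussed in the entry above, here is a minimal Python sketch of the Horspool algorithm, one of the baselines the abstract names. It is not the new variant the chapter proposes, whose details are not given here; the function name is hypothetical. For brevity the window is compared with a slice equality test rather than the character-by-character right-to-left scan of the textbook formulation.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Bad-character shift: distance from the rightmost occurrence of each
    # character (excluding the last pattern position) to the pattern's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift by the text character aligned with the pattern's last
        # position; characters absent from the table allow a full jump of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches


if __name__ == "__main__":
    print(horspool_search("abracadabra", "abra"))
```

The worst case is still quadratic, which matches the abstract's remark that algorithms of this family can be "not linear" yet fast in practice, since most windows are skipped by large shifts.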

Proceedings Article
08 Apr 2019
TL;DR: MF-Join is proposed, a multi-level filtering approach for fuzzy string similarity join that provides a flexible framework supporting multiple similarity functions at both levels and clearly outperforms state-of-the-art methods.
Abstract: As an essential operation in data integration and data cleaning, similarity join has attracted considerable attention from the database community. In many application scenarios, it is essential to support fuzzy matching, which allows approximate matching between elements and thereby improves the effectiveness of string similarity join. To describe the fuzzy matching between strings, we consider two levels of similarity, i.e., element-level and record-level similarity. Then the problem of calculating fuzzy matching similarity can be transformed into finding the weighted maximal matching in a bipartite graph. In this paper, we propose MF-Join, a multi-level filtering approach for fuzzy string similarity join. MF-Join provides a flexible framework that can support multiple similarity functions at both levels. To improve performance, we devise and implement several techniques to enhance the filter power. Specifically, we utilize a partition-based signature at the element level and propose a frequency-aware partition strategy to improve the quality of signatures. We also devise a count filter at the record level to further prune dissimilar pairs. Moreover, we deduce an effective upper bound for the record-level similarity to reduce the computational overhead of verification. Experimental results on two popular datasets show that our proposed method clearly outperforms state-of-the-art methods.

34 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
80% related
Scheduling (computing)
78.6K papers, 1.3M citations
79% related
Network packet
159.7K papers, 2.2M citations
78% related
Optimization problem
96.4K papers, 2.1M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39