
Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have been published on this topic, receiving 62352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Book Chapter
11 Sep 2002
TL;DR: This paper investigates the performance of metric trees, namely the M-tree, when they are extended with a cheap approximate distance function used as a filter to quickly discard irrelevant strings, and shows a performance improvement of up to 90% with respect to the basic case.
Abstract: Searching a large data set for the strings that are most similar to a given one, according to the edit distance, is a time-consuming process. In this paper we investigate the performance of metric trees, namely the M-tree, when they are extended with a cheap approximate distance function used as a filter to quickly discard irrelevant strings. Using the bag distance as an approximation of the edit distance, we show a performance improvement of up to 90% with respect to the basic case. This, along with the fact that our solution is independent of both the distance used in the pre-test and the underlying metric index, demonstrates that metric indices are a powerful solution, not only for many modern application areas, such as multimedia, data mining and pattern recognition, but also for the string matching problem.

57 citations
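
The filtering idea in the entry above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a plain linear scan rather than an M-tree, and the function names (`bag_distance`, `edit_distance`, `range_search`) are hypothetical, not taken from the paper. The bag distance compares the two strings as multisets of characters and never exceeds the edit distance, so any string whose bag distance is already larger than the search radius can be discarded without running the expensive computation.

```python
from collections import Counter


def bag_distance(a, b):
    """Cheap lower bound on the edit distance, computed on character multisets."""
    ca, cb = Counter(a), Counter(b)
    only_a = sum((ca - cb).values())  # characters of a left unmatched in b
    only_b = sum((cb - ca).values())  # characters of b left unmatched in a
    return max(only_a, only_b)


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def range_search(query, strings, radius):
    """Return the strings within `radius` edits of `query`, pre-filtered by bag distance."""
    hits = []
    for s in strings:
        if bag_distance(query, s) > radius:
            continue  # lower bound already too large: skip the expensive check
        if edit_distance(query, s) <= radius:
            hits.append(s)
    return hits


print(range_search("kitten", ["sitting", "mitten", "banana"], 1))
```

In this sketch the filter only saves edit-distance computations; in a metric index such as the M-tree the same lower bound would be applied while traversing the tree.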

Journal Article
TL;DR: It is proved that a restricted version of the closest substring problem has the same parameterized complexity as the closest substring problem, answering an open question in the literature.
Abstract: The closest string problem and the closest substring problem are both natural theoretical computer science problems and find important applications in computational biology. Given $n$ input strings, the closest string (substring) problem finds a new string within distance $d$ to (a substring of) each input string and such that $d$ is minimized. Both problems are NP-complete. In this paper we propose new algorithms for these two problems. For the closest string problem, we developed an exact algorithm with time complexity $O(n|\Sigma|^{O(d)})$, where $\Sigma$ is the alphabet. This improves the previously best known result $O(nd^{O(d)})$ and results in a polynomial time algorithm when $d=O(\log n)$. By using this algorithm, a polynomial time approximation scheme (PTAS) for the closest string problem is also given with time complexity $O(n^{O(\epsilon^{-2})})$, improving the previously best known $O(n^{O(\epsilon^{-2}\log\frac{1}{\epsilon})})$ PTAS. A new algorithm for the closest substring problem is also proposed. Finally, we prove that a restricted version of the closest substring problem has the same parameterized complexity as the closest substring problem, answering an open question in the literature.

57 citations
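
To make the problem statement in the entry above concrete, here is a brute-force sketch of the closest string problem. It is not the algorithm from the paper: it simply enumerates every candidate center over the symbols that occur in the input, which takes time exponential in the string length. It assumes that all input strings have the same length and that "distance" means Hamming distance, as is usual for this problem; the function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string_bruteforce(strings):
    """Return (center, d) where d is the smallest achievable maximum Hamming distance."""
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all input strings must have the same length")
    # Symbols that never occur in the input cannot improve a candidate center.
    alphabet = sorted(set("".join(strings)))
    best_center, best_radius = None, length + 1
    for candidate in product(alphabet, repeat=length):
        radius = max(hamming(candidate, s) for s in strings)
        if radius < best_radius:
            best_center, best_radius = "".join(candidate), radius
    return best_center, best_radius


print(closest_string_bruteforce(["aab", "abb", "bab"]))
```

The enumeration is only practical for very short strings, which is exactly why parameterized algorithms and approximation schemes such as those in the entry are of interest.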

Proceedings Article
11 Jul 2010
TL;DR: This paper presents a general q-gram based framework and proposes two efficient algorithms, based on the strategies it introduces, that show superior performance on the top-k similar string matching problem.
Abstract: Top-k approximate querying on string collections is an important data analysis tool for many applications, and it has been exhaustively studied. However, the scale of the problem has increased dramatically because of the prevalence of the Web. In this paper, we aim to explore the efficient top-k similar string matching problem. Several efficient strategies are introduced, such as length-aware and adaptive q-gram selection. We present a general q-gram based framework and propose two efficient algorithms based on the strategies introduced. Our techniques are experimentally evaluated on three real data sets and show a superior performance.

57 citations
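
The entry above does not spell out its algorithms, so the following is only a rough sketch of how length-aware pruning and a q-gram count filter can be combined in a top-k search, not the method of the paper. It assumes edit distance as the similarity measure and a fixed q (adaptive q-gram selection is not modeled); the names `qgrams` and `top_k_similar` are hypothetical. Two standard bounds are used: the length difference of two strings is a lower bound on their edit distance, and two strings within edit distance τ share at least max(|s|, |t|) - q + 1 - q·τ q-grams.

```python
import heapq
from collections import Counter


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def qgrams(s, q):
    """Multiset of all substrings of length q."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def top_k_similar(query, strings, k, q=2):
    """Return the k strings closest to `query` as sorted (distance, string) pairs."""
    if k <= 0:
        return []
    query_grams = qgrams(query, q)
    # Visit strings with similar length first so the pruning bound tightens early.
    order = sorted(range(len(strings)), key=lambda i: abs(len(strings[i]) - len(query)))
    heap = []  # max-heap on distance via negation: (-distance, index, string)
    for i in order:
        s = strings[i]
        if len(heap) == k:
            worst = -heap[0][0]
            # Length filter: this and every later string cannot beat the current worst.
            if abs(len(s) - len(query)) >= worst:
                break
            # Count filter: too few shared q-grams means distance >= worst.
            shared = sum((query_grams & qgrams(s, q)).values())
            if shared < max(len(s), len(query)) - q + 1 - q * (worst - 1):
                continue
        d = edit_distance(query, s)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i, s))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i, s))
    return sorted((-neg_d, s) for neg_d, _, s in heap)


print(top_k_similar("kitten", ["sitting", "mitten", "kitchen", "banana", "kit"], 2))
```

Ties at the k-th distance are resolved in favor of the string seen first, which is one of several reasonable conventions.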

Journal Article
TL;DR: Simple and practical algorithms for parameterized string matching that find all pattern occurrences in sublinear time on average. In parameterized string matching, the pattern P matches a substring t of the text T if there exists a bijective mapping from the symbols of P to the symbols of t.

57 citations
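
The matching condition stated in the entry above can be illustrated with a naive checker. This sketch is not the sublinear-average algorithm of the paper: it tests every text window in turn, which costs O(|T|·|P|) time. It follows the definition as given in the TL;DR, where every symbol of the pattern may be renamed; the function names are hypothetical.

```python
def p_match(pattern, window):
    """True if some bijection on symbols maps `pattern` onto `window`."""
    if len(pattern) != len(window):
        return False
    forward, backward = {}, {}
    for a, b in zip(pattern, window):
        # setdefault records the first pairing and returns it on later visits,
        # so any inconsistent pairing in either direction is detected here.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def find_p_matches(pattern, text):
    """Start positions of all windows of `text` that parameterized-match `pattern`."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if p_match(pattern, text[i:i + m])]


print(find_p_matches("aba", "xyxzzz"))
print(find_p_matches("abab", "cdcdcd"))
```

Keeping both a forward and a backward map is what enforces bijectivity: with the forward map alone, "ab" would wrongly match "zz".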

Proceedings Article
22 May 2011
TL;DR: A robust method is presented that maps detected facial Action Units (AUs) to six basic emotions using a learned statistical relationship and a suitable matching technique, reducing false predictions and improving on rule-based techniques.
Abstract: We present a robust method to map detected facial Action Units (AUs) to six basic emotions. Automatic AU recognition is prone to errors due to illumination, tracking failures and occlusions. Hence, traditional rule based methods to map AUs to emotions are very sensitive to false positives and misses among the AUs. In our method, a set of chosen AUs are mapped to the six basic emotions using a learned statistical relationship and a suitable matching technique. Relationships between the AUs and emotions are captured as template strings comprising the most discriminative AUs for each emotion. The template strings are computed using a concept called discriminative power. The Longest Common Subsequence (LCS) distance, an approach for approximate string matching, is applied to calculate the closeness of a test string of AUs with the template strings, and hence infer the underlying emotions. LCS is found to be efficient in handling practical issues like erroneous AU detection and helps to reduce false predictions. The proposed method is tested with various databases like CK+, ISL, FACS, JAFFE, MindReading and many real-world video frames. We compare our performance with rule based techniques, and show clear improvement on both benchmark databases and real-world datasets.

56 citations
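
The template-matching step in the entry above can be sketched as a nearest-template lookup under an LCS-based distance. The entry does not give the exact formula, so this sketch assumes the common definition |a| + |b| - 2·LCS(a, b). The template contents and the labels `emotion_a` and `emotion_b` are made-up placeholders, not the discriminative AU sets learned in the paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, computed one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_distance(a, b):
    """Assumed definition: symbols of a and b that are not part of a common subsequence."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def classify(test_aus, templates):
    """Return the label whose template string is closest to the detected AU string."""
    return min(templates, key=lambda label: lcs_distance(test_aus, templates[label]))


# Placeholder templates: the AU numbers are illustrative only.
templates = {"emotion_a": [6, 12, 25], "emotion_b": [1, 2, 5, 26]}

print(classify([6, 12], templates))
print(classify([1, 2, 12, 26], templates))
```

Because a missed or spurious AU changes the distance by only one, a detected string with a few recognition errors can still land closest to the right template, which matches the robustness argument made in the abstract.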


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39