Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
Journal Article
Zvi Galil
TL;DR: A sufficient condition for an on-line algorithm to be transformed into a real-time algorithm is given, and this condition is used to construct real-time algorithms for various string-matching problems by random access machines and by Turing machines.
Abstract: A sufficient condition for an on-line algorithm to be transformed into a real-time algorithm is given. This condition is used to construct real-time algorithms for various string-matching problems by random access machines and by Turing machines.

69 citations

Journal Article
TL;DR: The main purpose of this survey is to propose a new classification, identify new directions, and highlight the possible challenges, current trends, and future work in the area of string matching algorithms, with a core focus on exact string matching algorithms.
Abstract: String matching has been an extensively studied research domain in the past two decades due to its various applications in the fields of text, image, signal, and speech processing. As a result, choosing an appropriate string matching algorithm for current applications and addressing challenges is difficult. Understanding different string matching approaches (such as exact string matching and approximate string matching algorithms), integrating several algorithms, and modifying algorithms to address related issues are also difficult. This paper presents a survey on single-pattern exact string matching algorithms. The main purpose of this survey is to propose new classification, identify new directions and highlight the possible challenges, current trends, and future works in the area of string matching algorithms with a core focus on exact string matching algorithms.

69 citations

Journal Article
TL;DR: This paper focuses on string distance computation based on a set of edit operations. The computation uses dynamic programming and has a time complexity of O(n · m), where n and m are the lengths of the two strings to be compared.

69 citations
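
The dynamic-programming computation mentioned in the TL;DR above can be illustrated with a minimal sketch. It assumes the common unit-cost Levenshtein operations (insert, delete, substitute); the paper's actual edit-operation set and costs are not given here, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between a and b.

    Assumption: insert, delete, and substitute each cost 1.
    Runs in O(n * m) time and O(m) extra space, with n = len(a), m = len(b).
    """
    n, m = len(a), len(b)
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        # curr[0] = i: turning a[:i] into the empty string takes i deletions.
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution_cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                      # delete a[i-1]
                curr[j - 1] + 1,                  # insert b[j-1]
                prev[j - 1] + substitution_cost,  # match or substitute
            )
        prev = curr
    return prev[m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Keeping only two rows of the table reduces memory to O(m) while the running time stays O(n · m), since every cell of the n-by-m table is still visited once.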

Proceedings Article
01 May 1999
TL;DR: In this paper, a pruned count-suffix tree is used to estimate the selectivity of a sub-string matching query based on all maximal substrings of the query in the tree.
Abstract: With the explosion of the Internet, LDAP directories and XML, there is an ever greater need to evaluate queries involving (sub)string matching. Effective query optimization in this context requires good selectivity estimates. In this paper, we use pruned count-suffix trees as the basic framework for substring selectivity estimation. We present a novel technique to obtain a good estimate for a given substring matching query, called MO (for Maximal Overlap), that estimates the selectivity of a query based on all maximal substrings of the query in the pruned count-suffix tree. We show that MO is provably better than the (independence-based) substring selectivity estimation technique proposed by Krishnan et al. [6], called KVI, under the natural assumption that strings exhibit the so-called “short memory” property. We complement our analysis with an experiment, using a real AT&T data set, that demonstrates that MO is substantially superior to KVI in the quality of the estimate. Finally, we develop and analyze two selectivity estimation algorithms, MOC and MOLC, based on MO and a constraint-based characterization of all possible completions of a given pruned count-suffix tree. We show that KVI, MO, MOC and MOLC illustrate an interesting tradeoff between estimation accuracy and computational efficiency.
PODS '99, Philadelphia, PA. Copyright ACM 1999.

67 citations

Book Chapter
Ron Y. Pinter
01 Jan 1985
TL;DR: This paper considers the extension of the methods of Aho and Corasick to deal with patterns involving more expressive descriptions, such as don’t-care (wild-card) symbols, complements, etc.
Abstract: The occurrences of a constant pattern in a given text string can be found in linear time using the famous algorithm of Knuth, Morris, and Pratt [KMP]. Aho and Corasick [AC] independently solved the problem for patterns consisting of a set of strings, where the occurrence of one member is considered a match. Both algorithms preprocess the pattern so that the text can be searched efficiently. This paper considers the extension of their methods to deal with patterns involving more expressive descriptions, such as don’t-care (wild-card) symbols, complements, etc. Such extensions are useful in the context of clever text-editors and the analysis of chemical compounds.

67 citations
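
As background for the abstract above, the following is a minimal sketch of the baseline it builds on: Knuth-Morris-Pratt search for a single constant pattern, with the pattern preprocessed into a failure table. It does not implement the paper's extension to don't-care symbols or complements, and the names `failure_table` and `kmp_search` are hypothetical.

```python
def failure_table(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence
    of pattern in text, in O(len(text) + len(pattern)) time."""
    if not pattern:
        return []  # assumption: an empty pattern yields no matches
    fail = failure_table(pattern)
    hits = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


if __name__ == "__main__":
    print(kmp_search("abababa", "aba"))
```

The failure table depends on exact character equality being transitive, which is one reason wild-card symbols need the more careful treatment the paper discusses.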


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
80% related
Scheduling (computing)
78.6K papers, 1.3M citations
79% related
Network packet
159.7K papers, 2.2M citations
78% related
Optimization problem
96.4K papers, 2.1M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39