Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published on this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
Journal Article
TL;DR: This paper considers a class of opposite problems connected with string noninclusion relations: find a shortest string included in no string of a given finite language and find a longest string including no string of a given finite language.
Abstract: For every string inclusion relation there are two optimization problems: find a longest string included in every string of a given finite language, and find a shortest string including every string of a given finite language. As an example, the two well-known pairs of problems, the longest common substring (or subsequence) problem and the shortest common superstring (or supersequence) problem, are interpretations of these two problems. In this paper we consider a class of opposite problems connected with string noninclusion relations: find a shortest string included in no string of a given finite language and find a longest string including no string of a given finite language. The predicate "string $\alpha$ is not included in string $\beta$" is interpreted as either "$\alpha$ is not a substring of $\beta$" or "$\alpha$ is not a subsequence of $\beta$". The main purpose is to determine the complexity status of the string noninclusion optimization problems. Using graph approaches we present polynomial-time algorithms for the first interpretation and NP-hardness proofs for the second. We also discuss restricted versions of the problems, correlations between the string inclusion and noninclusion problems, and generalized problems which are the string inclusion problems for one language and the string noninclusion problems for another. In applications the string inclusion problems are used to find a similarity between any structures which can be represented by strings. Respectively, the noninclusion problems can be used to find a nonsimilarity. Such problems occur in computational molecular biology, data compression, pattern recognition, and flexible manufacturing. The above generalized problems arise naturally in all of these applied areas. Apart from this practical reason, we hope that studying the string noninclusion problems will yield deeper understanding of the string inclusion problems.
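The two interpretations of "not included" can be made concrete with a small sketch. The code below is a brute-force illustration, not the graph-based algorithm from the paper: the function names `is_subsequence` and `shortest_absent_substring` are hypothetical, and the alphabet is assumed to be given explicitly and to be non-empty.

```python
from itertools import product


def is_subsequence(alpha, beta):
    """True if alpha can be obtained from beta by deleting characters."""
    remaining = iter(beta)
    # Each `ch in remaining` consumes the iterator up to the first match,
    # so characters of alpha must appear in beta in order.
    return all(ch in remaining for ch in alpha)


def shortest_absent_substring(language, alphabet):
    """Return a shortest string over `alphabet` that is a substring of no
    string in `language` (ties broken by lexicographic order)."""
    if not alphabet:
        raise ValueError("alphabet must be non-empty")
    length = 0
    while True:
        # Every substring of the current length that occurs in the language.
        present = {
            s[i:i + length]
            for s in language
            for i in range(len(s) - length + 1)
        }
        for candidate in product(sorted(alphabet), repeat=length):
            word = "".join(candidate)
            if word not in present:
                return word
        length += 1


print(shortest_absent_substring(["ab", "ba"], "ab"))
print(is_subsequence("ace", "abcde"))
```

The search stops at the first length for which some string over the alphabet does not occur, so the result is a shortest absent string by construction. This sketch covers only the substring interpretation; the subsequence interpretation is the one the abstract reports as NP-hard.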

12 citations

Book Chapter
12 Jul 2004
TL;DR: This paper addresses the similarity search problem of finding, for a query point Q, its nearest neighbor: the string that has the smallest edit distance to a query string.
Abstract: Similarity search is a fundamental problem in computer science. Given a set of points $A=\{A_1,\dots,A_p\}$ from a universe U and a distance measure D, it is possible to pose similarity search queries on a point Q in the form of nearest neighbors (find the string that has the smallest edit distance to a query string) or in the form of furthest neighbors (find the string that has the longest common subsequence with a query string).
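A nearest-neighbor query under edit distance can be sketched as a linear scan over the candidate set. This is a baseline illustration only, assuming unit-cost insertions, deletions, and substitutions (Levenshtein distance); it is not the method proposed in the chapter, and the names `edit_distance` and `nearest_neighbor` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b, keeping two rows of the table."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def nearest_neighbor(query, candidates):
    """Return the candidate with the smallest edit distance to query.
    On ties, the earliest candidate wins."""
    return min(candidates, key=lambda s: edit_distance(query, s))


print(nearest_neighbor("helo", ["world", "hello", "yellow"]))
```

The scan costs one distance computation per candidate, which is the cost that index structures for similarity search aim to reduce.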

12 citations

Journal Article
TL;DR: Discusses a uniform way of modifying each of these algorithms to also permit a fourth type of edit operation, transposing two adjacent characters in the pattern; the resulting measure is known as the Damerau edit distance.
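The effect of the fourth operation can be seen in a small dynamic-programming sketch. The code below implements the restricted variant, often called optimal string alignment, in which no substring is edited more than once; it is a textbook formulation with unit costs assumed, not the uniform modification described in the paper, and the name `osa_distance` is hypothetical.

```python
def osa_distance(a, b):
    """Edit distance with insertion, deletion, substitution, and
    transposition of two adjacent characters (restricted variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Fourth operation: swap two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("ab", "ba"))
```

With only the three classic operations, turning "ab" into "ba" costs 2; allowing the transposition brings it down to 1.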

12 citations

Journal Article
TL;DR: This research proposes a hybrid exact string matching algorithm by combining the good properties of the Quick Search and the Skip Search algorithms to demonstrate and devise a better method to solve the string matching problem with higher speed and lower cost.
Abstract: The string matching problem is a cornerstone of many computer science fields because of the fundamental role it plays in various computer applications. Thus, several string matching algorithms have been produced and applied in most operating systems, information retrieval, editors, internet search engines, firewall interception, and searching nucleotide or amino acid sequence patterns in genome and protein sequence databases. Several important factors are considered during the matching process, such as the number of character comparisons, the number of attempts, and the time consumed. This research proposes a hybrid exact string matching algorithm by combining the good properties of the Quick Search and the Skip Search algorithms to demonstrate and devise a better method to solve the string matching problem with higher speed and lower cost. The hybrid algorithm was tested using different types of standard data. The hybrid algorithm provides efficient results and reliability compared with the original algorithms in terms of number of character comparisons and number of attempts when applied with different pattern lengths. Additionally, the hybrid algorithm performed better, with lower time complexity in the worst and best cases compared with other hybrid algorithms.
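For orientation, one of the two building blocks named in the abstract, Quick Search, can be sketched as follows. This is the standard textbook form of Quick Search alone, which shifts the window according to the text character just past it; it is not the hybrid algorithm proposed in the paper, and the function name `quick_search` is hypothetical.

```python
def quick_search(pattern, text):
    """Return the start positions of all exact occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    # Shift for a character c seen just past the window: align the rightmost
    # occurrence of c in the pattern with it. Later indices overwrite earlier
    # ones, so the rightmost occurrence wins.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:  # one attempt at this window
            matches.append(pos)
        if pos + m >= n:  # no character past the window
            break
        # Characters absent from the pattern allow a jump of m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return matches


print(quick_search("abc", "xabcabcx"))
```

Each pass through the loop is one attempt in the sense used in the abstract, so counting loop iterations gives a rough handle on that metric.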

12 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    33
2020    30
2019    48
2018    39