
Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
Book Chapter
Zongwei Zhou, Yibo Xue, Junda Liu, Wei Zhang, Jun Li
12 Dec 2007
TL;DR: A novel algorithm called Multi-phase Dynamic Hash (MDH) is proposed, which cuts down the memory requirement through multi-phase hashing and exploits valuable pattern set information to speed up the search procedure through dynamic-cut heuristics.
Abstract: String matching algorithms are among the key technologies in numerous network security applications and systems. Today, increasing network bandwidth and growing pattern set sizes both call for high-speed string matching algorithms for large-scale pattern sets. This paper proposes a novel algorithm called Multi-phase Dynamic Hash (MDH), which cuts down the memory requirement through multi-phase hashing and exploits valuable pattern set information to speed up the search procedure through dynamic-cut heuristics. The experimental results demonstrate that MDH can improve matching performance by 100% to 300% compared with other popular algorithms, while the memory requirement stays at a comparatively low level.

20 citations

Book Chapter
19 Jun 2005
TL;DR: Theoretically, it is proved that FAAST on average skips more characters than the Tarhio-Ukkonen algorithm in a single shift, and makes fewer character comparisons in an entire matching process.
Abstract: Approximate string matching is a fundamental and challenging problem in computer science, for which fast algorithms are in high demand in many applications, including text processing and DNA sequence analysis. In this paper, we present a fast algorithm for approximate string matching, called FAAST. It aims at solving a popular variant of the approximate string matching problem, the k-mismatch problem, whose objective is to find all occurrences of a short pattern in a long text string with at most k mismatches. FAAST generalizes the well-known Tarhio-Ukkonen algorithm by requiring two or more matches when calculating shift distances, which makes the approximate string matching process significantly faster than the Tarhio-Ukkonen algorithm. Theoretically, we prove that FAAST on average skips more characters than the Tarhio-Ukkonen algorithm in a single shift, and makes fewer character comparisons in an entire matching process. Experiments on both simulated data sets and real gene sequences also demonstrate that FAAST runs several times faster than the Tarhio-Ukkonen algorithm in all the cases that we tested.
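
To make the k-mismatch problem named in this abstract concrete, here is a minimal brute-force sketch in Python. It is not FAAST and not the Tarhio-Ukkonen algorithm: it always shifts by one position and computes no shift distances, so it only illustrates what those algorithms are expected to return. The function name `k_mismatch_search` and the sample DNA string are assumptions made for illustration.

```python
def k_mismatch_search(text: str, pattern: str, k: int) -> list[int]:
    """Return every start index where `pattern` aligns with `text`
    using at most `k` mismatched characters (substitutions only)."""
    m, n = len(pattern), len(text)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[start + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # too many mismatches at this alignment
        else:
            # Inner loop finished without exceeding k mismatches.
            hits.append(start)
    return hits


# Hypothetical sample data, not taken from the paper.
exact = k_mismatch_search("ACGTACGAACGT", "ACGT", 0)
fuzzy = k_mismatch_search("ACGTACGAACGT", "ACGT", 1)
```

This baseline costs O(nm) character comparisons in the worst case. Algorithms of the kind described above aim to beat it by skipping alignments that cannot match.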

20 citations

Book Chapter
01 Nov 1993
TL;DR: This article presents a case-based system for predicting processes that human beings control little, such as forest fires, for which precise parametric models are not conceivable; cases are matched using approximate string matching adapted to each point of view.
Abstract: This article deals with the prediction of processes. Research on this topic usually considers well-known processes, whose prediction is based on a model that takes precise account of the parameters that modify their state. Such models are not conceivable here: the processes in question are ones that human beings control little, such as forest fires, which are the subject of the system presented here. The system's reasoning relies on cases. We assume that if two processes behaved the same way during a certain interval, then their behaviours are very likely to be similar afterwards. Cases are matched using approximate string matching. Because of the complexity of the processes handled, points of view have been introduced, each of which requires a matching adapted to it. They are presented here.

20 citations

Proceedings Article
18 Sep 2011
TL;DR: This paper presents a novel approach to word spotting using string matching of character primitives, which is tested on historical books in the French alphabet and has obtained encouraging results.
Abstract: Word searching and indexing in historical document collections is a challenging problem because characters in these documents are often touching or broken due to degradation and ageing effects. For efficient searching in such historical documents, this paper presents a novel approach to word spotting using string matching of character primitives. We describe the text string as a sequence of primitives, each of which consists of a single character or a part of a character. Primitive segmentation is performed by analyzing text background information obtained by the water reservoir technique. Next, the primitives are clustered using template matching and a codebook of representative primitives is built. Using this primitive codebook, the text information in the document images is encoded and stored. For a query word, we segment it into primitives and encode the word as a string of representative primitives from the codebook. Finally, approximate string matching is applied to find similar words. The matching similarity is used to rank the retrieved words. The proposed method is tested on historical books in the French alphabet, and we have obtained encouraging results from the experiment.
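
The final two steps of this pipeline (approximate matching of primitive strings, then ranking) can be sketched as follows. The abstract does not say which similarity measure is used, so plain Levenshtein edit distance is an assumption here. The codebook indices, the indexed words, and the function names `edit_distance` and `rank_words` are likewise hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (unit-cost
    insertion, deletion, substitution), using one rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def rank_words(query: Sequence[int],
               indexed: dict[str, Sequence[int]]) -> list[tuple[int, str]]:
    """Rank indexed words by distance to the query; closest first."""
    return sorted((edit_distance(query, codes), word)
                  for word, codes in indexed.items())


# Hypothetical codebook encodings: each word is a string of primitive ids.
indexed = {
    "maison": [3, 1, 7, 9, 4, 2],
    "raison": [5, 1, 7, 9, 4, 2],
    "jardin": [8, 1, 5, 6, 7, 2],
}
# Query with one primitive lost, as might happen with a broken character.
ranking = rank_words([3, 1, 7, 4, 2], indexed)
```

Working on sequences of codebook ids rather than characters is what would let a match tolerate touching or broken characters: a damaged glyph changes a few primitives, which costs only a few edit operations.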

20 citations

01 Jan 2003
TL;DR: A construction algorithm is presented which is currently the fastest practical construction method for large suffix trees and a clustered storage scheme for the suffix tree is proposed that takes into account the locality behavior of typical query types, which leads to a significant speed-up particularly for the exact string matching problem.
Abstract: Suffix trees have been established as one of the most versatile index structures for unstructured string data such as genomic sequences. In this work, our goal is the development of algorithms for the efficient construction of suffix trees for very large strings and for storing them in a form that allows fast access when main memory is limited. We present a construction algorithm which, to the best of our knowledge, is currently the fastest practical construction method for large suffix trees. Further, we propose a clustered storage scheme for the suffix tree that takes into account the locality behavior of typical query types, which leads to a significant speed-up, particularly for the exact string matching problem. For very large strings, the query time is faster than that of other recent index structures such as the enhanced suffix array.
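
As a rough illustration of index-based exact string matching, the sketch below uses a plain suffix array with binary search. This is a deliberately simpler stand-in: it is neither the suffix tree construction nor the clustered storage scheme proposed in this work, and it is not the enhanced suffix array either. The construction shown is the naive sort of all suffixes, which is only suitable for short strings. The function names are assumptions.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions by suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of `pattern` in `text`, using two
    binary searches over the suffix array `sa`."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[first:lo] starts with the pattern.
    return sorted(sa[first:lo])


text = "banana"
sa = build_suffix_array(text)
positions = find_occurrences(text, sa, "ana")
```

Once the index is built, each query needs O(m log n) character comparisons regardless of how many times the text is searched, which is the general appeal of suffix-based indexes for large strings.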

20 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39