Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have been published within this topic, receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers

Book Chapter

MDH: a high speed multi-phase dynamic hash string matching algorithm for large-scale pattern set

Zongwei Zhou¹, Yibo Xue¹, Junda Liu¹, Wei Zhang¹, Jun Li¹

Tsinghua University¹

12 Dec 2007

TL;DR: A novel algorithm called Multi-phase Dynamic Hash (MDH) is proposed, which cuts the memory requirement through multi-phase hashing and uses pattern set information to speed up searching through dynamic-cut heuristics.

Abstract: String matching is one of the key technologies in numerous network security applications and systems. Increasing network bandwidth and growing pattern set sizes both call for high-speed string matching algorithms for large-scale pattern sets. This paper proposes a novel algorithm called Multi-phase Dynamic Hash (MDH), which cuts the memory requirement through multi-phase hashing and uses pattern set information to speed up searching through dynamic-cut heuristics. The experimental results demonstrate that MDH can improve matching performance by 100% to 300% compared with other popular algorithms, while the memory requirement stays at a comparatively low level.

20 citations
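
As a rough illustration of hash-based multi-pattern matching in general, and not of MDH itself (the abstract above does not describe its phases or its dynamic-cut heuristics), the Python sketch below indexes a pattern set by a fixed-length prefix and verifies candidate patterns at each text position. The names `build_index` and `search` and the `block` parameter are assumptions made for this example.

```python
from collections import defaultdict


def build_index(patterns, block=2):
    """Group patterns by their first `block` characters (used as the hash key).

    Hypothetical helper for illustration; this is not the MDH data structure.
    """
    index = defaultdict(list)
    for p in patterns:
        if len(p) < block:
            raise ValueError("pattern shorter than block size")
        index[p[:block]].append(p)
    return index


def search(text, index, block=2):
    """Return sorted (position, pattern) pairs for every exact occurrence."""
    hits = []
    for i in range(len(text) - block + 1):
        # Only patterns whose prefix matches this text block are verified.
        for p in index.get(text[i:i + block], ()):
            if text.startswith(p, i):
                hits.append((i, p))
    return sorted(hits)
```

For example, `search("an atom attack virus", build_index(["attack", "atom", "virus"]))` returns `[(3, 'atom'), (8, 'attack'), (15, 'virus')]`. The prefix lookup keeps most text positions from being compared against every pattern, which is the general idea behind hash-based filtering.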

Book Chapter

A fast algorithm for approximate string matching on gene sequences

Zheng Liu¹, Xin Chen¹, James Borneman¹, Tao Jiang¹

University of California, Riverside¹

19 Jun 2005

TL;DR: Theoretically, it is proved that FAAST on average skips more characters than the Tarhio-Ukkonen algorithm in a single shift, and makes fewer character comparisons in an entire matching process.

Abstract: Approximate string matching is a fundamental and challenging problem in computer science, for which a fast algorithm is highly demanded in many applications including text processing and DNA sequence analysis. In this paper, we present a fast algorithm for approximate string matching, called FAAST. It aims at solving a popular variant of the approximate string matching problem, the k-mismatch problem, whose objective is to find all occurrences of a short pattern in a long text string with at most k mismatches. FAAST generalizes the well-known Tarhio-Ukkonen algorithm by requiring two or more matches when calculating shift distances, which makes the approximate string matching process significantly faster than the Tarhio-Ukkonen algorithm. Theoretically, we prove that FAAST on average skips more characters than the Tarhio-Ukkonen algorithm in a single shift, and makes fewer character comparisons in an entire matching process. Experiments on both simulated data sets and real gene sequences also demonstrate that FAAST runs several times faster than the Tarhio-Ukkonen algorithm in all the cases that we tested.

20 citations
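
The k-mismatch problem stated in the abstract above (find all occurrences of a short pattern in a long text with at most k mismatches) can be made concrete with a naive baseline. The sketch below is neither FAAST nor the Tarhio-Ukkonen algorithm: it checks every alignment and uses no shift heuristic. The function name `k_mismatch` is an assumption made for this example.

```python
def k_mismatch(text, pattern, k):
    """Return start positions where `pattern` aligns with at most k mismatches.

    Naive O(n * m) baseline; every alignment is checked, with no shift table.
    """
    m = len(pattern)
    positions = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # too many mismatches, abandon this alignment
        else:
            # The inner loop finished without exceeding k mismatches.
            positions.append(i)
    return positions
```

For example, `k_mismatch("ACGTACGAACGT", "ACGT", 1)` returns `[0, 4, 8]`, while `k = 0` gives only the exact occurrences `[0, 8]`. Algorithms such as those named in the abstract aim to avoid examining every alignment by shifting the pattern by more than one position at a time.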

Book Chapter

Sophie Rougegrez

01 Nov 1993

TL;DR: This article presents a case-based system for predicting processes that humans control little, such as forest fires; cases are matched using approximate string matching, with a matching adapted to each point of view.

Abstract: This article deals with the prediction of processes. Research work on this topic usually considers well-known processes, whose prediction is based on a model that takes precise account of the parameters involved in the modification of their state. Such models are not conceivable here: the processes in question are ones that human beings control little, like forest fires, which are the subject of the system presented here. The reasoning it uses relies on cases. We consider that if two processes behaved the same way during a certain interval, then their behaviours are very likely to be similar afterwards. The matching is based on approximate string matching. Because of the complexity of the processes handled, points of view have been introduced, each of which requires a matching adapted to it. They are presented here.

20 citations

Proceedings Article

Word Retrieval in Historical Document Using Character-Primitives

Partha Pratim Roy¹, Jean-Yves Ramel¹, Nicolas Ragot¹

François Rabelais University¹

18 Sep 2011

TL;DR: This paper presents a novel approach towards word spotting using string matching of character primitives, which is tested on historical books of French alphabets and has obtained encouraging results.

Abstract: Word searching and indexing in historical document collections is a challenging problem because characters in these documents are often touching or broken due to degradation/ageing effects. For efficient searching in such historical documents, this paper presents a novel approach towards word spotting using string matching of character primitives. We describe the text string as a sequence of primitives, each of which consists of a single character or a part of a character. Primitive segmentation is performed by analyzing text background information obtained with the water reservoir technique. Next, the primitives are clustered using template matching and a codebook of representative primitives is built. Using this primitive codebook, the text information in the document images is encoded and stored. For a query word, we segment it into primitives and encode the word as a string of representative primitives from the codebook. Finally, approximate string matching is applied to find similar words. The matching similarity is used to rank the retrieved words. The proposed method is tested on historical books of French alphabets and we have obtained encouraging results from the experiment.

20 citations
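
The final retrieval step described in the abstract above (approximate matching of an encoded query against stored strings, then ranking by similarity) can be sketched as follows. The abstract does not say which distance the authors use, so Levenshtein edit distance is assumed here, and ordinary characters stand in for codebook primitives. The names `edit_distance` and `rank_words` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_words(query, candidates):
    """Sort candidates by distance to the query, ties broken alphabetically."""
    return sorted(candidates, key=lambda w: (edit_distance(query, w), w))
```

For example, `edit_distance("kitten", "sitting")` is `3`, and `rank_words("maison", ["raison", "saison", "maisons", "livre"])` returns `['maisons', 'raison', 'saison', 'livre']`. In a primitive-based setting each string element would be a codebook index rather than a letter; the function works on any sequences whose elements support equality comparison.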

Suffix Tree Construction and Storage with Limited Main Memory

Klaus-Bernd Schürmann, Jens Stoye

01 Jan 2003

TL;DR: A construction algorithm is presented which is currently the fastest practical construction method for large suffix trees and a clustered storage scheme for the suffix tree is proposed that takes into account the locality behavior of typical query types, which leads to a significant speed-up particularly for the exact string matching problem.

Abstract: Suffix trees have been established as one of the most versatile index structures for unstructured string data like genomic sequences and other strings. In this work, our goal is the development of algorithms for the efficient construction of suffix trees for very large strings and their convenient storage regarding fast access when main memory is limited. We present a construction algorithm which, to the best of our knowledge, is currently the fastest practical construction method for large suffix trees. Further we propose a clustered storage scheme for the suffix tree that takes into account the locality behavior of typical query types, which leads to a significant speed-up particularly for the exact string matching problem. For very large strings the query time is faster than that of other recent index structures like the enhanced suffix array.

20 citations
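
To illustrate the exact string matching query mentioned in the abstract above, the sketch below uses a plain suffix array with binary search. This is a deliberately simpler stand-in, not the suffix tree construction or the clustered storage scheme of the paper, and the construction shown is a naive one that suits only short strings. The names `build_suffix_array` and `find_all` are assumptions made for this example.

```python
def build_suffix_array(text):
    """Return suffix start positions in lexicographic order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, pattern):
    """Return sorted start positions of all exact occurrences of `pattern`."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For example, with `sa = build_suffix_array("banana")`, the call `find_all("banana", sa, "ana")` returns `[1, 3]`. All occurrences of a pattern occupy a contiguous range of the sorted suffixes, which is the property that index structures of this family rely on.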

Performance

Metrics

Papers: 1,942

Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
