Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
Journal Article
TL;DR: This paper proposes INSPIRE, a general framework that adopts a unifying strategy for processing different variants of spatial keyword queries and uses the autocompletion paradigm, generating the initial query as a prefix matching query.
Abstract: Geo-textual data are generated in abundance. Recent studies have focused on the processing of spatial keyword queries, which retrieve objects that match certain keywords within a spatial region. To ensure effective retrieval, various extensions have been proposed, including tolerating errors in keyword matching and autocompletion using prefix matching. In this paper, we propose INSPIRE, a general framework that adopts a unifying strategy for processing different variants of spatial keyword queries. We adopt the autocompletion paradigm, which generates an initial query as a prefix matching query. If there are few matching results, other variants are performed as a form of relaxation that reuses the processing done in the earlier phase. The types of relaxation allowed include spatial region expansion and exact/approximate prefix/substring matching. Moreover, since the autocompletion paradigm allows appending characters after the initial query, we look at how the query processing done for the initial query and its relaxation can be reused in such instances. Compared to existing works, which process variants of spatial keyword queries as new queries over different indexes, our approach offers a more compelling route to efficient and effective spatial keyword search. Extensive experiments substantiate our claims.
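
The relaxation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the INSPIRE implementation: it ignores the spatial component and does not reuse work from the earlier phase, and the names `prefix_edit_distance`, `search_with_relaxation`, `min_results`, and `max_errors` are hypothetical. It tries exact prefix matching first and, if too few keywords match, falls back to approximate prefix matching under edit distance.

```python
def prefix_edit_distance(query: str, keyword: str) -> int:
    """Smallest edit distance between query and any prefix of keyword."""
    # Row for the empty query: distance to keyword[:j] is j insertions.
    prev = list(range(len(keyword) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]  # distance from query[:i] to the empty prefix
        for j, kc in enumerate(keyword, start=1):
            cost = 0 if qc == kc else 1
            cur.append(min(prev[j - 1] + cost,  # match or substitution
                           prev[j] + 1,         # delete a query character
                           cur[j - 1] + 1))     # insert a keyword character
        prev = cur
    # The last row holds the distance to every prefix; take the best one.
    return min(prev)


def search_with_relaxation(query, keywords, min_results=2, max_errors=1):
    """Exact prefix search, relaxed to approximate prefix search if needed.

    min_results and max_errors are illustrative thresholds (assumptions).
    """
    exact = [w for w in keywords if w.startswith(query)]
    if len(exact) >= min_results:
        return exact
    # Relaxation step: accept keywords with a prefix within max_errors edits.
    return [w for w in keywords
            if prefix_edit_distance(query, w) <= max_errors]


keywords = ["coffee", "cofee shop", "college", "museum"]
print(search_with_relaxation("co", keywords))
print(search_with_relaxation("coff", keywords))
```

In this toy version the relaxed phase rescans every keyword from scratch; a framework of the kind the abstract describes would instead carry over the state computed during the exact phase.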

12 citations

Proceedings Article
01 Nov 2009
TL;DR: A hardware-efficient string matching architecture using the brute-force algorithm is proposed, and the process element that makes up the proposed architecture is optimized by reducing the number of comparators.
Abstract: As network environments grow more complex, packet payload inspection at the application layer becomes increasingly necessary. String matching, which is critical to network intrusion detection systems, inspects packet payloads and detects malicious network attacks using a set of rules. Because string matching is a computationally intensive task, hardware-based string matching is required. In this paper, we propose a hardware-efficient string matching architecture using the brute-force algorithm. The process element that makes up the proposed architecture is optimized by reducing the number of comparators. The performance of the proposed architecture is nearly equal to that of a previous work. The experimental results show that the proposed architecture, at any process width, reduces the comparator requirements in comparison with the previous work.
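
For reference, the brute-force algorithm that the architecture builds on can be sketched in software as follows. This is a plain sequential Python version, not the paper's hardware design; the function name `brute_force_search` and the choice to report no matches for an empty pattern are assumptions made for the example.

```python
def brute_force_search(text: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []  # assumption: an empty pattern reports no matches
    hits = []
    for start in range(n - m + 1):
        # Compare the window character by character, stopping at the
        # first mismatch. Each comparison here corresponds to one
        # character comparator in a hardware realization.
        offset = 0
        while offset < m and text[start + offset] == pattern[offset]:
            offset += 1
        if offset == m:
            hits.append(start)
    return hits


print(brute_force_search("abracadabra", "abra"))
```

The worst case needs on the order of n * m character comparisons, which is why the number of comparators is a natural cost measure when the loop is unrolled into hardware.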

12 citations

Dissertation
01 Jan 2002
TL;DR: The results prove that gapped q-grams are superior to existing filter approaches with respect to speed, filtration efficiency and their potential for use in lossy filters.
Abstract: In this work we present new results and methods for approximate string matching with filter algorithms. We begin with the presentation of QUASAR, our efficient implementation of an improved version of the filter based on the q-gram lemma. The q-gram lemma provides a method based on matching substrings to quickly detect potential matches to a query in a subject or target. We improved and implemented an algorithm originally introduced in 1991. This resulted in a very efficient program for approximate string matching using an index. It was successfully applied to EST-clustering, a problem from computational biology. The second part of our work introduces a new class of filters based on gapped q-grams. We analyze the potential of this somewhat more complicated approach for use in filters for approximate string matching with an index. We consider two important distance measures in approximate string matching, the Hamming and the edit distance. For both problems we provide all the tools required to solve them using gapped q-grams. This includes threshold computation and the selection of good gapped q-grams using predictions of their speed and filtration efficiency. Furthermore, we consider the potential of gapped q-grams for use in lossy filters. We support our findings with extensive experiments. Our results prove that gapped q-grams are superior to existing filter approaches with respect to speed, filtration efficiency and their potential for use in lossy filters.
German abstract (translated): In this work we describe new results and methods in the field of filter algorithms for similarity search in text databases. In the first part we present QUASAR, the implementation of an improved filter based on the so-called q-gram lemma. This lemma is based on the comparison of short substrings and enables the efficient detection of those parts of a text database that resemble a given query. The second part of the work presents a new class of filters that use q-grams with gaps, so-called "gapped q-grams". We investigate the potential of these more complex q-grams for use in filter algorithms for index-based similarity search in text databases.
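
A minimal Python sketch of a filter based on the q-gram lemma, using ordinary contiguous q-grams rather than the gapped ones studied in the dissertation. It relies on the lemma in its commonly stated form: two strings within edit distance k share at least max(|a|, |b|) - q + 1 - k*q q-grams, counted with multiplicity. This is not QUASAR, and the function names are hypothetical.

```python
from collections import Counter


def qgram_profile(s: str, q: int) -> Counter:
    """Multiset of all contiguous substrings of length q in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Number of q-grams common to a and b, counted with multiplicity."""
    common = qgram_profile(a, q) & qgram_profile(b, q)  # multiset intersection
    return sum(common.values())


def passes_qgram_filter(a: str, b: str, k: int, q: int) -> bool:
    """False means edit distance > k is guaranteed; True means 'verify'."""
    threshold = max(len(a), len(b)) - q + 1 - k * q
    # A non-positive threshold gives the filter no discriminating power.
    return shared_qgrams(a, b, q) >= threshold


print(shared_qgrams("approximate", "aproximate", 2))
print(passes_qgram_filter("approximate", "aproximate", k=1, q=2))
print(passes_qgram_filter("approximate", "stringology", k=1, q=2))
```

The filter is lossless in the sense that it never rejects a true match, but candidates that pass still have to be verified with an exact distance computation.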

12 citations

Proceedings Article
04 Sep 2004
TL;DR: This work shows how to use rewriting-logic to model and evaluate reconfigurable systolic architectures which are applied to the efficient treatment of several dynamic programming methods for resolving well-known problems such as global and local sequence alignment, approximate string matching and computation of the longest common subsequence.
Abstract: Systolic arrays provide a large amount of parallelism. However, their applicability is restricted to a small set of computational problems due to their lack of flexibility. This limitation can be circumvented by using reconfigurable systolic arrays, where the node interconnections and operations can be redefined even at run time. In this context, several alternative systolic architectures can be explored, and powerful tools are needed to model and evaluate them. We show how well-known rewriting-logic environments can be used to quickly model and simulate complex application-specific digital systems, speeding up their subsequent prototyping. We show how to use rewriting logic to model and evaluate reconfigurable systolic architectures which are applied to the efficient treatment of several dynamic programming methods for resolving well-known problems such as global and local sequence alignment (Smith-Waterman algorithm), approximate string matching and computation of the longest common subsequence. A VHDL description of the conceived architecture was implemented from the rewriting-logic based abstract models and synthesized on an FPGA of the APEX family.
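
One of the dynamic programming methods mentioned above, approximate string matching under edit distance, can be sketched sequentially in Python as follows. This is the standard column-by-column recurrence, not the paper's rewriting-logic model or its systolic design; the function name `approx_search` is hypothetical.

```python
def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """End positions j in text where some substring ending at j
    matches pattern with at most k edits (insert, delete, substitute)."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and any substring
    # of the text ending at the current position. Before any text is
    # read, matching pattern[:i] costs i deletions.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # value of row i-1 in the previous column
        # col[0] stays 0, so a match may start at any text position.
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # skip the text character
                      col[i - 1] + 1)    # skip the pattern character
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_search("abc", "xabcx", 0))
print(approx_search("abc", "xabcx", 1))
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal are independent of each other. That property is what makes this family of recurrences a natural fit for systolic arrays.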

11 citations

Book Chapter
Gene Myers
06 Apr 1992
TL;DR: A threshold-sensitive algorithm for approximately matching both network and regular expressions and a backtracking procedure whose order of evaluation is optimal in the sense that its expected time is minimal over all such procedures are presented.
Abstract: We present two algorithmic results pertinent to the matching of patterns of interest in macromolecular sequences. The first result is an output sensitive algorithm for approximately matching network expressions, i.e., regular expressions without Kleene closure. This result generalizes the O(kn) expected-time algorithm of Ukkonen for approximately matching keywords [Ukk85]. The second result concerns the problem of matching a pattern that is a network expression whose elements are approximate matches to network expressions interspersed with specifiable distance ranges. For this class of patterns, it is shown how to determine a backtracking procedure whose order of evaluation is optimal in the sense that its expected time is minimal over all such procedures.

11 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  8
2022  30
2021  32
2020  30
2019  48
2018  39