Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as fuzzy string searching and fuzzy string matching.

Papers

Journal Article

Parallelizing Exact and Approximate String Matching via Inclusive Scan on a GPU

Yasuaki Mitani, Fumihiko Ino¹, Kenichi Hagihara¹

Osaka University¹

01 Jul 2017, IEEE Transactions on Parallel and Distributed Systems

TL;DR: This study proposes a tribrid parallel method for bit-parallel algorithms such as the Shift-Or and Wu-Manber algorithms to improve the runtimes of exact and approximate string matching algorithms, and integrates the inclusive-scan scheme into a previous segmentation-based scheme to maximize search throughput.

Abstract: In this study, to substantially improve the runtimes of exact and approximate string matching algorithms, we propose a tribrid parallel method for bit-parallel algorithms such as the Shift-Or and Wu-Manber algorithms. Our underlying idea is to interpret bit-parallel algorithms as inclusive-scan operations, which allow these bit-parallel algorithms to run efficiently on a graphics processing unit (GPU); we achieve this speed-up here because inclusive-scan operations not only eliminate duplicate searches between threads but also realize a GPU-friendly memory access pattern that maximizes memory read/write throughput. To realize our ideas, we first define two binary operators and then present a proof regarding the associativity of these operators, which is necessary for the parallelization of the inclusive-scan operations. Finally, we integrate the inclusive-scan scheme into a previous segmentation-based scheme to maximize search throughput, identifying the best tradeoff point between synchronization cost and duplicate work. Through our experiments, we compared our proposed method with previous segmentation-based methods and indexing-based sequence aligners. For online string matching, our proposed method performed 6.7-16.7 times faster than previous methods, achieving a search throughput of up to 1.88 terabits per second (Tbps) on a GeForce GTX TITAN X GPU. We therefore conclude that our proposed method is quite effective for decreasing the runtimes of online string matching of short patterns.
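
For orientation, the bit-parallel recurrence that the abstract builds on can be sketched sequentially. This is a minimal CPU sketch, not the paper's GPU inclusive-scan formulation: it uses the Shift-Or convention (a 0 bit marks an active state) and extends it to k errors in the style of Wu-Manber. The function name `bitparallel_search` and the choice to report match end positions are assumptions made for illustration.

```python
def bitparallel_search(text, pattern, k=0):
    """Return the end indices in `text` where `pattern` matches with <= k edits.

    Hypothetical helper: Shift-Or convention (0 bit = active state),
    extended to k errors in the style of Wu-Manber. With k=0 this is
    plain Shift-Or exact matching.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1
    # masks[c] has bit i cleared exactly when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, full) & ~(1 << i)
    # rows[d]: bit i is 0 iff pattern[:i+1] matches some suffix of the
    # text read so far with at most d errors. Before any text is read,
    # the first d pattern characters can be "deleted", so the low d bits are 0.
    rows = [(full << d) & full for d in range(k + 1)]
    accept = 1 << (m - 1)
    ends = []
    for j, ch in enumerate(text):
        mask = masks.get(ch, full)
        prev_old = rows[0]                      # row d-1 before this character
        rows[0] = ((rows[0] << 1) | mask) & full
        for d in range(1, k + 1):
            old = rows[d]
            rows[d] = (
                ((old << 1) | mask)             # match
                & prev_old                      # extra character in the text
                & (prev_old << 1)               # substitution
                & (rows[d - 1] << 1)            # missing pattern character
            ) & full
            prev_old = old
        if rows[k] & accept == 0:
            ends.append(j)
    return ends
```

For example, `bitparallel_search("abracadabra", "abra")` reports the two exact occurrences by their end indices, and passing `k=1` additionally accepts windows that differ from the pattern by one edit. Python integers are unbounded, so the explicit `& full` masking stands in for the fixed machine-word width a real implementation would rely on.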

24 citations

Patent

Method for the manipulation, storage, modeling, visualization and quantification of datasets

Sandy C. Shaw

19 Jan 2001

TL;DR: This patent describes a method for the manipulation, storage, modeling, visualization, and quantification of datasets that correspond to target strings, in which an iterative algorithm generates comparison strings corresponding to a set of points that can serve as the domain of an iterative function.

Abstract: A method is described for the manipulation, storage, modeling, visualization, and quantification of datasets that correspond to target strings. An iterative algorithm is used to generate comparison strings corresponding to some set of points that can serve as the domain of an iterative function. Preferably these points are located in the complex plane, such as in and/or near the Mandelbrot set or a Julia set. The comparison string is scored by evaluating a function having the comparison string and one of the plurality of target strings as inputs. The evaluation may be repeated for a number of the other target strings. The score or some other property corresponding to the comparison string is used to determine the target string's placement on a map. The points are analyzed and/or compared by examining, either visually or mathematically, their relative locations, their absolute locations within the region, and/or metrics other than location.

24 citations

Journal Article

A Fast String Matching Algorithm

Hou Yi-bin

01 Jan 2004, Mini-micro Systems

TL;DR: A novel improved algorithm, BMH2C, is presented, which computes the right shift from two characters and stores the shifts in a two-dimensional array, increasing the shift distance, reducing the number of comparisons, and improving matching speed.

Abstract: String matching is widely applied in many fields. Based on a discussion of the Brute-Force and Boyer-Moore algorithms and the most important improvements to them, a novel improved algorithm, BMH2C, is presented. The algorithm computes the right shift from two characters and stores the shifts in a two-dimensional array, which increases the shift distance, reduces the number of comparisons, and improves matching speed. Finally, test results comparing these algorithms are given.
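
The abstract gives no details of BMH2C beyond a shift computed from two characters and stored in a two-dimensional table, so the following is only a sketch of that general idea, not the paper's algorithm. It is a Horspool-style search whose shift is looked up from the last two characters of the current window; the name `horspool_pair_search` is hypothetical, and a dictionary keyed by character pairs stands in for the two-dimensional array.

```python
def horspool_pair_search(text, pattern):
    """Return the start indices of all exact occurrences of `pattern`.

    Hypothetical sketch: a Horspool-style scan whose shift depends on the
    last TWO characters of the current window. Not the published BMH2C.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    if m == 1:
        return [i for i, ch in enumerate(text) if ch == pattern]
    # shift[(a, b)]: smallest shift >= 1 that lines the pair (a, b) up with
    # an occurrence of that pair inside the pattern. Later (rightmost)
    # occurrences overwrite earlier ones, which keeps the smallest shift.
    shift = {}
    for i in range(m - 2):
        shift[(pattern[i], pattern[i + 1])] = m - 2 - i
    hits = []
    s = 0
    while s + m <= n:
        if text[s:s + m] == pattern:
            hits.append(s)
        pair = (text[s + m - 2], text[s + m - 1])
        # Pair not in the pattern: the window's last character may still
        # align with pattern[0] (shift m - 1); otherwise skip the whole window.
        default = m - 1 if pair[1] == pattern[0] else m
        s += shift.get(pair, default)
    return hits
```

Keying the shift on a pair rather than a single character makes a full-length skip more likely, since a pair of characters occurs in the pattern less often than a single one; the cost is a larger table, which is presumably why the abstract mentions a two-dimensional array.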

24 citations

Proceedings Article

Local Filtering: Improving the Performance of Approximate Queries on String Collections

Xiaochun Yang¹, Yaoshu Wang², Bin Wang¹, Wei Wang²

Northeastern University (China)¹, University of New South Wales²

27 May 2015

TL;DR: A new filtering method, called local filtering, is proposed, based on the idea that two strings exhibiting substantial local dissimilarities must be globally dissimilar, which can achieve substantial speedup compared with state-of-the-art methods and be robust against factors such as dataset characteristics and large edit distance thresholds.

Abstract: We study efficient query processing for approximate string queries, which find strings within a string collection whose edit distances to the query strings are within the given thresholds. Existing methods typically hinge on the property that globally similar strings must share at least a certain number of identical substrings or subsequences. They become ineffective when there are burst errors or when the number of errors is large. In this paper, we explore the opposite paradigm, focusing on finding the differences between database strings and the query string. We propose a new filtering method, called local filtering, based on the idea that two strings exhibiting substantial local dissimilarities must be globally dissimilar. We propose the concept of (positional) local distance to quantify the minimum number of errors a query fragment contributes to the edit distance between the query and a data string. It also leads to effective pruning rules and can speed up verification via early termination. We devise a family of indexing methods based on the idea of precomputing (positional) local distances for all possible combinations of query fragments and edit distance thresholds. Based on careful analyses of subtle relationships among local distances, novel techniques are proposed to drastically reduce the amount of enumeration with no or little impact on the pruning power. Efficient query processing methods exploiting the new index and bit-parallelism are also proposed. Experimental results on real datasets show that our local filtering-based methods can achieve substantial speedup compared with state-of-the-art methods, and they are robust against factors such as dataset characteristics and large edit distance thresholds.
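
To make the query type concrete, here is a minimal baseline for a thresholded edit-distance query with early termination during verification. It is not the paper's local filtering or its index: it only pairs a simple length filter with a row-by-row dynamic program that stops as soon as the threshold can no longer be met. The names `within_edit_distance` and `approximate_query` are hypothetical.

```python
def within_edit_distance(query, candidate, k):
    """Return True iff the edit distance between the two strings is <= k."""
    # Length filter: each edit changes the length by at most one.
    if abs(len(query) - len(candidate)) > k:
        return False
    prev = list(range(len(candidate) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, cc in enumerate(candidate, start=1):
            cur.append(min(
                prev[j - 1] + (qc != cc),  # match or substitution
                prev[j] + 1,               # query character left unmatched
                cur[j - 1] + 1,            # candidate character left unmatched
            ))
        # Early termination: row minima never decrease, so once every cell
        # exceeds k the final distance must exceed k as well.
        if min(cur) > k:
            return False
        prev = cur
    return prev[-1] <= k


def approximate_query(collection, query, k):
    """Return the strings in `collection` within edit distance k of `query`."""
    return [s for s in collection if within_edit_distance(query, s, k)]
```

With the collection `["kitten", "sitting", "mitten", "bitten", "smitten"]`, the query `"kitten"` and threshold 1 keep only `kitten`, `mitten`, and `bitten`. Every candidate is still touched here, which is exactly the cost that filtering and indexing methods aim to avoid.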

24 citations

Proceedings Article

Indexing mixed types for approximate retrieval

Liang Jin¹, Chen Li¹, Nick Koudas², Anthony K. H. Tung³

University of California, Irvine¹, University of Toronto², National University of Singapore³

30 Aug 2005

TL;DR: The approach presented is based on representing sets of strings at higher levels of the index structure as tries suitably compressed in a way that reasoning about edit distance between a query string and a compressed trie at index nodes is still feasible.

Abstract: In various applications such as data cleansing, being able to retrieve categorical or numerical attributes based on notions of approximate match (e.g., edit distance, numerical distance) is of profound importance. Commonly, approximate match predicates are specified on combinations of attributes in conjunction. Existing database techniques for approximate retrieval, however, limit their applicability to single attribute retrieval through B-trees and their variants. In this paper, we propose a methodology that utilizes known multidimensional indexing structures for the problem of approximate multi-attribute retrieval. Our method enables indexing of a collection of string and/or numeric attributes to facilitate approximate retrieval using edit distance as an approximate match predicate for strings and numeric distance for numeric attributes. The approach presented is based on representing sets of strings at higher levels of the index structure as tries suitably compressed in a way that reasoning about edit distance between a query string and a compressed trie at index nodes is still feasible. We propose and evaluate various techniques to generate the compressed trie representation and fully specify our indexing methodology. Our experimental results show the benefits of our proposal when compared with various alternate strategies for the same problem.
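
Reasoning about edit distance at trie nodes can be illustrated with a plain, uncompressed trie. The sketch below is a textbook-style traversal under that simplifying assumption; it does not reproduce the paper's compressed tries or its multidimensional index, and the names `TrieNode`, `build_trie`, and `trie_search` are hypothetical. Each node extends one row of the edit-distance table, and a subtree is skipped once no cell in the row is within the threshold.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.word = None     # the stored string if one ends at this node


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def trie_search(root, query, k):
    """Return (word, distance) pairs for stored words within k edits of `query`."""
    results = []
    first_row = list(range(len(query) + 1))

    def visit(node, ch, prev_row):
        # Extend the edit-distance table by one row for the prefix ending in `ch`.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(
                prev_row[j - 1] + (query[j - 1] != ch),
                prev_row[j] + 1,
                row[j - 1] + 1,
            ))
        if node.word is not None and row[-1] <= k:
            results.append((node.word, row[-1]))
        # Prune: if every cell exceeds k, no extension of this prefix can match.
        if min(row) <= k:
            for c, child in node.children.items():
                visit(child, c, row)

    if root.word is not None and first_row[-1] <= k:
        results.append((root.word, first_row[-1]))
    for c, child in root.children.items():
        visit(child, c, first_row)
    return results
```

Because stored strings that share a prefix also share the rows computed for that prefix, the table work is done once per trie node rather than once per string.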

24 citations

Performance metrics

Papers: 1,942

Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
