Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm or fuzzy string-matching algorithm.


Papers
Proceedings Article
01 May 2001
TL;DR: Presents a technique for two-dimensional substring indexing, based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points, that can be practically realized using a combination of string B-trees and R-trees.
Abstract: As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing. In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy. We show how our technique can be practically realized using a combination of string B-trees and R-trees.

17 citations
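
The reduction in the abstract above can be pictured with a small in-memory sketch. It assumes each record's id plays the role of a color, and that a substring query on one field selects a contiguous range of that field's sorted suffixes; the answer is then the set of colors common to both ranges. The names `build_suffix_index`, `colors_in_range`, and `query_2d` are hypothetical, and this naive version has none of the I/O efficiency of the paper's string B-tree and R-tree structures.

```python
from bisect import bisect_left


def build_suffix_index(values):
    """Sorted (suffix, record_id) pairs; the record id acts as the 'color'."""
    return sorted((v[i:], rid) for rid, v in enumerate(values) for i in range(len(v)))


def colors_in_range(index, pattern):
    """Colors (record ids) of all suffixes that start with `pattern`.

    Suffixes sharing a prefix are contiguous in sorted order, so the match
    set is a single range that starts at the first suffix >= pattern.
    """
    lo = bisect_left(index, (pattern,))
    colors = set()
    for suffix, rid in index[lo:]:
        if not suffix.startswith(pattern):
            break
        colors.add(rid)
    return colors


def query_2d(index_a, index_b, pattern_a, pattern_b):
    """Records whose first field contains pattern_a and second contains pattern_b."""
    return sorted(colors_in_range(index_a, pattern_a) & colors_in_range(index_b, pattern_b))


# Hypothetical records: (title, author)
titles = ["database systems", "string databases", "graph theory"]
authors = ["alice", "bob", "alice smith"]
title_index = build_suffix_index(titles)
author_index = build_suffix_index(authors)
print(query_2d(title_index, author_index, "data", "ali"))
```

The intersection step is the "common colors" problem; here it is a plain set intersection, which scans both ranges in full.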

Proceedings Article
11 Jun 2010
TL;DR: This research proposes TSPRC (Test Scanning for Pattern Rightmost Character), an approach to exact string matching that scans the text string for the rightmost character of the pattern in the preprocessing phase.
Abstract: Exact string matching algorithms are essential components in practical applications of computer systems. In this research we propose a new concept to solve the problem of exact string matching by scanning the text string for the rightmost character of the pattern in the preprocessing phase. In the matching phase, TSPRC (Test Scanning for Pattern Rightmost Character) compares the pattern with the text window from both directions simultaneously. The proposed algorithm was implemented and compared with existing algorithms. The comparison results demonstrate that TSPRC is more efficient than a number of the existing algorithms and takes O(1) time in the best case.

17 citations
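
The two phases described in the abstract above can be sketched as follows. This is only an illustration of the general idea (collect text positions that hold the pattern's rightmost character, then compare each window from both ends inward); it is not the authors' TSPRC implementation, and the function name `find_exact` is hypothetical.

```python
def find_exact(text: str, pattern: str) -> list[int]:
    """Return the start index of every exact occurrence of pattern in text."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    last = pattern[-1]
    # Preprocessing: text positions that could align with the pattern's
    # rightmost character (only positions with room for a full window).
    candidates = [i for i in range(m - 1, len(text)) if text[i] == last]

    matches = []
    for end in candidates:
        start = end - m + 1
        left, right = 0, m - 1
        # Matching: compare from the left and right ends at the same time.
        while (left <= right
               and text[start + left] == pattern[left]
               and text[start + right] == pattern[right]):
            left += 1
            right -= 1
        if left > right:  # the two scans met, so every character matched
            matches.append(start)
    return matches


print(find_exact("abracadabra", "abra"))
```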

Proceedings Article
04 Apr 2017
TL;DR: This paper uses the CUDA-based Graphics Processing Unit (GPU) and the newly introduced Unified Memory (UM) to speed up the most common algorithms for computing the edit distance between two strings, the Levenshtein and Damerau distance algorithms.
Abstract: String matching problems such as sequence alignment are among the fundamental problems in many computer science fields such as natural language processing (NLP) and bioinformatics. Many algorithms have been proposed in the literature to address this problem. Some of these algorithms compute the edit distance between the two strings to perform the matching. However, these algorithms usually require long execution times. Many researchers use high performance computing to reduce the execution time of string matching algorithms. In this paper, we use the CUDA-based Graphics Processing Unit (GPU) and the newly introduced Unified Memory (UM) to speed up the most common algorithms for computing the edit distance between two strings. These algorithms are the Levenshtein and Damerau distance algorithms. Our results show that using the GPU to implement the Levenshtein and Damerau distance algorithms improves their execution times by about 11X and 12X, respectively, compared to the sequential implementation. An improvement of about 61X and 71X, respectively, can be achieved when the GPU is used with unified memory.

17 citations
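
For reference, the two distances named in the abstract above can be sketched sequentially as below. This is a plain CPU version, not the paper's GPU code. "Damerau distance" is assumed here to mean the restricted variant (optimal string alignment), which adds adjacent transpositions to the Levenshtein operations; the paper may use a different variant.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau distance: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(levenshtein("kitten", "sitting"), levenshtein("ca", "ac"), osa_distance("ca", "ac"))
```

Each cell depends only on its left, upper, and upper-left neighbors (plus one cell two rows back for transpositions), which is why cells on the same anti-diagonal can be computed in parallel.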

Book Chapter
01 Sep 2015
TL;DR: Practical, filtration-based solutions are presented for approximate order-preserving matching, in which two strings match if they have the same relative order after removing up to k elements in the same positions in both strings.
Abstract: The exact order-preserving matching problem is to find all the substrings of a text T which have the same length and relative order as a pattern P. Like string matching, order-preserving matching can be generalized by allowing the match to be approximate. In approximate order-preserving matching two strings match if they have the same relative order after removing up to k elements in the same positions in both strings. In this paper we present practical solutions for this problem. The methods are based on filtration, and one of them is the first sublinear solution on average. We show by practical experiments that the new solutions are fast and efficient.

17 citations
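
The matching criterion defined in the abstract above can be made concrete with a brute-force checker. This sketch only illustrates the definition; it tries every way of removing up to k positions and is exponential in k, so it is not one of the paper's filtration methods. The function names are hypothetical.

```python
from itertools import combinations


def ranks(seq):
    """Replace each value by its rank among the distinct values of seq."""
    order = {v: r for r, v in enumerate(sorted(set(seq)))}
    return [order[v] for v in seq]


def op_match_k(window, pattern, k):
    """True if window and pattern (equal length) have the same relative
    order after removing up to k elements at the same positions in both."""
    m = len(pattern)
    for removed in range(min(k, m) + 1):
        for drop in combinations(range(m), removed):
            keep = [i for i in range(m) if i not in drop]
            if ranks([window[i] for i in keep]) == ranks([pattern[i] for i in keep]):
                return True
    return False


def approx_op_search(text, pattern, k):
    """Start positions of all windows of text that match pattern under op_match_k."""
    m = len(pattern)
    return [s for s in range(len(text) - m + 1)
            if op_match_k(text[s:s + m], pattern, k)]


text = [10, 20, 15, 30, 40]
print(approx_op_search(text, [1, 2, 3], 0))
print(approx_op_search(text, [1, 2, 3], 1))
```

With k = 0 this reduces to exact order-preserving matching, so only the strictly increasing window `[15, 30, 40]` matches the pattern `[1, 2, 3]`.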

Proceedings Article
02 Nov 2009
TL;DR: An incremental algorithm is developed that uses signature-based inverted lists to minimize the duplicate list-scan operations of overlapping windows in the text, and it significantly outperforms existing methods in the literature.
Abstract: We study the problem of approximate membership extraction (AME), i.e., how to efficiently extract substrings in a text document that approximately match some strings in a given dictionary. This problem is important in a variety of applications such as named entity recognition and data cleaning. We solve this problem in two steps. In the first step, for each substring in the text, we filter away the strings in the dictionary that are very different from the substring. In the second step, each candidate string is verified to decide whether the substring should be extracted. We develop an incremental algorithm using signature-based inverted lists to minimize the duplicate list-scan operations of overlapping windows in the text. Our experimental study of the proposed algorithms on real and synthetic datasets showed that our solutions significantly outperform existing methods in the literature.

17 citations
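
The two-step filter-and-verify scheme in the abstract above can be sketched as below. Several assumptions are made that the abstract does not state: similarity is taken to be edit distance at most k, the filter is a q-gram inverted index, and each window is processed from scratch rather than incrementally. Entries shorter than q * (k + 1) characters are always verified, because the q-gram filter cannot guarantee a shared q-gram for them. All names are hypothetical.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def qgrams(s, q):
    """Set of all length-q substrings of s (empty if s is shorter than q)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def extract_approximate(text, dictionary, k=1, q=2):
    """Return (start, substring, entry) triples with edit distance <= k."""
    index = defaultdict(set)   # q-gram -> dictionary entries containing it
    always_check = set()       # entries too short for the filter to be safe
    for entry in dictionary:
        if len(entry) < q * (k + 1):
            always_check.add(entry)
        for gram in qgrams(entry, q):
            index[gram].add(entry)
    # Only substring lengths within k of some entry length can match.
    lengths = sorted({n for e in dictionary
                      for n in range(max(1, len(e) - k), len(e) + k + 1)})
    found = set()
    for start in range(len(text)):
        for n in lengths:
            if start + n > len(text):
                break
            sub = text[start:start + n]
            # Step 1 (filter): keep entries sharing at least one q-gram.
            candidates = set(always_check)
            for gram in qgrams(sub, q):
                candidates |= index.get(gram, set())
            # Step 2 (verify): compute the actual edit distance.
            for entry in candidates:
                if abs(len(entry) - n) <= k and edit_distance(sub, entry) <= k:
                    found.add((start, sub, entry))
    return sorted(found)


hits = extract_approximate("I like aple and microsft.", ["microsoft", "apple"], k=1)
print(hits)
```

Adjacent windows share most of their q-grams, so this version rescans the same inverted lists many times; that repeated work is what the paper's incremental algorithm is designed to avoid.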


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  8
2022  30
2021  32
2020  30
2019  48
2018  39