Topic
Approximate string matching
About: Approximate string matching is a research topic. Over the lifetime, 1903 publications have been published within this topic receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.
Papers published on a yearly basis
Papers
More filters
••
01 May 2001TL;DR: A technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points is presented and can be practically realized using a combination of string B-trees and R-tree.
Abstract: As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing.In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy. We show how our technique can be practically realized using a combination of string B-trees and R-trees.
17 citations
••
11 Jun 2010TL;DR: This research proposes a new concept to solve the problem of exact string matching by scanning text string for the rightmost character of the pattern in preprocessing phase by implementing TSPRC (Test Scanning for Pattern Rightmost Character).
Abstract: Exact string matching algorithms are essential components in practical applications of the computer system. In this research we propose a new concept to solve the problem of exact string matching by scanning text string for the rightmost character of the pattern in preprocessing phase. In matching phase TSPRC (Test Scanning for Pattern Rightmost Character) compares the pattern with text window from both directions simultaneously. Proposed algorithm implemented and compared with existing algorithms. Comparison results demonstrate that TSPRC is efficient than the number of the existing algorithm and take O(1) time complexity in the best case.
17 citations
••
04 Apr 2017TL;DR: This paper uses the CUDA based Graphics Processing Unit (GPU) and the newly introduced Unified Memory (UM) to speed up the most common algorithms to compute the edit distance between two string algorithms, the Levenshtein and Damerau distance algorithms.
Abstract: String matching problems such as sequence alignment is one of the fundamental problems in many computer since fields such as natural language processing (NLP) and bioinformatics. Many algorithms have been proposed in the literature to address this problem. Some of these algorithms compute the edit distance between the two strings to perform the matching. However, these algorithms usually require long execution time. Many researches use high performance computing to reduce the execution time of many string matching algorithms. In this paper, we use the CUDA based Graphics Processing Unit (GPU) and the newly introduced Unified Memory(UM) to speed up the most common algorithms to compute the edit distance between two string. These algorithms are the Levenshtein and Damerau distance algorithms. Our results show that using GPU to implement the Levenshtein and Damerau distance algorithms improvements their execution times of about 11X and 12X respectively when compared to the sequential implementation. And an improvement of about 61X and 71X respectively can be achieved when GPU is used with unified memory.
17 citations
••
01 Sep 2015TL;DR: Practical solutions for the exact order-preserving matching problem to find all the substrings of a text T which have the same length and relative order as a pattern P are presented.
Abstract: The exact order-preserving matching problem is to find all the substrings of a text T which have the same length and relative order as a pattern P. Like string maching, order-preserving matching can be generalized by allowing the match to be approximate. In approximate order-preserving matching two strings match if they have the same relative order after removing up to k elements in the same positions in both strings. In this paper we present practical solutions for this problem. The methods are based on filtration, and one of them is the first sublinear solution on average. We show by practical experiments that the new solutions are fast and efficient.
17 citations
••
02 Nov 2009TL;DR: An incremental algorithm using signature-based inverted lists to minimize the duplicate list-scan operations of overlapping windows in the text and significantly outperform existing methods in the literature.
Abstract: We study the problem of approximate membership extraction (AME), i.e., how to efficiently extract substrings in a text document that approximately match some strings in a given dictionary. This problem is important in a variety of applications such as named entity recognition and data cleaning. We solve this problem in two steps. In the first step, for each substring in the text, we filter away the strings in the dictionary that are very different from the substring. In the second step, each candidate string is verified to decide whether the substring should be extracted. We develop an incremental algorithm using signature-based inverted lists to minimize the duplicate list-scan operations of overlapping windows in the text. Our experimental study of the proposed algorithms on real and synthetic datasets showed that our solutions significantly outperform existing methods in the literature.
17 citations