Topic

Approximate string matching

About: Approximate string matching is a research topic. To date, 1903 publications have appeared within this topic, receiving 62352 citations in total. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers

Proceedings Article

Two-dimensional periodicity and its applications

Amihood Amir, Gary Benson

01 Sep 1992

TL;DR: This paper presents a new algorithmic technique for two-dimensional matching, that of periodicity analysis, and introduces a new pattern matching paradigm: Compressed Matching.

Abstract: String matching is rich with a variety of algorithmic tools. In contrast, multidimensional matching has a rather sparse set of techniques. This paper presents a new algorithmic technique for two-dimensional matching, that of periodicity analysis. Periodicity in strings has been used to solve string matching problems. The success of these algorithms suggests that periodicity can be as important a tool in multidimensional matching. However, multidimensional periodicity is not as simple as it is in strings and was not formally studied or used in pattern matching. This paper's main contribution is defining and analysing two-dimensional periodicity in rectangular arrays. In addition, we introduce a new pattern matching paradigm: Compressed Matching. A text array T and a pattern array P are given in compressed forms c(T) and c(P). We seek all appearances of P in T, without decompressing T. By using periodicity analysis, we show that for two-dimensional run-length compression there is an O(|c(T)| log |P| + |P|), or almost optimal, algorithm that can achieve a search time that is sublinear in the size of the text |T|.

87 citations
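
The abstract above builds on periodicity in ordinary (one-dimensional) strings. As a minimal sketch of that underlying notion only, and not of the paper's two-dimensional analysis or its compressed-matching algorithm, the smallest period of a string can be read off the Knuth-Morris-Pratt prefix function. The function names are illustrative assumptions.

```python
def prefix_function(s):
    """pi[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        # Fall back to shorter borders until the next character matches.
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def smallest_period(s):
    """Smallest p >= 1 such that s[i] == s[i + p] for every valid index i."""
    if not s:
        return 0
    # The longest border of the whole string determines its smallest period.
    return len(s) - prefix_function(s)[-1]


if __name__ == "__main__":
    for word in ("abcabcab", "aaaa", "abcd"):
        print(word, smallest_period(word))
```

A string with a short period relative to its length overlaps itself heavily, which is the property that periodicity-based matchers exploit to skip redundant comparisons.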

Journal Article

Multipattern string matching with q-grams

Leena Salmela¹, Jorma Tarhio¹, Jari Kytöjoki¹

Helsinki University of Technology¹

09 Feb 2007, ACM Journal of Experimental Algorithmics

TL;DR: This paper presents three algorithms for exact string matching of multiple patterns that apply q-grams and bit parallelism; they appeared to be substantially faster than earlier solutions for sets of 1,000–10,000 patterns.

Abstract: We present three algorithms for exact string matching of multiple patterns. Our algorithms are filtering methods, which apply q-grams and bit parallelism. We ran extensive experiments with them and compared them with various versions of earlier algorithms, e.g., different trie implementations of the Aho-Corasick algorithm. All of our algorithms appeared to be substantially faster than earlier solutions for sets of 1,000–10,000 patterns, and the good performance of two of them continues to 100,000 patterns. The gain comes from the improved filtering efficiency provided by q-grams.

87 citations
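
To make the filter-then-verify idea from the abstract above concrete, here is a deliberately simple sketch. It is not one of the three algorithms from the paper (those combine q-grams with bit parallelism); it only indexes the first q-gram of each pattern in a hash table and verifies candidates directly. The names `build_qgram_index` and `find_all`, and the default `q=3`, are assumptions made for illustration.

```python
from collections import defaultdict


def build_qgram_index(patterns, q):
    """Map the first q-gram of each pattern to the patterns that start with it."""
    index = defaultdict(list)
    for p in patterns:
        if len(p) < q:
            raise ValueError("every pattern must have at least q characters")
        index[p[:q]].append(p)
    return index


def find_all(text, patterns, q=3):
    """Return sorted (position, pattern) pairs for every exact occurrence."""
    index = build_qgram_index(patterns, q)
    hits = []
    for i in range(len(text) - q + 1):
        # Filtering step: a single lookup on the q-gram starting at position i.
        for p in index.get(text[i:i + q], ()):
            # Verification step: confirm the whole pattern at this position.
            if text.startswith(p, i):
                hits.append((i, p))
    return sorted(hits)


if __name__ == "__main__":
    print(find_all("abracadabra", ["abra", "cad", "xyz"]))
```

Larger q makes accidental q-gram collisions rarer, so fewer positions reach the verification step, at the cost of requiring every pattern to be at least q characters long.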

Patent

Modifying an input string partitioned in accordance with directionality and length constraints

Lauri Karttunen¹

Xerox¹

16 May 1997

TL;DR: This patent describes a processor-implemented method of modifying a string of a regular language: the processor locates a substring matching one of at least two preselected substrings, scanning either left to right or right to left, selects the longest or the shortest match, replaces it with the associated string of the lower language, and outputs the modified string.

Abstract: A processor-implemented method of modifying a string of a regular language, which includes at least two symbols and at least two predetermined substrings. Upon receipt of the string, the processor determines an initial position within the string of a substring matching one of the preselected substrings. To make this determination, the processor matches symbols of the string either starting from the left and proceeding to the right or starting from the right and proceeding to the left. After identifying the initial position, the processor then selects either the longest or the shortest of the preselected substrings. The processor then replaces the matching substring with the string of the lower language associated with the selected preselected substring and outputs the modified string.

86 citations
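
The directionality and length constraints described in the abstract above can be illustrated with a small sketch. This is not the patented method, which is formulated over regular languages; it uses a plain dictionary `rules` (a hypothetical stand-in mapping each preselected substring to its replacement) and simple string scanning. All function and parameter names are assumptions.

```python
def replace_left_to_right(text, rules, longest=True):
    """Scan from the left; at each match position replace the longest
    (or shortest) rule key found there, then continue after it."""
    out = []
    i = 0
    while i < len(text):
        candidates = [p for p in rules if p and text.startswith(p, i)]
        if candidates:
            chosen = max(candidates, key=len) if longest else min(candidates, key=len)
            out.append(rules[chosen])
            i += len(chosen)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def replace_right_to_left(text, rules, longest=True):
    """Same procedure applied from the right, implemented by reversing
    the text, the rule keys, and the replacements."""
    reversed_rules = {p[::-1]: r[::-1] for p, r in rules.items()}
    return replace_left_to_right(text[::-1], reversed_rules, longest)[::-1]


if __name__ == "__main__":
    rules = {"ab": "X", "abc": "Y"}
    print(replace_left_to_right("abcab", rules, longest=True))
    print(replace_left_to_right("abcab", rules, longest=False))
    print(replace_left_to_right("aaa", {"aa": "b"}))
    print(replace_right_to_left("aaa", {"aa": "b"}))
```

The last two calls show why direction matters: with overlapping candidates, scanning from the left and scanning from the right partition the same input differently.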

Patent

Data compression and decompression system with immediate dictionary updating interleaved with string search

Albert B. Cooper¹, Terry A. Welch¹, Theresa Raylene Welch¹

Unisys¹

23 Jul 1997

TL;DR: This patent describes a dictionary-based data compression and decompression system in which, in the compressor, when a partial string W and a character C are matched in the dictionary, a new string is entered into the dictionary with C as an extension character on the string PW, where P is the string corresponding to the last output compressed code signal.

Abstract: A dictionary-based data compression and decompression system where, in the compressor, when a partial string W and a character C are matched in the dictionary, a new string is entered into the dictionary with C as an extension character on the string PW, where P is the string corresponding to the last output compressed code signal. An update string is entered into the compression dictionary for each input character that is read and matched. The updating is immediate and interleaved with the character-by-character matching of the current string. The update process continues until the longest match is found in the dictionary. The code of the longest matched string is output in a string matching cycle. If a single-character or multi-character string "A" exists in the dictionary, the string AAA...A is encoded in two compressed code signals regardless of the string length. This encoding results in an unrecognized code signal at the decompressor. The decompressor, in response to an unrecognized code signal, enters update strings into the decompressor dictionary in accordance with the recovered string corresponding to the previously received code signal, the unrecognized code signal, the extant code of the decompressor and the number of characters in the previously recovered string.

85 citations
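
For orientation, the sketch below shows classic LZW, the well-known dictionary scheme that systems of this kind extend. It is not the patented variant: classic LZW adds one dictionary entry per output code rather than one per matched input character, so it does not encode a long run in two codes. It does show the simplest form of the situation the abstract mentions, where the decompressor receives a code it has not yet entered into its dictionary. The function names and the restriction to characters with code points below 256 are assumptions.

```python
def lzw_compress(data):
    """Classic LZW over a string whose characters all have code points < 256."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w = ""
    codes = []
    for c in data:
        wc = w + c
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            codes.append(dictionary[w])   # emit the longest match
            dictionary[wc] = next_code    # learn the match plus one character
            next_code += 1
            w = c
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decompress(codes):
    """Inverse of lzw_compress, rebuilding the dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Unrecognized code: it must be the previous string extended
            # by its own first character.
            entry = prev + prev[0]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        dictionary[next_code] = prev + entry[0]
        next_code += 1
        prev = entry
    return "".join(out)


if __name__ == "__main__":
    codes = lzw_compress("AAAA")
    print(codes, lzw_decompress(codes))
```

In the `"AAAA"` example the second code refers to an entry the decompressor has not created yet, which is exactly the branch handled by `code == next_code`.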

Proceedings Article

Approximate string search in spatial databases

Bin Yao¹, Feifei Li¹, Marios Hadjieleftheriou², Kun Hou¹

Florida State University¹, AT&T Labs²

01 Mar 2010

TL;DR: This work presents a novel index structure, MHR-tree, for efficiently answering approximate string match queries in large spatial databases based on the R-tree augmented with the min-wise signature and the linear hashing technique.

Abstract: This work presents a novel index structure, MHR-tree, for efficiently answering approximate string match queries in large spatial databases. The MHR-tree is based on the R-tree augmented with the min-wise signature and the linear hashing technique. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the sub-tree of u. We analyze the pruning functionality of such signatures based on set resemblance between the query string and the q-grams from the sub-trees of index nodes. MHR-tree supports a wide range of query predicates efficiently, including range and nearest neighbor queries. We also discuss how to estimate range query selectivity accurately. We present a novel adaptive algorithm for finding balanced partitions using both the spatial and string information stored in the tree. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our approach.

84 citations
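
The min-wise signature over q-grams mentioned in the abstract above can be sketched in a few lines. This is an illustration of the general technique only, not the MHR-tree itself: there is no R-tree, no linear hashing, and no pruning rule here. The padding characters `#` and `$`, the salted SHA-256 hash family, the signature length `k=64`, and all function names are assumptions chosen for the example.

```python
import hashlib


def qgrams(s, q=2):
    """Set of q-grams of s, padded so that prefixes and suffixes are represented."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def _hash(gram, seed):
    """One member of a family of hash functions, selected by seed."""
    data = ("%d:%s" % (seed, gram)).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def signature(grams, k=64):
    """Min-wise signature of a non-empty set: the minimum hash under each of k functions."""
    return [min(_hash(g, seed) for g in grams) for seed in range(k)]


def merge(sig_a, sig_b):
    """Signature of the union of two sets, computed from their signatures alone."""
    return [min(a, b) for a, b in zip(sig_a, sig_b)]


def estimated_resemblance(sig_a, sig_b):
    """Fraction of agreeing positions, an estimate of |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    a, b = qgrams("theatre"), qgrams("theater")
    print(estimated_resemblance(signature(a), signature(b)))
```

Because the signature of a union is the element-wise minimum of the individual signatures, a parent node can summarize all strings beneath it without storing their q-grams, which is the property the abstract relies on for index nodes.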

Performance

Metrics

1,942 papers

64,998 citations

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
