Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers

Journal Article

The Noisy Substring Matching Problem

R.L. Kashyap¹, B.J. Oommen¹

Purdue University¹

01 May 1983-IEEE Transactions on Software Engineering

TL;DR: The proposed algorithm for the computation of S*(Y) requires cubic time and uses the recursively computable dissimilarity measure Dk(X, Y), termed the kth distance between two strings X and Y, which is a dissimilarity measure between Y and a certain subset of the set of contiguous substrings of X.

Abstract: Let T(U) be the set of words in the dictionary H which contain U as a substring. The problem considered here is the estimation of the set T(U) when U is not known, but Y, a noisy version of U, is available. The suggested set estimate S*(Y) of T(U) is a proper subset of H such that every element of it contains at least one substring that most closely resembles Y according to the Levenshtein metric. The proposed algorithm for the computation of S*(Y) requires cubic time. The algorithm uses the recursively computable dissimilarity measure Dk(X, Y), termed the kth distance between two strings X and Y, which is a dissimilarity measure between Y and a certain subset of the set of contiguous substrings of X. Another estimate of T(U), namely SM(Y), is also suggested. The accuracy of SM(Y) is only slightly less than that of S*(Y), but the computation time of SM(Y) is substantially less than that of S*(Y). Experimental results involving 1900 noisy substrings and dictionaries which are subsets of the 1023 most common English words [11] indicate that the accuracy of the estimate S*(Y) is around 99 percent and that of SM(Y) is about 98 percent.

28 citations
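
The noisy substring problem in the abstract above can be illustrated with a much simpler stand-in than the paper's Dk(X, Y) measure. The sketch below uses the standard dynamic program for approximate substring search (start-anywhere, end-anywhere Levenshtein matching) to score each dictionary word by its best-matching contiguous substring, then keeps the words with the lowest score. The function names `best_substring_distance` and `closest_words` are hypothetical, and this is not the authors' S*(Y) or SM(Y) estimator.

```python
from typing import Iterable, Set


def best_substring_distance(word: str, noisy: str) -> int:
    """Smallest Levenshtein distance between `noisy` and any contiguous
    substring of `word` (the empty substring is allowed)."""
    # Row 0 is all zeros: a match may start at any position in `word` for free.
    prev = [0] * (len(word) + 1)
    for i, pc in enumerate(noisy, start=1):
        # Column 0: the first i characters of `noisy` against an empty substring.
        curr = [i]
        for j, wc in enumerate(word, start=1):
            curr.append(min(
                prev[j - 1] + (pc != wc),  # match or substitution
                prev[j] + 1,               # character of `noisy` left unmatched
                curr[j - 1] + 1,           # extra character taken from `word`
            ))
        prev = curr
    # The match may also end at any position in `word`.
    return min(prev)


def closest_words(dictionary: Iterable[str], noisy: str) -> Set[str]:
    """Words whose best-matching substring is closest to `noisy`."""
    scores = {w: best_substring_distance(w, noisy) for w in dictionary}
    best = min(scores.values())
    return {w for w, d in scores.items() if d == best}
```

For a single word of length n and a noisy string of length m this takes O(n·m) time; calling `closest_words(["matching", "string", "pattern"], "mtch")` scores every dictionary word this way and keeps the ties for first place.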

Proceedings Article

A programmable processor for approximate string matching with high throughput rate

H.-M. Bluthgen, T.G. Noll

10 Jul 2000

TL;DR: The algorithm and architecture of a processor for approximate string matching with high throughput rate are presented; the processor is dedicated to multimedia and information retrieval applications that work on huge amounts of mass data and need short response times.

Abstract: In this paper we present the algorithm and architecture of a processor for approximate string matching with high throughput rate. The processor is dedicated for multimedia and information retrieval applications working on huge amounts of mass data where short response times are necessary. The algorithm used for the approximate string matching is based on a dynamic programming procedure known as the string-to-string correction problem. It has been extended to fulfil the requirements of full text search in a database system, including string matching with wildcards and handling of idiomatic turns of some languages. The processor has been fabricated in a 0.6 μm CMOS technology. It performs a maximum of 8.5 billion character comparisons per second when operating at the specified clock frequency of 132 MHz.

27 citations
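
The dynamic programming procedure mentioned in the abstract above, the string-to-string correction problem, is commonly solved in software with a two-row edit-distance table. The sketch below is a minimal software illustration, not the processor's hardware algorithm. The single-character wildcard `?` that matches any one character at zero cost is an assumption made here for illustration; the abstract does not specify the wildcard semantics the processor supports.

```python
def edit_distance(pattern: str, text: str, wildcard: str = "?") -> int:
    """Edit distance (insert, delete, substitute, each at cost 1) between
    `pattern` and `text`. A `wildcard` character in `pattern` matches any
    single character of `text` for free (assumed semantics)."""
    # Distance from the empty pattern to each prefix of `text`.
    prev = list(range(len(text) + 1))
    for i, pc in enumerate(pattern, start=1):
        # Distance from the first i pattern characters to the empty text.
        curr = [i]
        for j, tc in enumerate(text, start=1):
            same = pc == tc or pc == wildcard
            curr.append(min(
                prev[j - 1] + (0 if same else 1),  # match or substitution
                prev[j] + 1,                       # drop a pattern character
                curr[j - 1] + 1,                   # absorb a text character
            ))
        prev = curr
    return prev[-1]
```

Each cell depends only on its left, upper, and upper-left neighbours, so only two rows need to be kept in memory at a time.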

Journal Article

Approximate Seeds of Strings

Manolis Christodoulakis, Costas S. Iliopoulos, Kunsoo Park, Jeong Seop Sim

01 Jan 2005-Journal of Automata, Languages and Combinatorics

TL;DR: This paper solves the smallest distance approximate seed problem and the restricted smallest approximate seed problem in polynomial time and proves that the general smallest approximate seed problem is NP-complete.

Abstract: In this paper we study approximate seeds of strings, that is, substrings of a given string x that cover (by concatenations or overlaps) a superstring of x, under a variety of distance rules (the Hamming distance, the edit distance, and the weighted edit distance). We solve the smallest distance approximate seed problem and the restricted smallest approximate seed problem in polynomial time and we prove that the general smallest approximate seed problem is NP-complete.

27 citations

Journal Article

ALFRED: A Practical Method for Alignment-Free Distance Computation

Sharma V. Thankachan¹, Sriram P. Chockalingam², Yongchao Liu¹, Alberto Apostolico¹, Srinivas Aluru¹

Georgia Institute of Technology¹, Indian Institute of Technology Bombay²

07 Jun 2016-Journal of Computational Biology

TL;DR: ALFRED is presented, an alignment-free distance computation method that solves the generalized common substring search problem via exact computation and makes it possible to exactly reconstruct the topology of the reference phylogenetic tree for a set of 27 primate mitochondrial genomes at reasonably acceptable speed.

Abstract: Alignment-free approaches are gaining persistent interest in many sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, especially for large-scale sequence datasets. Besides the widely used k-mer methods, the average common substring (ACS) approach has emerged as one of the well-known alignment-free approaches. Two recent works further generalize this ACS approach by allowing a bounded number k of mismatches in the common substrings, relying on approximation (linear time) and exact computation, respectively. Albeit having a good worst-case time complexity [Formula: see text], the exact approach is complex and unlikely to be efficient in practice. Herein, we present ALFRED, an alignment-free distance computation method, which solves the generalized common substring search problem via exact computation. Compared to the theoretical approach, our algorithm is easier to implement and more practical to use, while still providing highly competitive theoretical performance with an expected run-time of [Formula: see text]. By applying our program to phylogenetic inference as a case study, we find that our program makes it possible to exactly reconstruct the topology of the reference phylogenetic tree for a set of 27 primate mitochondrial genomes, at reasonably acceptable speed. ALFRED is implemented in the C++ programming language and the source code is freely available online.

27 citations
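
The quantity behind the average common substring approach with k mismatches, as described in the abstract above, can be written down directly as a brute-force computation. The sketch below is only a definition-level illustration with hypothetical function names: for every start position in `x` it finds the longest stretch that matches some stretch of `y` with at most `k` mismatches, then averages those lengths. It is far slower than ALFRED and is not the ALFRED algorithm.

```python
def longest_k_mismatch_extension(x: str, i: int, y: str, j: int, k: int) -> int:
    """Length of the longest run x[i:i+L] that equals y[j:j+L]
    in all but at most k positions."""
    length = 0
    mismatches = 0
    while i + length < len(x) and j + length < len(y):
        if x[i + length] != y[j + length]:
            mismatches += 1
            if mismatches > k:
                break  # the (k+1)-th mismatch ends the run
        length += 1
    return length


def average_common_substring(x: str, y: str, k: int = 0) -> float:
    """Average, over all start positions in x, of the longest substring
    starting there that occurs in y with at most k mismatches."""
    if not x or not y:
        raise ValueError("both sequences must be non-empty")
    total = 0
    for i in range(len(x)):
        total += max(
            longest_k_mismatch_extension(x, i, y, j, k)
            for j in range(len(y))
        )
    return total / len(x)
```

With `k=0` this reduces to the plain ACS quantity. The triple loop makes the cost grow roughly with len(x) · len(y) · match length, which is why practical tools rely on suffix-based data structures instead.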

Journal Article

Dynamic computation of generalised median strings

Xiaoyi Jiang¹, K. Abegglen², Horst Bunke², János Csirik

Technical University of Berlin¹, University of Bern²

01 Dec 2003-Pattern Analysis and Applications

TL;DR: This paper presents a novel approach that can operate in a dynamic environment, where new strings belonging to the considered set arrive steadily; it needs only the previously computed median of the set, together with the new string, to compute an updated median string of the new set.

Abstract: The generalised median string is defined as a string that has the smallest sum of distances to the elements of a given set of strings. It is a valuable tool in representing a whole set of objects by a single prototype, and has interesting applications in pattern recognition. All algorithms for computing generalised median strings known from the literature are of static nature. That is, they require all elements of the underlying set of strings to be given when the algorithm is started. In this paper, we present a novel approach that is able to operate in a dynamic environment, where there is a steady arrival of new strings belonging to the considered set. Rather than computing the median from scratch upon arrival of each new string, the proposed algorithm needs only the median of the set computed before together with the new string to compute an updated median string of the new set. Our approach is experimentally compared to a greedy algorithm and the set median using both synthetic and real data.

27 citations
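
The sum-of-distances objective that defines the generalised median string in the abstract above, and the set median used there as a comparison baseline, can be sketched in a few lines. The code below assumes plain Levenshtein distance as the string distance and uses hypothetical function names. It computes only the set median, that is, the best candidate among the given strings, and does not implement the paper's dynamic update procedure.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic two-row edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # match or substitution
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
            ))
        prev = curr
    return prev[-1]


def sum_of_distances(candidate: str, strings: Sequence[str]) -> int:
    """Objective minimised by a median string."""
    return sum(levenshtein(candidate, s) for s in strings)


def set_median(strings: Sequence[str]) -> str:
    """Member of `strings` with the smallest sum of distances to all members.
    Ties are broken by order of appearance."""
    if not strings:
        raise ValueError("need at least one string")
    return min(strings, key=lambda s: sum_of_distances(s, strings))
```

The generalised median differs from the set median in that the candidate may be any string over the alphabet, not just a member of the set, so its sum of distances can be no larger than that of the set median.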

Performance

Metrics

Papers: 1,942

Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
