Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.

Papers

Journal Article•DOI•

Spelling correction in the PubMed search engine

W. John Wilbur¹, Won Kim¹, Natalie Xie¹•Institutions (1)

National Institutes of Health¹

01 Nov 2006-Information Retrieval

TL;DR: The methodology developed is based on the noisy channel model for spelling correction and makes use of statistics harvested from user logs to estimate the probabilities of different types of edits that lead to misspellings.

Abstract: It is known that users of internet search engines often enter queries with misspellings in one or more search terms. Several web search engines make suggestions for correcting misspelled words, but the methods used are proprietary and unpublished to our knowledge. Here we describe the methodology we have developed to perform spelling correction for the PubMed search engine. Our approach is based on the noisy channel model for spelling correction and makes use of statistics harvested from user logs to estimate the probabilities of different types of edits that lead to misspellings. The unique problems encountered in correcting search engine queries are discussed and our solutions are outlined.
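
The noisy channel idea named in the abstract can be illustrated with a minimal sketch: pick the vocabulary word c that maximizes P(c) · P(typed | c). Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy vocabulary `WORD_COUNTS`, the per-edit-type weights `EDIT_PROB`, the constant `P_NO_ERROR`, and the restriction to a single edit. The paper estimates its edit probabilities from PubMed user logs, which are not reproduced here.

```python
import string

# Hypothetical toy statistics (assumed, not from the paper).
WORD_COUNTS = {"protein": 900, "proteins": 300, "protean": 5}
EDIT_PROB = {"delete": 0.3, "insert": 0.2, "substitute": 0.4, "transpose": 0.1}
P_NO_ERROR = 0.95  # assumed probability that a word is typed correctly


def single_edits(word):
    """Yield (variant, edit_type) for every string one edit away from word."""
    letters = string.ascii_lowercase
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            yield left + right[1:], "delete"
            for ch in letters:
                if ch != right[0]:
                    yield left + ch + right[1:], "substitute"
        if len(right) > 1 and right[0] != right[1]:
            yield left + right[1] + right[0] + right[2:], "transpose"
        for ch in letters:
            yield left + ch + right, "insert"


def channel_prob(typed, intended):
    """Unnormalized P(typed | intended), allowing at most one edit."""
    if typed == intended:
        return P_NO_ERROR
    kinds = {kind for variant, kind in single_edits(intended) if variant == typed}
    if not kinds:
        return 0.0
    return (1 - P_NO_ERROR) * max(EDIT_PROB[k] for k in kinds)


def correct(typed):
    """Return the vocabulary word with the highest prior * channel score, or None."""
    total = sum(WORD_COUNTS.values())
    best, best_score = None, 0.0
    for word, count in WORD_COUNTS.items():
        score = (count / total) * channel_prob(typed, word)
        if score > best_score:
            best, best_score = word, score
    return best
```

With these toy numbers, `correct("protien")` selects "protein" because a single transposition explains the typo and "protein" has the largest prior. A query with no vocabulary word within one edit yields `None`.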

36 citations

Book Chapter•DOI•

A generalized correlation attack with a probabilistic constrained edit distance

Jovan Dj. Golic¹, Slobodan Petrovic¹•Institutions (1)

University of Belgrade¹

24 May 1992

TL;DR: For a noisy clock-controlled shift register, a statistically optimal probabilistic constrained edit distance and a recursive algorithm for its efficient computation are derived, and a corresponding generalized correlation attack is proposed.

Abstract: For a noisy clock-controlled shift register, a statistically optimal probabilistic constrained edit distance and a recursive algorithm for its efficient computation are derived. A corresponding generalized correlation attack is proposed.

35 citations

Book Chapter•DOI•

Faster String Matching with Super-Alphabets

Kimmo Fredriksson¹•Institutions (1)

University of Helsinki¹

11 Sep 2002

TL;DR: This paper shows how to obtain an O(n/m) average time string matching algorithm, using a super-alphabet for simulating suffix automaton and adopting a similar technique to the shift-or algorithm, extending its bit-parallelism in another direction.

Abstract: Given a text T[1 . . . n] and a pattern P[1 . . . m] over some alphabet Σ of size σ, finding the exact occurrences of P in T requires at least Ω(n log_σ m / m) character comparisons on average, as shown in [19]. Consequently, it is believed that this lower bound implies also an Ω(n log_σ m / m) lower bound for the execution time of an optimal algorithm. However, in this paper we show how to obtain an O(n/m) average time algorithm. This is achieved by slightly changing the model of computation, and with a modification of an existing algorithm. Our technique uses a super-alphabet for simulating suffix automaton. The space usage of the algorithm is O(σm). The technique can be applied to many other string matching algorithms, including dictionary matching, which is also solved in expected time O(n/m), and approximate matching allowing k edit operations (mismatches, insertions or deletions of characters). This is solved in expected time O(nk/m) for k ≤ O(m / log_σ m). The known lower bound for this problem is Ω(n(k + log_σ m)/m), given in [6]. Finally we show how to adopt a similar technique to the shift-or algorithm, extending its bit-parallelism in another direction. This gives a speed-up by a factor s, where s is the number of characters processed simultaneously. Some of the algorithms are implemented, and we show that the methods work well in practice too. This is especially true for the shift-or algorithm, which in some cases works faster than predicted by the theory. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets. All the methods and analyses assume the RAM model of computation, and that each symbol is coded in b = ⌈log₂ σ⌉ bits. They work for larger b too, but the speed-up is decreased.
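
For orientation, the baseline shift-or algorithm that the abstract builds on can be sketched as follows. This is the classic variant that consumes one text character per step, not the paper's super-alphabet extension that processes s characters simultaneously. The function name `shift_or_search` is an assumption for this illustration, and Python's unbounded integers stand in for a machine word.

```python
from typing import List


def shift_or_search(text: str, pattern: str) -> List[int]:
    """Return the start index of every exact occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1

    # masks[c] has bit i cleared exactly when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, all_ones) & ~(1 << i)

    # Bit i of state is 0 exactly when pattern[0..i] matches the text
    # ending at the current position.
    state = all_ones
    hits = []
    for pos, c in enumerate(text):
        state = ((state << 1) | masks.get(c, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:
            hits.append(pos - m + 1)
    return hits
```

For example, `shift_or_search("abracadabra", "abra")` reports the occurrences starting at positions 0 and 7. Each text character costs one shift and one OR, which is the bit-parallelism that the paper extends in another direction.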

35 citations

Book Chapter•DOI•

Identifying Periodic Occurrences of a Template with Applications to Protein Structures

Vincent A. Fischetti¹, Gad M. Landau², Jeanette P. Schmidt², Peter H. Sellers¹•Institutions (2)

Rockefeller University¹, New York University²

29 Apr 1992

TL;DR: This work considers a string matching problem where the pattern is a template that matches many different strings with various degrees of perfection, and shows that the structure of P^n (the concatenation of n copies of the template P) can be exploited and the problem reduced to essentially solving a dynamic programming problem of size O(mn).

Abstract: We consider a string matching problem where the pattern is a template that matches many different strings with various degrees of perfection. The quality of a match is given by a penalty matrix that assigns each pair of characters a score that characterizes how well the characters match. Superfluous characters in the text and superfluous characters in the pattern may also occur, and the respective penalties for such gaps in the alignment are also given by the penalty matrix. For a text T of length n and a template P of length m, we wish to find the best alignment of T with P^n, which is the concatenation of n copies of P (m will typically be much smaller than n). Such an alignment can simply be obtained by solving a dynamic programming problem of size O(n²m) and ignoring the periodic character of P^n. We show that the structure of P^n can be exploited and the problem reduced to essentially solving a dynamic programming problem of size O(mn). If the complexity of computing gap penalties is O(1) (which is frequently the case), our algorithm runs in O(mn) time. The problem was motivated by a protein structure problem.

35 citations

Journal Article•DOI•

The Computational Hardness of Estimating Edit Distance

Alexandr Andoni¹, Robert Krauthgamer²•Institutions (2)

Massachusetts Institute of Technology¹, Weizmann Institute of Science²

01 Mar 2010-SIAM Journal on Computing

TL;DR: This work proves the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings, and provides the first setting in which the complexity of computing the edit distance is provably larger than that of Hamming distance.

Abstract: We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of estimating the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a trade-off between approximation and communication, asserting, for example, that protocols with $O(1)$ bits of communication can obtain only approximation $\alpha\geq\Omega(\log d/\log\log d)$, where $d$ is the length of the input strings. This case of $O(1)$ communication is of particular importance since it captures constant-size sketches as well as embeddings into spaces like $l_1$ and squared-$l_2$, two prevailing algorithmic approaches for dealing with edit distance. Indeed, the known nontrivial communication upper bounds are all derived from embeddings into $l_1$. By excluding low-communication protocols for edit distance, we rule out a strictly richer class of algorithms than previous results. Furthermore, our lower bound holds not only for strings over a binary alphabet but also for strings that are permutations (aka the Ulam metric). For this case, our bound nearly matches an upper bound known via embedding the Ulam metric into $l_1$. Our proof uses a new technique that relies on Fourier analysis in a rather elementary way.
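
As background for the two distances contrasted in the abstract, the sketch below computes both exactly with textbook methods. It only illustrates the definitions and says nothing about the paper's communication-complexity setting. The function names `levenshtein` and `hamming` are chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))
```

A cyclic shift shows how far apart the two measures can be: for "abcdef" and "bcdefa" the Hamming distance is 6, while the edit distance is 2 (delete the leading "a", then insert it at the end).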

35 citations

Performance metrics

Papers: 3,030

Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
