Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.

Papers

Journal Article

Approximate Boyer-Moore string matching

Jorma Tarhio, Esko Ukkonen

01 Apr 1993, SIAM Journal on Computing

TL;DR: The generalized Boyer–Moore algorithm is shown to solve the k mismatches problem and a related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with k differences.

Abstract: The Boyer–Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. The generalized Boyer–Moore algorithm is shown (under a mild independence assumption) to solve the problem in expected time $O(kn(1/(m - k) + k/c))$, where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with $\leq k$ differences (insertions, deletions, changes). Experimental evaluation of the algorithms is reported, showing that the new algorithms are often significantly faster than the old ones. Both algorithms are functionally equivalent to the Horspool version of the Boyer–Moore algorithm when $k = 0$.

117 citations
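
The two problem variants named in the abstract can be made concrete with a brute-force baseline. The sketch below is not the generalized Boyer–Moore algorithm from the paper; it is a plain O(mn) reference that assumes unit costs. The function names `k_mismatches` and `k_differences` are hypothetical, chosen only for illustration.

```python
def k_mismatches(pattern, text, k):
    """Return start positions where pattern occurs in text with at most k mismatches."""
    m, n = len(pattern), len(text)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[start + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # too many mismatches at this alignment
        if mismatches <= k:
            hits.append(start)
    return hits


def k_differences(pattern, text, k):
    """Return end positions (exclusive) of substrings of text that are
    within edit distance k of pattern (insertions, deletions, changes)."""
    m = len(pattern)
    # prev[i]: best edit distance between pattern[:i] and a substring
    # of text ending at the previous text position.
    prev = list(range(m + 1))
    hits = []
    for pos, ch in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            change = prev[i - 1] + (pattern[i - 1] != ch)
            curr[i] = min(change, prev[i] + 1, curr[i - 1] + 1)
        if curr[m] <= k:
            hits.append(pos)
        prev = curr
    return hits


print(k_mismatches("abc", "abcabdxbc", 1))  # [0, 3, 6]
print(k_differences("abc", "xabcx", 1))     # [3, 4, 5]
```

With k = 0 both functions reduce to exact matching, which mirrors the abstract's remark about the k = 0 case.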

Proceedings Article

A sublinear algorithm for weakly approximating edit distance

Tugkan Batu¹, Funda Ergün², Joe Kilian, Avner Magen³, Sofya Raskhodnikova⁴, Ronitt Rubinfeld, Rahul Sami⁵

University of Pennsylvania¹, Case Western Reserve University², University of Toronto³, Massachusetts Institute of Technology⁴, Yale University⁵

09 Jun 2003

TL;DR: The algorithm tests the edit distance by recursively subdividing the strings A and B into smaller substrings and looking for pairs of substrings in A and B with small edit distance. The paper also shows a lower bound of $\Omega(n^{\alpha/2})$ on the query complexity of every algorithm that distinguishes pairs of strings with edit distance at most $n^{\alpha}$ from those with edit distance at least $n/6$.

Abstract: We show how to determine whether the edit distance between two given strings is small in sublinear time. Specifically, we present a test which, given two n-character strings A and B, runs in time $o(n)$ and with high probability returns "CLOSE" if their edit distance is $O(n^{\alpha})$, and "FAR" if their edit distance is $\Omega(n)$, where $\alpha$ is a fixed parameter less than 1. Our algorithm for testing the edit distance works by recursively subdividing the strings A and B into smaller substrings and looking for pairs of substrings in A, B with small edit distance. To do this, we query both strings at random places using a special technique for economizing on the samples which does not pick the samples independently and provides better query and overall complexity. As a result, our test runs in time $O(n^{\max(\alpha/2,\, 2\alpha - 1)})$ for any fixed $\alpha$

116 citations

Proceedings Article

ABL: alignment-based learning

Menno van Zaanen¹

University of Leeds¹

31 Jul 2000

TL;DR: A new type of grammar learning algorithm, inspired by string edit distance, that takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences. The method works on pairs of unstructured sentences that have one or more words in common.

Abstract: This paper introduces a new type of grammar learning algorithm, inspired by string edit distance (Wagner and Fischer, 1974). The algorithm takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences. The method works on pairs of unstructured sentences that have one or more words in common. When two sentences are divided into parts that are the same in both sentences and parts that are different, this information is used to find parts that are interchangeable. These parts are taken as possible constituents of the same type. After this alignment learning step, the selection learning step selects the most probable constituents from all possible constituents. This method was used to bootstrap structure on the ATIS corpus (Marcus et al., 1993) and on the OVIS corpus (Bonnema et al., 1997). While the results are encouraging (we obtained up to 89.25% non-crossing brackets precision), this paper will point out some of the shortcomings of our approach and will suggest possible solutions.

114 citations
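
The alignment learning step described in the abstract (split two sentences into shared and differing parts, then treat the differing parts as interchangeable) can be sketched roughly as follows. This is an illustration under assumptions, not the paper's implementation: it aligns words with Python's `difflib.SequenceMatcher`, which uses longest-matching-block alignment rather than an edit-distance alignment, and the function name `interchangeable_parts` is hypothetical. The selection learning step is not covered.

```python
from difflib import SequenceMatcher


def interchangeable_parts(sentence_a, sentence_b):
    """Align two sentences word by word and return the pairs of differing
    spans that sit between shared words (candidate constituents)."""
    a, b = sentence_a.split(), sentence_b.split()
    # autojunk=False disables difflib's popularity heuristic.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # spans that differ at the same position
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs


print(interchangeable_parts(
    "show me flights from Boston to Dallas",
    "show me flights from Denver to Atlanta",
))
# [('Boston', 'Denver'), ('Dallas', 'Atlanta')]
```

Only `replace` spans are reported here; a fuller sketch might also treat pure insertions and deletions as hypotheses.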

Journal Article

Efficient Computation of the Tree Edit Distance

Mateusz Pawlik¹, Nikolaus Augsten¹

University of Salzburg¹

25 Mar 2015, ACM Transactions on Database Systems

TL;DR: RTED is shown optimal among all algorithms that use LRH (left-right-heavy) strategies, which include RTED and the fastest tree edit distance algorithms presented in literature.

Abstract: We consider the classical tree edit distance between ordered labelled trees, which is defined as the minimum-cost sequence of node edit operations that transform one tree into another. The state-of-the-art solutions for the tree edit distance are not satisfactory. The main competitors in the field either have optimal worst-case complexity but the worst case happens frequently, or they are very efficient for some tree shapes but degenerate for others. This leads to unpredictable and often infeasible runtimes. There is no obvious way to choose between the algorithms. In this article we present RTED, a robust tree edit distance algorithm. The asymptotic complexity of our algorithm is smaller than or equal to the complexity of the best competitors for any input instance, that is, our algorithm is both efficient and worst-case optimal. This is achieved by computing a dynamic decomposition strategy that depends on the input trees. RTED is shown optimal among all algorithms that use LRH (left-right-heavy) strategies, which include RTED and the fastest tree edit distance algorithms presented in literature. In our experiments on synthetic and real-world data we empirically evaluate our solution and compare it to the state-of-the-art.

112 citations
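
To make the definition in the abstract concrete (minimum-cost sequence of node edit operations between ordered labelled trees), here is a minimal memoized sketch of the textbook forest recurrence with unit costs. It is not RTED and has none of its decomposition strategy, so it is only suitable for small trees. The tree encoding `(label, (child, ...))` and the function name `tree_edit_distance` are assumptions made for this example.

```python
from functools import lru_cache


def tree_edit_distance(tree_a, tree_b):
    """Unit-cost edit distance between two ordered labelled trees.
    A tree is a tuple (label, children), where children is a tuple of trees."""

    @lru_cache(maxsize=None)
    def forest_dist(f, g):
        # f and g are ordered forests (tuples of trees).
        if not f and not g:
            return 0
        if not f:
            _, children = g[-1]
            return forest_dist(f, g[:-1] + children) + 1  # insert a node
        if not g:
            _, children = f[-1]
            return forest_dist(f[:-1] + children, g) + 1  # delete a node
        (label_a, kids_a), (label_b, kids_b) = f[-1], g[-1]
        # Delete the rightmost root of f; its children move up.
        delete = forest_dist(f[:-1] + kids_a, g) + 1
        # Insert the rightmost root of g.
        insert = forest_dist(f, g[:-1] + kids_b) + 1
        # Map the two rightmost roots onto each other (rename if labels differ).
        rename = (forest_dist(kids_a, kids_b)
                  + forest_dist(f[:-1], g[:-1])
                  + (label_a != label_b))
        return min(delete, insert, rename)

    return forest_dist((tree_a,), (tree_b,))


t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()), ("d", ())))
t3 = ("a", (("x", (("b", ()), ("c", ()))),))
print(tree_edit_distance(t1, t2))  # 1 (rename c to d)
print(tree_edit_distance(t3, t1))  # 1 (delete x)
```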

Patent

Exemplar-based natural language processing

Richard Futrell¹, Thomas R. Gruber¹

Apple Inc.¹

30 Sep 2014

TL;DR: An exemplar-based natural language processing system is described in which a semantic edit distance between a first text phrase and a second text phrase in a semantic space is determined based on one or more of an insertion cost, a deletion cost, and a substitution cost.

Abstract: Systems and processes for exemplar-based natural language processing are provided. In one example process, a first text phrase can be received. It can be determined whether editing the first text phrase to match a second text phrase requires one or more of inserting, deleting, and substituting a word of the first text phrase. In response to determining that editing the first text phrase to match the second text phrase requires one or more of inserting, deleting, and substituting a word of the first text phrase, one or more of an insertion cost, a deletion cost, and a substitution cost can be determined. A semantic edit distance between the first text phrase and the second text phrase in a semantic space can be determined based on one or more of the insertion cost, the deletion cost, and the substitution cost.

112 citations
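
The cost-based distance described in this abstract can be approximated with a word-level weighted edit distance. The sketch below is a generic dynamic program, not the patented method: the abstract does not say how the insertion, deletion, and substitution costs are derived from the semantic space, so they are passed in as caller-supplied functions. The names `weighted_edit_distance`, `toy_sub_cost`, and the synonym table are hypothetical.

```python
def weighted_edit_distance(source, target, ins_cost, del_cost, sub_cost):
    """Word-level edit distance between two token lists with
    caller-supplied insertion, deletion, and substitution costs."""
    n, m = len(source), len(target)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost(source[i - 1])
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost(target[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j] + del_cost(source[i - 1]),
                dist[i][j - 1] + ins_cost(target[j - 1]),
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[n][m]


# Toy stand-in for semantic similarity: near-synonyms are cheap to swap.
SYNONYMS = {("film", "movie"), ("movie", "film")}


def toy_sub_cost(a, b):
    if a == b:
        return 0.0
    return 0.5 if (a, b) in SYNONYMS else 1.0


def unit(_word):
    return 1.0


print(weighted_edit_distance(
    "play a film".split(), "play the movie".split(), unit, unit, toy_sub_cost
))  # 1.5
```

Passing the costs as functions keeps the dynamic program independent of whatever similarity model supplies them.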

Performance metrics

Papers: 3,030
Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
