Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published on this topic, receiving 71,491 citations.


Papers
Posted Content
TL;DR: The fastest known algorithm for tree edit distance runs in cubic $O(n^3)$ time and is based on a dynamic programming solution similar to that for string edit distance; this paper gives evidence that a truly subcubic algorithm is unlikely.
Abstract: The edit distance between two rooted ordered trees with $n$ nodes labeled from an alphabet~$\Sigma$ is the minimum cost of transforming one tree into the other by a sequence of elementary operations consisting of deleting and relabeling existing nodes, as well as inserting new nodes. Tree edit distance is a well-known generalization of string edit distance. The fastest known algorithm for tree edit distance runs in cubic $O(n^3)$ time and is based on a dynamic programming solution similar to that for string edit distance. In this paper we show that a truly subcubic $O(n^{3-\varepsilon})$ time algorithm for tree edit distance is unlikely: For $|\Sigma| = \Omega(n)$, a truly subcubic algorithm for tree edit distance implies a truly subcubic algorithm for the all pairs shortest paths problem. For $|\Sigma| = O(1)$, a truly subcubic algorithm for tree edit distance implies an $O(n^{k-\varepsilon})$ algorithm for finding a maximum weight $k$-clique. Thus, while in terms of upper bounds string edit distance and tree edit distance are highly related, in terms of lower bounds string edit distance exhibits the hardness of the strong exponential time hypothesis [Backurs, Indyk STOC'15] whereas tree edit distance exhibits the hardness of all pairs shortest paths. Our result provides a matching conditional lower bound for one of the last remaining classic dynamic programming problems.

19 citations
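
For reference, the classic string edit distance dynamic program that the cubic tree edit distance algorithm generalizes can be sketched in a few lines; unit costs for insertion, deletion, and relabeling are assumed here.

```python
def string_edit_distance(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) dynamic program for edit (Levenshtein) distance."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distances from the empty prefix of a to every prefix of b
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                           # delete a[i-1]
                         cur[j - 1] + 1,                        # insert b[j-1]
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # relabel a[i-1] if it differs
        prev = cur
    return prev[n]

print(string_edit_distance("kitten", "sitting"))  # 3
```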

Proceedings ArticleDOI
01 Dec 2011
TL;DR: It is shown that morpholexical and n-best-list features are effective in improving the accuracy of the system (by 0.8%); the proposed methods are evaluated on a Turkish broadcast news transcription task.
Abstract: This paper explores rich morphological and novel n-best-list features for reranking automatic speech recognition hypotheses. The morpholexical features are defined over the morphological features obtained by using an n-gram language model over lexical and grammatical morphemes in the first-pass. The n-best-list features for each hypothesis are defined using that hypothesis and other alternate hypotheses in an n-best list. Our methodology is to align each hypothesis with other hypotheses one by one using minimum edit distance alignment. This gives us a set of edit operations - substitution, addition and deletion as seen in these alignments. These edit operations constitute our n-best-list features as indicator features. The reranking model is trained using a word error rate sensitive averaged perceptron algorithm introduced in this paper. The proposed methods are evaluated on a Turkish broadcast news transcription task. The baseline systems are word and statistical sub-word systems which also employ morphological features for reranking. We show that morpholexical and n-best-list features are effective in improving the accuracy of the system (0.8%).

19 citations
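
As an illustration of the minimum edit distance alignment step described above, the sketch below aligns one hypothesis against the other hypotheses in an n-best list and collects the resulting substitution, insertion, and deletion operations as indicator-style features. The function names and the (operation, word) feature encoding are illustrative assumptions, not the paper's implementation.

```python
def edit_ops(hyp, other):
    """Word-level minimum edit distance alignment of two hypotheses; returns the
    substitution, deletion, and insertion operations seen in the alignment."""
    m, n = len(hyp), len(other)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,                                 # deletion
                           dp[i][j - 1] + 1,                                 # insertion
                           dp[i - 1][j - 1] + (hyp[i - 1] != other[j - 1]))  # match / substitution
    ops, i, j = [], m, n  # backtrace to recover the edit operations
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (hyp[i - 1] != other[j - 1]):
            if hyp[i - 1] != other[j - 1]:
                ops.append(("sub", hyp[i - 1], other[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", hyp[i - 1]))
            i -= 1
        else:
            ops.append(("ins", other[j - 1]))
            j -= 1
    return ops

def nbest_indicator_features(hypothesis, nbest):
    """Hypothetical indicator features: counts of edit operations observed when
    aligning `hypothesis` one by one against the other hypotheses in the list."""
    feats = {}
    for other in nbest:
        if other == hypothesis:
            continue
        for op in edit_ops(hypothesis.split(), other.split()):
            feats[op] = feats.get(op, 0) + 1
    return feats

nbest = ["the cat sat on the mat", "the cat sat on a mat", "a cat sat on the mat"]
print(nbest_indicator_features(nbest[0], nbest))
```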

Book ChapterDOI
21 Jul 2004
TL;DR: A novel solution is proposed for error-tolerant graph matching by extending the original edit-distance-based framework to account for a new operator that supports node merging during the matching process.
Abstract: In this paper a novel solution is proposed for error-tolerant graph matching. The solution belongs to the class of edit-distance-based techniques. In particular, the original edit-distance-based framework is extended so as to account for a new operator to support node merging during the matching process.

19 citations

Proceedings ArticleDOI
01 Apr 2014
TL;DR: This paper investigates and evaluates the use of several matching algorithms, including the edit distance algorithm that is believed to be at the heart of most modern commercial translation memory systems, and shows how well various matching algorithms correlate with human judgments of helpfulness.
Abstract: Translation Memory (TM) systems are one of the most widely used translation technologies. An important part of TM systems is the matching algorithm that determines what translations get retrieved from the bank of available translations to assist the human translator. Although detailed accounts of the matching algorithms used in commercial systems can’t be found in the literature, it is widely believed that edit distance algorithms are used. This paper investigates and evaluates the use of several matching algorithms, including the edit distance algorithm that is believed to be at the heart of most modern commercial TM systems. This paper presents results showing how well various matching algorithms correlate with human judgments of helpfulness (collected via crowdsourcing with Amazon’s Mechanical Turk). A new algorithm based on weighted n-gram precision that can be adjusted for translator length preferences consistently returns translations judged to be most helpful by translators for multiple domains and language pairs.

19 citations
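
The following is a minimal sketch of a weighted n-gram precision matcher with a length-preference adjustment, in the spirit of the algorithm described above; the uniform weights, the penalty form, and all names are assumptions made for illustration rather than the formula evaluated in the paper.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def weighted_ngram_precision(query, candidate, max_n=4, weights=None, length_penalty=0.5):
    """Score a TM candidate against the query segment by weighted n-gram precision,
    damped by a length-difference penalty that encodes a length preference."""
    q, c = query.split(), candidate.split()
    weights = weights or [1.0 / max_n] * max_n  # uniform weights by default (assumption)
    score = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(c, n), ngrams(q, n)
        total = sum(cand.values())
        if total == 0:
            continue
        matched = sum(min(count, ref[g]) for g, count in cand.items())
        score += weights[n - 1] * matched / total
    # Penalize candidates whose length differs from the query; increasing
    # length_penalty favours length-matched candidates more strongly.
    diff = abs(len(q) - len(c)) / max(len(q), len(c), 1)
    return score * (1.0 - length_penalty * diff)

tm_bank = ["the contract shall enter into force", "the agreement enters into force today"]
query = "the contract enters into force"
print(max(tm_bank, key=lambda seg: weighted_ngram_precision(query, seg)))
```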

Proceedings ArticleDOI
14 Nov 2005
TL;DR: This paper studies the problem of computing the smallest edit distance between any pair of distinct words of a regular language, where the edit distance between two words is the smallest number of substitutions, insertions, and deletions needed to transform one word into the other.
Abstract: The edit distance (or Levenshtein distance) between two words is the smallest number of substitutions, insertions, and deletions of symbols that can be used to transform one of the words into the other. In this paper we consider the problem of computing the edit distance of a regular language (also known as a constraint system), that is, the set of words accepted by a given finite automaton. This quantity is the smallest edit distance between any pair of distinct words of the language. We show that the problem is of polynomial time complexity. We distinguish two cases depending on whether the given automaton is deterministic or nondeterministic. In the latter case the time complexity is higher.

19 citations
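
To make the quantity concrete, the sketch below computes the "edit distance of a regular language" for a toy DFA by brute force: it enumerates the accepted words up to a length bound and takes the minimum pairwise edit distance over distinct words. This exponential enumeration and the function names are only for illustration; it is not the polynomial-time algorithm of the paper.

```python
from itertools import product, combinations

def levenshtein(a, b):
    """Standard dynamic program for the edit distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def inner_edit_distance(alphabet, start, accept, delta, max_len=6):
    """Brute-force edit distance of the language of a DFA: the minimum edit
    distance over all pairs of distinct accepted words of length <= max_len.
    `delta` maps (state, symbol) -> state."""
    accepted = []
    for length in range(max_len + 1):
        for word in product(alphabet, repeat=length):
            state = start
            for sym in word:
                state = delta.get((state, sym))
                if state is None:
                    break
            else:
                if state in accept:
                    accepted.append("".join(word))
    return min(levenshtein(u, v) for u, v in combinations(accepted, 2))

# Toy DFA over {a, b} accepting exactly the words whose length is a multiple of 3.
delta = {(s, c): (s + 1) % 3 for s in range(3) for c in "ab"}
print(inner_edit_distance("ab", start=0, accept={0}, delta=delta, max_len=4))  # 1
```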


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023      39
2022      96
2021     111
2020     149
2019     145
2018     139