
Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications on this topic have received 71,491 citations.


Papers
Journal ArticleDOI
TL;DR: This paper focuses on variations of the unordered tree edit distance that are tractable by algorithms built on a network-algorithm submodule, either the minimum-cost maximum-flow algorithm or the maximum weighted bipartite matching algorithm, and shows that the two network algorithms are interchangeable.
Abstract: In this paper, we investigate the problem of computing structural sensitive variations of an unordered tree edit distance. First, we focus on the variations tractable by the algorithms including the submodule of a network algorithm, either the minimum cost maximum flow algorithm or the maximum weighted bipartite matching algorithm. Then, we show that both network algorithms are replaceable, and hence the time complexity of computing these variations can be reduced to O(nmd) time, where n is the number of nodes in a tree, m is the number of nodes in another tree and d is the minimum degree of given two trees. Next, we show that the problem of computing the bottom-up distance is MAX SNP-hard. Note that the well-known linear-time algorithm for the bottom-up distance designed by Valiente (2001) computes just a bottom-up indel (insertion-deletion) distance allowing no substitutions.
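The maximum weighted bipartite matching submodule mentioned in the abstract can be illustrated with a brute-force sketch. This is an exponential-time illustration for tiny instances only (the paper's algorithms use polynomial-time matching routines); the function name and matrix representation here are illustrative choices, not taken from the paper:

```python
from itertools import permutations

def max_weight_matching(weights):
    """Brute-force maximum-weight bipartite matching for a small square
    weight matrix (rows = left nodes, columns = right nodes).
    Returns (best total weight, assignment as a tuple where entry i is
    the right node matched to left node i)."""
    n = len(weights)
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return best, best_perm
```

In tree edit distance algorithms, the weight matrix would hold the (negated) costs of matching subtrees of one node's children against another's; the matching picks the cheapest overall child correspondence.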

17 citations

Proceedings Article
01 May 2004
TL;DR: This paper explores the link between legitimate translation variation and statistical measures of a word's salience within a given document, such as tf.idf scores, and shows that the use of such scores extends the N-gram distance measures in a way that allows us to accurately predict multiple quality parameters of the text.
Abstract: Automatic methods for MT evaluation are often based on the assumption that MT quality is related to some kind of distance between the evaluated text and a professional human translation (e.g., an edit distance or the precision of matched N-grams). However, independently produced human translations are necessarily different, conveying the same content by dissimilar means. Such legitimate translation variation is a serious problem for distance-based evaluation methods, because mismatches do not necessarily mean degradation in MT quality. In this paper we explore the link between legitimate translation variation and statistical measures of a word's salience within a given document, such as tf.idf scores. We show that the use of such scores extends the N-gram distance measures in a way that allows us to accurately predict multiple quality parameters of the text, such as translation adequacy and fluency. However, legitimate translation variation also reveals fundamental limits on the applicability of distance-based MT evaluation methods and on data-driven architectures for MT.
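As a reference point for the tf.idf salience scores the paper builds on, here is a minimal sketch of the standard formulation (term frequency times log inverse document frequency); the paper's exact weighting and smoothing scheme may differ:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf.idf scores for each word in each document.
    docs: list of token lists. Returns one {word: score} dict per document.
    A word occurring in every document gets score 0 (log(N/N) = 0),
    reflecting that it carries no document-specific salience."""
    n = len(docs)
    # document frequency: in how many documents each word appears
    df = Counter(w for doc in docs for w in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        scores.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return scores
```

Under this weighting, N-gram matches on high-tf.idf words count for more than matches on ubiquitous function words, which is the intuition behind extending N-gram distance measures with salience.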

17 citations

Posted Content
TL;DR: This paper presents the first efficient deterministic protocol for this problem, where efficiency is measured in both message size and running time, and derives from it an error-correcting code that remains efficient even for large numbers of (adversarial) edit errors.
Abstract: Suppose that we have two parties that possess each a binary string. Suppose that the length of the first string (document) is $n$ and that the two strings (documents) have edit distance (minimal number of deletes, inserts and substitutions needed to transform one string into the other) at most $k$. The problem we want to solve is to devise an efficient protocol in which the first party sends a single message that allows the second party to guess the first party's string. In this paper we show an efficient deterministic protocol for this problem. The protocol runs in time $O(n\cdot \mathtt{polylog}(n))$ and has message size $O(k^2+k\log^2n)$ bits. To the best of our knowledge, ours is the first efficient deterministic protocol for this problem, if efficiency is measured in both the message size and the running time. As an immediate application of our new protocol, we show a new error correcting code that is efficient even for large numbers of (adversarial) edit errors.
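The string edit distance defined in this abstract (the minimal number of deletes, inserts, and substitutions needed to transform one string into the other) is the classic Levenshtein distance, computable by dynamic programming. A minimal sketch (quadratic time, not the communication-efficient protocol of the paper):

```python
def edit_distance(s, t):
    """Levenshtein distance between strings s and t: the minimum number
    of insertions, deletions, and substitutions turning s into t."""
    m, n = len(s), len(t)
    # prev[j] = distance between s[:i-1] and t[:j]; row 0 is 0..n
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete s[i-1]
                          curr[j - 1] + 1,     # insert t[j-1]
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[n]
```

The protocol in the paper avoids computing this table directly: the whole point is that the sender transmits only O(k² + k log² n) bits rather than the full string.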

17 citations

Proceedings ArticleDOI
22 Jun 2013
TL;DR: This paper proposes the first index structure for subtree similarity-search, provided that the unit cost function is used; extensive experimentation and comparison to previous work show the large improvement gained when using the proposed index structure and processing algorithm.
Abstract: Given a tree Q and a large set of trees T = {T1,...,Tn}, the subtree similarity-search problem is that of finding the subtrees of trees among T that are most similar to Q, using the tree edit distance metric. Determining similarity using tree edit distance has been proven useful in a variety of application areas. While subtree similarity-search has been studied in the past, solutions required traversal of all of T, which poses a severe bottleneck in processing time, as T grows larger. This paper proposes the first index structure for subtree similarity-search, provided that the unit cost function is used. Extensive experimentation and comparison to previous work shows the huge improvement gained when using the proposed index structure and processing algorithm.

17 citations

Eva Forsbom1
01 Jan 2003
TL;DR: Preliminary experiments showed that the measures are not portable without redefinitions, so two new measures are defined, WAFT and NEVA, which could be applied for both purposes and granularities.
Abstract: Two string comparison measures, edit distance and n-gram co-occurrence, are tested for automatic evaluation of translation quality, where the quality is compared to one or several reference translations. The measures are tested in combination for diagnost…

17 citations


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  39
2022  96
2021  111
2020  149
2019  145
2018  139