Journal Article

Efficient Computation of the Tree Edit Distance

TLDR
RTED is shown optimal among all algorithms that use LRH (left-right-heavy) strategies, which include RTED and the fastest tree edit distance algorithms presented in literature.
Abstract
We consider the classical tree edit distance between ordered labelled trees, which is defined as the minimum-cost sequence of node edit operations that transform one tree into another. The state-of-the-art solutions for the tree edit distance are not satisfactory. The main competitors in the field either have optimal worst-case complexity but the worst case happens frequently, or they are very efficient for some tree shapes but degenerate for others. This leads to unpredictable and often infeasible runtimes. There is no obvious way to choose between the algorithms. In this article we present RTED, a robust tree edit distance algorithm. The asymptotic complexity of our algorithm is smaller than or equal to the complexity of the best competitors for any input instance, that is, our algorithm is both efficient and worst-case optimal. This is achieved by computing a dynamic decomposition strategy that depends on the input trees. RTED is shown optimal among all algorithms that use LRH (left-right-heavy) strategies, which include RTED and the fastest tree edit distance algorithms presented in literature. In our experiments on synthetic and real-world data we empirically evaluate our solution and compare it to the state-of-the-art.
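
To make the definition in the abstract concrete, here is a minimal Python sketch of the tree edit distance between ordered labelled trees. It is not RTED: it always decomposes forests at the rightmost root, whereas the abstract describes a decomposition strategy computed from the input trees. The tuple-based tree encoding, the unit costs for delete, insert and rename, and all function names are assumptions made for this illustration only.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees. A forest is a tuple of trees. Tuples are hashable, so forests
# can be used directly as memoization keys.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    f_label, f_children = f[-1]
    g_label, g_children = g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + f_children, g) + 1
    # Insert the rightmost root of g; its children take its place.
    insert = forest_distance(f, g[:-1] + g_children) + 1
    # Map the two rightmost roots onto each other (rename if labels differ),
    # then solve the child forests and the remaining forests independently.
    rename = (
        forest_distance(f_children, g_children)
        + forest_distance(f[:-1], g[:-1])
        + (0 if f_label == g_label else 1)
    )
    return min(delete, insert, rename)


def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))


if __name__ == "__main__":
    t1 = ("a", (("b", ()), ("c", ())))
    t2 = ("a", (("b", ()), ("d", ())))
    t3 = ("a", (("x", (("b", ()), ("c", ()))),))
    print(tree_edit_distance(t1, t2))  # 1: rename c to d
    print(tree_edit_distance(t1, t3))  # 1: insert x between a and its children
```

The memoized recursion keeps the sketch short but makes no attempt at the runtime or memory behaviour the abstract is concerned with; it only shows which quantity is being computed.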


Citations
Journal Article

Tree edit distance

TL;DR: A new, memory efficient algorithm for the tree edit distance, AP-TED (All Path Tree Edit Distance), which runs at least as fast as RTED without trading in memory efficiency and develops new single-path functions which are better in terms of runtime and memory than the previously used functions.
Proceedings Article

FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm

TL;DR: A Chinese spell checker – FASPell based on a new paradigm which consists of a denoising autoencoder (DAE) and a decoder and helps to eliminate the use of confusion set that is deficient in flexibility and sufficiency of utilizing the salient feature of Chinese character similarity.
Proceedings Article

Navigating the Maze of Wikidata Query Logs

TL;DR: This paper provides an in-depth and diversified analysis of the Wikidata query logs, recently made publicly available, providing a thorough characterization of the queries in terms of their expressive power, their topological structure and shape, along with a deeper understanding of the usage of recursion in these logs.
Posted Content

Tree-Transformer: A Transformer-Based Method for Correction of Tree-Structured Data

TL;DR: This paper presents the Tree-Transformer, a novel neural network architecture designed to translate between arbitrary input and output trees, and applied this architecture to correction tasks in both the source code and natural language domains.
Proceedings Article

Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering

TL;DR: ParaBank 2 is described, a new resource that contains multiple diverse sentential paraphrases, produced from a bilingual corpus using negative constraints, inference sampling, and clustering, showing that ParaBank 2 significantly surpasses prior work in both lexical and syntactic diversity while being meaning-preserving.
References
Journal Article

Simple fast algorithms for the editing distance between trees and related problems

TL;DR: Algorithms are designed to answer the following kinds of questions about trees: what is the distance between two trees, and the analogous question for prunings as for subtrees.
Journal Article

A data structure for dynamic trees

TL;DR: An O(mn log n)-time algorithm is obtained to find a maximum flow in a network of n vertices and m edges, beating by a factor of log n the fastest algorithm previously known for sparse graphs.
Journal Article

The Tree-to-Tree Correction Problem

TL;DR: An algorithm is presented which solves the problem of determining the distance from T to T' as measured by the minimum cost sequence of edit operations needed to transform T into T'.
Journal Article

A survey on tree edit distance and related problems

TL;DR: This work surveys the problem of comparing labeled trees based on simple local operations of deleting, inserting, and relabeling nodes and presents one or more of the central algorithms for solving the problem.
Proceedings Article

A data structure for dynamic trees

TL;DR: An O(mn log n)-time algorithm is obtained to find a maximum flow in a network of n vertices and m edges, beating by a factor of log n the fastest algorithm previously known for sparse graphs.