Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have appeared within this topic, receiving 71,491 citations.
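
Informally, the edit distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The minimal sketch below is the standard textbook dynamic program, given for orientation only; it is not drawn from any of the papers listed on this page.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions turning a into b (classic O(len(a)*len(b)) DP)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distances from the empty prefix of a
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                curr[j - 1] + 1,                       # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitute or match
            )
        prev = curr
    return prev[n]

assert levenshtein("kitten", "sitting") == 3
```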


Papers
Proceedings ArticleDOI
06 Aug 2009
TL;DR: A new method is introduced that improves the tree edit distance approach to textual entailment recognition by using particle swarm optimization to automatically estimate the optimal edit-operation costs over the RTE development datasets.
Abstract: This paper introduces a new method that uses particle swarm optimization to improve the tree edit distance approach to textual entailment recognition. Currently, one of the main constraints of recognizing textual entailment with tree edit distance is tuning the costs of the edit operations, which is a difficult and challenging task given the entailment problem and its datasets. We estimate the costs of the edit operations in the tree edit distance algorithm automatically, in order to improve the results for textual entailment. By automatically estimating the optimal operation costs over all RTE development datasets, we obtained a significant improvement in accuracy on the test sets.

15 citations
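
A hedged sketch of the optimization loop described above: treat the three edit-operation costs as a particle's position and let the swarm search for the cost vector that maximizes entailment accuracy on a development set. Everything concrete here is an illustrative assumption rather than the paper's setup: the paper tunes a tree edit distance over RTE data, while this sketch substitutes a weighted string edit distance, a hypothetical `dev_set` of `(text, hypothesis, label)` triples, a distance-threshold classifier, and textbook PSO hyperparameters.

```python
import random

def weighted_edit_distance(a, b, c_ins, c_del, c_sub):
    """String edit distance with tunable operation costs
    (a stand-in for the paper's tree edit distance)."""
    m, n = len(a), len(b)
    prev = [j * c_ins for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [i * c_del] + [0.0] * n
        for j in range(1, n + 1):
            curr[j] = min(prev[j] + c_del,      # delete a[i-1]
                          curr[j - 1] + c_ins,  # insert b[j-1]
                          prev[j - 1] + (c_sub if a[i - 1] != b[j - 1] else 0.0))
        prev = curr
    return prev[n]

def accuracy(costs, dev_set, threshold=0.5):
    """Fraction of (text, hypothesis, label) triples classified correctly
    by thresholding the length-normalized edit distance."""
    c_ins, c_del, c_sub = costs
    correct = 0
    for text, hyp, entails in dev_set:
        d = weighted_edit_distance(text, hyp, c_ins, c_del, c_sub)
        pred = d / max(len(text), len(hyp), 1) < threshold
        correct += (pred == entails)
    return correct / len(dev_set)

def pso_tune(dev_set, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Plain-vanilla PSO over the (c_ins, c_del, c_sub) cost vector."""
    dim = 3
    pos = [[random.uniform(0.1, 2.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [accuracy(p, dev_set) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = max(0.01, pos[i][d] + vel[i][d])  # keep costs positive
            f = accuracy(pos[i], dev_set)
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```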

Proceedings ArticleDOI
01 Oct 2012
TL;DR: It is suggested that generalizing to other objects, such as sequences, and other measures, such as edit distance, may lead to a theory of reconciling objects over graphs, which may have practical consequences for modern cloud-based deployments.
Abstract: We explore the connections between the classical problems of set difference and error correction codes, motivated by some recent results on Invertible Bloom Filters (communication-efficient set difference) and Biff Codes (fast error correction coding based on set difference). In particular, we seek to understand how these results generalize to settings where many parties communicate over a network represented by a graph, and the goal is for the parties to reconcile the objects owned by each, for some suitable definition of reconcile. Our general framework encompasses standard problems such as rumor spreading and network coding. We suggest that generalizing to other objects such as sequences with other measures such as edit distance may lead to a theory of reconciling objects over graphs. Such a theory may have practical consequences for modern cloud-based deployments.

15 citations
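
To make the set-difference primitive concrete, here is a toy Invertible Bloom Filter in the spirit of the structures the abstract builds on. The cell layout, SHA-256-based hashing, table size `m`, hash count `k`, and the restriction to nonzero integer elements are all simplifications chosen for illustration, not the exact construction of the cited papers; decoding succeeds only with high probability, and only when the difference is small relative to `m`.

```python
import hashlib

def _h(x: int, salt: int, m: int) -> int:
    """Hash element x into one of m cells, per salt."""
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def _fp(x: int) -> int:
    """Fingerprint used to detect 'pure' cells during peeling."""
    return int.from_bytes(hashlib.sha256(f"fp:{x}".encode()).digest()[:8], "big")

class IBF:
    """Toy Invertible Bloom Filter over nonzero integer elements."""
    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.count = [0] * m
        self.id_sum = [0] * m
        self.fp_sum = [0] * m

    def insert(self, x: int):
        for salt in range(self.k):
            i = _h(x, salt, self.m)
            self.count[i] += 1
            self.id_sum[i] ^= x
            self.fp_sum[i] ^= _fp(x)

    def subtract(self, other: "IBF") -> "IBF":
        """Cell-wise difference; shared elements cancel out."""
        out = IBF(self.m, self.k)
        for i in range(self.m):
            out.count[i] = self.count[i] - other.count[i]
            out.id_sum[i] = self.id_sum[i] ^ other.id_sum[i]
            out.fp_sum[i] = self.fp_sum[i] ^ other.fp_sum[i]
        return out

    def decode(self):
        """Peel pure cells; returns (A - B, B - A) if decoding succeeds."""
        a_minus_b, b_minus_a = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.m):
                if self.count[i] in (1, -1) and self.fp_sum[i] == _fp(self.id_sum[i]):
                    x, sign = self.id_sum[i], self.count[i]
                    (a_minus_b if sign == 1 else b_minus_a).add(x)
                    for salt in range(self.k):
                        j = _h(x, salt, self.m)
                        self.count[j] -= sign
                        self.id_sum[j] ^= x
                        self.fp_sum[j] ^= _fp(x)
                    progress = True
        return a_minus_b, b_minus_a

# Toy usage: reconcile {1, 2, 3, 4} and {3, 4, 5}.
a, b = IBF(), IBF()
for x in (1, 2, 3, 4): a.insert(x)
for x in (3, 4, 5):    b.insert(x)
print(a.subtract(b).decode())  # ({1, 2}, {5}) with high probability
```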

Journal Article
TL;DR: This paper uses an adaptation of the Expectation-Maximization algorithm to learn the primitive edit costs of a stochastic tree edit distance (ED), and a series of experiments confirms the benefit of learning a tree ED rather than imposing edit costs a priori.
Abstract: Trees provide a well-suited structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, and conversion of tree-structured documents. In this context, many applications require computing similarities between pairs of trees. The most studied such distance is likely the tree edit distance (ED), whose complexity has been improved over the last decade. However, the classic ED usually relies on a priori fixed edit costs, which are often difficult to tune and leave little room for tackling complex problems. In this paper, we focus on learning a stochastic tree ED. We use an adaptation of the Expectation-Maximization algorithm to learn the primitive edit costs, and we carried out a series of experiments that confirm the benefit of learning a tree ED rather than imposing edit costs a priori.

15 citations
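
For context on what learning a stochastic edit distance means, the string-case building block (in the spirit of Ristad and Yianilos) replaces fixed costs with operation probabilities: the forward algorithm below scores a string pair under such a model, and an EM step, not shown, would re-estimate each operation's probability from its expected usage count under this distribution. The log-probability dictionaries and the memoryless model are illustrative assumptions; the paper itself learns costs for trees.

```python
import math

def _logadd(x, y):
    """log(exp(x) + exp(y)) computed stably."""
    if x == float("-inf"): return y
    if y == float("-inf"): return x
    hi, lo = (x, y) if x >= y else (y, x)
    return hi + math.log1p(math.exp(lo - hi))

def log_likelihood(a, b, logp_del, logp_ins, logp_sub, logp_stop):
    """Forward log-probability of the pair (a, b) under a memoryless
    stochastic edit model: each step emits a deletion of a character of a,
    an insertion of a character of b, or a substitution, and the process
    stops with probability exp(logp_stop). For a proper distribution the
    operation probabilities plus exp(logp_stop) should sum to 1."""
    m, n = len(a), len(b)
    NEG = float("-inf")
    f = [[NEG] * (n + 1) for _ in range(m + 1)]
    f[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if f[i][j] == NEG:
                continue
            if i < m:            # delete a[i]
                f[i + 1][j] = _logadd(f[i + 1][j], f[i][j] + logp_del[a[i]])
            if j < n:            # insert b[j]
                f[i][j + 1] = _logadd(f[i][j + 1], f[i][j] + logp_ins[b[j]])
            if i < m and j < n:  # substitute a[i] -> b[j]
                f[i + 1][j + 1] = _logadd(f[i + 1][j + 1],
                                          f[i][j] + logp_sub[(a[i], b[j])])
    return f[m][n] + logp_stop
```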

Proceedings ArticleDOI
25 Aug 2013
TL;DR: A new type of region convexity is defined that allows convex regions to have some concavity; the resulting Near Convex Region Adjacency Graph (NCRAG) is used to formulate symbol spotting in graphical documents as a subgraph matching problem.
Abstract: This paper deals with a subgraph matching problem on Region Adjacency Graphs (RAGs) applied to symbol spotting in graphical documents. The RAG is an important, efficient, and natural way of representing graphical information with a graph, but it is limited to cases where the information is well defined by perfectly delineated regions. What if the information we are interested in is not confined within well-defined regions? This paper addresses that problem by defining near convex groupings of oriented line segments, which yield near convex regions. Pure convexity imposes hard constraints and cannot handle all cases efficiently, so we define a new type of region convexity that allows convex regions to have concavity to some extent. We call such regions Near Convex Regions (NCRs). The NCRs are then used to create the Near Convex Region Adjacency Graph (NCRAG), and with this representation we formulate symbol spotting in graphical documents as a subgraph matching problem. For subgraph matching we use the Approximate Edit Distance Algorithm (AEDA) on neighborhood strings: it starts by finding a key node in the input (target) graph and then iteratively identifies similar nodes of the query graph in the neighborhood of the key node. Experiments are performed on artificial, real, and distorted datasets.

15 citations
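
A loose sketch of the neighborhood-string idea: serialize each node's local neighborhood as a label string and compare strings with edit distance to score candidate key nodes. The label/adjacency dictionaries, one-character region labels, and distance threshold below are hypothetical, and the real AEDA goes further, iteratively expanding the match outward from the key node.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (same recurrence as the first sketch)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def neighborhood_string(labels, adj, v):
    """A node's label followed by its neighbors' labels in sorted order,
    so that string edit distance approximates local structural similarity."""
    return labels[v] + "".join(sorted(labels[u] for u in adj[v]))

def spot_candidates(q_labels, q_adj, q_key, t_labels, t_adj, max_dist=2):
    """Rank target-graph nodes by edit distance between their neighborhood
    string and the query key node's neighborhood string."""
    qs = neighborhood_string(q_labels, q_adj, q_key)
    hits = [(edit_distance(qs, neighborhood_string(t_labels, t_adj, v)), v)
            for v in t_labels]
    return sorted(h for h in hits if h[0] <= max_dist)
```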

Posted Content
TL;DR: The problem of transforming one set of elements into another by a sequence of elementary edit operations, namely substitutions, removals, and insertions, is formalized as an extension of the linear sum assignment problem (LSAP) that requires only one additional element in each set to represent removals and insertions, avoiding the classical full augmentation under which the LSAP finds an optimal bijection between the two augmented sets.
Abstract: We consider the problem of transforming a set of elements into another by a sequence of elementary edit operations, namely substitutions, removals, and insertions of elements. Each possible edit operation is penalized by a non-negative cost, and the cost of a transformation is measured by summing the costs of its operations. A solution to this problem is a transformation of minimal cost among all possible transformations. To compute such a solution, the classical approach represents removal and insertion operations by augmenting the two sets so that they reach the same size. This allows the problem to be expressed as a linear sum assignment problem (LSAP), which finds an optimal bijection (or permutation, perfect matching) between the two augmented sets. While the LSAP is known to be efficiently solvable in polynomial time, for instance with the Hungarian algorithm, time and memory are wasted treating the elements added to the initial sets. In this report, we show that the problem can be formalized as an extension of the LSAP that considers only one additional element in each set to represent removal and insertion operations. A solution is then no longer represented as a bijection between the two augmented sets. We show that the resulting problem is a binary linear program (BLP) very close to the LSAP. While it can be solved by any BLP solver, we propose an adaptation of the Hungarian algorithm that improves the time and memory complexities previously obtained by the LSAP-based approach. The importance of the improvement increases as the sizes of the two sets and their absolute difference increase. Based on the analysis presented in this report, other classical algorithms can be adapted as well.

15 citations
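
A sketch of the classical LSAP encoding this report starts from, using SciPy's assignment solver as the Hungarian-style algorithm; the cost functions and toy data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def set_edit_distance(X, Y, c_sub, c_del, c_ins):
    """Baseline LSAP encoding: augment X with len(Y) epsilon elements and
    Y with len(X), then solve one square (len(X)+len(Y))-sized assignment
    problem. c_sub(x, y), c_del(x), c_ins(y) are user-supplied
    non-negative cost functions."""
    n, m = len(X), len(Y)
    BIG = 1e9                              # forbids illegal epsilon pairings
    C = np.full((n + m, n + m), BIG)
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            C[i, j] = c_sub(x, y)          # substitute x by y
        C[i, m + i] = c_del(x)             # remove x (map it to its epsilon)
    for j, y in enumerate(Y):
        C[n + j, j] = c_ins(y)             # insert y (map an epsilon to y)
    C[n:, m:] = 0.0                        # epsilon-to-epsilon is free
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()

# Toy usage: unit removal/insertion costs, |x - y| substitution cost.
print(set_edit_distance([1, 5], [2, 9, 10],
                        c_sub=lambda x, y: abs(x - y),
                        c_del=lambda x: 1.0,
                        c_ins=lambda y: 1.0))  # prints 4.0
```

Per the abstract, the report's contribution is to replace these len(X) + len(Y) epsilon rows and columns with a single extra element per set, solved by an adapted Hungarian algorithm; the sketch above shows only the baseline it improves on.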


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139