Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have appeared within this topic, receiving 71,491 citations.
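
Informally, the edit distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The minimal sketch below is the standard textbook dynamic program, given for orientation only; it is not drawn from any of the papers listed on this page.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions turning a into b (classic O(len(a)*len(b)) DP)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distances from the empty prefix of a
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                curr[j - 1] + 1,                       # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitute or match
            )
        prev = curr
    return prev[n]

assert levenshtein("kitten", "sitting") == 3
```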


Papers
Proceedings ArticleDOI
06 Aug 2009
TL;DR: A new method is introduced that improves the tree edit distance approach to textual entailment recognition by using particle swarm optimization to automatically estimate the optimal edit-operation costs over the RTE development datasets.
Abstract: This paper introduces a new method that uses particle swarm optimization to improve the tree edit distance approach to textual entailment recognition. Currently, one of the main constraints of recognizing textual entailment with tree edit distance is tuning the costs of the edit operations, which is a difficult and challenging task given the entailment problem and its datasets. We estimate the costs of the edit operations in the tree edit distance algorithm automatically, in order to improve the results for textual entailment. By automatically estimating the optimal operation costs over all RTE development datasets, we obtained a significant improvement in accuracy on the test sets.

15 citations
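
A hedged sketch of the optimization loop described above: treat the three edit-operation costs as a particle's position and let the swarm search for the cost vector that maximizes entailment accuracy on a development set. Everything concrete here is an illustrative assumption rather than the paper's setup: the paper tunes a tree edit distance over RTE data, while this sketch substitutes a weighted string edit distance, a hypothetical `dev_set` of `(text, hypothesis, label)` triples, a distance-threshold classifier, and textbook PSO hyperparameters.

```python
import random

def weighted_edit_distance(a, b, c_ins, c_del, c_sub):
    """String edit distance with tunable operation costs
    (a stand-in for the paper's tree edit distance)."""
    m, n = len(a), len(b)
    prev = [j * c_ins for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [i * c_del] + [0.0] * n
        for j in range(1, n + 1):
            curr[j] = min(prev[j] + c_del,      # delete a[i-1]
                          curr[j - 1] + c_ins,  # insert b[j-1]
                          prev[j - 1] + (c_sub if a[i - 1] != b[j - 1] else 0.0))
        prev = curr
    return prev[n]

def accuracy(costs, dev_set, threshold=0.5):
    """Fraction of (text, hypothesis, label) triples classified correctly
    by thresholding the length-normalized edit distance."""
    c_ins, c_del, c_sub = costs
    correct = 0
    for text, hyp, entails in dev_set:
        d = weighted_edit_distance(text, hyp, c_ins, c_del, c_sub)
        pred = d / max(len(text), len(hyp), 1) < threshold
        correct += (pred == entails)
    return correct / len(dev_set)

def pso_tune(dev_set, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Plain-vanilla PSO over the (c_ins, c_del, c_sub) cost vector."""
    dim = 3
    pos = [[random.uniform(0.1, 2.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [accuracy(p, dev_set) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = max(0.01, pos[i][d] + vel[i][d])  # keep costs positive
            f = accuracy(pos[i], dev_set)
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```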

Proceedings ArticleDOI
01 Oct 2012
TL;DR: It is suggested that generalizing to other objects, such as sequences, and other measures, such as edit distance, may lead to a theory of reconciling objects over graphs, which may have practical consequences for modern cloud-based deployments.
Abstract: We explore the connections between the classical problems of set difference and error correction codes, motivated by some recent results on Invertible Bloom Filters (communication-efficient set difference) and Biff Codes (fast error correction coding based on set difference). In particular, we seek to understand how these results generalize to settings where many parties communicate over a network represented by a graph, and the goal is for the parties to reconcile the objects owned by each, for some suitable definition of reconcile. Our general framework encompasses standard problems such as rumor spreading and network coding. We suggest that generalizing to other objects such as sequences with other measures such as edit distance may lead to a theory of reconciling objects over graphs. Such a theory may have practical consequences for modern cloud-based deployments.

15 citations
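
To make the set-difference primitive concrete, here is a toy Invertible Bloom Filter in the spirit of the structures the abstract builds on. The cell layout, SHA-256-based hashing, table size `m`, hash count `k`, and the restriction to nonzero integer elements are all simplifications chosen for illustration, not the exact construction of the cited papers; decoding succeeds only with high probability, and only when the difference is small relative to `m`.

```python
import hashlib

def _h(x: int, salt: int, m: int) -> int:
    """Hash element x into one of m cells, per salt."""
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def _fp(x: int) -> int:
    """Fingerprint used to detect 'pure' cells during peeling."""
    return int.from_bytes(hashlib.sha256(f"fp:{x}".encode()).digest()[:8], "big")

class IBF:
    """Toy Invertible Bloom Filter over nonzero integer elements."""
    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.count = [0] * m
        self.id_sum = [0] * m
        self.fp_sum = [0] * m

    def insert(self, x: int):
        for salt in range(self.k):
            i = _h(x, salt, self.m)
            self.count[i] += 1
            self.id_sum[i] ^= x
            self.fp_sum[i] ^= _fp(x)

    def subtract(self, other: "IBF") -> "IBF":
        """Cell-wise difference; shared elements cancel out."""
        out = IBF(self.m, self.k)
        for i in range(self.m):
            out.count[i] = self.count[i] - other.count[i]
            out.id_sum[i] = self.id_sum[i] ^ other.id_sum[i]
            out.fp_sum[i] = self.fp_sum[i] ^ other.fp_sum[i]
        return out

    def decode(self):
        """Peel pure cells; returns (A - B, B - A) if decoding succeeds."""
        a_minus_b, b_minus_a = set(), set()
        progress = True
        while progress:
            progress = False
            for i in range(self.m):
                if self.count[i] in (1, -1) and self.fp_sum[i] == _fp(self.id_sum[i]):
                    x, sign = self.id_sum[i], self.count[i]
                    (a_minus_b if sign == 1 else b_minus_a).add(x)
                    for salt in range(self.k):
                        j = _h(x, salt, self.m)
                        self.count[j] -= sign
                        self.id_sum[j] ^= x
                        self.fp_sum[j] ^= _fp(x)
                    progress = True
        return a_minus_b, b_minus_a

# Toy usage: reconcile {1, 2, 3, 4} and {3, 4, 5}.
a, b = IBF(), IBF()
for x in (1, 2, 3, 4): a.insert(x)
for x in (3, 4, 5):    b.insert(x)
print(a.subtract(b).decode())  # ({1, 2}, {5}) with high probability
```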

Journal Article
TL;DR: This paper uses an adaptation of the Expectation-Maximization algorithm to learn the primitive edit costs of a stochastic tree edit distance (ED), and a series of experiments confirms the benefit of learning a tree ED rather than imposing edit costs a priori.
Abstract: Trees provide a well-suited structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, and conversion of tree-structured documents. In this context, many applications require computing similarities between pairs of trees. The most studied such distance is likely the tree edit distance (ED), whose complexity has been improved over the last decade. However, the classic ED usually relies on a priori fixed edit costs, which are often difficult to tune and leave little room for tackling complex problems. In this paper, we focus on learning a stochastic tree ED. We use an adaptation of the Expectation-Maximization algorithm to learn the primitive edit costs, and we carried out a series of experiments that confirm the benefit of learning a tree ED rather than imposing edit costs a priori.

15 citations
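
For context on what learning a stochastic edit distance means, the string-case building block (in the spirit of Ristad and Yianilos) replaces fixed costs with operation probabilities: the forward algorithm below scores a string pair under such a model, and an EM step, not shown, would re-estimate each operation's probability from its expected usage count under this distribution. The log-probability dictionaries and the memoryless model are illustrative assumptions; the paper itself learns costs for trees.

```python
import math

def _logadd(x, y):
    """log(exp(x) + exp(y)) computed stably."""
    if x == float("-inf"): return y
    if y == float("-inf"): return x
    hi, lo = (x, y) if x >= y else (y, x)
    return hi + math.log1p(math.exp(lo - hi))

def log_likelihood(a, b, logp_del, logp_ins, logp_sub, logp_stop):
    """Forward log-probability of the pair (a, b) under a memoryless
    stochastic edit model: each step emits a deletion of a character of a,
    an insertion of a character of b, or a substitution, and the process
    stops with probability exp(logp_stop). For a proper distribution the
    operation probabilities plus exp(logp_stop) should sum to 1."""
    m, n = len(a), len(b)
    NEG = float("-inf")
    f = [[NEG] * (n + 1) for _ in range(m + 1)]
    f[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if f[i][j] == NEG:
                continue
            if i < m:            # delete a[i]
                f[i + 1][j] = _logadd(f[i + 1][j], f[i][j] + logp_del[a[i]])
            if j < n:            # insert b[j]
                f[i][j + 1] = _logadd(f[i][j + 1], f[i][j] + logp_ins[b[j]])
            if i < m and j < n:  # substitute a[i] -> b[j]
                f[i + 1][j + 1] = _logadd(f[i + 1][j + 1],
                                          f[i][j] + logp_sub[(a[i], b[j])])
    return f[m][n] + logp_stop
```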

Proceedings ArticleDOI
25 Aug 2013
TL;DR: A new type of region convexity is defined that allows convex regions to have some concavity; the resulting Near Convex Region Adjacency Graph (NCRAG) is used to formulate symbol spotting in graphical documents as a subgraph matching problem.
Abstract: This paper deals with a subgraph matching problem on Region Adjacency Graphs (RAGs) applied to symbol spotting in graphical documents. The RAG is an important, efficient, and natural way of representing graphical information with a graph, but it is limited to cases where the information is well defined by perfectly delineated regions. What if the information we are interested in is not confined within well-defined regions? This paper addresses that problem by defining near convex groupings of oriented line segments, which yield near convex regions. Pure convexity imposes hard constraints and cannot handle all cases efficiently, so we define a new type of region convexity that allows convex regions to have concavity to some extent. We call such regions Near Convex Regions (NCRs). The NCRs are then used to create the Near Convex Region Adjacency Graph (NCRAG), and with this representation we formulate symbol spotting in graphical documents as a subgraph matching problem. For subgraph matching we use the Approximate Edit Distance Algorithm (AEDA) on neighborhood strings: it starts by finding a key node in the input (target) graph and then iteratively identifies similar nodes of the query graph in the neighborhood of the key node. Experiments are performed on artificial, real, and distorted datasets.

15 citations
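
A loose sketch of the neighborhood-string idea: serialize each node's local neighborhood as a label string and compare strings with edit distance to score candidate key nodes. The label/adjacency dictionaries, one-character region labels, and distance threshold below are hypothetical, and the real AEDA goes further, iteratively expanding the match outward from the key node.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (same recurrence as the first sketch)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def neighborhood_string(labels, adj, v):
    """A node's label followed by its neighbors' labels in sorted order,
    so that string edit distance approximates local structural similarity."""
    return labels[v] + "".join(sorted(labels[u] for u in adj[v]))

def spot_candidates(q_labels, q_adj, q_key, t_labels, t_adj, max_dist=2):
    """Rank target-graph nodes by edit distance between their neighborhood
    string and the query key node's neighborhood string."""
    qs = neighborhood_string(q_labels, q_adj, q_key)
    hits = [(edit_distance(qs, neighborhood_string(t_labels, t_adj, v)), v)
            for v in t_labels]
    return sorted(h for h in hits if h[0] <= max_dist)
```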

Posted Content
TL;DR: The problem of transforming one set of elements into another by a sequence of elementary edit operations, namely substitutions, removals, and insertions, is formalized as an extension of the linear sum assignment problem (LSAP) that requires only one additional element in each set to represent removals and insertions, avoiding the classical full augmentation under which the LSAP finds an optimal bijection between the two augmented sets.
Abstract: We consider the problem of transforming a set of elements into another by a sequence of elementary edit operations, namely substitutions, removals, and insertions of elements. Each possible edit operation is penalized by a non-negative cost, and the cost of a transformation is measured by summing the costs of its operations. A solution to this problem is a transformation of minimal cost among all possible transformations. To compute such a solution, the classical approach represents removal and insertion operations by augmenting the two sets so that they reach the same size. This allows the problem to be expressed as a linear sum assignment problem (LSAP), which finds an optimal bijection (or permutation, perfect matching) between the two augmented sets. While the LSAP is known to be efficiently solvable in polynomial time, for instance with the Hungarian algorithm, time and memory are wasted treating the elements added to the initial sets. In this report, we show that the problem can be formalized as an extension of the LSAP that considers only one additional element in each set to represent removal and insertion operations. A solution is then no longer represented as a bijection between the two augmented sets. We show that the resulting problem is a binary linear program (BLP) very close to the LSAP. While it can be solved by any BLP solver, we propose an adaptation of the Hungarian algorithm that improves the time and memory complexities previously obtained by the LSAP-based approach. The importance of the improvement increases as the sizes of the two sets and their absolute difference increase. Based on the analysis presented in this report, other classical algorithms can be adapted as well.

15 citations
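
A sketch of the classical LSAP encoding this report starts from, using SciPy's assignment solver as the Hungarian-style algorithm; the cost functions and toy data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def set_edit_distance(X, Y, c_sub, c_del, c_ins):
    """Baseline LSAP encoding: augment X with len(Y) epsilon elements and
    Y with len(X), then solve one square (len(X)+len(Y))-sized assignment
    problem. c_sub(x, y), c_del(x), c_ins(y) are user-supplied
    non-negative cost functions."""
    n, m = len(X), len(Y)
    BIG = 1e9                              # forbids illegal epsilon pairings
    C = np.full((n + m, n + m), BIG)
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            C[i, j] = c_sub(x, y)          # substitute x by y
        C[i, m + i] = c_del(x)             # remove x (map it to its epsilon)
    for j, y in enumerate(Y):
        C[n + j, j] = c_ins(y)             # insert y (map an epsilon to y)
    C[n:, m:] = 0.0                        # epsilon-to-epsilon is free
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()

# Toy usage: unit removal/insertion costs, |x - y| substitution cost.
print(set_edit_distance([1, 5], [2, 9, 10],
                        c_sub=lambda x, y: abs(x - y),
                        c_del=lambda x: 1.0,
                        c_ins=lambda y: 1.0))  # prints 4.0
```

Per the abstract, the report's contribution is to replace these len(X) + len(Y) epsilon rows and columns with a single extra element per set, solved by an adapted Hungarian algorithm; the sketch above shows only the baseline it improves on.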


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139