Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published on this topic, receiving 71,491 citations.


Papers
Proceedings Article
01 Aug 2019
TL;DR: An extension of the edit distance is proposed, which achieves better human correlation, whilst remaining fast, flexible and easy to understand.
Abstract: Over the years a number of machine translation metrics have been developed in order to evaluate the accuracy and quality of machine-generated translations. Metrics such as BLEU and TER have been used for decades. However, with the rapid progress of machine translation systems, the need for better metrics is growing. This paper proposes an extension of the edit distance, which achieves better human correlation, whilst remaining fast, flexible and easy to understand.

23 citations
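
As background for the entry above, here is a minimal sketch of the plain word-level edit distance (Levenshtein distance) that metrics of this kind start from, normalized by reference length. It is not the extension proposed in the paper, whose details are not given in the abstract; the function names `word_edit_distance` and `normalized_score` are illustrative assumptions.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over whitespace-separated word tokens.

    Each insertion, deletion, or substitution of a word costs 1.
    """
    h, r = hyp.split(), ref.split()
    # prev[j] = distance between the hypothesis words seen so far and r[:j]
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        curr = [i]  # hypothesis prefix of length i vs. empty reference
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # delete hw
                            curr[j - 1] + 1,      # insert rw
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_score(hyp, ref):
    """Edit distance divided by reference length (hypothetical helper)."""
    return word_edit_distance(hyp, ref) / max(len(ref.split()), 1)


print(word_edit_distance("the cat sat on the mat", "the cat is on the mat"))
```

Lower scores mean the hypothesis is closer to the reference; this plain form has no notion of word shifts or partial word matches.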

Proceedings Article
07 Jun 2012
TL;DR: This paper describes Stanford University's submission to the Shared Evaluation Task of WMT 2012. The proposed metric (SPEDE) computes probabilistic edit distance as a prediction of translation quality, and uses a novel pushdown automaton extension of the pFSM model to handle long-distance word swapping and cross alignments.
Abstract: This paper describes Stanford University's submission to the Shared Evaluation Task of WMT 2012. Our proposed metric (SPEDE) computes probabilistic edit distance as predictions of translation quality. We learn weighted edit distance in a probabilistic finite state machine (pFSM) model, where state transitions correspond to edit operations. While standard edit distance models cannot capture long-distance word swapping or cross alignments, we rectify these shortcomings using a novel pushdown automaton extension of the pFSM model. Our models are trained in a regression framework, and can easily incorporate a rich set of linguistic features. Evaluated on two different prediction tasks across a diverse set of datasets, our methods achieve state-of-the-art correlation with human judgments.

23 citations
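
The abstract above describes learning a weighted edit distance. As a rough illustration of what "weighted" means, the sketch below computes an edit distance in which each operation type has its own cost. The weights here are fixed, hypothetical parameters (`ins`, `dele`, `sub`); in SPEDE they are learned inside a probabilistic finite state machine, which this sketch does not attempt to reproduce.

```python
def weighted_edit_distance(src, tgt, ins=1.0, dele=1.0, sub=1.0):
    """Edit distance between two token sequences with per-operation costs.

    ins, dele, and sub are illustrative fixed weights, not learned values.
    """
    n, m = len(src), len(tgt)
    # d[i][j] = cheapest way to turn src[:i] into tgt[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + dele
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = d[i - 1][j - 1] + (0.0 if src[i - 1] == tgt[j - 1] else sub)
            d[i][j] = min(diag,                 # match or substitute
                          d[i - 1][j] + dele,   # delete src[i-1]
                          d[i][j - 1] + ins)    # insert tgt[j-1]
    return d[n][m]


print(weighted_edit_distance("a b c".split(), "a c".split(), dele=0.5))
```

Like any standard edit distance, this form only aligns tokens monotonically, which is the limitation the pushdown automaton extension is said to address.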

Proceedings Article
08 Nov 1993
TL;DR: The authors present a tool that classifies office documents by matching their layout structure trees against retained samples, computing the edit distance between two trees with a previously developed pattern matching toolkit. The experimental results show that the tool is capable of classifying various types of office documents, even with very few samples in the sample base.
Abstract: The authors present the design of a tool for classifying office documents. They represent a document's layout structure using an ordered labeled tree, called the layout structure tree (L-S-tree), based on a nested segmentation procedure. The tool uses a sample-based approach for learning, where concepts are learned by retaining samples and new documents are classified by matching their L-S-trees with samples. The matching process involves both computing the edit distance between two trees using a previously developed pattern matching toolkit, and calculating the degree of conceptual closeness between the documents and samples. The experimental results show that the tool is capable of classifying various types of office documents, even with very few samples in the sample base.

23 citations

Journal Article
TL;DR: The method hierarchically matches the nodes in a road network using the Minimum Road Edit Distance and eliminates false matching nodes using M‐estimators regardless of differences in LoDs and road‐network coordinate systems.
Abstract: This article presents an approach to hierarchical matching of nodes in heterogeneous road networks in the same urban area. Heterogeneous road networks not only exist at different levels of detail (LoD), but also have different coordinate systems, leading to difficulties in matching and integrating them. To overcome these difficulties, a pattern-based method was implemented. Based on the authors' previous work on detecting patterns of divided highways, complex road junctions, and strokes to eliminate the LoD effect of road networks, the proposed method extracts the local networks around each node in a road network and uses them as the matching units for the nodes. Second, the degree of shape similarity between the matching units is measured using a Minimum Road Edit Distance based on a transformation. Finally, the proposed method hierarchically matches the nodes in a road network using the Minimum Road Edit Distance and eliminates false matching nodes using M-estimators. An experiment involving matching heterogeneous road networks with different LoDs and coordinate systems was carried out to verify the validity of the proposed method. The method achieves good and effective matching regardless of differences in LoDs and road-network coordinate systems.

23 citations

Journal Article
TL;DR: This paper introduces approximate covers, an approximate version of covers, and proposes an algorithm that finds the minimum distance t such that P is a t-approximate cover of T.
Abstract: Repetitive strings have been studied in such diverse fields as molecular biology, data compression, etc. Some important regularities that have been studied are periods, covers, seeds, and squares. A natural extension of the repetition problems is to allow errors. Among the four notions above, approximate squares and approximate periods have been studied. In this paper, we introduce the notion of approximate covers, which is an approximate version of covers. Given two strings P (|P| = m) and T (|T| = n), we propose an algorithm that finds the minimum distance t such that P is a t-approximate cover of T. The algorithm takes O(mn) time for the edit distance, and the problem of finding a string that is an approximate cover of T with minimum distance is NP-complete.

23 citations
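
The approximate-cover problem above asks how well copies of P can be matched against substrings of T. A common building block for questions of this kind is Sellers-style approximate matching, sketched below: for every end position in the text, it reports the smallest edit distance between the pattern and some substring ending there, in time proportional to len(pattern) * len(text). This is only a building block under that assumption, not the paper's cover algorithm; the function name `min_distance_ending_at` is hypothetical.

```python
def min_distance_ending_at(pattern, text):
    """For each position j in text, return the minimum edit distance between
    pattern and any substring of text that ends at j (inclusive)."""
    m = len(pattern)
    # col[i] = best distance between pattern[:i] and a substring ending at
    # the current text position; before reading any text that substring is empty.
    col = list(range(m + 1))
    result = []
    for ch in text:
        new = [0]  # the empty pattern prefix matches an empty substring anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new.append(min(col[i - 1] + cost,  # align pattern[i-1] with ch
                           col[i] + 1,         # ch is an extra text character
                           new[i - 1] + 1))    # pattern[i-1] is left unmatched
        col = new
        result.append(col[m])
    return result


print(min_distance_ending_at("abc", "xabcx"))
```

The only difference from the global edit-distance table is the zero in the first row, which lets a match start at any text position.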


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance Metrics
Number of papers in the topic in previous years

Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139