scispace - formally typeset

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2887 publications on this topic have been published, receiving 71491 citations.
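For context, the edit (Levenshtein) distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The textbook dynamic program, kept to two rows of memory, can be sketched as:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance via dynamic programming.

    prev[j] holds the distance between the processed prefix of `a`
    and b[:j]; only two rows are kept at a time.
    """
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute / match
        prev = curr
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3 (two substitutions and one insertion).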


Papers
Journal ArticleDOI
TL;DR: This work provides an algorithm to compute an optimal center under a weighted edit distance in polynomial time when the number of input strings is fixed and gives the complexity of the related Center String problem.

56 citations

Proceedings Article
02 Jun 2010
TL;DR: It is shown that people tend to prefer BLEU- and NIST-trained models to those trained on edit-distance-based metrics like TER or WER, and that using BLEU or NIST produces models that are more robust to evaluation by other metrics and perform well in human judgments.
Abstract: Translation systems are generally trained to optimize BLEU, but many alternative metrics are available. We explore how optimizing toward various automatic evaluation metrics (BLEU, METEOR, NIST, TER) affects the resulting model. We train a state-of-the-art MT system using MERT on many parameterizations of each metric and evaluate the resulting models on the other metrics and also using human judges. In accordance with popular wisdom, we find that it's important to train on the same metric used in testing. However, we also find that training to a newer metric is only useful to the extent that the MT model's structure and features allow it to take advantage of the metric. Contrasting with TER's good correlation with human judgments, we show that people tend to prefer BLEU- and NIST-trained models to those trained on edit-distance-based metrics like TER or WER. Human preferences for METEOR-trained models vary depending on the source language. Since using BLEU or NIST produces models that are more robust to evaluation by other metrics and perform well in human judgments, we conclude they are still the best choice for training.
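The edit-distance-based metrics discussed here score a hypothesis by the edit operations needed to turn it into a reference. A minimal sketch of WER (word-level edit distance normalized by reference length) illustrates the idea; TER additionally allows block shifts, which this sketch does not model:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level Levenshtein distance / number of reference words.

    Assumes a non-empty reference; tokenization is naive whitespace
    splitting for illustration only.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        curr = [i]
        for j, hw in enumerate(hyp, 1):
            cost = 0 if rw == hw else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)
```

A hypothesis missing one word of a six-word reference scores 1/6; lower is better, and a score above 1.0 is possible when the hypothesis is much longer than the reference.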

56 citations

Book ChapterDOI
08 Oct 2003
TL;DR: This paper establishes the best of six baseline matching methods for each language pair and tests novel matching methods based on binary digrams formed from both adjacent and non-adjacent characters of words; the novel methods consistently outperformed all baseline methods.
Abstract: Untranslatable query keys pose a problem in dictionary-based cross-language information retrieval (CLIR). One solution consists of using approximate string matching methods for finding the spelling variants of the source key among the target database index. In such a setting, it is important to select a matching method suited especially for CLIR. This paper focuses on comparing the effectiveness of several matching methods in a cross-lingual setting. Search words from five domains were expressed in six languages (French, Spanish, Italian, German, Swedish, and Finnish). The target data consisted of the index of an English full-text database. In this setting, we first established the best method among six baseline matching methods for each language pair. Secondly, we tested novel matching methods based on binary digrams formed of both adjacent and non-adjacent characters of words. The latter methods consistently outperformed all baseline methods.
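The digram idea can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: the function names, the skip limit, and the use of a Dice coefficient as the similarity measure are assumptions.

```python
def skip_digrams(word: str, max_skip: int = 1) -> set:
    """Binary digrams from adjacent (skip 0) and non-adjacent character
    pairs, allowing up to `max_skip` skipped characters between the pair."""
    grams = set()
    for skip in range(max_skip + 1):
        for i in range(len(word) - skip - 1):
            grams.add((word[i], word[i + skip + 1]))
    return grams

def digram_similarity(a: str, b: str) -> float:
    """Dice coefficient over the digram sets of two words."""
    ga, gb = skip_digrams(a), skip_digrams(b)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))
```

Because non-adjacent digrams survive single-character spelling variation, "color" and "colour" share most of their digrams and score far higher than an unrelated word pair, which is the behavior wanted when matching spelling variants across languages.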

55 citations

Journal ArticleDOI
01 Nov 1999
TL;DR: A simple and efficient algorithm that runs in time N² is presented, based on duplicating one of the two sequences and then performing a modified version of the standard dynamic programming algorithm; in practice it performs very well.
Abstract: Motivation: Circular permutation of a protein is a genetic operation in which part of the C-terminal of the protein is moved to its N-terminal. Recently, it has been shown that proteins that undergo engineered circular permutations generally maintain their three-dimensional structure and biological function. This observation raises the possibility that circular permutation has occurred in Nature during evolution. In this scenario a protein underwent circular permutation into another protein, thereafter both proteins further diverged by standard genetic operations. To study this possibility one needs an efficient algorithm that for a given pair of proteins can detect the underlying event of circular permutations. A possible formal description of the question is: given two sequences, find a circular permutation of one of them under which the edit distance between the proteins is minimal. A naive algorithm might take time proportional to N³ or even N⁴, which is prohibitively slow for a large-scale survey. A sophisticated algorithm that runs in asymptotic time of N² was recently suggested, but it is not practical for a large-scale survey. Results: A simple and efficient algorithm that runs in time N² is presented. The algorithm is based on duplicating one of the two sequences, and then performing a modified version of the standard dynamic programming algorithm. While the algorithm is not guaranteed to find the optimal results, we present data that indicate that in practice the algorithm performs very well. Availability: A Fortran program that calculates the optimal edit distance under circular permutation is available upon request from the authors.
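The problem being solved can be made concrete with the naive baseline the abstract mentions: double one sequence so every rotation appears as a slice, and take the minimum edit distance over all rotations. This is the O(N³) brute force, not the paper's single-pass N² heuristic over the doubled sequence; function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance (two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def circular_edit_distance(a: str, b: str) -> int:
    """Minimum edit distance over all circular permutations of b.

    Each rotation b[k:] + b[:k] is a length-len(b) slice of b + b,
    so doubling makes every rotation available; trying all of them
    is the naive O(N^3) baseline the abstract refers to.
    """
    doubled = b + b
    return min(edit_distance(a, doubled[k:k + len(b)]) for k in range(len(b)))
```

For instance, "cdeab" is a rotation of "abcde", so their circular edit distance is 0 even though their plain edit distance is not.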

55 citations

Proceedings ArticleDOI
01 Jan 1998
TL;DR: This article gives two algorithms for finding all approximate matches of a pattern in a text, where the edit distance between the pattern and the matching text substring is at most k. The first algorithm, which is quite simple, runs in time O(nk³/m + n + m) on all patterns except k-break periodic strings.
Abstract: We give two algorithms for finding all approximate matches of a pattern in a text, where the edit distance between the pattern and the matching text substring is at most k. The first algorithm, which is quite simple, runs in time O(nk³/m + n + m) on all patterns except k-break periodic strings (defined later). The second algorithm runs in time O(nk⁴/m + n + m) on k-break periodic patterns. The two classes of patterns are easily distinguished in O(m) time.
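The classical O(nm) dynamic-programming baseline for this problem (often attributed to Sellers) is what such algorithms improve on: the same table as global edit distance, except the top row is all zeros so a match may start anywhere in the text. A sketch:

```python
def approx_matches(text: str, pattern: str, k: int) -> list:
    """1-indexed end positions in `text` where some substring matches
    `pattern` with edit distance <= k (Sellers' O(nm) DP baseline)."""
    m = len(pattern)
    prev = list(range(m + 1))  # pattern prefix vs. empty text
    ends = []
    for pos, tc in enumerate(text, 1):
        curr = [0]  # free start: a match may begin at any text position
        for j, pc in enumerate(pattern, 1):
            cost = 0 if tc == pc else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
        if prev[m] <= k:  # best match ending here is within budget
            ends.append(pos)
    return ends
```

With k = 0 this degenerates to exact matching; raising k admits substrings within k edits, e.g. "helpo" matches the pattern "hello" at k = 1.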

55 citations


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  39
2022  96
2021  111
2020  149
2019  145
2018  139