Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.


Papers
Journal Article
TL;DR: Presents an algorithm to compute the constrained edit distance subject to an arbitrary edit constraint on the number and type of edit operations to be performed; recognition using this distance as the criterion demonstrates remarkable accuracy.
Abstract: Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We consider the problem of estimating X* by processing Y, which is a noisy version of U. We do this by defining the constrained edit distance between X ∈ H and Y subject to any arbitrary edit constraint involving the number and type of edit operations to be performed. An algorithm to compute this constrained edit distance has been presented. Although in general the algorithm has a cubic time complexity, within the framework of our solution the algorithm possesses a quadratic time complexity. Recognition using the constrained edit distance as a criterion demonstrates remarkable accuracy. Experimental results which involve strings of lengths between 40 and 80 and which contain an average of 26.547 errors per string demonstrate that the scheme has about 99.5 percent accuracy.
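A minimal Python sketch of one kind of constrained edit distance, to make the idea in the abstract above concrete. The specific constraint (a cap on the number of mismatching substitutions), the unit costs, and the name `constrained_edit_distance` are assumptions chosen for illustration; this is not the paper's algorithm.

```python
def constrained_edit_distance(x, y, max_subs):
    """Minimum number of unit-cost edits turning x into y while using
    at most max_subs mismatching substitutions (assumed constraint)."""
    n, m = len(x), len(y)
    INF = float("inf")
    # d[i][j][s]: cheapest way to turn x[:i] into y[:j] with exactly s substitutions
    d = [[[INF] * (max_subs + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    d[0][0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            for s in range(max_subs + 1):
                best = d[i][j][s]
                if i > 0:
                    best = min(best, d[i - 1][j][s] + 1)  # delete x[i-1]
                if j > 0:
                    best = min(best, d[i][j - 1][s] + 1)  # insert y[j-1]
                if i > 0 and j > 0:
                    if x[i - 1] == y[j - 1]:
                        best = min(best, d[i - 1][j - 1][s])  # free match
                    elif s > 0:
                        best = min(best, d[i - 1][j - 1][s - 1] + 1)  # substitution
                d[i][j][s] = best
    # Any substitution count up to the cap is allowed.
    return min(d[n][m])
```

In this sketch the table has one extra dimension for the constrained operation count, so the running time grows with the product of the two string lengths and the cap. Tightening the cap can only raise the distance: for "kitten" and "sitting", a cap of 2 gives the ordinary edit distance, while a cap of 0 forces every change to be an insertion or deletion.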

55 citations

Book Chapter
03 Apr 2002
TL;DR: A radically new indexing approach for approximate string matching where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space.
Abstract: We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We build a metric space where the sites are the nodes of the suffix tree of the text, and the approximate query is seen as a proximity query on that metric space. This lets us find the R occurrences of a pattern of length m in a text of length n in average time O(m log^2 n + m^2 + R), using O(n log n) space and O(n log^2 n) index construction time. This complexity improves by far over all other previous methods. We also show a simpler scheme needing O(n) space.
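A small Python sketch of why the metric properties matter for proximity queries. It computes the standard unit-cost edit distance and then uses the triangle inequality with a single pivot to skip candidates without computing their distance to the query. The function names, the flat word list, and the single-pivot scheme are assumptions for illustration; the suffix-tree index described in the abstract is considerably more involved.

```python
def levenshtein(a, b):
    """Standard unit-cost edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def range_query(words, pivot, pivot_dists, query, radius):
    """Return the words within `radius` of `query`.

    pivot_dists[k] must equal levenshtein(words[k], pivot), precomputed once.
    By the triangle inequality, |d(query, pivot) - d(word, pivot)| is a lower
    bound on d(query, word), so a word whose bound exceeds the radius is skipped.
    """
    dq = levenshtein(query, pivot)
    hits = []
    for word, dw in zip(words, pivot_dists):
        if abs(dq - dw) > radius:
            continue  # pruned without computing the distance
        if levenshtein(query, word) <= radius:
            hits.append(word)
    return hits


words = ["kitten", "sitting", "mitten", "banana"]
pivot = "kitten"
pivot_dists = [levenshtein(w, pivot) for w in words]
result = range_query(words, pivot, pivot_dists, "bitten", 1)
```

Here the pruning test alone is enough to discard "sitting", whose distance to the pivot is 3 while the query's is 1, so its distance to the query is never computed.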

55 citations

Proceedings Article
01 Aug 2013
TL;DR: This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012).
Abstract: This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012). Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries. Despite S’s improvements, its normalization also produces cosmetically high values that overestimate agreement & performance, leading this work to propose a solution.

55 citations

Proceedings Article
31 May 2009
TL;DR: This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^(1/3+o(1)) approximation.
Abstract: We show how to compute the edit distance between two strings of length n up to a factor of 2^{\~O(sqrt(log n))} in n^(1+o(1)) time. This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^(1/3+o(1)) approximation. Previously, approximation of 2^{\~O(sqrt(log n))} was known only for embedding edit distance into l_1, and it is not known if that embedding can be computed in less than quadratic time.

55 citations

Posted Content
TL;DR: This paper presents the first sub-polynomial approximation algorithm for edit distance that runs in near-linear time: it approximates the edit distance between two strings of length n up to a factor of 2^{\~O(sqrt(log n))} in n^(1+o(1)) time.
Abstract: We show how to compute the edit distance between two strings of length n up to a factor of 2^{\~O(sqrt(log n))} in n^(1+o(1)) time. This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^(1/3+o(1)) approximation. Previously, approximation of 2^{\~O(sqrt(log n))} was known only for embedding edit distance into l_1, and it is not known if that embedding can be computed in less than quadratic time.

54 citations


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139