SciSpace (formerly Typeset)
Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published on this topic, receiving 71,491 citations.


Papers
Journal Article · DOI
TL;DR: Efficient implementations of the embedding are shown to yield solutions to various computational problems involving edit distance, including sketching, communication complexity, and nearest neighbor search.
Abstract: We show that $\{0,1\}^d$ endowed with edit distance embeds into $\ell_1$ with distortion $2^{O(\sqrt{\log d \log\log d})}$. We further show efficient implementations of the embedding that yield solutions to various computational problems involving edit distance. These include sketching, communication complexity, and nearest neighbor search. For all these problems, we improve upon previous bounds.

92 citations

Journal Article · DOI
TL;DR: Intuitive, easy-to-implement evaluation schemes for the related problems of table detection and table structure recognition are introduced, and a new paradigm, “graph probing,” is described for comparing the results returned by the recognition system with the representation created during ground-truthing.
Abstract: While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity. In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely (deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could be applied to other document recognition tasks as well.

92 citations
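The four table-detection error classes named in the abstract (insertion, deletion, splitting, merging) can be illustrated with a small sketch. It assumes, hypothetically, that each table region is reduced to a half-open interval of page rows and that any overlap links a detected region to a ground-truth region; the paper's actual matching criterion and edit-cost model may differ, so this only shows how the error classes fall out of the overlap relation.

```python
from collections import defaultdict

# Hypothetical representation: a table region is a half-open row interval (start, end).
Region = tuple[int, int]


def overlaps(a: Region, b: Region) -> bool:
    """Two half-open intervals overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def classify_detection_errors(truth: list[Region], detected: list[Region]) -> dict[str, int]:
    """Count insertion, deletion, splitting, and merging errors.

    Assumption: any overlap counts as a correspondence between regions.
    """
    truth_to_det = defaultdict(list)
    det_to_truth = defaultdict(list)
    for i, t in enumerate(truth):
        for j, d in enumerate(detected):
            if overlaps(t, d):
                truth_to_det[i].append(j)
                det_to_truth[j].append(i)

    counts = {"insertion": 0, "deletion": 0, "split": 0, "merge": 0}
    for i in range(len(truth)):
        if not truth_to_det[i]:
            counts["deletion"] += 1    # table missed completely
        elif len(truth_to_det[i]) > 1:
            counts["split"] += 1       # one table broken into several
    for j in range(len(detected)):
        if not det_to_truth[j]:
            counts["insertion"] += 1   # non-table region labeled as a table
        elif len(det_to_truth[j]) > 1:
            counts["merge"] += 1       # several tables combined into one
    return counts
```

For example, with ground truth `[(0, 10), (20, 30), (40, 50), (80, 90)]` and detections `[(0, 4), (5, 10), (20, 50), (60, 70)]`, each error class occurs once: the first table is split, the second and third are merged, the last is missed, and `(60, 70)` is a spurious detection.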

Journal Article · DOI
TL;DR: Two new graph kernels for regression and classification problems are presented: one based on the notion of edit distance and the other on subtree enumeration.

91 citations

Proceedings Article · DOI
06 Jul 2002
TL;DR: A cheap, language and domain independent feature based on the minimum edit distance between strings yielded a significant improvement for data sets consisting of definite noun phrases and proper names, respectively.
Abstract: We report on experiments in reference resolution using a decision tree approach. We started with a standard feature set used in previous work, which led to moderate results. A closer examination of the performance of the features for different forms of anaphoric expressions showed good results for pronouns, moderate results for proper names, and poor results for definite noun phrases. We then included a cheap, language and domain independent feature based on the minimum edit distance between strings. This feature yielded a significant improvement for data sets consisting of definite noun phrases and proper names, respectively. When applied to the whole data set the feature produced a smaller but still significant improvement.

90 citations
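A cheap, language-independent string feature of the kind described above can be sketched as a normalized minimum edit distance between the anaphor and a candidate antecedent. The normalization (one minus the distance divided by the longer string's length), the lowercasing, and the function names are assumptions for illustration, not the exact feature definition used in the paper.

```python
def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, keeping only two DP rows."""
    if len(a) < len(b):
        a, b = b, a                        # make b the shorter string to save memory
    prev = list(range(len(b) + 1))         # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def med_similarity(anaphor: str, antecedent: str) -> float:
    """Hypothetical feature: 1.0 for identical strings, 0.0 for maximally different ones."""
    a, b = anaphor.lower(), antecedent.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - min_edit_distance(a, b) / longest
```

For instance, `med_similarity("the president", "President")` is 1 − 4/13 ≈ 0.69, since deleting the four characters of "the " is enough. A decision tree can then split on this numeric value without any language-specific resources.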

Journal Article · DOI
TL;DR: Evidence is provided that the near-quadratic running-time bounds known for computing edit distance might be tight: if the edit distance can be computed in time $O(n^{2-\delta})$ for some constant $\delta>0$, then the satisfiability of conjunctive normal form formulas with $N$ variables and $M$ clauses can be solved in time $M^{O(1)}2^{(1-\epsilon)N}$ for some constant $\epsilon>0$, which would violate the Strong Exponential Time Hypothesis.
Abstract: The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions, or substitutions of symbols needed to transform one string into an...

90 citations
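The definition in the abstract leads directly to the classic Wagner–Fischer dynamic-programming algorithm, whose table has $(n+1)(m+1)$ cells; this $O(nm)$ cost is the near-quadratic bound the paper argues may be tight. The sketch below fills the full table and then backtracks to recover one optimal sequence of operations; the function name and the wording of the operation labels are illustrative choices.

```python
def edit_script(a: str, b: str) -> tuple[int, list[str]]:
    """Return the Levenshtein distance and one optimal list of edit operations.

    Fills a (len(a)+1) x (len(b)+1) table, so time and space are O(nm).
    """
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                        # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j                        # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match

    # Walk back from the bottom-right cell; matches are skipped, so
    # the number of recorded operations equals the distance.
    ops: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(f"substitute {a[i - 1]!r} -> {b[j - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(f"delete {a[i - 1]!r}")
            i -= 1
        else:
            ops.append(f"insert {b[j - 1]!r}")
            j -= 1
    ops.reverse()
    return d[n][m], ops
```

For example, `edit_script("kitten", "sitting")` returns a distance of 3 with two substitutions (`k` to `s`, `e` to `i`) and one insertion (`g`). Keeping the full table is what makes the backtrace possible; if only the distance is needed, two rows suffice, although the running time stays quadratic.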


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance Metrics
Number of papers in the topic in previous years:
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139