Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.

Papers

Journal Article

Lossless filter for multiple repeats with bounded edit distance

Pierre Peterlongo¹, Gustavo Sacomoto², Alair Pereira do Lago², Nadia Pisanti³, Marie-France Sagot⁴,⁵

Centre national de la recherche scientifique¹, University of São Paulo², University of Pisa³, French Institute for Research in Computer Science and Automation⁴, University of Cambridge⁵

30 Jan 2009 - Algorithms for Molecular Biology

TL;DR: TUIUIU is the first filter designed for multiple repeats and for dealing with error rates greater than 10% of the repeats length and is particularly useful with large error rates.

Abstract: Identifying local similarity between two or more sequences, or identifying repeats occurring at least twice in a sequence, is an essential part of the analysis of biological sequences and of their phylogenetic relationship. Finding such fragments while allowing for a certain number of insertions, deletions, and substitutions is, however, known to be a computationally expensive task, and consequently exact methods can usually not be applied in practice. The filter TUIUIU that we introduce in this paper provides a possible solution to this problem. It can be used as a preprocessing step to any multiple alignment or repeats inference method, eliminating a possibly large fraction of the input that is guaranteed not to contain any approximate repeat. It consists in the verification of several strong necessary conditions that can be checked in a fast way. We implemented three versions of the filter. The first is simply a straightforward extension to the case of multiple sequences of an application of conditions already existing in the literature. The second uses a stronger condition which, as our results show, enables filtering appreciably more with negligible (if any) additional time. The third version uses an additional condition and pushes the sensitivity of the filter even further with a non-negligible additional time in many circumstances; our experiments show that it is particularly useful with large error rates. The latter version was applied as a preprocessing step for a multiple alignment tool, obtaining an overall time (filter plus alignment) on average 63 and at best 530 times smaller than before (direct alignment), with in most cases a better quality alignment. To the best of our knowledge, TUIUIU is the first filter designed for multiple repeats and for dealing with error rates greater than 10% of the repeat length.
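
The idea of a lossless filter built from necessary conditions can be illustrated with a much simpler, well-known condition than the ones used by TUIUIU: the q-gram lemma. This is only a sketch under assumptions: the abstract does not state which conditions TUIUIU checks, and the function names `qgram_profile` and `may_be_within_distance` are hypothetical. If two strings are within edit distance k, they must share at least max(|x|, |y|) - q + 1 - k*q q-grams (counted with multiplicity), so any pair sharing fewer can be discarded without computing the edit distance.

```python
from collections import Counter


def qgram_profile(s: str, q: int) -> Counter:
    """Multiset of all substrings of length q in s (empty if len(s) < q)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def may_be_within_distance(x: str, y: str, k: int, q: int = 2) -> bool:
    """Necessary condition from the q-gram lemma (hypothetical helper).

    Returns False only when the edit distance between x and y is
    guaranteed to exceed k. A True result means "cannot be ruled out",
    so the pair still has to be verified by an exact method.
    """
    threshold = max(len(x), len(y)) - q + 1 - k * q
    if threshold <= 0:
        return True  # the condition is vacuous and cannot reject anything
    # Counter intersection keeps the minimum count of each shared q-gram.
    shared = sum((qgram_profile(x, q) & qgram_profile(y, q)).values())
    return shared >= threshold
```

With q = 2 and k = 1, the pair "abcdefgh" / "abcxefgh" passes the filter (5 shared 2-grams against a threshold of 5), while "abcdefgh" / "zzzzzzzz" is rejected because the two strings share no 2-grams at all.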

20 citations

Proceedings Article

Lower bounds for edit distance and product metrics via Poincaré-type inequalities

Alexandr Andoni¹, T. S. Jayram², Mihai Patrascu³

Princeton University¹, IBM², AT&T Labs³

17 Jan 2010

TL;DR: It is proved that any sketching protocol for edit distance achieving a constant approximation requires nearly logarithmic (in the strings' length) communication complexity, and an intimate connection between non-embeddability, sketching and communication complexity is suggested.

Abstract: We prove that any sketching protocol for edit distance achieving a constant approximation requires nearly logarithmic (in the strings' length) communication complexity. This is an exponential improvement over the previous, doubly-logarithmic, lower bound of [Andoni-Krauthgamer, FOCS'07]. Our lower bound also applies to the Ulam distance (edit distance over non-repetitive strings). In this special case, it is polynomially related to the recent upper bound of [Andoni-Indyk-Krauthgamer, SODA'09]. From a technical perspective, we prove a direct-sum theorem for sketching product metrics that is of independent interest. We show that, for any metric X that requires sketch size which is a sufficiently large constant, sketching the max-product metric $\ell_\infty^d(X)$ requires Ω(d) bits. The conclusion, in fact, also holds for arbitrary two-way communication. The proof uses a novel technique for information complexity based on Poincaré inequalities and suggests an intimate connection between non-embeddability, sketching and communication complexity.

20 citations

Book Chapter

Dandy Fenz¹, Dustin Lange¹, Astrid Rheinländer², Felix Naumann¹, Ulf Leser²

Hasso Plattner Institute¹, Humboldt University of Berlin²

25 Jun 2012

TL;DR: The State Set Index (SSI) is introduced, based on a trie (prefix index) that is interpreted as a nondeterministic finite automaton, and implements a novel state labeling strategy making the index highly space-efficient.

Abstract: String similarity search is required by many real-life applications, such as spell checking, data cleansing, fuzzy keyword search, or comparison of DNA sequences. Given a very large string set and a query string, the string similarity search problem is to efficiently find all strings in the string set that are similar to the query string. Similarity is defined using a similarity (or distance) measure, such as edit distance or Hamming distance. In this paper, we introduce the State Set Index (SSI) as an efficient solution for this search problem. SSI is based on a trie (prefix index) that is interpreted as a nondeterministic finite automaton. SSI implements a novel state labeling strategy making the index highly space-efficient. Furthermore, SSI's space consumption can be gracefully traded against search time. We evaluated SSI on different sets of person names with up to 170 million strings from a social network and compared it to other state-of-the-art methods. We show that in the majority of cases, SSI is significantly faster than other tools and requires less index space.
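
To make the problem statement concrete, the following is a minimal sketch of the naive baseline for string similarity search under edit distance: a linear scan that computes the Levenshtein distance between the query and every string in the set. It is not the State Set Index described in the abstract; the function names `edit_distance` and `similarity_search` and the threshold parameter `k` are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity_search(
    strings: Iterable[str], query: str, k: int
) -> List[Tuple[str, int]]:
    """Return every string within edit distance k of query (linear scan)."""
    results = []
    for s in strings:
        if abs(len(s) - len(query)) > k:
            continue  # the length difference is a lower bound on the distance
        d = edit_distance(s, query)
        if d <= k:
            results.append((s, d))
    return results
```

For example, searching the set `["maria", "mario", "marion", "peter"]` for `"maria"` with `k = 1` returns `"maria"` at distance 0 and `"mario"` at distance 1. The scan costs one quadratic-time distance computation per candidate, which is the cost that index structures for this problem aim to avoid.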

20 citations

Journal Article

Differential Edit Distance: A Metric for Scene Segmentation Evaluation

Panagiotis Sidiropoulos¹, Vasileios Mezaris¹, Ioannis Kompatsiaris¹, Josef Kittler²

Information Technology Institute¹, University of Surrey²

01 Jun 2012 - IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: A novel unidimensional measure is introduced that is proven to be metric and satisfies a number of qualitative prerequisites that previous measures do not, and that is effective in evaluating scene segmentation techniques and in helping to optimize their parameters.

Abstract: In this paper, a novel approach to evaluating video temporal decomposition algorithms is presented. The evaluation measures typically used to this end are nonlinear combinations of precision-recall or coverage-overflow, which are not metrics and additionally possess undesirable properties, such as nonsymmetricity. To alleviate these drawbacks, we introduce a novel unidimensional measure that is proven to be metric and satisfies a number of qualitative prerequisites that previous measures do not. This measure is named differential edit distance (DED), since it can be seen as a variation of the well-known edit distance. After defining DED, we further introduce an algorithm that computes it in less than cubic time. Finally, DED is extensively compared with state-of-the-art measures, namely, the harmonic means (F-score) of precision-recall and coverage-overflow. The experiments include comparisons of qualitative properties, the time required for optimizing the parameters of scene segmentation algorithms with the help of these measures, and a user study gauging the agreement of these measures with the users' assessment of the segmentation results. The results confirm that the proposed measure is a unidimensional metric that is effective in evaluating scene segmentation techniques and in helping to optimize their parameters.

20 citations

Proceedings Article

Efficient XML Structural Similarity Detection using Sub-tree Commonalities

Joe Tekli, Richard Chbeir, Kokou Yetongnon¹

University of Burgundy¹

01 Jan 2007

TL;DR: An improved comparison method is provided based on the concept of tree edit distance, introducing the notion of commonality between sub-trees, which yields better similarity results with respect to alternative methods, while maintaining quadratic time complexity.

Abstract: Developing efficient techniques for comparing XML-based documents becomes essential in the database and information retrieval communities. Various algorithms for comparing hierarchically structured data, e.g. XML documents, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being modeled as ordered labeled trees. Nevertheless, a thorough investigation of current approaches led us to identify several unaddressed structural similarities, i.e. sub-tree related similarities, while comparing XML documents. In this paper, we provide an improved comparison method to deal with such resemblances. Our approach is based on the concept of tree edit distance, introducing the notion of commonality between sub-trees. Experiments demonstrate that our approach yields better similarity results with respect to alternative methods, while maintaining quadratic time complexity.

20 citations

Metrics

Papers: 3,030

Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139