Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.


Papers
Proceedings ArticleDOI
23 Jan 2013
TL;DR: This work presents a systematic and formal framework for obtaining new data structures by quantitatively relaxing existing ones, and gives concurrent implementations of relaxed data structures and demonstrates that bounded relaxations provide the means for trading correctness for performance in a controlled way.
Abstract: There is a trade-off between performance and correctness in implementing concurrent data structures. Better performance may be achieved at the expense of relaxing correctness, by redefining the semantics of data structures. We address such a redefinition of data structure semantics and present a systematic and formal framework for obtaining new data structures by quantitatively relaxing existing ones. We view a data structure as a sequential specification S containing all "legal" sequences over an alphabet of method calls. Relaxing the data structure corresponds to defining a distance from any sequence over the alphabet to the sequential specification: the k-relaxed sequential specification contains all sequences over the alphabet within distance k from the original specification. In contrast to other existing work, our relaxations are semantic (distance in terms of data structure states). As an instantiation of our framework, we present two simple yet generic relaxation schemes, called out-of-order and stuttering relaxation, along with several ways of computing distances. We show that the out-of-order relaxation, when further instantiated to stacks, queues, and priority queues, amounts to tolerating bounded out-of-order behavior, which cannot be captured by a purely syntactic relaxation (distance in terms of sequence manipulation, e.g. edit distance). We give concurrent implementations of relaxed data structures and demonstrate that bounded relaxations provide the means for trading correctness for performance in a controlled way. The relaxations are monotonic which further highlights the trade-off: increasing k increases the number of permitted sequences, which as we demonstrate can lead to better performance. Finally, since a relaxed stack or queue also implements a pool, we actually have new concurrent pool implementations that outperform the state-of-the-art ones.
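As a rough illustration of the out-of-order idea (a sketch only, not the paper's concurrent implementations), the following Python fragment relaxes a FIFO queue so that dequeue may legally return any of the first k + 1 elements; the class and method names are assumptions made for this example.

import random
from collections import deque

class RelaxedQueue:
    """k-out-of-order relaxed FIFO queue: dequeue may return any of the
    first k + 1 enqueued elements (k = 0 gives a strict queue)."""
    def __init__(self, k):
        self.k = k
        self._items = deque()

    def enqueue(self, x):
        self._items.append(x)

    def dequeue(self):
        if not self._items:
            return None  # empty behaves like a pool with nothing to hand out
        # Any of the first k + 1 elements is a legal answer under the
        # relaxation; a concurrent version could pick the least-contended
        # slot instead of choosing at random.
        window = min(self.k + 1, len(self._items))
        i = random.randrange(window)
        self._items.rotate(-i)
        x = self._items.popleft()
        self._items.rotate(i)
        return x

q = RelaxedQueue(k=2)
for v in "abcde":
    q.enqueue(v)
print([q.dequeue() for _ in range(5)])  # e.g. ['b', 'a', 'c', 'e', 'd']; every element exactly once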

107 citations

Journal ArticleDOI
TL;DR: It turns out that none of the algorithms is the best for all values of the problem parameters, and the speed differences between the methods can be considerable.
Abstract: Experimental comparisons of the running time of approximate string matching algorithms for the k differences problem are presented. Given a pattern string, a text string, and an integer k, the task is to find all approximate occurrences of the pattern in the text with at most k differences (insertions, deletions, changes). We consider seven algorithms based on different approaches including dynamic programming, Boyer-Moore string matching, suffix automata, and the distribution of characters. It turns out that none of the algorithms is the best for all values of the problem parameters, and the speed differences between the methods can be considerable.
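For reference, a minimal Python sketch of the dynamic-programming approach to the k differences problem (a Sellers-style column recurrence; an illustration, not one of the paper's tuned implementations):

def k_differences(pattern, text, k):
    """Return the (1-based) end positions of all substrings of text whose
    edit distance to pattern is at most k."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and some suffix of
    # the text processed so far.
    col = list(range(m + 1))
    matches = []
    for j, tc in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0                       # an occurrence may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            cur = min(col[i] + 1,        # deletion
                      col[i - 1] + 1,    # insertion
                      prev_diag + cost)  # match or change
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            matches.append(j)
    return matches

print(k_differences("survey", "surgery", 2))  # [5, 6, 7]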

106 citations

Proceedings ArticleDOI
Jianbin Qin, Wei Wang, Yifei Lu, Chuan Xiao, Xuemin Lin
12 Jun 2011
TL;DR: This paper shows that the minimum signature size lower bound is t + 1, proposes asymmetric signature schemes that achieve this lower bound, and develops efficient query processing algorithms based on the new scheme.
Abstract: Given a query string Q, an edit similarity search finds all strings in a database whose edit distance to Q is no more than a given threshold t. Most existing methods for answering edit similarity queries rely on a signature scheme to generate candidates given the query string. We observe that the number of signatures generated by existing methods is far greater than the lower bound, and this results in high query time and index space complexities. In this paper, we show that the minimum signature size lower bound is t + 1. We then propose asymmetric signature schemes that achieve this lower bound. We develop efficient query processing algorithms based on the new scheme. Several dynamic programming-based candidate pruning methods are also developed to further speed up the performance. We have conducted a comprehensive experimental study involving nine state-of-the-art algorithms. The experimental results clearly demonstrate the efficiency of our methods.
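The t + 1 bound can be read through a pigeonhole argument: if the query is cut into t + 1 disjoint segments, each of the at most t edit operations touches at most one segment, so any string within edit distance t must contain at least one segment verbatim. Below is a hedged Python sketch of that filter-and-verify idea; it is not the paper's asymmetric signature scheme or pruning, and all names are illustrative assumptions.

def segments(s, t):
    """Split s into t + 1 roughly equal, disjoint segments."""
    n, parts = len(s), t + 1
    bounds = [round(i * n / parts) for i in range(parts + 1)]
    return [s[bounds[i]:bounds[i + 1]] for i in range(parts)]

def candidates(query, database, t):
    """Strings containing at least one query segment: a superset of all
    true matches within edit distance t."""
    segs = [seg for seg in segments(query, t) if seg]
    if not segs:                       # degenerate case: tiny or empty query
        return list(database)
    return [s for s in database if any(seg in s for seg in segs)]

def edit_distance(a, b):
    """Plain Levenshtein distance, used here only to verify candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

db = ["surgery", "service", "survey", "surveys", "serve"]
cands = candidates("survey", db, 2)
print([s for s in cands if edit_distance("survey", s) <= 2])
# ['surgery', 'survey', 'surveys', 'serve']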

106 citations

Journal Article
TL;DR: This work considers the more general problem in which strings are represented as singly linked lists and the edit operations may be applied to the pointer associated with a vertex as well as to its character, and shows that this problem is NP-complete.
Abstract: The traditional edit-distance problem is to find the minimum number of insert-character and delete-character (and sometimes change character) operations required to transform one string into another. Here we consider the more general problem of strings being represented by a singly linked list (one character per node) and being able to apply these operations to the pointer associated with a vertex as well as the character associated with the vertex. That is, in O(1) time, not only can characters be inserted or deleted, but also substrings can be moved or deleted. We limit our attention to the ability to move substrings and leave substring deletions for future research. Note that O(1) time substring move operations imply O(1) substring exchange operations as well, a form of transformation that has been of interest in molecular biology. We show that this problem is NP-complete, show that a recursive sequence of moves can be simulated with at most a constant factor increase by a non-recursive sequence, and present a polynomial time greedy algorithm for non-recursive moves with a worst-case log factor approximation to optimal. The development of this greedy algorithm shows how to reduce moves of substrings to moves of characters, and how to convert moves with characters to only insert and deletes of characters.
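To make the gap between the two models concrete, a small Python sketch (illustrative only, not the paper's greedy algorithm) contrasts the traditional character-level edit distance with a single simulated substring move:

def levenshtein(a, b):
    """Traditional edit distance: insert, delete, or change one character."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def move_substring(s, start, end, dest):
    """Simulate one substring move: relocate s[start:end] so it begins at
    index dest of the remaining string (the linked-list model does this
    with O(1) pointer updates)."""
    block, rest = s[start:end], s[:start] + s[end:]
    return rest[:dest] + block + rest[dest:]

src, dst = "abcdefgh", "efghabcd"
print(levenshtein(src, dst))                # 8 character-level edits
print(move_substring(src, 0, 4, 4) == dst)  # True: a single substring move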

106 citations

Proceedings ArticleDOI
01 Jan 1997
TL;DR: This paper proposes an indexing scheme which is totally based on lengths and relative distances between sequences, and uses vp-trees as the underlying distance-based index structures in its method.
Abstract: In this paper, we consider the problem of efficient matching and retrieval of sequences of different lengths. Most of the previous research is concentrated on similarity matching and retrieval of sequences of the same length using Euclidean distance metric. For similarity matching of sequences, we use a modified version of the edit distance function, and consider two sequences matching if a majority of the elements in the sequences match. In the matching process a mapping among non-matching elements is created to check if there are unacceptable deviations among them. This means that two matching sequences should have lengths that are comparable. For efficient retrieval of matching sequences, we propose an indexing scheme which is totally based on lengths and relative distances between sequences. We use vp-trees as the underlying distance-based index structures in our method.
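As background on the indexing side, here is a minimal Python sketch of a vp-tree with a triangle-inequality range search; it assumes only that the distance is a metric and is not the paper's modified edit-distance function or indexing scheme.

import random

class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside

def build_vp_tree(points, dist):
    if not points:
        return None
    i = random.randrange(len(points))
    vp, rest = points[i], points[:i] + points[i + 1:]
    if not rest:
        return VPNode(vp, 0, None, None)
    radius = sorted(dist(vp, p) for p in rest)[len(rest) // 2]  # median distance
    inside = [p for p in rest if dist(vp, p) <= radius]
    outside = [p for p in rest if dist(vp, p) > radius]
    return VPNode(vp, radius,
                  build_vp_tree(inside, dist), build_vp_tree(outside, dist))

def range_search(node, query, threshold, dist, out):
    """Collect all indexed points within threshold of query, pruning whole
    subtrees with the triangle inequality."""
    if node is None:
        return
    d = dist(query, node.point)
    if d <= threshold:
        out.append(node.point)
    if d - threshold <= node.radius:     # inside ball may still hold matches
        range_search(node.inside, query, threshold, dist, out)
    if d + threshold > node.radius:      # outside shell may still hold matches
        range_search(node.outside, query, threshold, dist, out)

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

tree = build_vp_tree(["survey", "surgery", "serve", "service", "salary"], edit_distance)
hits = []
range_search(tree, "survey", 2, edit_distance, hits)
print(hits)  # strings within edit distance 2 of "survey"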

105 citations


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139