Topic

Edit distance

About: Edit distance is a research topic. Over the lifetime, 2887 publications have been published within this topic receiving 71491 citations.


Papers
Journal ArticleDOI
TL;DR: The approach to computing distances between two arbitrary genomes is generalized, with the focus on approximating the true evolutionary distance rather than the edit distance; the distances produced are good enough to enable the simple neighbor-joining procedure to reconstruct the authors' test trees with high accuracy.
Abstract: As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, duplications, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications); in turn we extended it to compute a nearly optimal edit sequence between an arbitrary genome and the identity permutation. In this paper we generalize our approach to compute distances between two arbitrary genomes, but focus on approximating the true evolutionary distance rather than the edit distance. We present experimental results showing that our algorithm produces excellent estimates of the true evolutionary distance up to a (high) threshold of saturation; indeed, the distances thus produced are good enough to enable the simple neighbor-joining procedure to reconstruct our test trees with high accuracy.

79 citations
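As background for the signed-permutation model described above, here is a minimal sketch (an illustration, not the authors' algorithm) of how an inversion acts on a signed gene order, together with a brute-force breadth-first search for the minimum number of inversions between two tiny signed permutations. The brute force is exponential and only works on toy inputs; Hannenhalli and Pevzner's result computes this inversion distance in polynomial time.

```python
from collections import deque

def invert(perm, i, j):
    """Reverse the segment perm[i..j] and flip the signs of its elements."""
    segment = [-g for g in reversed(perm[i:j + 1])]
    return tuple(perm[:i]) + tuple(segment) + tuple(perm[j + 1:])

def inversion_distance_bruteforce(source, target):
    """Minimum number of inversions turning `source` into `target`.

    Breadth-first search over all reachable signed permutations; exponential,
    so only usable for very small gene orders (illustration only).
    """
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        n = len(perm)
        for i in range(n):
            for j in range(i, n):
                nxt = invert(perm, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # unreachable only if the two gene sets differ

# Example: transform (+3, -2, +1) into the identity (+1, +2, +3).
print(inversion_distance_bruteforce((3, -2, 1), (1, 2, 3)))
```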

Proceedings Article
01 Jan 2007
TL;DR: The idea is to use a fast but suboptimal bipartite graph matching algorithm as a heuristic function that estimates the future costs; because the heuristic is a lower bound on those costs, the search is still guaranteed to return the exact graph edit distance of the two given graphs.
Abstract: Graph edit distance is a dissimilarity measure for arbitrarily structured and arbitrarily labeled graphs. In contrast with other approaches, it does not suffer from any restrictions and can be applied to any type of graph, including hypergraphs [1]. Graph edit distance can be used to address various graph classification problems with different methods, for instance, k-nearest-neighbor classifier (k-NN), graph embedding classifier [2], or classification with graph kernel machines [3]. The main drawback of graph edit distance is its computational complexity which is exponential in the number of nodes of the involved graphs. Consequently, computation of graph edit distance is feasible for graphs of rather small size only. In order to overcome this restriction, a number of fast but suboptimal methods have been proposed in the literature (e.g. [4]). In the present paper we aim at speeding up the computation of exact graph edit distance. We propose to combine the standard tree search approach to graph edit distance computation with the suboptimal procedure described in [4]. The idea is to use a fast but suboptimal bipartite graph matching algorithm as a heuristic function that estimates the future costs. The overhead for computing this heuristic function is small, and easily compensated by the speed-up achieved in tree traversal. Since the heuristic function provides us with a lower bound of the future costs, it is guaranteed to return the exact graph edit distance of two given graphs.

77 citations
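The guarantee in the abstract above rests on a general property of best-first search: a heuristic that never overestimates the remaining cost preserves exactness. The sketch below illustrates that property on ordinary string edit distance rather than graph edit distance (a hypothetical analogue, not the paper's method), using the length difference of the remaining suffixes as the admissible lower bound.

```python
import heapq

def edit_distance_astar(a, b):
    """Levenshtein distance via A* search over prefix pairs (i, j).

    The heuristic h(i, j) = |(len(a) - i) - (len(b) - j)| never overestimates
    the remaining cost, so the first time the goal is popped its cost is exact,
    mirroring the role of the bipartite lower bound in graph edit distance search.
    """
    def h(i, j):
        return abs((len(a) - i) - (len(b) - j))

    start = (0, 0)
    best = {start: 0}
    frontier = [(h(0, 0), 0, start)]
    while frontier:
        f, g, (i, j) = heapq.heappop(frontier)
        if (i, j) == (len(a), len(b)):
            return g
        if g > best.get((i, j), float("inf")):
            continue  # stale queue entry
        moves = []
        if i < len(a) and j < len(b):
            moves.append(((i + 1, j + 1), 0 if a[i] == b[j] else 1))  # match/substitute
        if i < len(a):
            moves.append(((i + 1, j), 1))  # delete a[i]
        if j < len(b):
            moves.append(((i, j + 1), 1))  # insert b[j]
        for (ni, nj), cost in moves:
            ng = g + cost
            if ng < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = ng
                heapq.heappush(frontier, (ng + h(ni, nj), ng, (ni, nj)))

print(edit_distance_astar("kitten", "sitting"))  # 3
```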

Journal ArticleDOI
TL;DR: Improvements to previously published methods for similarity searching with reduced graphs are described, with a particular focus on ligand-based virtual screening, and a novel use of reduced graphs in the clustering of high-throughput screening data is introduced.
Abstract: Virtual screening and high-throughput screening are two major components of lead discovery within the pharmaceutical industry. In this paper we describe improvements to previously published methods for similarity searching with reduced graphs, with a particular focus on ligand-based virtual screening, and describe a novel use of reduced graphs in the clustering of high-throughput screening data. Literature methods for reduced graph similarity searching encode the reduced graphs as binary fingerprints, which has a number of issues. In this paper we extend the definition of the reduced graph to include positively and negatively ionizable groups and introduce a new method for measuring the similarity of reduced graphs based on a weighted edit distance. Moving beyond simple similarity searching, we show how more flexible queries can be built using reduced graphs and describe a database system that allows iterative querying with multiple representations. Reduced graphs capture many important features of ligand...

77 citations
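For intuition about what a weighted edit distance with chemistry-aware costs can look like, here is a simplified sketch on label sequences rather than reduced graphs; the label set and cost values are invented for illustration and are not taken from the paper.

```python
# Hypothetical reduced-graph node labels: D = H-bond donor, A = acceptor,
# R = aromatic ring, P = positively ionizable, N = negatively ionizable.
# Substitution costs below are invented for illustration only.
SUB_COST = {
    ("D", "A"): 0.5, ("A", "D"): 0.5,
    ("P", "N"): 1.0, ("N", "P"): 1.0,
}
INDEL_COST = 1.0

def sub_cost(x, y):
    if x == y:
        return 0.0
    return SUB_COST.get((x, y), 0.8)  # assumed default mismatch cost

def weighted_edit_distance(s, t):
    """Weighted Levenshtein distance between two label sequences."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * INDEL_COST
    for j in range(1, n + 1):
        d[0][j] = j * INDEL_COST
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + INDEL_COST,                        # delete s[i-1]
                d[i][j - 1] + INDEL_COST,                        # insert t[j-1]
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitute
            )
    return d[m][n]

# Two toy "linearized" reduced graphs.
print(weighted_edit_distance(["R", "D", "A"], ["R", "A", "A"]))  # 0.5
```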

Journal ArticleDOI
TL;DR: In this paper, an exact formula is introduced for the maximum number of common supersequences shared by sequences at a certain edit distance, yielding an upper bound on the number of distinct traces necessary to guarantee exact reconstruction.
Abstract: This paper studies problems in data reconstruction, an important area with numerous applications. In particular, we examine the reconstruction of binary and nonbinary sequences from synchronization (insertion/deletion-correcting) codes. These sequences have been corrupted by a fixed number of symbol insertions (larger than the minimum edit distance of the code), yielding a number of distinct traces to be used for reconstruction. We wish to know the minimum number of traces needed for exact reconstruction. This is a general version of a problem tackled by Levenshtein for uncoded sequences. We introduce an exact formula for the maximum number of common supersequences shared by sequences at a certain edit distance, yielding an upper bound on the number of distinct traces necessary to guarantee exact reconstruction. Without specific knowledge of the code words, this upper bound is tight. We apply our results to the famous single deletion/insertion-correcting Varshamov–Tenengolts (VT) codes and show that a significant number of VT code word pairs achieve the worst case number of outputs needed for exact reconstruction. We also consider extensions to other channels, such as adversarial deletion and insertion/deletion channels and probabilistic channels.

77 citations
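The central quantity in the abstract above, the number of common supersequences of two words, can be computed by brute force for toy instances. The sketch below (illustration only; the paper derives an exact formula rather than enumerating) intersects the sets of length-(n + t) supersequences of two binary words after t insertions.

```python
def supersequences(word, total_len, alphabet=("0", "1")):
    """All distinct strings of length `total_len` containing `word` as a subsequence."""
    results = set()

    def is_subsequence(w, s):
        it = iter(s)
        return all(c in it for c in w)

    # Enumerate every string of the target length (exponential; toy sizes only).
    def gen(prefix):
        if len(prefix) == total_len:
            if is_subsequence(word, prefix):
                results.add(prefix)
            return
        for c in alphabet:
            gen(prefix + c)

    gen("")
    return results

def common_supersequence_count(x, y, insertions):
    """Number of length-(n + t) strings that are supersequences of both x and y."""
    total_len = len(x) + insertions
    return len(supersequences(x, total_len) & supersequences(y, total_len))

# Two binary words of length 4, channel performs t = 2 insertions.
print(common_supersequence_count("0101", "0110", 2))
```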

Book ChapterDOI
10 Sep 2012
TL;DR: The paper treats the problem of securely outsourcing sequence comparison from a client to remote servers: given two strings λ and μ of respective lengths n and m, find a minimum-cost sequence of insertions, deletions, and substitutions that transforms λ into μ.
Abstract: We treat the problem of secure outsourcing of sequence comparisons by a client to remote servers, which given two strings λ and μ of respective lengths n and m, consists of finding a minimum-cost sequence of insertions, deletions, and substitutions (also called an edit script) that transform λ into μ. In our setting a client owns λ and μ and outsources the computation to two servers without revealing to them information about either the input strings or the output sequence. Our solution is non-interactive for the client (who only sends information about the inputs and receives the output) and the client’s work is linear in its input/output. The servers’ performance is O(σmn) computation (which is optimal) and communication, where σ is the alphabet size, and the solution is designed to work when the servers have only O(σ(m + n)) memory. By utilizing garbled circuit evaluation in a novel way, we completely avoid public-key cryptography, which makes our solution particularly efficient.

77 citations
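The computation being outsourced is, at its core, the classic O(mn) dynamic program for a minimum-cost edit script. The sketch below shows that plaintext baseline with unit costs (an assumption for brevity); the paper's contribution, evaluating this computation obliviously across two servers via garbled circuits, is not reflected here.

```python
def edit_script(lam, mu):
    """Minimum-cost edit script (unit-cost insertions, deletions, substitutions)
    transforming `lam` into `mu`, via the standard O(mn) dynamic program."""
    n, m = len(lam), len(mu)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (lam[i - 1] != mu[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Trace back one optimal script; each op refers to positions in `lam`.
    script, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (lam[i - 1] != mu[j - 1]):
            if lam[i - 1] != mu[j - 1]:
                script.append(("substitute", i - 1, mu[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            script.append(("delete", i - 1))
            i -= 1
        else:
            script.append(("insert", i, mu[j - 1]))
            j -= 1
    return d[n][m], list(reversed(script))

cost, ops = edit_script("sunday", "saturday")
print(cost, ops)  # cost 3
```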


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  39
2022  96
2021  111
2020  149
2019  145
2018  139