scispace - formally typeset
Search or ask a question
Topic

Edit distance

About: Edit distance is a research topic. Over the lifetime, 2887 publications have been published within this topic receiving 71491 citations.


Papers
More filters
Book ChapterDOI
01 Jan 2008

37 citations

Proceedings ArticleDOI
22 Jun 2020
TL;DR: For any T ≥ 1, there are constants R=R(T) ≥ 1 and ζ=ζ(T))>0 and a randomized algorithm that takes as input an integer n and two strings x,y of length at most n, and runs in time O(n 1+1/T ) and outputs an upper bound U on the edit distance of edit(x,y) that with high probability, satisfies U ≤ R(edit(x+y)+n 1−ζ).
Abstract: For any T ≥ 1, there are constants R=R(T) ≥ 1 and ζ=ζ(T)>0 and a randomized algorithm that takes as input an integer n and two strings x,y of length at most n, and runs in time O(n 1+1/T ) and outputs an upper bound U on the edit distance of edit(x,y) that with high probability, satisfies U ≤ R(edit(x,y)+n 1−ζ). In particular, on any input with edit(x,y) ≥ n 1−ζ the algorithm outputs a constant factor approximation with high probability. A similar result has been proven independently by Brakensiek and Rubinstein (this proceedings).

37 citations

Journal ArticleDOI
TL;DR: An efficient all-mapper, BitMapper, is developed based on a new vectorized bit-vector algorithm, which simultaneously calculates the edit distance of one read to multiple locations in a given reference genome.
Abstract: As the next-generation sequencing (NGS) technologies producing hundreds of millions of reads every day, a tremendous computational challenge is to map NGS reads to a given reference genome efficiently. However, existing methods of all-mappers, which aim at finding all mapping locations of each read, are very time consuming. The majority of existing all-mappers consist of 2 main parts, filtration and verification. This work significantly reduces verification time, which is the dominant part of the running time. An efficient all-mapper, BitMapper, is developed based on a new vectorized bit-vector algorithm, which simultaneously calculates the edit distance of one read to multiple locations in a given reference genome. Experimental results on both simulated and real data sets show that BitMapper is from several times to an order of magnitude faster than the current state-of-the-art all-mappers, while achieving higher sensitivity, i.e., better quality solutions. We present BitMapper, which is designed to return all mapping locations of raw reads containing indels as well as mismatches. BitMapper is implemented in C under a GPL license. Binaries are freely available at http://home.ustc.edu.cn/%7Echhy .

37 citations

Journal ArticleDOI
19 Dec 2012
TL;DR: It is proved that computing the edit distance is equivalent to finding the optimal cycle decomposition of the corresponding adjacency graph, and an approximation algorithm with an approximation ratio of 1.5 + ∈.
Abstract: Computing the edit distance between two genomes under certain operations is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be easily computed for genomes without duplicate genes. In this paper, we study the edit distance for genomes with duplicate genes under a model that includes DCJ operations, insertions and deletions. We prove that computing the edit distance is equivalent to finding the optimal cycle decomposition of the corresponding adjacency graph, and give an approximation algorithm with an approximation ratio of 1.5 + ∈.

37 citations

Book ChapterDOI
20 Aug 2001
TL;DR: A new measure of the edit distance between two rooted labeled trees is defined, called less-constrained edit distance, by relaxing the restriction of constrained edit mapping, and it is shown that this problem is NP-complete and even has no absolute approximation algorithm unless P = NP, which implies that it is impossible to have a PTAS for the problem.
Abstract: One of the most important problem in computational biology is the tree editing problem which is to determine the edit distance between two rooted labeled trees. It has been shown to have significant applications in both RNA secondary structures and evolutionary trees. Another viewpoint of considering this problem is to find an edit mapping with the minimum cost. By restricting the type of mapping, Zhang [7,8] and Richter [5] independently introduced the constrained edit distance and the structure respecting distance, respectively. They are, in fact, the same concept. In this paper, we define a new measure of the edit distance between two rooted labeled trees, called less-constrained edit distance, by relaxing the restriction of constrained edit mapping. Then we study the algorithmic complexities of computing the less-constrained edit distance between two rooted labeled trees. For unordered labeled trees, we show that this problem is NP-complete and even has no absolute approximation algorithm unless P = NP, which also implies that it is impossible to have a PTAS for the problem. For ordered labeled trees, we give a polynomial-time algorithm to solve the problem.

37 citations


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202339
202296
2021111
2020149
2019145
2018139