Oblivious string embeddings and edit distance approximations
Tugkan Batu, Funda Ergün, Cenk Sahinalp
- pp 792-801
TLDR
An oblivious embedding is introduced that maps strings of length n under edit distance to strings of length at most n/r under edit distance, for any value of the parameter r, with a distortion of $O(r^{1+\mu})$ for some $\mu = o(1)$, which is almost optimal.
Abstract:
We introduce an oblivious embedding that maps strings of length n under edit distance to strings of length at most n/r under edit distance for any value of parameter r. For any given r, our embedding provides a distortion of $O(r^{1+\mu})$ for some $\mu = o(1)$, which we prove to be (almost) optimal. The embedding can be computed in $O(2^{1/\mu} n)$ time.
We also show how to use the main ideas behind the construction of our embedding to obtain an efficient algorithm for approximating the edit distance between two strings. More specifically, for any $1 > \epsilon \geq 0$, we describe an algorithm to compute the edit distance D(S, R) between two strings S and R of length n in time $O(n^{1+\epsilon})$, within an approximation factor of $\min\{n^{(1-\epsilon)/3+o(1)}, (D(S, R)/n^{\epsilon})^{1/2+o(1)}\}$. For the case of $\epsilon = 0$, we get an $O(n)$-time algorithm that approximates the edit distance within a factor of $\min\{n^{1/3+o(1)}, D(S, R)^{1/2+o(1)}\}$, improving the recent result of Bar-Yossef et al. [2].
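For reference, the quantity D(S, R) that the abstract's algorithm approximates can be computed exactly with the textbook quadratic-time dynamic program. The sketch below is that baseline only, not the embedding or the approximation algorithm of the paper; the function name `edit_distance` and the choice of unit costs for insertion, deletion, and substitution are assumptions made for illustration.

```python
def edit_distance(s: str, r: str) -> int:
    """Exact unit-cost edit distance via the standard two-row dynamic program.

    Illustrative baseline only (an assumption of this sketch): it runs in
    O(len(s) * len(r)) time, which is the cost the approximation algorithms
    described in the abstract aim to avoid.
    """
    if len(s) < len(r):
        s, r = r, s  # iterate rows over the longer string so each row is short

    # prev[j] = distance between the empty prefix of s and r[:j]
    prev = list(range(len(r) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance between s[:i] and the empty prefix of r
        for j, cr in enumerate(r, start=1):
            substitution = prev[j - 1] + (0 if cs == cr else 1)
            deletion = prev[j] + 1       # drop cs from s
            insertion = curr[j - 1] + 1  # insert cr into s
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

An approximation factor such as $n^{1/3+o(1)}$ means the returned estimate may exceed this exact value by at most that multiplicative factor.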
Citations
Journal Article
SCALCE: boosting sequence compression algorithms using locally consistent encoding
TL;DR: Presents SCALCE, a 'boosting' scheme based on the Locally Consistent Parsing technique that reorganizes the reads so as to achieve higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome.
Proceedings Article
Optimal-time text indexing in BWT-runs bounded space
TL;DR: In this paper, the Run-Length FM-index was extended to $O(r \log(n/r))$ space, where r is the number of runs in the Burrows-Wheeler Transform (BWT) of the text.
Proceedings Article
Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity
TL;DR: The lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", which means that many of their substrings are approximately identical, and provides the first rigorous separation between edit distance and Ulam distance.
Proceedings Article
Streaming algorithms for embedding and computing edit distance in the low distance regime
TL;DR: A randomized injective embedding of the edit distance into the Hamming distance with a small distortion, and a randomized embedding with quadratic distortion, are shown.
Journal Article
A Survey on Data Compression Methods for Biological Sequences
TL;DR: A comprehensive survey of existing compression approaches specialized for biological data, including protein and DNA sequences, with a comparison of the performance of several methods in terms of compression ratio, memory usage, and compression/decompression time.
References
Journal Article
A Space-Economical Suffix Tree Construction Algorithm
TL;DR: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching that has the same asymptotic running time bound as previously published algorithms, but is more economical in space.
Journal Article
Fast algorithms for finding nearest common ancestors
Dov Harel, Robert E. Tarjan
TL;DR: An algorithm for a random access machine with uniform cost measure (and a bound of $\Omega(\log n)$ on the number of bits per word) that requires $O(1)$ time per query and $O(n)$ preprocessing time is presented, assuming that the collection of trees is static.
Journal Article
A faster algorithm computing string edit distances
William J. Masek, Mike Paterson
TL;DR: An algorithm is described for computing the edit distance between two strings of length n and m, $n \geq m$, which requires $O(n \cdot \max(1, m/\log n))$ steps whenever the costs of edit operations are integral multiples of a single positive real number and the alphabet for the strings is finite.
Book
On Finding Lowest Common Ancestors: Simplification and Parallelization
Baruch Schieber, Uzi Vishkin
TL;DR: A linear time and space preprocessing algorithm that enables us to answer each query in $O(1)$ time, as in Harel and Tarjan, which has the advantage of being simple and easily parallelizable.
Proceedings Article
Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms
Richard Cole, Uzi Vishkin
TL;DR: A new deterministic coin tossing technique that provides for a fast and efficient breaking of a symmetric situation in parallel is introduced.
Related Papers (5)
Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false)
Arturs Backurs, Piotr Indyk