Open Access Proceedings Article

Oblivious string embeddings and edit distance approximations

TLDR
An oblivious embedding is introduced that maps strings of length n under edit distance to strings of length at most n/r under edit distance, for any value of the parameter r; for any given r, the embedding provides a distortion of O(r^(1+μ)) for some μ = o(1), which is almost optimal.
Abstract
We introduce an oblivious embedding that maps strings of length n under edit distance to strings of length at most n/r under edit distance, for any value of the parameter r. For any given r, our embedding provides a distortion of O(r^(1+μ)) for some μ = o(1), which we prove to be (almost) optimal. The embedding can be computed in O(2^(1/μ) · n) time. We also show how to use the main ideas behind the construction of our embedding to obtain an efficient algorithm for approximating the edit distance between two strings. More specifically, for any ε with 0 ≤ ε < 1, we describe an algorithm that computes the edit distance D(S, R) between two strings S and R of length n in O(n^(1+ε)) time, within an approximation factor of min{n^((1-ε)/3 + o(1)), (D(S, R)/n^ε)^(1/2 + o(1))}. For the case ε = 0, we get an O(n)-time algorithm that approximates the edit distance within a factor of min{n^(1/3 + o(1)), D(S, R)^(1/2 + o(1))}, improving the recent result of Bar-Yossef et al. [2].
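For reference, the quantity D(S, R) appearing in these bounds is the standard unit-cost edit distance. The short Python sketch below computes it exactly with the classic quadratic-time dynamic program (Wagner-Fischer); it is offered only as a baseline against which the approximation factors above can be read, not as the embedding-based O(n^(1+ε))-time algorithm of the paper, and the function name edit_distance is an illustrative choice.

# Minimal sketch (illustrative only): exact unit-cost edit distance via the
# classic Wagner-Fischer dynamic program. This is the quantity D(S, R) that
# the paper approximates; it is not the embedding-based algorithm itself.

def edit_distance(s: str, r: str) -> int:
    n, m = len(s), len(r)
    prev = list(range(m + 1))               # row for the empty prefix of s
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == r[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # delete s[i-1]
                          curr[j - 1] + 1,      # insert r[j-1]
                          prev[j - 1] + cost)   # substitute or match
        prev = curr
    return prev[m]

# Example: edit_distance("kitten", "sitting") == 3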

Citations
Journal Article

SCALCE: boosting sequence compression algorithms using locally consistent encoding

TL;DR: SCALCE, a 'boosting' scheme based on the Locally Consistent Parsing technique, is presented; it reorganizes the reads in a way that yields a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome.
Proceedings Article

Optimal-time text indexing in BWT-runs bounded space

TL;DR: In this paper, the Run-Length FM-index is extended to support text indexing in O(r log(n/r)) space, where r is the number of runs in the Burrows-Wheeler Transform (BWT) of the text.
Proceedings Article

Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity

TL;DR: The lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", which means that many of their substrings are approximately identical, and provides the first rigorous separation between edit distance and Ulam distance.
Proceedings Article

Streaming algorithms for embedding and computing edit distance in the low distance regime

TL;DR: A randomized injective embedding of the edit distance into the Hamming distance with a small distortion, as well as a randomized embedding with quadratic distortion, is shown.
Journal Article

A Survey on Data Compression Methods for Biological Sequences

TL;DR: A comprehensive survey of existing compression approaches specialized for biological data, including protein and DNA sequences, is presented, along with a comparison of the performance of several methods in terms of compression ratio, memory usage, and compression/decompression time.
References
Journal Article

A Space-Economical Suffix Tree Construction Algorithm

TL;DR: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching; it has the same asymptotic running time bound as previously published algorithms but is more economical in space.
Journal Article

Fast algorithms for finding nearest common ancestors

TL;DR: An algorithm for a random access machine with uniform cost measure (and a bound of Ω(log n) on the number of bits per word) is presented that requires O(1) time per query and O(n) preprocessing time, assuming that the collection of trees is static.
Journal Article

A faster algorithm computing string edit distances

TL;DR: An algorithm is described for computing the edit distance between two strings of lengths n and m, n ≥ m, which requires O(n · max(1, m/log n)) steps whenever the costs of edit operations are integral multiples of a single positive real number and the alphabet for the strings is finite.
Book

On Finding Lowest Common Ancestors: Simplification and Parallelization

TL;DR: A linear time and space preprocessing algorithm is presented that enables each query to be answered in O(1) time, as in Harel and Tarjan, and has the advantage of being simple and easily parallelizable.
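As an illustration of the preprocess-once, constant-time-query pattern this TL;DR describes, the following hypothetical Python sketch answers lowest-common-ancestor queries via an Euler tour plus a sparse-table range-minimum structure. Note the assumptions: this simplified variant spends O(n log n) preprocessing rather than the linear time of the cited paper, and the class name LCA, its constructor arguments (n, children, root), and the query method are illustrative, not taken from the paper.

# Hypothetical sketch: O(1)-time LCA queries after O(n log n) preprocessing,
# via an Euler tour and a sparse-table range-minimum structure. This is a
# simplified stand-in for the linear-time scheme of the cited paper.

class LCA:
    def __init__(self, n, children, root=0):
        # Euler tour of the rooted tree: each node is recorded on entry and
        # again after each child returns; depth[i] is the depth of tour[i].
        self.tour, self.depth = [], []
        self.first_seen = [-1] * n
        stack = [(root, 0, iter(children[root]))]
        while stack:
            v, d, it = stack[-1]
            if self.first_seen[v] == -1:
                self.first_seen[v] = len(self.tour)
            self.tour.append(v)
            self.depth.append(d)
            child = next(it, None)
            if child is None:
                stack.pop()
            else:
                stack.append((child, d + 1, iter(children[child])))
        # Sparse table over tour positions: table[k][i] holds the index of the
        # minimum-depth position in the window [i, i + 2^k).
        m = len(self.tour)
        self.table = [list(range(m))]
        k = 1
        while (1 << k) <= m:
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append([self._argmin(prev[i], prev[i + half])
                               for i in range(m - (1 << k) + 1)])
            k += 1

    def _argmin(self, i, j):
        return i if self.depth[i] <= self.depth[j] else j

    def query(self, u, v):
        # The LCA is the minimum-depth tour entry between the first occurrences
        # of u and v; two overlapping power-of-two windows cover that range.
        l, r = sorted((self.first_seen[u], self.first_seen[v]))
        k = (r - l + 1).bit_length() - 1
        best = self._argmin(self.table[k][l], self.table[k][r - (1 << k) + 1])
        return self.tour[best]

# Example: tree 0 -> {1, 2}, 1 -> {3, 4}
# lca = LCA(5, [[1, 2], [3, 4], [], [], []])
# lca.query(3, 4) == 1 and lca.query(3, 2) == 0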
Proceedings Article

Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms

TL;DR: A new deterministic coin tossing technique that provides for a fast and efficient breaking of a symmetric situation in parallel is introduced.