Oblivious string embeddings and edit distance approximations

doi:10.5555/1109557.1109644

Proceedings Article (Open Access)

pp. 792-801

TLDR

An oblivious embedding is introduced that maps strings of length n under edit distance to strings of length at most n/r under edit distance, for any value of the parameter r, with a distortion of $O(r^{1+\mu})$ for some $\mu = o(1)$, which is almost optimal.

Abstract:

We introduce an oblivious embedding that maps strings of length n under edit distance to strings of length at most n/r under edit distance for any value of the parameter r. For any given r, our embedding provides a distortion of $O(r^{1+\mu})$ for some $\mu = o(1)$, which we prove to be (almost) optimal. The embedding can be computed in $O(2^{1/\mu} n)$ time. We also show how to use the main ideas behind the construction of our embedding to obtain an efficient algorithm for approximating the edit distance between two strings. More specifically, for any $1 > \epsilon \ge 0$, we describe an algorithm to compute the edit distance $D(S, R)$ between two strings S and R of length n in time $O(n^{1+\epsilon})$, within an approximation factor of $\min\{n^{(1-\epsilon)/3+o(1)}, (D(S,R)/n^{\epsilon})^{1/2+o(1)}\}$. For the case of $\epsilon = 0$, we get an $O(n)$-time algorithm that approximates the edit distance within a factor of $\min\{n^{1/3+o(1)}, D(S,R)^{1/2+o(1)}\}$, improving the recent result of Bar-Yossef et al. [2].
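
To see the trade-off in the approximation factor, the bound can be evaluated numerically. This is a minimal sketch that drops the $o(1)$ terms in both exponents, so it shows only the leading-order trend, not the exact guarantee; the function name `approx_factor_bound` is illustrative and does not come from the paper.

```python
def approx_factor_bound(n: int, eps: float, d: float) -> float:
    """Leading-order factor min{n^((1-eps)/3), (d / n^eps)^(1/2)}.

    n is the string length, eps the exponent parameter (running time
    O(n^(1+eps))), and d the true edit distance D(S, R). The o(1) terms
    in both exponents are deliberately dropped (an assumption of this sketch).
    """
    if not 0 <= eps < 1:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    return min(n ** ((1 - eps) / 3), (d / n ** eps) ** 0.5)


# eps = 0 (linear time): the factor is min{n^(1/3), D^(1/2)},
# so for a small true distance the sqrt(D) term is the smaller one.
print(approx_factor_bound(10**6, 0.0, 100))  # 10.0
```

Raising eps shrinks the first term $n^{(1-\epsilon)/3}$ at the cost of $O(n^{1+\epsilon})$ running time.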

Citations

Journal Article

SCALCE: boosting sequence compression algorithms using locally consistent encoding

Faraz Hach, +3 more

01 Dec 2012

Bioinformatics

TL;DR: SCALCE is presented: a 'boosting' scheme based on the Locally Consistent Parsing technique that reorganizes the reads in a way that yields higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome.

Proceedings Article

Optimal-time text indexing in BWT-runs bounded space

Travis Gagie, +2 more

TL;DR: In this paper, the Run-Length FM-index is extended to $O(r \log(n/r))$ space, where r is the number of runs in the Burrows-Wheeler Transform (BWT) of the text.
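
To make the parameter r concrete: it counts the maximal runs of equal symbols in the BWT, and it tends to be small for repetitive texts. The sketch below computes the BWT naively by sorting all rotations (quadratic space, for illustration only, not how an index would build it) and then counts the runs. It assumes the sentinel `$` does not occur in the text and sorts before every other symbol, which holds for ASCII letters and digits.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform by sorting all rotations (illustration only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal symbols; this is r when s is a BWT."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


last = bwt("banana")
print(last, count_runs(last))  # annb$aa 5
```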

Proceedings Article

Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity

Alexandr Andoni, +2 more

TL;DR: The lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", meaning that many of their substrings are approximately identical; it also provides the first rigorous separation between edit distance and Ulam distance.
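
For permutations, the Ulam distance is the minimum number of character moves (delete a symbol and reinsert it elsewhere) needed to turn one sequence into the other, which equals n minus the length of their longest common subsequence. Because the inputs are permutations, that LCS reduces to a longest increasing subsequence and can be found in $O(n \log n)$ time. A minimal sketch, assuming both inputs are permutations of the same distinct symbols (the function name is illustrative):

```python
from bisect import bisect_left


def ulam_distance(p: list[int], q: list[int]) -> int:
    """Ulam distance n - LCS(p, q) between two permutations of the same symbols."""
    if len(set(p)) != len(p) or sorted(p) != sorted(q):
        raise ValueError("p and q must be permutations of the same distinct symbols")
    pos_in_q = {sym: i for i, sym in enumerate(q)}
    seq = [pos_in_q[sym] for sym in p]
    # LCS of two permutations == longest increasing subsequence of seq.
    # tails[k] holds the smallest possible tail of an increasing run of length k + 1.
    tails: list[int] = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(p) - len(tails)


print(ulam_distance([1, 2, 3, 4], [2, 3, 4, 1]))  # 1 (move symbol 1 to the end)
```

With only insertions and deletions allowed, the same move costs two operations, which is one simple way to see that the two measures count different things.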

Proceedings Article

Streaming algorithms for embedding and computing edit distance in the low distance regime

Diptarka Chakraborty, +2 more

TL;DR: A randomized injective embedding of edit distance into Hamming distance with small distortion, and a randomized embedding with quadratic distortion, are shown.
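
The TL;DR does not describe the construction. As a hedged illustration, here is a sketch of a random-walk embedding into Hamming space of the kind commonly associated with this line of work: a pointer walks over the input, copying the current character at each of 3n output steps and advancing by a shared random bit that depends on the step and the character. The `random.Random(seed)` bit source, the `⊥` padding symbol, the output length 3n and the function names are assumptions for illustration, not the paper's exact parameters.

```python
import random


def cgk_embed(x: str, alphabet: str, n: int, seed: int) -> str:
    """Map x (len(x) <= n) to a length-3n string via a shared random walk.

    Strings embedded with the same seed see the same random bits, so after an
    edit their walks can re-synchronise. Assumes "⊥" is not in the alphabet.
    """
    if len(x) > n or any(c not in alphabet for c in x):
        raise ValueError("x must have length <= n and use only alphabet symbols")
    rng = random.Random(seed)
    # bits[j][c]: random bit used at output step j when the pointer reads symbol c.
    bits = [{c: rng.getrandbits(1) for c in alphabet} for _ in range(3 * n)]
    out = []
    i = 0  # pointer into x
    for j in range(3 * n):
        if i < len(x):
            out.append(x[i])
            i += bits[j][x[i]]  # advance with probability 1/2
        else:
            out.append("⊥")  # pad once the walk has consumed x
    # If the walk has not consumed x after 3n steps, the tail is simply cut off.
    return "".join(out)


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


x, y = "ACGTACGTAC", "ACGTTCGTAC"  # one substitution apart
fx = cgk_embed(x, "ACGT", 10, seed=7)
fy = cgk_embed(y, "ACGT", 10, seed=7)
print(hamming(fx, fy))  # value depends on the seed
```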

Journal Article

A Survey on Data Compression Methods for Biological Sequences

Morteza Hosseini, +2 more

14 Oct 2016

Information-an International Interdiscip...

TL;DR: A comprehensive survey of existing compression approaches specialized for biological data, including protein and DNA sequences, together with a comparison of several methods in terms of compression ratio, memory usage, and compression/decompression time.

References

Journal Article

A Space-Economical Suffix Tree Construction Algorithm

Edward M. McCreight

01 Apr 1976

Journal of the ACM

TL;DR: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching that has the same asymptotic running time bound as previously published algorithms, but is more economical in space.

Journal Article

Fast algorithms for finding nearest common ancestors

Dov Harel, +1 more

17 May 1984

SIAM Journal on Computing

TL;DR: An algorithm is presented for a random access machine with uniform cost measure (and a bound of $\Omega(\log n)$ on the number of bits per word) that requires $O(1)$ time per query and $O(n)$ preprocessing time, assuming that the collection of trees is static.
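
Harel and Tarjan's data structure is intricate. As a hedged stand-in, the sketch below uses a different, simpler textbook technique (an Euler tour plus a sparse table for range-minimum queries) that also answers each query in $O(1)$ time but needs $O(n \log n)$ preprocessing rather than $O(n)$. The tree is given as child lists indexed by vertex; the class and method names are hypothetical.

```python
class EulerTourLCA:
    """LCA via Euler tour + sparse-table RMQ: O(n log n) preprocessing, O(1) query.

    This is NOT the Harel-Tarjan construction, which achieves O(n) preprocessing.
    """

    def __init__(self, children: list[list[int]], root: int = 0) -> None:
        n = len(children)
        self.euler: list[int] = []  # vertices in Euler-tour order (2n - 1 entries)
        self.depth: list[int] = []  # depth of each Euler-tour entry
        self.first = [-1] * n       # first position of each vertex in the tour
        # Iterative DFS: (vertex, depth, index of the next child to visit).
        stack = [(root, 0, 0)]
        while stack:
            v, d, k = stack.pop()
            if self.first[v] == -1:
                self.first[v] = len(self.euler)
            self.euler.append(v)
            self.depth.append(d)
            if k < len(children[v]):
                stack.append((v, d, k + 1))  # come back to v after this child
                stack.append((children[v][k], d + 1, 0))
        m = len(self.euler)
        # table[j][i]: tour index of the shallowest entry in euler[i : i + 2**j].
        self.table = [list(range(m))]
        j = 1
        while (1 << j) <= m:
            prev, half = self.table[j - 1], 1 << (j - 1)
            row = []
            for i in range(m - (1 << j) + 1):
                a, b = prev[i], prev[i + half]
                row.append(a if self.depth[a] <= self.depth[b] else b)
            self.table.append(row)
            j += 1

    def lca(self, u: int, v: int) -> int:
        lo, hi = sorted((self.first[u], self.first[v]))
        j = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
        a, b = self.table[j][lo], self.table[j][hi - (1 << j) + 1]
        return self.euler[a if self.depth[a] <= self.depth[b] else b]


# Tree: 0 -> {1, 2}, 1 -> {3, 4}, 2 -> {5}
tree = EulerTourLCA([[1, 2], [3, 4], [5], [], [], []])
print(tree.lca(3, 4), tree.lca(3, 5))  # 1 0
```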

Journal Article

A faster algorithm computing string edit distances

William J. Masek, +1 more

01 Feb 1980

Journal of Computer and System Sciences

TL;DR: An algorithm is described for computing the edit distance between two strings of lengths n and m, $n \ge m$, which requires $O(n \cdot \max(1, m/\log n))$ steps whenever the costs of edit operations are integral multiples of a single positive real number and the alphabet for the strings is finite.
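
For reference, the classical quadratic dynamic program for unit-cost edit distance, the baseline that the Masek and Paterson algorithm speeds up, can be sketched as follows. Their "Four Russians" block decomposition, which yields the $m/\log n$ saving, is not shown; the function name is illustrative.

```python
def edit_distance(s: str, r: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute).

    O(|s| * |r|) time, O(|r|) extra space: only the previous DP row is kept.
    """
    prev = list(range(len(r) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        cur = [i] + [0] * len(r)
        for j, cr in enumerate(r, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert cr
                prev[j - 1] + (cs != cr),  # substitute (free if equal)
            )
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```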

Book

On Finding Lowest Common Ancestors: Simplification and Parallelization

Baruch Schieber, +1 more

TL;DR: A preprocessing algorithm running in linear time and space that answers each query in $O(1)$ time, as in Harel and Tarjan, with the advantage of being simple and easily parallelizable.

Proceedings Article

Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms

Richard Cole, +1 more

TL;DR: A new deterministic coin tossing technique is introduced that provides fast and efficient breaking of a symmetric situation in parallel.
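
The symmetry-breaking step can be illustrated on a sequence whose adjacent labels differ (for example, distinct processor IDs along a list). Below is a minimal sketch of one labelling round as deterministic coin tossing is usually presented for linked lists; the function names and the convention for the first position are assumptions of this sketch.

```python
def lowest_differing_bit(a: int, b: int) -> int:
    """Index of the least significant bit in which a and b differ (requires a != b)."""
    d = a ^ b
    return (d & -d).bit_length() - 1


def dct_round(labels: list[int]) -> list[int]:
    """One round of deterministic coin tossing on non-negative labels.

    Precondition: adjacent labels differ. Position i > 0 finds the lowest bit k
    where its label differs from its predecessor's and takes 2*k + (bit k of its
    own label). The first position has no predecessor and keeps only its bit 0
    (an assumed convention). Adjacent output labels still differ, while labels
    of b bits shrink to values below 2*b.
    """
    if any(a == b for a, b in zip(labels, labels[1:])):
        raise ValueError("adjacent labels must differ")
    out = [labels[0] & 1] if labels else []
    for prev, cur in zip(labels, labels[1:]):
        k = lowest_differing_bit(cur, prev)
        out.append(2 * k + ((cur >> k) & 1))
    return out


print(dct_round([5, 3, 12, 7]))  # [1, 3, 0, 1]
```

Repeating the round shrinks any such labelling to values in {0, ..., 5} while keeping adjacent labels distinct, which is what makes the symmetry breaking fast.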