Book Chapter

A Linear-Time Burrows-Wheeler Transform Using Induced Sorting

TLDR
It is shown that the working space for computing the Burrows-Wheeler Transform directly in linear time is O(n log σ log log_σ n) for any σ, where σ is the alphabet size, which is the smallest among the known linear-time algorithms.
Abstract
To compute the Burrows-Wheeler Transform (BWT), one usually builds a suffix array (SA) first and then obtains the BWT from the SA, which requires much redundant working space. Previous studies that compute the BWT directly [5,12] construct it incrementally, which requires O(n log n) time, where n is the length of the input text. We present an algorithm for computing the BWT directly in linear time by modifying the suffix array construction algorithm based on induced sorting [15]. We show that the working space is O(n log σ log log_σ n) for any σ, where σ is the alphabet size, which is the smallest among the known linear-time algorithms.
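The conventional SA-based route mentioned in the abstract can be sketched as follows. This is only a minimal illustration, not the induced-sorting algorithm of the paper: the suffix array is built by naive sorting, and the function names `suffix_array` and `bwt_from_sa`, as well as the `$` sentinel, are assumptions made for this example.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions by the suffix text.

    This is far from linear time; it only illustrates the data structure.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sentinel="$"):
    """Derive the BWT from the suffix array of text + sentinel.

    Assumes the sentinel does not occur in text and sorts before
    every other character (true for '$' against ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    t = text + sentinel
    sa = suffix_array(t)
    # The BWT character for each sorted suffix is the character that
    # precedes it; for the suffix starting at 0, t[-1] is the sentinel.
    return "".join(t[i - 1] for i in sa)


if __name__ == "__main__":
    print(bwt_from_sa("banana"))
```

The full suffix array holds n integers of about log n bits each, which is the redundant working space that a direct BWT construction aims to avoid.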

Citations
Journal Article

Fully Functional Static and Dynamic Succinct Trees

TL;DR: The range min-max tree, as discussed by the authors, is a data structure for ordinal trees that can be represented in 2n + O(n/polylog(n)) bits of space.
Journal Article

Word-based self-indexes for natural language text

TL;DR: This article introduces a different kind of index that replaces the text using essentially the same space required by the compressed text alone (compression ratio around 35%).
Journal Article

Fully compressed suffix trees

TL;DR: This article introduces the first compressed suffix tree representation that requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time.
Journal Article

Lightweight Data Indexing and Compression in External Memory

TL;DR: Algorithms for computing the Burrows-Wheeler Transform and for building (compressed) indexes in external memory are described that are lightweight in the sense that, for an input of size n, they use only n bits of working space on disk while all previous approaches use Θ(nlog n) bits.
Posted Content

Lightweight Data Indexing and Compression in External Memory

TL;DR: In this article, the authors describe algorithms for computing the BWT and for building (compressed) indexes in external memory using only a small amount of disk working space, and prove lower bounds on the complexity of computing and inverting the BWTs via sequential scans in terms of the classic product.
References

A Block-sorting Lossless Data Compression Algorithm

TL;DR: A block-sorting, lossless data compression algorithm and its implementation are described, and the performance of the implementation is compared with widely available data compressors running on the same hardware.
Journal Article

Compressed full-text indexes

TL;DR: The relationship between text entropy and the regularities that show up in index structures and permit compressing them is explained, and the most relevant self-indexes are covered, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems.
Journal Article

Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching

TL;DR: The result presents for the first time an efficient index whose size is provably linear in the size of the text in the worst case, and for many scenarios, the space is actually sublinear in practice.
Proceedings Article

Succinct indexable dictionaries with applications to encoding k-ary trees and multisets

TL;DR: A structure that supports both operations in O(1) time on the RAM model, and an information-theoretically optimal representation for cardinal trees and multisets where (appropriate generalisations of) the select and rank operations can be supported in O(1) time.
Proceedings Article

Optimal suffix tree construction with large alphabets

TL;DR: This work builds suffix trees in linear time for integer alphabet using Weiner's algorithm, which matches a trivial Ω(n log n)-time lower bound based on sorting.