High-order entropy-compressed text indexes
Citations
656 citations
Cites background or methods from "High-order entropy-compressed text ..."
...research on compressed indexes has produced data structures that are more alphabet-friendly and achieve various tradeoffs between space usage and query time [Grossi et al. 2003; Rao 2002; Sadakane 2002, 2003; Grabowski et al. 2004; Navarro 2004; Mäkinen et al. 2004; Mäkinen and Navarro 2004]....
[...]
...These second generation compressed indexes make use of new algorithmic tools such as succinct dictionaries [Raman et al. 2002], wavelet trees [Grossi et al. 2003] and compression boosting [Ferragina et al. 2005]....
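A wavelet tree answers rank queries (how many times symbol c occurs in the first i positions) over an arbitrary alphabet by recursively splitting the alphabet in half and keeping one bit per symbol at each level. The following is a minimal sketch under our own simplifying assumption: each node stores a plain prefix-count list in place of the succinct rank dictionary a real implementation would use, and the class name `WaveletTree` is hypothetical.

```python
class WaveletTree:
    """Pointer-based wavelet tree over integer symbols in [lo, hi].

    Each internal node splits its alphabet range in half. Instead of a
    succinct rank/select bitvector, it stores ones[i] = number of symbols
    among its first i symbols that were routed to the right child.
    """

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:  # top-level call: alphabet range taken from the data
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf (single symbol) or empty subtree
        mid = (lo + hi) // 2
        self.ones = [0]
        for c in seq:
            self.ones.append(self.ones[-1] + (c > mid))
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def rank(self, c, i):
        """Occurrences of symbol c in the first i symbols (0 <= i <= len)."""
        if c < self.lo or c > self.hi or i <= 0:
            return 0
        if self.lo == self.hi:
            return i  # every symbol stored below this leaf equals c
        mid = (self.lo + self.hi) // 2
        ones = self.ones[i]
        if c <= mid:
            return self.left.rank(c, i - ones)  # zeros go left
        return self.right.rank(c, ones)  # ones go right


text = "abracadabra"
wt = WaveletTree([ord(ch) for ch in text])
print(wt.rank(ord("a"), len(text)))  # 5
```

A query descends one level per bit of the symbol, so its cost depends on the logarithm of the alphabet size rather than on the length of the text.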
[...]
...Currently, the most space economical compressed indexes [Grossi et al. 2003; Ferragina et al. 2004] take $nH_k(T) + o(n)$ bits for $k < \alpha \log_{|\Sigma|} n$ with $\alpha < 1$....
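The bound $nH_k(T)$ refers to the empirical order-$k$ entropy of the text $T$. As a minimal sketch, we assume the common definition in which $H_0$ is the entropy of the symbol frequencies and $H_k$ averages, weighted by length, the $H_0$ of the symbols that follow each length-$k$ context (some papers use preceding contexts instead). The names `h0` and `hk` are our own.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Empirical order-0 entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(s).values())


def hk(t, k):
    """Empirical order-k entropy: (1/n) * sum over contexts w of |t_w| * H0(t_w),
    where t_w collects the symbols that follow each occurrence of w in t."""
    if k == 0:
        return h0(t)
    followers = defaultdict(list)
    for i in range(len(t) - k):
        followers[t[i:i + k]].append(t[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / len(t)


t = "abracadabra"
print(len(t) * hk(t, 2))  # n * H_2(T), the leading term of the space bound
```

Because conditioning on a longer context can only refine the groups of following symbols, $H_k(T)$ never exceeds $H_{k-1}(T)$ under this definition, which is why an $nH_k$ bound can be much smaller than $nH_0$.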
[...]
559 citations
Cites background or methods from "High-order entropy-compressed text ..."
...Assuming that the text is read-only and using a stronger version of the bit-probe model, Demaine and López-Ortiz [22] have shown in the worst case that any text index with alphabet size |Σ| = 2 that supports fast queries by probing O(m) bits in the text must use Ω(n) bits of extra storage space....
[...]
...(He also uses our Lemma 2 in Section 3.1 to show how to store the skip values of the suffix tree in O(n) bits [65].) The space of compressed suffix arrays has been further reduced to the order-k entropy (with a multiplicative constant of 1) by Grossi, Gupta, and Vitter [36] using a novel analysis based on a finite set model....
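Compressed suffix arrays in the Grossi and Vitter line of work replace the suffix array by the neighbor function $\Psi(i) = SA^{-1}[SA[i] + 1]$, which is increasing within each block of suffixes that start with the same symbol and is therefore compressible. The sketch below assumes a text terminated by a unique smallest sentinel `$` and builds the suffix array by naive sorting (both our choices); it computes $\Psi$ and shows that the text can be read back from $\Psi$ plus the sorted first column alone. It does not implement the finite set model analysis or any actual compression, and the function names are hypothetical.

```python
def suffix_array(t):
    # Naive construction: sort start positions by the suffixes they denote.
    return sorted(range(len(t)), key=lambda i: t[i:])


def psi(t):
    """Return (SA, Psi) with Psi[i] = SA^{-1}[(SA[i] + 1) mod n]."""
    sa = suffix_array(t)
    isa = [0] * len(t)
    for r, p in enumerate(sa):
        isa[p] = r
    return sa, [isa[(p + 1) % len(t)] for p in sa]


def decode(first, ps, rank, length):
    """Read `length` symbols starting at suffix-array row `rank`,
    using only the first column and Psi (no access to the text)."""
    out = []
    for _ in range(length):
        out.append(first[rank])
        rank = ps[rank]  # move to the row of the next text position
    return "".join(out)


t = "banana$"
sa, ps = psi(t)
print(ps)  # [4, 0, 5, 6, 3, 1, 2]
print(decode(sorted(t), ps, sa.index(0), len(t)))  # banana$
```

In the output, rows 1 to 3 (the suffixes starting with `a`) have $\Psi$ values 0, 5, 6 and rows 5 and 6 (starting with `n`) have 1, 2: increasing runs of this kind are what the compressed representation exploits.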
[...]
...We remark that Sadakane [64] has shown that the space complexity in Theorem 1(ii) and Theorem 2(ii) can be restated in terms of the order-0 entropy $H_0 \le \lg |\Sigma|$ of the string, giving as a result $\epsilon^{-1} H_0 n + O(n)$ bits. Grossi, Gupta, and Vitter [36]...
[...]
References
2,068 citations
"High-order entropy-compressed text ..." refers background in this paper
...Large alphabets are typical of phrase searching [5, 21], for example, in which the alphabet is made up of single words and its size cannot be considered a small constant....
[...]
1,969 citations
"High-order entropy-compressed text ..." refers background or methods or result in this paper
...We first perform a search of $P$ in $SA_{\ell + \lg t(n)}$, which is stored explicitly along with $LCP_{\ell + \lg t(n)}$, the longest common prefix information required in [10]....
[...]
...Similar to what we described in Section 2, level $k = \ell$ stores the suffix array $SA_\ell$, the inverted suffix array $SA^{-1}_\ell$, and an array $LCP_\ell$ storing the longest common prefix information [10] to allow fast searching in $SA_\ell$....
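The three arrays named here are related: given $SA$ and $SA^{-1}$, the LCP array can be filled in linear time. As an illustration we substitute the well-known algorithm of Kasai et al., which is not necessarily the construction used in the cited work; the naive suffix array builder and all function names are our own assumptions.

```python
def suffix_array(t):
    # Naive construction, for illustration only.
    return sorted(range(len(t)), key=lambda i: t[i:])


def lcp_kasai(t, sa):
    """Return (isa, lcp), where isa is the inverse suffix array and
    lcp[r] is the longest common prefix of suffixes sa[r-1] and sa[r] (lcp[0] = 0)."""
    n = len(t)
    isa = [0] * n
    for r, p in enumerate(sa):
        isa[p] = r
    lcp = [0] * n
    h = 0
    for p in range(n):  # visit suffixes in text order
        r = isa[p]
        if r > 0:
            q = sa[r - 1]  # lexicographic predecessor of suffix p
            while p + h < n and q + h < n and t[p + h] == t[q + h]:
                h += 1
            lcp[r] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return isa, lcp


sa = suffix_array("banana")
print(sa)                          # [5, 3, 1, 0, 4, 2]
print(lcp_kasai("banana", sa)[1])  # [0, 1, 3, 0, 0, 2]
```

Visiting suffixes in text order is what makes this linear: the match length h drops by at most one between consecutive positions, so the total work of the inner loop is O(n).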
[...]
...A standard suffix array [4, 10] is an array containing the position of each of the n suffixes of text T in lexicographical order....
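A direct reading of this definition can be sketched as follows: sort the suffix start positions by the suffixes they denote, then find all occurrences of a pattern P as one contiguous range of the array with two binary searches. The quadratic-time construction and the helper names are our own illustrative choices. Each search step may re-compare up to |P| characters; avoiding that repeated work is the role of the longest common prefix information mentioned in the snippets above, which this sketch omits.

```python
def suffix_array(t):
    # Positions of all suffixes of t in lexicographic order (naive sort).
    return sorted(range(len(t)), key=lambda i: t[i:])


def sa_range(t, sa, p):
    """Return (lo, hi) such that sa[lo:hi] are exactly the suffixes with prefix p."""
    m = len(p)
    # First row whose length-m prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First row whose length-m prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


t = "banana"
sa = suffix_array(t)
lo, hi = sa_range(t, sa, "ana")
print(sorted(sa[lo:hi]))  # [1, 3]
```

The search works because truncating sorted suffixes to their first |P| characters keeps them in non-decreasing order, so the rows matching P are contiguous.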
[...]
1,188 citations
"High-order entropy-compressed text ..." refers background or methods in this paper
...Indexing the Associated Press file with the FM-index would require roughly 1 gigabyte according to the experiments in [3]....
[...]
...Decompressing one text symbol of Sj at a time is inherently sequential as in [2] and [19, 20]....
[...]
...1 Related Work A new trend in the design of advanced indexes for full-text searching of documents is represented by compressed suffix arrays [6, 18, 19, 20] and opportunistic FM-indexes [2, 3], in that they support the functionalities of suffix arrays and suffix trees, which are more powerful than classical inverted files [4]....
[...]
...The FM-index [2, 3] is a self-indexing data structure using … bits, while … $O(m + \lg n)$ time, where $|\Sigma| = O(1)$....
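The FM-index answers counting queries by backward search over the Burrows and Wheeler transform (BWT) of the text, processing the pattern from its last symbol to its first. The following is a minimal uncompressed sketch, assuming a text that ends in a unique smallest sentinel `$`: the occurrence counts are kept as plain prefix-count lists, whereas the actual FM-index stores them in compressed form, and the names `bwt`, `build_fm`, and `count` are hypothetical.

```python
def bwt(t):
    """BWT of t; t must end with a unique smallest sentinel such as '$'."""
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    return "".join(t[i - 1] for i in sa)  # t[-1] (the sentinel) precedes suffix 0


def build_fm(t):
    last = bwt(t)
    alphabet = sorted(set(last))
    # C[c] = number of symbols in the text that are smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += last.count(c)
    # occ[c][i] = occurrences of c in last[:i] (uncompressed rank table).
    occ = {c: [0] for c in alphabet}
    for ch in last:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return C, occ, len(last)


def count(fm, p):
    """Number of occurrences of p, via backward search on the row range [lo, hi)."""
    C, occ, n = fm
    lo, hi = 0, n
    for c in reversed(p):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


print(bwt("banana$"))                              # annb$aa
print(count(build_fm("abracadabra$"), "abra"))     # 2
```

Each pattern symbol costs two rank queries, so with constant-time rank the counting time is proportional to the pattern length, independent of the text length.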
[...]