Heuristics for trie index minimization
TLDR
This paper investigates several heuristics for reordering attributes and derives bounds, in terms of the underlying file, on the sizes of the worst tries they produce. It also shows that for most applications O-tries are smaller than other trie implementations, even when heuristics for improving storage requirements are employed.
Abstract
A trie is a digital search tree in which leaves correspond to records in a file. Searching proceeds from the root to a leaf, where the edge taken at each node depends on the value of an attribute in the query. Trie implementations have the advantage of being fast, but the disadvantage of achieving that speed at great expense in storage space. Of primary concern in making a trie practical, therefore, is the problem of minimizing storage requirements. One method for reducing the space required is to reorder attribute testing. Unfortunately, the problem of finding an ordering which guarantees a minimum-size trie is NP-complete. In this paper we investigate several heuristics for reordering attributes, and derive bounds on the sizes of the worst tries produced by them in terms of the underlying file. Although the analysis is presented for a binary file, extensions to files of higher degree are shown. Another alternative for reducing the space required by a trie is an implementation, called an O-trie, in which the order of attribute testing is contained in the trie itself. We show that for most applications, O-tries are smaller than other implementations of tries, even when heuristics for improving storage requirements are employed.
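To illustrate why the order of attribute testing affects trie size, the following is a minimal sketch rather than the paper's construction. It assumes a binary file given as equal-length tuples of bits and a full trie in which every root-to-leaf path tests all attributes, so the node count at each depth equals the number of distinct value prefixes under the chosen order. The names `trie_size`, `best_order`, and `greedy_order` are hypothetical, and the greedy rule shown is only one plausible heuristic, not necessarily one analyzed in the paper.

```python
from itertools import permutations


def trie_size(records, order):
    """Count nodes of a full trie that tests attributes in `order`.

    Assumption: every path tests every attribute, so the number of
    nodes at depth d equals the number of distinct prefixes formed
    by the first d attributes of `order`. Depth 0 is the root.
    """
    size = 0
    for depth in range(len(order) + 1):
        prefixes = {tuple(r[a] for a in order[:depth]) for r in records}
        size += len(prefixes)
    return size


def best_order(records):
    """Exhaustive search over all orderings (exponential; tiny files only)."""
    n = len(records[0])
    return min(permutations(range(n)), key=lambda o: trie_size(records, o))


def greedy_order(records):
    """Illustrative heuristic: at each level, pick the untested attribute
    that yields the fewest distinct prefixes (ties go to the lower index)."""
    remaining = list(range(len(records[0])))
    chosen = []
    while remaining:
        def width(a):
            return len({tuple(r[i] for i in chosen + [a]) for r in records})
        pick = min(remaining, key=width)
        chosen.append(pick)
        remaining.remove(pick)
    return tuple(chosen)


if __name__ == "__main__":
    # Attribute 2 is constant, so testing it first delays branching.
    records = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
    print(trie_size(records, (0, 1, 2)))  # natural order
    print(trie_size(records, greedy_order(records)))
    print(trie_size(records, best_order(records)))
```

On this small file the greedy ordering happens to match the exhaustive optimum. In general a heuristic can do worse, which is the situation the paper's worst-case bounds address.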
Citations
Patent
Compressed prefix matching database searching
TL;DR: In this article, the authors proposed a method of conducting a reduced length search along a search path, where a node which would otherwise occur between a previous and a following node in the search path is eliminated, and information is stored as to whether, had said eliminated node been present, the search would have proceeded to the following node.
Journal ArticleDOI
Burst tries: a fast, efficient data structure for string keys
TL;DR: These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
Journal ArticleDOI
Tree compression and optimization with applications
Jyrki Katajainen,Erkki Mäkinen +1 more
TL;DR: Tree compression can be seen as a trade-off problem between time and space in which the authors can choose different strategies depending on whether they prefer better compression results or more efficient operations in the compressed structure.
HAT-trie: a cache-conscious trie-based data structure for strings
Nikolas Askitis,Ranjan Sinha +1 more
TL;DR: The HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in-memory while maintaining sort order and approaching that of the cache-conscious hash table.
References
Book
The Art of Computer Programming
Book
The Design and Analysis of Computer Algorithms
Alfred V. Aho,John E. Hopcroft +1 more
TL;DR: This text introduces the basic data structures and programming techniques often used in efficient algorithms, and covers use of lists, push-down stacks, queues, trees, and graphs.
Journal ArticleDOI
Trie memory
TL;DR: In this paper, several paradigms of trie memory are described and compared with other memory paradigms, their advantages and disadvantages are examined in detail, and applications are discussed.
Journal ArticleDOI
PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric
TL;DR: PATRICIA is an algorithm which provides a flexible means of storing, indexing, and retrieving information in a large file, and which is economical of index space and of reindexing time.