Open Access Journal Article

Heuristics for trie index minimization

David K. Hsiao
01 Sep 1979, Vol. 4, Iss. 3, pp. 383-395
TLDR
This paper investigates several heuristics for reordering attributes, derives bounds on the sizes of the worst tries they produce in terms of the underlying file, and shows that for most applications, O-tries are smaller than other implementations of tries, even when heuristics for improving storage requirements are employed.
Abstract
A trie is a digital search tree in which leaves correspond to records in a file. Searching proceeds from the root to a leaf, where the edge taken at each node depends on the value of an attribute in the query. Trie implementations have the advantage of being fast, but the disadvantage of achieving that speed at great expense in storage space. Of primary concern in making a trie practical, therefore, is the problem of minimizing storage requirements. One method for reducing the space required is to reorder attribute testing. Unfortunately, the problem of finding an ordering which guarantees a minimum-size trie is NP-complete. In this paper we investigate several heuristics for reordering attributes, and derive bounds on the sizes of the worst tries produced by them in terms of the underlying file. Although the analysis is presented for a binary file, extensions to files of higher degree are shown. Another alternative for reducing the space required by a trie is an implementation, called an O-trie, in which the order of attribute testing is contained in the trie itself. We show that for most applications, O-tries are smaller than other implementations of tries, even when heuristics for improving storage requirements are employed.
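
To make the effect of attribute ordering concrete, the following is a minimal sketch rather than the paper's method. It assumes a binary file given as tuples of 0/1 attribute values, measures trie size as the number of internal nodes, and assumes a subfile holding a single record collapses to a leaf. The names `trie_size` and `best_order` are hypothetical. The exhaustive search over orderings is only feasible for a handful of attributes, which is consistent with the NP-completeness result mentioned in the abstract and is why heuristics are of interest.

```python
from itertools import permutations


def trie_size(records, order):
    """Count internal nodes of a trie that tests attributes in `order`.

    Assumptions (not taken from the paper): records are tuples of 0/1
    values, a subfile with at most one record is a leaf, and empty
    branches cost nothing.
    """
    def count(subfile, depth):
        if len(subfile) <= 1 or depth == len(order):
            return 0
        attr = order[depth]
        zeros = [r for r in subfile if r[attr] == 0]
        ones = [r for r in subfile if r[attr] == 1]
        # One node for this test, plus the nodes of both subtries.
        return 1 + count(zeros, depth + 1) + count(ones, depth + 1)

    return count(list(records), 0)


def best_order(records):
    """Brute-force the ordering with the fewest internal nodes.

    Tries every permutation, so the cost grows factorially with the
    number of attributes; usable only on tiny illustrative files.
    """
    n_attrs = len(records[0])
    return min(permutations(range(n_attrs)),
               key=lambda order: trie_size(records, order))


# A four-record file in which attribute 0 is constant and therefore
# never helps to separate records.
records = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]

for order in [(0, 1, 2), (1, 0, 2), (1, 2, 0)]:
    print(order, trie_size(records, order))
print("best:", best_order(records))
```

On this file the orderings (0, 1, 2), (1, 0, 2), and (1, 2, 0) give 4, 5, and 3 internal nodes respectively: testing the constant attribute early, or between two useful attributes, adds nodes that separate nothing.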
