Heuristics for trie index minimization

doi:10.1145/320083.320102



David K. Hsiao

01 Sep 1979

ACM Transactions on Database Systems

Vol. 4, Iss. 3, pp. 383-395

TLDR

This paper investigates several heuristics for reordering attributes, derives bounds on the sizes of the worst tries they produce in terms of the underlying file, and shows that for most applications O-tries are smaller than other implementations of tries, even when heuristics for improving storage requirements are employed.

Abstract:

A trie is a digital search tree in which leaves correspond to records in a file. Searching proceeds from the root to a leaf, where the edge taken at each node depends on the value of an attribute in the query. Trie implementations have the advantage of being fast, but the disadvantage of achieving that speed at great expense in storage space. Of primary concern in making a trie practical, therefore, is the problem of minimizing storage requirements. One method for reducing the space required is to reorder attribute testing. Unfortunately, the problem of finding an ordering which guarantees a minimum-size trie is NP-complete. In this paper we investigate several heuristics for reordering attributes, and derive bounds on the sizes of the worst tries produced by them in terms of the underlying file. Although the analysis is presented for a binary file, extensions to files of higher degree are shown. Another alternative for reducing the space required by a trie is an implementation, called an O-trie, in which the order of attribute testing is contained in the trie itself. We show that for most applications, O-tries are smaller than other implementations of tries, even when heuristics for improving storage requirements are employed.
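To make the effect of attribute ordering concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the heuristics or the size measure analyzed in the paper. It assumes that records are tuples of bits, that a subfile holding a single record becomes a leaf, and that trie size is the number of internal nodes. The names `trie_size`, `best_fixed_order`, and `min_otrie_size` are hypothetical. The exhaustive searches are exponential and only make sense for tiny files, in keeping with the NP-completeness noted above.

```python
from itertools import permutations


def trie_size(records, order):
    """Count internal nodes of a trie that tests attributes in the fixed `order`.

    Assumption: a subfile with at most one record is a leaf and costs nothing.
    """
    records = set(records)
    if len(records) <= 1 or not order:
        return 0
    first, rest = order[0], order[1:]
    groups = {}
    for rec in records:
        groups.setdefault(rec[first], []).append(rec)
    # One node for this test, plus the subtries below each outgoing edge.
    return 1 + sum(trie_size(group, rest) for group in groups.values())


def best_fixed_order(records):
    """Exhaustively find a fixed attribute order giving the smallest trie."""
    n_attrs = len(records[0])
    return min(permutations(range(n_attrs)),
               key=lambda order: trie_size(records, order))


def min_otrie_size(records, attrs):
    """Smallest trie when each node may choose its own attribute to test.

    This mimics the O-trie idea (the testing order is stored in the trie
    itself) by brute force; it is not the paper's construction.
    """
    records = set(records)
    if len(records) <= 1 or not attrs:
        return 0
    best = None
    for attr in attrs:
        rest = tuple(a for a in attrs if a != attr)
        groups = {}
        for rec in records:
            groups.setdefault(rec[attr], []).append(rec)
        size = 1 + sum(min_otrie_size(group, rest) for group in groups.values())
        if best is None or size < best:
            best = size
    return best


# A tiny binary file in which attribute 0 is constant, so testing it first
# wastes a node.
records = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
sizes = {order: trie_size(records, order) for order in permutations(range(3))}
```

In this example, testing the constant attribute first gives 4 internal nodes, while the order `(1, 2, 0)` gives 3. Because a per-node choice can always imitate any fixed order, `min_otrie_size` never exceeds the best fixed-order size under this node-count measure.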



