scispace - formally typeset
Search or ask a question

Showing papers on "Feature hashing published in 1982"


Journal ArticleDOI
TL;DR: A complete characterization of the probability distribution of the Directory size and depth is derived, and its implications on the design of the directory are studied.
Abstract: Extendible hashing is an attractive direct-access technique which has been introduced recently. It is characterized by a combination of database-size flexibility and fast direct access. This paper derives performance measures for extendible hashing, and considers their implecations on the physical database design. A complete characterization of the probability distribution of the directory size and depth is derived, and its implications on the design of the directory are studied. The expected input/output costs of various operations are derived, and the effects of varying physical design parameters on the expected average operating cost and on the expected volume are studied.

62 citations


Journal ArticleDOI
TL;DR: A letter oriented algorithm is developed that handles more than one word per iteration and that frequently outperforms Cichelli's backtracking algorithm.
Abstract: Cichelli has presented a simple method for constructing minimal perfect hash tables of identifiers for small static word sets. The hash function value for a word is computed as the sum of the length of the word and the values associated with the first and last letters of the word. Cichelli's backtracking algorithm considers one word at a time and performs an exhaustive search to find the letter value assignments. In considering heuristics to improve his algorithm we were led to develop a letter oriented algorithm that handles more than one word per iteration and that frequently outperforms Cichelli's. We also investigate the impact of relaxing the minimality requirement and allowing blank spaces in the constructed table. This substantially reduced the execution time of the algorithm. This relaxation and partitioning data sets are shown to be two useful schemes for handling large data sets.

46 citations


Journal ArticleDOI
TL;DR: Techniques are developed for tuning an important parameter that relates the sizes of the address region and the cellar in order to optimize the average running times of different implementations of the coalesced hashing method.
Abstract: The coalesced hashing method is one of the faster searching methods known today. This paper is a practical study of coalesced hashing for use by those who intend to implement or further study the algorithm. Techniques are developed for tuning an important parameter that relates the sizes of the address region and the cellar in order to optimize the average running times of different implementations. A value for the parameter is reported that works well in most cases. Detailed graphs explain how the parameter can be tuned further to meet specific needs. The resulting tuned algorithm outperforms several well-known methods including standard coalesced hashing, separate (or direct) chaining, linear probing, and double hashing. A variety of related methods are also analyzed including deletion algorithms, a new and improved insertion strategy called varied-insertion, and applications to external searching on secondary storage devices.

26 citations