Proceedings Article
14 May 1975
TL;DR: A study of the performance measures obtained during tests of "Distribution-dependent" hashing functions indicates that in certain cases, distribution-dependent methods perform better than the division method.
Abstract: In this paper procedures are studied for storing, accessing, updating, and reorganizing data in large files whose organization is direct, an organization used when a fast response time is required. "Distribution-dependent" hashing functions and the division method are compared as methods of indirect addressing.

"Distribution-dependent" hashing functions are characterized. These hashing functions generate addresses from a set of keys by using knowledge of the distribution of that key set within the key space or range of keys. A study of the performance measures obtained during tests of these functions on several key sets indicates that in certain cases, distribution-dependent methods perform better than the division method. This result is extended by a demonstration that distribution-dependent hashing functions can accommodate a change in the distribution of keys without being redefined. A number of insertions to and deletions from the key set can be made before a distribution-dependent hashing function gives poorer performance than the division method under identical circumstances.

If many additions are made to a set of keys, it becomes necessary to reorganize, in a larger storage area, the direct file of records identified by that key set. Although processor time must be sacrificed in order to redefine a distribution-dependent hashing function, the division method requires substantially greater access time in a reorganizational situation.
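To make the contrast in the abstract concrete, here is a minimal Python sketch of the two kinds of addressing. The division method is the standard `key mod m`. The abstract does not specify the paper's distribution-dependent functions, so `build_distribution_hash` is a hypothetical stand-in: it approximates the cumulative distribution of the key set with a piecewise-linear curve through a few sample quantiles and scales the estimated CDF value of a key to an address. The names `build_distribution_hash`, `collisions`, `segments`, and the key set in the demo are illustrative assumptions, and the demo deliberately uses keys that are all multiples of the table size, a case that is pathological for the division method; it does not reproduce the paper's experiments.

```python
import bisect
from typing import Callable, Sequence


def division_hash(key: int, m: int) -> int:
    """Division method: the address is the remainder of key divided by table size m."""
    return key % m


def build_distribution_hash(keys: Sequence[int], m: int,
                            segments: int = 4) -> Callable[[int], int]:
    """Hypothetical distribution-dependent hash (not the paper's exact function).

    Fits a piecewise-linear estimate of the key set's CDF through
    `segments` + 1 sample quantiles, then maps a key's estimated CDF
    value in [0, 1] to an address in [0, m).
    """
    ks = sorted(keys)
    n = len(ks)
    # Breakpoints: keys at ranks 0, n/segments, 2n/segments, ..., plus the maximum key.
    qs = [ks[i * n // segments] for i in range(segments)] + [ks[-1]]

    def h(key: int) -> int:
        # Segment containing the key; keys outside the sampled range are clamped.
        j = min(max(bisect.bisect_right(qs, key) - 1, 0), segments - 1)
        lo, hi = qs[j], qs[j + 1]
        span = hi - lo
        if span == 0:
            # Degenerate segment (repeated quantiles): use its start or end.
            num, den = (1 if key >= hi else 0), 1
        else:
            num, den = min(max(key - lo, 0), span), span
        # Integer arithmetic for floor(((j + num/den) / segments) * m).
        return min((j * den + num) * m // (segments * den), m - 1)

    return h


def collisions(keys: Sequence[int], h: Callable[[int], int]) -> int:
    """Number of keys whose home address is already taken by an earlier key."""
    return len(keys) - len({h(k) for k in keys})


if __name__ == "__main__":
    m = 100
    keys = [100 * i for i in range(100)]  # every key is a multiple of m
    h = build_distribution_hash(keys, m)
    print("division:", collisions(keys, lambda k: division_hash(k, m)))  # 99
    print("distribution-dependent:", collisions(keys, h))               # 0
```

The sketch also illustrates the trade-off the abstract describes: `h` is built from the current key set, so insertions and deletions that roughly preserve the distribution can be absorbed without rebuilding it, while a substantial shift in the distribution (or a move to a larger storage area) means spending processor time to recompute the quantiles. With a prime table size and less regular keys the division method would not degrade this badly; the example only shows how the two approaches differ in mechanism.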

9 citations