
Showing papers on "Locality-sensitive hashing published in 1991"


Proceedings ArticleDOI
08 Apr 1991
TL;DR: A trial-and-error method of finding perfect hashing functions is proposed using a simple universal₂ class (H₃) of hashing functions, and the results show that the relative frequency of perfect hashing functions within the class H₃ is the same as predicted by the analysis for the set of all functions.
Abstract: Perfect hashing functions are determined that are suitable for hardware implementations. A trial-and-error method of finding perfect hashing functions is proposed using a simple universal₂ class (H₃) of hashing functions. The results show that the relative frequency of perfect hashing functions within the class H₃ is the same as predicted by the analysis for the set of all functions. Extensions of the basic scheme can handle dynamic key sets and large key sets. Perfect hashing functions can be found using software and then loaded into the hardware hash address generator. Inexpensive associative memory can thus be offered as a general memory construct by the system services of high-performance (super)computers. It has potential applications in storing operating system tables or internal tables for software development tools, such as compilers, assemblers, and linkers. Perfect hashing in hardware may also find a number of other applications, such as high-speed event counting and text searching.
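To make the trial-and-error idea concrete, here is a minimal sketch (not the paper's hardware design) of the H₃ class and the search for a perfect member. An H₃ function hashes an i-bit key by XOR-ing one random j-bit word for each set key bit; the key width, table width, and function names below are illustrative assumptions, not values from the paper.

```python
import random

def random_h3(key_bits, hash_bits):
    # One random hash_bits-wide word per key bit: a random member
    # of the H3 class of universal_2 hashing functions.
    return [random.getrandbits(hash_bits) for _ in range(key_bits)]

def h3_hash(q, key):
    # h(x) = XOR of the words q[k] for which bit k of the key is set.
    h = 0
    for k in range(len(q)):
        if (key >> k) & 1:
            h ^= q[k]
    return h

def find_perfect_h3(keys, key_bits, hash_bits, max_trials=100000):
    # Trial and error: draw random H3 members until one maps the key
    # set into the 2**hash_bits table without any collision.
    for trial in range(1, max_trials + 1):
        q = random_h3(key_bits, hash_bits)
        if len({h3_hash(q, x) for x in keys}) == len(keys):
            return q, trial
    raise RuntimeError("no perfect function found within trial budget")

keys = random.sample(range(1 << 16), 12)        # 12 distinct 16-bit keys
q, trials = find_perfect_h3(keys, key_bits=16, hash_bits=5)
print(f"perfect H3 function found after {trials} trial(s)")
```

Because a random function is perfect on a lightly loaded table with non-negligible probability, the expected number of trials stays small, which is what makes the find-in-software, load-into-hardware split practical.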

19 citations


Journal ArticleDOI
TL;DR: Simulation results show that extendible hashing has a 5% advantage over linear hashing in storage utilization, but its directory size is a serious bottleneck; the authors recommend linear hashing when main memory is at a premium.
Abstract: Based on seven assumptions, the following factors are used to compare the performance of linear hashing with extendible hashing: (1) storage utilization; (2) average unsuccessful search cost; (3) average successful search cost; (4) split cost; (5) insertion cost; (6) number of overflow buckets. The simulation is conducted with bucket sizes of 10, 20, and 50 for both hashing techniques. In order to observe their average behavior, the simulation uses 50,000 randomly generated keys.

According to our simulation results, extendible hashing has an advantage of 5% over linear hashing in terms of storage utilization. Successful searches, unsuccessful searches, and insertions are less costly in linear hashing. However, linear hashing requires a large overflow space to handle overflow records; the simulation shows that approximately 10% of the space should be marked as overflow space in linear hashing.

Directory size is a serious bottleneck in extendible hashing. Based on the simulation results, the authors recommend linear hashing when main memory is at a premium.
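For readers unfamiliar with the mechanism being simulated, the following is a compact sketch of linear hashing's split-pointer scheme. The class name, parameters, and the simple one-split-per-overflow policy are illustrative assumptions, not the paper's simulation model.

```python
class LinearHashTable:
    """Sketch of linear hashing: buckets split in a fixed order driven
    by a split pointer; overflow records are chained onto the bucket."""

    def __init__(self, n0=4, bucket_size=4):
        self.n0 = n0                      # number of buckets at level 0
        self.level = 0                    # completed doubling rounds
        self.split = 0                    # next bucket to split
        self.bucket_size = bucket_size
        self.buckets = [[] for _ in range(n0)]

    def _addr(self, key):
        a = hash(key) % (self.n0 << self.level)
        if a < self.split:                # bucket already split this round:
            a = hash(key) % (self.n0 << (self.level + 1))  # use finer hash
        return a

    def insert(self, key):
        bucket = self.buckets[self._addr(key)]
        bucket.append(key)                # beyond bucket_size = overflow chain
        if len(bucket) > self.bucket_size:
            self._split_next()            # simple policy: split on overflow

    def _split_next(self):
        # Split the bucket at the split pointer (not necessarily the one
        # that overflowed) and rehash its keys with the finer function.
        self.buckets.append([])
        old, self.buckets[self.split] = self.buckets[self.split], []
        self.split += 1
        if self.split == self.n0 << self.level:
            self.level += 1               # round complete; table has doubled
            self.split = 0
        for k in old:
            self.buckets[self._addr(k)].append(k)

    def member(self, key):
        return key in self.buckets[self._addr(key)]

t = LinearHashTable()
for k in range(100):
    t.insert(k)
assert all(t.member(k) for k in range(100))
```

Extendible hashing, by contrast, keeps a directory of bucket pointers that doubles whenever a bucket at maximal depth splits, which is the directory-size bottleneck noted above.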

10 citations


Book ChapterDOI
01 Feb 1991
TL;DR: A probabilistic algorithm is employed to separate the legal key values into disjoint subsets, each small enough that it can be efficiently minimally perfectly hashed.
Abstract: We present and analyze a method that permits the efficient application of minimal perfect hashing functions to much larger sets of data than previously possible. The method employs a probabilistic algorithm to obtain a separation of legal key values into disjoint subsets. The cardinality of each subset is small enough that it can be efficiently minimally perfectly hashed. In the retrieval phase, the subset to which an input key belongs is determined and the hashing function for that subset is applied. The time complexity of locating the data associated with a given input key is O(log log N). The space complexity of the algorithm is O(N log log N). Construction time for the necessary data structures is O(N² log log N).
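The separate-then-hash structure can be sketched in a few lines. This toy version uses illustrative names, Python's built-in hash in place of the paper's hash families, and a crude retry-based separation rather than the paper's probabilistic algorithm; it only shows why keeping each subset tiny makes trial-and-error minimal perfect hashing affordable.

```python
import math
import random

def find_mphf_seed(subset):
    # Trial and error: try random seeds until the subset maps
    # bijectively onto range(len(subset)); this is feasible only
    # because the separation step keeps every subset tiny.
    m = len(subset)
    while True:
        s = random.getrandbits(32)
        if len({hash((s, k)) % m for k in subset}) == m:
            return s

def build(keys, max_subset=6):
    # Separate the keys into disjoint subsets, retrying the random
    # separation until all subsets are small, then build a minimal
    # perfect hash per subset and pack the subsets contiguously.
    n = max(1, math.ceil(len(keys) / 3))
    while True:
        sep = random.getrandbits(32)
        subsets = [[] for _ in range(n)]
        for k in keys:
            subsets[hash((sep, k)) % n].append(k)
        if all(len(s) <= max_subset for s in subsets):
            break
    tables, offset = [], 0
    for s in subsets:
        seed = find_mphf_seed(s) if s else 0
        tables.append((seed, offset, len(s)))
        offset += len(s)
    return sep, n, tables

def slot(key, sep, n, tables):
    # Retrieval: find the key's subset, then apply that subset's
    # minimal perfect hash; keys map to unique slots in [0, N).
    seed, offset, m = tables[hash((sep, key)) % n]
    return offset + hash((seed, key)) % m

keys = random.sample(range(10**6), 60)
sep, n, tables = build(keys)
assert len({slot(k, sep, n, tables) for k in keys}) == len(keys)
```

Achieving the paper's O(log log N) retrieval bound requires its specific separation and encoding machinery; the sketch above conveys only the two-level structure.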

3 citations