Showing papers on "Feature hashing" published in 1995


Journal ArticleDOI
TL;DR: This paper presents a system, named MULTI-HASH, which uses the tools of decision trees and uncertainty modeling for the automatic construction of hash tables and is based on a hybrid method that uses both qualitative and quantitative attributes.

42 citations


Proceedings ArticleDOI
04 Jan 1995
TL;DR: Novel methods to evaluate the structural similarity of proteins are devised and the following property is proved theoretically: if the root mean square deviation between two fragments is small, then the distance between the hash vectors associated with the fragments is small.
Abstract: We have devised novel methods to evaluate the structural similarity of proteins and we compare them. In each method, a hash vector is associated with each fixed-length fragment of three-dimensional protein structure. Then, we analyze the similarity between fragments by evaluating the difference between their hash vectors. The novel aspect of the methods is that the following property is proved theoretically: if the root mean square deviation between two fragments is small, then the distance between the hash vectors associated with the fragments is small. The methods were compared with the previous methods using PDB data, and were shown to be much faster. One of the new hashing methods is already included in PROTEIX, a database management system for protein structures. The features of PROTEIX are described.

5 citations


Proceedings ArticleDOI
17 May 1995
TL;DR: This paper analyzes the performance of hashing algorithms, focusing on the effect of clustering on hashing methods.
Abstract: Hashing algorithms are search procedures commonly used, among other applications, in the solution of logic synthesis and formal hardware verification problems. The purpose of this paper is to analyze the performance of hashing algorithms. In particular, we are interested in studying the effect of clustering on hashing methods.

3 citations
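
The clustering effect mentioned in the entry above can be illustrated with a small experiment. The following is a minimal sketch, not the paper's methodology: it assumes an open-addressing table and compares linear probing (which is prone to primary clustering) with double hashing. The table size, key range, load factor, and the helper names `insert_all`, `linear_step`, and `double_step` are all illustrative assumptions.

```python
import random


def insert_all(keys, m, step_fn):
    """Insert keys into an open-addressing table of size m.

    Returns the average number of slots examined per insertion.
    """
    table = [None] * m
    probes = 0
    for key in keys:
        pos = key % m
        step = step_fn(key, m)
        probes += 1  # the first slot examined
        while table[pos] is not None:
            pos = (pos + step) % m
            probes += 1
        table[pos] = key
    return probes / len(keys)


def linear_step(key, m):
    # Linear probing: always move to the next slot, so runs of
    # occupied slots tend to grow and merge (primary clustering).
    return 1


def double_step(key, m):
    # Double hashing: a key-dependent step in [1, m - 1]. With a prime m,
    # every step is coprime to m, so the probe sequence visits all slots.
    return 1 + key % (m - 1)


m = 1009                                # prime table size (assumption)
rng = random.Random(0)
keys = rng.sample(range(10**9), 908)    # distinct keys, load factor about 0.9

avg_linear = insert_all(keys, m, linear_step)
avg_double = insert_all(keys, m, double_step)
print(f"linear probing: {avg_linear:.2f} probes per insert")
print(f"double hashing: {avg_double:.2f} probes per insert")
```

At high load factors, linear probing typically needs noticeably more probes per insertion than double hashing, which is the usual signature of clustering.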


Proceedings ArticleDOI
J.W. Miller
17 Sep 1995
TL;DR: A representation technique is presented allowing for quick access of individual records from a static compressed dataset, given a collection of key-record pairs, that uses a carefully chosen pseudo-random number generator to directly produce the correct record for any key in the dataset.
Abstract: A representation technique is presented allowing for quick access of individual records from a static compressed dataset. Given a collection of key-record pairs, the representation allows the appropriate short record to be returned for any given key. The approach is a generalization of perfect address hashing. The new approach, called perfect value hashing, uses a carefully chosen pseudo-random number generator to directly produce the correct record for any key in the dataset. This contrasts with address hashing where the random number provides an address which is then used to recover the record from a separate table. Value hashing doesn't have the theoretical limitations of address hashing, and in practice is more space efficient for records of size less than 36 bits. Value hashing has the added benefit (important when the records are encoded for compression) that variable length records can be represented without an increase in the size of the encoded records. This new technique was used to provide random access from a highly compressed spelling dictionary.

3 citations
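
The contrast between address hashing and value hashing in the entry above can be made concrete with a toy example. This is only a sketch of the general idea, not Miller's construction: it assumes a tiny key set with 2-bit records and finds, by brute force, a seed for which a seeded hash (standing in for the "carefully chosen pseudo-random number generator") outputs each key's record directly. The names `prng_bits`, `find_seed`, and the sample records are hypothetical.

```python
import hashlib


def prng_bits(seed: int, key: str, bits: int) -> int:
    """Deterministic pseudo-random value for (seed, key), truncated to `bits` bits."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def find_seed(records: dict, bits: int, max_seed: int = 1_000_000) -> int:
    """Brute-force search for a seed whose output equals the record for every key."""
    for seed in range(max_seed):
        if all(prng_bits(seed, key, bits) == value for key, value in records.items()):
            return seed
    raise ValueError("no suitable seed found within max_seed attempts")


# Hypothetical dataset: four keys, each with a 2-bit record.
records = {"cat": 1, "dog": 0, "emu": 3, "fox": 2}
BITS = 2
seed = find_seed(records, BITS)


def lookup(key: str) -> int:
    # No table is consulted: the generator output *is* the record.
    return prng_bits(seed, key, BITS)


for key in records:
    print(key, lookup(key))
```

Only the seed is stored, so there is no separate record table to index into. Brute-force search is practical only for very small groups of keys, and a key outside the dataset returns an arbitrary value rather than an error.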


Journal ArticleDOI
TL;DR: This paper discusses improvements to Cichelli's method for computing the set of weights used for minimal perfect hashing functions, chiefly adding a "MOD number_of_keys" operation to the hashing function and removing unnecessary backtracking due to "guaranteed collisions".
Abstract: This paper will discuss improvements to Cichelli's method for computing the set of weights used for minimal perfect hashing functions[1]. The major modifications investigated here pertain to adding a "MOD number_of_keys" operation to the hashing function, and to the removal of unnecessary backtracking due to "guaranteed collisions".

2 citations
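
To make the "MOD number_of_keys" idea in the entry above concrete, here is a minimal sketch. It assumes Cichelli's classic hash form (word length plus a weight for the first letter plus a weight for the last letter) and simply reduces the sum modulo the number of keys; the paper's exact formulation and its pruning of "guaranteed collisions" are not reproduced here. The function names and the sample word list are illustrative assumptions.

```python
def cichelli_hash(word, g, n):
    """Cichelli-style hash with an added MOD n (n = number of keys)."""
    return (len(word) + g[word[0]] + g[word[-1]]) % n


def find_weights(words):
    """Backtracking search for letter weights giving a minimal perfect hash."""
    n = len(words)
    # Letters that need a weight: first and last letter of every word.
    letters = []
    for word in words:
        for ch in (word[0], word[-1]):
            if ch not in letters:
                letters.append(ch)
    g = {}

    def no_collisions():
        # Check only words whose first and last letters both have weights.
        used = set()
        for word in words:
            if word[0] in g and word[-1] in g:
                slot = cichelli_hash(word, g, n)
                if slot in used:
                    return False
                used.add(slot)
        return True

    def assign(i):
        if i == len(letters):
            return True
        # With MOD n, weights outside 0..n-1 would be redundant.
        for value in range(n):
            g[letters[i]] = value
            if no_collisions() and assign(i + 1):
                return True
        g.pop(letters[i], None)  # undo before backtracking
        return False

    if not assign(0):
        raise ValueError("no weight assignment found for these words")
    return g


words = ["and", "begin", "case", "do", "else", "for"]
weights = find_weights(words)
slots = {w: cichelli_hash(w, weights, len(words)) for w in words}
print(slots)
```

Because the sum is reduced modulo the number of keys, every word lands in the range 0 to n-1 by construction, so the search only has to avoid collisions. Two words with the same length and the same first and last letters always collide under this hash form, and for such inputs the search raises an error.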


Proceedings ArticleDOI
26 Feb 1995
TL;DR: The DVPH method improves the efficiency and response time and handles dynamic memory allocation, and ameliorates losses due to unnecessary page I/O, losses due to memory preemption from higher priority applications and losses due to inappropriate hash function selections.
Abstract: In this paper we propose a new method for processing earth science applications which involve join processing. We call this method the 'Domain Vector Perfect Hash Join Method' (DVPH). DVPH is a simple method which works well for processing applications with joins. The DVPH method improves the efficiency and response time and handles dynamic memory allocation. It ameliorates losses due to unnecessary page I/O, losses due to memory preemption from higher priority applications and losses due to inappropriate hash function selections. DVPH handles skewed data well. These advantages are gained at the expense of maintaining a domain vector and an extra index.

1 citations


Journal ArticleDOI
TL;DR: This paper develops a new collision resolution strategy, namely, hypercube hashing, which combines the randomness provided in double hashing with the low communication cost inherited from linear probing to yield better performance.