Topic
Double hashing
About: Double hashing is a research topic. Over its lifetime, 2,490 publications have appeared within this topic, receiving 83,732 citations.
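The About line names the technique without defining it: double hashing is an open-addressing scheme that resolves collisions by deriving the probe step from a second hash function, so keys that land in the same slot follow different probe sequences. A minimal sketch in Python (the table size, hash choices, and class name are assumptions for illustration, not drawn from any paper below):

```python
# Minimal sketch of double hashing for open addressing (illustrative only).

class DoubleHashTable:
    def __init__(self, size=11):          # a prime size keeps every probe step co-prime
        self.size = size
        self.slots = [None] * size

    def _h1(self, key):                   # primary hash: initial slot
        return hash(key) % self.size

    def _h2(self, key):                   # secondary hash: probe step, never zero
        return 1 + (hash(key) % (self.size - 1))

    def insert(self, key):
        i, step = self._h1(key), self._h2(key)
        for _ in range(self.size):        # probe sequence: i, i+step, i+2*step, ...
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return
            i = (i + step) % self.size
        raise RuntimeError("table full")

    def contains(self, key):
        i, step = self._h1(key), self._h2(key)
        for _ in range(self.size):
            if self.slots[i] is None:
                return False
            if self.slots[i] == key:
                return True
            i = (i + step) % self.size
        return False

t = DoubleHashTable()
for k in ("alpha", "beta", "gamma"):
    t.insert(k)
assert t.contains("beta") and not t.contains("delta")
```

Because the step is key-dependent, two keys colliding on the first slot rarely share the rest of their probe sequence, which reduces the clustering seen with linear probing.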
Papers published on a yearly basis
Papers
TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Abstract: In this paper trade-offs among certain computational factors in hash coding are analyzed. The paradigm problem considered is that of testing a series of messages one-by-one for membership in a given set of messages. Two new hash-coding methods are examined and compared with a particular conventional hash-coding method. The computational factors considered are the size of the hash area (space), the time required to identify a message as a nonmember of the given set (reject time), and an allowable error frequency. The new methods are intended to reduce the amount of space required to contain the hash-coded information from that associated with conventional methods. The reduction in space is accomplished by exploiting the possibility that a small fraction of errors of commission may be tolerable in some applications, in particular, applications in which a large amount of data is involved and a core resident hash area is consequently not feasible using conventional methods. In such applications, it is envisaged that overall performance could be improved by using a smaller core resident hash area in conjunction with the new methods and, when necessary, by using some secondary and perhaps time-consuming test to “catch” the small fraction of errors associated with the new methods. An example is discussed which illustrates possible areas of application for the new methods. Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
7,390 citations
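The space-for-errors trade-off analyzed above is the idea behind what is now commonly known as a Bloom filter. Below is a minimal sketch of such an allowable-error membership test; the parameters M and K, and the double-hashing-style trick for deriving the K bit positions, are assumptions for illustration rather than the paper's exact construction:

```python
import hashlib

# Illustrative Bloom-filter-style membership test: K bit positions per message.
# False positives are possible; false negatives are not. M and K are assumed
# example parameters, not values from the paper.
M, K = 1024, 4

def _indices(msg: str):
    # Derive K indices from two hash values (a double-hashing-style trick,
    # chosen here for convenience).
    d = hashlib.sha256(msg.encode()).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1     # odd step
    return [(h1 + i * h2) % M for i in range(K)]

bits = bytearray(M)                             # the core-resident "hash area"

def add(msg: str):
    for i in _indices(msg):
        bits[i] = 1

def might_contain(msg: str) -> bool:
    # False => definitely a nonmember (fast reject); True => probably a member.
    return all(bits[i] for i in _indices(msg))

for m in ("msg-1", "msg-2"):
    add(m)
assert might_contain("msg-1")
# A query like might_contain("msg-3") is rejected unless it happens to be a
# (rare) false positive, which a slower secondary test would then catch.
```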
IBM
TL;DR: An input-independent, average linear-time algorithm for storage and retrieval on keys is presented; it makes a random choice of hash function from a suitable class of hash functions.
2,886 citations
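The randomly chosen hash function the TL;DR refers to is the core of universal hashing. The sketch below uses one standard universal class, h(x) = ((a·x + b) mod p) mod m; the particular class, prime, and chaining layout are assumptions for illustration, not necessarily the paper's exact construction:

```python
import random

# One standard universal class of hash functions: for any fixed x != y
# (both < P), the probability over (a, b) that h(x) == h(y) is at most 1/m,
# so storage and retrieval with chaining run in expected linear time on
# *any* input, since the adversary cannot predict the random choice.
P = (1 << 61) - 1                      # Mersenne prime, assumed key bound

def make_hash(m: int):
    a = random.randrange(1, P)         # the random choice of hash function
    b = random.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

m = 128
h = make_hash(m)                       # chosen once, independent of the input
buckets = [[] for _ in range(m)]
for key in (17, 4242, 99991):
    buckets[h(key)].append(key)        # expected O(1) per operation
```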
14 Aug 2005
TL;DR: This is the first attack on the full 80-step SHA-1 with complexity less than the 2^80 theoretical bound, and it is shown that collisions of SHA-1 can be found with complexity less than 2^69 hash operations.
Abstract: In this paper, we present new collision search attacks on the hash function SHA-1. We show that collisions of SHA-1 can be found with complexity less than 2^69 hash operations. This is the first attack on the full 80-step SHA-1 with complexity less than the 2^80 theoretical bound.
1,600 citations
IBM
TL;DR: Several new classes of hash functions with certain desirable properties are exhibited, and two novel applications for hashing which make use of these functions are introduced: a provably secure authentication technique for sending messages over insecure lines, and an application to testing sets for equality.
1,586 citations
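One of the two applications mentioned, testing sets for equality, can be illustrated by combining per-element hashes into an order-independent fingerprint. In this sketch the hash class and the XOR combiner are my assumptions, not necessarily the paper's construction:

```python
import random

# Illustrative set-equality test via a randomly chosen hash function.
# Equal sets always get equal fingerprints; unequal sets collide only with
# small probability over the random choice of (a, b).
P = (1 << 61) - 1                      # prime modulus, assumed key bound

def make_set_fingerprint():
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    def fingerprint(elements):         # elements assumed distinct integers
        acc = 0
        for x in elements:
            acc ^= (a * x + b) % P     # XOR makes the result order-independent
        return acc
    return fingerprint

fp = make_set_fingerprint()
assert fp([1, 2, 3]) == fp([3, 1, 2])  # same set, different order
# fp([1, 2, 3]) == fp([1, 2, 4]) holds only with small probability
```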
16 Jun 2012
TL;DR: A novel kernel-based supervised hashing model is proposed that requires only a limited amount of supervised information, i.e., similar and dissimilar data pairs, and a feasible training cost to achieve high-quality hashing, and that significantly outperforms the state of the art in searching both metric distance neighbors and semantically similar neighbors.
Abstract: Recent years have witnessed the growing popularity of hashing in large-scale vision problems. It has been shown that the hashing quality could be boosted by leveraging supervised information into hash function learning. However, the existing supervised methods either lack adequate performance or often incur cumbersome model training. In this paper, we propose a novel kernel-based supervised hashing model which requires a limited amount of supervised information, i.e., similar and dissimilar data pairs, and a feasible training cost in achieving high quality hashing. The idea is to map the data to compact binary codes whose Hamming distances are minimized on similar pairs and simultaneously maximized on dissimilar pairs. Our approach is distinct from prior works by utilizing the equivalence between optimizing the code inner products and the Hamming distances. This enables us to sequentially and efficiently train the hash functions one bit at a time, yielding very short yet discriminative codes. We carry out extensive experiments on two image benchmarks with up to one million samples, demonstrating that our approach significantly outperforms the state of the art in searching both metric distance neighbors and semantically similar neighbors, with accuracy gains ranging from 13% to 46%.
1,461 citations
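The equivalence the abstract exploits is that, for binary codes in {-1, +1}^r, Hamming distance and code inner product determine each other: dist_H(b1, b2) = (r - <b1, b2>) / 2. A small numerical check of that identity (an illustration of mine, not the paper's code):

```python
import numpy as np

# For codes in {-1, +1}^r, agreements minus disagreements equals the inner
# product, so dist_H = (r - <b1, b2>) / 2. Optimizing code inner products is
# therefore equivalent to optimizing Hamming distances.
rng = np.random.default_rng(0)
r = 16
b1 = rng.choice([-1, 1], size=r)
b2 = rng.choice([-1, 1], size=r)

hamming = int(np.sum(b1 != b2))
inner = int(b1 @ b2)
assert hamming == (r - inner) // 2     # the equivalence, verified numerically
```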