Showing papers on "Feature hashing" published in 2006


Journal ArticleDOI
TL;DR: A novel algorithm for generating an image hash based on Fourier transform features and controlled randomization is developed and it is shown that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions.
Abstract: Image hash functions find extensive applications in content authentication, database search, and watermarking. This paper develops a novel algorithm for generating an image hash based on Fourier transform features and controlled randomization. We formulate the robustness of image hashing as a hypothesis testing problem and evaluate the performance under various image processing operations. We show that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions. We introduce a general framework to study and evaluate the security of image hashing systems. Under this new framework, we model the hash values as random variables and quantify their uncertainty in terms of differential entropy. Using this security framework, we analyze the security of the proposed schemes and several existing representative methods for image hashing. We then examine the security versus robustness tradeoff and show that the proposed hashing methods can provide excellent security and robustness.
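
A minimal sketch of the general idea (not the authors' exact algorithm): sample the magnitude of the 2-D Fourier spectrum at locations drawn pseudo-randomly from a secret key, then quantize the samples against their median to form a binary hash. The function name, sampling region, and hash length are assumptions for illustration.

    import numpy as np

    def fourier_image_hash(image, key=0, n_bits=64):
        # Magnitude spectrum, shifted so low frequencies sit in the middle.
        f = np.abs(np.fft.fftshift(np.fft.fft2(np.asarray(image, dtype=float))))
        h, w = f.shape
        rng = np.random.default_rng(key)  # key-controlled randomization
        # Sample low/mid frequencies around the spectrum centre.
        ys = rng.integers(h // 4, 3 * h // 4, size=n_bits)
        xs = rng.integers(w // 4, 3 * w // 4, size=n_bits)
        samples = f[ys, xs]
        # Quantize against the median: robust to moderate filtering.
        return (samples > np.median(samples)).astype(np.uint8)

Hashes of an image and a mildly filtered copy of it should then be close in Hamming distance, while hashes under different keys are hard to predict.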

542 citations


Book ChapterDOI
11 Sep 2006
TL;DR: Only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability, leading to less computation and potentially less need for randomness in practice.
Abstract: A standard technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form gi(x) = h1(x) + i·h2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
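
A minimal sketch of this two-hash technique applied to a Bloom filter; the class name is illustrative, and SHA-256 stands in for the two base hash functions.

    import hashlib

    class TwoHashBloomFilter:
        """k index functions derived from two base hashes:
        g_i(x) = h1(x) + i*h2(x) mod m."""

        def __init__(self, m, k):
            self.m, self.k = m, k
            self.bits = bytearray(m)  # one byte per bit, for clarity

        def _indices(self, item):
            d = hashlib.sha256(item.encode()).digest()
            h1 = int.from_bytes(d[:8], "big")
            h2 = int.from_bytes(d[8:16], "big") | 1  # odd step cycles fully mod power-of-two m
            return [(h1 + i * h2) % self.m for i in range(self.k)]

        def add(self, item):
            for j in self._indices(item):
                self.bits[j] = 1

        def __contains__(self, item):
            return all(self.bits[j] for j in self._indices(item))

    bf = TwoHashBloomFilter(m=1 << 20, k=7)
    bf.add("feature hashing")
    assert "feature hashing" in bf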

234 citations


Journal ArticleDOI
TL;DR: It is found that these hash functions are resistant to signal processing and transmission impairments, and therefore can be instrumental in building database search, broadcast monitoring and watermarking applications for video.
Abstract: Identification and verification of a video clip via its fingerprint find applications in video browsing, database search and security. For this purpose, the video sequence must be collapsed into a short fingerprint using a robust hash function based on signal processing operations. We propose two robust hash algorithms for video, both based on the discrete cosine transform (DCT): one on the classical basis set and the other on a novel randomized basis set (RBT). The robustness and randomness properties of the proposed hash functions are investigated in detail. It is found that these hash functions are resistant to signal processing and transmission impairments, and therefore can be instrumental in building database search, broadcast monitoring and watermarking applications for video. The DCT hash is more robust but lacks security, as it is easy to find different video clips with the same hash value. The RBT-based hash, being secret-key based, does not allow this and is more secure, at the cost of a slight loss in the receiver operating characteristics.
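
Fingerprints of this kind are usually compared by bit error rate (normalized Hamming distance) against a threshold chosen from the desired point on the receiver operating characteristic. A small sketch, with an illustrative threshold:

    import numpy as np

    def bit_error_rate(hash_a, hash_b):
        # Fraction of differing bits between two equal-length binary fingerprints.
        a, b = np.asarray(hash_a), np.asarray(hash_b)
        return float(np.mean(a != b))

    def same_clip(hash_a, hash_b, threshold=0.2):
        # 0.2 is illustrative; a real system picks it from the ROC curve.
        return bit_error_rate(hash_a, hash_b) < threshold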

217 citations


Journal ArticleDOI
01 Jul 2006
TL;DR: A perfect multidimensional hash function is designed -- one that is precomputed on static data to have no hash collisions and is ideally suited for parallel SIMD evaluation on graphics hardware.
Abstract: We explore using hashing to pack sparse data into a compact table while retaining efficient random access. Specifically, we design a perfect multidimensional hash function -- one that is precomputed on static data to have no hash collisions. Because our hash function makes a single reference to a small offset table, queries always involve exactly two memory accesses and are thus ideally suited for parallel SIMD evaluation on graphics hardware. Whereas prior hashing work strives for pseudorandom mappings, we instead design the hash function to preserve spatial coherence and thereby improve runtime locality of reference. We demonstrate numerous graphics applications including vector images, texture sprites, alpha channel compression, 3D-parameterized textures, 3D painting, simulation, and collision detection.
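
A toy 2-D construction in the spirit of this scheme, using a greedy random search for the offsets rather than the paper's full algorithm; all names and parameters here are assumptions. A lookup touches the offset table once and the hash table once:

    import numpy as np

    def build_perfect_hash(points, values, m, r, tries=1000):
        # Slot for point p is (p + Offset[p mod r]) mod m; construction may
        # fail for dense data, in which case grow m or r and retry.
        rng = np.random.default_rng(0)
        H = {}  # sparse stand-in for the m-by-m hash table
        offsets = np.zeros((r, r, 2), dtype=int)
        buckets = {}
        for p, v in zip(points, values):
            buckets.setdefault((p[0] % r, p[1] % r), []).append((tuple(p), v))
        # Assign offsets to the fullest offset-table slots first.
        for slot in sorted(buckets, key=lambda s: -len(buckets[s])):
            for _ in range(tries):
                off = rng.integers(0, m, size=2)
                targets = [((p[0] + off[0]) % m, (p[1] + off[1]) % m)
                           for p, _ in buckets[slot]]
                if len(set(targets)) == len(targets) and not any(t in H for t in targets):
                    offsets[slot] = off
                    for t, (_, v) in zip(targets, buckets[slot]):
                        H[t] = v
                    break
            else:
                raise RuntimeError("construction failed; increase m or r")
        return H, offsets

    def lookup(p, H, offsets, m, r):
        off = offsets[p[0] % r, p[1] % r]                         # first memory access
        return H.get(((p[0] + off[0]) % m, (p[1] + off[1]) % m))  # second access

    points = [(2, 3), (7, 1), (5, 5), (0, 6)]
    H, offsets = build_perfect_hash(points, ["a", "b", "c", "d"], m=4, r=2)
    assert lookup((5, 5), H, offsets, m=4, r=2) == "c"

Unlike the paper's construction, this sketch makes no attempt to preserve spatial coherence; it only demonstrates the two-access query structure.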

183 citations


Journal ArticleDOI
TL;DR: This paper proposes a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion and proves that the decision version of the clustering problem is NP-complete.
Abstract: A perceptual image hash function maps an image to a short binary string based on an image's appearance to the human eye. Perceptual image hashing is useful in image databases, watermarking, and authentication. In this paper, we decouple image hashing into feature extraction (intermediate hash) followed by data clustering (final hash). For any perceptually significant feature extractor, we propose a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion. We prove that the decision version of our clustering problem is NP-complete. Based on the proposed algorithm, we develop two variations to facilitate perceptual robustness versus fragility tradeoffs. We validate the perceptual significance of our hash by testing under Stirmark attacks. Finally, we develop randomized clustering algorithms for the purposes of secure image hashing.

123 citations


Proceedings ArticleDOI
09 Jul 2006
TL;DR: This paper introduces a hardware-friendly scheme for minimal perfect hashing, with a space requirement approaching 3.7 times the information-theoretic lower bound, and employs a Bloom filter, which is known for its simplicity and speed.
Abstract: Hash tables are a fundamental data structure in many network applications, including route lookups, packet classification and monitoring. Often a part of the data path, they need to operate at wire-speed. However, several associative memory accesses are needed to resolve collisions, making them slower than required. This motivates us to consider minimal perfect hashing schemes, which reduce the number of memory accesses to just one and are also space-efficient. Existing perfect hashing algorithms are not tailored for network applications because they take too long to construct and are hard to implement in hardware. This paper introduces a hardware-friendly scheme for minimal perfect hashing, with a space requirement approaching 3.7 times the information-theoretic lower bound. Our construction is several orders of magnitude faster than existing perfect hashing schemes. Instead of using the traditional mapping-partitioning-searching methodology, our scheme employs a Bloom filter, which is known for its simplicity and speed. We extend our scheme to the dynamic setting, thus handling insertions and deletions.

62 citations


Patent
06 Dec 2006
TL;DR: In this paper, a similarity-based hashing (SBH) algorithm is used to calculate hash values such that the same hash value is calculated for the same data and, the more similar the data, the smaller the difference between the generated hash values.
Abstract: A data hashing method, a data processing method, and a data processing system using a similarity-based hashing (SBH) algorithm, in which the same hash value is calculated for the same data and, the more similar the data, the smaller the difference between the generated hash values. The data hashing method includes receiving computerized data and generating a hash value of the computerized data using the SBH algorithm, in which two data items are the same if their calculated hash values are the same and similar if the difference between their calculated hash values is small. A search, comparison, or classification of data may thus be processed quickly, within a time complexity of O(1) or O(n), since the similarity/closeness of data content is quantified by the component values of the corresponding generated hash values.
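
As a toy illustration of the stated property only (emphatically not the patented algorithm): project a byte-value histogram onto fixed weights, so that identical inputs collide exactly and a small edit moves the value only slightly.

    import numpy as np

    def similarity_hash(data: bytes) -> int:
        # Byte-value histogram projected onto fixed weights; order-insensitive.
        hist = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
        weights = np.arange(1, 257)  # fixed, content-independent weights
        return int(hist @ weights)

    a = similarity_hash(b"the quick brown fox")
    b = similarity_hash(b"the quick brown fix")  # one-byte change
    assert a != b and abs(a - b) < 32            # numerically close, not equal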

46 citations


Proceedings ArticleDOI
26 Sep 2006
TL;DR: A wavelet-based robust database hashing scheme is analyzed with respect to its resilience against image modifications and hostile attacks and a method to construct a forgery is presented.
Abstract: A wavelet-based robust database hashing scheme is analyzed with respect to its resilience against image modifications and hostile attacks. A method to construct a forgery is presented and possible countermeasures are discussed.

14 citations


Book ChapterDOI
28 Aug 2006
TL;DR: This work extends a static scheme of Pagh to obtain new randomized algorithms for maintaining hash tables, where a hash function can be evaluated in constant time and by probing only one external memory cell or O(1) consecutive external memory cells.
Abstract: In typical applications of hashing algorithms the amount of data to be stored is often too large to fit into internal memory. In this case it is desirable to find the data with as few non-consecutive, or at least non-oblivious, probes into external memory as possible. Extending a static scheme of Pagh [11], we obtain new randomized algorithms for maintaining hash tables, where a hash function can be evaluated in constant time and by probing only one external memory cell or O(1) consecutive external memory cells. We describe a dynamic version of Pagh's hashing scheme achieving 100% table utilization but requiring (2+ε)n log n space for the hash function encoding as well as (3+ε)n log n space for the auxiliary data structure. Update operations are possible in expected constant amortized time. Then we show how to reduce the space for the hash function encoding and the auxiliary data structure to O(n log log n). We achieve 100% utilization in the static version (and thus a minimal perfect hash function) and (1−ε) utilization in the dynamic case.

10 citations


Proceedings ArticleDOI
01 Sep 2006
TL;DR: The proposed technique clearly outperforms all known regular hashing algorithms by a significant margin and follows an ad hoc design that is critical to adapting to real-time situations in which a changing database has an irregular, non-uniform distribution.
Abstract: For applications that rely on large databases as the core data structure, a fast search process is essential. Hashing algorithms have widely been adopted as the search algorithm of choice for fast lookups. Hashing algorithms involve the creation of hash values from the target database entries. A hashing algorithm that transforms the database to hash values with a distribution as uniform as possible leads to better search performance. When a database is already value-wise uniformly distributed, any regular hashing algorithm, such as bit-extraction, group-XOR, etc., will lead to a statistically perfect hashing result. In almost all known practical applications, however, the target database rarely demonstrates a uniform distribution, and the use of any known regular hashing algorithm can lead to performance far less than desirable. This paper aims at designing a hashing algorithm that can deliver better performance for all practical databases. An analytical preprocess is performed on the original database to extract critical information that significantly benefits the design of a better hashing algorithm. The process includes sorting database hash bits to provide a priority that facilitates the decision on which bits, and how these bits, should be combined to generate better hash values. The algorithm follows an ad hoc design that is critical to adapting to real-time situations in which a changing database has an irregular, non-uniform distribution. The proposed technique clearly outperforms all known regular hashing algorithms by a significant margin.
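
One plausible reading of this bit-priority preprocess, sketched below; the function names, the balance criterion, and the toy keys are assumptions, and the paper's actual combination step (e.g. group-XOR of ranked bits) is richer than plain bit extraction.

    import numpy as np

    def design_hash(keys, width, index_bits):
        # Rank key-bit positions by how balanced their 0/1 split is across
        # the database, then keep the most balanced positions.
        bits = np.array([[(k >> i) & 1 for i in range(width)] for k in keys])
        balance = np.abs(bits.mean(axis=0) - 0.5)  # 0 = perfectly balanced
        return np.argsort(balance)[:index_bits]

    def hash_key(k, positions):
        h = 0
        for i, pos in enumerate(positions):
            h |= ((k >> int(pos)) & 1) << i        # bit-extraction hash
        return h

    keys = [0x1F00 + i for i in range(8)]          # non-uniform: only the low 3 bits vary
    positions = design_hash(keys, width=16, index_bits=3)  # -> bits 0, 1, 2
    table_index = [hash_key(k, positions) for k in keys]   # perfect spread 0..7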

9 citations


Proceedings ArticleDOI
18 Dec 2006
TL;DR: An image robust hashing method based on the sign bits of Discrete Cosine Transform (DCT) coefficients is presented, and the results show better performance of the proposed method than an existing method.
Abstract: Image robust hashing is related to cryptographic hash functions. In contrast to cryptographic hash functions, this robust digest is sensitive only to perceptual change: minor changes that do not affect perception do not result in a different hash. Image robust hashing is used in content-based retrieval, monitoring, and filtering. In this paper an image robust hashing method based on the sign bits of the Discrete Cosine Transform (DCT) is presented. The DCT is widely used in image and video processing, e.g. for compression or digital watermarking. From an image with reduced dimensions we apply the 2-D DCT to derive an initial feature vector. The sign bits of this feature vector are extracted to form an intermediate hash. The intermediate hash can be incorporated into a security mechanism to derive a final hash. The advantages of the sign signal in the DCT domain are verified in experiments evaluating robustness (e.g. against operations like lossy compression, scaling and cropping) and discriminability. The results show better performance of the proposed method than an existing method.
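
A simplified sketch of such an intermediate hash, assuming the caller has already reduced the image to a small square; the key-based security mechanism that derives the final hash is omitted.

    import numpy as np

    def dct_matrix(n):
        # Orthonormal DCT-II basis matrix.
        k = np.arange(n)[:, None]
        x = np.arange(n)[None, :]
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
        c[0] /= np.sqrt(2.0)
        return c

    def dct_sign_hash(image, size=32, keep=8):
        img = np.asarray(image, dtype=float)
        assert img.shape == (size, size), "caller must resize the image first"
        c = dct_matrix(size)
        coeffs = c @ img @ c.T                      # 2-D DCT-II
        block = coeffs[:keep, :keep].flatten()[1:]  # low frequencies, DC dropped
        return (block >= 0).astype(np.uint8)        # 63 sign bits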

Journal Article
TL;DR: An HVS-based image hashing method incorporating Watson's sensitivity matrix is proposed, which has better robustness against JPEG compression and low-pass filtering.
Abstract: An HVS-based image hashing method incorporating Watson's sensitivity matrix is proposed. The transform-domain matrix composed of the 8-by-8 block DCT coefficients of the image is multiplied by N matrices that are pseudo-randomly generated with a key, and divided by the periodically extended Watson matrix. By quantization, an N-bit image hash is obtained. Compared to some other hashing methods, the HVS-based hash has better robustness against JPEG compression and low-pass filtering. Since a key is used in the algorithm, the hash is hard to forge.

Proceedings ArticleDOI
01 Oct 2006
TL;DR: This paper highlights the problems that have been discovered in some of the existing DCT-based hashing schemes for image authentication, and proposes solutions to counter these problems.
Abstract: The purpose of an image hash function is to provide a compact representation of an image that can be used for authentication purposes. Designing a good image hash function requires the consideration of many issues like robustness, security and tamper detection with precise localization. In this paper, we focus our attention on DCT-based hashing schemes for image authentication. We first highlight the problems that we have discovered in some of the existing DCT-based hashing schemes proposed in the literature. We then propose solutions to counter these problems. We present experimental results to show the effectiveness of our proposed scheme. Although we focus on DCT-based hashing techniques, the types of problems highlighted in this paper may be present in other spatial- or transform-domain image hashing techniques as well.

Proceedings ArticleDOI
01 Jan 2006
TL;DR: The geometric hashing method is extended to 3D object recognition under perspective transformation, in which 3D aspects of the object and geometrically constrained structures are used to construct the hash table, and geometric invariants of the constrained structures provide the hashing function.
Abstract: Geometric hashing, as an effective model retrieval method, plays an important role in object recognition. Most current geometric hashing methods are suited to 2D scene recognition under affine transformation. In this paper, the geometric hashing method is extended to 3D object recognition under perspective transformation, in which 3D aspects of the object and geometrically constrained structures are used to construct the hash table. In this way, geometric invariants of the constrained structures provide the hashing function, and the 3D aspects of the object give information about object pose, which simplifies the matching procedure. In experiments, artificial objects are used to verify the method, and the experimental results show that the proposed method is correct and effective.
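
For reference, a compact 2-D illustration of the classic geometric hashing pipeline that this paper extends (a similarity-invariant variant; all names and the quantization step are illustrative, and a real system would vote over every scene basis, not just one).

    import numpy as np
    from collections import defaultdict

    def quantize(v, step=0.25):
        return tuple(np.round(np.asarray(v) / step).astype(int))

    def build_table(models):
        # For every ordered basis pair of every model, record where the
        # remaining points fall in that basis frame.
        table = defaultdict(list)
        for name, pts in models.items():
            pts = np.asarray(pts, float)
            for i in range(len(pts)):
                for j in range(len(pts)):
                    if i == j:
                        continue
                    o, e = pts[i], pts[j] - pts[i]
                    u = e / np.linalg.norm(e)
                    v = np.array([-u[1], u[0]])  # frame orthogonal to the basis
                    for k in range(len(pts)):
                        if k in (i, j):
                            continue
                        rel = (pts[k] - o) / np.linalg.norm(e)
                        table[quantize((rel @ u, rel @ v))].append((name, (i, j)))
        return table

    def recognize(table, scene_pts):
        # Vote with a single scene basis (points 0 and 1) for brevity.
        votes = defaultdict(int)
        pts = np.asarray(scene_pts, float)
        o, e = pts[0], pts[1] - pts[0]
        u = e / np.linalg.norm(e)
        v = np.array([-u[1], u[0]])
        for k in range(2, len(pts)):
            rel = (pts[k] - o) / np.linalg.norm(e)
            for entry in table.get(quantize((rel @ u, rel @ v)), []):
                votes[entry] += 1
        best = max(votes, key=votes.get) if votes else None
        return best[0] if best else None

    models = {"square": [(0, 0), (1, 0), (1, 1), (0, 1)]}
    table = build_table(models)
    scene = [(2, 2), (4, 2), (4, 4), (2, 4)]  # translated and scaled square
    assert recognize(table, scene) == "square"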

Journal ArticleDOI
13 Jul 2006
TL;DR: The generation of suitable hash functions for textured images, which are simple enough to fit into a very small FPGA, is discussed, and several examples of their use are provided.
Abstract: Hash functions are one-way functions, often used in cryptography to ensure the integrity of files by creating a binary signature specific to a file. In a similar way, a family of special hash functions can be developed and used to generate one-dimensional signatures of an image. The resultant signatures can then be used to compare the image either to a golden template or, if the image consists of repeating definite patterns, to the texture itself. While such hash functions are sensitive enough to detect small changes and defects in repeating texture, they are immune to changes in illumination and contrast. In this paper we discuss the generation of suitable hash functions for textured images, which are simple enough to fit into a very small FPGA, and provide several examples of their use.
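
A toy analogue of such a signature in Python rather than FPGA logic (the names and tolerance are assumptions): subtracting the mean and normalizing give the claimed immunity to illumination and contrast changes.

    import numpy as np

    def row_signature(image):
        # One-dimensional signature: normalized row sums.
        img = np.asarray(image, dtype=float)
        sig = img.sum(axis=1)
        sig -= sig.mean()                   # cancel global illumination shifts
        norm = np.linalg.norm(sig)
        return sig / norm if norm else sig  # cancel contrast scaling

    def matches_template(image, template_sig, tol=0.05):
        return bool(np.linalg.norm(row_signature(image) - template_sig) < tol)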

01 Sep 2006
TL;DR: “Locally likely arrangement hashing (LLAH)” outperforms geometric hashing in both retrieval accuracy and processing time; this report compares the two methods to identify the major factors behind the improvement.
Abstract: Geometric hashing is a well-known object recognition technique based on the arrangement of feature points. We have proposed “locally likely arrangement hashing (LLAH)”, which outperforms geometric hashing in both retrieval accuracy and processing time. In this report, by comparing the two methods, we consider the major factors behind the improvement. We also consider the relationship between the picture angle and the accuracy of LLAH, since its accuracy depends on the picture angle.

Journal Article
Dai Yafei
TL;DR: A hashing function for large-scale URL sets is proposed; it shows better uniformity and stability than two alternatives (HfIp and hf) in three large-scale experiments and is recommended for applications that need to hash URLs.
Abstract: URL hashing finds many applications in Web research. We propose a hashing function for large-scale URL sets and find, through three large-scale experiments, that it has better uniformity and stability than two alternatives (HfIp and hf). It is a variation of the well-known ELFhash function and is recommended for applications that need to hash URLs. Moreover, it has low time cost and nearly the same performance as MD5 and SHA-1, so we consider it more practical. Finally, some directions for future work are given.
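
For reference, the classic ELFhash (from the PJW family) that the proposed function varies; the bucket computation at the end is an illustrative usage.

    def elf_hash(s: str) -> int:
        h = 0
        for byte in s.encode():
            h = ((h << 4) + byte) & 0xFFFFFFFF  # 32-bit arithmetic
            g = h & 0xF0000000
            if g:
                h ^= g >> 24
            h &= ~g & 0xFFFFFFFF                # clear the top nibble
        return h

    bucket = elf_hash("http://example.com/index.html") % (1 << 20)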

Posted Content
TL;DR: A series of multilevel double hashing schemes called cascade hash tables is proposed, in which higher-level hash tables work as fail-safes for lower-level hash tables, effectively reducing collisions during hash insertion.
Abstract: In this paper, the author proposes a series of multilevel double hashing schemes called cascade hash tables. They use several levels of hash tables; in each table, the common double hashing scheme is used. Higher-level hash tables work as fail-safes for lower-level hash tables. This strategy effectively reduces collisions during hash insertion, and thus gains constant worst-case lookup time with a relatively high load factor (70%-85%) in random experiments. Different parameters of cascade hash tables are tested.
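
A minimal sketch of the cascade idea under assumed parameters; the class and helper names are illustrative. Each level probes a bounded double-hashed sequence, so lookup cost is constant in the worst case.

    import hashlib

    def _h(item, salt, m):
        d = hashlib.sha256(f"{salt}:{item}".encode()).digest()
        return int.from_bytes(d[:8], "big") % m

    class CascadeHashTable:
        def __init__(self, m=1024, levels=3, probes=4):
            self.tables = [[None] * m for _ in range(levels)]
            self.m, self.probes = m, probes

        def _slots(self, key, level):
            h1 = _h(key, f"a{level}", self.m)
            h2 = _h(key, f"b{level}", self.m - 1) + 1  # nonzero step
            return [(h1 + i * h2) % self.m for i in range(self.probes)]

        def insert(self, key, value):
            # Higher levels act as fail-safes for lower ones.
            for level, table in enumerate(self.tables):
                for s in self._slots(key, level):
                    if table[s] is None or table[s][0] == key:
                        table[s] = (key, value)
                        return
            raise RuntimeError("all levels full; rebuild with more levels")

        def lookup(self, key):
            for level, table in enumerate(self.tables):
                for s in self._slots(key, level):
                    if table[s] is None:
                        break  # empty slot: the key was never stored at this level
                    if table[s][0] == key:
                        return table[s][1]
            return None

    t = CascadeHashTable()
    t.insert("10.0.0.1", "eth0")
    assert t.lookup("10.0.0.1") == "eth0"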

Proceedings ArticleDOI
01 Nov 2006
TL;DR: This paper proposes a unique hashing algorithm to tackle such a non-uniformly distributed database prevalent in computer network applications.
Abstract: Hash results delivered by traditional hashing algorithms are usually far from optimal when the database presented is not uniformly distributed. This paper proposes a unique hashing algorithm to tackle the non-uniformly distributed databases prevalent in computer network applications. The original database is first pre-processed to extract information that facilitates the design of an ad hoc hashing algorithm.
