Showing papers on "Feature hashing" published in 2006


Journal ArticleDOI
TL;DR: A novel algorithm for generating an image hash based on Fourier transform features and controlled randomization is developed and it is shown that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions.
Abstract: Image hash functions find extensive applications in content authentication, database search, and watermarking. This paper develops a novel algorithm for generating an image hash based on Fourier transform features and controlled randomization. We formulate the robustness of image hashing as a hypothesis testing problem and evaluate the performance under various image processing operations. We show that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions. We introduce a general framework to study and evaluate the security of image hashing systems. Under this new framework, we model the hash values as random variables and quantify their uncertainty in terms of differential entropy. Using this security framework, we analyze the security of the proposed schemes and several existing representative methods for image hashing. We then examine the security versus robustness tradeoff and show that the proposed hashing methods can provide excellent security and robustness.
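
A minimal sketch of the general idea (not the authors' exact algorithm): sample the magnitude of the 2-D Fourier spectrum at locations drawn pseudo-randomly from a secret key, then quantize the samples against their median to form a binary hash. The function name, sampling region, and hash length are assumptions for illustration.

    import numpy as np

    def fourier_image_hash(image, key=0, n_bits=64):
        # Magnitude spectrum, shifted so low frequencies sit in the middle.
        f = np.abs(np.fft.fftshift(np.fft.fft2(np.asarray(image, dtype=float))))
        h, w = f.shape
        rng = np.random.default_rng(key)  # key-controlled randomization
        # Sample low/mid frequencies around the spectrum centre.
        ys = rng.integers(h // 4, 3 * h // 4, size=n_bits)
        xs = rng.integers(w // 4, 3 * w // 4, size=n_bits)
        samples = f[ys, xs]
        # Quantize against the median: robust to moderate filtering.
        return (samples > np.median(samples)).astype(np.uint8)

Hashes of an image and a mildly filtered copy of it should then be close in Hamming distance, while hashes under different keys are hard to predict.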

542 citations


Book ChapterDOI
11 Sep 2006
TL;DR: Only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability, leading to less computation and potentially less need for randomness in practice.
Abstract: A standard technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form gi(x) = h1(x) + i·h2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
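
A minimal sketch of this two-hash technique applied to a Bloom filter; the class name is illustrative, and SHA-256 stands in for the two base hash functions.

    import hashlib

    class TwoHashBloomFilter:
        """k index functions derived from two base hashes:
        g_i(x) = h1(x) + i*h2(x) mod m."""

        def __init__(self, m, k):
            self.m, self.k = m, k
            self.bits = bytearray(m)  # one byte per bit, for clarity

        def _indices(self, item):
            d = hashlib.sha256(item.encode()).digest()
            h1 = int.from_bytes(d[:8], "big")
            h2 = int.from_bytes(d[8:16], "big") | 1  # odd step cycles fully mod power-of-two m
            return [(h1 + i * h2) % self.m for i in range(self.k)]

        def add(self, item):
            for j in self._indices(item):
                self.bits[j] = 1

        def __contains__(self, item):
            return all(self.bits[j] for j in self._indices(item))

    bf = TwoHashBloomFilter(m=1 << 20, k=7)
    bf.add("feature hashing")
    assert "feature hashing" in bf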

234 citations


Journal ArticleDOI
TL;DR: It is found that these hash functions are resistant to signal processing and transmission impairments, and therefore can be instrumental in building database search, broadcast monitoring and watermarking applications for video.
Abstract: Identification and verification of a video clip via its fingerprint find applications in video browsing, database search and security. For this purpose, the video sequence must be collapsed into a short fingerprint using a robust hash function based on signal processing operations. We propose two robust hash algorithms for video, both based on the discrete cosine transform (DCT): one on the classical basis set and the other on a novel randomized basis set (RBT). The robustness and randomness properties of the proposed hash functions are investigated in detail. It is found that these hash functions are resistant to signal processing and transmission impairments, and therefore can be instrumental in building database search, broadcast monitoring and watermarking applications for video. The DCT hash is more robust but lacks security, as it is easy to find different video clips with the same hash value. The RBT-based hash, being secret-key based, does not allow this and is more secure, at the cost of a slight loss in the receiver operating characteristics.
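
Fingerprints of this kind are usually compared by bit error rate (normalized Hamming distance) against a threshold chosen from the desired point on the receiver operating characteristic. A small sketch, with an illustrative threshold:

    import numpy as np

    def bit_error_rate(hash_a, hash_b):
        # Fraction of differing bits between two equal-length binary fingerprints.
        a, b = np.asarray(hash_a), np.asarray(hash_b)
        return float(np.mean(a != b))

    def same_clip(hash_a, hash_b, threshold=0.2):
        # 0.2 is illustrative; a real system picks it from the ROC curve.
        return bit_error_rate(hash_a, hash_b) < threshold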

217 citations


Journal ArticleDOI
01 Jul 2006
TL;DR: A perfect multidimensional hash function is designed -- one that is precomputed on static data to have no hash collisions and is ideally suited for parallel SIMD evaluation on graphics hardware.
Abstract: We explore using hashing to pack sparse data into a compact table while retaining efficient random access. Specifically, we design a perfect multidimensional hash function -- one that is precomputed on static data to have no hash collisions. Because our hash function makes a single reference to a small offset table, queries always involve exactly two memory accesses and are thus ideally suited for parallel SIMD evaluation on graphics hardware. Whereas prior hashing work strives for pseudorandom mappings, we instead design the hash function to preserve spatial coherence and thereby improve runtime locality of reference. We demonstrate numerous graphics applications including vector images, texture sprites, alpha channel compression, 3D-parameterized textures, 3D painting, simulation, and collision detection.
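
A toy 2-D construction in the spirit of this scheme, using a greedy random search for the offsets rather than the paper's full algorithm; all names and parameters here are assumptions. A lookup touches the offset table once and the hash table once:

    import numpy as np

    def build_perfect_hash(points, values, m, r, tries=1000):
        # Slot for point p is (p + Offset[p mod r]) mod m; construction may
        # fail for dense data, in which case grow m or r and retry.
        rng = np.random.default_rng(0)
        H = {}  # sparse stand-in for the m-by-m hash table
        offsets = np.zeros((r, r, 2), dtype=int)
        buckets = {}
        for p, v in zip(points, values):
            buckets.setdefault((p[0] % r, p[1] % r), []).append((tuple(p), v))
        # Assign offsets to the fullest offset-table slots first.
        for slot in sorted(buckets, key=lambda s: -len(buckets[s])):
            for _ in range(tries):
                off = rng.integers(0, m, size=2)
                targets = [((p[0] + off[0]) % m, (p[1] + off[1]) % m)
                           for p, _ in buckets[slot]]
                if len(set(targets)) == len(targets) and not any(t in H for t in targets):
                    offsets[slot] = off
                    for t, (_, v) in zip(targets, buckets[slot]):
                        H[t] = v
                    break
            else:
                raise RuntimeError("construction failed; increase m or r")
        return H, offsets

    def lookup(p, H, offsets, m, r):
        off = offsets[p[0] % r, p[1] % r]                         # first memory access
        return H.get(((p[0] + off[0]) % m, (p[1] + off[1]) % m))  # second access

    points = [(2, 3), (7, 1), (5, 5), (0, 6)]
    H, offsets = build_perfect_hash(points, ["a", "b", "c", "d"], m=4, r=2)
    assert lookup((5, 5), H, offsets, m=4, r=2) == "c"

Unlike the paper's construction, this sketch makes no attempt to preserve spatial coherence; it only demonstrates the two-access query structure.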

183 citations


Journal ArticleDOI
TL;DR: This paper proposes a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion and proves that the decision version of the clustering problem is NP-complete.
Abstract: A perceptual image hash function maps an image to a short binary string based on an image's appearance to the human eye. Perceptual image hashing is useful in image databases, watermarking, and authentication. In this paper, we decouple image hashing into feature extraction (intermediate hash) followed by data clustering (final hash). For any perceptually significant feature extractor, we propose a polynomial-time heuristic clustering algorithm that automatically determines the final hash length needed to satisfy a specified distortion. We prove that the decision version of our clustering problem is NP-complete. Based on the proposed algorithm, we develop two variations to facilitate perceptual robustness versus fragility tradeoffs. We validate the perceptual significance of our hash by testing under Stirmark attacks. Finally, we develop randomized clustering algorithms for the purposes of secure image hashing.

123 citations


Proceedings ArticleDOI
09 Jul 2006
TL;DR: This paper introduces a hardware-friendly scheme for minimal perfect hashing, with a space requirement approaching 3.7 times the information-theoretic lower bound, and employs a Bloom filter, which is known for its simplicity and speed.
Abstract: Hash tables are a fundamental data structure in many network applications, including route lookups, packet classification and monitoring. Often a part of the data path, they need to operate at wire-speed. However, several associative memory accesses are needed to resolve collisions, making them slower than required. This motivates us to consider minimal perfect hashing schemes, which reduce the number of memory accesses to just one and are also space-efficient. Existing perfect hashing algorithms are not tailored for network applications because they take too long to construct and are hard to implement in hardware. This paper introduces a hardware-friendly scheme for minimal perfect hashing, with a space requirement approaching 3.7 times the information-theoretic lower bound. Our construction is several orders of magnitude faster than existing perfect hashing schemes. Instead of using the traditional mapping-partitioning-searching methodology, our scheme employs a Bloom filter, which is known for its simplicity and speed. We extend our scheme to the dynamic setting, thus handling insertions and deletions.

62 citations


Patent
06 Dec 2006
TL;DR: In this paper, a similarity-based hashing (SBH) algorithm is used to calculate hash values such that the same hash value is calculated for the same data and, the more similar the data, the smaller the difference between the generated hash values.
Abstract: A data hashing method, a data processing method, and a data processing system using a similarity-based hashing (SBH) algorithm, in which the same hash value is calculated for the same data and, the more similar the data, the smaller the difference between the generated hash values. The data hashing method includes receiving computerized data and generating a hash value of the computerized data using the SBH algorithm, in which two data items are the same if their calculated hash values are the same and similar if the difference between their calculated hash values is small. A search, comparison, or classification of data may thus be processed quickly, within a time complexity of O(1) or O(n), since the similarity/closeness of data content is quantified by the component values of the corresponding generated hash values.
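
As a toy illustration of the stated property only (emphatically not the patented algorithm): project a byte-value histogram onto fixed weights, so that identical inputs collide exactly and a small edit moves the value only slightly.

    import numpy as np

    def similarity_hash(data: bytes) -> int:
        # Byte-value histogram projected onto fixed weights; order-insensitive.
        hist = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
        weights = np.arange(1, 257)  # fixed, content-independent weights
        return int(hist @ weights)

    a = similarity_hash(b"the quick brown fox")
    b = similarity_hash(b"the quick brown fix")  # one-byte change
    assert a != b and abs(a - b) < 32            # numerically close, not equal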

46 citations


Proceedings ArticleDOI
26 Sep 2006
TL;DR: A wavelet-based robust database hashing scheme is analyzed with respect to its resilience against image modifications and hostile attacks and a method to construct a forgery is presented.
Abstract: A wavelet-based robust database hashing scheme is analyzed with respect to its resilience against image modifications and hostile attacks. A method to construct a forgery is presented and possible countermeasures are discussed.

14 citations


Book ChapterDOI
28 Aug 2006
TL;DR: This work extends a static scheme of Pagh to obtain new randomized algorithms for maintaining hash tables, where a hash function can be evaluated in constant time and by probing only one external memory cell or O(1) consecutive external memory cells.
Abstract: In typical applications of hashing algorithms the amount of data to be stored is often too large to fit into internal memory. In this case it is desirable to find the data with as few non-consecutive, or at least non-oblivious, probes into external memory as possible. Extending a static scheme of Pagh [11], we obtain new randomized algorithms for maintaining hash tables, where a hash function can be evaluated in constant time and by probing only one external memory cell or O(1) consecutive external memory cells. We describe a dynamic version of Pagh's hashing scheme achieving 100% table utilization but requiring (2+ε)n log n space for the hash function encoding as well as (3+ε)n log n space for the auxiliary data structure. Update operations are possible in expected constant amortized time. Then we show how to reduce the space for the hash function encoding and the auxiliary data structure to O(n log log n). We achieve 100% utilization in the static version (and thus a minimal perfect hash function) and (1−ε) utilization in the dynamic case.

10 citations


Proceedings ArticleDOI
01 Sep 2006
TL;DR: The proposed technique clearly outperforms all known regular hashing algorithms by a significant margin and follows an ad hoc design that is critical to adapting to real-time situations in which a changing database has an irregular, non-uniform distribution.
Abstract: For applications that rely on large databases as the core data structure, a fast search process is essential. Hashing algorithms have widely been adopted as the search algorithm of choice for fast lookups. Hashing algorithms involve the creation of hash values from the target database entries. A hashing algorithm that transforms the database to hash values with a distribution as uniform as possible leads to better search performance. When a database is already value-wise uniformly distributed, any regular hashing algorithm, such as bit-extraction, group-XOR, etc., will lead to a statistically perfect hashing result. In almost all known practical applications, however, the target database rarely demonstrates a uniform distribution, and the use of any known regular hashing algorithm can lead to performance far less than desirable. This paper aims at designing a hashing algorithm that can deliver better performance for all practical databases. An analytical preprocess is performed on the original database to extract critical information that significantly benefits the design of a better hashing algorithm. The process includes sorting database hash bits to provide a priority that facilitates the decision on which bits, and how these bits, should be combined to generate better hash values. The algorithm follows an ad hoc design that is critical to adapting to real-time situations in which a changing database has an irregular, non-uniform distribution. The proposed technique clearly outperforms all known regular hashing algorithms by a significant margin.
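
One plausible reading of this bit-priority preprocess, sketched below; the function names, the balance criterion, and the toy keys are assumptions, and the paper's actual combination step (e.g. group-XOR of ranked bits) is richer than plain bit extraction.

    import numpy as np

    def design_hash(keys, width, index_bits):
        # Rank key-bit positions by how balanced their 0/1 split is across
        # the database, then keep the most balanced positions.
        bits = np.array([[(k >> i) & 1 for i in range(width)] for k in keys])
        balance = np.abs(bits.mean(axis=0) - 0.5)  # 0 = perfectly balanced
        return np.argsort(balance)[:index_bits]

    def hash_key(k, positions):
        h = 0
        for i, pos in enumerate(positions):
            h |= ((k >> int(pos)) & 1) << i        # bit-extraction hash
        return h

    keys = [0x1F00 + i for i in range(8)]          # non-uniform: only the low 3 bits vary
    positions = design_hash(keys, width=16, index_bits=3)  # -> bits 0, 1, 2
    table_index = [hash_key(k, positions) for k in keys]   # perfect spread 0..7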

9 citations


Proceedings ArticleDOI
18 Dec 2006
TL;DR: An image robust hashing method based on the sign bits of Discrete Cosine Transform (DCT) coefficients is presented, and the results show better performance of the proposed method than an existing method.
Abstract: Image robust hashing is related to cryptographic hash functions. In contrast to cryptographic hash functions, this robust digest is sensitive only to perceptual change: minor changes that do not affect perception do not result in a different hash. Image robust hashing is used in content-based retrieval, monitoring, and filtering. In this paper an image robust hashing method based on the sign bits of the Discrete Cosine Transform (DCT) is presented. The DCT is widely used in image and video processing, e.g. for compression or digital watermarking. From an image with reduced dimensions we apply the 2-D DCT to derive an initial feature vector. The sign bits of this feature vector are extracted to form an intermediate hash. The intermediate hash can be incorporated into a security mechanism to derive a final hash. The advantages of the sign signal in the DCT domain are verified in experiments evaluating robustness (e.g. against operations like lossy compression, scaling and cropping) and discriminability. The results show better performance of the proposed method than an existing method.
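
A simplified sketch of such an intermediate hash, assuming the caller has already reduced the image to a small square; the key-based security mechanism that derives the final hash is omitted.

    import numpy as np

    def dct_matrix(n):
        # Orthonormal DCT-II basis matrix.
        k = np.arange(n)[:, None]
        x = np.arange(n)[None, :]
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
        c[0] /= np.sqrt(2.0)
        return c

    def dct_sign_hash(image, size=32, keep=8):
        img = np.asarray(image, dtype=float)
        assert img.shape == (size, size), "caller must resize the image first"
        c = dct_matrix(size)
        coeffs = c @ img @ c.T                      # 2-D DCT-II
        block = coeffs[:keep, :keep].flatten()[1:]  # low frequencies, DC dropped
        return (block >= 0).astype(np.uint8)        # 63 sign bits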

Journal Article
TL;DR: An HVS-based image hashing method incorporating Watson's sensitivity matrix is proposed, which has better robustness against JPEG compression and low-pass filtering.
Abstract: An HVS-based image hashing method incorporating Watson's sensitivity matrix is proposed. The transform-domain matrix composed of the 8-by-8 block DCT coefficients of the image is multiplied by N matrices that are pseudo-randomly generated with a key, and divided by the periodically extended Watson matrix. By quantization, an N-bit image hash is obtained. Compared to some other hashing methods, the HVS-based hash has better robustness against JPEG compression and low-pass filtering. Since a key is used in the algorithm, the hash is hard to forge.

Proceedings ArticleDOI
01 Oct 2006
TL;DR: This paper highlights the problems that have been discovered in some of the existing DCT-based hashing schemes for image authentication, and proposes solutions to counter these problems.
Abstract: The purpose of an image hash function is to provide a compact representation of an image that can be used for authentication purposes. Designing a good image hash function requires the consideration of many issues like robustness, security and tamper detection with precise localization. In this paper, we focus our attention on DCT-based hashing schemes for image authentication. We first highlight the problems that we have discovered in some of the existing DCT-based hashing schemes proposed in the literature. We then propose solutions to counter these problems. We present experimental results to show the effectiveness of our proposed scheme. Although we focus on DCT-based hashing techniques, the types of problems highlighted in this paper may be present in other spatial- or transform-domain image hashing techniques as well.

Proceedings ArticleDOI
01 Jan 2006
TL;DR: The geometric hashing method is extended to 3D object recognition under perspective transformation, in which 3D aspects of the object and geometrically constrained structures are used to construct the hash table, and geometric invariants of the constrained structures provide the hashing function.
Abstract: Geometric hashing, as an effective model retrieval method, plays an important role in object recognition. Most current geometric hashing methods are suited to 2D scene recognition under affine transformation. In this paper, the geometric hashing method is extended to 3D object recognition under perspective transformation, in which 3D aspects of the object and geometrically constrained structures are used to construct the hash table. In this way, geometric invariants of the constrained structures provide the hashing function, and the 3D aspects of the object give information about object pose, which simplifies the matching procedure. In experiments, artificial objects are used to verify the method, and the experimental results show that the proposed method is correct and effective.
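
For reference, a compact 2-D illustration of the classic geometric hashing pipeline that this paper extends (a similarity-invariant variant; all names and the quantization step are illustrative, and a real system would vote over every scene basis, not just one).

    import numpy as np
    from collections import defaultdict

    def quantize(v, step=0.25):
        return tuple(np.round(np.asarray(v) / step).astype(int))

    def build_table(models):
        # For every ordered basis pair of every model, record where the
        # remaining points fall in that basis frame.
        table = defaultdict(list)
        for name, pts in models.items():
            pts = np.asarray(pts, float)
            for i in range(len(pts)):
                for j in range(len(pts)):
                    if i == j:
                        continue
                    o, e = pts[i], pts[j] - pts[i]
                    u = e / np.linalg.norm(e)
                    v = np.array([-u[1], u[0]])  # frame orthogonal to the basis
                    for k in range(len(pts)):
                        if k in (i, j):
                            continue
                        rel = (pts[k] - o) / np.linalg.norm(e)
                        table[quantize((rel @ u, rel @ v))].append((name, (i, j)))
        return table

    def recognize(table, scene_pts):
        # Vote with a single scene basis (points 0 and 1) for brevity.
        votes = defaultdict(int)
        pts = np.asarray(scene_pts, float)
        o, e = pts[0], pts[1] - pts[0]
        u = e / np.linalg.norm(e)
        v = np.array([-u[1], u[0]])
        for k in range(2, len(pts)):
            rel = (pts[k] - o) / np.linalg.norm(e)
            for entry in table.get(quantize((rel @ u, rel @ v)), []):
                votes[entry] += 1
        best = max(votes, key=votes.get) if votes else None
        return best[0] if best else None

    models = {"square": [(0, 0), (1, 0), (1, 1), (0, 1)]}
    table = build_table(models)
    scene = [(2, 2), (4, 2), (4, 4), (2, 4)]  # translated and scaled square
    assert recognize(table, scene) == "square"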

Journal ArticleDOI
13 Jul 2006
TL;DR: The generation of suitable hash functions for textured images, which are simple enough to fit into a very small FPGA, is discussed, and several examples of their use are provided.
Abstract: Hash functions are one-way functions, often used in cryptography to ensure the integrity of files by creating a binary signature specific to a file. In a similar way, a family of special hash functions can be developed and used to generate one-dimensional signatures of an image. The resultant signatures can then be used to compare the image either to a golden template or, if the image consists of repeating definite patterns, to the texture itself. While such hash functions are sensitive enough to detect small changes and defects in repeating texture, they are immune to changes in illumination and contrast. In this paper we discuss the generation of suitable hash functions for textured images, which are simple enough to fit into a very small FPGA, and provide several examples of their use.
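
A toy analogue of such a signature in Python rather than FPGA logic (the names and tolerance are assumptions): subtracting the mean and normalizing give the claimed immunity to illumination and contrast changes.

    import numpy as np

    def row_signature(image):
        # One-dimensional signature: normalized row sums.
        img = np.asarray(image, dtype=float)
        sig = img.sum(axis=1)
        sig -= sig.mean()                   # cancel global illumination shifts
        norm = np.linalg.norm(sig)
        return sig / norm if norm else sig  # cancel contrast scaling

    def matches_template(image, template_sig, tol=0.05):
        return bool(np.linalg.norm(row_signature(image) - template_sig) < tol)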

01 Sep 2006
TL;DR: “Locally likely arrangement hashing (LLAH)” outperforms geometric hashing in both retrieval accuracy and processing time; this report compares the two methods to identify the major factors behind the improvement.
Abstract: Geometric hashing is a well-known object recognition technique based on the arrangement of feature points. We have proposed “locally likely arrangement hashing (LLAH)”, which outperforms geometric hashing in both retrieval accuracy and processing time. In this report, by comparing the two methods, we consider the major factors behind the improvement. We also consider the relationship between the picture angle and the accuracy of LLAH, since its accuracy depends on the picture angle.

Journal Article
Dai Yafei
TL;DR: A hashing function for large-scale URL sets is proposed; it shows better uniformity and stability than two alternatives (HfIp and hf) in three large-scale experiments and is recommended for applications that need to hash URLs.
Abstract: URL hashing finds many applications in Web research. We propose a hashing function for large-scale URL sets and find, through three large-scale experiments, that it has better uniformity and stability than two alternatives (HfIp and hf). It is a variation of the well-known ELFhash function and is recommended for applications that need to hash URLs. Moreover, it has low time cost and nearly the same performance as MD5 and SHA-1, so we consider it more practical. Finally, some directions for future work are given.
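
For reference, the classic ELFhash (from the PJW family) that the proposed function varies; the bucket computation at the end is an illustrative usage.

    def elf_hash(s: str) -> int:
        h = 0
        for byte in s.encode():
            h = ((h << 4) + byte) & 0xFFFFFFFF  # 32-bit arithmetic
            g = h & 0xF0000000
            if g:
                h ^= g >> 24
            h &= ~g & 0xFFFFFFFF                # clear the top nibble
        return h

    bucket = elf_hash("http://example.com/index.html") % (1 << 20)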

Posted Content
TL;DR: A series of multilevel double hashing schemes called cascade hash tables is proposed, in which higher-level hash tables work as fail-safes for lower-level hash tables, effectively reducing collisions during hash insertion.
Abstract: In this paper, the author proposes a series of multilevel double hashing schemes called cascade hash tables. They use several levels of hash tables; in each table, the common double hashing scheme is used. Higher-level hash tables work as fail-safes for lower-level hash tables. This strategy effectively reduces collisions during hash insertion, and thus gains constant worst-case lookup time with a relatively high load factor (70%-85%) in random experiments. Different parameters of cascade hash tables are tested.
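
A minimal sketch of the cascade idea under assumed parameters; the class and helper names are illustrative. Each level probes a bounded double-hashed sequence, so lookup cost is constant in the worst case.

    import hashlib

    def _h(item, salt, m):
        d = hashlib.sha256(f"{salt}:{item}".encode()).digest()
        return int.from_bytes(d[:8], "big") % m

    class CascadeHashTable:
        def __init__(self, m=1024, levels=3, probes=4):
            self.tables = [[None] * m for _ in range(levels)]
            self.m, self.probes = m, probes

        def _slots(self, key, level):
            h1 = _h(key, f"a{level}", self.m)
            h2 = _h(key, f"b{level}", self.m - 1) + 1  # nonzero step
            return [(h1 + i * h2) % self.m for i in range(self.probes)]

        def insert(self, key, value):
            # Higher levels act as fail-safes for lower ones.
            for level, table in enumerate(self.tables):
                for s in self._slots(key, level):
                    if table[s] is None or table[s][0] == key:
                        table[s] = (key, value)
                        return
            raise RuntimeError("all levels full; rebuild with more levels")

        def lookup(self, key):
            for level, table in enumerate(self.tables):
                for s in self._slots(key, level):
                    if table[s] is None:
                        break  # empty slot: the key was never stored at this level
                    if table[s][0] == key:
                        return table[s][1]
            return None

    t = CascadeHashTable()
    t.insert("10.0.0.1", "eth0")
    assert t.lookup("10.0.0.1") == "eth0"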

Proceedings ArticleDOI
01 Nov 2006
TL;DR: This paper proposes a unique hashing algorithm to tackle such a non-uniformly distributed database prevalent in computer network applications.
Abstract: Hash results delivered by traditional hashing algorithms are usually far from optimal when the database presented is not uniformly distributed. This paper proposes a unique hashing algorithm to tackle the non-uniformly distributed databases prevalent in computer network applications. The original database is first pre-processed to extract information that facilitates the design of an ad hoc hashing algorithm.
