
Showing papers on "Feature hashing published in 2005"


Journal ArticleDOI
TL;DR: The proposed RASH feature vector is more robust and provides much stronger discrimination than a conventional histogram-based feature vector, and appears to be a good candidate to build indexing algorithms, copy-detection systems, or content-based authentication mechanisms.
Abstract: Robust signal hashing defines a feature vector that characterizes the signal, independently of "nonsignificant" distortions of its content. When dealing with images, the considered distortions are typically due to compression or small geometrical manipulations. In other words, robustness means that images that are visually indistinguishable should produce equal or similar hash values. To discriminate image contents, a hash function should produce distinct outputs for different images. Our paper first proposes a robust hashing algorithm for still images. It is based on radial projection of the image pixels and is denoted the Radial hASHing (RASH) algorithm. Experimental results provided on the USC-SIPI dataset reveal that the proposed RASH feature vector is more robust and provides much stronger discrimination than a conventional histogram-based feature vector. The RASH vector appears to be a good candidate to build indexing algorithms, copy-detection systems, or content-based authentication mechanisms. To benefit from the RASH vector's capabilities, video content is summarized into key frames, each of them characterizing a video shot and described by its RASH vector. The resulting video hashing system works in real time and withstands most common spatial and temporal video distortions.

175 citations
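
The abstract above specifies only that RASH is built from radial projections of the image pixels. As a hedged illustration of that idea (not the authors' exact pipeline), the sketch below samples the pixels lying on each radial line through the image centre and reduces them to one statistic per line; using the luminance variance as that statistic is an assumption made for illustration.

```python
# Minimal radial-projection feature sketch (illustrative, not the published RASH algorithm).
import numpy as np

def radial_feature_vector(image: np.ndarray, num_angles: int = 180) -> np.ndarray:
    """Return one statistic per radial line through the image centre."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = min(cy, cx)
    ts = np.linspace(-radius, radius, int(2 * radius))   # positions along each line
    features = np.empty(num_angles)
    for k in range(num_angles):
        theta = np.pi * k / num_angles
        ys = np.clip(np.round(cy + ts * np.sin(theta)).astype(int), 0, h - 1)
        xs = np.clip(np.round(cx + ts * np.cos(theta)).astype(int), 0, w - 1)
        features[k] = image[ys, xs].var()                 # per-line statistic (assumed: variance)
    # Normalise so vectors are comparable across images of different dynamic range.
    return features / (features.max() + 1e-12)

img = np.random.default_rng(0).random((256, 256))
print(radial_feature_vector(img).shape)  # (180,)
```

Two visually similar versions of the same image (e.g., before and after mild compression) should yield nearby vectors, which is the robustness property the paper measures.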


Journal ArticleDOI
TL;DR: A novel geometric distortion-invariant image hashing scheme, which can be employed to perform copy detection and content authentication of digital images, is proposed and exhaustive experimental results obtained from benchmark attacks confirm the excellent performance of the proposed method.
Abstract: Media hashing is an alternative approach to many applications previously accomplished with watermarking. The major disadvantage of the existing media hashing technologies is their limited resistance to geometric attacks. In this paper, a novel geometric distortion-invariant image hashing scheme, which can be employed to perform copy detection and content authentication of digital images, is proposed. Our major contributions are threefold: (i) a mesh-based robust hashing function is proposed; (ii) a sophisticated hash database for error-resilient and fast matching is constructed; and (iii) the application scalability of our scheme for content copy tracing and authentication is studied. In addition, we further investigate several media hashing issues, including robustness and discrimination, error analysis, and complexity, with respect to the proposed image hashing system. Exhaustive experimental results obtained from benchmark attacks confirm the excellent performance of the proposed method.

92 citations


Journal ArticleDOI
TL;DR: This work addresses the issue of security of hashes and proposes a keying technique, and thereby a key-dependent hash function, based on the periodicity series of the fundamental frequency and on singular-value description of the cepstral frequencies.
Abstract: Perceptual hash functions provide a tool for fast and reliable identification of content. We present new audio hash functions based on summarization of the time-frequency spectral characteristics of an audio document. The proposed hash functions are based on the periodicity series of the fundamental frequency and on singular-value description of the cepstral frequencies. They are found, on one hand, to perform very satisfactorily in identification and verification tests, and on the other hand, to be very resilient to a large variety of attacks. Moreover, we address the issue of security of hashes and propose a keying technique, and thereby a key-dependent hash function.

59 citations
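
A rough sketch of the "singular-value description of the cepstral frequencies" mentioned above, under stated assumptions: the frame length, overlap, window, and number of retained singular values are illustrative choices, and the paper's periodicity-series component is omitted entirely.

```python
# Illustrative cepstral singular-value descriptor (assumptions noted in the lead-in).
import numpy as np

def cepstral_sv_descriptor(signal, frame_len=1024, hop=512, num_sv=8):
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    ceps = []
    for f in frames:
        spectrum = np.abs(np.fft.rfft(f * np.hanning(frame_len))) + 1e-12
        ceps.append(np.fft.irfft(np.log(spectrum)))       # real cepstrum of the frame
    C = np.array(ceps)                                    # frames x cepstral bins
    singular_values = np.linalg.svd(C, compute_uv=False)  # ordering-robust summary
    return singular_values[:num_sv]                       # compact hash feature

audio = np.random.default_rng(1).standard_normal(44100)   # 1 s of noise as a stand-in
print(cepstral_sv_descriptor(audio))
```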


Dissertation
01 Jan 2005
TL;DR: This dissertation focuses on feature extraction from natural images such that the extracted features are largely invariant under perceptually insignificant modifications to the image (i.e. robust).
Abstract: Hash functions are frequently called message digest functions. Their purpose is to extract a short binary string from a large digital message. A key feature of conventional cryptographic (and other) hashing algorithms such as message digest 5 (MD5) and secure hash algorithm 1 (SHA-1) is that they are extremely sensitive to the message; i.e., changing even one bit of the input message will change the output dramatically. However, multimedia data such as digital images undergo various manipulations such as compression and enhancement. An image hash function should instead take into account the changes in the visual domain and produce hash values based on the image's visual appearance. Such a function would facilitate comparisons and searches in large image databases. Other applications of a perceptual hash lie in content authentication and watermarking. This dissertation proposes a unifying framework for multimedia signal hashing. The problem of media hashing is divided into two stages. The first stage extracts media-dependent intermediate features that are robust under incidental modifications while being different for perceptually distinct media with high probability. The second stage performs a media-independent clustering of these features to produce a final hash. This dissertation focuses on feature extraction from natural images such that the extracted features are largely invariant under perceptually insignificant modifications to the image (i.e., robust). An iterative geometry preserving feature detection algorithm is developed based on an explicit modeling of the human visual system via end-stopped wavelets. For the second stage, I show that the decision version of the feature clustering problem is NP-complete. Then, for any perceptually significant feature extractor, I develop polynomial time clustering algorithms based on a greedy heuristic. Existing algorithms for image/media hashing exclusively employ either cryptographic or signal processing methods. A pure signal processing approach achieves robustness to perceptually insignificant distortions but compromises security, which is desirable in multimedia protection applications. Likewise, pure cryptographic techniques, while secure, completely ignore the requirement of being robust to incidental modifications of the media. The primary contribution of this dissertation is a joint signal processing and cryptography approach to building robust as well as secure image hashing algorithms. The ideas proposed in this dissertation can also be applied to other problems in multimedia security, e.g., watermarking and data hiding.

35 citations


Proceedings ArticleDOI
18 Mar 2005
TL;DR: This work presents a mathematical framework, derives expressions for the proposed security metric for various common image hashing schemes, and discusses the trade-offs between security and robustness in image hashing.
Abstract: Security and robustness are two important requirements for image hash functions. We introduce "differential entropy" as a metric to quantify the amount of randomness in image hash functions and to study their security. We present a mathematical framework and derive expressions for the proposed security metric for various common image hashing schemes. Using the proposed security metric, we discuss the trade-offs between security and robustness in image hashing.

22 citations
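
For reference, the randomness measure named in the abstract is the standard differential entropy; for a hash value H with key-induced probability density f_H it reads:

$$ h(H) = -\int_{-\infty}^{\infty} f_H(x)\,\log f_H(x)\,dx $$

A larger h(H) means the hash output is harder to predict for an attacker who does not know the key; the paper derives this quantity for several common image hashing schemes and uses it to weigh security against robustness.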


Proceedings ArticleDOI
18 Jan 2005
TL;DR: The presented algorithm uses multiple-vertex dominators in circuit graphs to progressively simplify intermediate hashing steps and the experimental results on benchmark circuits demonstrate the robustness of the approach.
Abstract: The growing complexity of today's system designs requires fast and robust verification methods. Existing BDD, SAT or ATPG-based techniques do not provide sufficient solutions for many verification instances. Boolean function hashing is a probabilistic verification approach which can complement existing formal methods in a number of applications such as equivalence checking, biased random simulation, power analysis and power optimization. The proposed hashing technique is based on the arithmetic transform, which maps a Boolean function onto a probabilistic hash value for a given input assignment. The presented algorithm uses multiple-vertex dominators in circuit graphs to progressively simplify intermediate hashing steps. The experimental results on benchmark circuits demonstrate the robustness of our approach.

17 citations
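
The paper's dominator-based simplification is not reproduced here, but the underlying arithmetic-transform hashing step can be illustrated: Boolean gates are replaced by their arithmetic counterparts (NOT a → 1 − a, a AND b → ab, a OR b → a + b − ab) and the circuit is evaluated on random field elements, so equivalent circuits always hash identically while inequivalent ones collide with low probability. The gate-list format and the prime modulus below are assumptions for illustration.

```python
# Probabilistic hashing of a Boolean circuit via its arithmetic transform (illustrative).
import random

P = (1 << 61) - 1  # a large Mersenne prime as the evaluation field

def hash_circuit(gates, num_inputs, seed=0):
    """gates: list of ('AND'|'OR'|'NOT', operand indices); values indexed from 0."""
    rng = random.Random(seed)
    vals = [rng.randrange(P) for _ in range(num_inputs)]   # random input assignment
    for op, *args in gates:
        a = vals[args[0]]
        if op == 'NOT':
            vals.append((1 - a) % P)
        else:
            b = vals[args[1]]
            vals.append(a * b % P if op == 'AND' else (a + b - a * b) % P)
    return vals[-1]  # hash of the circuit's output under this assignment

# x0 AND x1 versus NOT(NOT x0 OR NOT x1): De Morgan-equivalent, so the hashes match.
c1 = [('AND', 0, 1)]
c2 = [('NOT', 0), ('NOT', 1), ('OR', 2, 3), ('NOT', 4)]
print(hash_circuit(c1, 2) == hash_circuit(c2, 2))  # True
```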


Proceedings ArticleDOI
27 Jun 2005
TL;DR: A novel robust image hashing scheme based on the set partitioning in hierarchical trees (SPIHT) algorithm, widely employed in image compression, achieves good performance with reasonable complexity.
Abstract: Media hashing is an important tool for resolving copyright infringement. In this paper, a novel robust image hashing scheme is proposed. The set partitioning in hierarchical trees (SPIHT) algorithm, widely employed in image compression, is used to extract the identification information of images. The sorting pass of SPIHT records the spatial distribution of significant wavelet coefficients, termed the significance map. We build the hash values from the significance maps and the associated autocorrelograms. To verify the robustness of the proposed methods, experiments are conducted on the StirMark benchmarking system. The proposed autocorrelogram-based hash sequence achieves good performance with reasonable complexity.

16 citations
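
A much-simplified sketch of the significance-map idea, assuming a one-level Haar transform in place of the full SPIHT coder and an arbitrary threshold; only the binary map of "significant" detail coefficients and its normalised autocorrelation are kept as the hash feature.

```python
# Significance map + autocorrelogram, heavily simplified (not the paper's exact construction).
import numpy as np

def significance_autocorrelogram(image: np.ndarray, threshold_ratio=0.1, lags=16):
    a = image[0::2, 0::2]; b = image[0::2, 1::2]
    c = image[1::2, 0::2]; d = image[1::2, 1::2]
    # Three one-level Haar detail subbands stacked together.
    coeffs = np.stack([a - b + c - d, a + b - c - d, a - b - c + d]) / 2.0
    sig = (np.abs(coeffs) > threshold_ratio * np.abs(coeffs).max()).astype(float)
    x = sig.ravel() - sig.mean()
    acorr = np.array([np.dot(x[:-k or None], x[k:]) for k in range(lags)])
    return acorr / (acorr[0] + 1e-12)            # lag-0 normalised autocorrelogram

img = np.random.default_rng(2).random((128, 128))
print(significance_autocorrelogram(img).round(3))
```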


Patent
12 Oct 2005
TL;DR: In this article, an evolutionary algorithm was proposed to locate efficient hashing functions for specific data sets by sampling and evolving from the set of polynomials over the ring of integers mod n.
Abstract: Hashing functions have many practical applications in data storage and retrieval. Perfect hashing functions are extremely difficult to find, especially if the data set is large and without large-scale structure. There are great rewards for finding good hashing functions, considering the savings in computational time such functions provide, and much effort has been expended in this search. With this in mind, we present a strong competitive evolutionary method to locate efficient hashing functions for specific data sets by sampling and evolving from the set of polynomials over the ring of integers mod n. We find favorable results that seem to indicate the power and usefulness of evolutionary methods in this search. Polynomials thus generated are found to have consistently better collision frequencies than other hashing methods. This results in a reduction in average number of array probes per data element hashed by a factor of two. Presented herein is an evolutionary algorithm to locate efficient hashing functions for specific data sets. Polynomials are used to investigate and evaluate various evolutionary strategies. Populations of random polynomials are generated, and then selection and mutation serve to eliminate unfit polynomials. The results are favorable and indicate the power and usefulness of evolutionary methods in hashing. The average number of collisions using the algorithm presented herein is about one-half of the number of collisions using other hashing methods. Efficient methods of data storage and retrieval are essential to today's information economy. Despite the current obstacles to creating efficient hashing functions, hashing is widely used due to its efficient data access. This study investigates the feasibility of overcoming such obstacles through the application of Darwin's ideas by modeling the basic principles of biological evolution in a computer. Polynomials over Zn are the evolutionary units and it is believed that competition and selection based on performance would locate polynomials that make efficient hashing functions.

14 citations
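
A minimal sketch of the evolutionary search described above: candidate hash functions are polynomials over Z_n, fitness is the collision count on the target key set, and each generation keeps the better half of the population and mutates it. Population size, mutation scheme, and polynomial degree are illustrative assumptions.

```python
# Evolving polynomial hash functions over Z_n by collision count (illustrative parameters).
import random

def poly_hash(coeffs, key, n):
    """Evaluate the polynomial with the given coefficients at `key`, mod n (Horner's rule)."""
    acc = 0
    for c in coeffs:
        acc = (acc * key + c) % n
    return acc

def collisions(coeffs, keys, n):
    buckets = {}
    for k in keys:
        h = poly_hash(coeffs, k, n)
        buckets[h] = buckets.get(h, 0) + 1
    return sum(v - 1 for v in buckets.values() if v > 1)

def evolve_hash(keys, n, degree=4, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(n) for _ in range(degree + 1)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: collisions(c, keys, n))     # lower collision count = fitter
        survivors = pop[:pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            child[rng.randrange(len(child))] = rng.randrange(n)  # point mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda c: collisions(c, keys, n))

data = random.Random(1).sample(range(10**6), 500)   # the keys to be hashed
best = evolve_hash(data, n=1031)                    # table size n (a prime, assumed)
print(collisions(best, data, 1031))
```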


Proceedings ArticleDOI
06 Dec 2005
TL;DR: This paper proposes a secure image hashing scheme that allows acceptable manipulations like JPEG compression and low pass filtering and is sensitive enough in detecting malicious manipulations.
Abstract: An image hash function provides a condensed representation of an image that can be used for authentication purposes. Security, robustness and fragility are three important issues in designing a hashing scheme for image authentication. In this paper, we propose a secure image hashing scheme that allows acceptable manipulations like JPEG compression and low pass filtering and is sensitive enough in detecting malicious manipulations. To enforce security, we use key-dependent feature extraction to form the image hash. Several experimental results are presented to demonstrate the effectiveness of our proposed scheme.

12 citations


Book ChapterDOI
09 May 2005
TL;DR: A novel image hashing method in the DCT domain that can be directly extended to MPEG video without DCT transforms is proposed, along with an algorithm for locating tampering based on the hashing method.
Abstract: Image hashing is an alternative approach to many applications accomplished with watermarking. In this paper, we propose a novel image hashing method in the DCT domain which can be directly extended to MPEG video without DCT transforms. A key goal of the method is to produce randomized hash signatures which are unpredictable for unauthorized users, thereby yielding properties akin to cryptographic MACs. This is achieved by encryption of the block DCT coefficients with chaotic sequences. After applying Principal Components Analysis (PCA) to the encrypted DCT coefficients, we take the quantized eigenvector matrix (8 × 8) and 8 eigenvalues together as the hash signature, the length of which is only 72 bytes for any image of arbitrary size. For image authentication, we also present an algorithm for locating tampering based on the hashing method. Experiments on a large-scale database show that the proposed method is efficient, key dependent, pairwise independent, and robust against common content-preserving manipulations.

11 citations
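
A loose, hedged reconstruction of the pipeline sketched in the abstract: 8×8 block DCT, modulation of the coefficients with a key-seeded logistic-map (chaotic) sequence, PCA yielding an 8×8 eigenvector matrix and 8 eigenvalues, and byte quantisation into a 72-byte signature. The modulation rule, the PCA folding, and the quantiser below are assumptions, not the authors' exact construction.

```python
# Chaotic-keyed block-DCT + PCA hash, reconstructed loosely from the abstract.
import numpy as np
from scipy.fft import dct

def logistic_sequence(key: float, length: int, r: float = 3.99) -> np.ndarray:
    x, seq = key, []
    for _ in range(length):
        x = r * x * (1.0 - x)          # logistic map seeded by the secret key
        seq.append(x)
    return np.array(seq)

def chaotic_dct_pca_hash(image: np.ndarray, key: float = 0.37) -> bytes:
    h, w = (image.shape[0] // 8) * 8, (image.shape[1] // 8) * 8
    blocks = []
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            b = image[i:i + 8, j:j + 8].astype(float)
            d = dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')  # 2-D block DCT
            blocks.append(d.ravel())
    X = np.array(blocks)                                   # blocks x 64 coefficients
    X *= 1.0 + logistic_sequence(key, X.shape[1])          # keyed chaotic modulation (assumed)
    # PCA on an 8-dimensional summary: fold each row back to 8x8, average, eigendecompose.
    C = np.cov(X.reshape(len(X), 8, 8).mean(axis=2), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)                   # 8 eigenvalues, 8x8 eigenvectors
    q = lambda a: np.clip((a - a.min()) / (np.ptp(a) + 1e-12) * 255, 0, 255).astype(np.uint8)
    return q(eigvecs).tobytes() + q(eigvals).tobytes()     # 64 + 8 = 72 bytes

img = np.random.default_rng(3).random((128, 128))
print(len(chaotic_dct_pca_hash(img)))  # 72
```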


Patent
24 Oct 2005
TL;DR: In this article, a method of hash string extraction from biometric information is disclosed, which comprises providing a biometric information sample in the form of a fingerprint, extracting features from the sample, encoding the features based on their location within the sample, and generating a string of values based on the extracted features and their determined locations.
Abstract: A method of hash string extraction from biometric information is disclosed. The method comprises the steps of providing a biometric information sample in the form of a fingerprint for example, extracting features from the biometric information sample and encoding the features based on their location within the biometric information sample; and, generating a string of values based on the extracted features and their determined locations. The method further comprises the steps of hashing the string of symbols to produce a plurality of hash values for comparing the plurality of hash values against a stored hash value for identifying a user.
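
A hedged reading of the claimed steps: features (e.g., fingerprint minutiae) are encoded by their quantised location, concatenated into a symbol string, and the string is hashed into several values for comparison with a stored template. The grid size and the use of SHA-256 are illustrative assumptions, not the patent's construction.

```python
# Location-encoded biometric hash string, sketched under the assumptions above.
import hashlib

def location_encoded_string(minutiae, grid=16, width=512, height=512):
    """minutiae: iterable of (x, y, type) tuples; returns a canonical symbol string."""
    cells = sorted((int(x * grid / width), int(y * grid / height), t)
                   for x, y, t in minutiae)
    return ";".join(f"{cx},{cy},{t}" for cx, cy, t in cells)

def hash_values(symbol_string, count=4):
    """Derive `count` hash values from the encoded string for matching."""
    return [hashlib.sha256(f"{i}|{symbol_string}".encode()).hexdigest()
            for i in range(count)]

sample = [(120, 88, "ridge_end"), (300, 412, "bifurcation"), (77, 240, "ridge_end")]
enrolled = hash_values(location_encoded_string(sample))
probe = hash_values(location_encoded_string(sample))
print(enrolled == probe)  # identical samples produce identical hash values
```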

Book ChapterDOI
18 Sep 2005
TL;DR: This work investigates the class of checksum-based hash functions for encoding the type signatures of MPI datatypes and finds that hash functions based on Galois fields enable good hashing, computation of the signature of a unidatatype in $\mathcal{O}(1)$, and computation of the signature of the concatenation of two datatypes in $\mathcal{O}(1)$ as well.
Abstract: Detecting misuse of datatypes in an application code is a desirable feature for an MPI library. To support this goal we investigate the class of hash functions based on checksums to encode the type signatures of MPI datatypes. The quality of these hash functions is assessed in terms of hashing, timing and comparison with other functions published for this particular problem (Gropp, 7th European PVM/MPI Users’ Group Meeting, 2000) or for other applications (CRCs). In particular, hash functions based on Galois fields enable good hashing, computation of the signature of a unidatatype in $\mathcal{O}(1)$, and computation of the signature of the concatenation of two datatypes in $\mathcal{O}(1)$.
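
The $\mathcal{O}(1)$ concatenation property is the interesting part: with a polynomial-evaluation checksum, the signature of A ‖ B can be combined from the signatures of A and B alone. The paper works over Galois fields GF(2^k); the sketch below substitutes a prime field and represents a type signature as a list of small integer codes, both of which are assumptions.

```python
# Polynomial checksum with O(1) concatenation of signatures (prime field as a stand-in).
P = 2_147_483_647      # prime modulus (stand-in for the paper's GF(2^k))
X = 1_000_003          # fixed evaluation point

def signature(types):
    """Checksum of a type list via Horner's rule, returned together with its length."""
    h = 0
    for t in types:
        h = (h * X + t) % P
    return h, len(types)

def concat(sig_a, sig_b):
    """O(1) signature of the concatenation A || B from the signatures of A and B."""
    (ha, la), (hb, lb) = sig_a, sig_b
    return (ha * pow(X, lb, P) + hb) % P, la + lb

A = [1, 4, 4, 7]       # e.g., small integer codes standing in for MPI basic datatypes
B = [7, 2]
assert concat(signature(A), signature(B)) == signature(A + B)
print("O(1) concatenation matches the directly computed signature")
```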


Patent
21 Dec 2005
TL;DR: In this paper, a method of hash string extraction from biometric information is disclosed, which comprises providing a biometric information sample in the form of a fingerprint, extracting features from the sample, encoding the features based on their location within the sample, and generating a string of values based on the extracted features and their determined locations.
Abstract: A method of hash string extraction from biometric information is disclosed. The method comprises the steps of providing a biometric information sample in the form of a fingerprint for example, extracting features from the biometric information sample and encoding the features based on their location within the biometric information sample; and, generating a string of values based on the extracted features and their determined locations. The method further comprises the steps of hashing the string of symbols to produce a plurality of hash values for comparing the plurality of hash values against a stored hash value for identifying a user.

Proceedings ArticleDOI
21 Mar 2005
TL;DR: It is shown that judicious assignment of binary indices to the codevectors of the quantizer improves the performance of the hashing method and the robustness provided by application of a channel code is evaluated.
Abstract: Compact representation of perceptually relevant parts of multimedia data, referred to as robust hashing or fingerprinting, is often used for efficient retrieval from databases and authentication. In previous work, we introduced a framework for robust hashing which improves the performance of any particular feature extraction method. The hash generation was achieved from a feature vector in three distinct stages, namely: quantization, bit assignment and application of the decoding stage of an error correcting code. Results were obtained for unidimensional quantization and bit assignment, on one code only. In this work, we provide a generalisation of those techniques to higher dimensions. Our framework is analysed under different conditions at each stage. For the quantization, we consider both the case where the codevectors are uniformly and nonuniformly distributed. For multidimensional quantizers, bit assignment to the resulting indexes is a non-trivial task and a number of techniques are evaluated. We show that judicious assignment of binary indices to the codevectors of the quantizer improves the performance of the hashing method. Finally, the robustness provided by a number of different channel codes is evaluated.
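
A scalar illustration of why the bit-assignment stage matters: labelling the quantiser cells with a Gray code guarantees that neighbouring cells differ in exactly one bit, so a small perturbation of a feature flips at most one hash bit before the error-correcting decoding stage. The uniform scalar quantiser below is a simplification of the paper's multidimensional setting.

```python
# Gray-coded bit assignment for a uniform scalar quantiser (simplified illustration).
def gray(i: int) -> int:
    return i ^ (i >> 1)

def quantise_bits(value: float, lo: float, hi: float, bits: int = 4) -> str:
    """Quantise `value` into 2**bits uniform cells and return the Gray-coded label."""
    levels = 1 << bits
    cell = min(int((value - lo) / (hi - lo) * levels), levels - 1)
    return format(gray(cell), f"0{bits}b")

# A feature value near a cell boundary: natural binary labels could flip many bits,
# Gray labels flip exactly one.
print(quantise_bits(0.4999, 0.0, 1.0), quantise_bits(0.5001, 0.0, 1.0))
# cells 7 and 8 -> '0100' and '1100' (one differing bit)
```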

Journal Article
TL;DR: A generalized two-dimensional matrix hashing algorithm is proposed that offers several optional matrices and allows the image to be restored through the reverse transform; it provides good robustness and high security and does not require the original image for watermark extraction.
Abstract: In order to enhance the security of the hashing algorithm, the authors propose a generalized two-dimensional matrix hashing algorithm. The algorithm has several optional matrices, and the image can be restored through the reverse transform. Thus, it is very difficult to decode the hashing algorithm; the security is enhanced by using the selected matrix and the iteration count as the secret keys. Numerical experiments show that, under various image processing operations and attacks, the proposed algorithm offers good robustness and high security and does not require the original image for watermark extraction.
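
A hedged reconstruction of the "generalized two-dimensional matrix" transform: pixel coordinates are mapped through an invertible integer matrix modulo the image size (the Arnold cat map is the classic special case) and the mapping is iterated; the chosen matrix and the iteration count act as the secret keys, and the inverse matrix restores the image. The specific matrix below is only an example, not one from the paper.

```python
# Keyed, reversible coordinate scrambling with an invertible matrix mod n (illustrative).
import numpy as np

def matrix_scramble(image: np.ndarray, M, iterations: int) -> np.ndarray:
    n = image.shape[0]                       # assumes a square n x n image
    out = image.copy()
    ys, xs = np.indices((n, n))
    for _ in range(iterations):
        new_x = (M[0][0] * xs + M[0][1] * ys) % n
        new_y = (M[1][0] * xs + M[1][1] * ys) % n
        scrambled = np.empty_like(out)
        scrambled[new_y, new_x] = out[ys, xs]
        out = scrambled
    return out

def inverse_mod(M, n):
    a, b, c, d = M[0][0], M[0][1], M[1][0], M[1][1]
    det_inv = pow(a * d - b * c, -1, n)      # requires gcd(det, n) == 1
    return [[d * det_inv % n, -b * det_inv % n], [-c * det_inv % n, a * det_inv % n]]

key_matrix, key_iters = [[1, 1], [1, 2]], 7  # Arnold cat map, 7 rounds (example keys)
img = np.arange(64 * 64).reshape(64, 64)
scrambled = matrix_scramble(img, key_matrix, key_iters)
restored = matrix_scramble(scrambled, inverse_mod(key_matrix, 64), key_iters)
print(np.array_equal(img, restored))         # True: the transform is reversible
```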