
Showing papers on "Feature hashing" published in 2007


Proceedings ArticleDOI
12 Nov 2007
TL;DR: This paper presents an image hashing method, to not only detect but also localize tampering using a small signature (< 1kB), and brings out the efficacy of the proposed method compared to existing methods.
Abstract: An image hash should be (1) robust to allowable operations and (2) sensitive to illegal manipulations and distinct queries. Some applications also require the hash to be able to localize image tampering. This requires the hash to contain both robust content and alignment information to meet the above criteria. Fulfilling this is difficult because of two contradictory requirements. First, the hash should be small; second, to verify authenticity and then localize tampering, the amount of information the hash must carry about the original would be large. Hence a tradeoff between these requirements needs to be found. This paper presents an image hashing method that addresses this concern, to not only detect but also localize tampering using a small signature (< 1kB). Illustrative experiments bring out the efficacy of the proposed method compared to existing methods.

128 citations


Proceedings ArticleDOI
23 Jul 2007
TL;DR: The design principles behind hash-based search methods are revealed and it is shown how optimum hash functions for similarity search can be derived and the rationale of their effectiveness is explained.
Abstract: Hash-based similarity search reduces a continuous similarity relation to the binary concept "similar or not similar": two feature vectors are considered similar if they are mapped onto the same hash key. In terms of runtime performance this principle is unequaled, while being unaffected by dimensionality concerns at the same time. Similarity hashing is applied with great success for near similarity search in large document collections, and it is considered a key technology for near-duplicate detection and plagiarism analysis. This paper reveals the design principles behind hash-based search methods and presents them in a unified way. We introduce new stress statistics that are suited to analyze the performance of hash-based search methods, and we explain the rationale of their effectiveness. Based on these insights, we show how optimum hash functions for similarity search can be derived. We also present new results of a comparative study between different hash-based search methods.

97 citations
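
The principle stated in the abstract above, that two feature vectors count as similar when they map to the same hash key, can be illustrated with random-hyperplane sign hashing. This is a minimal sketch of one well-known similarity hash, not necessarily one of the methods the paper studies; the function names, bit count, and seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(vec, planes):
    """Build the key one bit per hyperplane: the side of the plane the vector lies on."""
    bits = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def build_index(vectors, planes):
    """Group vector ids by hash key; a collision is read as 'similar'."""
    index = defaultdict(list)
    for vec_id, vec in vectors.items():
        index[hash_key(vec, planes)].append(vec_id)
    return index


def query(vec, index, planes):
    """Return the ids that share the query's hash key (the binary 'similar' answer)."""
    return index.get(hash_key(vec, planes), [])
```

A lookup costs one key computation plus one dictionary access, independent of collection size, which is the runtime advantage the abstract refers to. Vectors pointing in the same direction always collide under this particular hash, while opposite vectors land in different buckets.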


Journal ArticleDOI
TL;DR: The essential idea is to produce an efficient and scalable hashing scheme that can be used to supplement the traditional cryptographic hashing during the initial pass over the raw evidence, called a multi-resolution similarity hash (or MRS hash), which is a generalization of recent work in the area.

75 citations


Proceedings ArticleDOI
06 Nov 2007
TL;DR: The main contribution is the first algorithm that has experimentally proven practicality for sets in the order of billions of keys and has time and space usage carefully analyzed without unrealistic assumptions.
Abstract: We present a simple and efficient external perfect hashing scheme (referred to as EPH algorithm) for very large static key sets. We use a number of techniques from the literature to obtain a novel scheme that is theoretically well-understood and at the same time achieves an order-of-magnitude increase in the size of the problem to be solved compared to previous "practical" methods. We demonstrate the scalability of our algorithm by constructing minimum perfect hash functions for a set of 1.024 billion URLs from the World Wide Web of average length 64 characters in approximately 62 minutes, using a commodity PC. Our scheme produces minimal perfect hash functions using approximately 3.8 bits per key. For perfect hash functions in the range {0,...,2n - 1} the space usage drops to approximately 2.7 bits per key. The main contribution is the first algorithm that has experimentally proven practicality for sets in the order of billions of keys and has time and space usage carefully analyzed without unrealistic assumptions.

55 citations
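
To make the term "minimal perfect hash function" concrete, here is a toy sketch that brute-forces a seed under which a small static key set maps onto the slots 0..n-1 without collisions. It is not the EPH algorithm from the paper and does not scale beyond a handful of keys; the helper names and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib


def slot(key, seed, n):
    """Deterministic hash of (seed, key), reduced to a slot in [0, n)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def find_minimal_perfect_seed(keys, max_tries=100_000):
    """Search for a seed under which the keys fill slots 0..n-1 with no collision."""
    n = len(keys)
    for seed in range(max_tries):
        if len({slot(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no collision-free seed found; key set too large for brute force")
```

Once a seed is found, `slot(key, seed, len(keys))` is a bijection from the key set onto `range(len(keys))`. For scale, the reported 3.8 bits per key over 1.024 billion keys works out to roughly 486 MB for the whole function description.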


Book ChapterDOI
11 Dec 2007
TL;DR: Results show that the proposed method can resist perceptually insignificant modifications such as compression, filtering, scaling and rotation and is also able to successfully detect content changing attacks such as insertion of foreign objects.
Abstract: Image hash function based on the image content has applications in watermarking, authentication and image retrieval. This paper presents an algorithm for generating an image hash that is robust against content-preserving modifications and at the same time, is capable of detecting malicious tampering. Robust features are first extracted from the discrete wavelet transform followed by the Radon transform. Probabilistic quantization is then used to map the feature values to a binary sequence. Results show that the proposed method can resist perceptually insignificant modifications such as compression, filtering, scaling and rotation. It is also able to successfully detect content changing attacks such as insertion of foreign objects.

32 citations


Journal ArticleDOI
TL;DR: This article reviews some representative image hashing techniques proposed in the recent years, with emphases on how to meet the conflicting requirements of perceptual robustness and security, and introduces two image hashing approaches developed in the own research.
Abstract: The easy generation, storage, transmission and reproduction of digital images have caused serious abuse and security problems. Assurance of the rightful ownership, integrity, and authenticity is a major concern to the academia as well as the industry. On the other hand, efficient search of the huge amount of images has become a great challenge. Image hashing is a technique suitable for use in image authentication and content based image retrieval (CBIR). In this article, we review some representative image hashing techniques proposed in the recent years, with emphases on how to meet the conflicting requirements of perceptual robustness and security. Following a brief introduction to some earlier methods, we focus on a typical two-stage structure and some geometric-distortion resilient techniques. We then introduce two image hashing approaches developed in our own research, and reveal security problems in some existing methods due to the absence of secret keys in certain stage of the image feature extraction, or availability of a large quantity of images, keys, or the hash function to the adversary. More research efforts are needed in developing truly robust and secure image hashing techniques.

22 citations


Book ChapterDOI
27 Aug 2007
TL;DR: A general biometric hash generation scheme based on vector quantization of multiple feature subsets selected with genetic optimization that overcomes the dimensionality problem of other hash generation algorithms and makes it possible to exploit all the discriminative information found in large feature sets.
Abstract: We present a general biometric hash generation scheme based on vector quantization of multiple feature subsets selected with genetic optimization. The quantization of subsets overcomes the dimensionality problem of other hash generation algorithms, while the feature selection step using an integer-coding genetic algorithm makes it possible to exploit all the discriminative information found in large feature sets. We provide experimental results of the proposed hashing for verification of on-line signatures. Development and evaluation experiments are reported on the MCYT signature database, comprising 16,500 signatures from 330 subjects.

19 citations


Book ChapterDOI
09 Jan 2007
TL;DR: This paper presents a novel hashing scheme that is resilient to allowable non-malicious manipulations such as JPEG compression and high-pass filtering, and is sensitive enough to detect tampering with precise localization.
Abstract: The purpose of an image hash is to provide a compact representation of the whole image. Designing a good image hash function requires careful consideration of many issues such as robustness, security and tamper detection with precise localization. In this paper, we present a novel hashing scheme that addresses these issues in a unified framework. We analyze the security issues in image hashing and present new ideas to counter some of the attacks that we shall describe in this paper. Our proposed scheme is resilient to allowable non-malicious manipulations such as JPEG compression and high-pass filtering, and is sensitive enough to detect tampering with precise localization. Several experimental results are presented to demonstrate the effectiveness of the proposed scheme.

14 citations


01 Jan 2007
TL;DR: This analysis shows the potential of tailored hash-based indexing methods and identifies basic retrieval tasks which can benefit from this new technology, relates them to well-known applications and discusses how hash-based indexing is applied.
Abstract: Hash-based indexing is a powerful technology for similarity search in large document collections [13]. The central idea is the interpretation of hash collisions as an indication of similarity, provided that an appropriate hash function is given. In this paper we identify basic retrieval tasks which can benefit from this new technology, we relate them to well-known applications and discuss how hash-based indexing is applied. Moreover, we present two recently developed hash-based indexing approaches and compare the achieved performance improvements in real-world retrieval settings. This analysis, which has not been conducted in this or a similar form by now, shows the potential of tailored hash-based indexing methods.

12 citations


Patent
Simon Tong1, Noam Shazeer1
16 May 2007
TL;DR: A system may track statistics for a number of features using an approximate counting technique by subjecting each feature to multiple, different hash functions to generate multiple, different hash values, where each of the hash values may identify a particular location in a memory, and storing statistics for each feature at the particular locations identified by the hash values.
Abstract: A system may track statistics for a number of features using an approximate counting technique by: subjecting each feature to multiple, different hash functions to generate multiple, different hash values, where each of the hash values may identify a particular location in a memory, and storing statistics for each feature at the particular locations identified by the hash values. The system may generate rules for a model based on the tracked statistics.

8 citations
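
The approximate counting technique described in the patent abstract above can be sketched as follows: each feature is run through several different hash functions, each hash value names a memory location, and the feature's statistic is stored at all of those locations. The class name, table size, salted SHA-256 hashing, and especially the read rule (taking the minimum over the locations, as in a count-min sketch) are assumptions; the abstract does not say how statistics are read back.

```python
import hashlib


class ApproxFeatureCounter:
    """Fixed-size counter table addressed by several hash functions per feature."""

    def __init__(self, n_slots=1024, n_hashes=3):
        self.n_slots = n_slots
        self.n_hashes = n_hashes
        self.memory = [0] * n_slots

    def _locations(self, feature):
        """One memory location per hash function (salted SHA-256, an assumption)."""
        locs = []
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{feature}".encode("utf-8")).digest()
            locs.append(int.from_bytes(digest[:8], "big") % self.n_slots)
        return locs

    def add(self, feature, amount=1):
        """Store the statistic at every distinct location the hashes identify."""
        for loc in set(self._locations(feature)):
            self.memory[loc] += amount

    def estimate(self, feature):
        """Assumed read rule: the minimum over locations, which never undercounts."""
        return min(self.memory[loc] for loc in self._locations(feature))
```

Memory use stays fixed no matter how many distinct features are seen. The price is that colliding features can inflate each other's counts, so `estimate` is an upper bound on the true count.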


Proceedings ArticleDOI
01 Dec 2007
TL;DR: An image hashing technique that attempts to simultaneously address the robustness, fragility and security issues is presented; it is an improved version of an earlier scheme, with wavelet-based smoothing to improve robustness against JPEG compression and a modified intensity transformation for enhancing the security.
Abstract: Designing a hash function for multimedia authentication encompasses many issues like robustness to non-malicious distortion, sensitivity to detect malicious manipulations and security. In this paper, we present an image hashing technique that attempts to simultaneously address the robustness, fragility and security issues. This scheme is an improved version of our previously proposed scheme [1] with a wavelet-based smoothing to improve robustness against JPEG compression and a modified intensity transformation for enhancing the security. Several experimental results are presented to demonstrate the effectiveness of the proposed scheme.

Journal ArticleDOI
TL;DR: This paper addresses the cases when such a distribution follows a natural negative linear distribution, a partial negative linear distribution, or an exponential distribution, which are found to closely approximate many real-life database distributions, and derives a general formula for calculating the distribution variance produced by any given non-overlapping bit-grouping XOR hashing function.
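
One plausible reading of a "non-overlapping bit-grouping XOR hashing function" is XOR folding: the key's bits are cut into disjoint groups of equal width and the groups are XORed together. The sketch below follows that reading; the summary does not give the paper's exact definition, so the function name and parameters are assumptions.

```python
def xor_fold(key, group_bits, key_bits):
    """Split key into non-overlapping group_bits-wide groups and XOR them together."""
    mask = (1 << group_bits) - 1
    result = 0
    for shift in range(0, key_bits, group_bits):
        result ^= (key >> shift) & mask
    return result
```

For example, folding the 16-bit key `0xABCD` into 8-bit groups XORs `0xAB` with `0xCD` and yields `0x66`.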

Proceedings ArticleDOI
02 Apr 2007
TL;DR: This paper proposes a method for retrieving similar interaction protein using profiles that represent the features of the interaction site binding to a certain compound using geometric hashing technique.
Abstract: Protein function is expressed by binding to other compounds at a local portion, called an interaction site. Since the structure of its interaction site and the function of a protein are closely related, retrieving proteins with similar interactions is effective in clarifying the function of a protein. We have proposed a method for retrieving such proteins using profiles that represent the features of the interaction site binding to a certain compound. In this method, it is necessary to compare the structure between proteins and a profile; we use the geometric hashing technique, which is one of the popular methods for structure comparison. However, the problem with structure comparison using geometric hashing is that memory usage becomes too large. This paper proposes a method for arranging the geometric hashing to alleviate this problem. Firstly, only small parts of the target structures are stored in the hash table to reduce the size of the hash table. By evaluating this hash table we screen out candidates of similar structures between target proteins and query profiles. Secondly, overall structures are compared for these candidates. In order to reduce the time for retrieval we evaluate the information of the origin, which is not generally evaluated, without increasing the size of the hash table. Reference set, the basis for transforming in geometric hashing, are sorted

Proceedings ArticleDOI
01 Aug 2007
TL;DR: A novel method based on image hashing is proposed to locate the acquired component from a CCD DSC camera in this paper and is verified by experimental results.
Abstract: A novel method based on image hashing is proposed to locate the acquired component from a CCD DSC camera in this paper. An image hash that is resistant to geometric distortions is extracted. The extracted hash is used to identify the component and related defects, including a missing component, a wrong component, and inverse orientation. The proposed method is verified by experimental results.

Patent
07 May 2007
TL;DR: In this paper, a hash triplet consisting of a hash for each document word and two involving the word and its preceding and following words is used to provide suggestions to the author, and to filter email.
Abstract: Usages of language are analyzed in ways that are at least partially language independent. In preferred embodiments, portions of a document are hashed, and the resulting hash values are compared with each other and with those of other documents in real-time. Analyses can be used to gauge conformity of a document to one or more standards utilizing a hash triplet consisting of a hash for each document word and two involving the word and its preceding and following words, to provide suggestions to the author, and to filter email.
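
The hash triplet mentioned in the abstract above (one hash for each document word, plus two involving the word together with its preceding and its following word) could be computed along these lines. This is a sketch under stated assumptions: whitespace tokenization, truncated SHA-256 as the hash, a single space as the joiner, and an empty string for the missing neighbor at document boundaries are all illustrative choices, not details from the patent.

```python
import hashlib


def h(text):
    """32-bit hash of a string (truncated SHA-256, chosen only for determinism)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_triplets(document):
    """Return (word, previous+word, word+next) hashes for each word, in order."""
    words = document.split()
    triplets = []
    for i, word in enumerate(words):
        prev_word = words[i - 1] if i > 0 else ""
        next_word = words[i + 1] if i + 1 < len(words) else ""
        triplets.append(
            (h(word), h(prev_word + " " + word), h(word + " " + next_word))
        )
    return triplets
```

Because only hash values are compared, matching a document's triplets against those of other documents needs no knowledge of the language itself, which fits the abstract's description of the analysis as at least partially language independent.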