
Showing papers on "Feature hashing published in 2009"


Journal ArticleDOI
TL;DR: In this paper, a deep graphical model of the word-count vectors obtained from a large set of documents is proposed; the model learns to map documents to compact binary codes so that semantically similar documents receive nearby codes and can be retrieved extremely quickly.

1,266 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: It is shown how to generalize locality-sensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm's sub-linear time similarity search guarantees for a wide class of useful similarity functions.
Abstract: Fast retrieval methods are critical for large-scale and data-driven vision applications. Recent work has explored ways to embed high-dimensional features or complex distance functions into a low-dimensional Hamming space where items can be efficiently searched. However, existing methods do not apply for high-dimensional kernelized data when the underlying feature embedding for the kernel is unknown. We show how to generalize locality-sensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm's sub-linear time similarity search guarantees for a wide class of useful similarity functions. Since a number of successful image-based kernels have unknown or incomputable embeddings, this is especially valuable for image retrieval tasks. We validate our technique on several large-scale datasets, and show that it enables accurate and fast performance for example-based object classification, feature matching, and content-based retrieval.
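A rough sketch of the construction, as I read it (simplified, not the authors' implementation; the centering of kernel evaluations at query time is omitted): each hash bit thresholds a weighted sum of kernel evaluations against a small sample of p database points, with weights w = K^(-1/2) e_S chosen so that the implicit hyperplane is approximately Gaussian in the kernel-induced feature space.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """RBF kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_klsh(anchors, n_bits, t=5, kernel=rbf, rng=np.random.default_rng(0)):
    """Build KLSH-style hash weights from p sampled anchor points."""
    p = len(anchors)
    K = kernel(anchors, anchors)
    K = K - K.mean(0) - K.mean(1)[:, None] + K.mean()  # center in feature space
    evals, evecs = np.linalg.eigh(K)
    K_inv_sqrt = evecs @ np.diag(np.maximum(evals, 1e-8) ** -0.5) @ evecs.T
    W = np.zeros((n_bits, p))
    for b in range(n_bits):
        S = rng.choice(p, size=t, replace=False)   # random t-subset of anchors
        e_S = np.zeros(p)
        e_S[S] = 1.0
        W[b] = K_inv_sqrt @ e_S                    # w = K^(-1/2) e_S
    return W

def hash_points(X, anchors, W, kernel=rbf):
    """Hash rows of X to binary codes using kernel evaluations only."""
    return (kernel(X, anchors) @ W.T > 0).astype(np.uint8)

anchors = np.random.randn(100, 16)                 # sample of the database
W = fit_klsh(anchors, n_bits=32)
print(hash_points(np.random.randn(5, 16), anchors, W).shape)  # (5, 32)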

975 citations


Proceedings ArticleDOI
14 Jun 2009
TL;DR: In this article, the authors provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability, and demonstrate the feasibility of this approach with experimental results for a new use case.
Abstract: Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case --- multitask learning with hundreds of thousands of tasks.
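The hashed feature map analyzed in the paper is simple to implement; below is a minimal sketch of the signed hashing trick (a toy version of mine, not the authors' code): each token is hashed into one of m buckets, and a second hash supplies a +/-1 sign so that collisions cancel in expectation.

```python
import hashlib

def hashed_features(tokens, m=2**18):
    """Map a list of string tokens to an m-dimensional sparse vector
    using the signed hashing trick: phi[h(t) mod m] += sign(t)."""
    phi = {}
    for t in tokens:
        d = hashlib.md5(t.encode("utf-8")).digest()
        idx = int.from_bytes(d[:8], "little") % m   # bucket index
        sign = 1 if d[8] % 2 == 0 else -1           # independent sign hash
        phi[idx] = phi.get(idx, 0) + sign
    return phi

# Inner products in the hashed space approximate those in the original
# bag-of-words space, with distortion vanishing as m grows.
doc_a = hashed_features("the quick brown fox".split())
doc_b = hashed_features("the lazy brown dog".split())
dot = sum(v * doc_b.get(k, 0) for k, v in doc_a.items())
print(dot)
```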

955 citations


Proceedings Article
07 Dec 2009
TL;DR: An algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings is developed.
Abstract: Fast retrieval methods are increasingly critical for many large-scale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinate-descent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing state-of-the-art techniques.
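Schematically (my notation, simplified from the paper), the reconstructive objective is a squared discrepancy between input-space distances and scaled Hamming distances over a set of training pairs:

```latex
\min_{W}\; \sum_{(i,j)\in\mathcal{N}}
\left( \tfrac{1}{2}\lVert x_i - x_j \rVert^2
      \;-\; \tfrac{1}{b}\, d_H\!\bigl(b(x_i),\, b(x_j)\bigr) \right)^{\!2},
\qquad
b(x) = \operatorname{sign}\!\bigl(W^{\top} \kappa(x)\bigr),
```

where b is the code length, d_H is Hamming distance, and kappa(x) is a vector of kernel evaluations against anchor points, which is what makes the method easy to kernelize.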

914 citations


Proceedings Article
07 Dec 2009
TL;DR: This paper introduces a simple distribution-free encoding scheme based on random projections, such that the expected Hamming distance between the binary codes of two vectors is related to the value of a shift-invariant kernel between the vectors.
Abstract: This paper addresses the problem of designing binary codes for high-dimensional data such that vectors that are similar in the original space map to similar binary strings. We introduce a simple distribution-free encoding scheme based on random projections, such that the expected Hamming distance between the binary codes of two vectors is related to the value of a shift-invariant kernel (e.g., a Gaussian kernel) between the vectors. We present a full theoretical analysis of the convergence properties of the proposed scheme, and report favorable experimental performance as compared to a recent state-of-the-art method, spectral hashing.
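A minimal sketch of the scheme as I understand it (parameters set for a Gaussian kernel; not the authors' code): each bit thresholds a randomly shifted cosine of a random projection, in the spirit of random Fourier features, so the expected fraction of differing bits grows with kernel distance.

```python
import numpy as np

def make_code_fn(dim, n_bits, gamma=1.0, rng=np.random.default_rng(0)):
    """Binary codes whose expected Hamming distance tracks a Gaussian
    kernel exp(-gamma/2 * ||x - y||^2): a shift-invariant kernel
    embedding, binarized with random thresholds."""
    W = rng.normal(scale=np.sqrt(gamma), size=(n_bits, dim))  # kernel spectrum
    b = rng.uniform(0, 2 * np.pi, size=n_bits)                # random phases
    t = rng.uniform(-1, 1, size=n_bits)                       # random thresholds
    def code(X):
        return (np.cos(X @ W.T + b) >= t).astype(np.uint8)
    return code

code = make_code_fn(dim=32, n_bits=256)
x = np.random.randn(32)
y = x + 0.05 * np.random.randn(32)             # a near neighbor
z = np.random.randn(32)                        # an unrelated point
print((code(x[None]) != code(y[None])).mean(),  # small Hamming fraction
      (code(x[None]) != code(z[None])).mean())  # larger Hamming fraction
```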

702 citations


Journal Article
TL;DR: This work generalizes previous work using sampling and shows a principled way to compute the kernel matrix for data streams and sparse feature spaces and gives deviation bounds from the exact kernel matrix.
Abstract: We propose hashing to facilitate efficient kernels. This generalizes previous work using sampling and we show a principled way to compute the kernel matrix for data streams and sparse feature spaces. Moreover, we give deviation bounds from the exact kernel matrix. This has applications to estimation on strings and graphs.
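For strings, the construction amounts to hashing all n-grams into a fixed-size signed vector, so the inner product of two hashed vectors approximates an n-gram string kernel without materializing the huge exact feature space. A toy illustration (mine, not the authors' code):

```python
import hashlib

def hashed_ngram_vector(s, n=3, m=2**16):
    """Hash all character n-grams of s into m signed buckets."""
    v = {}
    for i in range(len(s) - n + 1):
        d = hashlib.md5(s[i:i + n].encode()).digest()
        idx = int.from_bytes(d[:8], "little") % m
        sign = 1 if d[8] % 2 == 0 else -1
        v[idx] = v.get(idx, 0) + sign
    return v

def hash_kernel(s1, s2, n=3, m=2**16):
    """Approximate the n-gram string kernel k(s1, s2) via hashed vectors."""
    v1, v2 = hashed_ngram_vector(s1, n, m), hashed_ngram_vector(s2, n, m)
    return sum(val * v2.get(k, 0) for k, val in v1.items())

print(hash_kernel("feature hashing", "feature washing"))  # high: shared n-grams
print(hash_kernel("feature hashing", "cuckoo tables"))    # near zero
```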

264 citations


Journal ArticleDOI
01 Dec 2009
TL;DR: An efficient data-parallel algorithm for building large hash tables of millions of elements in real time, combining a classical sparse perfect hashing approach with cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations.
Abstract: We demonstrate an efficient data-parallel algorithm for building large hash tables of millions of elements in real-time. We consider two parallel algorithms for the construction: a classical sparse perfect hashing approach, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations. Our construction is a hybrid approach that uses both algorithms. We measure the construction time, access time, and memory usage of our implementations and demonstrate real-time performance on large datasets: for 5 million key-value pairs, we construct a hash table in 35.7 ms using 1.42 times as much memory as the input data itself, and we can access all the elements in that hash table in 15.3 ms. For comparison, sorting the same data requires 36.6 ms, but accessing all the elements via binary search requires 79.5 ms. Furthermore, we show how our hashing methods can be applied to two graphics applications: 3D surface intersection for moving data and geometric hashing for image matching.
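The cuckoo half of the hybrid is easy to sketch serially (a toy CPU version under my own simplifications; the paper's construction is data-parallel on the GPU and combined with perfect hashing): inserting into an occupied slot evicts the resident item, which is then re-placed at its alternate location.

```python
class CuckooTable:
    """Toy serial cuckoo hash table with two hash functions."""
    def __init__(self, capacity=17, max_kicks=50):
        self.slots = [None] * capacity
        self.capacity, self.max_kicks = capacity, max_kicks

    def _h(self, key, i):
        return hash((i, key)) % self.capacity   # two salted hash functions

    def insert(self, key, value):
        item, j = (key, value), self._h(key, 0)
        for _ in range(self.max_kicks):
            if self.slots[j] is None:
                self.slots[j] = item
                return True
            item, self.slots[j] = self.slots[j], item   # evict the occupant
            # send the evicted item to its alternate location
            h0, h1 = self._h(item[0], 0), self._h(item[0], 1)
            j = h1 if j == h0 else h0
        return False  # kick budget exceeded; a real table would rehash/grow

    def get(self, key):
        for i in (0, 1):
            item = self.slots[self._h(key, i)]
            if item is not None and item[0] == key:
                return item[1]
        return None

table = CuckooTable()
for k in range(6):
    table.insert(f"key{k}", k)
print(table.get("key3"))  # 3 -- lookups probe at most two slots
```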

194 citations




Proceedings ArticleDOI
02 Nov 2009
TL;DR: This article proposes Supervised Semantic Indexing (SSI), an algorithm that is trained on (query, document) pairs of text documents to predict the quality of their match and proposes several improvements to the basic model, including low rank (but diagonal preserving) representations, and correlated feature hashing (CFH).
Abstract: In this article we propose Supervised Semantic Indexing (SSI), an algorithm that is trained on (query, document) pairs of text documents to predict the quality of their match. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI our models are trained with a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as online advertising placement. Dealing with models on all pairs of words features is computationally challenging. We propose several improvements to our basic model for addressing this issue, including low rank (but diagonal preserving) representations, and correlated feature hashing (CFH). We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.
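The core scoring model can be written compactly (schematic form; notation mine, based on the abstract's description of low-rank, diagonal-preserving representations): a query q and a document d, as sparse term vectors, are matched through a word-pair weight matrix W factored as low rank plus identity,

```latex
f(q, d) = q^{\top} W\, d, \qquad W = U^{\top} V + I,
```

so the score decomposes into a learned low-rank semantic match (Uq)·(Vd) plus the classical term-overlap q·d, and correlated feature hashing (CFH) further shrinks the vocabulary dimension by hashing correlated words into shared buckets.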

100 citations


Journal ArticleDOI
TL;DR: An image hashing algorithm based on compressive sensing principles is proposed, which solves both the authentication and the tampering identification problems and is robust to moderate content-preserving transformations including cropping, scaling, and rotation.
Abstract: In the last decade, the increased possibility to produce, edit, and disseminate multimedia contents has not been adequately balanced by similar advances in protecting these contents from unauthorized diffusion of forged copies. When the goal is to detect whether or not a digital content has been tampered with in order to alter its semantics, the use of multimedia hashes turns out to be an effective solution to offer proof of legitimacy and to possibly identify the introduced tampering. We propose an image hashing algorithm based on compressive sensing principles, which solves both the authentication and the tampering identification problems. The original content producer generates a hash using a small bit budget by quantizing a limited number of random projections of the authentic image. The content user receives the (possibly altered) image and uses the hash to estimate the mean square error distortion between the original and the received image. In addition, if the introduced tampering is sparse in some orthonormal basis or redundant dictionary, an approximation is given in the pixel domain. We emphasize that the hash is universal, e.g., the same hash signature can be used to detect and identify different types of tampering. At the cost of additional complexity at the decoder, the proposed algorithm is robust to moderate content-preserving transformations including cropping, scaling, and rotation. In addition, in order to keep the size of the hash small, hash encoding/decoding takes advantage of distributed source codes.
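The encoder is essentially key-driven quantized random projections, and the decoder can estimate distortion from them because Gaussian projections are near-isometric. A minimal sketch (my own simplification, omitting the distributed source coding of the hash):

```python
import numpy as np

def cs_hash(image_vec, n_proj=64, n_levels=16, seed=42):
    """Hash = coarsely quantized random projections of the image.
    The seed plays the role of a key shared by producer and user."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_proj, image_vec.size)) / np.sqrt(image_vec.size)
    y = A @ image_vec
    step = 8 * y.std() / n_levels if y.std() > 0 else 1.0
    return np.round(y / step).astype(np.int32), step

def estimate_mse(received_vec, hash_q, step, n_proj=64, seed=42):
    """Estimate MSE between original and received image from the hash:
    energy in the projection difference tracks pixel-domain energy."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_proj, received_vec.size)) / np.sqrt(received_vec.size)
    diff = A @ received_vec - hash_q * step   # approx. A @ (received - original)
    return float(diff @ diff) / n_proj

original = np.random.rand(64 * 64)
tampered = original + 0.1 * (np.random.rand(64 * 64) < 0.01)  # sparse tampering
h, step = cs_hash(original)
print(estimate_mse(tampered, h, step))  # approximates mean((tampered-original)**2)
```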

86 citations


01 Jan 2009
TL;DR: This paper delves into a recently proposed technique for collaborative spam filtering that facilitates personalization and describes how this can be used to improve the quality of spam filtering.

Abstract: This paper delves into a recently proposed technique for collaborative spam filtering [7] that facilitates personalization.
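The hashing-trick personalization referred to here is easy to sketch (my reconstruction; names and sizes are illustrative, not from the paper): every token is hashed twice into one shared weight vector, once globally and once salted with the recipient's ID, so each user gets a lightweight personalized correction without a separate per-user model.

```python
import hashlib

M = 2 ** 20                 # one shared weight vector for all users
w = [0.0] * M

def slot(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % M

def features(tokens, user_id):
    """Global feature h(token) plus personalized feature h(user || token)."""
    idx = []
    for t in tokens:
        idx.append(slot(t))                      # shared, collaborative feature
        idx.append(slot(user_id + "\x00" + t))   # user-specific correction
    return idx

def predict(tokens, user_id):
    return sum(w[i] for i in features(tokens, user_id))

def perceptron_update(tokens, user_id, label):   # label: +1 spam, -1 ham
    if label * predict(tokens, user_id) <= 0:
        for i in features(tokens, user_id):
            w[i] += label

perceptron_update("cheap meds now".split(), "alice", +1)
print(predict("cheap meds now".split(), "alice") > 0)  # True
```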

Proceedings ArticleDOI
19 Oct 2009
TL;DR: It is shown that the proposed expansion methods are complementary to each other and can collaboratively contribute up to 76.3% (average) relative improvement over the original hash-based method.
Abstract: An efficient indexing method is essential for content-based image retrieval with the exponential growth in large-scale videos and photos. Recently, hash-based methods (e.g., locality sensitive hashing - LSH) have been shown efficient for similarity search. We extend such hash-based methods for retrieving images represented by bags of (high-dimensional) feature points. Though promising, the hash-based image object search suffers from low recall rates. To boost the hash-based search quality, we propose two novel expansion strategies - intra-expansion and inter-expansion. The former expands more target feature points similar to those in the query and the latter mines those feature points that shall co-occur with the search targets but not present in the query. We further exploit variations for the proposed methods. Experimenting in two consumer-photo benchmarks, we will show that the proposed expansion methods are complementary to each other and can collaboratively contribute up to 76.3% (average) relative improvement over the original hash-based method.

Proceedings ArticleDOI
01 Dec 2009
TL;DR: This paper proposes a simple key-based secure image hashing scheme using the Radon transform and the 1-D Discrete Cosine Transform (DCT), introducing a randomization on the Radon transform to make the image hash more secure and more robust.

Abstract: Perceptual image hashing is an emerging technique which can be used in image authentication and content-based image retrieval. Recently, several image hashing schemes based on the Radon transform have been proposed for image authentication and retrieval. These schemes have no random information to strengthen the security of the image hash. In this paper, we propose a simple key-based secure image hashing scheme using the Radon transform and the 1-D Discrete Cosine Transform (DCT). In particular, we introduce a randomization on the Radon transform to make the image hash more secure and more robust. Moreover, the discriminative capability is also confirmed in our experimental results.
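A rough sketch of such a pipeline (my reconstruction; the key-driven randomization and the per-projection statistic are assumptions, and skimage/scipy stand in for whatever the authors used):

```python
import numpy as np
from scipy.fft import dct
from skimage.transform import radon

def radon_dct_hash(image, key=1234, n_angles=40, n_coeffs=20):
    """Key-based image hash sketch: project along secret random angles
    (Radon transform), summarize each projection, then keep low-frequency
    1-D DCT coefficients and binarize against their median."""
    rng = np.random.default_rng(key)                # secret key -> angle set
    angles = np.sort(rng.uniform(0, 180, n_angles))
    sinogram = radon(image, theta=angles, circle=False)  # one column per angle
    feature = sinogram.std(axis=0)                  # robust per-angle statistic
    coeffs = dct(feature, norm="ortho")[:n_coeffs]
    return (coeffs > np.median(coeffs)).astype(np.uint8)

img = np.zeros((128, 128))
img[40:90, 30:100] = 1.0
print(radon_dct_hash(img))
```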

Proceedings ArticleDOI
06 Mar 2009
TL;DR: The paper gives guidelines for choosing the hashing method and hash function best suited to a particular problem, and presents six classes of hash functions that cover most common use cases.
Abstract: The paper gives guidelines for choosing the hashing method and hash function best suited to a particular problem. After studying various problems, we identify criteria for predicting the best hash method and hash function for each. We present six classes of hash functions that cover most common use cases. The paper discusses hashing and the components involved in it, and states the need for hashing for faster data retrieval. Hashing methods are used in many applications across computer science, from spell checkers and database management systems to the symbol tables generated by loaders, assemblers, and compilers. Various forms of hashing address different problems, including dynamic hashing, cryptographic hashing, geometric hashing, robust hashing, Bloom hashing, and string hashing. We conclude by indicating which type of hash function is suitable for which kind of problem.

Proceedings ArticleDOI
19 Apr 2009
TL;DR: The proposed EEG hashing approach presents a fundamental departure from existing methods in EEG-biometry study and suggests that hashing may open new research directions and applications in the emerging EEG-based biometry area.
Abstract: Electroencephalogram (EEG) recordings of brain waves have been shown to have unique pattern for each individual and thus have potential for biometric applications. In this paper, we propose an EEG feature extraction and hashing approach for person authentication. Multi-variate autoregressive (mAR) coefficients are extracted as features from multiple EEG channels and then hashed by using our recently proposed Fast Johnson-Lindenstrauss Transform (FJLT)-based hashing algorithm to obtain compact hash vectors. Based on the EEG hash vectors, a Naive Bayes probabilistic model is employed for person authentication. Our EEG hashing approach presents a fundamental departure from existing methods in EEG-biometry study. The promising results suggest that hashing may open new research directions and applications in the emerging EEG-based biometry area.

Proceedings ArticleDOI
TL;DR: Side-information assisted robust perceptual hashing is proposed as a solution to the identified shortcomings of existing robust hashing techniques, with an analysis based on both achievable rate and probability of error.
Abstract: In this paper, we consider some basic concepts behind the design of existing robust perceptual hashing techniques for content identification. We show the limits of robust hashing from the communication perspectives as well as propose an approach that is able to overcome these shortcomings in certain setups. The consideration is based on both achievable rate and probability of error. We use the fact that most robust hashing algorithms are based on dimensionality reduction using random projections and quantization. Therefore, we demonstrate the corresponding achievable rate and probability of error based on random projections and compare with the results for the direct domain. The effect of dimensionality reduction is studied and the corresponding approximations are provided based on the Johnson-Lindenstrauss lemma. Side-information assisted robust perceptual hashing is proposed as a solution to the above shortcomings.
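The approximations invoked here rest on the Johnson-Lindenstrauss lemma, which in one standard form reads:

```latex
\text{For any } 0 < \varepsilon < 1 \text{ and any } n \text{ points in } \mathbb{R}^d
\text{ there is a linear map } f : \mathbb{R}^d \to \mathbb{R}^k
\text{ with } k = O(\varepsilon^{-2} \log n) \text{ such that }
(1-\varepsilon)\,\lVert u - v \rVert^2
\;\le\; \lVert f(u) - f(v) \rVert^2
\;\le\; (1+\varepsilon)\,\lVert u - v \rVert^2
\text{ for all pairs } u, v.
```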

Book ChapterDOI
21 Aug 2009
TL;DR: This paper significantly advance the state of the art by proving a polylogarithmic bound on the more efficient random-walk method, where items repeatedly kick out random blocking items until a free location for an item is found.
Abstract: In this paper, we provide a polylogarithmic bound that holds with high probability on the insertion time for cuckoo hashing under the random-walk insertion method. Cuckoo hashing provides a useful methodology for building practical, high-performance hash tables. The essential idea of cuckoo hashing is to combine the power of schemes that allow multiple hash locations for an item with the power to dynamically change the location of an item among its possible locations. Previous work on the case where the number of choices is larger than two has required a breadth-first search analysis, which is both inefficient in practice and currently has only a polynomial high probability upper bound on the insertion time. Here we significantly advance the state of the art by proving a polylogarithmic bound on the more efficient random-walk method, where items repeatedly kick out random blocking items until a free location for an item is found.
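The random-walk insertion analyzed here generalizes the eviction loop to d > 2 choices; a toy serial sketch (mine, for illustration only):

```python
import random

class RandomWalkCuckoo:
    """Cuckoo table with d hash choices per key and random-walk insertion:
    on collision, kick out a uniformly random occupant among the d slots."""
    def __init__(self, capacity=101, d=3, max_kicks=500):
        self.slots = [None] * capacity
        self.capacity, self.d, self.max_kicks = capacity, d, max_kicks

    def _choices(self, key):
        return [hash((i, key)) % self.capacity for i in range(self.d)]

    def insert(self, key, value):
        item = (key, value)
        for _ in range(self.max_kicks):
            choices = self._choices(item[0])
            for j in choices:
                if self.slots[j] is None:
                    self.slots[j] = item
                    return True
            # all d slots full: evict a random blocking item and continue with it
            j = random.choice(choices)
            item, self.slots[j] = self.slots[j], item
        return False  # kick budget exceeded (w.h.p. polylog kicks suffice)

    def lookup(self, key):
        for j in self._choices(key):
            if self.slots[j] is not None and self.slots[j][0] == key:
                return self.slots[j][1]
        return None

t = RandomWalkCuckoo()
for k in range(60):          # roughly 60% load with d = 3 choices
    t.insert(f"k{k}", k)
print(t.lookup("k42"))       # 42
```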

Journal ArticleDOI
TL;DR: A novel key-dependent robust speech hashing based on a speech production model is proposed in this letter, which is highly robust to content-preserving operations while offering high accuracy of tampering localization.
Abstract: Robust hashing for multimedia authentication is an emerging research area. A novel key-dependent robust speech hashing based on a speech production model is proposed in this letter. The robust hash is calculated from line spectral frequencies (LSFs), which model the vocal tract. The correlation between LSFs is decoupled by the discrete cosine transform (DCT). A randomization scheme controlled by a secret key is applied in hash generation for random feature selection. The hash function is key-dependent and collision resistant. Meanwhile, it is highly robust to content-preserving operations and offers high accuracy of tampering localization.

Patent
Todd Adam Bachmann1
09 Jun 2009
TL;DR: In this paper, a hash value is mapped to a plurality of words to form the mnemonic, and the hash value may be mapped to word indices used to identify particular words in word lists.
Abstract: Methods and computing devices enable users to identify documents using a hash value mapped to a word mnemonic for easy recall and comparison. A hash algorithm may be applied to a document to generate a distinguishing hash value. The hash value is mapped to a plurality of words to form the mnemonic. To obtain the words, the hash value may be mapped to word indices used to identify particular words in word lists. Word lists may include a list of nouns, a list of verbs, and a list of adverbs or adjectives, so that the resulting three-word mnemonics are memorable. More word lists may be used to map hash values to four-, five-, or more-word mnemonics. The number-to-mnemonic mapping methods may be used to map large numbers, such as account numbers, telephone numbers, etc., into mnemonics which are easier for people to remember and compare.
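The number-to-mnemonic mapping is straightforward to sketch (tiny illustrative word lists of my own; a real implementation would use lists with thousands of entries so each word carries many bits):

```python
import hashlib

# Hypothetical tiny word lists; larger lists let each word encode more bits.
NOUNS = ["fox", "river", "stone", "cloud"]
VERBS = ["jumps", "sings", "melts", "drifts"]
ADJS = ["red", "quiet", "brave", "hollow"]

def mnemonic(document_bytes):
    """Map a document's hash value to a noun-verb-adjective mnemonic."""
    h = int(hashlib.sha256(document_bytes).hexdigest(), 16)
    words = []
    for word_list in (NOUNS, VERBS, ADJS):
        h, idx = divmod(h, len(word_list))  # peel off one index per list
        words.append(word_list[idx])
    return " ".join(words)

print(mnemonic(b"quarterly report v2"))  # e.g. "stone sings red"
```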

Journal ArticleDOI
TL;DR: Experimental results show that the proposed algorithm outperforms the Philips Robust Hash (PRH) algorithm under various distortions.
Abstract: Audio fingerprinting techniques aim at successfully performing content-based audio identification even when the audio signals are slightly or seriously distorted. In this letter, we propose a novel audio fingerprinting technique based on multiple hashing. In order to improve the robustness of hashing, multiple hash strings are generated through the discrete cosine transform (DCT) which is applied to the temporal energy sequence in each subband. Experimental results show that the proposed algorithm outperforms the Philips Robust Hash (PRH) algorithm under various distortions.
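The hash-string generation can be sketched as follows (my simplification; framing, subband layout, and the matching stage differ in the real system): per-subband temporal energy sequences are transformed with a 1-D DCT, and sign bits of the low-frequency coefficients form one hash string per subband.

```python
import numpy as np
from scipy.fft import dct

def subband_dct_hash(signal, n_frames=32, n_bands=8, n_keep=4):
    """Sketch of multiple-hash audio fingerprinting: per-subband temporal
    energy sequence -> 1-D DCT -> sign bits, one hash string per subband."""
    frame_len = len(signal) // n_frames
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(spectra, n_bands, axis=1)
    hashes = []
    for band in bands:
        energy = band.sum(axis=1)                   # temporal energy sequence
        c = dct(energy, norm="ortho")[1:n_keep + 1]  # skip DC, keep low freqs
        hashes.append((c > 0).astype(np.uint8))      # sign bits = hash string
    return np.stack(hashes)  # shape (n_bands, n_keep), one string per subband

audio = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
audio += 0.01 * np.random.randn(8000)
print(subband_dct_hash(audio))
```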

Proceedings ArticleDOI
TL;DR: After modeling the process of hash extraction and the properties involved in this process, two different security threats are studied, namely the disclosure of the secret feature space and the tampering of the hash.
Abstract: Perceptual hashing has to deal with the constraints of robustness, accuracy and security. After modeling the process of hash extraction and the properties involved in this process, two different security threats are studied, namely the disclosure of the secret feature space and the tampering of the hash. Two different approaches for performing robust hashing are presented: Random-Based Hash (RBH), where the security is achieved using a random projection matrix, and Content-Based Hash (CBH), where the security relies on the difficulty of tampering with the hash. As for digital watermarking, different security setups are also devised: the Batch Hash Attack, the Group Hash Attack, the Unique Hash Attack and the Sensitivity Attack. A theoretical analysis of the information leakage in the context of Random-Based Hash is proposed. Finally, practical attacks are presented: (1) Minor Component Analysis is used to estimate the secret projection of Random-Based Hashes and (2) salient point tampering is used to tamper the hash of Content-Based Hash systems.

Proceedings ArticleDOI
Yu Liu1, Cho Kiho1, Hwan Sik Yun1, Jong Won Shin1, Nam Soo Kim1 
19 Apr 2009
TL;DR: A novel audio fingerprinting technique based on combining fingerprint matching results for multiple hash tables in order to improve the robustness of hashing is presented.
Abstract: Audio fingerprinting techniques should successfully perform content-based audio identification even when the audio files are slightly or seriously distorted. In this paper, we present a novel audio fingerprinting technique based on combining fingerprint matching results for multiple hash tables in order to improve the robustness of hashing. Multiple hash tables are built based on the discrete cosine transform (DCT) which is applied to the time sequence of energies in each sub-band. Experimental results show that the recognition errors are significantly reduced compared with Philips Robust Hash (PRH) [1] under various distortions.

Journal ArticleDOI
Minho Jin1, Chang D. Yoo1
TL;DR: The quantum hashing system is shown to be more robust against various distortions than the binary hashing system using the same intermediate hash values.
Abstract: In this paper, a novel multimedia identification system based on quantum hashing is considered. Many traditional systems are based on binary hash which is obtained by encoding intermediate hash extracted from multimedia content. In the system considered, the intermediate hash values extracted from a query are encoded into quantum hash values by incorporating uncertainty in the binary hash values. For this, the intermediate hash difference between the query and its true-underlying content is considered as a random process. Then, the uncertainty is represented by the probability density estimate of the intermediate hash difference. The quantum hashing system is evaluated using both audio and video databases, and with marginal increment in computational cost, the quantum hashing system is shown to be more robust against various distortions than the binary hashing system using the same intermediate hash values.

Journal ArticleDOI
TL;DR: It is shown that if the image features of an image hash function are scaled by a constant larger than one, the tradeoff between the robustness and the fragility of the image hash function does not change at all, yet the security indicated by the randomness measure increases; the randomness measure should therefore be modified to be invariant to scaling.

Abstract: How to measure the security of image hashing is still an open issue in the field of image authentication. Some works have been conducted on the security measure of image hashing. One of the most important works is the randomness measure proposed by Swaminathan, which uses differential entropy as a metric to evaluate the security of randomized image features and has been applied mainly in the security analysis of the feature extraction stage of image hashing. It is meaningful to measure the randomness of the image features over the secret-key set for the security of image hashing because the image features extracted by image hashing should be generated randomly and difficult to guess. However, as is well known, differential entropy is not invariant to scaling; thus it might not be enough to evaluate the security of randomized image features. In this paper, we show that if the image features of an image hash function are scaled by a constant that is larger than one, then the tradeoff between the robustness and the fragility of the image hash function will not change at all, but the security indicated by the randomness measure will increase. The above-mentioned fact seems to contradict the following. First, the security of image hashing, which conflicts with robustness and fragility, cannot increase freely. Secondly, a deterministic operation, such as deterministic scaling, does not change the security of image hashing in terms of the difficulty of guessing the secret key or randomized image features. Therefore, the randomness measure should be modified to be invariant to scaling at least.
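The scaling behavior driving this argument is the standard property of differential entropy under scaling: for a continuous scalar feature X and constant a (for an n-dimensional feature vector the increment is n log|a|),

```latex
h(aX) = h(X) + \log \lvert a \rvert ,
```

so multiplying the features by any a > 1 inflates an entropy-based randomness measure by log a, even though neither the robustness-fragility tradeoff nor the difficulty of guessing the secret key has changed.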

Proceedings ArticleDOI
19 Oct 2009
TL;DR: This paper presents "Progressive Hashing" (PH), a general open addressing hash-based packet processing scheme for Internet routers using the set associative memory architecture, and shows by experimenting with real IP lookup tables and synthetic packet filtering databases that PH reduces overflow compared with multiple hashing.

Abstract: As the Internet grows, both the number of rules in packet filtering databases and the number of prefixes in IP lookup tables inside the router are growing. The packet processing engine is a critical part of the Internet router as it is used to perform packet forwarding (PF) and packet classification (PC). In both applications, processing has to be at wire speed. It is common to use hash-based schemes in packet processing engines; however, the downsides of classic hashing techniques, such as overflow and worst-case memory access time, have to be dealt with. Implementing hash tables using set associative memory has the property that each bucket of a hash table can be searched in one memory cycle, outperforming conventional Ternary CAMs in terms of power and scalability. In this paper we present "Progressive Hashing" (PH), a general open addressing hash-based packet processing scheme for Internet routers using the set associative memory architecture. Our scheme is an extension of the multiple hashing scheme and is amenable to high-performance hardware implementation with low overflow and low memory access latency. We show by experimenting with real IP lookup tables and synthetic packet filtering databases that PH reduces overflow compared with multiple hashing. The proposed PH processing engine is estimated to achieve an average processing speed of 160 Gbps for the PC application and 320 Gbps for the PF application.
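The multiple-hashing baseline that PH extends is easy to sketch in software (a toy serial version of mine; the paper's engine uses set-associative memory and a progressive probing order in hardware, none of which is captured here):

```python
class MultiHashTable:
    """k hash functions over set-associative buckets: insert into the
    least-loaded candidate bucket; overflow only if all k are full."""
    def __init__(self, n_buckets=64, ways=4, k=3):
        self.buckets = [[] for _ in range(n_buckets)]
        self.n, self.ways, self.k = n_buckets, ways, k
        self.overflow = []   # what schemes like PH try to minimize

    def _candidates(self, key):
        return [hash((i, key)) % self.n for i in range(self.k)]

    def insert(self, key, value):
        cands = self._candidates(key)
        best = min(cands, key=lambda j: len(self.buckets[j]))
        if len(self.buckets[best]) < self.ways:
            self.buckets[best].append((key, value))
        else:
            self.overflow.append((key, value))

    def lookup(self, key):
        for j in self._candidates(key):
            for k2, v in self.buckets[j]:   # one bucket = one memory cycle
                if k2 == key:
                    return v
        return next((v for k2, v in self.overflow if k2 == key), None)

t = MultiHashTable()
for p in range(200):
    t.insert(f"prefix/{p}", p)
print(t.lookup("prefix/123"), len(t.overflow))
```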

Proceedings ArticleDOI
28 Sep 2009
TL;DR: The LFH algorithm is built on the p-stable distribution Locality-Sensitive Hashing scheme that projects a set of local features representing a query image to an ID histogram where the maximum bin is regarded as the recognized ID.
Abstract: In this paper, we present Local Feature Hashing (LFH), a novel approach for face recognition. Focusing on the scalability of face recognition systems, we build our LFH algorithm on the p-stable distribution Locality-Sensitive Hashing (pLSH) scheme that projects a set of local features representing a query image to an ID histogram where the maximum bin is regarded as the recognized ID. Our extensive experiments on two publicly available databases demonstrate the advantages of our LFH method, including: i) significant computational improvement over naive search; ii) hashing in high-dimensional Euclidean space without embedding; and iii) robustness to pose, facial expression, illumination and partial occlusion.
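The pLSH primitive underneath LFH is the standard p-stable scheme, quantizing a random projection into buckets of width w; a minimal sketch (not the authors' code):

```python
import numpy as np

class PStableLSH:
    """Standard 2-stable (Gaussian) LSH: h(x) = floor((a.x + b) / w).
    Nearby descriptors in Euclidean space collide with high probability."""
    def __init__(self, dim, n_tables=8, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_tables, dim))   # 2-stable projections
        self.b = rng.uniform(0, w, size=n_tables)   # random offsets
        self.w = w

    def hash(self, x):
        return tuple(np.floor((self.a @ x + self.b) / self.w).astype(int))

lsh = PStableLSH(dim=128)
desc = np.random.randn(128)                # a local feature descriptor
near = desc + 0.01 * np.random.randn(128)
print(lsh.hash(desc) == lsh.hash(near))    # usually True for near neighbors
```

In LFH, each local feature of a query face votes through such hashes into an ID histogram, and the maximum bin gives the recognized identity.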

Patent
Sergey Ioffe1
29 Sep 2009
TL;DR: In this article, a robust hashing method is applied to media data (e.g., video, image, and audio data), producing a hash output that is robust with respect to at least one attribute of the media data.
Abstract: A robust hashing method is applied to media data (e.g., video, image, and/or audio data), producing a hash output that is robust with respect to at least one attribute of the media data. A histogram is generated for the media data and the histogram is hashed using a weighted hashing procedure. The histogram can be derived from a plurality of randomized versions of the media file, each randomized version of the media file altered to a random extent with respect to the attribute. The histogram can also be derived from a plurality of feature descriptors computed for the media data that are coarsely encoded with respect to the attribute. The weighted hashing procedure includes assigning a weight to components of the histogram and applying a plurality of hash functions to a number of versions of each component, the number of versions based on the assigned weight.

Proceedings ArticleDOI
28 Jun 2009
TL;DR: Two features are proposed for natural image hashing, based on the description of shapes, in terms of contours and regions, which have good robustness and discriminability and better ROC performance.
Abstract: Perceptual hashing is a solution for identification and authentication of multimedia content. The key of this technique is the extraction of proper features. In this paper, two features are proposed for natural image hashing. They are based on the description of shapes, in terms of contours and regions. The contour-based feature is formed by edge detection. The region-based feature is formed by the angular radial transform. Simulation results show that they have good robustness and discriminability. Compared to some other features, better ROC performance is achieved.

Book ChapterDOI
21 Sep 2009
TL;DR: This work introduces and investigates the new notion of corruption-localizing hashing, defined as a natural extension of collision-intractable hashing, and designs two such schemes, one starting from any collision-intractable hash function, and the other starting from any collision-intractable keyed hash function.
Abstract: Collision-intractable hashing is an important cryptographic primitive with numerous applications including efficient integrity checking for transmitted and stored data, and software. In several of these applications, it is important that in addition to detecting corruption of the data we also localize the corruptions. This motivates us to introduce and investigate the new notion of corruption-localizing hashing, defined as a natural extension of collision-intractable hashing. Our main contribution is in formally defining corruption-localizing hash schemes and designing two such schemes, one starting from any collision-intractable hash function, and the other starting from any collision-intractable keyed hash function. Both schemes have attractive efficiency properties in three important metrics: localization factor, tag length and localization running time, capturing the quality of localization, and performance in terms of storage and time complexity, respectively. The closest previous results, when modified to satisfy our formal definitions, only achieve similar properties in the case of a single corruption.

Proceedings ArticleDOI
19 Oct 2009
TL;DR: This paper proposes a hash function family based on feature vocabularies, which can be employed to build a high-dimensional index for approximate nearest neighbor search and investigates the application in building indexes for image search.
Abstract: This paper proposes a hash function family based on feature vocabularies and investigates the application in building indexes for image search. Each hash function is associated with a set of feature points, i.e. a vocabulary, and maps an input point to the ID of the nearest one in the vocabulary. The function family can be employed to build a high-dimensional index for approximate nearest neighbor search. Then we concentrate on its application in image search. Guiding rules for the construction of the vocabularies are derived, which improve the effectiveness of the approach in this context by taking advantage of the data distribution. The rules are applied to design an algorithm for vocabulary construction in practice. Experiments show promising performance of the approach and the effectiveness of the guiding rules. Comparison with the popular Euclidean locality-sensitive hashing also shows the advantage of our approach in image search.
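Each function in the family is just nearest-neighbor assignment to a small vocabulary of feature points; a minimal sketch (mine, with random vocabularies rather than the data-driven construction the paper advocates):

```python
import numpy as np

def make_vocab_hash(vocab):
    """Hash function from a feature vocabulary: map an input point to the
    ID of its nearest vocabulary point (here by Euclidean distance)."""
    def h(x):
        return int(np.argmin(((vocab - x) ** 2).sum(axis=1)))
    return h

rng = np.random.default_rng(0)
family = [make_vocab_hash(rng.normal(size=(256, 64)))  # one vocabulary each
          for _ in range(8)]                           # 8 independent functions
x = rng.normal(size=64)
print([h(x) for h in family])   # the 8-bucket signature used for indexing
```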