
Feature hashing

About: Feature hashing is a research topic. Over its lifetime, 993 publications on this topic have received 51,462 citations.


Papers
Proceedings Article
29 Oct 2012
TL;DR: A novel framework for efficient large-scale video retrieval that integrates feature pooling and hashing; the authors show that the influence maximization problem bridging the two stages is submodular, which allows a greedy optimization method to achieve a nearly optimal solution.
Abstract: This paper develops a novel framework for efficient large-scale video retrieval. We aim to find videos according to higher-level similarities, which are beyond the scope of traditional near-duplicate search. Following the popular hashing technique, we employ compact binary codes to facilitate nearest neighbor search. Unlike previous methods, which capitalize on only one type of hash code for retrieval, this paper combines heterogeneous hash codes to effectively describe the diverse and multi-scale visual contents in videos. Our method integrates feature pooling and hashing in a single framework. In the pooling stage, we cast video frames into a set of pre-specified components, which capture a variety of semantics of video contents. In the hashing stage, we represent each video component as a compact hash code and combine multiple hash codes into hash tables for effective search. To speed up retrieval while retaining the most informative codes, we propose a graph-based influence maximization method to bridge the pooling and hashing stages. We show that the influence maximization problem is submodular, which allows a greedy optimization method to achieve a nearly optimal solution. Our method works very efficiently, retrieving thousands of video clips from the TRECVID dataset in about 0.001 second. For a larger-scale synthetic dataset with 1M samples, it answers 100 queries in less than 1 second. Our method is extensively evaluated in both unsupervised and supervised scenarios, and the results on the TRECVID Multimedia Event Detection and Columbia Consumer Video datasets demonstrate the success of our proposed technique.

60 citations
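The claim that submodularity lets a greedy method reach a nearly optimal solution rests on a classical result: for a monotone submodular objective under a cardinality constraint, greedy selection is guaranteed at least a (1 - 1/e) fraction of the optimum. The abstract does not spell out the paper's influence function, so the sketch below assumes a stand-in: graph coverage, where a node "influences" itself and its neighbors. The graph and the names `coverage` and `greedy_select` are hypothetical and only illustrate the greedy step, not the authors' method.

```python
from typing import Dict, List, Set


# Hypothetical stand-in for the paper's influence function: a node "influences"
# itself and its direct neighbors. Coverage is monotone and submodular.
def coverage(selected: List[str], graph: Dict[str, Set[str]]) -> int:
    covered: Set[str] = set()
    for node in selected:
        covered.add(node)
        covered |= graph[node]
    return len(covered)


def greedy_select(graph: Dict[str, Set[str]], k: int) -> List[str]:
    """Add, k times, the node with the largest marginal gain in coverage."""
    selected: List[str] = []
    for _ in range(min(k, len(graph))):
        base = coverage(selected, graph)
        best = max(
            (n for n in graph if n not in selected),
            key=lambda n: coverage(selected + [n], graph) - base,
        )
        selected.append(best)
    return selected


# Toy undirected graph, made up for illustration.
graph = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"c"},
    "e": set(),
}
picked = greedy_select(graph, 2)
print(picked, coverage(picked, graph))  # ['a', 'c'] 4
```

After picking "a", adding "b" gains nothing while "c" still gains one node; this shrinking marginal gain is exactly the submodular property the greedy guarantee relies on.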

Journal Article
TL;DR: This work proposes audio hash functions based on the periodicity series of the fundamental frequency and on a singular-value description of the cepstral frequencies, and addresses the security of hashes with a keying technique that yields a key-dependent hash function.
Abstract: Perceptual hash functions provide a tool for fast and reliable identification of content. We present new audio hash functions based on summarization of the time-frequency spectral characteristics of an audio document. The proposed hash functions are based on the periodicity series of the fundamental frequency and on a singular-value description of the cepstral frequencies. They are found, on the one hand, to perform very satisfactorily in identification and verification tests and, on the other hand, to be very resilient to a large variety of attacks. Moreover, we address the issue of hash security and propose a keying technique, and thereby a key-dependent hash function.

59 citations
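The abstract does not describe how its keying technique works, so the following is only a generic illustration of what "key-dependent" can mean for a perceptual hash: the projection directions that turn a feature vector into bits are derived from a secret key, so the same content hashes differently under different keys while small, content-preserving changes leave the bits intact. The feature vector, the function name `keyed_perceptual_hash`, and the use of Gaussian random projections are assumptions, not the paper's pitch- or cepstrum-based features.

```python
import hashlib
import random
from typing import List


def keyed_perceptual_hash(features: List[float], key: bytes, n_bits: int = 32) -> int:
    """Hash a feature vector into n_bits using key-dependent random projections.

    Illustrative only: the projection directions are derived from the key,
    so the bits depend on both the content and the key.
    """
    # Derive a deterministic 64-bit seed from the key.
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = random.Random(seed)
    # Remove the mean so a constant offset in the features does not change the hash.
    mean = sum(features) / len(features)
    centered = [f - mean for f in features]
    code = 0
    for _ in range(n_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in centered]
        projection = sum(d * c for d, c in zip(direction, centered))
        code = (code << 1) | (1 if projection >= 0.0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


# Hypothetical feature vector standing in for real spectral descriptors.
features = [float(i % 7) for i in range(64)]
h_a = keyed_perceptual_hash(features, b"key-A")
h_a_shifted = keyed_perceptual_hash([f + 0.5 for f in features], b"key-A")
h_b = keyed_perceptual_hash(features, b"key-B")
print(hamming(h_a, h_a_shifted), hamming(h_a, h_b))
```

The constant offset is absorbed by mean-centering, so `h_a` and `h_a_shifted` match, while switching the key changes the projection directions and, with overwhelming probability, the hash.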

Proceedings Article
21 Oct 2013
TL;DR: Proposes Topology Preserving Hashing (TPH), a hashing method that, unlike prior work, preserves the neighborhood rankings of data points in Hamming space; its learning stage is formulated as a generalized eigendecomposition problem with closed-form solutions.
Abstract: Binary hashing has been widely used for efficient similarity search. Learning efficient codes has become a research focus, but it remains a challenge. In many cases, real-world data lies on a low-dimensional manifold, which should be taken into account to capture meaningful neighbors with hashing. What matters about a manifold is its topology, which represents the neighborhood relationships between its subregions and the relative proximities between the neighbors of each subregion, e.g. the relative ranking of the neighbors of each subregion. Most existing hashing methods try to preserve the neighborhood relationships by mapping similar points to close codes, while ignoring the neighborhood rankings. Moreover, most hashing methods fail to provide a good ranking of query results because they use Hamming distance as the similarity metric, and in practice many results share the same distance to a query. In this paper, we propose a novel hashing method that solves these two issues jointly. The proposed method is referred to as Topology Preserving Hashing (TPH). TPH is distinct from prior works in that it preserves the neighborhood rankings of data points in Hamming space. The learning stage of TPH is formulated as a generalized eigendecomposition problem with closed-form solutions. Experimental comparisons with other state-of-the-art methods on three well-known image benchmarks demonstrate the efficacy of the proposed method.

58 citations
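The abstract's point that Hamming distance ranks query results poorly, because many results share the same distance to a query, can be seen with a tiny exhaustive example. The sketch below (hypothetical helper names, plain integers as binary codes) ranks all 4-bit codes against an all-zero query; it illustrates the tie problem only and does not implement TPH.

```python
from collections import Counter
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def hamming_rank(query: int, database: List[int]) -> List[int]:
    """Indices of database codes sorted by Hamming distance to the query.

    sorted() is stable, so codes at equal distance keep their database order:
    the order within a tie says nothing about similarity.
    """
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


def distance_histogram(query: int, database: List[int]) -> dict:
    """Map each Hamming distance to how many database codes sit at that distance."""
    counts = Counter(hamming(query, code) for code in database)
    return dict(sorted(counts.items()))


# Every 4-bit code, queried with 0b0000.
database = list(range(16))
print(distance_histogram(0, database))  # {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
print(hamming_rank(0, database)[:5])    # [0, 1, 2, 4, 8]
```

Six of the sixteen codes tie at distance 2, so any ordering among them is arbitrary; with b-bit codes there are only b + 1 possible distances, which is why rank-aware methods try to add information beyond the raw Hamming distance.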

Proceedings Article
23 Jun 2014
TL;DR: A novel algorithm that uses compact hash bits to greatly improve the efficiency of non-linear kernel SVMs in very large-scale visual classification problems, built on a novel hashing scheme for arbitrary non-linear kernels via random subspace projection in reproducing kernel Hilbert space.
Abstract: This paper presents a novel algorithm that uses compact hash bits to greatly improve the efficiency of non-linear kernel SVMs in very large-scale visual classification problems. Our key idea is to represent each sample with compact hash bits, over which an inner product is defined to serve as a surrogate for the original non-linear kernels. The problem of solving the non-linear SVM can then be transformed into solving a linear SVM over the hash bits. The proposed Hash-SVM enjoys a dramatic reduction in storage cost owing to the compact binary representation, as well as (sub-)linear training complexity via the linear SVM. As a critical component of Hash-SVM, we propose a novel hashing scheme for arbitrary non-linear kernels via random subspace projection in reproducing kernel Hilbert space. Our comprehensive analysis reveals a well-behaved theoretical bound on the deviation between the proposed hashing-based kernel approximation and the original kernel function. We also derive requirements on the hash bits for achieving a satisfactory accuracy level. Several experiments on large-scale visual classification benchmarks are conducted, including one with over 1 million images. The results show that Hash-SVM greatly reduces the computational complexity (more than ten times faster in many cases) while keeping comparable accuracy.

58 citations
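The core idea in this abstract, an inner product over hash bits acting as a surrogate for a non-linear kernel, can be illustrated with a simpler, well-known scheme swapped in for the paper's RKHS random subspace projection: sign random projections. For Gaussian random hyperplanes, two vectors at angle theta land on the same side of a hyperplane with probability 1 - theta/pi, so the normalized inner product of their +1/-1 codes estimates the angular kernel 1 - 2*theta/pi. All function names and parameters below are assumptions for illustration, not Hash-SVM's hashing scheme.

```python
import math
import random
from typing import List

Vector = List[float]


def random_planes(dim: int, n_bits: int, seed: int = 0) -> List[Vector]:
    """Gaussian random hyperplane normals; each one yields one hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sign_hash(x: Vector, planes: List[Vector]) -> List[int]:
    """Encode x as +1/-1 bits: the side of each hyperplane it falls on."""
    return [1 if sum(p * v for p, v in zip(plane, x)) >= 0.0 else -1 for plane in planes]


def hash_kernel(bx: List[int], by: List[int]) -> float:
    """Normalized inner product of two +1/-1 codes (agreements minus disagreements)."""
    return sum(a * b for a, b in zip(bx, by)) / len(bx)


def angular_kernel(x: Vector, y: Vector) -> float:
    """1 - 2*theta/pi: the expected value of hash_kernel for Gaussian hyperplanes."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return 1.0 - 2.0 * math.acos(cos_theta) / math.pi


rng = random.Random(42)
dim = 16
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
y = [xi + 0.5 * rng.gauss(0.0, 1.0) for xi in x]  # a correlated neighbor of x
planes = random_planes(dim, n_bits=4096)
approx = hash_kernel(sign_hash(x, planes), sign_hash(y, planes))
print(f"hash estimate {approx:.3f} vs exact {angular_kernel(x, y):.3f}")
```

Because `hash_kernel` is a plain inner product over the codes, a linear classifier trained on the +1/-1 codes implicitly works with this approximate kernel, which is the general trade the abstract describes; the estimate's error shrinks roughly as one over the square root of the number of bits.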

Journal Article
Shumeet Baluja, Michele Covell
TL;DR: Describes a method to learn a similarity function from only weakly labeled positive examples; the learned function serves as the basis of a hash function that severely constrains the number of points considered for each lookup in a large corpus of high-dimensional data points.
Abstract: The problem of efficiently finding similar items in a large corpus of high-dimensional data points arises in many real-world tasks, such as music, image, and video retrieval. Beyond the scaling difficulties that arise with lookups in large data sets, the complexity in these domains is exacerbated by an imprecise definition of similarity. In this paper, we describe a method to learn a similarity function from only weakly labeled positive examples. Once learned, this similarity function is used as the basis of a hash function to severely constrain the number of points considered for each lookup. When the method is tested on a large real-world audio dataset, only a tiny fraction of the points (~0.27%) is ever considered for each lookup. To further increase efficiency, no comparisons in the original high-dimensional space of points are required. The performance far surpasses, in terms of both efficiency and accuracy, a state-of-the-art technique based on Locality-Sensitive Hashing (LSH) for the same problem and data set.

57 citations
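The abstract measures efficiency as the fraction of points considered per lookup and compares against an LSH baseline. The sketch below shows how a single LSH table achieves that kind of restriction: points are bucketed by the sign pattern of a few random hyperplane projections, and a lookup examines only the query's bucket. It is a minimal illustration of the random-hyperplane LSH family, assuming made-up data and a hypothetical `HyperplaneLSHTable` class; it is not the paper's learned similarity hash.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


class HyperplaneLSHTable:
    """One hash table keyed by the sign pattern of n_bits random hyperplane projections."""

    def __init__(self, dim: int, n_bits: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
        self.points: List[Vector] = []

    def _key(self, x: Vector) -> Tuple[int, ...]:
        # One bit per hyperplane: which side of it the point falls on.
        return tuple(int(sum(p * v for p, v in zip(plane, x)) >= 0.0) for plane in self.planes)

    def add(self, x: Vector) -> None:
        self.buckets[self._key(x)].append(len(self.points))
        self.points.append(x)

    def candidates(self, query: Vector) -> List[int]:
        # Only points sharing the query's bucket are examined; .get avoids creating empty buckets.
        return self.buckets.get(self._key(query), [])


rng = random.Random(1)
dim = 8
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5000)]
table = HyperplaneLSHTable(dim=dim, n_bits=10, seed=7)
for point in data:
    table.add(point)
cands = table.candidates(data[123])
print(f"examined {len(cands)} of {len(data)} points ({len(cands) / len(data):.2%})")
```

More bits per key shrink the buckets and the fraction examined but raise the chance that a true neighbor lands in a different bucket; practical LSH systems offset this with several tables, which is the efficiency/accuracy trade-off the learned hash in this paper is reported to beat.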


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Support vector machine: 73.6K papers, 1.7M citations, 83% related
Performance Metrics
Number of papers on the topic in previous years:

Year  Papers
2023  33
2022  89
2021  11
2020  16
2019  16
2018  38