Topic

Feature hashing

About: Feature hashing is a research topic. To date, 993 publications have appeared within this topic, receiving 51,462 citations.

Papers

Proceedings Article•DOI•

Submodular video hashing: a unified framework towards video pooling and indexing

Liangliang Cao¹, Zhenguo Li², Yadong Mu², Shih-Fu Chang²•Institutions (2)

IBM¹, Columbia University²

29 Oct 2012

TL;DR: A novel framework for efficient large-scale video retrieval that integrates feature pooling and hashing in a single framework, and shows that the influence maximization problem is submodular, which allows a greedy optimization method to achieve a nearly optimal solution.

Abstract: This paper develops a novel framework for efficient large-scale video retrieval. We aim to find video according to higher level similarities, which is beyond the scope of traditional near duplicate search. Following the popular hashing technique we employ compact binary codes to facilitate nearest neighbor search. Unlike the previous methods which capitalize on only one type of hash code for retrieval, this paper combines heterogeneous hash codes to effectively describe the diverse and multi-scale visual contents in videos. Our method integrates feature pooling and hashing in a single framework. In the pooling stage, we cast video frames into a set of pre-specified components, which capture a variety of semantics of video contents. In the hashing stage, we represent each video component as a compact hash code, and combine multiple hash codes into hash tables for effective search. To speed up the retrieval while retaining most informative codes, we propose a graph-based influence maximization method to bridge the pooling and hashing stages. We show that the influence maximization problem is submodular, which allows a greedy optimization method to achieve a nearly optimal solution. Our method works very efficiently, retrieving thousands of video clips from TRECVID dataset in about 0.001 second. For a larger scale synthetic dataset with 1M samples, it uses less than 1 second in response to 100 queries. Our method is extensively evaluated in both unsupervised and supervised scenarios, and the results on TRECVID Multimedia Event Detection and Columbia Consumer Video datasets demonstrate the success of our proposed technique.

60 citations
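
The abstract above relies on the fact that a greedy method is nearly optimal for submodular maximization. A minimal sketch of that idea follows. It uses plain set coverage as a stand-in objective (a standard monotone submodular function), not the paper's graph-based influence objective, and the function name `greedy_max_coverage` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def greedy_max_coverage(candidates: Dict[str, Set[int]], budget: int) -> List[str]:
    """Pick up to `budget` candidates, always taking the largest marginal gain."""
    covered: Set[int] = set()
    chosen: List[str] = []
    for _ in range(budget):
        best_name, best_gain = None, 0
        for name in sorted(candidates):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            # Marginal gain: how many new items this candidate would cover.
            gain = len(candidates[name] - covered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining candidate adds anything new
            break
        chosen.append(best_name)
        covered |= candidates[best_name]
    return chosen


# Hypothetical toy data: each "hash code" covers a set of item ids.
codes = {
    "code_a": {1, 2, 3, 4},
    "code_b": {3, 4, 5},
    "code_c": {5, 6},
    "code_d": {1, 2},
}
selected = greedy_max_coverage(codes, budget=2)
print(selected)
```

For monotone submodular objectives under a cardinality constraint, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value, which is the sense in which such solutions are called nearly optimal.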

Journal Article•DOI•

Perceptual audio hashing functions

Hamza Ozer¹, Bulent Sankur, Nasir Memon², Emin Anarim•Institutions (2)

Scientific and Technological Research Council of Turkey¹, New York University²

01 Jan 2005-EURASIP Journal on Advances in Signal Processing

TL;DR: This work presents audio hash functions based on the periodicity series of the fundamental frequency and on a singular-value description of the cepstral frequencies. It also addresses the security of hashes by proposing a keying technique, and thereby a key-dependent hash function.

Abstract: Perceptual hash functions provide a tool for fast and reliable identification of content. We present new audio hash functions based on summarization of the time-frequency spectral characteristics of an audio document. The proposed hash functions are based on the periodicity series of the fundamental frequency and on singular-value description of the cepstral frequencies. They are found, on one hand, to perform very satisfactorily in identification and verification tests, and on the other hand, to be very resilient to a large variety of attacks. Moreover, we address the issue of security of hashes and propose a keying technique, and thereby a key-dependent hash function.

59 citations

Proceedings Article•DOI•

Topology preserving hashing for similarity search

Lei Zhang¹, Yongdong Zhang¹, Jinhui Tang², Xiaoguang Gu¹, Jintao Li¹, Qi Tian³•Institutions (3)

Chinese Academy of Sciences¹, Nanjing University of Science and Technology², University of Texas at San Antonio³

21 Oct 2013

TL;DR: Topology Preserving Hashing is proposed, a novel hashing method that is distinct from prior works by preserving the neighborhood rankings of data points in Hamming space and is formulated as a generalized eigendecomposition problem with closed form solutions.

Abstract: Binary hashing has been widely used for efficient similarity search. Learning efficient codes has become a research focus and it is still a challenge. In many cases, the real-world data often lies on a low-dimensional manifold, which should be taken into account to capture meaningful neighbors with hashing. The importance of a manifold is its topology, which represents the neighborhood relationships between its subregions and the relative proximities between the neighbors of each subregion, e.g. the relative ranking of neighbors of each subregion. Most existing hashing methods try to preserve the neighborhood relationships by mapping similar points to close codes, while ignoring the neighborhood rankings. Moreover, most hashing methods lack in providing a good ranking for query results since they use Hamming distance as the similarity metric, and in practice, there are often a lot of results sharing the same distance to a query. In this paper, we propose a novel hashing method to solve these two issues jointly. The proposed method is referred to as Topology Preserving Hashing (TPH). TPH is distinct from prior works by preserving the neighborhood rankings of data points in Hamming space. The learning stage of TPH is formulated as a generalized eigendecomposition problem with closed form solutions. Experimental comparisons with other state-of-the-art methods on three noted image benchmarks demonstrate the efficacy of the proposed method.

58 citations
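
The abstract above notes that ranking by Hamming distance often leaves many results tied at the same distance. The sketch below illustrates only that tie problem on hypothetical 4-bit codes; it does not implement Topology Preserving Hashing, and the codes and variable names are made up for illustration.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


# Hypothetical 4-bit codes for a query and a small database.
query = 0b1010
database = [0b1010, 0b1011, 0b1000, 0b0010, 0b0101, 0b1110]

distances = [hamming(query, code) for code in database]

# Rank database indices by distance; Python's sort is stable, so tied
# entries keep their original order, which is arbitrary from a
# retrieval point of view.
ranking = sorted(range(len(database)), key=lambda i: distances[i])

# Count how many database codes share each distance value.
ties = Counter(distances)
print(distances, ranking, dict(ties))
```

With only b bits there are just b + 1 possible distance values, so ties become unavoidable as soon as the database holds more than b + 1 items.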

Proceedings Article•DOI•

Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification

Yadong Mu¹, Gang Hua², Wei Fan³, Shih-Fu Chang⁴•Institutions (4)

AT&T Labs¹, Stevens Institute of Technology², Huawei³, Columbia University⁴

23 Jun 2014

TL;DR: This paper presents a novel algorithm that uses compact hash bits to greatly improve the efficiency of non-linear kernel SVM in very large scale visual classification problems, and proposes a novel hashing scheme for arbitrary non-linear kernels via random subspace projection in reproducing kernel Hilbert space.

Abstract: This paper presents a novel algorithm which uses compact hash bits to greatly improve the efficiency of non-linear kernel SVM in very large scale visual classification problems. Our key idea is to represent each sample with compact hash bits, over which an inner product is defined to serve as the surrogate of the original nonlinear kernels. Then the problem of solving the nonlinear SVM can be transformed into solving a linear SVM over the hash bits. The proposed Hash-SVM enjoys dramatic storage cost reduction owing to the compact binary representation, as well as a (sub-)linear training complexity via linear SVM. As a critical component of Hash-SVM, we propose a novel hashing scheme for arbitrary non-linear kernels via random subspace projection in reproducing kernel Hilbert space. Our comprehensive analysis reveals a well behaved theoretic bound of the deviation between the proposed hashing-based kernel approximation and the original kernel function. We also derive requirements on the hash bits for achieving a satisfactory accuracy level. Several experiments on large-scale visual classification benchmarks are conducted, including one with over 1 million images. The results show that Hash-SVM greatly reduces the computational complexity (more than ten times faster in many cases) while keeping comparable accuracies.

58 citations
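
The key idea in the abstract above is that a similarity computed over compact hash bits can stand in for a more expensive similarity function. The sketch below shows that general idea with random hyperplane hashing, a standard technique in which the fraction of agreeing bits estimates 1 - θ/π for two vectors at angle θ. This is not the Hash-SVM hashing scheme, and the helper names, dimensions, and vectors are hypothetical.

```python
import math
import random


def hash_bits(x, planes):
    """One bit per random hyperplane: 1 if x lies on its non-negative side."""
    return [1 if sum(w * v for w, v in zip(p, x)) >= 0 else 0 for p in planes]


def bit_agreement(a, b):
    """Fraction of positions where two bit lists agree."""
    return sum(i == j for i, j in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dim, n_bits = 8, 4096
planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

# Two hypothetical vectors separated by an angle of pi / 4.
x = [1.0, 0.0] + [0.0] * (dim - 2)
y = [1.0, 1.0] + [0.0] * (dim - 2)

estimate = bit_agreement(hash_bits(x, planes), hash_bits(y, planes))
expected = 1.0 - (math.pi / 4) / math.pi  # 1 - theta / pi = 0.75
print(round(estimate, 3), expected)
```

More bits tighten the estimate at the cost of storage, which is the same trade-off the abstract describes when it derives requirements on the number of hash bits.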

Journal Article•DOI•

Learning to hash: forgiving hash functions and applications

Shumeet Baluja¹, Michele Covell¹•Institutions (1)

Google¹

01 Dec 2008-Data Mining and Knowledge Discovery

TL;DR: This paper describes a method to learn a similarity function from only weakly labeled positive examples. The learned function serves as the basis of a hash function that severely constrains the number of points considered for each lookup in a large corpus of high-dimensional data points.

Abstract: The problem of efficiently finding similar items in a large corpus of high-dimensional data points arises in many real-world tasks, such as music, image, and video retrieval. Beyond the scaling difficulties that arise with lookups in large data sets, the complexity in these domains is exacerbated by an imprecise definition of similarity. In this paper, we describe a method to learn a similarity function from only weakly labeled positive examples. Once learned, this similarity function is used as the basis of a hash function to severely constrain the number of points considered for each lookup. Tested on a large real-world audio dataset, only a tiny fraction of the points (~0.27%) are ever considered for each lookup. To increase efficiency, no comparisons in the original high-dimensional space of points are required. The performance far surpasses, in terms of both efficiency and accuracy, a state-of-the-art Locality-Sensitive-Hashing-based (LSH) technique for the same problem and data set.

57 citations

Network Information

Performance

Metrics

Papers: 1,120

Citations: 57,460

No. of papers in the topic in previous years
Year	Papers
2023	33
2022	89
2021	11
2020	16
2019	16
2018	38
