Topic

Feature hashing

About: Feature hashing (the "hashing trick") is a technique that maps features to indices in a fixed-size vector via a hash function, avoiding an explicit feature dictionary. Over the lifetime of the topic, 993 publications have been published, receiving 51,462 citations.


Papers
Proceedings ArticleDOI
01 Apr 2017
TL;DR: Experiments on two large public image datasets show that the proposed hashing method, referred to as Partial Random Spherical Hashing (PRSH), outperforms state-of-the-art methods in both learning efficiency and retrieval accuracy.
Abstract: Hashing-based search approaches have been widely employed in large-scale image retrieval. However, most hashing schemes are based on hyperplane projections, which may not effectively capture spatially coherent data structure. More importantly, existing approaches often trade off learning efficiency against retrieval accuracy and can thus barely satisfy real-time requirements. In this paper, we propose a novel hashing method, referred to as Partial Random Spherical Hashing (PRSH), for large-scale image retrieval. First, images are encoded into a lower-dimensional Hamming space via randomly generated hyperspheres. Then, a fast learning scheme adjusts the codes, giving them universal approximation capability with respect to the original image features. The interplay between randomness and learned parameters yields a scheme for constructing hash functions that is both efficient and effective. Experiments on two large public image datasets show that our PRSH method outperforms state-of-the-art methods in both learning efficiency and retrieval accuracy.

1 citation
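The abstract above only sketches PRSH's pipeline; as a rough illustration of its first step, the snippet below encodes a feature vector against randomly generated hyperspheres, setting bit k when the point falls inside sphere k. The center/radius heuristic and function names are illustrative assumptions, and the subsequent learned adjustment of the codes is not shown.

```python
import numpy as np

def random_hyperspheres(data, n_bits, rng=None):
    """Sample hypersphere centers from the data and pick each radius as the
    median distance from that center to the reference points (a common
    balancing heuristic; the paper's actual construction may differ)."""
    rng = np.random.default_rng(rng)
    centers = data[rng.choice(len(data), size=n_bits, replace=False)]
    dists = np.linalg.norm(data[None, :, :] - centers[:, None, :], axis=2)
    radii = np.median(dists, axis=1)
    return centers, radii

def encode_spherical(x, centers, radii):
    """Binary code: bit k is 1 iff x falls inside hypersphere k."""
    d = np.linalg.norm(centers - x, axis=1)
    return (d <= radii).astype(np.uint8)

# toy usage with random "image descriptors"
features = np.random.rand(1000, 128)
centers, radii = random_hyperspheres(features, n_bits=32, rng=0)
code = encode_spherical(features[0], centers, radii)
print(code)  # a 32-bit binary code
```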

Posted Content
TL;DR: This work proposes a much simpler approach to binary hashing that is not only faster and trivially parallelizable but also improves over the more complex, coupled objective functions, achieving state-of-the-art precision and recall in image retrieval experiments.
Abstract: Binary hashing is a well-known approach for fast approximate nearest-neighbor search in information retrieval. Much work has focused on affinity-based objective functions involving the hash functions or binary codes. These objective functions encode neighborhood information between data points and are often inspired by manifold learning algorithms. They ensure that the hash functions differ from each other through constraints or penalty terms that encourage codes to be orthogonal or dissimilar across bits, but this couples the binary variables and complicates the already difficult optimization. We propose a much simpler approach: we train each hash function (or bit) independently from each other, but introduce diversity among them using techniques from classifier ensembles. Surprisingly, we find that not only is this faster and trivially parallelizable, but it also improves over the more complex, coupled objective function, and achieves state-of-the-art precision and recall in experiments with image retrieval.

1 citation
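As a rough sketch of the "train each bit independently, diversify with ensemble techniques" idea above: each bit below is fit on its own bootstrap sample and random feature subspace (standard classifier-ensemble tricks), using a thresholded top principal direction as a stand-in for the paper's affinity-based per-bit objective. The names and the per-bit objective are illustrative assumptions; because the bits never interact, the loop is trivially parallelizable, which is the point the abstract makes.

```python
import numpy as np

def train_independent_bits(X, n_bits, sample_frac=0.5, feat_frac=0.5, rng=None):
    """Train each hash bit independently, diversifying the bits with bagging
    (bootstrap samples) and random feature subspaces. Each bit is the top PCA
    direction of its own sub-sample, thresholded at the median projection."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    bits = []
    for _ in range(n_bits):
        rows = rng.choice(n, size=int(sample_frac * n), replace=True)        # bagging
        cols = rng.choice(d, size=max(1, int(feat_frac * d)), replace=False)  # random subspace
        sub = X[np.ix_(rows, cols)]
        sub = sub - sub.mean(axis=0)
        _, _, vt = np.linalg.svd(sub, full_matrices=False)
        w = vt[0]                                  # top principal direction
        thr = np.median(X[:, cols] @ w)            # split the data roughly in half
        bits.append((cols, w, thr))
    return bits

def encode(X, bits):
    """Stack the independently trained bits into binary codes."""
    return np.stack([(X[:, cols] @ w > thr).astype(np.uint8)
                     for cols, w, thr in bits], axis=1)

X = np.random.rand(500, 64)
codes = encode(X, train_independent_bits(X, n_bits=16, rng=0))
```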

Book ChapterDOI
05 Sep 2014
TL;DR: This work presents an advanced spam detection technique (ASDT) based on extremum characteristic theory, the Rabin fingerprint algorithm, a modified Bayesian method and optimization theory, and demonstrates that ASDT achieves the best accuracy, speed and robustness in spam filtering.
Abstract: Email spam continues to grow drastically, and many spam detection algorithms have been developed in response. However, most of these algorithms share several shortcomings. To address them, we present an advanced spam detection technique (ASDT) based on extremum characteristic theory, the Rabin fingerprint algorithm, a modified Bayesian method and optimization theory. We then design several experiments to evaluate ASDT's accuracy, speed and robustness by comparing it with SFSPH, SFSPH-S, the well-known DSC algorithm and the Email Remove-duplicate Algorithm Based on SHA-1 (ERABS). Our extensive experiments demonstrate that ASDT achieves the best accuracy, speed and robustness for spam filtering.

1 citation
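The abstract names Rabin fingerprinting as one building block of ASDT. The sketch below shows fingerprint-based near-duplicate scoring using a Rabin-Karp-style rolling hash over character windows; the window size, base, modulus and Jaccard scoring are illustrative choices, not the ASDT pipeline itself.

```python
# Rolling fingerprints over fixed-size character windows (Rabin-Karp style),
# used here only to illustrate duplicate-content fingerprinting.
BASE, MOD = 257, (1 << 61) - 1

def window_fingerprints(text, w=32):
    """Return the set of rolling fingerprints of all length-w byte windows."""
    data = text.encode("utf-8")
    if len(data) < w:
        return set()
    power = pow(BASE, w, MOD)   # BASE**w, to drop the byte leaving the window
    h, fps = 0, set()
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD
        if i >= w:
            h = (h - data[i - w] * power) % MOD
        if i >= w - 1:
            fps.add(h)
    return fps

def resemblance(a, b, w=32):
    """Jaccard overlap of window fingerprints: a rough near-duplicate score."""
    fa, fb = window_fingerprints(a, w), window_fingerprints(b, w)
    return len(fa & fb) / max(1, len(fa | fb))
```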

Journal ArticleDOI
Braddock Gaskill
TL;DR: This article introduces feature bit vectors based on the hashing trick for improving relevance in personalized search and other personalization applications, and shows that using a single bit per dimension instead of floating-point values yields an order-of-magnitude decrease in data structure size while preserving or even improving quality.
Abstract: Many real-world problems require fast and efficient lexical comparison of large numbers of short text strings. Search personalization is one such domain. We introduce the use of feature bit vectors based on the hashing trick for improving relevance in personalized search and other personalization applications, and present results for several lexical hashing and comparison methods. These methods are applied to a user's historical behavior and used to predict future behavior. Using a single bit per dimension instead of floating-point values yields an order-of-magnitude decrease in data structure size while preserving or even improving quality. We use real data to simulate a search personalization task. A simple method for combining bit vectors demonstrates an order-of-magnitude improvement in compute time on the task with only a small decrease in accuracy.

1 citation
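As a minimal sketch of the feature-bit-vector idea described above: tokens are hashed into a fixed-length bit vector (one bit per dimension), a user's historical vectors are combined, and vectors are compared with a bit-level Jaccard score. The vector length, hash function and OR-combination are assumptions for illustration, not the paper's exact settings.

```python
import hashlib

def hashed_bit_vector(tokens, n_bits=1024):
    """Hashing trick with one bit per dimension: each token is hashed to an
    index in a fixed-length bit vector and that bit is set."""
    vec = 0
    for tok in tokens:
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % n_bits
        vec |= 1 << idx
    return vec

def combine(vectors):
    """OR together several bit vectors (e.g. a user's historical queries)
    into a single profile vector -- one simple way to aggregate history."""
    out = 0
    for v in vectors:
        out |= v
    return out

def jaccard(a, b):
    """Bit-level Jaccard similarity between two bit vectors."""
    inter = bin(a & b).count("1")
    union = bin(a | b).count("1")
    return inter / union if union else 0.0

history = combine([hashed_bit_vector(q.split()) for q in
                   ["cheap flights paris", "paris hotels", "flight deals"]])
print(jaccard(history, hashed_bit_vector("paris weekend flights".split())))
```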

Book ChapterDOI
04 Jan 2017
TL;DR: Experimental results show that, within the set of near neighbors returned by the proposed FCMR, the proportion of relevant documents is greatly increased, indicating that retrieval based on near neighbors can be conducted effectively.
Abstract: Cross-media retrieval is an essential approach for handling the explosive growth of multimodal data on the web. However, existing approaches to cross-media retrieval are computationally expensive due to the curse of dimensionality. To retrieve efficiently from multimodal data, it is essential to reduce the proportion of irrelevant documents considered. In this paper, we propose a cross-media retrieval approach (FCMR) based on locality-sensitive hashing (LSH) and neural networks. Multimodal data are projected by the LSH algorithm so that similar objects fall into the same hash bucket and dissimilar objects into different ones, using hash functions learned with neural networks. Given a textual or visual query, it is efficiently mapped to a hash bucket whose stored objects are likely near neighbors of the query. Experimental results show that, within the set of near neighbors returned by the proposed method, the proportion of relevant documents is greatly increased, indicating that retrieval based on near neighbors can be conducted effectively. Further evaluations on two public datasets demonstrate the effectiveness of the proposed retrieval method compared to the baselines.

1 citation
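FCMR learns its hash functions with neural networks; as a stand-in, the sketch below uses classic random-hyperplane LSH to show the bucketing and candidate-lookup step the abstract describes. The class name, code length and the assumption that text and image embeddings already share a feature space are illustrative.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Random-hyperplane LSH: each item is signed against k random
    hyperplanes and the resulting k-bit key selects a bucket. Retrieval
    then only considers the items stored in the query's bucket."""
    def __init__(self, dim, n_bits=12, rng=None):
        rng = np.random.default_rng(rng)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)

    def _key(self, x):
        # k-bit signature of x, used as the bucket key
        return (self.planes @ x > 0).astype(np.uint8).tobytes()

    def index(self, items):
        for i, x in enumerate(items):
            self.buckets[self._key(x)].append(i)

    def candidates(self, query):
        """IDs in the query's bucket: the near-neighbor candidates to re-rank."""
        return self.buckets.get(self._key(query), [])

emb = np.random.rand(10000, 256)          # toy multimodal embeddings
lsh = HyperplaneLSH(dim=256, n_bits=12, rng=0)
lsh.index(emb)
print(lsh.candidates(emb[42]))            # contains index 42 plus its collisions
```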


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Convolutional neural network: 74.7K papers, 2M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Support vector machine: 73.6K papers, 1.7M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    33
2022    89
2021    11
2020    16
2019    16
2018    38