SciSpace
Topic

Feature hashing

About: Feature hashing (the "hashing trick") vectorizes features by applying a hash function to each feature and using the hash value as a direct index into a fixed-size vector, avoiding the need to store an explicit vocabulary. Over the lifetime, 993 publications have been published within this topic, receiving 51,462 citations.
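The core idea behind the topic can be sketched in a few lines. The snippet below is a generic illustration of the signed hashing trick, not taken from any particular paper listed here; the function name and parameters are our own.

```python
import hashlib

# Minimal sketch of the hashing trick: token features are mapped into a
# fixed-size vector by hashing, so no vocabulary has to be stored. The
# signed variant shown here makes collisions cancel in expectation.
# All names are illustrative.

def hash_features(tokens, dim=16):
    vec = [0.0] * dim
    for tok in tokens:
        # md5 is used only as a cheap deterministic hash, not for security
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        idx = h % dim                                  # bucket the feature lands in
        sign = 1.0 if (h >> 100) & 1 == 0 else -1.0    # roughly independent sign bit
        vec[idx] += sign
    return vec

v = hash_features(["the", "cat", "sat", "the"])
```

Because the vector size is fixed in advance, memory cost is independent of how many distinct features appear in the data; the price is occasional collisions.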


Papers
Patent
Yadong Mu, Zhu Liu
05 Jun 2015
TL;DR: This patent describes a method for training a neural network by iteratively adjusting its parameters based on the concurrent application of multiple loss functions, a classification loss and a hashing loss, to a subset of training images.
Abstract: A method includes receiving, at a neural network, a subset of images of a plurality of images of a training image set. The method includes training the neural network by iteratively adjusting parameters of the neural network based on concurrent application of multiple loss functions to the subset of images. The multiple loss functions include a classification loss function and a hashing loss function. The classification loss function is associated with an image classification function that extracts image features from an image. The hashing loss function is associated with a hashing function that generates a hash code for the image.

24 citations
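A hedged sketch of the idea in the patent's abstract: two losses applied concurrently to the same network outputs. The specific loss forms (softmax cross-entropy, a quantization penalty pushing codes toward ±1) and the weighting `lam` are our assumptions; the abstract does not specify them.

```python
import math

# Illustrative combination of a classification loss and a hashing loss,
# as the patent's abstract describes. All function names are ours.

def cls_loss(logits, label):
    """Softmax cross-entropy for one example."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[label] / sum(exps))

def hash_loss(codes):
    """Quantization penalty pushing real-valued codes toward +/-1."""
    return sum((abs(c) - 1.0) ** 2 for c in codes) / len(codes)

def combined_loss(logits, label, codes, lam=0.5):
    # Both losses are applied concurrently; lam balances them (assumed).
    return cls_loss(logits, label) + lam * hash_loss(codes)

total = combined_loss([2.0, 0.5, -1.0], 0, [0.9, -1.1, 0.2])
```

During training, gradients of this combined objective would update the shared network parameters, so the learned features serve both classification and hashing.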

Proceedings ArticleDOI
TL;DR: In this article, the authors propose an end-to-end approach to reduce the size of the embedding tables by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition.
Abstract: Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens of millions of different possible categories, the embedding tables form the primary memory bottleneck during both training and inference. We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition. By storing multiple smaller embedding tables based on each complementary partition and combining embeddings from each table, we define a unique embedding for each category at smaller memory cost. This approach may be interpreted as using a specific fixed codebook to ensure uniqueness of each category's representation. Our experimental results demonstrate the effectiveness of our approach over the hashing trick for reducing the size of the embedding tables in terms of model loss and accuracy, while retaining a similar reduction in the number of parameters.

24 citations
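The complementary-partition idea in the abstract above can be illustrated with its simplest instance, a quotient-remainder decomposition: two small tables jointly give each of N categories a distinct embedding. The table sizes, the elementwise-product combiner, and all names are illustrative assumptions, not the paper's exact configuration.

```python
import random

random.seed(0)
N, m, d = 1000, 32, 4    # categories, remainder-table size, embedding dim

# Two small tables replace one N-row table: ceil(N/m) + m rows instead of N.
q_table = [[random.random() for _ in range(d)] for _ in range((N + m - 1) // m)]
r_table = [[random.random() for _ in range(d)] for _ in range(m)]

def embed(cat):
    """Combine the quotient row and remainder row (elementwise product here;
    concatenation or addition are other possible combiners)."""
    q, r = cat // m, cat % m
    return [a * b for a, b in zip(q_table[q], r_table[r])]
```

Every category maps to a distinct (quotient, remainder) pair, so each gets a unique embedding while the parameter count drops from N·d to roughly (N/m + m)·d.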

01 Jan 2004
TL;DR: The finding is that ELFhash, a well-known function for hashing sequences of symbols, performs poorly on these metrics, and the other two functions are better and thus recommended.
Abstract: Hashing large collections of URLs is an inevitable problem in many Web research activities. Through a large-scale experiment, three hash functions are compared in this paper. Two metrics were developed for the comparison, related to Web structure analysis and Web crawling, respectively. The finding is that ELFhash, a well-known function for hashing sequences of symbols, performs poorly on these metrics, while the other two functions fare better and are therefore recommended.

24 citations
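Of the three hash functions compared, only ELFhash is named in the abstract; the other two are not identified, so only ELFhash is sketched here. Below is a Python transcription of the classic 32-bit ELF object-file hash:

```python
def elf_hash(s: str) -> int:
    """The classic ELF hash for symbol strings (32-bit arithmetic)."""
    h = 0
    for ch in s.encode():
        h = ((h << 4) + ch) & 0xFFFFFFFF
        high = h & 0xF0000000
        if high:
            h ^= high >> 24
        h &= (~high) & 0xFFFFFFFF   # clear the top nibble each round
    return h
```

Note that the top nibble is cleared on every iteration, so ELFhash only ever produces 28-bit values, one structural reason it can distribute long, similar strings such as URLs unevenly.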

Proceedings Article
19 Jun 2011
TL;DR: This paper proposes a novel (semi-)supervised hashing method named Semi-Supervised SimHash (S3H) for high-dimensional data similarity search that learns the optimal feature weights from prior knowledge to relocate the data such that similar data have similar hash codes.
Abstract: Searching for documents similar to a query document is an important component of modern information retrieval. Some existing hashing methods can be used for efficient document similarity search. However, unsupervised hashing methods cannot incorporate prior knowledge for better hashing, and although some supervised hashing methods can derive effective hash functions from prior knowledge, they are either computationally expensive or poorly discriminative. This paper proposes a novel (semi-)supervised hashing method named Semi-Supervised SimHash (S3H) for high-dimensional data similarity search. The basic idea of S3H is to learn optimal feature weights from prior knowledge to relocate the data such that similar data have similar hash codes. We evaluate our method against several state-of-the-art methods on two large datasets. All results show that our method achieves the best performance.

24 citations
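S3H builds on SimHash, whose unsupervised core can be sketched briefly: each bit is the sign of the feature vector's projection onto a random hyperplane. The learned feature weights that distinguish S3H are omitted here; the `weights` argument merely marks where they would enter. All names are illustrative.

```python
import random

random.seed(0)

def simhash(features, weights, planes):
    """features: {feature_index: value}; weights: learned per-feature weights
    (empty dict = unweighted SimHash); planes: random hyperplane normals."""
    bits = []
    for plane in planes:
        proj = sum(val * weights.get(i, 1.0) * plane[i]
                   for i, val in features.items())
        bits.append(1 if proj >= 0 else 0)
    return bits

dim, nbits = 8, 4
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(nbits)]
doc = {0: 1.0, 3: 2.0, 5: 0.5}   # sparse document vector (illustrative)
code = simhash(doc, {}, planes)
```

Similar documents tend to fall on the same side of most hyperplanes, so their codes differ in few bits; weighting the features, as S3H does, reshapes which documents count as "similar".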

Proceedings ArticleDOI
28 Feb 2017
TL;DR: Experimental results on three benchmark datasets show that the binary hash codes generated by the proposed method achieve superior performance over other state-of-the-art hashing methods.
Abstract: With the increasing amount of image data, existing image retrieval methods suffer from several drawbacks, such as weakly expressive visual features, high feature dimensionality, and low retrieval precision. To solve these problems, a method for learning binary hash codes based on deep convolutional neural networks is proposed. The basic idea is to add a hash layer to the deep learning framework and simultaneously learn image features and hash functions, where the hash functions should satisfy independence and minimize quantization error. First, a convolutional neural network is employed to learn the intrinsic implications of the training images so as to improve the discriminative and expressive ability of the visual features. Second, the visual features are fed into the hash layer, in which the hash functions are learned; the learned hash functions should minimize classification and quantization error while satisfying the independence constraint. Finally, given an input image, hash codes are generated from the output of the hash layer, and large-scale image retrieval can be accomplished in a low-dimensional Hamming space. Experimental results on three benchmark datasets show that the binary hash codes generated by the proposed method achieve superior performance over other state-of-the-art hashing methods.

24 citations
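The final retrieval step the abstract describes, thresholding the hash layer's output and comparing codes in Hamming space, can be sketched as follows. The sigmoid activation and 0.5 threshold are assumptions; the paper's exact hash layer may differ.

```python
import math

# Binarizing a hash layer's real-valued output for Hamming-space retrieval.
# All function names and the example activations are illustrative.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binarize(activations, threshold=0.5):
    """Squash activations into (0, 1), then threshold to binary bits."""
    return [1 if sigmoid(a) >= threshold else 0 for a in activations]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

code_q = binarize([2.1, -0.7, 0.3, -1.9])   # query image's hash-layer output
code_db = binarize([1.8, -0.2, 0.4, 1.1])   # a database image's output
print(hamming(code_q, code_db))             # prints 1
```

Ranking database images by Hamming distance to the query code is what makes large-scale retrieval cheap: the codes are short and the distance is a popcount.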


Network Information

Related Topics (5)
- Feature extraction: 111.8K papers, 2.1M citations (84% related)
- Convolutional neural network: 74.7K papers, 2M citations (84% related)
- Feature (computer vision): 128.2K papers, 1.7M citations (84% related)
- Deep learning: 79.8K papers, 2.1M citations (83% related)
- Support vector machine: 73.6K papers, 1.7M citations (83% related)
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    33
2022    89
2021    11
2020    16
2019    16
2018    38