Topic

Feature hashing

About: Feature hashing is a research topic. Over its lifetime, 993 publications have been published within this topic, receiving 51,462 citations.


Papers
Posted Content
TL;DR: This paper proposes a supervised hashing method based on a well-designed deep convolutional neural network, which tries to learn hashing codes and compact representations of data simultaneously.
Abstract: Hashing-based methods seek compact and efficient binary codes that preserve the neighborhood structure in the original data space. For most existing hashing methods, an image is first encoded as a vector of hand-crafted visual features, followed by a hash projection and quantization step to get the compact binary vector. Most hand-crafted features encode only low-level information about the input, so they may not preserve the semantic similarities of image pairs. Meanwhile, the hashing function learning process is independent of the feature representation, so the features may not be optimal for the hashing projection. In this paper, we propose a supervised hashing method based on a well-designed deep convolutional neural network, which tries to learn hashing codes and compact representations of data simultaneously. The proposed model learns the binary codes by adding a compact sigmoid layer before the loss layer. Experiments on several image data sets show that the proposed model outperforms other state-of-the-art methods.

6 citations
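
The "hash projection and quantization step" that the abstract above attributes to conventional pipelines can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's method: the projection is a random Gaussian matrix (in the style of random-hyperplane hashing) rather than a learned one, and the names make_projection, hash_code and hamming are hypothetical.

```python
import random


def make_projection(n_bits, dim, seed=0):
    """Build a hypothetical n_bits x dim random Gaussian projection matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(features, projection):
    """Project a feature vector, then quantize each output to one bit by sign."""
    code = []
    for row in projection:
        activation = sum(w * x for w, x in zip(row, features))
        code.append(1 if activation >= 0.0 else 0)
    return code


def hamming(a, b):
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Usage: encode a 4-dimensional feature vector as a 16-bit code.
projection = make_projection(n_bits=16, dim=4, seed=1)
code = hash_code([0.2, -1.0, 0.5, 3.0], projection)
```

Retrieval then compares codes by Hamming distance instead of comparing the original feature vectors. A learned approach such as the one in the abstract would replace the random matrix with parameters trained jointly with the features.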

Proceedings Article
08 Sep 2015
TL;DR: TF-IDF-CF is chosen as the feature selection method, and an accuracy of 98.2612 with an F-measure of 0.9841 is obtained, which demonstrates the effectiveness of the proposed scheme.
Abstract: Selecting relevant features to reduce dimensionality has become a pivotal aspect of efficient machine-learning-based email spam filtering. To deal with noisy features, TF-IDF-CF is chosen as the feature selection method in this study. The selected relevant feature sets are submitted to LibSVM and MNB classifiers to construct ham and spam models. An accuracy of 98.2612 with an F-measure of 0.9841 is obtained, which demonstrates the effectiveness of the proposed scheme.

6 citations

09 Jan 2014
TL;DR: This study focuses on the second group of hashing algorithms and criticizes the hashing algorithms using Feistel Networks, which are widely utilized in text mining studies. It proposes a new approach that is mainly built on substitution boxes (sboxes) and processes text faster than the other implementations.
Abstract: This study focuses on the second group of hashing algorithms and criticizes the hashing algorithms using Feistel Networks, which are widely utilized in text mining studies. We propose a new approach that is mainly built on substitution boxes (sboxes), which are at the core of all Feistel Networks, and that processes text faster than the other implementations.

6 citations

Proceedings Article
14 Jul 2014
TL;DR: This paper proposes a cross-media hashing approach based on kernel regression (abbreviated as KRCMH) to obtain the hash codes for data objects across different modalities, and achieves superior cross-media retrieval performance compared with state-of-the-art methods.
Abstract: Cross-media retrieval is a challenging problem in the multimedia retrieval area. In the real world, many applications involve multi-modal data, e.g., web pages containing both images and texts. The core of cross-media retrieval is how to utilize the intrinsic intra-modality and inter-modality similarity to learn the appropriate relationships between data objects and provide efficient search across different modalities. Because hashing methods address the fast retrieval problem well in large-scale data settings, a cross-media hashing approach that can perform efficient retrieval over heterogeneous high-dimensional feature spaces is highly desirable. In this paper, we propose a cross-media hashing approach based on kernel regression (abbreviated as KRCMH) to obtain the hash codes for data objects across different modalities. Experiments on two real-world data sets show that KRCMH achieves superior cross-media retrieval performance compared with state-of-the-art methods.

6 citations

Book Chapter
18 Sep 2005
TL;DR: This work investigates the class of hash functions based on checksums to encode the type signatures of MPI datatypes and finds that hash functions based on Galois Fields enable good hashing, computation of the signature of a unidatatype in $\mathcal{O}(1)$, and computation of the concatenation of two datatypes in $\mathcal{O}(1)$.
Abstract: Detecting misuse of datatypes in application code is a desirable feature for an MPI library. To support this goal, we investigate the class of hash functions based on checksums to encode the type signatures of MPI datatypes. The quality of these hash functions is assessed in terms of hashing and timing, and by comparison with other functions published for this particular problem (Gropp, 7th European PVM/MPI Users’ Group Meeting, 2000) or for other applications (CRCs). In particular, hash functions based on Galois Fields enable good hashing, computation of the signature of a unidatatype in $\mathcal{O}(1)$, and computation of the concatenation of two datatypes in $\mathcal{O}(1)$.

6 citations
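
To make the idea of a type-signature hash with constant-time concatenation concrete, here is a rough sketch. It is not the construction from the chapter above: it assumes a polynomial hash over the prime field GF(2^61 - 1), with an arbitrarily chosen base, and stores base^length alongside the hash value so that two signatures can be combined without rescanning either one. The names Signature, of_type, concat and of_sequence, and the integer type codes, are hypothetical.

```python
from dataclasses import dataclass

M = (1 << 61) - 1   # assumed modulus: a Mersenne prime, so arithmetic is in GF(M)
P = 1_000_003       # assumed base, chosen arbitrarily for illustration


@dataclass(frozen=True)
class Signature:
    value: int  # polynomial hash of the sequence of type codes, mod M
    shift: int  # P ** (sequence length) mod M, kept so concat needs no loop


EMPTY = Signature(0, 1)


def of_type(code):
    """Signature of a single basic type, identified by an integer code."""
    return Signature(code % M, P)


def concat(a, b):
    """Signature of sequence a followed by sequence b, in O(1) operations."""
    return Signature((a.value * b.shift + b.value) % M, (a.shift * b.shift) % M)


def of_sequence(codes):
    """Reference computation: fold concat over the type codes one at a time."""
    sig = EMPTY
    for code in codes:
        sig = concat(sig, of_type(code))
    return sig
```

Under these assumptions, concat(of_sequence([1, 2]), of_sequence([3, 4])) equals of_sequence([1, 2, 3, 4]), while reordering the types changes the signature, which is the property a datatype-mismatch check relies on.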


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
84% related
Convolutional neural network
74.7K papers, 2M citations
84% related
Feature (computer vision)
128.2K papers, 1.7M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Support vector machine
73.6K papers, 1.7M citations
83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    33
2022    89
2021    11
2020    16
2019    16
2018    38