
Showing papers on "Feature hashing published in 2018"


Journal ArticleDOI
TL;DR: Compared with state-of-the-art approaches, SSDH achieves higher retrieval accuracy without sacrificing classification performance, and it outperforms other hashing approaches on several benchmarks and large datasets.
Abstract: This paper presents a simple yet effective supervised deep hash approach that constructs binary hash codes from labeled data for large-scale image search. We assume that the semantic labels are governed by several latent attributes, with each attribute on or off, and that classification relies on these attributes. Based on this assumption, our approach, dubbed supervised semantics-preserving deep hashing (SSDH), constructs hash functions as a latent layer in a deep network, and the binary codes are learned by minimizing an objective function defined over classification error and other desirable hash code properties. With this design, SSDH has the nice characteristic that classification and retrieval are unified in a single learning model. Moreover, SSDH performs joint learning of image representations, hash codes, and classification in a point-wise manner, and thus is scalable to large-scale datasets. SSDH is simple and can be realized by a slight enhancement of an existing deep architecture for classification; yet it is effective and outperforms other hashing approaches on several benchmarks and large datasets. Compared with state-of-the-art approaches, SSDH achieves higher retrieval accuracy, while the classification performance is not sacrificed.
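A minimal sketch of the latent-hash-layer idea described above, assuming pre-extracted CNN features, a 48-bit code, and a simple binarization penalty; it is an illustration of the design, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentHashNet(nn.Module):
    """Classifier with a latent code layer, in the spirit of SSDH's latent hash layer."""
    def __init__(self, feat_dim=2048, n_bits=48, n_classes=10):
        super().__init__()
        self.hash_layer = nn.Linear(feat_dim, n_bits)    # latent layer producing relaxed codes
        self.classifier = nn.Linear(n_bits, n_classes)   # classification head on top of the codes

    def forward(self, feats):
        h = torch.sigmoid(self.hash_layer(feats))        # relaxed codes in (0, 1)
        return self.classifier(h), h

model = LatentHashNet()
feats = torch.randn(32, 2048)                 # stand-in for CNN features of a mini-batch
labels = torch.randint(0, 10, (32,))
logits, h = model(feats)
# Classification loss plus a simple penalty that pushes activations toward 0/1
# (a stand-in for the extra code-quality terms described in the paper).
loss = F.cross_entropy(logits, labels) - 0.1 * ((h - 0.5) ** 2).mean()
loss.backward()
codes = (h.detach() > 0.5).to(torch.int8)     # binarize the latent activations as hash codes
```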

273 citations


Journal ArticleDOI
TL;DR: Extensive experiments show that the proposed remote sensing image retrieval approach based on DHNNs can remarkably outperform state-of-the-art methods under both of the examined conditions.
Abstract: As one of the most challenging tasks of remote sensing big data mining, large-scale remote sensing image retrieval has attracted increasing attention from researchers. Existing large-scale remote sensing image retrieval approaches are generally implemented by using hashing learning methods, which take handcrafted features as inputs and map the high-dimensional feature vector to a low-dimensional binary feature vector to reduce feature-searching complexity. To apply the merits of deep learning, this paper proposes a novel large-scale remote sensing image retrieval approach based on deep hashing neural networks (DHNNs). More specifically, DHNNs are composed of deep feature learning neural networks and hashing learning neural networks and can be optimized in an end-to-end manner. Rather than requiring dedicated expertise and effort for the design of feature descriptors, we can automatically learn good feature extraction operations and feature hashing mappings under the supervision of labeled samples. To broaden the application field, DHNNs are evaluated under two representative remote sensing cases: scarce and sufficient labeled samples. To make up for a lack of labeled samples, DHNNs can be trained via transfer learning for the former case. For the latter case, DHNNs can be trained via supervised learning from scratch with the aid of a vast number of labeled samples. Extensive experiments on one public remote sensing image data set with a limited number of labeled samples and on another public data set with plenty of labeled samples show that the proposed remote sensing image retrieval approach based on DHNNs can remarkably outperform state-of-the-art methods under both of the examined conditions.

232 citations


Journal ArticleDOI
TL;DR: This paper proposes a new learning-based hashing method called "fast supervised discrete hashing" (FSDH), based on "supervised discrete hashing" (SDH), which accelerates the algorithm by using a very simple yet effective regression of the class labels of training examples to the corresponding hash codes.
Abstract: Learning-based hashing algorithms are “hot topics” because they can greatly increase the scale at which existing methods operate. In this paper, we propose a new learning-based hashing method called “fast supervised discrete hashing” (FSDH) based on “supervised discrete hashing” (SDH). Regressing the training examples (or hash codes) to the corresponding class labels is widely used in ordinary least squares regression. Rather than adopting this method, FSDH uses a very simple yet effective regression of the class labels of training examples to the corresponding hash codes to accelerate the algorithm. To the best of our knowledge, this strategy has not previously been used for hashing. Traditional SDH decomposes the optimization into three sub-problems, with the most critical sub-problem, discrete optimization for binary hash codes, solved using iterative discrete cyclic coordinate descent (DCC), which is time-consuming. However, FSDH has a closed-form solution and only requires a single rather than iterative hash-code-solving step, which is highly efficient. Furthermore, FSDH is usually faster than SDH for solving the projection matrix for least squares regression, making FSDH generally faster than SDH. For example, our results show that FSDH is about 12 times faster than SDH when the number of hashing bits is 128 on the CIFAR-10 dataset, and FSDH is about 151 times faster than FastHash when the number of hashing bits is 64 on the MNIST dataset. Our experimental results show that FSDH is not only fast, but also outperforms other comparative methods.
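As a hedged illustration of the reversed regression direction only (not the full alternating optimization over codes and hash functions): SDH-style formulations regress hash codes onto class labels, while FSDH regresses class labels onto hash codes, and either direction has a standard ridge closed form. The sizes and regularizer below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_bits, n_classes, lam = 1000, 64, 10, 1.0
B = np.sign(rng.standard_normal((n, n_bits)))         # binary codes in {-1, +1}
Y = np.eye(n_classes)[rng.integers(0, n_classes, n)]  # one-hot class labels

# SDH-style direction: W maps codes -> labels, solved from an n_bits x n_bits system.
W = np.linalg.solve(B.T @ B + lam * np.eye(n_bits), B.T @ Y)

# FSDH-style direction: P maps labels -> codes, solved from an n_classes x n_classes system,
# which is much smaller whenever the number of classes is small.
P = np.linalg.solve(Y.T @ Y + lam * np.eye(n_classes), Y.T @ B)

print(W.shape, P.shape)  # (64, 10) (10, 64)
```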

223 citations


Journal ArticleDOI
TL;DR: Quantization-based Hashing (QBH) is a generic framework which incorporates the advantages of quantization error reduction methods into conventional property preserving hashing methods and can be applied to both unsupervised and supervised hashing methods.

179 citations


Journal ArticleDOI
TL;DR: A deep hashing method is proposed to extensively exploit both spatial details and semantic information, in which hierarchical convolutional features are leveraged to construct an image pyramid representation, and a new loss function is proposed that maintains the semantic similarity and balanceable property of hash codes.
Abstract: Hashing has been an important and effective technology in image retrieval due to its computational efficiency and fast search speed. Traditional hashing methods usually learn hash functions to obtain binary codes by exploiting hand-crafted features, which cannot optimally represent the information of the sample. Recently, deep learning methods can achieve better performance, since deep learning architectures can learn more effective image representation features. However, these methods only use semantic features to generate hash codes by shallow projection but ignore texture details. In this paper, we propose a novel hashing method, namely hierarchical recurrent neural hashing (HRNH), which exploits a hierarchical recurrent neural network to generate effective hash codes. There are three contributions of this paper. First, a deep hashing method is proposed to extensively exploit both spatial details and semantic information, in which we leverage hierarchical convolutional features to construct an image pyramid representation. Second, our proposed deep network can directly exploit convolutional feature maps as input, preserving their spatial structure. Finally, we propose a new loss function that considers the quantization error of binarizing the continuous embeddings into discrete binary codes, and simultaneously maintains the semantic similarity and balanceable property of hash codes. Experimental results on four widely used data sets demonstrate that the proposed HRNH can achieve superior performance over other state-of-the-art hashing methods.

118 citations


Journal ArticleDOI
TL;DR: This work proposes a novel supervised hashing approach, termed Robust Discrete Code Modeling (RDCM), which directly learns high-quality discrete binary codes and hash functions by effectively suppressing the influence of unreliable binary codes and potentially noisily labeled samples.

101 citations


Journal ArticleDOI
TL;DR: A novel hashing model, referred to as Robust and Flexible Discrete Hashing (RFDH), is proposed to efficiently learn robust discrete binary codes; the codes are directly learned based on discrete matrix decomposition so that the large quantization error caused by relaxation is avoided.
Abstract: Multimodal hashing approaches have gained great success in large-scale cross-modal similarity search applications, due to their appealing computation and storage efficiency. However, it is still challenging to design binary codes that represent the original features well in an unsupervised manner. We argue that there are some limitations that need to be further considered for unsupervised multimodal hashing: 1) most existing methods drop the discrete constraints to simplify the optimization, which causes large quantization error; 2) many methods are sensitive to outliers and noise since they use the $\ell_{2}$-norm in their objective functions, which can amplify the errors; and 3) the weight of each modality, which greatly influences the retrieval performance, is manually or empirically determined and may not fully fit the specific training set. The above limitations may significantly degrade the retrieval accuracy of unsupervised multimodal hashing methods. To address these problems, in this paper, a novel hashing model is proposed to efficiently learn robust discrete binary codes, which is referred to as Robust and Flexible Discrete Hashing (RFDH). In the proposed RFDH model, binary codes are directly learned based on discrete matrix decomposition, so that the large quantization error caused by relaxation is avoided. Moreover, the $\ell_{2,1}$-norm is used in the objective function to improve robustness, such that the learned model is not sensitive to data outliers and noise. In addition, the weight of each modality is adaptively adjusted according to the training data, so important modalities receive larger weights during the hash learning procedure. Owing to the above merits, RFDH can generate more effective hash codes. Besides, we introduce two kinds of hash function learning methods to project unseen instances into hash codes. Extensive experiments on several well-known large databases demonstrate the superior performance of the proposed hash model over most state-of-the-art unsupervised multimodal hashing methods.

94 citations


Journal ArticleDOI
TL;DR: Compared with SDH, which uses the traditional zero-one matrix, SDHR utilizes the learned regression target matrix and, therefore, more accurately measures the classification error of the regression model and is more flexible.
Abstract: Data-dependent hashing has recently attracted attention due to being able to support efficient retrieval and storage of high-dimensional data, such as documents, images, and videos. In this paper, we propose a novel learning-based hashing method called “supervised discrete hashing with relaxation” (SDHR) based on “supervised discrete hashing” (SDH). SDH uses ordinary least squares regression and traditional zero-one matrix encoding of class label information as the regression target (code words), thus fixing the regression target. In SDHR, the regression target is instead optimized. The optimized regression target matrix satisfies a large margin constraint for correct classification of each example. Compared with SDH, which uses the traditional zero-one matrix, SDHR utilizes the learned regression target matrix and, therefore, more accurately measures the classification error of the regression model and is more flexible. As expected, SDHR generally outperforms SDH. Experimental results on two large-scale image data sets (CIFAR-10 and MNIST) and a large-scale and challenging face data set (FRGC) demonstrate the effectiveness and efficiency of SDHR.

83 citations


Journal ArticleDOI
TL;DR: A novel adaptive similarity measure which is consistent with k-nearest neighbor search is presented, and it is proved that it leads to a valid kernel if the original similarity function is a kernel function.

76 citations


Journal ArticleDOI
TL;DR: This work proposes a novel supervised hashing method for scalable face image retrieval, i.e., Deep Hashing based on Classification and Quantization errors (DHCQ), by simultaneously learning feature representations of images, hash codes and classifiers.

76 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed robust image hashing scheme performs better with respect to robustness, anti-collision capability, and efficiency than several state-of-the-art schemes.

Journal ArticleDOI
TL;DR: The proposed method uses perceptual hashing to binarize low-level feature maps and combines several feature channels for feature encoding; three regional statistics are then computed for hierarchical feature description.

Proceedings ArticleDOI
27 May 2018
TL;DR: The Weight-Median Sketch, as described in this paper, adopts the core data structure used in the Count-Sketch, but instead of sketching counts, it captures sketched gradient updates to the model parameters.
Abstract: We introduce a new sub-linear space sketch---the Weight-Median Sketch---for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. Unlike related sketches that capture the most frequently-occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis that establishes recovery guarantees for batch and online learning, and demonstrate empirical improvements in memory-accuracy trade-offs over alternative memory-budgeted methods, including count-based sketches and feature hashing.
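A hedged sketch of the central idea, a Count-Sketch whose cells accumulate signed gradient updates to a linear model instead of item counts, with per-feature weights recovered as the median of the signed cells. The depth, width, salted-hash scheme, and feature names are illustrative stand-ins, not the paper's exact construction.

```python
import numpy as np

class SketchedWeights:
    """Count-Sketch table that stores sketched model weights rather than item counts."""
    def __init__(self, depth=5, width=1 << 14):
        self.depth, self.width = depth, width
        self.table = np.zeros((depth, width))

    def _bucket_sign(self, row, key):
        h = hash((row, key))                  # stand-in for independent hash functions
        return (h >> 1) % self.width, (1.0 if h & 1 else -1.0)

    def update(self, key, delta):
        for r in range(self.depth):
            b, s = self._bucket_sign(r, key)
            self.table[r, b] += s * delta

    def query(self, key):
        ests = []
        for r in range(self.depth):
            b, s = self._bucket_sign(r, key)
            ests.append(s * self.table[r, b])
        return float(np.median(ests))

# One sketched SGD step of logistic regression on a sparse example.
w, lr = SketchedWeights(), 0.1
x = {"token=free": 1.0, "token=offer": 1.0}          # hypothetical sparse features
y = 1.0
margin = sum(w.query(f) * v for f, v in x.items())
p = 1.0 / (1.0 + np.exp(-margin))
for f, v in x.items():
    w.update(f, -lr * (p - y) * v)                   # sketch the gradient update
print(w.query("token=free"))                         # recover an approximate weight
```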

Journal ArticleDOI
TL;DR: Reversed spectral hashing (ReSH) is introduced, which is spectral hashing with its input and output interchanged and can seamlessly overcome the drawback of SH.
Abstract: Hashing is emerging as a powerful tool for building highly efficient indices in large-scale search systems. In this paper, we study spectral hashing (SH), which is a classical method of unsupervised hashing. In general, SH solves for the hash codes by minimizing an objective function that tries to preserve the similarity structure of the given data. Although computationally simple, very often SH performs unsatisfactorily and lags distinctly behind the state-of-the-art methods. We observe that the inferior performance of SH is mainly due to its imperfect formulation; that is, the optimization of the minimization problem in SH actually cannot ensure that the similarity structure of the high-dimensional data is really preserved in the low-dimensional hash code space. In this paper, we therefore introduce reversed SH (ReSH), which is SH with its input and output interchanged. Unlike SH, which estimates the similarity structure from the given high-dimensional data, our ReSH defines the similarities between data points according to the unknown low-dimensional hash codes. Equipped with such a reversal mechanism, ReSH can seamlessly overcome the drawback of SH. More precisely, the minimization problem in our ReSH can be optimized if and only if similar data points are mapped to adjacent hash codes and, most importantly, dissimilar data points are considerably separated from each other in the code space. Finally, we solve the minimization problem in ReSH with multilayer neural networks and obtain state-of-the-art retrieval results on three benchmark data sets.

Journal ArticleDOI
TL;DR: A novel discrete supervised hash learning framework is proposed that is scalable to large-scale data sets of various types and provides a flexible paradigm to incorporate arbitrary hash functions, including deep neural networks and kernel methods, as well as any type of data to hash.
Abstract: The hashing method maps similar data of various types to binary hash codes with smaller Hamming distance, and it has received broad attention due to its low storage cost and fast retrieval speed. However, existing limitations make the present algorithms difficult to apply to large-scale data sets: 1) discrete constraints are involved in the learning of the hash function, and 2) pairwise or triplet similarity is adopted to generate efficient hash codes, resulting in both time and space complexity greater than $O(n^{2})$. To address these issues, we propose a novel discrete supervised hash learning framework that can scale to large-scale data sets of various types. First, the discrete learning procedure is decomposed into a binary classifier learning scheme and a binary codes learning scheme, which makes the learning procedure more efficient. Second, by adopting asymmetric low-rank matrix factorization, we propose the fast clustering-based batch coordinate descent method, such that the time and space complexity are reduced to $O(n)$. The proposed framework also provides a flexible paradigm to incorporate arbitrary hash functions, including deep neural networks and kernel methods, as well as any type of data to hash, including images and videos. Experiments on large-scale data sets demonstrate that the proposed method is superior or comparable to state-of-the-art hashing algorithms.

Journal ArticleDOI
Kun Ding, Chunlei Huo, Bin Fan, Shiming Xiang, Chunhong Pan
TL;DR: This paper develops locality-sensitive two-step hashing (LS-TSH), which generates the binary codes through LSH rather than any complex optimization technique, and obtains retrieval accuracy comparable to the state of the art with two to three orders of magnitude faster training speed.
Abstract: Hashing-based semantic similarity search is becoming increasingly important for building large-scale content-based retrieval systems. The state-of-the-art supervised hashing techniques use a flexible two-step strategy to learn hash functions. The first step learns binary codes for training data by solving binary optimization problems with millions of variables, thus usually requiring intensive computations. Despite its simplicity and efficiency, locality-sensitive hashing (LSH) has never been recognized as a good way to generate such codes due to its poor performance in traditional approximate neighbor search. We claim in this paper that the true merit of LSH lies in transforming the semantic labels to obtain the binary codes, resulting in an effective and efficient two-step hashing framework. Specifically, we develop locality-sensitive two-step hashing (LS-TSH), which generates the binary codes through LSH rather than any complex optimization technique. Theoretically, with proper assumptions, LS-TSH is actually a useful LSH scheme, so that it preserves the label-based semantic similarity and possesses sublinear query complexity for hash lookup. Experimentally, LS-TSH obtains retrieval accuracy comparable to the state of the art with two to three orders of magnitude faster training speed.
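The LSH component can be illustrated with the classic random-hyperplane (SimHash) scheme applied to label vectors. This is a generic sketch of that idea under assumed sizes, not the exact scheme or guarantees of LS-TSH.

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_classes, n_bits = 8, 5, 16
labels = np.eye(n_classes)[rng.integers(0, n_classes, n)]  # one-hot semantic labels

# Random-hyperplane LSH: one random projection per bit, code bit = sign of the projection.
R = rng.standard_normal((n_classes, n_bits))
codes = (labels @ R > 0).astype(np.int8)                   # n x n_bits binary codes

# Items sharing a label get identical codes, so label-based similarity is preserved;
# a second step would then fit hash functions mapping image features to these codes.
print(codes)
```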

Journal ArticleDOI
TL;DR: Wang et al. proposed a robust image hashing construction method that uses the location-context information of the features; the constructed hash is attached to the image before transmission and can be used at the destination to filter out geometric transformations in the received image.
Abstract: The main problem addressed in this paper is the robust tamper detection of an image received in a transmission under various content-preserving attacks. To this aim, a progressive feature point selection method is proposed to extract feature points of high robustness, from which the local feature and color feature are then generated for each feature point. Afterwards, a robust image hashing construction method is proposed that uses the location-context information of the features. The constructed hash is attached to the image before transmission and can be analyzed at the destination to filter out the geometric transformations occurring in the received image. After image restoration, the similarity of the global hashes between the source image and the restored image is calculated to determine whether the received image has the same content as the trusted one or has been maliciously tampered with. When the received image is judged to be tampered, the hashes calculated with the proposed Horizontal Location-Context Hashing (HLCH) and Vertical Location-Context Hashing (VLCH) methods are used to locate the tampered regions. Experimental results on different images with tampering of arbitrary size and location demonstrate that our image authentication and tampering localization scheme is superior to state-of-the-art methods under various attacks.

Journal ArticleDOI
TL;DR: A bagging–boosting-based semi-supervised multi-hashing with query-adaptive re-ranking (BBSHR) is proposed, which yields better precision and recall rates for given numbers of hash tables and bits.

Journal ArticleDOI
TL;DR: A generic hashing framework with a new linear pairwise distance preserving objective and pointwise constraint is proposed; it achieves consistent improvement over state-of-the-art unsupervised hashing methods and is validated on four large-scale benchmark data sets.
Abstract: Binary hashing approaches the approximate nearest neighbor search problem by transferring the data to Hamming space with explicit or implicit distance preserving constraint. With compact data representation, binary hashing identifies the approximate nearest neighbors via very efficient Hamming distance computation. In this paper, we propose a generic hashing framework with a new linear pairwise distance preserving objective and pointwise constraint. In our framework, the direct distance preserving objective aims to keep the linear relationship between the Euclidean distance and the Hamming distance of data points. On the other hand, to impose the pointwise constraint, we instantiate the framework from three different perspectives with pseudo-supervised, unsupervised, and supervised clues and obtain three different hashing methods. The first one is a pseudo-supervised hashing method, which adopts a certain existing unsupervised hashing method to generate binary codes as pseudo-supervised information. For the second one, we get an unsupervised hashing method by considering the quantization loss. The third one, as a supervised hashing method, learns the hash functions in a two-step paradigm. Furthermore, we improve the above-mentioned framework by constraining the global scope of the proposed linear distance preserving objective to a local range. We validate our framework on four large-scale benchmark data sets. The experiments demonstrate that our pseudo-supervised method achieves consistent improvement over the state-of-the-art unsupervised hashing methods, while our unsupervised and supervised methods achieve promising performance compared with the state-of-the-art algorithms.

Journal ArticleDOI
TL;DR: In this paper, an unsupervised domain adaptation model is proposed to learn hash codes from training images belonging to seen classes, which can efficiently encode images of unseen classes to binary codes.

Proceedings ArticleDOI
01 Dec 2018
TL;DR: This paper proposes and analyzes several evolving naive Bayes classification algorithms, based on the well-known count-min sketch, in order to minimize the space needed to store the training data, and includes the hashing trick, a technique for dimensionality reduction, to compress the data down to a lower-dimensional space.
Abstract: A well-known learning task in big data stream mining is classification. Extensively studied in the offline setting, in the streaming setting – where data are evolving and even infinite – it is still a challenge. In the offline setting, training needs to store all the data in memory for the learning task; yet, in the streaming setting, this is impossible to do due to the massive amount of data that is generated in real time. To cope with these resource issues, this paper proposes and analyzes several evolving naive Bayes classification algorithms, based on the well-known count-min sketch, in order to minimize the space needed to store the training data. The proposed algorithms also adapt concept drift approaches, such as ADWIN, to deal with the fact that streaming data may be evolving and change over time. However, handling sparse, very high-dimensional data in such a framework is highly challenging. Therefore, we include the hashing trick, a technique for dimensionality reduction, to compress the data down to a lower-dimensional space, which leads to a large memory saving. We give a theoretical analysis which demonstrates that our proposed algorithms provide a similar accuracy quality to the classical big data stream mining algorithms using a reasonable amount of resources. We validate these theoretical results by an extensive evaluation on both synthetic and real-world datasets.
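A hedged sketch of the two ingredients named above, a count-min sketch for class-conditional feature counts and the hashing trick to bound dimensionality. The widths, depths, feature names, and smoothing are arbitrary choices, and the ADWIN drift handling is omitted.

```python
import numpy as np

DEPTH, WIDTH, N_HASHED = 4, 1 << 12, 1 << 10  # sketch depth/width, hashed feature space

def cms_index(row, key):
    # Stand-in for a family of pairwise-independent hash functions.
    return hash((row, key)) % WIDTH

class CountMinNB:
    """Streaming naive Bayes over hashed features, with count-min feature counts."""
    def __init__(self, classes):
        self.classes = classes
        self.cms = {c: np.zeros((DEPTH, WIDTH)) for c in classes}  # per-class feature counts
        self.class_counts = {c: 0 for c in classes}

    def _hashed(self, features):
        # Hashing trick: fold raw categorical features into N_HASHED buckets.
        return [hash(f) % N_HASHED for f in features]

    def update(self, features, label):
        self.class_counts[label] += 1
        for j in self._hashed(features):
            for r in range(DEPTH):
                self.cms[label][r, cms_index(r, j)] += 1

    def _count(self, label, j):
        return min(self.cms[label][r, cms_index(r, j)] for r in range(DEPTH))

    def predict(self, features):
        total = sum(self.class_counts.values()) + len(self.classes)
        best, best_score = None, -np.inf
        for c in self.classes:
            score = np.log((self.class_counts[c] + 1) / total)
            denom = self.class_counts[c] + N_HASHED          # crude smoothing denominator
            for j in self._hashed(features):
                score += np.log((self._count(c, j) + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

nb = CountMinNB(classes=["spam", "ham"])
nb.update(["word=free", "word=offer"], "spam")
nb.update(["word=meeting", "word=notes"], "ham")
print(nb.predict(["word=free"]))
```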

01 Jan 2018
TL;DR: As discussed in this paper, machine learning methods can be used for solving important binary classification tasks in domains such as display advertising and recommender systems, where categorical features are commonly used.
Abstract: Machine learning methods can be used for solving important binary classification tasks in domains such as display advertising and recommender systems. In many of these domains categorical features ...
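Since the abstract is truncated here, the following is only a generic illustration of feature hashing for categorical features with scikit-learn's FeatureHasher, using made-up field names from a display-advertising-style setting.

```python
from sklearn.feature_extraction import FeatureHasher

# Hypothetical categorical records, e.g. one per ad impression.
records = [
    {"site": "news.example", "ad_id": "a_1093", "device": "mobile"},
    {"site": "blog.example", "ad_id": "a_2201", "device": "desktop"},
]

# Hash each (field=value) pair into a fixed 2**18-dimensional sparse vector,
# so the model size is independent of the number of distinct categories.
hasher = FeatureHasher(n_features=2**18, input_type="dict")
X = hasher.transform(records)
print(X.shape, X.nnz)  # (2, 262144) with a handful of non-zeros per row
```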

Proceedings Article
01 Jan 2018
TL;DR: MISSION is presented, a novel framework for ultra large-scale feature selection that performs stochastic gradient descent while maintaining an efficient representation of the features in memory using a Count-Sketch data structure.
Abstract: Feature selection is an important challenge in machine learning. It plays a crucial role in the explainability of machine-driven decisions that are rapidly permeating throughout modern society. Unfortunately, the explosion in the size and dimensionality of real-world datasets poses a severe challenge to standard feature selection algorithms. Today, it is not uncommon for datasets to have billions of dimensions. At such scale, even storing the feature vector is impossible, causing most existing feature selection methods to fail. Workarounds like feature hashing, a standard approach to large-scale machine learning, help with computational feasibility, but at the cost of losing the interpretability of features. In this paper, we present MISSION, a novel framework for ultra large-scale feature selection that performs stochastic gradient descent while maintaining an efficient representation of the features in memory using a Count-Sketch data structure. MISSION retains the simplicity of feature hashing without sacrificing the interpretability of the features while using only O(log^2(p)) working memory. We demonstrate that MISSION accurately and efficiently performs feature selection on real-world, large-scale datasets with billions of dimensions.

Proceedings ArticleDOI
01 Jul 2018
TL;DR: This paper proposes a novel algorithm called node2hash based on feature hashing for generating node embeddings that shows a competitive performance on multi-class node classification and link prediction tasks on three real-world networks from various domains.
Abstract: The goal of network representation learning is to embed nodes so as to encode the proximity structures of a graph into a continuous low-dimensional feature space. In this paper, we propose a novel algorithm called node2hash based on feature hashing for generating node embeddings. This approach follows the encoder-decoder framework. There are two main mapping functions in this framework. The first is an encoder to map each node into high-dimensional vectors. The second is a decoder to hash these vectors into a lower dimensional feature space. More specifically, we first derive a proximity measure called expected distance, which combines the position distribution and co-occurrence statistics of nodes over random walks, to build a proximity matrix; we then introduce a set of T different hash functions into feature hashing to generate uniformly distributed vector representations of nodes from the proximity matrix. Compared with the existing state-of-the-art network representation learning approaches, node2hash shows a competitive performance on multi-class node classification and link prediction tasks on three real-world networks from various domains.
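A hedged sketch of the hashing step, feature hashing a node's proximity row with T different signed hash functions and averaging the results; the proximity values, dimensionality, and hashing details are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def hashed_embedding(proximity_row, dim=128, T=4):
    """Map a sparse {neighbor: proximity} row to a dense vector via T signed hash functions."""
    emb = np.zeros(dim)
    for t in range(T):
        for neighbor, weight in proximity_row.items():
            h = hash((t, neighbor))
            sign = 1.0 if h & 1 else -1.0          # signed hashing reduces collision bias
            emb[(h >> 1) % dim] += sign * weight
    return emb / T

# Hypothetical proximity values (e.g. expected distances turned into similarities).
row_for_node_0 = {"node_3": 0.8, "node_7": 0.5, "node_12": 0.1}
print(hashed_embedding(row_for_node_0)[:8])
```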

Posted Content
TL;DR: MISSION as discussed by the authors is a novel framework for ultra large-scale feature selection that performs stochastic gradient descent while maintaining an efficient representation of the features in memory using a Count-Sketch data structure.
Abstract: Feature selection is an important challenge in machine learning. It plays a crucial role in the explainability of machine-driven decisions that are rapidly permeating throughout modern society. Unfortunately, the explosion in the size and dimensionality of real-world datasets poses a severe challenge to standard feature selection algorithms. Today, it is not uncommon for datasets to have billions of dimensions. At such scale, even storing the feature vector is impossible, causing most existing feature selection methods to fail. Workarounds like feature hashing, a standard approach to large-scale machine learning, help with computational feasibility, but at the cost of losing the interpretability of features. In this paper, we present MISSION, a novel framework for ultra large-scale feature selection that performs stochastic gradient descent while maintaining an efficient representation of the features in memory using a Count-Sketch data structure. MISSION retains the simplicity of feature hashing without sacrificing the interpretability of the features while using only O(log^2(p)) working memory. We demonstrate that MISSION accurately and efficiently performs feature selection on real-world, large-scale datasets with billions of dimensions.

Journal ArticleDOI
TL;DR: Empirical results show that the proposed unified and concise unsupervised hashing framework, called binary multidimensional scaling, outperforms state-of-the-art methods by a large margin in terms of distance preservation, which is practical for real-world applications.
Abstract: Hashing is a useful technique for fast nearest neighbor search due to its low storage cost and fast query speed. Unsupervised hashing aims at learning binary hash codes for the original features so that the pairwise distances can be best preserved. While several works have targeted this task, the results are not satisfactory, mainly due to over-simplified models. In this paper, we propose a unified and concise unsupervised hashing framework, called binary multidimensional scaling, which is able to learn the hash code for distance preservation in both batch and online mode. In the batch mode, unlike most existing hashing methods, we do not need to simplify the model by predefining the form of the hash map. Instead, we learn the binary codes directly based on the pairwise distances among the normalized original features by alternating minimization. This enables a stronger expressive power of the hash map. In the online mode, we consider the holistic distance relationship between the current query example and those we have already learned, rather than only focusing on the current data chunk. This is useful when the data come in a streaming fashion. Empirical results show that, while being efficient to train, our algorithm outperforms state-of-the-art methods by a large margin in terms of distance preservation, which is practical for real-world applications.

Proceedings ArticleDOI
02 Sep 2018
TL;DR: In this paper, parameter quantization and perfect feature hashing are used to compress NLU models, which achieves a 14-fold reduction in memory usage compared to the original models with minimal predictive performance impact.
Abstract: In this paper we investigate statistical model compression applied to natural language understanding (NLU) models. Small-footprint NLU models are important for enabling offline systems on hardware-restricted devices, and for decreasing on-demand model loading latency in cloud-based systems. To compress NLU models, we present two main techniques, parameter quantization and perfect feature hashing. These techniques are complementary to existing model pruning strategies such as L1 regularization. We performed experiments on a large-scale NLU system. The results show that our approach achieves a 14-fold reduction in memory usage compared to the original models with minimal predictive performance impact.
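A hedged sketch of the parameter-quantization half of the approach, simple linear 8-bit quantization of a weight vector; the bit width and rounding scheme are generic choices, and perfect feature hashing (a collision-free index built offline for a fixed, known feature vocabulary) is only described in a comment.

```python
import numpy as np

def quantize_8bit(weights):
    """Linearly quantize float weights to uint8 plus (scale, offset) metadata."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 or 1.0
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)   # stand-in for NLU model weights
q, scale, w_min = quantize_8bit(w)
w_hat = dequantize(q, scale, w_min)
# 4x smaller storage per weight (uint8 vs float32) at a small reconstruction error.
print(q.nbytes, w.nbytes, float(np.abs(w - w_hat).max()))

# Perfect feature hashing, by contrast, would replace string feature names with a
# minimal collision-free index precomputed for the known feature vocabulary.
```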

Posted Content
TL;DR: To compress NLU models, two main techniques are presented, parameter quantization and perfect feature hashing, complementary to existing model pruning strategies such as L1 regularization.
Abstract: In this paper we investigate statistical model compression applied to natural language understanding (NLU) models. Small-footprint NLU models are important for enabling offline systems on hardware-restricted devices, and for decreasing on-demand model loading latency in cloud-based systems. To compress NLU models, we present two main techniques, parameter quantization and perfect feature hashing. These techniques are complementary to existing model pruning strategies such as L1 regularization. We performed experiments on a large-scale NLU system. The results show that our approach achieves a 14-fold reduction in memory usage compared to the original models with minimal predictive performance impact.

Journal ArticleDOI
TL;DR: This paper proposes a Multiple Hierarchical Deep Hashing (MHDH) approach for large-scale image retrieval that integrates multiple hierarchical non-linear transformations with a hidden neural network layer for hash code generation.
Abstract: Learning-based hashing methods are becoming the mainstream for large-scale visual search. They consist of two main components: hash code learning for training data and hash function learning for encoding new data points. The performance of a content-based image retrieval system crucially depends on the feature representation, and Convolutional Neural Networks (CNNs) have recently proven effective for extracting high-level visual features for large-scale image retrieval. In this paper, we propose a Multiple Hierarchical Deep Hashing (MHDH) approach for large-scale image retrieval. MHDH seeks to integrate multiple hierarchical non-linear transformations with a hidden neural network layer for hash code generation. The learned binary codes represent potential concepts that connect to class labels. In addition, extensive experiments on two popular datasets demonstrate the superiority of our MHDH over both supervised and unsupervised hashing methods.

Proceedings ArticleDOI
01 Jul 2018
TL;DR: Experimental results on prediction tasks with hundreds of millions of features demonstrate that CCFH can achieve the same level of performance using only 15%-25% of the parameters compared with conventional feature hashing.
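A hedged, simplified sketch of the two-choice idea behind CCFH: each feature gets two candidate buckets from two hash functions and is placed in whichever candidate currently carries less collision load. The real method re-hashes features dynamically during model training; the load heuristic, importance scores, and sizes below are illustrative assumptions.

```python
import numpy as np

WIDTH = 1 << 10          # number of parameter buckets (illustrative)
load = np.zeros(WIDTH)   # how much "importance" is already parked in each bucket
assignment = {}          # feature -> chosen bucket

def candidates(feature):
    # Two hash functions give two possible locations, as in cuckoo hashing.
    return hash(("h1", feature)) % WIDTH, hash(("h2", feature)) % WIDTH

def assign(feature, importance):
    """Place a feature into the less-loaded of its two candidate buckets."""
    b1, b2 = candidates(feature)
    chosen = b1 if load[b1] <= load[b2] else b2
    assignment[feature] = chosen
    load[chosen] += importance
    return chosen

# Hypothetical features with rough importance scores (e.g. gradient magnitudes).
for feat, imp in [("user=42", 1.0), ("ad=9", 0.8), ("site=news", 0.3)]:
    print(feat, "->", assign(feat, imp))
```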
Abstract: Feature hashing is widely used to process large scale sparse features for learning of predictive models. Collisions inherently happen in the hashing process and hurt the model performance. In this paper, we develop a feature hashing scheme called Cuckoo Feature Hashing(CCFH) based on the principle behind Cuckoo hashing, a hashing scheme designed to resolve collisions. By providing multiple possible hash locations for each feature, CCFH prevents the collisions between predictive features by dynamically hashing them into alternative locations during model training. Experimental results on prediction tasks with hundred-millions of features demonstrate that CCFH can achieve the same level of performance by using only 15%-25% parameters compared with conventional feature hashing.