SciSpace (formerly Typeset)

Showing papers on "Feature hashing" published in 2016


Proceedings Article
12 Feb 2016
TL;DR: A novel Deep Hashing Network (DHN) architecture for supervised hashing is proposed, which jointly learns an image representation tailored to hash coding and formally controls the quantization error.
Abstract: Due to the storage and retrieval efficiency, hashing has been widely deployed to approximate nearest neighbor search for large-scale multimedia retrieval. Supervised hashing, which improves the quality of hash coding by exploiting the semantic similarity on data pairs, has received increasing attention recently. For most existing supervised hashing methods for image retrieval, an image is first represented as a vector of hand-crafted or machine-learned features, followed by another separate quantization step that generates binary codes. However, suboptimal hash coding may be produced, because the quantization error is not statistically minimized and the feature representation is not optimally compatible with the binary coding. In this paper, we propose a novel Deep Hashing Network (DHN) architecture for supervised hashing, in which we jointly learn good image representation tailored to hash coding and formally control the quantization error. The DHN model comprises four key components: (1) a subnetwork with multiple convolution-pooling layers to capture image representations; (2) a fully-connected hashing layer to generate compact binary hash codes; (3) a pairwise cross-entropy loss layer for similarity-preserving learning; and (4) a pairwise quantization loss for controlling hashing quality. Extensive experiments on standard image retrieval datasets show the proposed DHN model yields substantial boosts over the latest state-of-the-art hashing methods.

588 citations
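The two DHN loss terms can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the pairwise cross-entropy is taken to be a logistic likelihood on the scaled inner product of two continuous codes, the quantization term is a smooth log-cosh penalty on |h| - 1, and `alpha` and `lam` are hypothetical hyperparameters.

```python
import math

def pairwise_cross_entropy(h_i, h_j, s_ij, alpha=0.5):
    """Negative log-likelihood of a similarity label s_ij in {0, 1}.

    The inner product of two continuous codes, scaled by the hypothetical
    parameter alpha, is treated as the logit of "i and j are similar".
    """
    theta = alpha * sum(a * b for a, b in zip(h_i, h_j))
    # log(1 + exp(theta)), computed in a numerically stable way
    softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
    return softplus - s_ij * theta

def quantization_loss(h):
    """Penalize entries whose magnitude drifts away from 1 (smooth log-cosh)."""
    return sum(math.log(math.cosh(abs(x) - 1.0)) for x in h)

def dhn_style_loss(h_i, h_j, s_ij, lam=0.1):
    # lam is a hypothetical trade-off weight between the two terms
    return (pairwise_cross_entropy(h_i, h_j, s_ij)
            + lam * (quantization_loss(h_i) + quantization_loss(h_j)))
```

A similar pair with aligned codes gets a small cross-entropy, and entries that are exactly ±1 incur no quantization penalty, which is what pushes the continuous outputs toward binary codes.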


Proceedings Article
12 Feb 2016
TL;DR: A novel method, column sampling based discrete supervised hashing (COSDISH), directly learns discrete hash codes from semantic information and can outperform state-of-the-art methods in real applications such as image retrieval.
Abstract: By leveraging semantic (label) information, supervised hashing has demonstrated better accuracy than unsupervised hashing in many real applications. Because the hashing-code learning problem is essentially a discrete optimization problem which is hard to solve, most existing supervised hashing methods try to solve a relaxed continuous optimization problem by dropping the discrete constraints. However, these methods typically suffer from poor performance due to the errors caused by the relaxation. Some other methods try to directly solve the discrete optimization problem. However, they are typically time-consuming and unscalable. In this paper, we propose a novel method, called column sampling based discrete supervised hashing (COSDISH), to directly learn the discrete hashing code from semantic information. COSDISH is an iterative method, in each iteration of which several columns are sampled from the semantic similarity matrix and then the hashing code is decomposed into two parts which can be alternately optimized in a discrete way. Theoretical analysis shows that the learning (optimization) algorithm of COSDISH has a constant-approximation bound in each step of the alternating optimization procedure. Empirical results on datasets with semantic labels illustrate that COSDISH can outperform the state-of-the-art methods in real applications like image retrieval.

260 citations


Proceedings Article (DOI)
13 Aug 2016
TL;DR: A new Deep Visual-Semantic Hashing model that generates compact hash codes of images and sentences in an end-to-end deep learning architecture, capturing the intrinsic cross-modal correspondences between visual data and natural language.
Abstract: Due to the storage and retrieval efficiency, hashing has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval. Cross-modal hashing, which enables efficient retrieval of images in response to text queries or vice versa, has received increasing attention recently. Most existing work on cross-modal hashing does not capture the spatial dependency of images and temporal dynamics of text sentences for learning powerful feature representations and cross-modal embeddings that mitigate the heterogeneity of different modalities. This paper presents a new Deep Visual-Semantic Hashing (DVSH) model that generates compact hash codes of images and sentences in an end-to-end deep learning architecture, which captures the intrinsic cross-modal correspondences between visual data and natural language. DVSH is a hybrid deep architecture that consists of a visual-semantic fusion network for learning a joint embedding space of images and text sentences, and two modality-specific hashing networks for learning hash functions to generate compact binary codes. Our architecture effectively unifies joint multimodal embedding and cross-modal hashing, and is based on a novel combination of Convolutional Neural Networks over images, Recurrent Neural Networks over sentences, and a structured max-margin objective that integrates all of these components to enable learning of similarity-preserving and high-quality hash codes. Extensive empirical evidence shows that our DVSH approach yields state-of-the-art results in cross-modal retrieval experiments on image-sentence datasets, i.e., the standard IAPR TC-12 and the large-scale Microsoft COCO.

254 citations


Proceedings Article
Yue Cao, Mingsheng Long, Jianmin Wang, Han Zhu, Qingfu Wen
12 Feb 2016
TL;DR: A novel Deep Quantization Network architecture for supervised hashing is proposed, which learns image representations for hash coding, formally controls the quantization error, and yields substantial boosts over the latest state-of-the-art hashing methods.
Abstract: Hashing has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval. Supervised hashing improves the quality of hash coding by exploiting the semantic similarity on data pairs and has received increasing attention recently. For most existing supervised hashing methods for image retrieval, an image is first represented as a vector of hand-crafted or machine-learned features, then quantized by a separate quantization step that generates binary codes. However, suboptimal hash coding may be produced, since the quantization error is not statistically minimized and the feature representation is not optimally compatible with the hash coding. In this paper, we propose a novel Deep Quantization Network (DQN) architecture for supervised hashing, which learns image representation for hash coding and formally controls the quantization error. The DQN model comprises four key components: (1) a sub-network with multiple convolution-pooling layers to capture deep image representations; (2) a fully connected bottleneck layer to generate dimension-reduced representation optimal for hash coding; (3) a pairwise cosine loss layer for similarity-preserving learning; and (4) a product quantization loss for controlling hashing quality and the quantizability of bottleneck representation. Extensive experiments on standard image retrieval datasets show the proposed DQN model yields substantial boosts over the latest state-of-the-art hashing methods.

248 citations
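Product quantization, which the DQN abstract uses as its quantization loss, can be illustrated with a generic sketch: a vector is split into equal sub-vectors, each sub-vector is replaced by the index of its nearest codeword, and the squared reconstruction error is what a PQ loss would penalize. The codebooks below are fixed, hypothetical values; in DQN they are learned jointly with the network.

```python
def split(v, m):
    """Split vector v into m equal-length sub-vectors (len(v) must be divisible by m)."""
    d = len(v) // m
    return [v[k * d:(k + 1) * d] for k in range(m)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(v, codebooks):
    """Return one codeword index per subspace (nearest codeword by squared L2)."""
    subs = split(v, len(codebooks))
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(subs, codebooks)]

def pq_decode(code, codebooks):
    """Concatenate the selected codewords to approximate the original vector."""
    out = []
    for c, cb in zip(code, codebooks):
        out.extend(cb[c])
    return out

def quantization_error(v, codebooks):
    """Squared reconstruction error, i.e. what a PQ loss term would penalize."""
    return sq_dist(v, pq_decode(pq_encode(v, codebooks), codebooks))
```

Usage: with two 2-D subspaces and two codewords each, a 4-D vector is stored as just two small indices.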


Journal Article (DOI)
TL;DR: This work incorporates ring partition and invariant vector distance into image hashing to enhance rotation robustness and discriminative capability, and demonstrates that the proposed hashing algorithm is robust to commonly used digital operations on images.
Abstract: Robustness and discrimination are two of the most important objectives in image hashing. We incorporate ring partition and invariant vector distance into image hashing to enhance rotation robustness and discriminative capability. As ring partition is unrelated to image rotation, the statistical features that are extracted from image rings in perceptually uniform color space, i.e., CIE L*a*b* color space, are rotation invariant and stable. In particular, the Euclidean distance between vectors of these perceptual features is invariant to commonly used digital operations on images (e.g., JPEG compression, gamma correction, and brightness/contrast adjustment), which helps in making the image hash compact and discriminative. We conduct experiments to evaluate the efficiency with 250 color images, and demonstrate that the proposed hashing algorithm is robust to commonly used digital operations on images. In addition, with the receiver operating characteristics curve, we illustrate that our hashing is much better than the existing popular hashing algorithms at robustness and discrimination.

192 citations
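The rotation-invariance argument for ring partition can be checked with a small sketch. It is a simplification, not the published algorithm: it works on grey levels rather than CIE L*a*b* channels, uses each ring's mean and variance as the statistical features, and ignores pixels outside the inscribed circle; `ring_features`, `hash_distance` and `rotate90` are hypothetical helper names.

```python
import math

def ring_features(img, num_rings):
    """Mean and variance of pixel values in each concentric ring.

    img is a square list of lists of integer grey levels; pixels outside
    the inscribed circle are ignored.
    """
    n = len(img)
    c = (n - 1) / 2.0
    radius = n / 2.0
    sums = [0] * num_rings
    sq_sums = [0] * num_rings
    counts = [0] * num_rings
    for i in range(n):
        for j in range(n):
            r = math.sqrt((i - c) ** 2 + (j - c) ** 2)
            k = int(r / radius * num_rings)
            if k >= num_rings:
                continue  # outside the inscribed circle
            sums[k] += img[i][j]
            sq_sums[k] += img[i][j] ** 2
            counts[k] += 1
    feats = []
    for s, s2, cnt in zip(sums, sq_sums, counts):
        mean = s / cnt if cnt else 0.0
        var = s2 / cnt - mean ** 2 if cnt else 0.0
        feats.extend([mean, var])
    return feats

def hash_distance(f1, f2):
    """Euclidean distance between two feature vectors (smaller = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def rotate90(img):
    """Rotate a square image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]
```

Because a 90-degree rotation only permutes pixels within each ring, the feature vector, and hence the Euclidean distance, is unchanged; arbitrary-angle rotation additionally involves interpolation, which this sketch does not model.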


Journal Article (DOI)
TL;DR: The experimental results demonstrate that CMFH can significantly outperform several state-of-the-art cross-modality hashing methods, which validates the effectiveness of the proposed CMFH.
Abstract: By transforming data into binary representation, i.e., hashing, we can perform high-speed search with low storage cost, and thus hashing has attracted increasing research interest in recent years. Recently, how to generate hash codes for multimodal data (e.g., images with textual tags, documents with photos, and so on) for large-scale cross-modality search (e.g., searching semantically related images in a database for a document query) has become an important research issue because of the fast growth of multimodal data on the Web. To address this issue, a novel framework for multimodal hashing is proposed, termed Collective Matrix Factorization Hashing (CMFH). The key idea of CMFH is to learn unified hash codes for different modalities of one multimodal instance in the shared latent semantic space in which different modalities can be effectively connected. Therefore, accurate cross-modality search is supported. Based on the general framework, we extend it to the unsupervised scenario, where it tries to preserve the Euclidean structure, and to the supervised scenario, where it fully exploits the label information of data. The corresponding theoretical analysis and the optimization algorithms are given. We conducted comprehensive experiments on three benchmark data sets for cross-modality search. The experimental results demonstrate that CMFH can significantly outperform several state-of-the-art cross-modality hashing methods, which validates the effectiveness of the proposed CMFH.

181 citations


Journal Article (DOI)
TL;DR: This work proposes a cross-modal hashing method based on collective matrix factorization, which considers both the label consistency across different modalities and the local geometric consistency in each modality, leading to a substantial improvement on the discriminative power of latent semantic features obtained by collective Matrix factorization.
Abstract: The target of cross-modal hashing is to embed heterogeneous multimedia data into a common low-dimensional Hamming space, which plays a pivotal part in multimedia retrieval due to the emergence of big multimodal data. Recently, matrix factorization has achieved great success in cross-modal hashing. However, how to effectively use label information and local geometric structure is still a challenging problem for these approaches. To address this issue, we propose a cross-modal hashing method based on collective matrix factorization, which considers both the label consistency across different modalities and the local geometric consistency in each modality. These two elements are formulated as a graph Laplacian term in the objective function, leading to a substantial improvement on the discriminative power of latent semantic features obtained by collective matrix factorization. Moreover, the proposed method learns unified hash codes for different modalities of an instance to facilitate cross-modal search, and the objective function is solved using an iterative strategy. The experimental results on two benchmark data sets show the effectiveness of the proposed method and its superiority over state-of-the-art cross-modal hashing methods.

171 citations
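A graph Laplacian term of the kind mentioned in this abstract typically rests on the standard identity below, which turns a pairwise smoothness penalty into a trace. Here V stacks the latent features v_i as rows and W is a symmetric, non-negative affinity matrix; how the method builds W from label consistency and local geometric consistency is not specified in this listing, so W should be read as an assumption.

```latex
\frac{1}{2}\sum_{i,j} W_{ij}\,\lVert \mathbf{v}_i - \mathbf{v}_j \rVert_2^2
  = \operatorname{tr}\!\left(V^{\top} L V\right),
\qquad L = D - W,\quad D_{ii} = \sum_{j} W_{ij}
```

Minimizing this trace therefore pulls together the latent features of instances with large affinity W_ij.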


Book Chapter (DOI)
20 Nov 2016
TL;DR: The authors propose a triplet label based deep hashing method that maximizes the likelihood of the given triplet labels and outperforms all the baselines on the CIFAR-10 and NUS-WIDE datasets.
Abstract: Hashing is one of the most popular and powerful approximate nearest neighbor search techniques for large-scale image retrieval. Most traditional hashing methods first represent images as off-the-shelf visual features and then produce hashing codes in a separate stage. However, off-the-shelf visual features may not be optimally compatible with the hash code learning procedure, which may result in sub-optimal hash codes. Recently, deep hashing methods have been proposed to simultaneously learn image features and hash codes using deep neural networks and have shown superior performance over traditional hashing methods. Most deep hashing methods are given supervised information in the form of pairwise labels or triplet labels. The current state-of-the-art deep hashing method DPSH [1], which is based on pairwise labels, performs image feature learning and hash code learning simultaneously by maximizing the likelihood of pairwise similarities. Inspired by DPSH [1], we propose a triplet label based deep hashing method which aims to maximize the likelihood of the given triplet labels. Experimental results show that our method outperforms all the baselines on CIFAR-10 and NUS-WIDE datasets, including the state-of-the-art method DPSH [1] and all the previous triplet label based deep hashing methods.

151 citations
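A triplet-label likelihood of the kind described can be sketched as a logistic function of how much closer the query code is to the positive than to the negative. The half-inner-product similarity and the `margin` parameter are assumptions for illustration; in the paper the relaxed codes are produced by a CNN trained jointly with this loss.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))

def theta(u, v):
    # Half inner product of two relaxed (real-valued) codes
    return 0.5 * sum(a * b for a, b in zip(u, v))

def triplet_nll(q, pos, neg, margin=1.0):
    """Negative log-likelihood that q is more similar to pos than to neg."""
    return -log_sigmoid(theta(q, pos) - theta(q, neg) - margin)
```

A triplet that is already correctly ordered by a comfortable margin contributes almost nothing; a reversed triplet contributes a large loss.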


Proceedings Article (DOI)
01 Oct 2016
TL;DR: The authors propose a zero-shot hashing (ZSH) method, which compresses images of unseen categories to binary codes with hash functions learned from limited training data of seen categories.
Abstract: Hashing has shown its efficiency and effectiveness in facilitating large-scale multimedia applications. Supervised knowledge (e.g., semantic labels or pair-wise relationships) associated with data is capable of significantly improving the quality of hash codes and hash functions. However, confronted with the rapid growth of newly-emerging concepts and multimedia data on the Web, existing supervised hashing approaches may easily suffer from the scarcity and validity of supervised information due to the expensive cost of manual labelling. In this paper, we propose a novel hashing scheme, termed zero-shot hashing (ZSH), which compresses images of "unseen" categories to binary codes with hash functions learned from limited training data of "seen" categories. Specifically, we project independent data labels (i.e., 0/1-form label vectors) into semantic embedding space, where semantic relationships among all the labels can be precisely characterized and thus seen supervised knowledge can be transferred to unseen classes. Moreover, in order to cope with the semantic shift problem, we rotate the embedded space to more suitably align the embedded semantics with the low-level visual feature space, thereby alleviating the influence of semantic gap. In the meantime, to exert positive effects on learning high-quality hash functions, we further propose to preserve local structural property and discrete nature in binary codes. In addition, we develop an efficient alternating algorithm to solve the ZSH model. Extensive experiments conducted on various real-life datasets show the superior zero-shot image retrieval performance of ZSH as compared to several state-of-the-art hashing methods.

136 citations


Journal Article (DOI)
TL;DR: Experiments carried out on an archive of aerial images point out that the presented hashing methods are much faster, while keeping a similar (or even higher) retrieval accuracy, than those typically used in RS, which exploit an exact nearest neighbor search.
Abstract: Large-scale remote sensing (RS) image search and retrieval have recently attracted great attention, due to the rapid evolution of satellite systems, which has resulted in a sharp growth of image archives. An exhaustive search through linear scan of such archives is time demanding and not scalable in operational applications. To overcome such a problem, this paper introduces hashing-based approximate nearest neighbor search for fast and accurate image search and retrieval in large RS data archives. The hashing aims at mapping high-dimensional image feature vectors into compact binary hash codes, which are indexed into a hash table that enables real-time search and accurate retrieval. Such binary hash codes can also significantly reduce the amount of memory required for storing the RS images in the auxiliary archives. In particular, in this paper, we introduce in RS two kernel-based nonlinear hashing methods. The first hashing method defines hash functions in the kernel space by using only unlabeled images, while the second method leverages the semantic similarity extracted from annotated images to define more distinctive hash functions in the kernel space. The effectiveness of the considered hashing methods is analyzed in terms of RS image retrieval accuracy and retrieval time. Experiments carried out on an archive of aerial images point out that the presented hashing methods are much faster, while keeping a similar (or even higher) retrieval accuracy, than those typically used in RS, which exploit an exact nearest neighbor search.

131 citations
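The unsupervised variant of kernel-based hashing plus a hash-table lookup can be sketched as follows. Everything here is an illustrative assumption rather than the paper's method: an RBF kernel over a few anchor images, random Gaussian weights for each hash function, and probing every bucket within Hamming radius 1 of the query code. The supervised variant would instead fit the weights to similarities derived from annotated images.

```python
import math
import random

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative bandwidth."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

class KernelHasher:
    """Hypothetical hash functions h_k(x) = sign(sum_j w_kj * k(x, a_j)).

    Weights are random Gaussian here, an unsupervised stand-in for learned ones.
    """

    def __init__(self, anchors, n_bits, seed=0):
        rng = random.Random(seed)
        self.anchors = anchors
        self.weights = [[rng.gauss(0.0, 1.0) for _ in anchors] for _ in range(n_bits)]

    def code(self, x):
        """Binary code of x as a tuple of 0/1 bits."""
        k = [rbf(x, a) for a in self.anchors]
        return tuple(int(sum(w * v for w, v in zip(row, k)) >= 0) for row in self.weights)

def build_table(hasher, items):
    """Index item ids by their binary code."""
    table = {}
    for idx, x in enumerate(items):
        table.setdefault(hasher.code(x), []).append(idx)
    return table

def search(hasher, table, query):
    """Return ids in the query's bucket and in every bucket at Hamming distance 1."""
    q = hasher.code(query)
    probes = [q] + [q[:k] + (1 - q[k],) + q[k + 1:] for k in range(len(q))]
    hits = []
    for p in probes:
        hits.extend(table.get(p, []))
    return hits
```

Only a handful of buckets are probed per query, so lookup cost does not grow with the archive size the way a linear scan does.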


Journal Article (DOI)
Ying Zhang, Huchuan Lu, Lihe Zhang, Xiang Ruan, Shun Sakai
TL;DR: A novel anomaly detection approach based on Locality Sensitive Hashing Filters (LSHF), which hashes normal activities into multiple feature buckets with Locality Sensitive Hashing (LSH) functions to filter out abnormal activities and outperforms state-of-the-art anomaly detection methods.
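The filtering idea in this summary, where normal activities populate hash buckets and a sample landing only in empty buckets is treated as abnormal, can be sketched with random-hyperplane LSH. The hash family, the numbers of tables and bits, and the "no occupied bucket in any table" rule are assumptions; the paper's features and filter update scheme are not described in this listing.

```python
import random

class LSHFilter:
    """Flag a sample as abnormal if it lands in no bucket seen during training.

    Each table hashes a vector with random hyperplanes (sign of projection).
    """

    def __init__(self, dim, n_tables=4, n_bits=6, seed=0):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
                        for _ in range(n_bits)] for _ in range(n_tables)]
        self.buckets = [set() for _ in range(n_tables)]

    def _key(self, t, x):
        # One bit per hyperplane in table t: which side of the plane x falls on
        return tuple(int(sum(w * v for w, v in zip(p, x)) >= 0) for p in self.planes[t])

    def fit(self, normal_samples):
        for x in normal_samples:
            for t in range(len(self.planes)):
                self.buckets[t].add(self._key(t, x))

    def is_abnormal(self, x):
        # Normal if any table maps x into an occupied bucket
        return not any(self._key(t, x) in self.buckets[t] for t in range(len(self.planes)))
```

Using several tables lowers the chance that a normal sample is rejected just because it sits near one hyperplane.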

Journal Article (DOI)
TL;DR: The authors propose a deep architecture that learns instance-aware image representations for multi-label image data, which are organized in multiple groups, with each group containing the features for one category.
Abstract: Similarity-preserving hashing is a commonly used method for nearest neighbor search in large-scale image retrieval. For image retrieval, deep-network-based hashing methods are appealing, since they can simultaneously learn effective image representations and compact hash codes. This paper focuses on deep-network-based hashing for multi-label images, each of which may contain objects of multiple categories. In most existing hashing methods, each image is represented by one piece of hash code, which is referred to as semantic hashing. This setting may be suboptimal for multi-label image retrieval. To solve this problem, we propose a deep architecture that learns instance-aware image representations for multi-label image data, which are organized in multiple groups, with each group containing the features for one category. The instance-aware representations not only bring advantages to semantic hashing but also can be used in category-aware hashing, in which an image is represented by multiple pieces of hash codes and each piece of code corresponds to a category. Extensive evaluations conducted on several benchmark data sets demonstrate that for both the semantic hashing and the category-aware hashing, the proposed method shows substantial improvement over the state-of-the-art supervised and unsupervised hashing methods.

Proceedings Article
09 Jul 2016
TL;DR: A novel deep semantic-preserving and ranking-based hashing (DSRH) architecture is presented, which consists of three components: a deep CNN for learning image representations, a hash stream of a binary mapping layer by evenly dividing the learnt representations into multiple bags and encoding each bag into one hash bit, and a classification stream.
Abstract: Hashing techniques have been intensively investigated for large scale vision applications. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, most existing supervised hashing methods only construct similarity-preserving hash codes. Observing that semantic structures carry complementary information, we propose the idea of cotraining for hashing, by jointly learning projections from image representations to hash codes and classification. Specifically, a novel deep semantic-preserving and ranking-based hashing (DSRH) architecture is presented, which consists of three components: a deep CNN for learning image representations, a hash stream of a binary mapping layer by evenly dividing the learnt representations into multiple bags and encoding each bag into one hash bit, and a classification stream. Meanwhile, our model is learnt under two constraints at the top loss layer of the hash stream: a triplet ranking loss and an orthogonality constraint. The former aims to preserve the relative similarity ordering in the triplets, while the latter makes different hash bits as independent as possible. We have conducted experiments on CIFAR-10 and NUS-WIDE image benchmarks, demonstrating that our approach provides higher image search accuracy than other state-of-the-art hashing techniques.
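The hash stream's binary mapping layer, which evenly divides the learned representation into bags and encodes each bag as one bit, can be sketched as below. Summarizing a bag by its mean and thresholding at zero are assumptions; the triplet ranking loss and orthogonality constraint that train the representation are not shown.

```python
def bags_to_bits(features, n_bits):
    """Evenly split a feature vector into n_bits bags and emit one bit per bag.

    Each bag is summarized by its mean and thresholded at zero; both the
    mean and the zero threshold are illustrative assumptions.
    """
    if len(features) % n_bits:
        raise ValueError("feature length must be divisible by n_bits")
    size = len(features) // n_bits
    bits = []
    for k in range(n_bits):
        bag = features[k * size:(k + 1) * size]
        bits.append(1 if sum(bag) / size > 0 else 0)
    return bits
```

Because each bit depends on a disjoint slice of the representation, the network can in principle specialize each bag to one bit.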

Proceedings Article (DOI)
06 Jun 2016
TL;DR: A novel supervised cross-modal hashing method, Correlation Autoencoder Hashing (CAH), to learn discriminative and compact binary codes based on deep autoencoders, which jointly maximizes the feature correlation revealed by bimodal data and the semantic correlation conveyed in similarity labels.
Abstract: Due to its storage and query efficiency, hashing has been widely applied to approximate nearest neighbor search from large-scale datasets. While there is increasing interest in cross-modal hashing, which facilitates cross-media retrieval by embedding data from different modalities into a common Hamming space, how to distill the cross-modal correlation structure effectively remains a challenging problem. In this paper, we propose a novel supervised cross-modal hashing method, Correlation Autoencoder Hashing (CAH), to learn discriminative and compact binary codes based on deep autoencoders. Specifically, CAH jointly maximizes the feature correlation revealed by bimodal data and the semantic correlation conveyed in similarity labels, while embedding them into hash codes by nonlinear deep autoencoders. Extensive experiments clearly show the superior effectiveness and efficiency of CAH against the state-of-the-art hashing methods on standard cross-modal retrieval benchmarks.

Journal Article (DOI)
TL;DR: Both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings; experiments show that the proposed techniques can significantly outperform both the naive construction methods and the state-of-the-art hashing algorithms.
Abstract: Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, little work has studied a unified approach to constructing multiple informative hash tables using any type of hashing algorithm. Meanwhile, multiple table search also lacks a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies the normalized dominant set to find the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such a scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings.
Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both the naive construction methods and the state-of-the-art hashing algorithms.
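Query-adaptive bitwise weighting refines bucket ranking where plain Hamming distance produces ties. A minimal sketch, assuming the per-bit weights are already known; the paper derives them per query from the discriminative power of each table's hash functions, which is not reproduced here.

```python
def weighted_hamming(code_a, code_b, weights):
    """Sum of per-bit weights over the positions where the two codes differ."""
    return sum(w for a, b, w in zip(code_a, code_b, weights) if a != b)

def rank_buckets(query_code, bucket_codes, weights):
    """Order bucket codes by weighted Hamming distance to the query (stable on ties)."""
    return sorted(bucket_codes, key=lambda c: weighted_hamming(query_code, c, weights))
```

For example, with weights (0.9, 0.2, 0.5), the buckets (1, 0, 0), (0, 1, 0) and (0, 0, 1) are all at plain Hamming distance 1 from the query (0, 0, 0), but the weights order them as (0, 1, 0), (0, 0, 1), (1, 0, 0).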

Proceedings Article
12 Feb 2016
TL;DR: This paper proposes Online Cross-modal Hashing (OCMH), which addresses two problems of existing online hashing methods (efficient updating of hash codes and analysis of cross-modal correlation) by learning shared latent codes (SLC) and a dynamic transfer matrix; experiments demonstrate the effectiveness and efficiency of OCMH for online cross-modal web image retrieval.
Abstract: Cross-modal hashing (CMH) is an efficient technique for the fast retrieval of web image data, and it has gained much attention recently. However, traditional CMH methods usually apply batch learning for generating hash functions and codes. They are inefficient for the retrieval of web images, which usually arrive in a streaming fashion. Online learning can be exploited for CMH, but existing online hashing methods still cannot solve two essential problems: efficient updating of hash codes and analysis of cross-modal correlation. In this paper, we propose Online Cross-modal Hashing (OCMH), which can effectively address these two problems by learning the shared latent codes (SLC). In OCMH, hash codes can be represented by the permanent SLC and a dynamic transfer matrix. Therefore, the inefficient updating of hash codes is transformed into the efficient updating of the SLC and the transfer matrix, and the time complexity is independent of the database size. Moreover, the SLC is shared by all the modalities, and thus it can encode the latent cross-modal correlation, which further improves the overall cross-modal correlation between heterogeneous data. Experimental results on two real-world multi-modal web image datasets, MIR Flickr and NUS-WIDE, demonstrate the effectiveness and efficiency of OCMH for online cross-modal web image retrieval.

Journal Article (DOI)
Di Wang, Xinbo Gao, Xiumei Wang, Lihuo He, Bo Yuan
TL;DR: The proposed MDBE can preserve both discriminability and similarity for hash codes, and will enhance retrieval accuracy, compared with the state-of-the-art methods for large-scale cross-modal retrieval task.
Abstract: Multimodal hashing, which conducts effective and efficient nearest neighbor search across heterogeneous data on large-scale multimedia databases, has been attracting increasing interest, given the explosive growth of multimedia content on the Internet. Recent multimodal hashing research mainly aims at learning compact binary codes that preserve semantic information given by labels. The overwhelming majority of these methods are similarity preserving approaches which approximate the pairwise similarity matrix with Hamming distances between the to-be-learnt binary hash codes. However, these methods ignore the discriminative property in the hash learning process, which leaves hash codes from different classes indistinguishable, and therefore reduces the accuracy and robustness of the nearest neighbor search. To this end, we present a novel multimodal hashing method, named multimodal discriminative binary embedding (MDBE), which focuses on learning discriminative hash codes. First, the proposed method formulates the hash function learning in terms of classification, where the binary codes generated by the learned hash functions are expected to be discriminative. And then, it exploits the label information to discover the shared structures inside heterogeneous data. Finally, the learned structures are preserved for hash codes to produce similar binary codes in the same class. Hence, the proposed MDBE can preserve both discriminability and similarity for hash codes, and will enhance retrieval accuracy. Thorough experiments on benchmark data sets demonstrate that the proposed method achieves excellent accuracy and competitive computational efficiency compared with the state-of-the-art methods for large-scale cross-modal retrieval task.

Journal Article (DOI)
TL;DR: The proposed methods can localize the tampered area, which not all hashing schemes can do, and are robust to a wide range of distortions and attacks such as additive noise, blurring, brightness changes and JPEG compression.
Abstract: Perceptual image hashing is attracting increasing attention in several multimedia security applications such as image identification/authentication, tamper detection, and watermarking. Robust feature extraction is the main challenge in hashing schemes. Local binary pattern (LBP) is a feature that, owing to its simplicity, discriminative power, computational efficiency, and robustness to illumination changes, has been used in various image applications. In this paper, we propose a robust image hashing scheme using center-symmetric local binary patterns (CSLBP). In the proposed image hashing, CSLBP features are extracted from each non-overlapping block within the original gray-scale image. For each block, the final hash code is obtained by the inner product of its CSLBP feature vector and a pseudorandom weight vector. Furthermore, singular value decomposition (SVD) is combined with CSLBP to introduce a more robust hashing method called SVD-CSLBP. Performances of the proposed hashing schemes are evaluated with two groups of popular applications in perceptual image hashing schemes: image identification and image authentication. Experimental results show that the proposed methods are robust to a wide range of distortions and attacks such as additive noise, blurring, brightness changes and JPEG compression. Moreover, the proposed methods can localize the tampered area, which not all hashing schemes can do.
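The CSLBP block hash described above can be sketched on a 2-D list of grey levels. A CS-LBP code compares the four center-symmetric pairs among a pixel's 8 neighbours, giving 16 possible codes per pixel. The block size, the zero threshold, counting only the interior pixels of each block, and seeding the pseudorandom weights from a `key` are assumptions for illustration; the SVD-CSLBP variant is not shown.

```python
import random

# Eight neighbours in circular order; indices k and k + 4 are center-symmetric.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def cslbp_code(img, i, j, threshold=0):
    """CS-LBP code in 0..15 for pixel (i, j) of a 2-D list of grey levels."""
    n = [img[i + di][j + dj] for di, dj in OFFSETS]
    return sum(1 << k for k in range(4) if n[k] - n[k + 4] > threshold)

def block_histogram(img, top, left, size):
    """16-bin CS-LBP histogram over the interior pixels of one size x size block."""
    hist = [0] * 16
    for i in range(top + 1, top + size - 1):
        for j in range(left + 1, left + size - 1):
            hist[cslbp_code(img, i, j)] += 1
    return hist

def block_hash(hist, seed):
    """Inner product of a block histogram with a seed-derived pseudorandom weight vector."""
    rng = random.Random(seed)
    return sum(h * rng.random() for h in hist)

def image_hash(img, block=4, key=42):
    """One value per non-overlapping block; key seeds the (hypothetical) weights."""
    n = len(img)  # assumes a square image whose side is a multiple of block
    coords = [(top, left) for top in range(0, n, block) for left in range(0, n, block)]
    return [block_hash(block_histogram(img, top, left, block), key + b)
            for b, (top, left) in enumerate(coords)]
```

Adding a constant to every pixel leaves all neighbour differences, and therefore the hash, unchanged, which is one way the scheme tolerates brightness changes.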

Journal Article (DOI)
Xianglong Liu, Bowen Du, Cheng Deng, Ming Liu, Bo Lang
TL;DR: This paper proposes a structure sensitive hashing based on cluster prototypes, which explicitly exploits both global and local structures and significantly outperforms state-of-the-art hashing methods in terms of semantic and metric neighbor search.
Abstract: Hashing has been proven to be an attractive solution to approximate nearest neighbor search, owing to its theoretical guarantee and computational efficiency. Though most prior hashing algorithms achieve low memory and computation consumption by pursuing compact hash codes, they are still far from capable of learning discriminative hash functions from data with complex inherent structure. To address this issue, in this paper, we propose a structure sensitive hashing based on cluster prototypes, which explicitly exploits both global and local structures. An alternating optimization algorithm, which respectively minimizes the quantization loss and the spectral embedding loss, is presented to simultaneously discover the cluster prototypes for each hash function and optimally assign unique binary codes to them satisfying the affinity alignment between them. For hash codes of a desired length, an adaptive bit assignment is further appended to the product quantization of the subspaces, approximating the Hamming distances and meanwhile balancing the variance among hash functions. Experimental results on four large-scale benchmarks CIFAR-10, NUS-WIDE, SIFT1M, and GIST1M demonstrate that our approach significantly outperforms state-of-the-art hashing methods in terms of semantic and metric neighbor search.

Posted Content
TL;DR: In this article, the authors show that a trivial solution that encodes the output of a classifier significantly outperforms existing supervised or semi-supervised methods, while using much shorter codes, and propose two alternative protocols for supervised hashing: one based on retrieval on a disjoint set of classes, and another based on transfer learning to new classes.
Abstract: Hashing produces compact representations of documents, so that tasks such as classification or retrieval can be performed on these short codes. When hashing is supervised, the codes are trained using labels on the training data. This paper first shows that the evaluation protocols used in the literature for supervised hashing are not satisfactory: we show that a trivial solution that encodes the output of a classifier significantly outperforms existing supervised or semi-supervised methods, while using much shorter codes. We then propose two alternative protocols for supervised hashing: one based on retrieval on a disjoint set of classes, and another based on transfer learning to new classes. We provide two baseline methods for image-related tasks to assess the performance of (semi-)supervised hashing: without coding and with unsupervised codes. These baselines give lower and upper bounds on the performance of a supervised hashing scheme.
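The "trivial solution" mentioned above can be made concrete. This is a minimal sketch under assumptions: the code is simply the binary index of the classifier's arg-max class (so C classes need ceil(log2 C) bits), and retrieval ranks the database by Hamming distance to the query code.

```python
import math


def classifier_code(probs):
    """Encode the arg-max class of a probability vector as a fixed-width bit tuple."""
    n_bits = max(1, math.ceil(math.log2(len(probs))))
    label = max(range(len(probs)), key=probs.__getitem__)
    return tuple((label >> i) & 1 for i in range(n_bits))


def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_database(query_probs, database_probs):
    """Database indices sorted by Hamming distance to the query's code (stable for ties)."""
    q = classifier_code(query_probs)
    codes = [classifier_code(p) for p in database_probs]
    return sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
```

Items of the predicted class always come first (distance 0); the order among the remaining classes carries no meaning, because Hamming distance between class indices is arbitrary. With only 4 bits for 10 classes, this illustrates the concern that motivates evaluating on disjoint or new classes.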

Journal Article
TL;DR: Wang et al. proposed a robust image hashing based on the color vector angle and the Canny operator, which first converts the input image to a normalized image by interpolation and Gaussian low-pass filtering and then extracts color vector angles and image edges from the normalized image.
Abstract: Image hashing is a novel multimedia processing technology with many applications, such as image forensics, image retrieval, and image indexing. Conventional image hashing algorithms have difficulty reaching a desirable classification trade-off between rotation robustness and discrimination. To address this issue, we propose a robust image hashing based on the color vector angle and the Canny operator. Specifically, our hashing first converts the input image to a normalized image by interpolation and Gaussian low-pass filtering. Then, both color vector angles and image edges are extracted from the normalized image. Finally, statistical features incorporating color vector angles and image edges are calculated to form the image hash. We conduct experiments with 2762 images to validate the efficiency of our hashing. The experimental results show that our hashing is robust against normal digital processing, such as image rotation, brightness/contrast adjustment, and JPEG compression, and achieves good discrimination. Receiver operating characteristic (ROC) curve comparisons with some state-of-the-art algorithms indicate that our hashing outperforms the compared algorithms in the classification trade-off between robustness and discriminative capability.
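The colour vector angle is the angle between two RGB vectors, theta = arccos(p . q / (|p| |q|)). A minimal sketch follows; using the image's mean colour as the reference and taking the mean and standard deviation of the angles as the "statistical features" are illustrative assumptions, and the interpolation, Gaussian filtering, and Canny edge steps are omitted.

```python
import math


def color_vector_angle(p, q):
    """Angle in radians between two RGB vectors; defined as 0 if either vector is black."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        return 0.0
    # Clamp to guard against rounding just outside [-1, 1] before acos
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def angle_statistics(image):
    """Mean and standard deviation of each pixel's angle to the image's mean colour.

    image: list of rows of (R, G, B) tuples. The mean-colour reference and the
    (mean, std) summary are assumptions made for illustration.
    """
    pixels = [px for row in image for px in row]
    reference = [sum(px[ch] for px in pixels) / len(pixels) for ch in range(3)]
    angles = [color_vector_angle(px, reference) for px in pixels]
    mean = sum(angles) / len(angles)
    std = math.sqrt(sum((a - mean) ** 2 for a in angles) / len(angles))
    return mean, std
```

Because the angle ignores vector length, scaling a pixel's RGB values by the same positive factor leaves its angle unchanged.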

Journal Article
TL;DR: A novel unsupervised hashing method for projecting local feature descriptors from a high-dimensional feature space to a lower-dimensional Hamming space via compact bilinear projections rather than a single large projection matrix is proposed.
Abstract: The potential value of hashing techniques has made hashing one of the most active research areas in computer vision and multimedia. However, most existing hashing methods for image search and retrieval are based on global feature representations, which are susceptible to image variations such as viewpoint changes and background clutter. Traditional global representations aggregate local features directly into a single vector without analyzing the intrinsic geometric properties of the local features. In this paper, we propose a novel unsupervised hashing method called unsupervised bilinear local hashing (UBLH) for projecting local feature descriptors from a high-dimensional feature space to a lower-dimensional Hamming space via compact bilinear projections rather than a single large projection matrix. UBLH takes the matrix expression of local features as input and preserves the feature-to-feature and image-to-image structures of local features simultaneously. Experimental results on challenging data sets including Caltech-256, SUN397, and Flickr 1M demonstrate the superiority of UBLH compared with state-of-the-art hashing methods.
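A bilinear projection replaces one large projection of the vectorized d1 x d2 feature matrix X with two small matrices, producing the bits sgn(W1^T X W2). The sketch below uses random Gaussian W1 and W2 as an assumption (UBLH learns them so as to preserve feature-to-feature and image-to-image structure) and compares the parameter counts of the two approaches.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def gaussian_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def bilinear_hash(x, w1, w2):
    """Bits of sgn(W1^T X W2), flattened row-major.

    x: d1 x d2 local-feature matrix, w1: d1 x k1, w2: d2 x k2 -> k1 * k2 bits.
    """
    proj = matmul(matmul(transpose(w1), x), w2)
    return [1 if v >= 0 else 0 for row in proj for v in row]


def parameter_counts(d1, d2, k1, k2):
    """(bilinear, single full projection) parameter counts for k1 * k2 bits."""
    return d1 * k1 + d2 * k2, (d1 * d2) * (k1 * k2)
```

For d1 = d2 = 8 and 16 bits, the two factors need 64 parameters, against 1,024 for a single projection of the 64-dimensional vectorized input.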

Journal Article
TL;DR: MGCMH is an unsupervised method that integrates multi-graph learning and hash function learning into a joint framework to learn a unified hash space for all modalities, and an alternating learning algorithm is proposed to jointly optimize the modality weights, hash codes, and hash functions.
Abstract: With the advance of internet and multimedia technologies, large-scale multi-modal representation techniques such as cross-modal hashing are increasingly in demand for multimedia retrieval. In cross-modal hashing, three essential problems must be carefully considered. The first is that effective cross-modal relationships should be learned from training data with scarce label information. The second is that appropriate weights should be assigned to different modalities to reflect their importance. The last is the scalability of the training process, which is usually ignored by previous methods. In this paper, we propose Multi-graph Cross-modal Hashing (MGCMH), which comprehensively considers these three points. MGCMH is an unsupervised method that integrates multi-graph learning and hash function learning into a joint framework to learn a unified hash space for all modalities. In MGCMH, different modalities are assigned appropriate weights for the generation of the multi-graph and of the hash codes, respectively. As a result, more precise cross-modal relationships can be preserved in the hash space. The Nyström approximation is then leveraged to construct the graphs efficiently. Finally, an alternating learning algorithm is proposed to jointly optimize the modality weights, hash codes, and hash functions. Experiments conducted on two real-world multi-modal datasets demonstrate the effectiveness of our method in comparison with several representative cross-modal hashing methods.
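The Nyström approximation replaces an n x n affinity matrix K with C W^-1 C^T, where C holds affinities between all n points and m landmark points and W holds affinities among the landmarks, so only n x m affinities need to be computed. The sketch below assumes an RBF affinity, an invertible W (practical code usually uses a pseudo-inverse), and hand-chosen landmarks; the abstract does not say how MGCMH builds its graphs, so this is a generic illustration only.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) affinity between two points."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(a, b):
    """Solve A X = B (A: m x m, B: m x n) by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def nystrom_affinity(points, landmark_idx, gamma=0.5):
    """Approximate the n x n RBF affinity matrix as C W^-1 C^T from m landmark points."""
    landmarks = [points[i] for i in landmark_idx]
    c = [[rbf(p, lm, gamma) for lm in landmarks] for p in points]          # n x m
    w = [[rbf(l1, l2, gamma) for l2 in landmarks] for l1 in landmarks]     # m x m
    w_inv_ct = solve(w, [list(col) for col in zip(*c)])                    # m x n = W^-1 C^T
    m, n = len(landmarks), len(points)
    return [[sum(c[i][k] * w_inv_ct[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]
```

Rows belonging to landmark points are reproduced exactly, and choosing every point as a landmark recovers K itself.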

Posted Content
TL;DR: In this paper, a Correlation Hashing Network (CHN) is proposed for cross-modal hashing, which jointly learns good data representation tailored to hash coding and formally controls the quantization error.
Abstract: Hashing is widely applied to approximate nearest neighbor search for large-scale multimodal retrieval owing to its storage and computation efficiency. Cross-modal hashing improves the quality of hash coding by exploiting semantic correlations across different modalities. Existing cross-modal hashing methods first transform data into low-dimensional feature vectors and then generate binary codes in a separate quantization step. However, suboptimal hash codes may be generated, since the quantization error is not explicitly minimized and the feature representation is not jointly optimized with the binary codes. This paper presents a Correlation Hashing Network (CHN) approach to cross-modal hashing, which jointly learns good data representations tailored to hash coding and formally controls the quantization error. The proposed CHN is a hybrid deep architecture consisting of a convolutional neural network for learning good image representations, a multilayer perceptron for learning good text representations, two hashing layers for generating compact binary codes, and a structured max-margin loss that ties all components together to enable learning similarity-preserving and high-quality hash codes. An extensive empirical study shows that CHN yields state-of-the-art cross-modal retrieval performance on standard benchmarks.
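To make the ingredients named above concrete, here is a generic sketch: a tanh-relaxed code, a quantization error measuring its distance from sign(h), and a triplet hinge loss that pushes an image code closer to a matching text code than to a non-matching one. This is not CHN's structured max-margin loss, whose exact form the abstract does not give; the margin of 1 and the length-normalized inner-product score are assumptions.

```python
import math


def relaxed_code(features, weights):
    """tanh(W x): a continuous surrogate for a binary code in {-1, +1}^k."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def quantization_error(h):
    """Squared distance between the relaxed code h and its binarized version sign(h)."""
    return sum((v - (1.0 if v >= 0 else -1.0)) ** 2 for v in h)


def cross_modal_margin_loss(img_code, pos_txt_code, neg_txt_code, margin=1.0):
    """Hinge loss: the matching text must outscore the non-matching one by `margin`."""
    def score(a, b):
        return sum(x * y for x, y in zip(a, b)) / len(a)
    return max(0.0, margin - score(img_code, pos_txt_code) + score(img_code, neg_txt_code))
```

In training, a network would minimize the margin loss plus a weighted quantization error, so that the relaxed codes are both similarity-preserving and close to binary.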

Proceedings Article
01 Oct 2016
TL;DR: Supervised Recurrent Hashing (SRH) is proposed, which exploits the discriminative representation obtained by deep neural networks to design hashing approaches and uses a long short-term memory (LSTM) network to model the structure of video samples.
Abstract: Hashing for large-scale multimedia is a popular research topic, attracting much attention in computer vision and visual information retrieval. Previous work mostly focuses on hashing images and texts, while approaches designed for videos are limited. In this paper, we propose Supervised Recurrent Hashing (SRH), which exploits the discriminative representation obtained by deep neural networks to design hashing approaches. A long short-term memory (LSTM) network is deployed to model the structure of video samples. A max-pooling mechanism is introduced to embed the frames into fixed-length representations that are fed into a supervised hashing loss. Experiments on the UCF-101 dataset demonstrate that the proposed method significantly outperforms several state-of-the-art methods.
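The max-pooling step is what turns a variable number of frames into a fixed-length code. A minimal sketch, assuming per-frame feature vectors are already available (in SRH they would come from the LSTM) and using a fixed, hypothetical projection in place of the learned hashing layer:

```python
def max_pool_frames(frame_features):
    """Element-wise max over a variable number of per-frame feature vectors."""
    return [max(values) for values in zip(*frame_features)]


def video_hash(frame_features, projection):
    """Fixed-length binary code: sign of a linear projection of the max-pooled feature."""
    pooled = max_pool_frames(frame_features)
    return [1 if sum(w * x for w, x in zip(row, pooled)) >= 0 else 0 for row in projection]
```

The code length depends only on the number of projection rows, and appending frames that do not raise any per-dimension maximum leaves the code unchanged.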

Journal Article
TL;DR: Receiver operating characteristic (ROC) curve comparisons with state-of-the-art algorithms illustrate that the proposed hashing outperforms the compared algorithms in the classification trade-off between robustness and discrimination.

Journal Article
TL;DR: The combination of global and local features is robust against content-preserving operations, has a desirable discriminative capability, and can localize the counterfeit area.
Abstract: Background: Image authentication is a challenging research area in multimedia technology because of the wide availability of image editing tools. An image hash used for image authentication should be invariant to perceptually similar images and sensitive to content changes. The challenging issue in image hashing is to design a system that simultaneously provides rotation robustness, desirable discrimination, sensitivity, and localization of the forged area with minimum hash length.
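Hash-based authentication typically compares a received image's hash with a reference hash and, if they differ too much, compares block-level hashes to localize the change. A minimal sketch of that generic two-stage decision, assuming binary hashes and normalized Hamming distance; both thresholds are hypothetical, and the specific global and local features this paper combines are not modeled.

```python
def normalized_hamming(a, b):
    """Fraction of positions where two equal-length binary hashes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def authenticate(global_ref, global_test, blocks_ref, blocks_test,
                 global_threshold=0.1, block_threshold=0.25):
    """Return (is_authentic, tampered_block_indices); thresholds are illustrative only."""
    if normalized_hamming(global_ref, global_test) <= global_threshold:
        return True, []
    # Global hashes disagree: flag every block whose local hash also changed noticeably
    tampered = [i for i, (r, t) in enumerate(zip(blocks_ref, blocks_test))
                if normalized_hamming(r, t) > block_threshold]
    return False, tampered
```

Keeping the global hash short and consulting block hashes only on failure is one way to balance hash length against localization ability.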

Proceedings Article
12 Feb 2016
TL;DR: A new deep convolutional neural network is developed to learn discriminative and compact binary representations of faces for face video retrieval; it integrates feature extraction and hash learning into a unified optimization framework for optimal compatibility between the feature extractor and the hash functions.
Abstract: Retrieving faces from large collections of videos is an attractive research topic with a wide range of applications. Its main challenges are large intra-class variations and tremendous time and space complexity. In this paper, we develop a new deep convolutional neural network (deep CNN) to learn discriminative and compact binary representations of faces for face video retrieval. The network integrates feature extraction and hash learning into a unified optimization framework for optimal compatibility between the feature extractor and the hash functions. To better initialize the network, low-rank discriminative binary hashing is proposed to pre-learn hash functions during the training procedure. Our method achieves excellent performance on two challenging TV-series datasets.

Proceedings Article
01 Jun 2016
TL;DR: A multilinear hyperplane hashing that generates a hash bit using multiple linear projections with strong locality sensitivity to hyperplane queries is proposed, together with an angular quantization based learning framework for compact multilinear hashing, which considerably boosts search performance with fewer hash bits.
Abstract: Hashing has become an increasingly popular technique for fast nearest neighbor search. Despite its successful progress in classic point-to-point search, there are few studies on point-to-hyperplane search, which has strong practical value for scaling up applications such as active learning with SVMs. Existing hyperplane hashing methods enable fast search based on randomly generated hash codes, but they still suffer from a low collision probability and thus usually require long codes for satisfactory performance. To overcome this problem, this paper proposes a multilinear hyperplane hashing that generates a hash bit using multiple linear projections. Our theoretical analysis shows that with an even number of random linear projections, the multilinear hash function possesses strong locality sensitivity to hyperplane queries. To leverage its sensitivity to the angle distance, we further introduce an angular quantization based learning framework for compact multilinear hashing, which considerably boosts search performance with fewer hash bits. Experiments with applications to large-scale (up to one million) active learning on two datasets demonstrate the overall superiority of the proposed approach.
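A sketch of the randomized hash family, assuming the form used in bilinear hyperplane hashing generalized to k projections: a database point z gets the bit sgn(prod_i u_i^T z), and a hyperplane query with normal w gets the same function of w with its bit flipped. The projections here are random Gaussian; the learned angular-quantization variant from the abstract is not shown.

```python
import math
import random


def make_projections(dim, n_bits, k, seed=0):
    """n_bits groups of k random Gaussian projection vectors, one group per hash bit."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
            for _ in range(n_bits)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def point_hash(z, projections):
    """Database point: bit = sgn(prod_i u_i^T z), encoded as +1 / -1."""
    return [1 if math.prod(_dot(u, z) for u in group) >= 0 else -1
            for group in projections]


def hyperplane_query_hash(w, projections):
    """Hyperplane query with normal w: the point hash of w with every bit flipped."""
    return [-bit for bit in point_hash(w, projections)]


def collisions(a, b):
    """Number of bits on which two hashes agree."""
    return sum(x == y for x, y in zip(a, b))
```

A point parallel to w (far from the hyperplane) never collides with the query, while a point perpendicular to w (lying on the hyperplane) collides on roughly half of the bits. With even k, the bit for z equals the bit for -z, so only the line through z matters, consistent with distance to a hyperplane depending on |w^T z|.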

Proceedings Article
01 Oct 2016
TL;DR: Binary optimized hashing (BOH) is proposed, in which it is proved that if the loss function is Lipschitz continuous, the binary optimization problem can be relaxed to a bound-constrained continuous optimization problem.
Abstract: This paper studies the problem of learning to hash, which is essentially a mixed integer optimization problem containing both the binary hash code output and the (continuous) parameters forming the hash functions. Unlike existing relaxation methods in hashing, which provide no theoretical guarantee on the error bound of the relaxation, we propose binary optimized hashing (BOH), in which we prove that if the loss function is Lipschitz continuous, the binary optimization problem can be relaxed to a bound-constrained continuous optimization problem. We then introduce a surrogate objective function, which depends only on the unbinarized hash functions and does not need the slack variables that transform unbinarized hash functions into discrete functions, to approximate the relaxed objective function. We show that the approximation error is bounded and that the bound is small when the problem is optimized. We apply the proposed approach to learn hash codes from either hand-crafted feature inputs or raw image inputs. Extensive experiments on three benchmarks demonstrate that our approach outperforms state-of-the-art methods by a significant margin in search accuracy.
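The relax-then-round idea can be shown on a toy problem. The sketch below assumes a squared pairwise loss between code products b_i b_j and a +/-1 similarity matrix S (a smooth loss, hence Lipschitz on the bounded box), relaxes b in {-1, +1}^n to the box [-1, 1]^n, runs projected gradient descent, and rounds with the sign. It illustrates a generic bound-constrained relaxation, not BOH's surrogate objective or its error bound.

```python
import random


def pairwise_loss(b, s):
    """sum_{i,j} (b_i b_j - S_ij)^2: code products should reproduce the +/-1 similarities."""
    n = len(b)
    return sum((b[i] * b[j] - s[i][j]) ** 2 for i in range(n) for j in range(n))


def relaxed_binary_codes(s, steps=1000, lr=0.01, seed=0):
    """Relax {-1,+1}^n to the box [-1,1]^n, run projected gradient descent, round by sign."""
    rng = random.Random(seed)
    n = len(s)
    b = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    for _ in range(steps):
        # Gradient of the pairwise loss for symmetric S: 4 * sum_j (b_i b_j - S_ij) b_j
        grad = [4 * sum((b[i] * b[j] - s[i][j]) * b[j] for j in range(n)) for i in range(n)]
        # Gradient step, then projection back onto the bound constraint [-1, 1]
        b = [min(1.0, max(-1.0, bi - lr * gi)) for bi, gi in zip(b, grad)]
    return [1 if bi >= 0 else -1 for bi in b]
```

For a consistent S built from a +/-1 vector s, the rounded codes recover s up to a global sign flip, which leaves every product b_i b_j unchanged.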