Author

Rongkai Xia

Bio: Rongkai Xia is an academic researcher from Sun Yat-sen University. The author has contributed to research in topics: Sparse approximation & Cluster analysis. The author has an h-index of 3 and has co-authored 5 publications receiving 1173 citations.

Papers
Proceedings Article
27 Jul 2014
TL;DR: Extensive empirical evaluations on three benchmark datasets with different kinds of images show that the proposed method achieves substantial performance gains over several state-of-the-art supervised and unsupervised hashing methods.
Abstract: Hashing is a popular approximate nearest neighbor search approach for large-scale image retrieval. Supervised hashing, which incorporates similarity/dissimilarity information on entity pairs to improve the quality of hashing function learning, has recently received increasing attention. However, in existing supervised hashing methods for images, an input image is usually encoded by a vector of hand-crafted visual features. Such hand-crafted feature vectors do not necessarily preserve the accurate semantic similarities of image pairs, which may often degrade the performance of hashing function learning. In this paper, we propose a supervised hashing method for image retrieval, in which we automatically learn a good image representation tailored to hashing as well as a set of hash functions. The proposed method has two stages. In the first stage, given the pairwise similarity matrix $S$ over training images, we propose a scalable coordinate descent method to decompose $S$ into a product $HH^{T}$, where $H$ is a matrix with each of its rows being the approximate hash code associated with a training image. In the second stage, we propose to simultaneously learn a good feature representation for the input images as well as a set of hash functions, via a deep convolutional network tailored to the learned hash codes in $H$ and, optionally, the discrete class labels of the images. Extensive empirical evaluations on three benchmark datasets with different kinds of images show that the proposed method achieves substantial performance gains over several state-of-the-art supervised and unsupervised hashing methods.
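
To make the two-stage pipeline concrete, here is a rough numpy sketch of the stage-1 factorization under the abstract's notation: it approximates $S$ with $\frac{1}{q}HH^{T}$ (the $1/q$ scaling is an assumption made so entries stay in $[-1, 1]$) using a simple column-wise alternating update rather than the paper's exact entrywise coordinate descent; the binarized $H$ would then serve as the regression target for the stage-2 network. Function and parameter names are illustrative.

```python
import numpy as np

def learn_code_matrix(S, q, n_sweeps=30, seed=0):
    """Approximate S with (1/q) * H @ H.T over relaxed codes H in [-1, 1]^(n x q)."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    H = rng.uniform(-1.0, 1.0, size=(n, q))
    R = S - (H @ H.T) / q              # residual under the current codes
    for _ in range(n_sweeps):
        for j in range(q):             # refit one code column at a time
            h = H[:, j].copy()
            R += np.outer(h, h) / q    # remove column j's contribution
            # one alternating-least-squares step for the rank-1 term, then
            # project back onto the relaxed interval [-1, 1]
            h = np.clip(q * (R @ h) / (h @ h + 1e-12), -1.0, 1.0)
            H[:, j] = h
            R -= np.outer(h, h) / q    # restore the refitted contribution
    return np.sign(H)                  # binary targets for the stage-2 network
```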

925 citations

Proceedings Article
27 Jul 2014
TL;DR: This paper proposes a novel Markov chain method for Robust Multi-view Spectral Clustering (RMSC), which has a flavor of low-rank and sparse decomposition and achieves superior performance over several state-of-the-art methods for multi-view clustering.
Abstract: Multi-view clustering, which seeks a partition of the data in multiple views that often provide complementary information to each other, has received considerable attention in recent years. In real-life clustering problems, the data in each view may contain considerable noise. However, existing clustering methods blindly combine the information from multi-view data with possibly considerable noise, which often degrades their performance. In this paper, we propose a novel Markov chain method for Robust Multi-view Spectral Clustering (RMSC). Our method has a flavor of low-rank and sparse decomposition: we first construct a transition probability matrix from each single view, and then use these matrices to recover a shared low-rank transition probability matrix as a crucial input to the standard Markov chain method for clustering. The optimization problem of RMSC has a low-rank constraint on the transition probability matrix and, simultaneously, a probabilistic simplex constraint on each of its rows. To solve this challenging optimization problem, we propose an optimization procedure based on the Augmented Lagrangian Multiplier scheme. Experimental results on various real-world datasets show that the proposed method has superior performance over several state-of-the-art methods for multi-view clustering.
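
As a concrete illustration of the pipeline the abstract describes, the numpy sketch below builds a row-stochastic transition matrix per view and then recovers a shared matrix with a crude proximal heuristic. It is a loose stand-in for the paper's Augmented Lagrangian Multiplier procedure, and the Gaussian kernel, threshold, and clip-and-renormalize simplex re-projection are all illustrative assumptions.

```python
import numpy as np

def transition_matrix(X, sigma=1.0):
    # Row-stochastic transition matrix from one view's (n, d) features
    # via a Gaussian affinity kernel.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-D2 / (2.0 * sigma ** 2))
    return W / W.sum(axis=1, keepdims=True)

def svt(M, tau):
    # Singular value thresholding: the low-rank proximal step used inside
    # ALM-style solvers.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shared_transition(views, tau=0.05, n_iters=20):
    # Crude stand-in for RMSC's ALM procedure: start from the average of the
    # per-view transition matrices, alternate a low-rank step with a pull-back
    # toward the data, and re-project rows onto the probability simplex by
    # clipping and renormalizing (the paper enforces the simplex exactly).
    Ps = [transition_matrix(X) for X in views]
    P_avg = sum(Ps) / len(Ps)
    P = P_avg.copy()
    for _ in range(n_iters):
        P = 0.5 * (svt(P, tau) + P_avg)
        P = np.clip(P, 0.0, None)
        P /= P.sum(axis=1, keepdims=True) + 1e-12
    return P  # input to the standard Markov chain clustering step
```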

576 citations

Journal ArticleDOI
TL;DR: This paper proposes a scalable solution for large-scale RMTL, with either the least squares loss or the squared hinge loss, by a divide-and-conquer method, and demonstrates that this method is substantially faster than the state-of-the-art APG algorithms for RMTL.
Abstract: Multitask learning (MTL) aims at improving the generalization performance of multiple tasks by exploiting the shared factors among them. An important line of research in MTL is robust MTL (RMTL) methods, which use trace-norm regularization to capture task relatedness via a low-rank structure. The existing algorithms for the RMTL optimization problems rely on the accelerated proximal gradient (APG) scheme, which needs repeated full singular value decomposition (SVD) operations. However, the time complexity of a full SVD is $O(\min (md^{2},m^{2}d))$ for an RMTL problem with $m$ tasks and $d$ features, which becomes unaffordable in real-world MTL applications that often have a large number of tasks and high-dimensional features. In this paper, we propose a scalable solution for large-scale RMTL, with either the least squares loss or the squared hinge loss, by a divide-and-conquer method. The proposed method divides the original RMTL problem into several size-reduced subproblems, solves these cheaper subproblems in parallel by any base algorithm (e.g., APG) for RMTL, and then combines the results to obtain the final solution. Our theoretical analysis indicates that, with high probability, the recovery errors of the proposed divide-and-conquer algorithm are bounded by those of the base algorithm. Furthermore, to solve the subproblems with the least squares loss or the squared hinge loss, we propose two efficient base algorithms, both based on the linearized alternating direction method. Experimental results demonstrate that, with little loss of accuracy, our method is substantially faster than the state-of-the-art APG algorithms for RMTL.
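
The divide-and-conquer scheme itself is simple to sketch in numpy: split the $m$ tasks into blocks, run a base solver on each block independently, and stack the per-block solutions. The base solver below is a plain trace-norm-regularized least squares via proximal gradient, a hedged stand-in for the APG or linearized-ADM algorithms the paper actually uses; names and constants are illustrative.

```python
import numpy as np

def base_rmtl(Xs, ys, lam=0.1, n_iters=200, step=1e-2):
    # Placeholder base solver for one block of tasks: proximal gradient on
    # per-task least squares with a trace-norm penalty on the stacked (d, m)
    # weight matrix. Xs[i] is (n_i, d), ys[i] is (n_i,).
    d = Xs[0].shape[1]
    W = np.zeros((d, len(Xs)))
    for _ in range(n_iters):
        G = np.column_stack([X.T @ (X @ w - y) / len(y)
                             for X, y, w in zip(Xs, ys, W.T)])
        U, s, Vt = np.linalg.svd(W - step * G, full_matrices=False)
        W = U @ np.diag(np.maximum(s - step * lam, 0.0)) @ Vt  # SVT prox step
    return W

def divide_and_conquer_rmtl(Xs, ys, n_blocks=4, **kwargs):
    # Divide the m tasks into blocks, solve each size-reduced subproblem with
    # the base algorithm (the blocks are independent, hence trivially
    # parallelizable), and concatenate the per-block weight matrices.
    blocks = np.array_split(np.arange(len(Xs)), n_blocks)
    Ws = [base_rmtl([Xs[i] for i in b], [ys[i] for i in b], **kwargs)
          for b in blocks]
    return np.column_stack(Ws)
```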

18 citations


Cited by
Journal ArticleDOI
TL;DR: In this paper, a comprehensive survey of learning to hash algorithms is presented, categorizing them, according to how they preserve similarities, into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discussing their relations.
Abstract: Nearest neighbor search is the problem of finding the data points in a database whose distances to a query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of learning to hash algorithms and categorize them, according to how they preserve similarities, into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discuss their relations. We separate quantization from pairwise similarity preserving because its objective function is very different, even though quantization, as we show, can be derived from preserving pairwise similarities. In addition, we present the evaluation protocols and a general performance analysis, and point out that quantization algorithms perform best in terms of search accuracy, search time cost, and space cost. Finally, we introduce a few emerging topics.

838 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods.
Abstract: Similarity-preserving hashing is a widely used method for nearest neighbor search in large-scale image retrieval tasks. In most existing hashing methods, an image is first encoded as a vector of hand-engineered visual features, followed by a separate projection or quantization step that generates binary codes. However, such visual feature vectors may not be optimally compatible with the coding process, thus producing sub-optimal hash codes. In this paper, we propose a deep architecture for supervised hashing, in which images are mapped into binary codes via carefully designed deep neural networks. The pipeline of the proposed deep architecture consists of three building blocks: 1) a sub-network with a stack of convolution layers to produce effective intermediate image features; 2) a divide-and-encode module to divide the intermediate image features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss designed to capture that one image is more similar to a second image than to a third. Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods.
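
The triplet ranking loss and the divide-and-encode module lend themselves to a short numpy sketch on relaxed (real-valued) codes. The squared Euclidean surrogate for Hamming distance, the margin value, and the per-slice pooling below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def triplet_ranking_loss(b_query, b_similar, b_dissimilar, margin=1.0):
    # The query's code should be closer to the similar image's code than to
    # the dissimilar image's code by at least `margin`. All inputs are
    # (batch, n_bits) arrays of relaxed codes.
    d_pos = ((b_query - b_similar) ** 2).sum(axis=1)
    d_neg = ((b_query - b_dissimilar) ** 2).sum(axis=1)
    return np.maximum(0.0, margin + d_pos - d_neg).mean()

def divide_and_encode(features, n_bits):
    # Sketch of the divide-and-encode idea: slice the intermediate feature
    # vector into n_bits pieces and squash each piece into one relaxed bit.
    # (The paper learns a small projection per slice; a plain sum is used
    # here purely for illustration.)
    slices = np.array_split(features, n_bits, axis=1)
    return np.tanh(np.column_stack([s.sum(axis=1) for s in slices]))
```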

822 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: A novel Deep Supervised Hashing (DSH) method to learn compact similarity-preserving binary codes for large bodies of image data; extensive experiments show the method's promising performance compared with the state of the art.
Abstract: In this paper, we present a new hashing method to learn compact binary codes for highly efficient image retrieval on large-scale datasets. While complex image appearance variations still pose a great challenge to reliable retrieval, in light of the recent progress of Convolutional Neural Networks (CNNs) in learning robust image representations on various vision tasks, this paper proposes a novel Deep Supervised Hashing (DSH) method to learn compact similarity-preserving binary codes for the huge body of image data. Specifically, we devise a CNN architecture that takes pairs of images (similar/dissimilar) as training inputs and encourages the output of each image to approximate discrete values (e.g., +1/-1). To this end, a loss function is carefully designed to maximize the discriminability of the output space by encoding the supervised information from the input image pairs, while simultaneously imposing regularization on the real-valued outputs to approximate the desired discrete values. For image retrieval, newly arriving query images can be easily encoded by propagating them through the network and then quantizing the network outputs to binary codes. Extensive experiments on two large-scale datasets, CIFAR-10 and NUS-WIDE, show the promising performance of our method compared with the state of the art.
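
A minimal sketch of such a pairwise loss, assuming the structure described in the abstract (contraction for similar pairs, margin-based repulsion for dissimilar pairs, and an L1 pull toward +1/-1), might look as follows; the constants and exact weighting are illustrative, not the paper's published settings.

```python
import numpy as np

def dsh_style_loss(b1, b2, is_similar, margin=4.0, alpha=0.01):
    # b1, b2: (batch, n_bits) relaxed network outputs for image pairs;
    # is_similar: boolean array of shape (batch,).
    d2 = ((b1 - b2) ** 2).sum(axis=1)                          # squared distance
    pull = np.where(is_similar, d2, 0.0)                       # contract similar pairs
    push = np.where(is_similar, 0.0, np.maximum(margin - d2, 0.0))  # repel dissimilar
    # L1 regularizer nudging every output coordinate toward +1 or -1
    reg = (np.abs(np.abs(b1) - 1.0).sum(axis=1)
           + np.abs(np.abs(b2) - 1.0).sum(axis=1))
    return (0.5 * (pull + push) + alpha * reg).mean()

def quantize(b):
    # At query time, the abstract says codes come from simple thresholding.
    return np.where(b >= 0.0, 1, -1)
```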

699 citations

Journal ArticleDOI
TL;DR: This overview reviews theoretical underpinnings of multi-view learning, attempts to identify promising avenues, and points out specific challenges that can hopefully promote further research in this rapidly developing field.

679 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work proposes an effective deep learning framework to generate binary hash codes for fast image retrieval by employing a hidden layer in convolutional neural networks to represent the latent concepts that dominate the class labels.
Abstract: Approximate nearest neighbor search is an efficient strategy for large-scale image retrieval. Encouraged by the recent advances in convolutional neural networks (CNNs), we propose an effective deep learning framework to generate binary hash codes for fast image retrieval. Our idea is that when data labels are available, binary codes can be learned by employing a hidden layer to represent the latent concepts that dominate the class labels. The use of the CNN also allows for learning image representations. Unlike other supervised methods that require pairwise inputs for binary code learning, our method learns hash codes and image representations in a pointwise manner, making it suitable for large-scale datasets. Experimental results show that our method outperforms several state-of-the-art hashing algorithms on the CIFAR-10 and MNIST datasets. We further demonstrate its scalability and efficacy on a large-scale dataset of 1 million clothing images.
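
A minimal sketch of the latent-layer idea, with hypothetical parameter names, might look like this: the hidden layer's sigmoid activations are simply thresholded at 0.5 to obtain the binary code, while training remains pointwise through the ordinary classification loss on labels.

```python
import numpy as np

def latent_layer_codes(features, W_latent, b_latent):
    # Sketch only: a hidden layer inserted between the CNN's feature layer and
    # its classification layer produces sigmoid activations; thresholding them
    # at 0.5 yields the binary code. W_latent (d, n_bits) and b_latent
    # (n_bits,) are hypothetical learned parameters of that hidden layer.
    h = 1.0 / (1.0 + np.exp(-(features @ W_latent + b_latent)))  # sigmoid
    return (h >= 0.5).astype(np.uint8)
```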

605 citations