
Showing papers by "Fumin Shen published in 2017"


Journal ArticleDOI
TL;DR: A novel cross-modal hashing method, termed discrete cross-modal hashing (DCH), which directly learns discriminative binary codes while retaining the discrete constraints, and an effective discrete optimization algorithm is developed for DCH to jointly learn the modality-specific hash functions and the unified binary codes.
Abstract: Hashing based methods have attracted considerable attention for efficient cross-modal retrieval on large-scale multimedia data. The core problem of cross-modal hashing is how to learn compact binary codes that construct the underlying correlations between heterogeneous features from different modalities. A majority of recent approaches aim at learning hash functions to preserve the pairwise similarities defined by given class labels. However, these methods fail to explicitly explore the discriminative property of class labels during hash function learning. In addition, they usually discard the discrete constraints imposed on the to-be-learned binary codes, and compromise to solve a relaxed problem with quantization to obtain the approximate binary solution. Therefore, the binary codes generated by these methods are suboptimal and less discriminative to different classes. To overcome these drawbacks, we propose a novel cross-modal hashing method, termed discrete cross-modal hashing (DCH), which directly learns discriminative binary codes while retaining the discrete constraints. Specifically, DCH learns modality-specific hash functions for generating unified binary codes, and these binary codes are viewed as representative features for discriminative classification with class labels. An effective discrete optimization algorithm is developed for DCH to jointly learn the modality-specific hash function and the unified binary codes. Extensive experiments on three benchmark data sets highlight the superiority of DCH under various cross-modal scenarios and show its state-of-the-art performance.
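Once unified binary codes of this kind are learned, cross-modal retrieval reduces to ranking database codes by Hamming distance to the query code. A minimal illustrative sketch (codes stored as Python ints; names are hypothetical, not the authors' implementation):

```python
# Illustrative only: cross-modal retrieval with unified binary codes.
# A query from one modality (e.g., text) is hashed to a code and matched
# against database codes from another modality (e.g., images).

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query_code: int, db_codes: list[int]) -> list[int]:
    """Database indices sorted by Hamming distance to the query (nearest first)."""
    return sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))

db = [0b10110010, 0b10110011, 0b01001100]
print(rank_by_hamming(0b10110011, db))  # [1, 0, 2]: exact match ranked first
```

Because XOR-plus-popcount is far cheaper than continuous-valued distance computation, this step is what makes hashing attractive at scale.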

358 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper introduces a novel binary coding method, named Deep Sketch Hashing (DSH), where a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework, and is the first hashing work specifically designed for category-level SBIR with an end-to-end deep architecture.
Abstract: Free-hand sketch-based image retrieval (SBIR) is a specific cross-view retrieval task, in which queries are abstract and ambiguous sketches while the retrieval database is formed with natural images. Work in this area mainly focuses on extracting representative and shared features for sketches and natural images. However, these can neither cope well with the geometric distortion between sketches and images nor be feasible for large-scale SBIR due to the heavy continuous-valued distance computation. In this paper, we speed up SBIR by introducing a novel binary coding method, named Deep Sketch Hashing (DSH), where a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework. Specifically, three convolutional neural networks are utilized to encode free-hand sketches, natural images and, especially, the auxiliary sketch-tokens which are adopted as bridges to mitigate the sketch-image geometric distortion. The learned DSH codes can effectively capture the cross-view similarities as well as the intrinsic semantic correlations between different categories. To the best of our knowledge, DSH is the first hashing work specifically designed for category-level SBIR with an end-to-end deep architecture. The proposed DSH is comprehensively evaluated on two large-scale datasets, TU-Berlin Extension and Sketchy, and the experiments consistently show DSH's superior SBIR accuracies over several state-of-the-art methods, while achieving significantly reduced retrieval time and memory footprint.

221 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper explores zero-shot action recognition from a novel perspective by adopting the Error-Correcting Output Codes (dubbed ZSECOC), which equips the conventional ECOC with the additional capability of ZSAR, by addressing the domain shift problem.
Abstract: Recently, zero-shot action recognition (ZSAR) has emerged with the explosive growth of action categories. In this paper, we explore ZSAR from a novel perspective by adopting the Error-Correcting Output Codes (dubbed ZSECOC). Our ZSECOC equips the conventional ECOC with the additional capability of ZSAR, by addressing the domain shift problem. In particular, we learn discriminative ZSECOC for seen categories from both category-level semantics and intrinsic data structures. This procedure deals with domain shift implicitly by transferring the well-established correlations among seen categories to unseen ones. Moreover, a simple semantic transfer strategy is developed for explicitly transforming the learned embeddings of seen categories to better fit the underlying structure of unseen categories. As a consequence, our ZSECOC inherits the promising characteristics from ECOC as well as overcomes domain shift, making it more discriminative for ZSAR. We systematically evaluate ZSECOC on three realistic action benchmarks, i.e. Olympic Sports, HMDB51 and UCF101. The experimental results clearly show the superiority of ZSECOC over the state-of-the-art methods.

157 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: A novel projection framework for zero-shot learning based on matrix tri-factorization with manifold regularizations, which significantly outperforms the state of the art and devises an effective prediction scheme by exploiting the test-time manifold structure.
Abstract: Zero-shot learning (ZSL) aims to recognize objects of unseen classes with available training data from another set of seen classes. Existing solutions are focused on exploring knowledge transfer via an intermediate semantic embedding (e.g., attributes) shared between seen and unseen classes. In this paper, we propose a novel projection framework based on matrix tri-factorization with manifold regularizations. Specifically, we learn the semantic embedding projection by decomposing the visual feature matrix under the guidance of semantic embedding and class label matrices. By additionally introducing manifold regularizations on visual data and semantic embeddings, the learned projection can effectively capture the geometrical manifold structure residing in both visual and semantic spaces. To avoid the projection domain shift problem, we devise an effective prediction scheme by exploiting the test-time manifold structure. Extensive experiments on four benchmark datasets show that our approach significantly outperforms the state of the art, yielding an average improvement of 7.4% and 31.9% for the recognition and retrieval tasks, respectively.

136 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A new zero-shot learning framework that can synthesise visual features for unseen classes without acquiring real images is proposed, and extensive experimental results show that the proposed approach significantly improves on state-of-the-art results.
Abstract: Robust object recognition systems usually rely on powerful feature extraction mechanisms from a large number of real images. However, in many realistic applications, collecting sufficient images for ever-growing new classes is unattainable. In this paper, we propose a new zero-shot learning (ZSL) framework that can synthesise visual features for unseen classes without acquiring real images. Using the proposed Unseen Visual Data Synthesis (UVDS) algorithm, semantic attributes are effectively utilised as an intermediate clue to synthesise unseen visual features at the training stage. Thereafter, ZSL recognition is converted into a conventional supervised problem, i.e. the synthesised visual features can be straightforwardly fed to typical classifiers such as SVM. On four benchmark datasets, we demonstrate the benefit of using synthesised unseen data. Extensive experimental results show that our proposed approach significantly improves upon state-of-the-art results.

130 citations


Journal ArticleDOI
TL;DR: A general binary coding framework based on asymmetric hash functions, named asymmetric inner-product binary coding (AIBC), is introduced and extended to the supervised hashing scenario, where the inner products of learned binary codes are forced to fit the supervised similarities.
Abstract: Learning to hash has attracted broad research interests in recent computer vision and machine learning studies, due to its ability to accomplish efficient approximate nearest neighbor search. However, the closely related task, maximum inner product search (MIPS), has rarely been studied in this literature. To facilitate the MIPS study, in this paper, we introduce a general binary coding framework based on asymmetric hash functions, named asymmetric inner-product binary coding (AIBC). In particular, AIBC learns two different hash functions, which can reveal the inner products between original data vectors by the generated binary vectors. Although conceptually simple, the associated optimization is very challenging due to the highly nonsmooth nature of the objective that involves sign functions. We tackle the nonsmooth optimization in an alternating manner, by which each single coding function is optimized in an efficient discrete manner. We also simplify the objective by discarding the quadratic regularization term which significantly boosts the learning efficiency. Both problems are optimized in an effective discrete way without continuous relaxations, which produces high-quality hash codes. In addition, we extend the AIBC approach to the supervised hashing scenario, where the inner products of learned binary codes are forced to fit the supervised similarities. Extensive experiments on several benchmark image retrieval databases validate the superiority of the AIBC approaches over many recently proposed hashing algorithms.
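For intuition on the MIPS objective, when codes are viewed as vectors in {-1,+1}^B, the inner product that binary codes reveal can be computed directly from bit patterns. This is a hedged sketch of the code-level inner product only; the learned asymmetric hash functions themselves are not shown, and all names are illustrative:

```python
def code_inner_product(a: int, b: int, bits: int) -> int:
    """Inner product of two {-1,+1}^bits codes stored as bit patterns:
    matching bits contribute +1, differing bits contribute -1."""
    return bits - 2 * bin(a ^ b).count("1")

def mips_by_codes(query_code: int, db_codes: list[int], bits: int) -> int:
    """Index of the database code with the largest estimated inner product."""
    return max(range(len(db_codes)),
               key=lambda i: code_inner_product(query_code, db_codes[i], bits))

print(code_inner_product(0b1111, 0b1111, 4))  # 4 (identical codes)
print(code_inner_product(0b1111, 0b0000, 4))  # -4 (opposite codes)
```

The asymmetry in AIBC lies in using two different hash functions for the query and database sides, so that this code-level inner product better approximates the inner product of the original real-valued vectors.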

121 citations


Journal ArticleDOI
TL;DR: This paper proposes an unsupervised hashing method for semi-paired cross-view retrieval, dubbed semi-paired discrete hashing (SPDH), and explores the underlying structure of the constructed common latent subspace, where both paired and unpaired samples are well aligned.
Abstract: Due to the significant reduction in computational cost and storage, hashing techniques have gained increasing interest in facilitating large-scale cross-view retrieval tasks. Most cross-view hashing methods are developed by assuming that data from different views are well paired, e.g., text-image pairs. In real-world applications, however, this fully-paired multiview setting may not be practical. The more practical yet challenging semi-paired cross-view retrieval problem, where pairwise correspondences are only partially provided, has been less studied. In this paper, we propose an unsupervised hashing method for semi-paired cross-view retrieval, dubbed semi-paired discrete hashing (SPDH). Specifically, SPDH explores the underlying structure of the constructed common latent subspace, where both paired and unpaired samples are well aligned. To effectively preserve the similarities of semi-paired data in the latent subspace, we construct the cross-view similarity graph with the help of anchor data pairs. SPDH jointly learns the latent features and hash codes with a factorization-based coding scheme. For the formulated objective function, we devise an efficient alternating optimization algorithm, where the key binary code learning problem is solved in a bit-by-bit manner with each bit generated with a closed-form solution. The proposed method is extensively evaluated on four benchmark datasets with both fully-paired and semi-paired settings, and the results demonstrate the superiority of SPDH over several other state-of-the-art methods in terms of both accuracy and scalability.

110 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel method by exploiting the low-rankness of both the data representation and each occlusion-induced error image simultaneously, by which the global structure of data together with the error images can be well captured.

108 citations


Proceedings ArticleDOI
23 Oct 2017
TL;DR: This work proposes a novel Deep Asymmetric Pairwise Hashing approach (DAPH) for supervised hashing, and devise an efficient alternating algorithm to optimize the asymmetric deep hash functions and high-quality binary code jointly.
Abstract: Recently, deep neural networks based hashing methods have greatly improved the multimedia retrieval performance by simultaneously learning feature representations and binary hash functions. Inspired by the latest advance in the asymmetric hashing scheme, in this work, we propose a novel Deep Asymmetric Pairwise Hashing approach (DAPH) for supervised hashing. The core idea is that two deep convolutional models are jointly trained such that their output codes for a pair of images can well reveal the similarity indicated by their semantic labels. A pairwise loss is elaborately designed to preserve the pairwise similarities between images as well as incorporating the independence and balance hash code learning criteria. By taking advantage of the flexibility of asymmetric hash functions, we devise an efficient alternating algorithm to optimize the asymmetric deep hash functions and high-quality binary code jointly. Experiments on three image benchmarks show that DAPH achieves the state-of-the-art performance on large-scale image retrieval.

105 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A novel Unsupervised Cross-modal retrieval method based on Adversarial Learning, namely UCAL, is proposed, which adds an additional regularization by introducing adversarial learning and introduces a modality classifier to predict the modality of a transformed feature.
Abstract: The core of existing cross-modal retrieval approaches is to close the gap between different modalities either by finding a maximally correlated subspace or by jointly learning a set of dictionaries. However, the statistical characteristics of the transformed features were never considered. Inspired by recent advances in adversarial learning and domain adaptation, we propose a novel Unsupervised Cross-modal retrieval method based on Adversarial Learning, namely UCAL. In addition to maximizing the correlations between modalities, we add an additional regularization by introducing adversarial learning. In particular, we introduce a modality classifier to predict the modality of a transformed feature. This can be viewed as a regularization on the statistical aspect of the feature transforms, which ensures that the transformed features are also statistically indistinguishable. Experiments on popular multimodal datasets show that UCAL achieves competitive performance compared to state-of-the-art supervised cross-modal retrieval methods.

90 citations


Journal ArticleDOI
TL;DR: This work proposes to solve the formulated multi-instance learning problems by the cutting-plane and concave-convex procedure algorithms, and builds an image dataset with 20 categories that generalizes well to unseen target domains.
Abstract: Labeled image datasets have played a critical role in high-level image understanding. However, the process of manual labeling is both time-consuming and labor intensive. To reduce the cost of manual labeling, there has been increased research interest in automatically constructing image datasets by exploiting web images. Datasets constructed by existing methods tend to have a weak domain adaptation ability, which is known as the "dataset bias problem." To address this issue, we present a novel image dataset construction framework that generalizes well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpus to obtain a rich semantic description, from which the visually nonsalient and less relevant expansions are filtered out. By treating each selected expansion as a "bag" and the retrieved images as "instances," image selection can be formulated as a multi-instance learning problem with constrained positive bags. We solve the resulting optimization problems with the cutting-plane and concave-convex procedure algorithms. By using this approach, images from different distributions can be kept while noisy images are filtered out. To verify the effectiveness of our proposed approach, we build an image dataset with 20 categories. Extensive experiments on image classification, cross-dataset generalization, diversity comparison, and object detection demonstrate the domain robustness of our dataset.

Journal ArticleDOI
TL;DR: This paper introduces a novel supervised cross-modality hashing framework, which can generate unified binary codes for instances represented in different modalities and significantly outperforms the state-of-the-art multimodality hashing techniques.
Abstract: With the dramatic development of the Internet, how to exploit large-scale retrieval techniques for multimodal web data has become one of the most popular but challenging problems in computer vision and multimedia. Recently, hashing methods are used for fast nearest neighbor search in large-scale data spaces, by embedding high-dimensional feature descriptors into a similarity-preserving Hamming space with a low dimension. Inspired by this, in this paper, we introduce a novel supervised cross-modality hashing framework, which can generate unified binary codes for instances represented in different modalities. Particularly, in the learning phase, each bit of a code can be sequentially learned with a discrete optimization scheme that jointly minimizes its empirical loss based on a boosting strategy. In a bitwise manner, hash functions are then learned for each modality, mapping the corresponding representations into unified hash codes. We regard this approach as cross-modality sequential discrete hashing (CSDH), which can effectively reduce the quantization errors arising in the oversimplified rounding-off step and thus lead to high-quality binary codes. In the test phase, a simple fusion scheme is utilized to generate a unified hash code for final retrieval by merging the predicted hashing results of an unseen instance from different modalities. The proposed CSDH has been systematically evaluated on three standard data sets: Wiki, MIRFlickr, and NUS-WIDE, and the results show that our method significantly outperforms the state-of-the-art multimodality hashing techniques.

Journal ArticleDOI
TL;DR: A novel spectral clustering scheme is proposed which deeply explores cluster label properties, including discreteness, nonnegativity, and discrimination, as well as learns robust out-of-sample prediction functions and preserves the natural nonnegative characteristic of the clustering labels.
Abstract: Spectral clustering has been playing a vital role in various research areas. Most traditional spectral clustering algorithms comprise two independent stages (e.g., first learning continuous labels and then rounding the learned labels into discrete ones), which may cause unpredictable deviation of the resultant cluster labels from genuine ones, thereby leading to severe information loss and performance degradation. In this work, we study how to achieve discrete clustering as well as reliably generalize to unseen data. We propose a novel spectral clustering scheme which deeply explores cluster label properties, including discreteness, nonnegativity, and discrimination, as well as learns robust out-of-sample prediction functions. Specifically, we explicitly enforce a discrete transformation on the intermediate continuous labels, which leads to a tractable optimization problem with a discrete solution. Besides, we preserve the natural nonnegative characteristic of the clustering labels to enhance the interpretability of the results. Moreover, to further compensate for the unreliability of the learned clustering labels, we integrate an adaptive robust module with $\ell _{2,p}$ loss to learn a prediction function for grouping unseen data. We also show that the out-of-sample component can inject discriminative knowledge into the learning of cluster labels under certain conditions. Extensive experiments conducted on various data sets have demonstrated the superiority of our proposal as compared to several existing clustering approaches.
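The two-stage pipeline the abstract argues against (learn continuous cluster scores, then round them independently) can be sketched in a few lines; the deviation introduced by this post-hoc argmax rounding is exactly what a directly discrete formulation avoids. Names here are illustrative:

```python
def argmax_rounding(continuous_scores: list[list[float]]) -> list[int]:
    """Post-hoc rounding: assign each sample to the cluster with the highest
    continuous score. This independent per-row rounding is the stage that can
    drift from the genuine discrete solution."""
    return [row.index(max(row)) for row in continuous_scores]

# Three samples, two clusters: continuous relaxation scores per cluster.
scores = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
print(argmax_rounding(scores))  # [0, 1, 0]
```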

Proceedings Article
04 Feb 2017
TL;DR: Extensive experimental results demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
Abstract: Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
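The efficiency claim can be made concrete with a Lloyd-style sketch over binary codes: assignment uses the Hamming metric, and a binary cluster center can be updated by a bitwise majority vote. This is a hypothetical illustration of clustering in Hamming space, not the paper's joint code-and-cluster learning objective:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def assign(codes: list[int], centers: list[int]) -> list[int]:
    """Assign each binary code to its nearest binary center (Hamming metric)."""
    return [min(range(len(centers)), key=lambda k: hamming(c, centers[k]))
            for c in codes]

def majority_center(codes: list[int], bits: int) -> int:
    """Binary center of a cluster: bitwise majority vote over its codes."""
    center = 0
    for j in range(bits):
        if sum((c >> j) & 1 for c in codes) * 2 > len(codes):
            center |= 1 << j
    return center

cluster = [0b110, 0b100, 0b111]
print(bin(majority_center(cluster, 3)))  # 0b110
```

Both steps use only XOR, popcount, and bit tests, which is why storage and distance computation shrink so dramatically compared with real-valued k-means.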

Proceedings ArticleDOI
21 Jul 2017
TL;DR: A novel supervised hashing method, dubbed Discrete Semantic Ranking Hashing (DSeRH), which aims to directly embed semantic rank orders into binary codes with quadratic nonlinear ranking objective in an iterative manner and is guaranteed to converge quickly.
Abstract: Learning to hash has been recognized to accomplish highly efficient storage and retrieval for large-scale visual data. Particularly, ranking-based hashing techniques have recently attracted broad research attention because ranking accuracy among the retrieved data is well explored and their objective is more applicable to realistic search tasks. However, directly optimizing discrete hash codes without continuous-relaxations on a nonlinear ranking objective is infeasible by either traditional optimization methods or even recent discrete hashing algorithms. To address this challenging issue, in this paper, we introduce a novel supervised hashing method, dubbed Discrete Semantic Ranking Hashing (DSeRH), which aims to directly embed semantic rank orders into binary codes. In DSeRH, a generalized Adaptive Discrete Minimization (ADM) approach is proposed to discretely optimize binary codes with the quadratic nonlinear ranking objective in an iterative manner and is guaranteed to converge quickly. Additionally, instead of using 0/1 independent labels to form rank orders as in previous works, we generate the listwise rank orders from the high-level semantic word embeddings which can quantitatively capture the intrinsic correlation between different categories. We evaluate our DSeRH, coupled with both linear and deep convolutional neural network (CNN) hash functions, on three image datasets, i.e., CIFAR-10, SUN397 and ImageNet100, and the results manifest that DSeRH can outperform the state-of-the-art ranking-based hashing methods.

Journal ArticleDOI
TL;DR: This paper proposes an effective and robust scheme, termed robust multi-view semi-supervised learning (RMSL), for facilitating the image annotation task, and exploits both labeled images and unlabeled images to uncover the intrinsic data structural information.
Abstract: Driven by the rapid development of Internet and digital technologies, we have witnessed the explosive growth of Web images in recent years. Seeing that labels can reflect the semantic contents of the images, automatic image annotation, which can further facilitate the procedure of image semantic indexing, retrieval, and other image management tasks, has become one of the most crucial research directions in multimedia. Most existing annotation methods heavily rely on well-labeled training data (expensive to collect) and/or a single view of visual features (insufficient representative power). In this paper, inspired by the promising advance of feature engineering (e.g., CNN feature and scale-invariant feature transform feature) and inexhaustible image data (associated with noisy and incomplete labels) on the Web, we propose an effective and robust scheme, termed robust multi-view semi-supervised learning (RMSL), for facilitating the image annotation task. Specifically, we exploit both labeled images and unlabeled images to uncover the intrinsic data structural information. Meanwhile, to comprehensively describe an individual datum, we take advantage of the correlated and complementary information derived from multiple facets of image data (i.e., multiple views or features). We devise a robust pairwise constraint on outcomes of different views to achieve annotation consistency. Furthermore, we integrate a robust classifier learning component via $\ell _{2,p}$ loss, which can provide effective noise identification power during the learning process. Finally, we devise an efficient iterative algorithm to solve the optimization problem in RMSL. We conduct comprehensive experiments on three different data sets, and the results illustrate that our proposed approach is promising for automatic image annotation.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed novel image classification method obtains significantly better category hierarchies than other state-of-the-art visual tree-based methods and, therefore, much more accurate classification.
Abstract: We investigate the scalable image classification problem with a large number of categories. Hierarchical visual data structures are helpful for improving the efficiency and performance of large-scale multi-class classification. We propose a novel image classification method based on learning hierarchical inter-class structures. Specifically, we first design a fast algorithm to compute the similarity metric between categories, based on which a visual tree is constructed by hierarchical spectral clustering. Using the learned visual tree, a test sample label is efficiently predicted by searching for the best path over the entire tree. The proposed method is extensively evaluated on the ILSVRC2010 and Caltech 256 benchmark datasets. The experimental results show that our method obtains significantly better category hierarchies than other state-of-the-art visual tree-based methods and, therefore, much more accurate classification.

Proceedings ArticleDOI
07 Aug 2017
TL;DR: A generic formulation that significantly expedites the training and deployment of image classification models, particularly under the scenarios of many image categories and high feature dimensions, and proposes a novel bit-flipping procedure which enjoys high efficacy and a local optimality guarantee.
Abstract: This paper proposes a generic formulation that significantly expedites the training and deployment of image classification models, particularly under the scenarios of many image categories and high feature dimensions. As the core idea, our method represents both the images and learned classifiers using binary hash codes, which are simultaneously learned from the training data. Classifying an image thereby reduces to retrieving its nearest class codes in the Hamming space. Specifically, we formulate multiclass image classification as an optimization problem over binary variables. The optimization alternatingly proceeds over the binary classifiers and image hash codes. Profiting from the special property of binary codes, we show that the sub-problems can be efficiently solved through either a binary quadratic program (BQP) or a linear program. In particular, for attacking the BQP problem, we propose a novel bit-flipping procedure which enjoys high efficacy and a local optimality guarantee. Our formulation supports a large family of empirical loss functions and is, in particular, instantiated by exponential and linear losses. Comprehensive evaluations are conducted on several representative image benchmarks. The experiments consistently exhibit reduced computational and memory complexities of model training and deployment, without sacrificing classification accuracy.
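The classification rule described above (retrieve the nearest class code in the Hamming space) can be sketched in a few lines; the codes and labels here are hypothetical stand-ins for the learned classifiers:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def classify(image_code: int, class_codes: dict[str, int]) -> str:
    """Predict the label whose learned class code is nearest in Hamming space."""
    return min(class_codes, key=lambda label: hamming(image_code, class_codes[label]))

# Hypothetical 6-bit class codes learned jointly with the image codes.
class_codes = {"cat": 0b101101, "dog": 0b010010, "car": 0b110011}
print(classify(0b101100, class_codes))  # "cat": only one bit away
```

Since both images and classifiers live in the same Hamming space, the test-time cost per image is a handful of XOR/popcount operations per class rather than a dense dot product per class.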


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes a novel approach, named Partial Reconstructive Binary Coding (PRBC), for action analysis based on limited frame glimpses during any period of the complete execution, via a joint learning framework, which collaboratively tackles feature reconstruction as well as binary coding.
Abstract: Traditional action recognition methods aim to recognize actions with complete observations/executions. However, it is often difficult to capture fully executed actions due to occlusions, interruptions, etc. Meanwhile, predicting/recognizing an action in advance from partial observations is essential for preventing a situation from deteriorating. Besides, fast spotting of human activities from partially observed data is a critical ingredient of retrieval systems. Inspired by the recent success of data binarization in efficient retrieval/recognition, we propose a novel approach, named Partial Reconstructive Binary Coding (PRBC), for action analysis based on limited frame glimpses during any period of the complete execution. Specifically, we learn discriminative compact binary codes for partial actions via a joint learning framework, which collaboratively tackles feature reconstruction as well as binary coding. We obtain the solution to PRBC with a discrete alternating iteration algorithm. Extensive experiments on four realistic action datasets across three tasks (i.e., partial action retrieval, recognition, and prediction) clearly show the superiority of PRBC over state-of-the-art methods, along with significantly reduced memory load and computational cost during online testing.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes a multi-layer hierarchy for hashing, which fully exploits attributes to model the relationships among visual features, binary codes and labels, and deliberately preserves the nature of hash codes to the greatest extent.
Abstract: Hashing has been recognized as one of the most promising ways of indexing and retrieving high-dimensional data due to its excellent efficiency and effectiveness. Nevertheless, most existing approaches inevitably suffer from the problem of the "semantic gap", especially when facing the rapid evolution of newly emerging "unseen" categories on the Web. In this work, we propose an innovative approach, termed Attribute Hashing (AH), to facilitate zero-shot image retrieval (i.e., querying with "unseen" images). In particular, we propose a multi-layer hierarchy for hashing, which fully exploits attributes to model the relationships among visual features, binary codes, and labels. Besides, we deliberately preserve the nature of hash codes (i.e., discreteness and local structure) to the greatest extent. We conduct extensive experiments on several real-world image datasets to show the superiority of our proposed AH approach compared to state-of-the-art methods.

Posted Content
TL;DR: A novel image dataset construction framework employing multiple textual queries, which formulates the removal of noisy textual queries and the filtering of noisy images as a multi-view and a multi-instance learning problem, respectively.
Abstract: The availability of labeled image datasets has been shown to be critical for high-level image understanding, and it continuously drives the progress of feature design and model development. However, constructing labeled image datasets is laborious and monotonous. To eliminate manual annotation, in this work we propose a novel image dataset construction framework employing multiple textual queries. We aim to collect diverse and accurate images for given queries from the Web. Specifically, we formulate the removal of noisy textual queries and the filtering of noisy images as a multi-view and a multi-instance learning problem, respectively. Our proposed approach not only improves the accuracy but also enhances the diversity of the selected images. To verify the effectiveness of our proposed approach, we construct an image dataset with 100 categories. The experiments show significant performance gains from using the data generated by our approach on several tasks, such as image classification, cross-dataset generalization, and object detection. The proposed method also consistently outperforms existing weakly supervised and web-supervised approaches.

Posted Content
TL;DR: A new zero-shot learning framework that can synthesise visual features for unseen classes without acquiring real images, using the proposed Unseen Visual Data Synthesis (UVDS) algorithm, in which semantic attributes are effectively utilised as an intermediate clue to synthesise unseen visual features at the training stage.
Abstract: Robust object recognition systems usually rely on powerful feature extraction mechanisms trained on a large number of real images. However, in many realistic applications, collecting sufficient images for ever-growing new classes is unattainable. In this paper, we propose a new zero-shot learning (ZSL) framework that can synthesise visual features for unseen classes without acquiring real images. Using the proposed Unseen Visual Data Synthesis (UVDS) algorithm, semantic attributes are effectively utilised as an intermediate clue to synthesise unseen visual features at the training stage. Thereafter, ZSL recognition is converted into a conventional supervised problem, i.e. the synthesised visual features can be straightforwardly fed to typical classifiers such as SVM. On four benchmark datasets, we demonstrate the benefit of using synthesised unseen data. Extensive experimental results suggest that our proposed approach significantly improves on state-of-the-art results.
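The pipeline described above can be illustrated with a toy sketch (an assumed linear attribute-to-feature map, not the paper's UVDS algorithm): synthesise visual features for unseen classes from their semantic attributes, then treat recognition as an ordinary supervised problem. Here a simple nearest-class-mean rule stands in for the SVM mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 8))             # assumed attribute-to-feature map
unseen_attrs = rng.standard_normal((3, 5))  # 3 unseen classes, 5 attributes each

# Synthesise noisy "visual" training features for each unseen class, then
# fit the supervised rule (per-class means) on the synthetic data alone.
centres = unseen_attrs @ W
synthetic = centres[:, None, :] + 0.1 * rng.standard_normal((3, 20, 8))
class_means = synthetic.mean(axis=1)

def classify(x):
    """Nearest synthesised class mean, in place of an SVM."""
    return int(np.argmin(np.linalg.norm(class_means - x, axis=1)))

# A query drawn near the third unseen class is recognised without any
# real image of that class ever being seen.
query = centres[2] + 0.05 * rng.standard_normal(8)
print(classify(query))  # → 2
```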

Journal ArticleDOI
TL;DR: This paper proposes a novel multi-view classification method based on l2,p-norm regularization for Alzheimer's Disease (AD) diagnosis, which is experimentally demonstrated to enhance the performance of disease status classification compared to state-of-the-art methods.
Abstract: In present society, Alzheimer's disease (AD) is the most common form of dementia in elderly people and has become a major public health problem worldwide. In this paper, we propose a novel multi-view classification method based on l2,p-norm regularization for Alzheimer's Disease (AD) diagnosis. Unlike previous l2,1-norm regularized methods using concatenated multi-view features, we further consider the intra-structure and inter-structure relations between features of different views and use a more flexible l2,p-norm regularization in our objective function. We also propose a more suitable loss function to measure the discrepancy between labels and predicted values for the classification task. Experiments demonstrate that this method enhances the performance of disease status classification compared to state-of-the-art methods.
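For context, the l2,p-norm regularizer referred to above is commonly written as follows (a standard textbook definition, not a formula taken from the paper itself). For a projection matrix $W \in \mathbb{R}^{d \times c}$ with rows $w^i$, a generic regularized objective has the form

```latex
\|W\|_{2,p} = \Big( \sum_{i=1}^{d} \|w^i\|_2^{\,p} \Big)^{1/p},
\qquad
\min_{W}\; \mathcal{L}\big(Y,\, XW\big) + \lambda\, \|W\|_{2,p}^{\,p},
```

where $\mathcal{L}$ is the classification loss and $\lambda > 0$ balances fit against regularization. With $p = 1$ this reduces to the familiar l2,1 norm; choosing $0 < p < 1$ promotes row-sparsity of $W$ more aggressively, which is the flexibility the abstract alludes to.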

Journal ArticleDOI
TL;DR: A novel automatic image dataset construction framework is proposed by employing multiple query expansions; experiments indicate that the method is superior to weakly supervised and web-supervised state-of-the-art methods.

Proceedings ArticleDOI
06 Jun 2017
TL;DR: This work proposes a novel framework termed transductive visual-semantic embedding (TVSE) for ZSL that achieves competitive performance compared with the state-of-the-arts for zero-shot recognition and retrieval tasks.
Abstract: Zero-shot learning (ZSL) aims to bridge the knowledge transfer via available semantic representations (e.g., attributes) between labeled source instances of seen classes and unlabelled target instances of unseen classes. Most existing ZSL approaches achieve this by learning a projection from the visual feature space to the semantic representation space based on the source instances, and directly applying it to the target instances. However, the intrinsic manifold structures residing in both semantic representations and visual features are not effectively incorporated into the learned projection function. Moreover, these methods may suffer from the inherent projection shift problem, due to the disjointness between seen and unseen classes. To overcome these drawbacks, we propose a novel framework termed transductive visual-semantic embedding (TVSE) for ZSL. Specifically, TVSE first learns a latent embedding space to incorporate the manifold structures in both labeled source instances and unlabeled target instances under the transductive setting. In the learned space, each instance is viewed as a mixture of seen class scores. TVSE then effectively constructs the relational mapping between seen and unseen classes using the available semantic representations, and applies it to map the seen class scores of the target instances to their predictions of unseen classes. Extensive experiments on four benchmark datasets demonstrate that the proposed TVSE achieves competitive performance compared with the state-of-the-art for zero-shot recognition and retrieval tasks.

Journal ArticleDOI
TL;DR: A novel PQ method based on bilinear projection, which can well exploit the natural data structure and reduce the computational complexity, and achieves competitive retrieval and classification accuracies while having significant lower time and space complexities.
Abstract: Product quantization (PQ) has been recognized as a useful technique for encoding visual feature vectors into compact codes to reduce both storage and computation cost. Recent advances in retrieval and vision tasks indicate that high-dimensional descriptors are critical to ensuring high accuracy on large-scale data sets. However, optimizing PQ codes with high-dimensional data is extremely time- and memory-consuming. To solve this problem, in this paper we present a novel PQ method based on bilinear projection, which can well exploit the natural data structure and reduce the computational complexity. Specifically, we learn a global bilinear projection for PQ, for which we provide both non-parametric and parametric solutions. The non-parametric solution does not need any data distribution assumption. The parametric solution avoids the problem of local optima caused by random initialization and enjoys a theoretical error bound. We further extend this approach by learning locally bilinear projections to fit the underlying data distribution. We show by extensive experiments that our proposed method, dubbed bilinear optimization product quantization, achieves competitive retrieval and classification accuracy while having significantly lower time and space complexity.
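The encoding step of plain product quantization, which the bilinear method above builds on, can be sketched in a few lines (a minimal illustration of standard PQ with made-up codebooks, not the bilinear variant proposed in the paper): a vector is split into M subvectors and each is replaced by the index of its nearest codeword in a per-subspace codebook.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode x as M codeword indices, one per subspace.

    x: (D,) vector; codebooks: (M, K, D // M) array of K codewords per
    subspace. Storage drops from D floats to M small integers.
    """
    M = codebooks.shape[0]
    code = []
    for m, sub in enumerate(np.split(x, M)):
        # Nearest codeword in this subspace by Euclidean distance.
        code.append(int(np.argmin(np.linalg.norm(codebooks[m] - sub, axis=1))))
    return code

# Toy setup: D=4, M=2 subspaces, K=2 codewords each (values illustrative).
codebooks = np.array([[[0., 0.], [1., 1.]],
                      [[0., 1.], [1., 0.]]])
print(pq_encode(np.array([0.9, 1.1, 0.1, 0.8]), codebooks))  # → [1, 0]
```

At query time, distances are then approximated from per-subspace lookup tables rather than computed against full vectors, which is what makes PQ fast at scale.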

Journal ArticleDOI
TL;DR: This paper reduces the effective number of parameters in the learned projection matrix using a sparsity regularizer, which helps avoid overfitting and effectively lowers both storage and computation cost.

Journal ArticleDOI
TL;DR: An asymmetric binary coding strategy based on maximum inner product search (MIPS) is employed, which not only makes the binary coding functions easier to learn, but also preserves the non-linear characteristics of the neuron morphological data.