
Showing papers by "Fumin Shen published in 2017"


Journal ArticleDOI
TL;DR: A novel cross-modal hashing method, termed discrete cross-modal hashing (DCH), which directly learns discriminative binary codes while retaining the discrete constraints, and an effective discrete optimization algorithm is developed for DCH to jointly learn the modality-specific hash functions and the unified binary codes.
Abstract: Hashing based methods have attracted considerable attention for efficient cross-modal retrieval on large-scale multimedia data. The core problem of cross-modal hashing is how to learn compact binary codes that construct the underlying correlations between heterogeneous features from different modalities. A majority of recent approaches aim at learning hash functions to preserve the pairwise similarities defined by given class labels. However, these methods fail to explicitly explore the discriminative property of class labels during hash function learning. In addition, they usually discard the discrete constraints imposed on the to-be-learned binary codes, and compromise to solve a relaxed problem with quantization to obtain the approximate binary solution. Therefore, the binary codes generated by these methods are suboptimal and less discriminative to different classes. To overcome these drawbacks, we propose a novel cross-modal hashing method, termed discrete cross-modal hashing (DCH), which directly learns discriminative binary codes while retaining the discrete constraints. Specifically, DCH learns modality-specific hash functions for generating unified binary codes, and these binary codes are viewed as representative features for discriminative classification with class labels. An effective discrete optimization algorithm is developed for DCH to jointly learn the modality-specific hash function and the unified binary codes. Extensive experiments on three benchmark data sets highlight the superiority of DCH under various cross-modal scenarios and show its state-of-the-art performance.
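Once unified binary codes of this kind are learned, cross-modal retrieval reduces to ranking database codes by Hamming distance to the query code. A minimal illustrative sketch (codes stored as Python ints; names are hypothetical, not the authors' implementation):

```python
# Illustrative only: cross-modal retrieval with unified binary codes.
# A query from one modality (e.g., text) is hashed to a code and matched
# against database codes from another modality (e.g., images).

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query_code: int, db_codes: list[int]) -> list[int]:
    """Database indices sorted by Hamming distance to the query (nearest first)."""
    return sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))

db = [0b10110010, 0b10110011, 0b01001100]
print(rank_by_hamming(0b10110011, db))  # [1, 0, 2]: exact match ranked first
```

Because XOR-plus-popcount is far cheaper than continuous-valued distance computation, this step is what makes hashing attractive at scale.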

358 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper introduces a novel binary coding method, named Deep Sketch Hashing (DSH), where a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework, and is the first hashing work specifically designed for category-level SBIR with an end-to-end deep architecture.
Abstract: Free-hand sketch-based image retrieval (SBIR) is a specific cross-view retrieval task, in which queries are abstract and ambiguous sketches while the retrieval database is formed with natural images. Work in this area mainly focuses on extracting representative and shared features for sketches and natural images. However, these can neither cope well with the geometric distortion between sketches and images nor be feasible for large-scale SBIR due to the heavy continuous-valued distance computation. In this paper, we speed up SBIR by introducing a novel binary coding method, named Deep Sketch Hashing (DSH), where a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework. Specifically, three convolutional neural networks are utilized to encode free-hand sketches, natural images and, especially, the auxiliary sketch-tokens which are adopted as bridges to mitigate the sketch-image geometric distortion. The learned DSH codes can effectively capture the cross-view similarities as well as the intrinsic semantic correlations between different categories. To the best of our knowledge, DSH is the first hashing work specifically designed for category-level SBIR with an end-to-end deep architecture. The proposed DSH is comprehensively evaluated on two large-scale datasets, TU-Berlin Extension and Sketchy, and the experiments consistently show DSH's superior SBIR accuracies over several state-of-the-art methods, while achieving significantly reduced retrieval time and memory footprint.

221 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper explores zero-shot action recognition from a novel perspective by adopting the Error-Correcting Output Codes (dubbed ZSECOC), which equips the conventional ECOC with the additional capability of ZSAR, by addressing the domain shift problem.
Abstract: Recently, zero-shot action recognition (ZSAR) has emerged with the explosive growth of action categories. In this paper, we explore ZSAR from a novel perspective by adopting the Error-Correcting Output Codes (dubbed ZSECOC). Our ZSECOC equips the conventional ECOC with the additional capability of ZSAR, by addressing the domain shift problem. In particular, we learn discriminative ZSECOC for seen categories from both category-level semantics and intrinsic data structures. This procedure deals with domain shift implicitly by transferring the well-established correlations among seen categories to unseen ones. Moreover, a simple semantic transfer strategy is developed for explicitly transforming the learned embeddings of seen categories to better fit the underlying structure of unseen categories. As a consequence, our ZSECOC inherits the promising characteristics from ECOC as well as overcomes domain shift, making it more discriminative for ZSAR. We systematically evaluate ZSECOC on three realistic action benchmarks, i.e. Olympic Sports, HMDB51 and UCF101. The experimental results clearly show the superiority of ZSECOC over the state-of-the-art methods.

157 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: A novel projection framework for zero-shot learning based on matrix tri-factorization with manifold regularizations, which significantly outperforms the state of the art and devises an effective prediction scheme by exploiting the test-time manifold structure.
Abstract: Zero-shot learning (ZSL) aims to recognize objects of unseen classes with available training data from another set of seen classes. Existing solutions are focused on exploring knowledge transfer via an intermediate semantic embedding (e.g., attributes) shared between seen and unseen classes. In this paper, we propose a novel projection framework based on matrix tri-factorization with manifold regularizations. Specifically, we learn the semantic embedding projection by decomposing the visual feature matrix under the guidance of semantic embedding and class label matrices. By additionally introducing manifold regularizations on visual data and semantic embeddings, the learned projection can effectively capture the geometrical manifold structure residing in both visual and semantic spaces. To avoid the projection domain shift problem, we devise an effective prediction scheme by exploiting the test-time manifold structure. Extensive experiments on four benchmark datasets show that our approach significantly outperforms the state of the art, yielding an average improvement of 7.4% and 31.9% for the recognition and retrieval tasks, respectively.

136 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A new zero-shot learning framework that can synthesise visual features for unseen classes without acquiring real images is proposed, and extensive experimental results show that the proposed approach significantly improves on state-of-the-art results.
Abstract: Robust object recognition systems usually rely on powerful feature extraction mechanisms from a large number of real images. However, in many realistic applications, collecting sufficient images for ever-growing new classes is unattainable. In this paper, we propose a new zero-shot learning (ZSL) framework that can synthesise visual features for unseen classes without acquiring real images. Using the proposed Unseen Visual Data Synthesis (UVDS) algorithm, semantic attributes are effectively utilised as an intermediate clue to synthesise unseen visual features at the training stage. Thereafter, ZSL recognition is converted into a conventional supervised problem, i.e. the synthesised visual features can be straightforwardly fed to typical classifiers such as SVM. On four benchmark datasets, we demonstrate the benefit of using synthesised unseen data. Extensive experimental results show that our proposed approach significantly improves upon state-of-the-art results.

130 citations


Journal ArticleDOI
TL;DR: A general binary coding framework based on asymmetric hash functions, named asymmetric inner-product binary coding (AIBC), is introduced and extended to the supervised hashing scenario, where the inner products of learned binary codes are forced to fit the supervised similarities.
Abstract: Learning to hash has attracted broad research interests in recent computer vision and machine learning studies, due to its ability to accomplish efficient approximate nearest neighbor search. However, the closely related task, maximum inner product search (MIPS), has rarely been studied in this literature. To facilitate the MIPS study, in this paper, we introduce a general binary coding framework based on asymmetric hash functions, named asymmetric inner-product binary coding (AIBC). In particular, AIBC learns two different hash functions, which can reveal the inner products between original data vectors by the generated binary vectors. Although conceptually simple, the associated optimization is very challenging due to the highly nonsmooth nature of the objective that involves sign functions. We tackle the nonsmooth optimization in an alternating manner, by which each single coding function is optimized in an efficient discrete manner. We also simplify the objective by discarding the quadratic regularization term which significantly boosts the learning efficiency. Both problems are optimized in an effective discrete way without continuous relaxations, which produces high-quality hash codes. In addition, we extend the AIBC approach to the supervised hashing scenario, where the inner products of learned binary codes are forced to fit the supervised similarities. Extensive experiments on several benchmark image retrieval databases validate the superiority of the AIBC approaches over many recently proposed hashing algorithms.
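For intuition on the MIPS objective, when codes are viewed as vectors in {-1,+1}^B, the inner product that binary codes reveal can be computed directly from bit patterns. This is a hedged sketch of the code-level inner product only; the learned asymmetric hash functions themselves are not shown, and all names are illustrative:

```python
def code_inner_product(a: int, b: int, bits: int) -> int:
    """Inner product of two {-1,+1}^bits codes stored as bit patterns:
    matching bits contribute +1, differing bits contribute -1."""
    return bits - 2 * bin(a ^ b).count("1")

def mips_by_codes(query_code: int, db_codes: list[int], bits: int) -> int:
    """Index of the database code with the largest estimated inner product."""
    return max(range(len(db_codes)),
               key=lambda i: code_inner_product(query_code, db_codes[i], bits))

print(code_inner_product(0b1111, 0b1111, 4))  # 4 (identical codes)
print(code_inner_product(0b1111, 0b0000, 4))  # -4 (opposite codes)
```

The asymmetry in AIBC lies in using two different hash functions for the query and database sides, so that this code-level inner product better approximates the inner product of the original real-valued vectors.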

121 citations


Journal ArticleDOI
TL;DR: This paper proposes an unsupervised hashing method for semi-paired cross-view retrieval, dubbed semi-paired discrete hashing (SPDH), and explores the underlying structure of the constructed common latent subspace, where both paired and unpaired samples are well aligned.
Abstract: Due to the significant reduction in computational cost and storage, hashing techniques have gained increasing interest in facilitating large-scale cross-view retrieval tasks. Most cross-view hashing methods are developed by assuming that data from different views are well paired, e.g., text-image pairs. In real-world applications, however, this fully-paired multiview setting may not be practical. The more practical yet challenging semi-paired cross-view retrieval problem, where pairwise correspondences are only partially provided, has been less studied. In this paper, we propose an unsupervised hashing method for semi-paired cross-view retrieval, dubbed semi-paired discrete hashing (SPDH). Specifically, SPDH explores the underlying structure of the constructed common latent subspace, where both paired and unpaired samples are well aligned. To effectively preserve the similarities of semi-paired data in the latent subspace, we construct the cross-view similarity graph with the help of anchor data pairs. SPDH jointly learns the latent features and hash codes with a factorization-based coding scheme. For the formulated objective function, we devise an efficient alternating optimization algorithm, where the key binary code learning problem is solved in a bit-by-bit manner with each bit generated with a closed-form solution. The proposed method is extensively evaluated on four benchmark datasets with both fully-paired and semi-paired settings, and the results demonstrate the superiority of SPDH over several other state-of-the-art methods in terms of both accuracy and scalability.

110 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel method by exploiting the low-rankness of both the data representation and each occlusion-induced error image simultaneously, by which the global structure of data together with the error images can be well captured.

108 citations


Proceedings ArticleDOI
23 Oct 2017
TL;DR: This work proposes a novel Deep Asymmetric Pairwise Hashing approach (DAPH) for supervised hashing, and devise an efficient alternating algorithm to optimize the asymmetric deep hash functions and high-quality binary code jointly.
Abstract: Recently, deep neural networks based hashing methods have greatly improved the multimedia retrieval performance by simultaneously learning feature representations and binary hash functions. Inspired by the latest advance in the asymmetric hashing scheme, in this work, we propose a novel Deep Asymmetric Pairwise Hashing approach (DAPH) for supervised hashing. The core idea is that two deep convolutional models are jointly trained such that their output codes for a pair of images can well reveal the similarity indicated by their semantic labels. A pairwise loss is elaborately designed to preserve the pairwise similarities between images as well as incorporating the independence and balance hash code learning criteria. By taking advantage of the flexibility of asymmetric hash functions, we devise an efficient alternating algorithm to optimize the asymmetric deep hash functions and high-quality binary code jointly. Experiments on three image benchmarks show that DAPH achieves the state-of-the-art performance on large-scale image retrieval.

105 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A novel Unsupervised Cross-modal retrieval method based on Adversarial Learning, namely UCAL, is proposed, which adds an additional regularization by introducing adversarial learning and introduces a modality classifier to predict the modality of a transformed feature.
Abstract: The core of existing cross-modal retrieval approaches is to close the gap between different modalities either by finding a maximally correlated subspace or by jointly learning a set of dictionaries. However, the statistical characteristics of the transformed features were never considered. Inspired by recent advances in adversarial learning and domain adaptation, we propose a novel Unsupervised Cross-modal retrieval method based on Adversarial Learning, namely UCAL. In addition to maximizing the correlations between modalities, we add an additional regularization by introducing adversarial learning. In particular, we introduce a modality classifier to predict the modality of a transformed feature. This can be viewed as a regularization on the statistical aspect of the feature transforms, which ensures that the transformed features are also statistically indistinguishable. Experiments on popular multimodal datasets show that UCAL achieves competitive performance compared to state-of-the-art supervised cross-modal retrieval methods.

90 citations


Journal ArticleDOI
TL;DR: This work proposes to solve the formulated multi-instance learning problems by the cutting-plane and concave-convex procedure algorithms, and builds an image dataset with 20 categories that generalizes well to unseen target domains.
Abstract: Labeled image datasets have played a critical role in high-level image understanding. However, the process of manual labeling is both time-consuming and labor intensive. To reduce the cost of manual labeling, there has been increased research interest in automatically constructing image datasets by exploiting web images. Datasets constructed by existing methods tend to have a weak domain adaptation ability, which is known as the "dataset bias problem." To address this issue, we present a novel image dataset construction framework that generalizes well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpus to obtain a rich semantic description, from which the visually nonsalient and less relevant expansions are filtered out. By treating each selected expansion as a "bag" and the retrieved images as "instances," image selection can be formulated as a multi-instance learning problem with constrained positive bags. We solve the resulting optimization problems with the cutting-plane and concave-convex procedure algorithms. By using this approach, images from different distributions can be kept while noisy images are filtered out. To verify the effectiveness of our proposed approach, we build an image dataset with 20 categories. Extensive experiments on image classification, cross-dataset generalization, diversity comparison, and object detection demonstrate the domain robustness of our dataset.

Journal ArticleDOI
TL;DR: This paper introduces a novel supervised cross-modality hashing framework, which can generate unified binary codes for instances represented in different modalities and significantly outperforms the state-of-the-art multimodality hashing techniques.
Abstract: With the dramatic development of the Internet, how to exploit large-scale retrieval techniques for multimodal web data has become one of the most popular but challenging problems in computer vision and multimedia. Recently, hashing methods are used for fast nearest neighbor search in large-scale data spaces, by embedding high-dimensional feature descriptors into a similarity-preserving Hamming space with a low dimension. Inspired by this, in this paper, we introduce a novel supervised cross-modality hashing framework, which can generate unified binary codes for instances represented in different modalities. Particularly, in the learning phase, each bit of a code can be sequentially learned with a discrete optimization scheme that jointly minimizes its empirical loss based on a boosting strategy. In a bitwise manner, hash functions are then learned for each modality, mapping the corresponding representations into unified hash codes. We regard this approach as cross-modality sequential discrete hashing (CSDH), which can effectively reduce the quantization errors arising in the oversimplified rounding-off step and thus lead to high-quality binary codes. In the test phase, a simple fusion scheme is utilized to generate a unified hash code for final retrieval by merging the predicted hashing results of an unseen instance from different modalities. The proposed CSDH has been systematically evaluated on three standard data sets: Wiki, MIRFlickr, and NUS-WIDE, and the results show that our method significantly outperforms the state-of-the-art multimodality hashing techniques.

Journal ArticleDOI
TL;DR: A novel spectral clustering scheme is proposed which deeply explores cluster label properties, including discreteness, nonnegativity, and discrimination, as well as learns robust out-of-sample prediction functions and preserves the natural nonnegative characteristic of the clustering labels.
Abstract: Spectral clustering has been playing a vital role in various research areas. Most traditional spectral clustering algorithms comprise two independent stages (e.g., first learning continuous labels and then rounding the learned labels into discrete ones), which may cause unpredictable deviation of the resultant cluster labels from genuine ones, thereby leading to severe information loss and performance degradation. In this work, we study how to achieve discrete clustering as well as reliably generalize to unseen data. We propose a novel spectral clustering scheme which deeply explores cluster label properties, including discreteness, nonnegativity, and discrimination, as well as learns robust out-of-sample prediction functions. Specifically, we explicitly enforce a discrete transformation on the intermediate continuous labels, which leads to a tractable optimization problem with a discrete solution. Besides, we preserve the natural nonnegative characteristic of the clustering labels to enhance the interpretability of the results. Moreover, to further compensate for the unreliability of the learned clustering labels, we integrate an adaptive robust module with $\ell _{2,p}$ loss to learn a prediction function for grouping unseen data. We also show that the out-of-sample component can inject discriminative knowledge into the learning of cluster labels under certain conditions. Extensive experiments conducted on various data sets have demonstrated the superiority of our proposal as compared to several existing clustering approaches.
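The two-stage pipeline the abstract argues against (learn continuous cluster scores, then round them independently) can be sketched in a few lines; the deviation introduced by this post-hoc argmax rounding is exactly what a directly discrete formulation avoids. Names here are illustrative:

```python
def argmax_rounding(continuous_scores: list[list[float]]) -> list[int]:
    """Post-hoc rounding: assign each sample to the cluster with the highest
    continuous score. This independent per-row rounding is the stage that can
    drift from the genuine discrete solution."""
    return [row.index(max(row)) for row in continuous_scores]

# Three samples, two clusters: continuous relaxation scores per cluster.
scores = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
print(argmax_rounding(scores))  # [0, 1, 0]
```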

Proceedings Article
04 Feb 2017
TL;DR: Extensive experimental results demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
Abstract: Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
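The efficiency claim can be made concrete with a Lloyd-style sketch over binary codes: assignment uses the Hamming metric, and a binary cluster center can be updated by a bitwise majority vote. This is a hypothetical illustration of clustering in Hamming space, not the paper's joint code-and-cluster learning objective:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def assign(codes: list[int], centers: list[int]) -> list[int]:
    """Assign each binary code to its nearest binary center (Hamming metric)."""
    return [min(range(len(centers)), key=lambda k: hamming(c, centers[k]))
            for c in codes]

def majority_center(codes: list[int], bits: int) -> int:
    """Binary center of a cluster: bitwise majority vote over its codes."""
    center = 0
    for j in range(bits):
        if sum((c >> j) & 1 for c in codes) * 2 > len(codes):
            center |= 1 << j
    return center

cluster = [0b110, 0b100, 0b111]
print(bin(majority_center(cluster, 3)))  # 0b110
```

Both steps use only XOR, popcount, and bit tests, which is why storage and distance computation shrink so dramatically compared with real-valued k-means.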

Proceedings ArticleDOI
21 Jul 2017
TL;DR: A novel supervised hashing method, dubbed Discrete Semantic Ranking Hashing (DSeRH), which aims to directly embed semantic rank orders into binary codes with quadratic nonlinear ranking objective in an iterative manner and is guaranteed to converge quickly.
Abstract: Learning to hash has been recognized to accomplish highly efficient storage and retrieval for large-scale visual data. Particularly, ranking-based hashing techniques have recently attracted broad research attention because ranking accuracy among the retrieved data is well explored and their objective is more applicable to realistic search tasks. However, directly optimizing discrete hash codes without continuous-relaxations on a nonlinear ranking objective is infeasible by either traditional optimization methods or even recent discrete hashing algorithms. To address this challenging issue, in this paper, we introduce a novel supervised hashing method, dubbed Discrete Semantic Ranking Hashing (DSeRH), which aims to directly embed semantic rank orders into binary codes. In DSeRH, a generalized Adaptive Discrete Minimization (ADM) approach is proposed to discretely optimize binary codes with the quadratic nonlinear ranking objective in an iterative manner and is guaranteed to converge quickly. Additionally, instead of using 0/1 independent labels to form rank orders as in previous works, we generate the listwise rank orders from the high-level semantic word embeddings which can quantitatively capture the intrinsic correlation between different categories. We evaluate our DSeRH, coupled with both linear and deep convolutional neural network (CNN) hash functions, on three image datasets, i.e., CIFAR-10, SUN397 and ImageNet100, and the results manifest that DSeRH can outperform the state-of-the-art ranking-based hashing methods.

Journal ArticleDOI
TL;DR: This paper proposes an effective and robust scheme, termed robust multi-view semi-supervised learning (RMSL), for facilitating the image annotation task, and exploits both labeled images and unlabeled images to uncover the intrinsic data structural information.
Abstract: Driven by the rapid development of Internet and digital technologies, we have witnessed the explosive growth of Web images in recent years. Seeing that labels can reflect the semantic contents of the images, automatic image annotation, which can further facilitate the procedure of image semantic indexing, retrieval, and other image management tasks, has become one of the most crucial research directions in multimedia. Most existing annotation methods heavily rely on well-labeled training data (expensive to collect) and/or a single view of visual features (insufficient representative power). In this paper, inspired by the promising advance of feature engineering (e.g., CNN feature and scale-invariant feature transform feature) and inexhaustible image data (associated with noisy and incomplete labels) on the Web, we propose an effective and robust scheme, termed robust multi-view semi-supervised learning (RMSL), for facilitating the image annotation task. Specifically, we exploit both labeled images and unlabeled images to uncover the intrinsic data structural information. Meanwhile, to comprehensively describe an individual datum, we take advantage of the correlated and complementary information derived from multiple facets of image data (i.e., multiple views or features). We devise a robust pairwise constraint on outcomes of different views to achieve annotation consistency. Furthermore, we integrate a robust classifier learning component via $\ell _{2,p}$ loss, which can provide effective noise identification power during the learning process. Finally, we devise an efficient iterative algorithm to solve the optimization problem in RMSL. We conduct comprehensive experiments on three different data sets, and the results illustrate that our proposed approach is promising for automatic image annotation.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed novel image classification method obtains significantly better category hierarchies than other state-of-the-art visual tree-based methods and, therefore, much more accurate classification.
Abstract: We investigate the scalable image classification problem with a large number of categories. Hierarchical visual data structures are helpful for improving the efficiency and performance of large-scale multi-class classification. We propose a novel image classification method based on learning hierarchical inter-class structures. Specifically, we first design a fast algorithm to compute the similarity metric between categories, based on which a visual tree is constructed by hierarchical spectral clustering. Using the learned visual tree, a test sample label is efficiently predicted by searching for the best path over the entire tree. The proposed method is extensively evaluated on the ILSVRC2010 and Caltech 256 benchmark datasets. The experimental results show that our method obtains significantly better category hierarchies than other state-of-the-art visual tree-based methods and, therefore, much more accurate classification.

Proceedings ArticleDOI
07 Aug 2017
TL;DR: A generic formulation that significantly expedites the training and deployment of image classification models, particularly under the scenarios of many image categories and high feature dimensions, and proposes a novel bit-flipping procedure which enjoys high efficacy and a local optimality guarantee.
Abstract: This paper proposes a generic formulation that significantly expedites the training and deployment of image classification models, particularly under the scenarios of many image categories and high feature dimensions. As the core idea, our method represents both the images and learned classifiers using binary hash codes, which are simultaneously learned from the training data. Classifying an image thereby reduces to retrieving its nearest class codes in the Hamming space. Specifically, we formulate multiclass image classification as an optimization problem over binary variables. The optimization alternatingly proceeds over the binary classifiers and image hash codes. Profiting from the special property of binary codes, we show that the sub-problems can be efficiently solved through either a binary quadratic program (BQP) or a linear program. In particular, for attacking the BQP problem, we propose a novel bit-flipping procedure which enjoys high efficacy and a local optimality guarantee. Our formulation supports a large family of empirical loss functions and is, in particular, instantiated by exponential and linear losses. Comprehensive evaluations are conducted on several representative image benchmarks. The experiments consistently exhibit reduced computational and memory complexities of model training and deployment, without sacrificing classification accuracy.
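The classification rule described above (retrieve the nearest class code in the Hamming space) can be sketched in a few lines; the codes and labels here are hypothetical stand-ins for the learned classifiers:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def classify(image_code: int, class_codes: dict[str, int]) -> str:
    """Predict the label whose learned class code is nearest in Hamming space."""
    return min(class_codes, key=lambda label: hamming(image_code, class_codes[label]))

# Hypothetical 6-bit class codes learned jointly with the image codes.
class_codes = {"cat": 0b101101, "dog": 0b010010, "car": 0b110011}
print(classify(0b101100, class_codes))  # "cat": only one bit away
```

Since both images and classifiers live in the same Hamming space, the test-time cost per image is a handful of XOR/popcount operations per class rather than a dense dot product per class.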


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes a novel approach, named Partial Reconstructive Binary Coding (PRBC), for action analysis based on limited frame glimpses during any period of the complete execution, via a joint learning framework, which collaboratively tackles feature reconstruction as well as binary coding.
Abstract: Traditional action recognition methods aim to recognize actions with complete observations/executions. However, it is often difficult to capture fully executed actions due to occlusions, interruptions, etc. Meanwhile, predicting/recognizing an action in advance from partial observations is essential for preventing a situation from deteriorating. Besides, fast spotting of human activities from partially observed data is a critical ingredient of retrieval systems. Inspired by the recent success of data binarization in efficient retrieval/recognition, we propose a novel approach, named Partial Reconstructive Binary Coding (PRBC), for action analysis based on limited frame glimpses during any period of the complete execution. Specifically, we learn discriminative compact binary codes for partial actions via a joint learning framework, which collaboratively tackles feature reconstruction as well as binary coding. We obtain the solution to PRBC with a discrete alternating iteration algorithm. Extensive experiments on four realistic action datasets across three tasks (i.e., partial action retrieval, recognition, and prediction) clearly show the superiority of PRBC over state-of-the-art methods, along with significantly reduced memory load and computational cost during online testing.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes a multi-layer hierarchy for hashing, which fully exploits attributes to model the relationships among visual features, binary codes and labels, and deliberately preserves the nature of hash codes to the greatest extent.
Abstract: Hashing has been recognized as one of the most promising ways of indexing and retrieving high-dimensional data due to its excellent efficiency and effectiveness. Nevertheless, most existing approaches inevitably suffer from the problem of the "semantic gap", especially when facing the rapid evolution of newly emerging "unseen" categories on the Web. In this work, we propose an innovative approach, termed Attribute Hashing (AH), to facilitate zero-shot image retrieval (i.e., querying with "unseen" images). In particular, we propose a multi-layer hierarchy for hashing, which fully exploits attributes to model the relationships among visual features, binary codes, and labels. Besides, we deliberately preserve the nature of hash codes (i.e., discreteness and local structure) to the greatest extent. We conduct extensive experiments on several real-world image datasets to show the superiority of our proposed AH approach compared to state-of-the-art methods.

Posted Content
TL;DR: A novel image dataset construction framework employing multiple textual queries, which formulates the removal of noisy textual queries and the filtering of noisy images as a multi-view and a multi-instance learning problem, respectively.
Abstract: The availability of labeled image datasets has been shown to be critical for high-level image understanding, and it continuously drives the progress of feature design and model development. However, constructing labeled image datasets is laborious and monotonous. To eliminate manual annotation, in this work we propose a novel image dataset construction framework employing multiple textual queries. We aim to collect diverse and accurate images for given queries from the Web. Specifically, we formulate the removal of noisy textual queries and the filtering of noisy images as a multi-view and a multi-instance learning problem, respectively. Our proposed approach not only improves the accuracy but also enhances the diversity of the selected images. To verify the effectiveness of our proposed approach, we construct an image dataset with 100 categories. The experiments show significant performance gains from using the data generated by our approach on several tasks, such as image classification, cross-dataset generalization, and object detection. The proposed method also consistently outperforms existing weakly supervised and web-supervised approaches.

Posted Content
TL;DR: A new zero-shot learning framework that can synthesise visual features for unseen classes without acquiring real images, using the proposed Unseen Visual Data Synthesis (UVDS) algorithm, in which semantic attributes are effectively utilised as an intermediate clue to synthesise unseen visual features at the training stage.
Abstract: Robust object recognition systems usually rely on powerful feature extraction mechanisms trained on a large number of real images. However, in many realistic applications, collecting sufficient images for ever-growing new classes is unattainable. In this paper, we propose a new zero-shot learning (ZSL) framework that can synthesise visual features for unseen classes without acquiring real images. Using the proposed Unseen Visual Data Synthesis (UVDS) algorithm, semantic attributes are effectively utilised as an intermediate clue to synthesise unseen visual features at the training stage. Thereafter, ZSL recognition is converted into a conventional supervised problem, i.e. the synthesised visual features can be straightforwardly fed to typical classifiers such as SVM. On four benchmark datasets, we demonstrate the benefit of using synthesised unseen data. Extensive experimental results suggest that our proposed approach significantly improves on state-of-the-art results.
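The pipeline described above can be illustrated with a toy sketch (an assumed linear attribute-to-feature map, not the paper's UVDS algorithm): synthesise visual features for unseen classes from their semantic attributes, then treat recognition as an ordinary supervised problem. Here a simple nearest-class-mean rule stands in for the SVM mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 8))             # assumed attribute-to-feature map
unseen_attrs = rng.standard_normal((3, 5))  # 3 unseen classes, 5 attributes each

# Synthesise noisy "visual" training features for each unseen class, then
# fit the supervised rule (per-class means) on the synthetic data alone.
centres = unseen_attrs @ W
synthetic = centres[:, None, :] + 0.1 * rng.standard_normal((3, 20, 8))
class_means = synthetic.mean(axis=1)

def classify(x):
    """Nearest synthesised class mean, in place of an SVM."""
    return int(np.argmin(np.linalg.norm(class_means - x, axis=1)))

# A query drawn near the third unseen class is recognised without any
# real image of that class ever being seen.
query = centres[2] + 0.05 * rng.standard_normal(8)
print(classify(query))  # → 2
```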

Journal ArticleDOI
TL;DR: This paper proposes a novel multi-view classification method based on l2,p-norm regularization for Alzheimer's Disease (AD) diagnosis, which is experimentally demonstrated to enhance the performance of disease status classification compared to state-of-the-art methods.
Abstract: In present society, Alzheimer's disease (AD) is the most common form of dementia in elderly people and has become a major public health problem worldwide. In this paper, we propose a novel multi-view classification method based on l2,p-norm regularization for Alzheimer's Disease (AD) diagnosis. Unlike previous l2,1-norm regularized methods using concatenated multi-view features, we further consider the intra-structure and inter-structure relations between features of different views and use a more flexible l2,p-norm regularization in our objective function. We also propose a more suitable loss function to measure the discrepancy between labels and predicted values for the classification task. Experiments demonstrate that this method enhances the performance of disease status classification compared to state-of-the-art methods.
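For context, the l2,p-norm regularizer referred to above is commonly written as follows (a standard textbook definition, not a formula taken from the paper itself). For a projection matrix $W \in \mathbb{R}^{d \times c}$ with rows $w^i$, a generic regularized objective has the form

```latex
\|W\|_{2,p} = \Big( \sum_{i=1}^{d} \|w^i\|_2^{\,p} \Big)^{1/p},
\qquad
\min_{W}\; \mathcal{L}\big(Y,\, XW\big) + \lambda\, \|W\|_{2,p}^{\,p},
```

where $\mathcal{L}$ is the classification loss and $\lambda > 0$ balances fit against regularization. With $p = 1$ this reduces to the familiar l2,1 norm; choosing $0 < p < 1$ promotes row-sparsity of $W$ more aggressively, which is the flexibility the abstract alludes to.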

Journal ArticleDOI
TL;DR: A novel automatic image dataset construction framework is proposed by employing multiple query expansions; experiments indicate that the method is superior to weakly supervised and web-supervised state-of-the-art methods.

Proceedings ArticleDOI
06 Jun 2017
TL;DR: This work proposes a novel framework termed transductive visual-semantic embedding (TVSE) for ZSL that achieves competitive performance compared with the state-of-the-arts for zero-shot recognition and retrieval tasks.
Abstract: Zero-shot learning (ZSL) aims to bridge the knowledge transfer via available semantic representations (e.g., attributes) between labeled source instances of seen classes and unlabelled target instances of unseen classes. Most existing ZSL approaches achieve this by learning a projection from the visual feature space to the semantic representation space based on the source instances, and directly applying it to the target instances. However, the intrinsic manifold structures residing in both semantic representations and visual features are not effectively incorporated into the learned projection function. Moreover, these methods may suffer from the inherent projection shift problem, due to the disjointness between seen and unseen classes. To overcome these drawbacks, we propose a novel framework termed transductive visual-semantic embedding (TVSE) for ZSL. Specifically, TVSE first learns a latent embedding space to incorporate the manifold structures in both labeled source instances and unlabeled target instances under the transductive setting. In the learned space, each instance is viewed as a mixture of seen class scores. TVSE then effectively constructs the relational mapping between seen and unseen classes using the available semantic representations, and applies it to map the seen class scores of the target instances to their predictions of unseen classes. Extensive experiments on four benchmark datasets demonstrate that the proposed TVSE achieves competitive performance compared with the state-of-the-art for zero-shot recognition and retrieval tasks.

Journal ArticleDOI
TL;DR: A novel PQ method based on bilinear projection, which can well exploit the natural data structure and reduce the computational complexity, and achieves competitive retrieval and classification accuracies while having significant lower time and space complexities.
Abstract: Product quantization (PQ) has been recognized as a useful technique for encoding visual feature vectors into compact codes to reduce both storage and computation cost. Recent advances in retrieval and vision tasks indicate that high-dimensional descriptors are critical to ensuring high accuracy on large-scale data sets. However, optimizing PQ codes with high-dimensional data is extremely time- and memory-consuming. To solve this problem, in this paper we present a novel PQ method based on bilinear projection, which can well exploit the natural data structure and reduce the computational complexity. Specifically, we learn a global bilinear projection for PQ, for which we provide both non-parametric and parametric solutions. The non-parametric solution does not need any data distribution assumption. The parametric solution avoids the problem of local optima caused by random initialization and enjoys a theoretical error bound. We further extend this approach by learning locally bilinear projections to fit the underlying data distribution. We show by extensive experiments that our proposed method, dubbed bilinear optimization product quantization, achieves competitive retrieval and classification accuracy while having significantly lower time and space complexity.
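The encoding step of plain product quantization, which the bilinear method above builds on, can be sketched in a few lines (a minimal illustration of standard PQ with made-up codebooks, not the bilinear variant proposed in the paper): a vector is split into M subvectors and each is replaced by the index of its nearest codeword in a per-subspace codebook.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode x as M codeword indices, one per subspace.

    x: (D,) vector; codebooks: (M, K, D // M) array of K codewords per
    subspace. Storage drops from D floats to M small integers.
    """
    M = codebooks.shape[0]
    code = []
    for m, sub in enumerate(np.split(x, M)):
        # Nearest codeword in this subspace by Euclidean distance.
        code.append(int(np.argmin(np.linalg.norm(codebooks[m] - sub, axis=1))))
    return code

# Toy setup: D=4, M=2 subspaces, K=2 codewords each (values illustrative).
codebooks = np.array([[[0., 0.], [1., 1.]],
                      [[0., 1.], [1., 0.]]])
print(pq_encode(np.array([0.9, 1.1, 0.1, 0.8]), codebooks))  # → [1, 0]
```

At query time, distances are then approximated from per-subspace lookup tables rather than computed against full vectors, which is what makes PQ fast at scale.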

Journal ArticleDOI
TL;DR: This paper reduces the effective number of parameters in the learned projection matrix using a sparsity regularizer, which helps avoid overfitting and effectively lowers both storage and computation cost.

Journal ArticleDOI
TL;DR: An asymmetric binary coding strategy based on maximum inner product search (MIPS) is employed, which not only makes the binary coding functions easier to learn, but also preserves the non-linear characteristics of the neuron morphological data.