Author

Pan Yan

Bio: Pan Yan is an academic researcher from Sun Yat-sen University. The author has contributed to research in the topics of image retrieval and hash functions, has an h-index of 1, and has co-authored 2 publications receiving 75 citations.

Papers
Journal ArticleDOI
TL;DR: A deep architecture is proposed that learns instance-aware image representations for multi-label image data, organized in multiple groups, with each group containing the features for one category.
Abstract: Similarity-preserving hashing is a commonly used method for nearest neighbor search in large-scale image retrieval. For image retrieval, deep-network-based hashing methods are appealing, since they can simultaneously learn effective image representations and compact hash codes. This paper focuses on deep-network-based hashing for multi-label images, each of which may contain objects of multiple categories. In most existing hashing methods, each image is represented by one piece of hash code, which is referred to as semantic hashing. This setting may be suboptimal for multi-label image retrieval. To solve this problem, we propose a deep architecture that learns instance-aware image representations for multi-label image data, which are organized in multiple groups, with each group containing the features for one category. The instance-aware representations not only bring advantages to semantic hashing but also can be used in category-aware hashing, in which an image is represented by multiple pieces of hash codes and each piece of code corresponds to a category. Extensive evaluations conducted on several benchmark data sets demonstrate that for both the semantic hashing and the category-aware hashing, the proposed method shows substantial improvement over the state-of-the-art supervised and unsupervised hashing methods.

96 citations
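
To make the category-aware hashing described above concrete, here is a minimal NumPy sketch: each image carries one binary code per category, derived from its group of instance-aware features, and retrieval for a given category compares only that category's codes via Hamming distance. The random projections and the helper names (hash_group, category_distance) are illustrative assumptions standing in for the paper's learned deep hash functions.

```python
# Minimal sketch of category-aware hashing: one piece of hash code per
# category, produced from that category's group of features.
import numpy as np

rng = np.random.default_rng(0)
NUM_CATEGORIES, FEAT_DIM, CODE_BITS = 5, 128, 48

# One random projection per category stands in for the learned deep
# hash functions in the paper (an illustrative assumption).
projections = rng.standard_normal((NUM_CATEGORIES, FEAT_DIM, CODE_BITS))

def hash_group(features: np.ndarray) -> np.ndarray:
    """Map per-category features (C, D) to binary codes (C, B)."""
    return (np.einsum("cd,cdb->cb", features, projections) > 0).astype(np.uint8)

def category_distance(query_codes, db_codes, category):
    """Hamming distance between the codes of a single category."""
    return np.count_nonzero(query_codes[category] != db_codes[category])

query = hash_group(rng.standard_normal((NUM_CATEGORIES, FEAT_DIM)))
image = hash_group(rng.standard_normal((NUM_CATEGORIES, FEAT_DIM)))
print(category_distance(query, image, category=2))  # distance for category 2
```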

Journal ArticleDOI
TL;DR: A deep-network-based hashing method is proposed for multi-label image retrieval, in which an image is represented by multiple pieces of hash codes and each piece of code corresponds to a category.
Abstract: Similarity-preserving hashing is a commonly used method for nearest neighbour search in large-scale image retrieval. For image retrieval, deep-networks-based hashing methods are appealing since they can simultaneously learn effective image representations and compact hash codes. This paper focuses on deep-networks-based hashing for multi-label images, each of which may contain objects of multiple categories. In most existing hashing methods, each image is represented by one piece of hash code, which is referred to as semantic hashing. This setting may be suboptimal for multi-label image retrieval. To solve this problem, we propose a deep architecture that learns instance-aware image representations for multi-label image data, which are organized in multiple groups, with each group containing the features for one category. The instance-aware representations not only bring advantages to semantic hashing, but also can be used in category-aware hashing, in which an image is represented by multiple pieces of hash codes and each piece of code corresponds to a category. Extensive evaluations conducted on several benchmark datasets demonstrate that, for both semantic hashing and category-aware hashing, the proposed method shows substantial improvement over the state-of-the-art supervised and unsupervised hashing methods.

39 citations


Cited by
Journal ArticleDOI
TL;DR: The selective convolutional descriptor aggregation (SCDA) method is proposed, which first localizes the main object in fine-grained images and then aggregates the selected descriptors into a short feature vector.
Abstract: Deep convolutional neural network models pre-trained for the ImageNet classification task have been successfully adopted to tasks in other domains, such as texture description and object proposal generation, but these tasks require annotations for images in the new domain. In this paper, we focus on a novel and challenging task in the pure unsupervised setting: fine-grained image retrieval. Even with image labels, fine-grained images are difficult to classify, let alone retrieve in the unsupervised setting. We propose the selective convolutional descriptor aggregation (SCDA) method. The SCDA first localizes the main object in fine-grained images, a step that discards the noisy background and keeps useful deep descriptors. The selected descriptors are then aggregated and the dimensionality is reduced into a short feature vector using the best practices we found. The SCDA is unsupervised, using no image label or bounding box annotation. Experiments on six fine-grained data sets confirm the effectiveness of the SCDA for fine-grained image retrieval. In addition, visualization of the SCDA features shows that they correspond to visual attributes (even subtle ones), which might explain SCDA's high mean average precision in fine-grained retrieval. Moreover, on general image retrieval data sets, the SCDA achieves retrieval results comparable to the state-of-the-art general image retrieval approaches.

354 citations
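
The SCDA pipeline sketched in the abstract (localize, select, aggregate) can be illustrated roughly in NumPy as below: threshold the channel-summed activation map to localize the object, keep only the descriptors inside the mask, and aggregate them by average and max pooling. The threshold-at-the-mean rule is an assumed simplification; details such as keeping the largest connected component and the final dimensionality reduction are omitted.

```python
# Rough sketch of SCDA feature extraction from a CNN activation tensor.
import numpy as np

def scda_feature(conv_maps: np.ndarray) -> np.ndarray:
    """conv_maps: (H, W, D) deep descriptors from a pre-trained CNN."""
    activation = conv_maps.sum(axis=-1)           # (H, W) aggregation map
    mask = activation > activation.mean()         # keep salient positions only
    selected = conv_maps[mask]                    # (N, D) selected descriptors
    avg_pool = selected.mean(axis=0)              # aggregate by average pooling
    max_pool = selected.max(axis=0)               # ... and by max pooling
    feat = np.concatenate([avg_pool, max_pool])   # (2D,) short feature vector
    return feat / (np.linalg.norm(feat) + 1e-12)  # L2-normalize for retrieval

rng = np.random.default_rng(0)
print(scda_feature(rng.random((14, 14, 512))).shape)  # (1024,)
```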

Journal ArticleDOI
TL;DR: The proposed deep label distribution learning (DLDL) method effectively utilizes the label ambiguity in both feature learning and classifier learning, which helps prevent the network from overfitting even when the training set is small.
Abstract: Convolutional neural networks (ConvNets) have achieved excellent recognition performance in various visual recognition tasks. A large labeled training set is one of the most important factors for their success. However, it is difficult to collect sufficient training images with precise labels in some domains, such as apparent age estimation, head pose estimation, multi-label classification, and semantic segmentation. Fortunately, there is ambiguous information among labels, which makes these tasks different from traditional classification. Based on this observation, we convert the label of each image into a discrete label distribution, and learn the label distribution by minimizing a Kullback–Leibler divergence between the predicted and ground-truth label distributions using deep ConvNets. The proposed deep label distribution learning (DLDL) method effectively utilizes the label ambiguity in both feature learning and classifier learning, which helps prevent the network from overfitting even when the training set is small. Experimental results show that the proposed approach produces significantly better results than the state-of-the-art methods for age estimation and head pose estimation. At the same time, it also improves recognition performance for multi-label classification and semantic segmentation tasks.

290 citations
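
A small sketch of the DLDL recipe from the abstract, using age estimation as the running case: a scalar label becomes a discrete distribution over age bins, and predictions are scored with the Kullback–Leibler divergence the network minimizes during training. The Gaussian construction and the sigma value are assumed here for illustration, not taken from the paper.

```python
# Label distribution construction and KL loss, the core of DLDL.
import numpy as np

AGES = np.arange(0, 101)  # discrete label set: ages 0..100

def label_distribution(age: float, sigma: float = 2.0) -> np.ndarray:
    """Discrete Gaussian label distribution centered at the true age
    (an assumed, commonly used construction)."""
    d = np.exp(-0.5 * ((AGES - age) / sigma) ** 2)
    return d / d.sum()

def kl_divergence(target: np.ndarray, pred: np.ndarray) -> float:
    """KL(target || pred): the training objective DLDL minimizes."""
    eps = 1e-12
    return float(np.sum(target * (np.log(target + eps) - np.log(pred + eps))))

target = label_distribution(30.0)
wrong = label_distribution(45.0)            # confident but wrong prediction
print(kl_divergence(target, target))        # ~0.0 for a perfect match
print(kl_divergence(target, wrong))         # much larger for a mismatch
```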

Journal ArticleDOI
TL;DR: Deep Label Distribution Learning (DLDL) learns the label distribution by minimizing a Kullback-Leibler divergence between the predicted and ground-truth label distributions using deep ConvNets.
Abstract: Convolutional Neural Networks (ConvNets) have achieved excellent recognition performance in various visual recognition tasks. A large labeled training set is one of the most important factors for their success. However, it is difficult to collect sufficient training images with precise labels in some domains, such as apparent age estimation, head pose estimation, multi-label classification, and semantic segmentation. Fortunately, there is ambiguous information among labels, which makes these tasks different from traditional classification. Based on this observation, we convert the label of each image into a discrete label distribution, and learn the label distribution by minimizing a Kullback-Leibler divergence between the predicted and ground-truth label distributions using deep ConvNets. The proposed DLDL (Deep Label Distribution Learning) method effectively utilizes the label ambiguity in both feature learning and classifier learning, which helps prevent the network from overfitting even when the training set is small. Experimental results show that the proposed approach produces significantly better results than state-of-the-art methods for age estimation and head pose estimation. At the same time, it also improves recognition performance for multi-label classification and semantic segmentation tasks.

240 citations

Journal ArticleDOI
TL;DR: This paper presents a comprehensive survey of deep-learning-based developments in the past decade for content-based image retrieval, covering different supervision modes, network architectures, descriptor types, and retrieval types.
Abstract: Content-based image retrieval aims to find images similar to a query image in a large-scale dataset. Generally, the similarity between the representative features of the query image and the dataset images is used to rank the images for retrieval. In the early days, various hand-designed feature descriptors were investigated, based on visual cues such as color, texture, and shape that represent the images. Over the past decade, however, deep learning has emerged as a dominant alternative to hand-designed feature engineering; it learns features automatically from the data. This paper presents a comprehensive survey of deep-learning-based developments in the past decade for content-based image retrieval. Existing state-of-the-art methods are also categorized from different perspectives for a better understanding of the progress. The taxonomy used in this survey covers different supervision modes, network architectures, descriptor types, and retrieval types. A performance analysis is also carried out using the state-of-the-art methods. Insights are presented to help researchers observe the progress and make informed choices. The survey presented in this paper will help further research progress in image retrieval using deep learning.

101 citations
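
The survey's core retrieval loop, ranking dataset images by the similarity between their representative features and the query's, fits in a few lines of NumPy. Cosine similarity is one common choice assumed here; the feature extractor itself, hand-crafted or deep, is abstracted away as precomputed vectors.

```python
# Feature-similarity ranking: the basic loop behind content-based
# image retrieval surveyed above.
import numpy as np

def rank_by_similarity(query_feat: np.ndarray, db_feats: np.ndarray) -> np.ndarray:
    """Return database indices sorted from most to least similar."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    scores = db @ q                      # cosine similarity per image
    return np.argsort(-scores)           # descending by similarity

rng = np.random.default_rng(0)
database = rng.standard_normal((1000, 256))       # 1000 images, 256-D features
query = database[42] + 0.1 * rng.standard_normal(256)
print(rank_by_similarity(query, database)[:5])    # index 42 should rank first
```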

Book ChapterDOI
08 Sep 2018
TL;DR: An adversarial hashing network with an attention mechanism is proposed to enhance the measurement of content similarities by selectively focusing on the informative parts of multi-modal data.
Abstract: Due to the rapid growth of multi-modal data, hashing methods for cross-modal retrieval have received considerable attention. However, finding content similarities between different modalities of data is still challenging due to an existing heterogeneity gap. To further address this problem, we propose an adversarial hashing network with an attention mechanism to enhance the measurement of content similarities by selectively focusing on the informative parts of multi-modal data. The proposed new deep adversarial network consists of three building blocks: (1) the feature learning module to obtain the feature representations; (2) the attention module to generate an attention mask, which is used to divide the feature representations into the attended and unattended feature representations; and (3) the hashing module to learn hash functions that preserve the similarities between different modalities. In our framework, the attention and hashing modules are trained in an adversarial way: the attention module attempts to make the hashing module unable to preserve the similarities of multi-modal data w.r.t. the unattended feature representations, while the hashing module aims to preserve the similarities of multi-modal data w.r.t. the attended and unattended feature representations. Extensive evaluations on several benchmark datasets demonstrate that the proposed method brings substantial improvements over other state-of-the-art cross-modal hashing methods.

96 citations
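
A minimal sketch of the attention step described above: a soft mask splits each feature vector into attended and unattended parts, which a hashing module then encodes into binary codes. The sigmoid gate and the sign-of-projection hash are illustrative assumptions; the adversarial training loop is only indicated in a comment, not implemented.

```python
# Attention mask splitting features into attended / unattended parts,
# followed by a toy hashing module.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def split_by_attention(features, w_attn):
    """Divide features (N, D) into attended and unattended representations."""
    mask = sigmoid(features @ w_attn)          # (N, 1) soft attention mask
    attended = mask * features
    unattended = (1.0 - mask) * features
    return attended, unattended

def hash_codes(features, w_hash):
    """Toy hashing module: sign of a projection gives binary codes."""
    return (features @ w_hash > 0).astype(np.uint8)

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 64))           # 4 samples, 64-D features
attended, unattended = split_by_attention(feats, rng.standard_normal((64, 1)))
# Adversarially, the attention module would update w_attn so that codes
# from the unattended part fail to preserve similarity, while the
# hashing module updates w_hash to preserve it for both parts.
print(hash_codes(attended, rng.standard_normal((64, 32))).shape)  # (4, 32)
```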