Recurrent Image Annotation with Explicit Inter-label Dependencies (2020) | Ayushi Dutta

Citations

Journal Article

Aligning Image Semantics and Label Concepts for Image Multi-Label Classification

[...]

Wei Zhou, Zhiwu Xia, Pengli Dou, Tao Su, Haifeng Hu

21 Jul 2022-ACM Transactions on Multimedia Computing, Communications, and Applications

TL;DR: A novel image multi-label classification framework that aligns Image Semantics with Label Concepts (ISLC), using a residual encoder to learn salient object features in images and the self-attention layer of an aligned decoder to automatically capture the correlation between labels.
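
The self-attention and cross-attention steps named in the TL;DR can be illustrated with a minimal, dependency-free Python sketch. It is not the ISLC implementation: it assumes plain scaled dot-product attention with no learned projection matrices, a single head, and no residual or feed-forward layers, and the function names (`attention`, `align_labels`) and toy vectors are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # a list of row vectors


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def align_labels(label_emb: Matrix, region_feats: Matrix) -> Matrix:
    """Hypothetical two-step decoder sketch (no learned weights, single head)."""
    # Self-attention: every label embedding attends to all label embeddings,
    # standing in for learning label correlations instead of counting them.
    labels = attention(label_emb, label_emb, label_emb)
    # Cross-attention: label queries attend to image region features, so each
    # label vector is rebuilt from the visual evidence it matches best.
    return attention(labels, region_feats, region_feats)


if __name__ == "__main__":
    label_emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 toy label concepts
    region_feats = [[2.0, 0.0], [0.0, 2.0]]           # 2 toy image regions
    for row in align_labels(label_emb, region_feats):
        print([round(v, 3) for v in row])
```

Because the softmax weights sum to one, every aligned label vector stays inside the span of the region features, which is the sense in which cross-attention ties label predictions to image content. A real decoder would add learned query/key/value projections on top of this.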

Abstract: The image multi-label classification task is mainly to correctly predict the multiple object categories present in an image. To capture the correlation between labels, methods based on graph convolutional networks must manually count label co-occurrence probabilities in the training data to construct a predefined graph as the input to the graph network, which is inflexible and may degrade the model's generalizability. Moreover, most current methods cannot effectively align the learned salient object features with the label concepts, so the model's predictions may be inconsistent with the image content. Learning the salient semantic features of images, capturing the correlation between labels, and then effectively aligning the two is therefore key to improving performance on the image multi-label classification task. To this end, we propose a novel image multi-label classification framework that aims to align Image Semantics with Label Concepts (ISLC). Specifically, we propose a residual encoder to learn salient object features in the images and exploit the self-attention layer in the aligned decoder to automatically capture the correlation between labels. Then, we leverage the cross-attention layers in the aligned decoder to align image semantic features with label concepts, making the labels predicted by the model more consistent with the image content. Finally, the output features of the last layers of the residual encoder and the aligned decoder are fused to obtain the final feature for classification. The proposed ISLC model achieves good performance on several prevalent multi-label image datasets, namely MS-COCO 2014, PASCAL VOC 2007, VG-500, and NUS-WIDE, with 87.2%, 96.9%, 39.4%, and 64.2%, respectively.
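
The predefined graph that the abstract criticizes is commonly built from conditional co-occurrence probabilities, P(L_j | L_i) = N(L_i and L_j) / N(L_i), counted over the training annotations. The sketch below, with a hypothetical function name and toy labels, shows only that counting step; the thresholding or reweighting that such graph-based methods often apply afterwards is omitted.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, Set, Tuple


def cooccurrence_probs(annotations: Iterable[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Return {(i, j): P(j | i)} with P(j | i) = N(i and j) / N(i).

    Hypothetical helper: counts come only from the training annotations,
    so the resulting graph is fixed before the model ever sees a test image.
    """
    single: Counter = Counter()
    pair: Counter = Counter()
    for labels in annotations:
        single.update(labels)                 # N(i): images containing label i
        pair.update(permutations(labels, 2))  # N(i and j), counted in both orders
    return {(i, j): n_ij / single[i] for (i, j), n_ij in pair.items()}


if __name__ == "__main__":
    train = [{"person", "bicycle"}, {"person"}, {"person", "bicycle", "car"}, {"car"}]
    probs = cooccurrence_probs(train)
    print(probs[("bicycle", "person")])  # P(person | bicycle)
    print(probs[("person", "bicycle")])  # P(bicycle | person)
```

On the toy data, P(person | bicycle) is 1.0 while P(bicycle | person) is 2/3, so the matrix is asymmetric, and once counted it does not adapt to new data. This fixed, dataset-specific structure is the inflexibility the abstract contrasts with learning label correlations through self-attention.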

1 citation

Journal Article

Attention-Augmented Memory Network for Image Multi-Label Classification

[...]

Wei Zhou, Yanke Hou, Dihu Chen, Haifeng Hu, Tao Su

03 Nov 2022-ACM Transactions on Multimedia Computing, Communications, and Applications

TL;DR: The authors propose an Attention-Augmented Memory Network (AAMN) model for the image multi-label classification task, which first introduces a novel categorical memory module that excavates the contextual information of various categories from the dataset to augment the current input feature.

Abstract: The purpose of image multi-label classification is to predict all the object categories present in an image. Some recent works exploit graph convolutional networks to capture the correlation between labels. Although promising results have been reported, these methods cannot learn salient object features in the images and ignore the correlation between channel feature maps. In addition, current research only learns the feature information within an individual input image and fails to mine the contextual information of various categories from the dataset to enhance the input feature representation. To address these issues, we propose an Attention-Augmented Memory Network (AAMN) model for the image multi-label classification task. Specifically, we first propose a novel categorical memory module to excavate the contextual information of various categories from the dataset to augment the current input feature. Second, we design a new channel-relation exploration module to capture the inter-channel relationships of features, so as to enhance the correlation between objects in the images. Third, we develop a spatial-relation enhancement module to model second-order statistics of features and capture long-range dependencies between pixels in feature maps, so as to learn salient object features. Experimental results on standard benchmarks, including MS-COCO 2014, PASCAL VOC 2007, and VG-500, demonstrate the effectiveness and superiority of the AAMN model, which outperforms current state-of-the-art methods.
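
The abstract does not specify how AAMN models second-order statistics. One common formulation, shown here as an assumption rather than the authors' module, is covariance (second-order) pooling: treat each spatial position of a C-channel feature map as a sample and compute the C x C covariance between channels. The function name and the flattened list layout are hypothetical.

```python
from typing import List


def channel_covariance(feature_map: List[List[float]]) -> List[List[float]]:
    """Second-order pooling: C x C covariance of channels over spatial positions.

    feature_map[c] holds the H*W activations of channel c, flattened
    (a hypothetical layout chosen for this sketch).
    """
    n = len(feature_map[0])  # number of spatial positions, H*W
    means = [sum(ch) / n for ch in feature_map]
    centred = [[v - m for v in ch] for ch, m in zip(feature_map, means)]
    # Entry (i, j) averages the product of channels i and j at each position,
    # so it is a second-order statistic of the features.
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in centred]
            for ci in centred]


if __name__ == "__main__":
    fmap = [
        [1.0, 2.0, 3.0, 4.0],  # channel 0
        [2.0, 4.0, 6.0, 8.0],  # channel 1: rises with channel 0
        [4.0, 3.0, 2.0, 1.0],  # channel 2: falls as channel 0 rises
    ]
    for row in channel_covariance(fmap):
        print(row)
```

Positive off-diagonal entries mark channels that fire together and negative ones mark channels that oppose each other, information that first-order (average) pooling discards. Whether AAMN uses covariance, a Gram matrix, or another second-order form is not stated in the abstract.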

1 citation

Journal Article

Feature learning network with transformer for multi-label image classification

[...]

Wei Zhou, Pengli Dou, Tao Su, Hai Hu, Zhijie Zheng

01 Nov 2022-Pattern Recognition

TL;DR: The authors propose a Transformer-based feature learning network (FL-Tran) that learns salient features and excavates potentially useful features for the multi-label image classification task.

1 citation

Book Chapter

Worst-Case Adversarial Perturbation and Effect of Feature Normalization on Max-Margin Multi-label Classifiers

[...]

01 Jan 2022

TL;DR: In this article, the authors propose a generalized adversary generation mechanism based on a worst-case perturbation which, when added to the feature vector of the original sample, generates an adversarial sample without requiring the attacker to have access to either the training data or the model.

Abstract: Multi-label classification is a generalization of single-label classification in which an unseen sample is automatically assigned a subset of semantically relevant labels from a given vocabulary. In parallel, recent research has demonstrated the impact of adversarial examples, which are modifications of original samples that aim to fool machine learning models. Unlike existing adversary generation techniques, which are specific to single-label data and mostly assume that the attacker has access to the training data and/or the model, in this paper we propose a generalized adversary generation mechanism based on worst-case perturbation. This perturbation, when added to the feature vector of the original sample, generates an adversarial sample without requiring the attacker to have access to either the training data or the model. Next, for the first time to our knowledge, we study and demonstrate the effect of feature normalization as a defense mechanism against adversarial attacks. Extensive experiments using state-of-the-art max-margin multi-label classification algorithms on two benchmark datasets show the effectiveness of our adversarial attack and defense mechanisms.

Keywords: Multi-label learning; Image annotation; Max-margin classifier; Adversarial attack; Feature normalization
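
For intuition about what a "worst-case" perturbation of a feature vector means for a max-margin model, the sketch below uses the textbook minimal-L2 perturbation for a single linear scorer f(x) = w . x + b: moving x by -(1 + overshoot) f(x) w / ||w||^2 crosses the decision boundary. This is an illustrative assumption, not the chapter's mechanism: it is white-box (it needs w), whereas the chapter's attack explicitly requires no model access. In a multi-label max-margin model each label has its own scorer, and the sketch targets one of them. The L2 normalization helper is likewise only one possible form of the feature normalization the abstract mentions; all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def score(w: Vector, b: float, x: Vector) -> float:
    """Linear (SVM-style) score for one label: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def flip_perturbation(w: Vector, b: float, x: Vector, overshoot: float = 0.01) -> Vector:
    """Smallest-L2 step onto the boundary, scaled by (1 + overshoot) to cross it.

    White-box assumption: requires the classifier weights w.
    """
    norm_sq = sum(wi * wi for wi in w)
    step = -(1.0 + overshoot) * score(w, b, x) / norm_sq
    return [step * wi for wi in w]


def l2_normalize(x: Vector) -> Vector:
    """Rescale a feature vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


if __name__ == "__main__":
    w, b, x = [1.0, 2.0], -1.0, [2.0, 1.0]
    x_adv = [xi + di for xi, di in zip(x, flip_perturbation(w, b, x))]
    print(score(w, b, x), score(w, b, x_adv))  # positive, then slightly negative
    print(l2_normalize([3.0, 4.0]))
```

The new score equals -overshoot * f(x), so the sign flips for any x not already on the boundary, and the perturbation norm is (1 + overshoot) |f(x)| / ||w||: samples close to the margin need the smallest changes.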

Book Chapter

Impact of Type of Convolution Operation on Performance of Convolutional Neural Networks for Online Signature Verification

[...]

Chandra Sekhar Vorugunti, Balasubramanian Subramanian, A.K. Gautam, Viswanath Pulabaigari

01 Jan 2022

TL;DR: In this article, the authors propose an online signature verification model based on deep learning, which reports equal error rates (EER) of 9.72% and 3.1% on the Skilled_01 categories of the MCYT-100 and SVC datasets, respectively.

Abstract: An online signature is a multivariate time series and a commonly used biometric source for user verification. Deep learning (DL) is becoming ubiquitous as a paradigm for solving problems that come with a wealth of data, and convolution has been its main workhorse. Recently, DL has made its entry into online signature verification (OSV), a standard biometric method that has mostly been handled in traditional settings. However, adopting a DL solution requires tackling certain issues, viz. (i) the type of convolution, (ii) the order of convolution, and (iii) the input representation. In this work, we experimentally analyse each of these issues with respect to OSV and then present a superior model that reports state-of-the-art (SOTA) performance on three widely used datasets, namely MCYT-100, SVC, and Mobisig. Specifically, the proposed model reports equal error rates (EER) of 9.72% and 3.1% on the Skilled_01 categories of the MCYT-100 and SVC datasets, with gains of around 4% and 3%, respectively, over the next-best methods. The experimental outcome confirms that the interrelationship between the type and order of the convolution operation and the input signature representation plays a significant role in the performance of OSV frameworks.

Keywords: Online signature verification; Deep learning; Impact of convolution; Step size; One-shot learning
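
The equal error rate reported above is the operating point at which the false acceptance rate (FAR, forgeries accepted) equals the false rejection rate (FRR, genuine signatures rejected). The minimal sketch below assumes that higher scores mean "more likely genuine" and approximates the EER at the candidate threshold where FAR and FRR are closest, without interpolating between thresholds. The function name and scores are toy values for illustration, not data from the paper.

```python
from typing import List, Tuple


def equal_error_rate(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Approximate the EER by sweeping every observed score as a threshold.

    A sample is accepted when score >= threshold. Returns (eer, threshold)
    at the threshold where FAR and FRR are closest.
    """
    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # forgeries accepted
        frr = sum(s < t for s in genuine) / len(genuine)      # genuine rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]   # toy scores for genuine signatures
    impostor = [0.1, 0.2, 0.3, 0.75]  # toy scores for skilled forgeries
    print(equal_error_rate(genuine, impostor))
```

On the toy scores, a threshold of 0.7 accepts one of four forgeries and rejects one of four genuine signatures, giving an EER of 0.25. Published EERs are usually computed per user or over pooled scores with finer threshold interpolation, so this sketch only shows the definition.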
