Proceedings ArticleDOI

Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization

TLDR
This work proposes an end-to-end framework based on higher-order integration of hierarchical convolutional activations for FGVC that yields a more discriminative representation and achieves competitive results on widely used FGVC datasets.
Abstract
The success of fine-grained visual categorization (FGVC) relies heavily on modeling the appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) part interactions are complex and of higher order. To address these issues, we propose an end-to-end framework based on higher-order integration of hierarchical convolutional activations for FGVC. By treating convolutional activations as local descriptors, hierarchical convolutional activations can serve as a representation of local parts at different scales. A polynomial-kernel-based predictor is proposed to capture higher-order statistics of convolutional activations for modeling part interactions. To model inter-layer part interactions, we extend the polynomial predictor to integrate hierarchical activations via kernel fusion. Our work also provides a new perspective on combining convolutional activations from multiple layers: while hypercolumns simply concatenate maps from different layers, and the holistically-nested network uses weighted fusion to combine side outputs, our approach exploits higher-order intra-layer and inter-layer relations for better integration of hierarchical convolutional features. The proposed framework yields a more discriminative representation and achieves competitive results on widely used FGVC datasets.
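As a rough illustration of the idea described above (not the authors' code), the sketch below pools second-order statistics of convolutional activations via outer products over spatial locations, with a cross-layer term standing in for the paper's kernel fusion. The projection dimension, the signed square-root / L2 normalisation, and the choice of layers are assumptions for the sketch, not details taken from the paper.

# Minimal sketch (assumed design, PyTorch): second-order "polynomial kernel"
# pooling of convolutional activations with intra-layer and inter-layer terms.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HigherOrderPooling(nn.Module):
    def __init__(self, c1, c2, d=256, num_classes=200):
        super().__init__()
        # 1x1 convolutions project each layer's activations to a common dimension d
        self.proj1 = nn.Conv2d(c1, d, kernel_size=1)
        self.proj2 = nn.Conv2d(c2, d, kernel_size=1)
        # classifier over concatenated intra-layer and inter-layer statistics
        self.fc = nn.Linear(3 * d * d, num_classes)

    @staticmethod
    def outer_pool(a, b):
        # a, b: (N, d, H, W) -> average of outer products over all spatial locations
        n, d, h, w = a.shape
        a = a.flatten(2)                                     # (N, d, HW)
        b = b.flatten(2)
        return torch.bmm(a, b.transpose(1, 2)) / (h * w)     # (N, d, d)

    def forward(self, f1, f2):
        # f1, f2: activations from two convolutional layers of a CNN backbone
        if f1.shape[-2:] != f2.shape[-2:]:                   # align spatial sizes if needed
            f2 = F.interpolate(f2, size=f1.shape[-2:], mode='bilinear',
                               align_corners=False)
        p1, p2 = self.proj1(f1), self.proj2(f2)
        stats = torch.cat([self.outer_pool(p1, p1).flatten(1),   # intra-layer, layer 1
                           self.outer_pool(p2, p2).flatten(1),   # intra-layer, layer 2
                           self.outer_pool(p1, p2).flatten(1)],  # inter-layer term
                          dim=1)
        # signed square-root and L2 normalisation, commonly used for bilinear features
        stats = torch.sign(stats) * torch.sqrt(stats.abs() + 1e-8)
        stats = F.normalize(stats)
        return self.fc(stats)

With a VGG-16 backbone, for example, f1 and f2 could be the activations of two late convolutional layers (c1 = c2 = 512); these choices are illustrative only.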

Citations
Journal ArticleDOI

Cascading Hierarchical Networks with Multi-Task Balanced Loss for Fine-Grained Hashing

Xianxian Zeng, +1 more
20 Mar 2023
TL;DR: In this article, a cascaded hierarchical data augmentation network is proposed to improve the retrieval accuracy of fine-grained hashing, and an attention-guided method is introduced to balance the loss of multi-task learning.
Posted Content

Privileged Pooling: Better Sample Efficiency Through Supervised Attention

TL;DR: This article proposes a visual attention mechanism supervised via keypoint annotations that highlight important object parts; the supervision is required only during training and helps the model focus on discriminative regions.
Proceedings ArticleDOI

A Channel Mix Method for Fine-Grained Cross-Modal Retrieval

TL;DR: A channel mix method is developed and applied to the channels of deep activations across different modalities to enhance cross-modal information interaction for fine-grained objects and to improve intra-class separability as well as inter-class compactness across modalities.
Book ChapterDOI

Selecting Discriminative Features for Fine-Grained Visual Classification

TL;DR: An end-to-end model based on selecting discriminative features for fine-grained visual classification, without the help of part or bounding box annotations, is developed and outperforms state-of-the-art methods on the CUB-200-2011, Stanford Cars and FGVC-Aircraft datasets.
Journal ArticleDOI

Learning more discriminative clues with gradual attention for fine-grained visual categorization

TL;DR: Li et al. as mentioned in this paper proposed an end-to-end network composed of a self-calibrated convolution, a gradual attention module and a feature inverse module for fine-grained visual categorization.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
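For context, a minimal PyTorch sketch of the basic residual connection described above; the channel sizes and layer ordering here are illustrative rather than the paper's exact configuration.

# Minimal sketch of a basic residual block: the block learns a residual F(x)
# and adds it back to the input via an identity shortcut, y = F(x) + x.
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity shortcut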
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, showing that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that set a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection spanning hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.