Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization

doi:10.1109/ICCV.2017.63

Proceedings Article

Sijia Cai, +2 more

- pp 511-520

TLDR

This work proposes an end-to-end framework for FGVC based on higher-order integration of hierarchical convolutional activations, which yields a more discriminative representation and achieves competitive results on widely used FGVC datasets.

Abstract:

The success of fine-grained visual categorization (FGVC) relies heavily on modeling the appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) the part interactions are complex and of higher order. To address these issues, we propose an end-to-end framework based on higher-order integration of hierarchical convolutional activations for FGVC. By treating the convolutional activations as local descriptors, hierarchical convolutional activations can serve as a representation of local parts at different scales. A polynomial-kernel-based predictor is proposed to capture higher-order statistics of convolutional activations for modeling part interactions. To model inter-layer part interactions, we extend the polynomial predictor to integrate hierarchical activations via kernel fusion. Our work also provides a new perspective for combining convolutional activations from multiple layers. While hypercolumns simply concatenate maps from different layers, and the holistically-nested network uses weighted fusion to combine side-outputs, our approach exploits higher-order intra-layer and inter-layer relations for better integration of hierarchical convolutional features. The proposed framework yields a more discriminative representation and achieves competitive results on widely used FGVC datasets.
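The idea of capturing higher-order statistics of convolutional activations can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a degree-2 term only, a low-rank factorization into two projection matrices (the hypothetical `u_a` and `u_b`, which would play the role of 1x1 convolutions), sum pooling over spatial positions, and random arrays standing in for real activations. Passing the same layer twice gives an intra-layer term; passing two layers with matching spatial size gives an inter-layer term.

```python
import numpy as np

def second_order_features(x_a, x_b, u_a, u_b):
    """Pool degree-2 interactions between two sets of local descriptors.

    x_a: (N, C_a) activations at N spatial positions from one layer.
    x_b: (N, C_b) activations at the same N positions (reuse x_a for intra-layer).
    u_a: (C_a, D) and u_b: (C_b, D) are hypothetical projection matrices.
    Returns a D-dimensional feature vector.
    """
    proj_a = x_a @ u_a                      # (N, D) linear projection per position
    proj_b = x_b @ u_b                      # (N, D)
    return (proj_a * proj_b).sum(axis=0)    # elementwise product, then sum pooling

def second_order_features_explicit(x_a, x_b, u_a, u_b):
    """Same quantity computed from the explicit (C_a, C_b) second-order matrix."""
    second_moment = x_a.T @ x_b             # sum of outer products over positions
    # z[d] = u_a[:, d]^T @ second_moment @ u_b[:, d]
    return np.einsum("id,ij,jd->d", u_a, second_moment, u_b)

# Toy data: a 7x7 spatial grid flattened to 49 positions, two layers with
# 8 and 6 channels (all shapes are arbitrary choices for illustration).
rng = np.random.default_rng(0)
conv_a = rng.standard_normal((49, 8))
conv_b = rng.standard_normal((49, 6))
u_a = rng.standard_normal((8, 4))
u_a2 = rng.standard_normal((8, 4))
u_b = rng.standard_normal((6, 4))

intra = second_order_features(conv_a, conv_a, u_a, u_a2)   # intra-layer term
inter = second_order_features(conv_a, conv_b, u_a, u_b)    # inter-layer term
```

The factorized form never materializes the channel-by-channel interaction matrix, which is why this kind of low-rank projection is attractive when the channel counts are large; the explicit version is included only to check that both compute the same values.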

Citations

Journal Article

Deep Learning-Based Computer Vision for Surveillance in ITS: Evaluation of State-of-the-Art Methods

Jiyang Xie, +7 more

- 10 Mar 2021 -

IEEE Transactions on Vehicular Technolog...

TL;DR: The edge-cloud surveillance resource scheduling for the CV methods is discussed and the deep learning-based CV methods in the VSS are reviewed, including detection, classification, and tracking methods, for better understanding of the relationship between the CV-based ITS services and these methods.

Journal Article

Multi-scale structural kernel representation for object detection

Hao Wang, +3 more

- 01 Feb 2021 -

Pattern Recognition

TL;DR: A Multi-scale Structural Kernel Representation (MSKR) is developed based on the polynomial kernel approximation, which can generate more discriminative representations, and so be flexibly integrated into deep CNNs for improving performance of object detection.

Posted Content

Mixed High-Order Attention Network for Person Re-Identification

Binghui Chen, +2 more

- 16 Aug 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes the High-Order Attention (HOA) module to model and utilize the complex and high-order statistics information in attention mechanism, so as to capture the subtle differences among pedestrians and to produce the discriminative attention proposals.

Posted Content

Feature Boosting, Suppression, and Diversification for Fine-Grained Visual Classification

Jianwei Song, +1 more

- 04 Mar 2021 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work introduces two lightweight modules that can be easily plugged into existing convolutional neural networks, including a feature boosting and suppression module that boosts the most salient part of the feature maps to obtain a part-specific representation and then suppresses it to force the following network to mine other potential parts.

Journal Article

A new dataset of dog breed images and a benchmark for fine-grained classification

Ding-Nan Zou, +3 more

- 30 Nov 2020 -

Computational Visual Media

TL;DR: This work introduces the Tsinghua Dogs Dataset, an image dataset for fine-grained classification of dog breeds in which each image contains only one dog and annotated bounding boxes are provided for the whole body and the head.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: This paper proposes a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting networks won 1st place on the ILSVRC 2015 classification task.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

Proceedings Article

Going deeper with convolutions

Christian Szegedy, +8 more

TL;DR: This paper proposes Inception, a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Journal Article

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions.

Related Papers (5)

Deep Residual Learning for Image Recognition

Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition

3D Object Representations for Fine-Grained Categorization

Bilinear CNN Models for Fine-Grained Visual Recognition

The Caltech-UCSD Birds-200-2011 Dataset