Proceedings ArticleDOI

Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization

TLDR
This work proposes an end-to-end framework based on higher-order integration of hierarchical convolutional activations for FGVC that yields a more discriminative representation and achieves competitive results on widely used FGVC datasets.
Abstract
The success of fine-grained visual categorization (FGVC) relies heavily on modeling the appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) part interactions are complex and of higher order. To address these issues, we propose an end-to-end framework based on higher-order integration of hierarchical convolutional activations for FGVC. By treating the convolutional activations as local descriptors, hierarchical convolutional activations can serve as a representation of local parts at different scales. A polynomial kernel based predictor is proposed to capture higher-order statistics of convolutional activations for modeling part interactions. To model inter-layer part interactions, we extend the polynomial predictor to integrate hierarchical activations via kernel fusion. Our work also provides a new perspective on combining convolutional activations from multiple layers. While hypercolumns simply concatenate feature maps from different layers and holistically-nested networks use weighted fusion to combine side-outputs, our approach exploits higher-order intra-layer and inter-layer relations for better integration of hierarchical convolutional features. The proposed framework yields a more discriminative representation and achieves competitive results on widely used FGVC datasets.
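As a rough illustration of the core idea (a sketch, not the authors' implementation), the Python/NumPy code below treats each spatial position of a convolutional feature map as a local descriptor and pools second-order (outer-product) statistics, both within one layer and across two layers of the same spatial size. The layer names, tensor shapes, and the signed square-root plus L2 normalization are illustrative assumptions.

import numpy as np

def second_order_pool(feat):
    """Treat each spatial position of a C x H x W activation map as a
    C-dim local descriptor and pool their second-order (outer-product)
    statistics into a C x C image representation (illustrative sketch)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)               # C x N local descriptors
    gram = x @ x.T / (h * w)                 # intra-layer second-order statistics
    y = np.sign(gram) * np.sqrt(np.abs(gram))  # signed square-root (assumed)
    return (y / (np.linalg.norm(y) + 1e-12)).ravel()

def cross_layer_pool(feat_a, feat_b):
    """Naive inter-layer variant: outer products between descriptors taken
    from two layers with the same spatial size (e.g. after resizing)."""
    ca, h, w = feat_a.shape
    cb = feat_b.shape[0]
    xa = feat_a.reshape(ca, h * w)
    xb = feat_b.reshape(cb, h * w)
    gram = xa @ xb.T / (h * w)               # Ca x Cb inter-layer statistics
    y = np.sign(gram) * np.sqrt(np.abs(gram))
    return (y / (np.linalg.norm(y) + 1e-12)).ravel()

# Toy usage with random "activations" standing in for two hypothetical layers.
relu5_2 = np.random.rand(256, 14, 14)
relu5_3 = np.random.rand(256, 14, 14)
z_intra = second_order_pool(relu5_3)         # 256*256-dim representation
z_inter = cross_layer_pool(relu5_2, relu5_3)
print(z_intra.shape, z_inter.shape)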


Citations
Proceedings ArticleDOI

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning

TL;DR: In this paper, the authors propose a measure to estimate domain similarity via Earth Mover's Distance and demonstrate that transfer learning benefits from pre-training on a source domain that is similar to the target domain by this measure.
Book ChapterDOI

Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition

TL;DR: A cross-layer bilinear pooling approach is proposed to capture inter-layer part feature relations, which results in superior performance compared with other bilinear pooling based approaches.
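For comparison, here is a minimal sketch of one common way to realize such a cross-layer interaction: descriptors from two layers are linearly projected into a shared space, combined with an element-wise (Hadamard) product at each location, and sum-pooled over space. The projection matrices, dimensions, and normalization are illustrative assumptions, not that paper's implementation.

import numpy as np

def factorized_cross_layer_pool(feat_a, feat_b, proj_a, proj_b):
    """Factorized cross-layer pooling sketch: project descriptors from two
    layers into a shared d-dim space, take their element-wise (Hadamard)
    product per location, then sum-pool over all spatial locations."""
    ca, h, w = feat_a.shape
    cb = feat_b.shape[0]
    xa = proj_a @ feat_a.reshape(ca, h * w)   # d x N projected descriptors
    xb = proj_b @ feat_b.reshape(cb, h * w)   # d x N projected descriptors
    z = (xa * xb).sum(axis=1)                 # d-dim pooled interaction
    z = np.sign(z) * np.sqrt(np.abs(z))       # signed square-root (assumed)
    return z / (np.linalg.norm(z) + 1e-12)

# Toy usage with random activations and random projection matrices.
d = 512
relu5_2 = np.random.rand(256, 14, 14)
relu5_3 = np.random.rand(256, 14, 14)
U, V = np.random.randn(d, 256), np.random.randn(d, 256)
print(factorized_cross_layer_pool(relu5_2, relu5_3, U, V).shape)  # (512,)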
Journal ArticleDOI

From Deterministic to Generative: Multimodal Stochastic RNNs for Video Captioning

TL;DR: Wang et al. propose a generative approach, referred to as multimodal stochastic recurrent neural networks (MS-RNNs), which models the uncertainty observed in the data using latent variables.
Proceedings ArticleDOI

Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization

TL;DR: This work proposes an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks; it consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration, and post-compensation, respectively.
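A minimal sketch of the coupled Newton-Schulz iteration that such an approach builds on, with trace pre-normalization and post-compensation; the iteration count, shapes, and the toy accuracy check below are assumptions for illustration, not the paper's implementation.

import numpy as np

def newton_schulz_sqrt(a, num_iters=5):
    """Approximate matrix square root of an SPD matrix via coupled
    Newton-Schulz iteration, with trace pre-normalization (to ensure
    convergence) and post-compensation to restore the original scale."""
    n = a.shape[0]
    identity = np.eye(n)
    tr = np.trace(a)
    y = a / tr                       # pre-normalization: eigenvalues of A/tr(A) lie in (0, 1)
    z = identity.copy()
    for _ in range(num_iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y = y @ t                    # y converges to sqrt(A / tr(A))
        z = t @ z                    # z converges to inverse sqrt(A / tr(A))
    return np.sqrt(tr) * y           # post-compensation

# Toy check against an eigendecomposition-based square root.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8))
cov = x @ x.T / x.shape[1] + 1e-3 * np.eye(64)   # SPD covariance matrix
approx = newton_schulz_sqrt(cov)
w, v = np.linalg.eigh(cov)
exact = (v * np.sqrt(w)) @ v.T
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))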
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this paper, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won 1st place in the ILSVRC 2015 classification task.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting using an architecture with very small (3x3) convolution filters, and shows that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: This paper proposes a deep convolutional neural network architecture codenamed Inception, which achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.