Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition

doi:10.1109/ICCV.2017.557

Proceedings Article

- pp 5219-5227

TLDR

This paper proposes a part learning approach based on a multi-attention convolutional neural network (MA-CNN), in which part generation and feature learning reinforce each other, and reports the best performance on three challenging published fine-grained datasets.

Abstract:

Recognizing fine-grained categories (e.g., bird species) relies heavily on discriminative part localization and part-based fine-grained feature learning. Existing approaches predominantly address these two challenges independently, neglecting the fact that part localization (e.g., the head of a bird) and fine-grained feature learning (e.g., head shape) are mutually correlated. In this paper, we propose a novel part learning approach based on a multi-attention convolutional neural network (MA-CNN), in which part generation and feature learning reinforce each other. MA-CNN consists of convolution, channel grouping, and part classification sub-networks. The channel grouping network takes feature channels from the convolutional layers as input and generates multiple parts by clustering, weighting, and pooling spatially correlated channels. The part classification network then classifies an image by each individual part, through which more discriminative fine-grained features can be learned. Two losses are proposed to guide the multi-task learning of channel grouping and part classification; they encourage MA-CNN to generate more discriminative parts from feature channels and to learn better fine-grained features from parts in a mutually reinforcing way. MA-CNN does not need bounding box or part annotations and can be trained end-to-end. We incorporate the learned parts from MA-CNN with part-CNN for recognition and show the best performance on three challenging published fine-grained datasets: CUB-Birds, FGVC-Aircraft, and Stanford-Cars.
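The channel grouping step described in the abstract (clustering, weighting, and pooling spatially correlated channels) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes that each channel is summarized by the location of its peak response, that channels are grouped by a small k-means over those locations, that the weighting is a hard 0/1 group indicator passed through a sigmoid, and that pooling is an attention-weighted spatial sum. All function names and the toy Gaussian channels are hypothetical.

```python
import math

def peak_location(channel):
    """Return (row, col) of the strongest response in a 2-D channel."""
    best = max((v, r, c) for r, row in enumerate(channel) for c, v in enumerate(row))
    return (best[1], best[2])

def _dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def group_channels(peaks, num_parts, iters=10):
    """Cluster channels by peak position (k-means with farthest-point init)."""
    centers = [peaks[0]]
    while len(centers) < num_parts:
        centers.append(max(peaks, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels = [0] * len(peaks)
    for _ in range(iters):
        labels = [min(range(num_parts), key=lambda k: _dist2(p, centers[k]))
                  for p in peaks]
        for k in range(num_parts):
            members = [p for p, lab in zip(peaks, labels) if lab == k]
            if members:
                centers[k] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels

def attention_map(channels, labels, part):
    """Sum the channels assigned to `part` (0/1 weights), then apply a sigmoid."""
    h, w = len(channels[0]), len(channels[0][0])
    acc = [[0.0] * w for _ in range(h)]
    for ch, lab in zip(channels, labels):
        if lab == part:
            for r in range(h):
                for c in range(w):
                    acc[r][c] += ch[r][c]
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in acc]

def part_feature(channels, att):
    """Pool every channel under the attention map: one value per channel."""
    return [sum(a * v for arow, vrow in zip(att, ch) for a, v in zip(arow, vrow))
            for ch in channels]

def make_blob(h, w, r0, c0):
    """Toy stand-in for a feature channel: a Gaussian bump centered at (r0, c0)."""
    return [[math.exp(-((r - r0) ** 2 + (c - c0) ** 2)) for c in range(w)]
            for r in range(h)]

# Two channels respond near the top-left, two near the bottom-right.
channels = [make_blob(6, 6, 1, 1), make_blob(6, 6, 1, 2),
            make_blob(6, 6, 4, 4), make_blob(6, 6, 4, 3)]
peaks = [peak_location(ch) for ch in channels]
labels = group_channels(peaks, num_parts=2)
parts = [part_feature(channels, attention_map(channels, labels, k)) for k in range(2)]
```

In this toy setup the first two channels fall into one group and the last two into another, so each part feature emphasizes the channels that fire in its own region. In a trainable network the hard group indicator would presumably be replaced by learned weights, so that the two losses mentioned in the abstract can update the grouping and the features jointly.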
