Is Second-Order Information Helpful for Large-Scale Visual Recognition?

doi:10.1109/ICCV.2017.228

Citations

Proceedings Article•DOI•

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

Qilong Wang¹, Banggu Wu¹, Pengfei Zhu¹, Peihua Li², Wangmeng Zuo³, Qinghua Hu¹

Tianjin University¹, Dalian University of Technology², Harbin Institute of Technology³

14 Jun 2020

TL;DR: This paper proposes the Efficient Channel Attention (ECA) module, a local cross-channel interaction strategy without dimensionality reduction that can be implemented efficiently via 1D convolution and involves only a handful of parameters while bringing a clear performance gain.

Abstract: Recently, channel attention mechanism has demonstrated to offer great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing methods dedicate to developing more sophisticated attention modules for achieving better performance, which inevitably increase model complexity. To overcome the paradox of performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain. By dissecting the channel attention module in SENet, we empirically show avoiding dimensionality reduction is important for learning channel attention, and appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. Therefore, we propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via 1D convolution. Furthermore, we develop a method to adaptively select kernel size of 1D convolution, determining coverage of local cross-channel interaction. The proposed ECA module is both efficient and effective, e.g., the parameters and computations of our modules against backbone of ResNet50 are 80 vs. 24.37M and 4.7e-4 GFlops vs. 3.86 GFlops, respectively, and the performance boost is more than 2% in terms of Top-1 accuracy. We extensively evaluate our ECA module on image classification, object detection and instance segmentation with backbones of ResNets and MobileNetV2. The experimental results show our module is more efficient while performing favorably against its counterparts.
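
As a rough illustration of the idea in this abstract, the following NumPy sketch applies channel attention through a 1D convolution across neighbouring channels, with no dimensionality reduction. The function name `eca_attention`, the zero padding at the channel boundaries, and the fixed, caller-supplied kernel are assumptions made for the example; the adaptive kernel-size rule mentioned in the abstract is not reproduced here.

```python
import numpy as np


def eca_attention(x, kernel):
    """Hypothetical ECA-style gate.

    x:      feature map of shape (C, H, W)
    kernel: 1D weights of odd length k, shared across all channels
    Returns the rescaled feature map and the per-channel gate values.
    """
    c = x.shape[0]
    k = kernel.shape[0]
    assert k % 2 == 1, "kernel length must be odd so the output keeps C entries"

    # Squeeze: global average pooling gives one descriptor per channel.
    y = x.mean(axis=(1, 2))                               # shape (C,)

    # Local cross-channel interaction: slide the kernel along the channel
    # axis (zero padding at both ends is an assumption of this sketch).
    y_pad = np.pad(y, k // 2)
    z = np.array([np.dot(y_pad[i:i + k], kernel) for i in range(c)])

    # Sigmoid gate, then rescale every channel by its own weight.
    w = 1.0 / (1.0 + np.exp(-z))
    return x * w[:, None, None], w
```

With a kernel of length k the gate has only k weights, which is in line with the "handful of parameters" the abstract describes.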

1,378 citations

Proceedings Article•DOI•

Second-Order Attention Network for Single Image Super-Resolution

Tao Dai¹, Jianrui Cai², Yongbing Zhang¹, Shu-Tao Xia¹, Lei Zhang²

Tsinghua University¹, Hong Kong Polytechnic University²

15 Jun 2019

TL;DR: Experimental results demonstrate the superiority of the SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.

Abstract: Recently, deep convolutional neural networks (CNNs) have been widely explored in single image super-resolution (SISR) and obtained remarkable performance. However, most of the existing CNN-based SISR methods mainly focus on wider or deeper architecture design, neglecting to explore the feature correlations of intermediate layers, hence hindering the representational power of CNNs. To address this issue, in this paper, we propose a second-order attention network (SAN) for more powerful feature expression and feature correlation learning. Specifically, a novel trainable second-order channel attention (SOCA) module is developed to adaptively rescale the channel-wise features by using second-order feature statistics for more discriminative representations. Furthermore, we present a non-locally enhanced residual group (NLRG) structure, which not only incorporates non-local operations to capture long-distance spatial contextual information, but also contains repeated local-source residual attention groups (LSRAG) to learn increasingly abstract feature representations. Experimental results demonstrate the superiority of our SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.

1,219 citations

Cites background from "Is Second-Order Information Helpful..."

...On the other hand, recent works [19, 21] have shown that second-order statistics in deep CNNs are more helpful for more discriminative representations than first-order ones....
[...]
...It is shown in [27, 19] that covariance normalization plays a critical role for more discriminative representations....
[...]
...As explored in [19], α = 1/2 works well for more discriminative representations....
[...]
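
The excerpts above refer to covariance normalization with an exponent α = 1/2. One common way to compute such a matrix power is through an eigendecomposition of the covariance matrix, and the sketch below takes that route. The function name `covariance_pool`, the 1/N normalization, and the eigendecomposition itself are assumptions for illustration, not necessarily what the cited papers implement.

```python
import numpy as np


def covariance_pool(features, alpha=0.5):
    """Hypothetical second-order pooling with matrix power normalization.

    features: array of shape (N, C), N local descriptors with C channels
    alpha:    exponent applied to the covariance matrix (1/2 = matrix square root)
    """
    n = features.shape[0]
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / n           # (C, C), symmetric positive semidefinite

    # Matrix power through the eigendecomposition cov = U diag(s) U^T.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negative round-off
    return (eigvecs * eigvals ** alpha) @ eigvecs.T
```

Setting `alpha=1.0` returns the plain covariance matrix, so the same function can be used to compare normalized and unnormalized second-order features.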

Posted Content•

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

Qilong Wang¹, Banggu Wu¹, Pengfei Zhu¹, Peihua Li², Wangmeng Zuo³, Qinghua Hu¹

Tianjin University¹, Dalian University of Technology², Harbin Institute of Technology³

08 Oct 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain, and develops a method to adaptively select kernel size of 1D convolution, determining coverage of local cross-channel interaction.

Abstract: Recently, channel attention mechanism has demonstrated to offer great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing methods dedicate to developing more sophisticated attention modules for achieving better performance, which inevitably increase model complexity. To overcome the paradox of performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain. By dissecting the channel attention module in SENet, we empirically show avoiding dimensionality reduction is important for learning channel attention, and appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. Therefore, we propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via $1D$ convolution. Furthermore, we develop a method to adaptively select kernel size of $1D$ convolution, determining coverage of local cross-channel interaction. The proposed ECA module is efficient yet effective, e.g., the parameters and computations of our modules against backbone of ResNet50 are 80 vs. 24.37M and 4.7e-4 GFLOPs vs. 3.86 GFLOPs, respectively, and the performance boost is more than 2% in terms of Top-1 accuracy. We extensively evaluate our ECA module on image classification, object detection and instance segmentation with backbones of ResNets and MobileNetV2. The experimental results show our module is more efficient while performing favorably against its counterparts.

1,048 citations

Posted Content•

$A^2$-Nets: Double Attention Networks

Yunpeng Chen¹, Yannis Kalantidis², Jianshu Li¹, Shuicheng Yan¹, Jiashi Feng¹

National University of Singapore¹, Facebook²

27 Oct 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently.

Abstract: Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations, which is highly inefficient. In this work, we propose the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently. The component is designed with a double attention mechanism in two steps, where the first step gathers features from the entire space into a compact set through second-order attention pooling and the second step adaptively selects and distributes features to each location via another attention. The proposed double attention block is easy to adopt and can be plugged into existing deep neural networks conveniently. We conduct extensive ablation studies and experiments on both image and video recognition tasks for evaluating its performance. On the image recognition task, a ResNet-50 equipped with our double attention blocks outperforms a much larger ResNet-152 architecture on ImageNet-1k dataset with over 40% less the number of parameters and less FLOPs. On the action recognition task, our proposed model achieves the state-of-the-art results on the Kinetics and UCF-101 datasets with significantly higher efficiency than recent works.
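
One plausible reading of the two steps in this abstract (gather, then distribute) can be sketched in NumPy as follows. The names `double_attention`, `a`, `b` and `v`, the use of softmax for both attentions, and the choice of normalization axes are assumptions of this example; the abstract does not specify them, and the learned projections that would produce these inputs are left out.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def double_attention(a, b, v):
    """Hypothetical gather-and-distribute block.

    a: (m, N) features at N locations
    b: (n, N) logits for n gathering attention maps
    v: (n, N) logits for the distribution attention
    Returns an (m, N) array of redistributed global features.
    """
    # Step 1 (gather): each attention map pools the features over all
    # locations, giving a compact set of n global descriptors.
    gathered = a @ softmax(b, axis=1).T       # (m, n)

    # Step 2 (distribute): every location takes its own convex
    # combination of the n gathered descriptors.
    return gathered @ softmax(v, axis=0)      # (m, N)
```

In this sketch the output has the same shape as the input features, which is what would let such a block be plugged between existing layers.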

262 citations

Cites background or methods from "Is Second-Order Information Helpful..."

...The double-attention block is related to a number of recent works, including the Squeeze-and-Excitation Networks [11], covariance pooling [14], the Non-local Neural Networks [25] and the Transformer architecture of [24]....
[...]
...Meanwhile, self-attentive and correlation operators like second-order pooling have been recently shown to work well in a wide range of tasks [24, 14, 15]....
[...]

Journal Article•DOI•

Remote Sensing Scene Classification Using Multilayer Stacked Covariance Pooling

Nanjun He¹, Leyuan Fang¹, Shutao Li¹, Antonio Plaza², Javier Plaza²

Hunan University¹, University of Extremadura²

09 Jul 2018-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: The experimental results demonstrate that the proposed multilayer stacked covariance pooling method can not only consistently outperform the corresponding single-layer model but also achieve better classification performance than other pretrained CNN-based scene classification methods.

Abstract: This paper proposes a new method, called multilayer stacked covariance pooling (MSCP), for remote sensing scene classification. The innovative contribution of the proposed method is that it is able to naturally combine multilayer feature maps, obtained by pretrained convolutional neural network (CNN) models. Specifically, the proposed MSCP-based classification framework consists of the following three steps. First, a pretrained CNN model is used to extract multilayer feature maps. Then, the feature maps are stacked together, and a covariance matrix is calculated for the stacked features. Each entry of the resulting covariance matrix stands for the covariance of two different feature maps, which provides a natural and innovative way to exploit the complementary information provided by feature maps coming from different layers. Finally, the extracted covariance matrices are used as features for classification by a support vector machine. The experimental results, conducted on three challenging data sets, demonstrate that the proposed MSCP method can not only consistently outperform the corresponding single-layer model but also achieve better classification performance than other pretrained CNN-based scene classification methods.
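
The stacking and covariance steps described in this abstract can be sketched in NumPy as below. The function name `stacked_covariance`, the requirement that all feature maps already share one spatial size, and the unbiased 1/(N-1) normalization are assumptions of this example; feature extraction with a pretrained CNN and the final support vector machine are not shown.

```python
import numpy as np


def stacked_covariance(feature_maps):
    """Hypothetical stacking-plus-covariance step.

    feature_maps: list of arrays of shape (C_i, H, W) taken from different
                  layers, assumed to be resized to a common H x W beforehand
    Returns a (D, D) covariance matrix with D = sum of all C_i.
    """
    # Stack the maps from all layers along the channel axis.
    stacked = np.concatenate(feature_maps, axis=0)        # (D, H, W)
    d = stacked.shape[0]

    # Treat every spatial position as one observation of a D-dim vector.
    flat = stacked.reshape(d, -1)                         # (D, H*W)
    centered = flat - flat.mean(axis=1, keepdims=True)

    # Entry (i, j) is the covariance between feature maps i and j.
    return centered @ centered.T / (flat.shape[1] - 1)
```

Because the maps are stacked before the covariance is taken, the off-diagonal blocks of the result hold covariances between feature maps from different layers.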

226 citations
