Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

doi:10.1109/CVPR42600.2020.00406

Proceedings article (open access), pp. 4003-4012

TLDR

Hou et al. propose strip pooling, a pooling strategy that uses a long but narrow kernel (1×N or N×1) to capture long-range contextual information for pixel-wise prediction tasks.
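To make the kernel shapes concrete, here is a minimal NumPy sketch of average strip pooling on a feature map of shape (C, H, W). It assumes that the strips span the full width (a 1×W kernel per row) or the full height (an H×1 kernel per column) and that averaging is the pooling operator. The function name `strip_pool` is hypothetical and is not taken from the authors' code.

```python
import numpy as np
from typing import Tuple


def strip_pool(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Average strip pooling over a feature map of shape (C, H, W).

    Hypothetical helper for illustration. Returns:
        horizontal: shape (C, H), each row averaged by a 1 x W strip
        vertical:   shape (C, W), each column averaged by an H x 1 strip
    """
    horizontal = x.mean(axis=2)  # collapse width: one value per row
    vertical = x.mean(axis=1)    # collapse height: one value per column
    return horizontal, vertical


# A single-channel 2 x 3 feature map.
x = np.array([[[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]])
h, v = strip_pool(x)
print(h)  # [[2. 5.]]
print(v)  # [[2.5 3.5 4.5]]
```

Unlike an N×N window, each output value summarizes an entire row or column. A single pooling step therefore relates pixels that are far apart along one axis while staying narrow along the other.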

Abstract:

Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks such as scene parsing. In this paper, beyond conventional spatial pooling, which usually has a regular N×N shape, we rethink the formulation of spatial pooling by introducing a new pooling strategy, called strip pooling, which uses a long but narrow kernel, i.e., 1×N or N×1. Based on strip pooling, we further investigate spatial pooling architecture design by 1) introducing a new strip pooling module that enables backbone networks to efficiently model long-range dependencies; 2) presenting a novel building block with diverse spatial pooling as its core; and 3) systematically comparing the performance of the proposed strip pooling and conventional spatial pooling techniques. Both novel pooling-based designs are lightweight and can serve as efficient plug-and-play modules in existing scene parsing networks. Extensive experiments on the Cityscapes and ADE20K benchmarks demonstrate that our simple approach establishes new state-of-the-art results. Code is available at https://github.com/Andrew-Qibin/SPNet.
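The abstract does not say how the strip pooling module turns the two strips into something a backbone can use. The NumPy forward pass below is one plausible reading, written under explicit assumptions. The horizontal and vertical strip averages are each refined by a size-3 1D convolution along their own axis (depthwise here, for simplicity), broadcast back to H×W and summed, mixed across channels by a 1×1 convolution, squashed with a sigmoid, and used to rescale the input element-wise. All weights are random placeholders, and the names `conv1d_same` and `strip_pooling_module` are hypothetical. For the actual layer configuration, consult the released SPNet code.

```python
import numpy as np


def conv1d_same(s: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Hypothetical depthwise 1D convolution (cross-correlation), zero-padded.

    s: (C, L) strip features; k: (C, 3) per-channel kernel taps.
    Output has the same shape as s.
    """
    length = s.shape[1]
    padded = np.pad(s, ((0, 0), (1, 1)))  # one zero on each side of the strip
    # Weighted sum of the three shifted views of the padded strip.
    return sum(k[:, t:t + 1] * padded[:, t:t + length] for t in range(3))


def strip_pooling_module(x, k_h, k_v, w_mix):
    """Assumed gating design. x: (C, H, W); k_h, k_v: (C, 3); w_mix: (C, C)."""
    y_h = conv1d_same(x.mean(axis=2), k_h)   # (C, H): pooled rows, refined
    y_v = conv1d_same(x.mean(axis=1), k_v)   # (C, W): pooled columns, refined
    y = y_h[:, :, None] + y_v[:, None, :]    # broadcast-sum back to (C, H, W)
    z = np.einsum("oc,chw->ohw", w_mix, y)   # 1x1 convolution = channel mixing
    gate = 1.0 / (1.0 + np.exp(-z))          # sigmoid, values in (0, 1)
    return x * gate                          # rescale the input element-wise


# Usage with random placeholder weights.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 16
x = rng.standard_normal((C, H, W))
out = strip_pooling_module(
    x,
    rng.standard_normal((C, 3)),
    rng.standard_normal((C, 3)),
    rng.standard_normal((C, C)),
)
print(out.shape)  # (4, 8, 16)
```

The strip convolutions operate on only H + W positions per channel. After the broadcast-sum, however, the gate at position (i, j) depends on the whole of row i and column j and their immediate neighbors. This illustrates how a narrow kernel can still supply long-range context cheaply.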

Citations


Proceedings article

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Sixiao Zheng, +10 more

TL;DR: Zheng et al. propose a pure transformer that encodes an image as a sequence of patches; combined with a simple decoder, it yields a powerful segmentation model.

Proceedings article

Coordinate Attention for Efficient Mobile Network Design

Qibin Hou, +2 more

TL;DR: Coordinate attention embeds positional information into channel attention, so that long-range dependencies are captured along one spatial direction while precise positional information is preserved along the other.

Journal article

Attention mechanisms in computer vision: A survey

15 Mar 2022

Computational Visual Media

TL;DR: Guo et al. provide a comprehensive review of attention mechanisms in computer vision and categorize them by approach: channel attention, spatial attention, temporal attention, and branch attention.

Posted Content

Attention Mechanisms in Computer Vision: A Survey

Meng-Hao Guo, +9 more

15 Nov 2021

arXiv: Computer Vision and Pattern Recog...

TL;DR: This article gives a comprehensive review of attention mechanisms in computer vision and categorizes them by approach: channel attention, spatial attention, temporal attention, and branch attention.

Proceedings article

Rotate to Attend: Convolutional Triplet Attention Module

Diganta Misra, +3 more

TL;DR: The authors propose triplet attention, a method for computing attention weights that captures cross-dimension interaction with a three-branch structure and can be plugged into classic backbone networks as an add-on module.



Related Papers (5)

Deep Residual Learning for Image Recognition

Pyramid Scene Parsing Network

Fully convolutional networks for semantic segmentation

U-Net: Convolutional Networks for Biomedical Image Segmentation

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs