scispace - formally typeset
Open AccessProceedings ArticleDOI

Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

TLDR
Qibin et al. as mentioned in this paper proposed a strip pooling strategy, which considers a long but narrow kernel, i.e., 1xN or Nx1, to capture long-range contextual information for pixel-wise prediction tasks.
Abstract
Spatial pooling has been proven highly effective to capture long-range contextual information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond conventional spatial pooling that usually has a regular shape of NxN, we rethink the formulation of spatial pooling by introducing a new pooling strategy, called strip pooling, which considers a long but narrow kernel, i.e., 1xN or Nx1. Based on strip pooling, we further investigate spatial pooling architecture design by 1) introducing a new strip pooling module that enables backbone networks to efficiently model long-range dependencies; 2) presenting a novel building block with diverse spatial pooling as a core; and 3) systematically comparing the performance of the proposed strip pooling and conventional spatial pooling techniques. Both novel pooling-based designs are lightweight and can serve as an efficient plug-and-play modules in existing scene parsing networks. Extensive experiments on Cityscapes and ADE20K benchmarks demonstrate that our simple approach establishes new state-of-the-art results. Code is available at https://github.com/Andrew-Qibin/SPNet.

read more

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

TL;DR: Zhang et al. as discussed by the authors proposed a pure transformer to encode an image as a sequence of patches, which can be combined with a simple decoder to provide a powerful segmentation model.
Proceedings ArticleDOI

Coordinate Attention for Efficient Mobile Network Design

TL;DR: CoordAttention as mentioned in this paper embeds positional information into channel attention to capture long-range dependencies along one spatial direction and meanwhile precise positional information can be preserved along the other spatial direction.
Journal ArticleDOI

Attention mechanisms in computer vision: A survey

TL;DR: Guo et al. as mentioned in this paper provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention.
Posted Content

Attention Mechanisms in Computer Vision: A Survey.

TL;DR: A comprehensive review of attention mechanisms in computer vision can be found in this article, which categorizes them according to approach, such as channel attention, spatial attention, temporal attention and branch attention.
Proceedings ArticleDOI

Rotate to Attend: Convolutional Triplet Attention Module

TL;DR: Triplet Attention as discussed by the authors proposes triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure, which can be easily plugged into classic backbone networks as an add-on module.
References
More filters
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Book ChapterDOI

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: Neber et al. as discussed by the authors proposed a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently, which can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Proceedings ArticleDOI

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Posted Content

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Proceedings ArticleDOI

Feature Pyramid Networks for Object Detection

TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
Related Papers (5)