Proceedings ArticleDOI

Scale-Adaptive Convolutions for Scene Parsing

TLDR
The proposed scale-adaptive convolutions are not only differentiable, so that the convolutional parameters and scale coefficients can be learned end-to-end, but also highly parallelizable, making them convenient to implement on GPUs.
Abstract
Many existing scene parsing methods adopt Convolutional Neural Networks with fixed-size receptive fields, which frequently result in inconsistent predictions for large objects and invisibility of small objects. To tackle this issue, we propose a scale-adaptive convolution to acquire flexible-size receptive fields during scene parsing. By adding a new scale regression layer, we can dynamically infer position-adaptive scale coefficients, which are used to resize the convolutional patches. Consequently, the receptive fields can be adjusted automatically according to the various sizes of objects in scene images, alleviating the problems of invisible small objects and inconsistent large-object predictions. Furthermore, the proposed scale-adaptive convolutions are not only differentiable, allowing the convolutional parameters and scale coefficients to be learned end-to-end, but also highly parallelizable, which makes GPU implementation convenient. Additionally, since the new scale regression layers are learned implicitly, no extra training supervision of object sizes is needed. Extensive experiments on the Cityscapes and ADE20K datasets demonstrate the effectiveness of the proposed scale-adaptive convolutions.
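The abstract describes the mechanism only at a high level. Below is a minimal PyTorch sketch of how such a layer could look, assuming a 3x3 kernel whose sampling offsets are stretched by a per-position coefficient predicted by a small scale regression convolution; the class name ScaleAdaptiveConv2d, the softplus parameterization, and the explicit loop over kernel offsets are illustrative assumptions, not the authors' implementation (which the paper notes is highly parallelizable).

```python
# Illustrative sketch of a scale-adaptive convolution: a scale regression layer
# predicts a per-position coefficient that stretches the sampling grid of a
# 3x3 convolution. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAdaptiveConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        assert kernel_size % 2 == 1
        self.k = kernel_size
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Scale regression layer: one positive coefficient per spatial position.
        self.scale_reg = nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Position-adaptive scale coefficients (around 1.0 behaves like a plain 3x3 conv).
        s = F.softplus(self.scale_reg(x)) + 0.5            # (b, 1, h, w)

        # Base sampling grid in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")      # each (h, w)

        r = self.k // 2
        out = x.new_zeros(b, self.weight.shape[0], h, w)
        for i in range(-r, r + 1):
            for j in range(-r, r + 1):
                # Kernel offsets are stretched by the predicted scale coefficient.
                dy = s[:, 0] * i * (2.0 / max(h - 1, 1))     # (b, h, w)
                dx = s[:, 0] * j * (2.0 / max(w - 1, 1))
                grid = torch.stack((gx + dx, gy + dy), dim=-1)         # (b, h, w, 2)
                sampled = F.grid_sample(x, grid, align_corners=True)   # (b, c, h, w)
                w_ij = self.weight[:, :, i + r, j + r]                 # (out_ch, in_ch)
                out = out + torch.einsum("bchw,oc->bohw", sampled, w_ij)
        return out + self.bias.view(1, -1, 1, 1)
```

Because grid_sample is differentiable with respect to the sampling grid, gradients flow into the scale regression layer, which matches the end-to-end training described in the abstract.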


Citations
Posted Content

Rethinking Atrous Convolution for Semantic Image Segmentation

TL;DR: The proposed `DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.
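For reference, atrous (dilated) convolution enlarges the receptive field without adding parameters; a generic PyTorch illustration (not the DeepLabv3 model itself):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 64, 64)

# A plain 3x3 convolution versus an atrous (dilated) 3x3 convolution: the
# dilated kernel covers a 5x5 neighborhood with the same 9 weights, enlarging
# the receptive field at no extra parameter cost.
conv_plain = nn.Conv2d(256, 256, kernel_size=3, padding=1, dilation=1)
conv_atrous = nn.Conv2d(256, 256, kernel_size=3, padding=2, dilation=2)

print(conv_plain(x).shape, conv_atrous(x).shape)  # both: torch.Size([1, 256, 64, 64])
```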
Book ChapterDOI

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

TL;DR: BiSeNet as discussed by the authors designs a spatial path with a small stride to preserve spatial information and generate high-resolution features, while a context path with a fast downsampling strategy is employed to obtain a sufficient receptive field.
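A minimal two-path sketch in the spirit of this summary, assuming toy layer widths and a simple concatenation-based fusion rather than BiSeNet's exact feature fusion module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A shallow spatial path with small overall stride keeps high-resolution
# detail, while a context path downsamples quickly for a large receptive
# field; the two are fused before prediction.
class TwoPathSegNet(nn.Module):
    def __init__(self, num_classes=19):
        super().__init__()
        self.spatial = nn.Sequential(                      # overall stride 8
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.context = nn.Sequential(                      # overall stride 32
            nn.Conv2d(3, 64, 3, stride=4, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=4, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(256, num_classes, 1)

    def forward(self, x):
        sp = self.spatial(x)                               # high-resolution detail
        cx = self.context(x)                               # coarse but context-rich
        cx = F.interpolate(cx, size=sp.shape[-2:], mode="bilinear", align_corners=False)
        logits = self.classifier(torch.cat([sp, cx], dim=1))
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear", align_corners=False)
```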
Proceedings ArticleDOI

CCNet: Criss-Cross Attention for Semantic Segmentation

TL;DR: CCNet as mentioned in this paper proposes a recurrent criss-cross attention module that harvests the contextual information of all the pixels on each pixel's criss-cross path, and applies a further recurrent operation to capture full-image dependencies from all pixels.
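A simplified sketch of the criss-cross idea, in which each position attends only to its own row and column and two recurrent applications cover the full image; the channel reduction and residual scaling are common choices assumed here, not CCNet's exact implementation:

```python
import torch
import torch.nn as nn

# Each position attends to positions in its own row and column; applying the
# module twice propagates information to every position in the image.
class CrissCrossAttention(nn.Module):
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or max(channels // 8, 1)
        self.q = nn.Conv2d(channels, reduced, 1)
        self.k = nn.Conv2d(channels, reduced, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Affinities along the row (same h, all w') and the column (same w, all h').
        e_row = torch.einsum("bchw,bchv->bhwv", q, k)      # (b, h, w, w)
        e_col = torch.einsum("bchw,bcuw->bhwu", q, k)      # (b, h, w, h)
        attn = torch.softmax(torch.cat([e_row, e_col], dim=-1), dim=-1)
        a_row, a_col = attn.split([e_row.size(-1), e_col.size(-1)], dim=-1)
        out = (torch.einsum("bhwv,bchv->bchw", a_row, v) +
               torch.einsum("bhwu,bcuw->bchw", a_col, v))
        return self.gamma * out + x                        # residual connection

x = torch.randn(2, 64, 32, 48)
cca = CrissCrossAttention(64)
y = cca(cca(x))   # two recurrent applications cover full-image dependencies
```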
Posted Content

Deep High-Resolution Representation Learning for Visual Recognition

TL;DR: The superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, is shown, suggesting that the HRNet is a stronger backbone for computer vision problems.
Journal ArticleDOI

Deep High-Resolution Representation Learning for Visual Recognition

TL;DR: The High-Resolution Network (HRNet) as mentioned in this paper maintains high-resolution representations through the whole process by connecting the high-to-low resolution convolution streams in parallel and repeatedly exchanging the information across resolutions.
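A two-stream sketch of the multi-resolution exchange described above; widths, depths, and the number of streams are illustrative, whereas HRNet itself maintains up to four parallel streams:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One stream stays at high resolution, a second runs at 1/2 resolution, and an
# exchange step fuses them by downsampling and upsampling.
class TwoStreamExchangeBlock(nn.Module):
    def __init__(self, hi_ch=32, lo_ch=64):
        super().__init__()
        self.hi = nn.Conv2d(hi_ch, hi_ch, 3, padding=1)
        self.lo = nn.Conv2d(lo_ch, lo_ch, 3, padding=1)
        self.hi_to_lo = nn.Conv2d(hi_ch, lo_ch, 3, stride=2, padding=1)  # downsample
        self.lo_to_hi = nn.Conv2d(lo_ch, hi_ch, 1)                       # then upsample

    def forward(self, hi, lo):
        hi, lo = F.relu(self.hi(hi)), F.relu(self.lo(lo))
        # Exchange: each stream receives information from the other resolution.
        hi_out = hi + F.interpolate(self.lo_to_hi(lo), size=hi.shape[-2:],
                                    mode="bilinear", align_corners=False)
        lo_out = lo + self.hi_to_lo(hi)
        return hi_out, lo_out

hi = torch.randn(1, 32, 128, 128)   # high-resolution stream
lo = torch.randn(1, 64, 64, 64)     # 1/2-resolution stream
hi, lo = TwoStreamExchangeBlock()(hi, lo)   # stack blocks to keep exchanging
```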
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
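A generic residual block illustrating the shortcut idea (not the exact ResNet bottleneck design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# The stacked layers learn a residual F(x) that is added back to the identity
# shortcut, which eases the optimization of very deep networks.
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(x + residual)   # identity shortcut plus learned residual

y = ResidualBlock(64)(torch.randn(1, 64, 56, 56))
```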
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The proposed deep convolutional neural network, consisting of five convolutional layers (some of which are followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art classification performance as discussed by the authors.
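A schematic of this five-convolution / three-fully-connected layout; filter counts follow the commonly cited configuration, and details such as local response normalization and the original two-GPU split are omitted:

```python
import torch
import torch.nn as nn

# Five convolutional layers, max-pooling after a subset of them, and three
# fully-connected layers producing logits for a 1000-way softmax.
net = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),          # logits for the 1000-way softmax
)
logits = net(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```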
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of convolutional network depth on accuracy in the large-scale image recognition setting and showed that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called "ImageNet" is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than existing image datasets.
Proceedings ArticleDOI

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
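A minimal fully convolutional sketch of that insight, using a toy backbone rather than the paper's VGG-based network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# With only convolutional (no fixed-size fully-connected) layers, the network
# accepts inputs of arbitrary size; the coarse class-score map is upsampled
# back to the input resolution.
class TinyFCN(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.score = nn.Conv2d(256, num_classes, 1)    # 1x1 "classifier" convolution

    def forward(self, x):
        coarse = self.score(self.backbone(x))          # coarse per-class score map
        return F.interpolate(coarse, size=x.shape[-2:], mode="bilinear", align_corners=False)

for hw in [(224, 224), (301, 497)]:                    # arbitrary input sizes
    out = TinyFCN()(torch.randn(1, 3, *hw))
    print(out.shape)                                   # (1, 21, H, W) matching the input
```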