Open Access Proceedings Article (DOI)

The Devil is in the Decoder

TL;DR
In this paper, the authors present an extensive comparison of a variety of decoders for pixel-wise prediction tasks and identify two decoder types which give consistently high performance.
Abstract
Many machine vision applications require predictions for every pixel of the input image (for example, semantic segmentation and boundary detection). Models for such problems usually consist of encoders, which decrease spatial resolution while learning a high-dimensional representation, followed by decoders, which recover the original input resolution and produce low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. Therefore, this paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise prediction tasks. Our contributions are: (1) Decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce a novel decoder: bilinear additive upsampling. (3) We introduce new residual-like connections for decoders. (4) We identify two decoder types which give consistently high performance.
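As a minimal sketch of contributions (2) and (3), the snippet below shows, in PyTorch, how bilinear additive upsampling and a residual-like decoder connection could look: the feature map is bilinearly upsampled and consecutive channel groups are summed, giving a parameter-free upsampling step that can be added to a learned decoder step. The function name, channel ratio, and layer sizes are illustrative assumptions, not the authors' reference implementation.

    # Sketch of bilinear additive upsampling: upsample bilinearly, then sum
    # consecutive groups of channels so spatial resolution doubles while the
    # channel count shrinks, with no extra parameters.
    import torch
    import torch.nn.functional as F

    def bilinear_additive_upsample(x: torch.Tensor, channel_ratio: int = 4) -> torch.Tensor:
        """Upsample (B, C, H, W) -> (B, C // channel_ratio, 2H, 2W)."""
        b, c, h, w = x.shape
        assert c % channel_ratio == 0, "channels must divide the reduction ratio"
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        # Sum each consecutive group of `channel_ratio` channels.
        return x.view(b, c // channel_ratio, channel_ratio, 2 * h, 2 * w).sum(dim=2)

    if __name__ == "__main__":
        feats = torch.randn(1, 64, 32, 32)
        shortcut = bilinear_additive_upsample(feats)      # (1, 16, 64, 64), parameter-free
        # Residual-like decoder connection (contribution 3, sketched): add the
        # parameter-free shortcut to the output of a learned upsampling step.
        learned = torch.nn.Conv2d(64, 16, kernel_size=3, padding=1)(
            F.interpolate(feats, scale_factor=2, mode="bilinear", align_corners=False))
        print((shortcut + learned).shape)                 # torch.Size([1, 16, 64, 64])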



Citations
Book Chapter (DOI)

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

TL;DR: This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries, and applies depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.
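A minimal PyTorch sketch of the depthwise separable convolution this summary refers to: a per-channel (depthwise, optionally atrous) spatial convolution followed by a 1x1 pointwise convolution. The class name and layer sizes are illustrative, not taken from the paper.

    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, dilation: int = 1):
            super().__init__()
            padding = dilation * (kernel_size // 2)
            # Depthwise: one filter per input channel (groups=in_ch), atrous via dilation.
            self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, padding=padding,
                                       dilation=dilation, groups=in_ch, bias=False)
            # Pointwise: 1x1 convolution mixes channels.
            self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))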
Posted Content

Deep High-Resolution Representation Learning for Visual Recognition

TL;DR: This work shows the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems.
Posted Content

High-Resolution Representations for Labeling Pixels and Regions

TL;DR: A simple modification is introduced to augment the high-resolution representation by aggregating the (upsampled) representations from all the parallel convolutions rather than only the representation from the high-resolution convolution, which leads to stronger representations, evidenced by superior results.
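A minimal sketch, assuming PyTorch, of the aggregation this summary describes: rather than keeping only the high-resolution branch, every parallel branch is upsampled to the highest resolution and the results are combined. The function name, branch count, and use of concatenation are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def aggregate_branches(branches):
        """branches: list of (B, C_i, H_i, W_i) tensors; branches[0] is the highest resolution."""
        target_size = branches[0].shape[-2:]
        upsampled = [branches[0]] + [
            F.interpolate(b, size=target_size, mode="bilinear", align_corners=False)
            for b in branches[1:]
        ]
        # Concatenate all (upsampled) parallel representations for pixel labeling.
        return torch.cat(upsampled, dim=1)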
Proceedings Article (DOI)

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

TL;DR: HigherHRNet is presented, a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids; it surpasses all top-down methods on CrowdPose test and achieves a new state-of-the-art result on COCO test-dev, suggesting its robustness in crowded scenes.
References
Proceedings Article (DOI)

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the framework won first place in the ILSVRC 2015 classification task.
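A minimal PyTorch sketch of the residual learning idea summarized above: a block learns a residual F(x) and outputs F(x) + x via an identity shortcut, which eases the optimization of very deep networks. The class name and channel counts are illustrative.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            # Identity shortcut: the block only needs to learn the residual.
            return self.relu(self.body(x) + x)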
Journal Article (DOI)

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), as described in this paper, is a benchmark in object category classification and detection covering hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Posted Content

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Posted Content

Rethinking the Inception Architecture for Computer Vision

TL;DR: This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
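A minimal PyTorch sketch of the kind of factorized convolutions this summary mentions: replacing a 5x5 convolution with two stacked 3x3 convolutions, or an n x n convolution with a 1 x n followed by an n x 1. Channel counts are illustrative assumptions.

    import torch.nn as nn

    # Two 3x3 convolutions cover the same 5x5 receptive field with roughly 28% fewer weights.
    factorized_5x5 = nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=3, padding=1),
        nn.Conv2d(64, 64, kernel_size=3, padding=1),
    )

    # Asymmetric factorization of a 7x7 convolution into 1x7 followed by 7x1.
    factorized_7x7 = nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=(1, 7), padding=(0, 3)),
        nn.Conv2d(64, 64, kernel_size=(7, 1), padding=(3, 0)),
    )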
Posted Content

Fully Convolutional Networks for Semantic Segmentation

TL;DR: It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.