Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam
- pp. 833–851
TLDR
This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries, and applies depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.
Abstract
Spatial pyramid pooling modules and encoder-decoder structures are used in deep neural networks for semantic segmentation. The former encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter capture sharper object boundaries by gradually recovering spatial information. In this work, we propose to combine the advantages of both approaches. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 and Cityscapes datasets, achieving test set performance of 89.0% and 82.1% without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow at https://github.com/tensorflow/models/tree/master/research/deeplab.
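The atrous separable convolution the abstract describes factors a dilated convolution into a depthwise spatial step followed by a pointwise (1x1) channel-mixing step. The following is a minimal NumPy sketch of that operation, not the paper's reference implementation; the function name, argument shapes, and 'same'-padding choice are our own illustrative assumptions:

```python
import numpy as np

def atrous_sep_conv(x, depthwise, pointwise, rate):
    """Atrous (dilated) depthwise separable convolution with 'same' padding.

    x:         (H, W, C_in) input feature map
    depthwise: (k, k, C_in) one spatial filter per input channel
    pointwise: (C_in, C_out) 1x1 convolution mixing channels
    rate:      dilation rate; rate=1 reduces to an ordinary separable conv
    """
    H, W, C = x.shape
    k = depthwise.shape[0]
    pad = rate * (k // 2)  # zero-pad so the spatial size is preserved
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    dw = np.zeros((H, W, C))
    # Depthwise step: each channel is filtered independently; the filter
    # taps sample the input `rate` pixels apart, enlarging the effective
    # field-of-view without adding parameters.
    for i in range(k):
        for j in range(k):
            dw += depthwise[i, j] * xp[i * rate:i * rate + H,
                                       j * rate:j * rate + W, :]
    # Pointwise step: a 1x1 convolution combines the channels.
    return dw @ pointwise
```

The factorization is what makes the design cheap: per output location, a standard convolution costs on the order of k²·C_in·C_out multiplications, while the separable form costs roughly k²·C_in + C_in·C_out; varying `rate` (as in the ASPP module) then probes features at multiple effective fields-of-view at no extra parameter cost.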
Citations
Posted Content
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.
TL;DR: Swin Transformer is a hierarchical vision Transformer whose self-attention is computed within shifted windows, addressing differences between the vision and language domains such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.
Proceedings ArticleDOI
EfficientDet: Scalable and Efficient Object Detection
TL;DR: EfficientDet proposes a weighted bi-directional feature pyramid network (BiFPN) that allows easy and fast multi-scale feature fusion, together with a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, feature network, and box/class prediction networks at the same time.
Proceedings ArticleDOI
Searching for MobileNetV3
Andrew Howard, Ruoming Pang, Hartwig Adam, Quoc V. Le, Mark Sandler, Bo Chen, Weijun Wang, Liang-Chieh Chen, Mingxing Tan, Grace Chu, Vijay K. Vasudevan, Yukun Zhu
TL;DR: MobileNetV3 is the next generation of MobileNets, based on a combination of complementary search techniques and a novel architecture design, achieving state-of-the-art results for mobile classification, detection, and segmentation.
Journal ArticleDOI
Res2Net: A New Multi-Scale Backbone Architecture
TL;DR: Res2Net constructs hierarchical residual-like connections within a single residual block to represent multi-scale features at a granular level and to increase the range of receptive fields for each network layer.
References
Proceedings ArticleDOI
Hypercolumns for object segmentation and fine-grained localization
TL;DR: In this paper, the authors define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel, and use hypercolumns as pixel descriptors.
Proceedings ArticleDOI
Understanding Convolution for Semantic Segmentation
TL;DR: A dense upsampling convolution (DUC) is designed to generate pixel-level predictions, capturing and decoding detailed information that is generally lost in bilinear upsampling, and a hybrid dilated convolution (HDC) framework is proposed for the encoding phase.
Proceedings ArticleDOI
The Role of Context for Object Detection and Semantic Segmentation in the Wild
Roozbeh Mottaghi, Xianjie Chen, Xiaobai Liu, Nam-Gyu Cho, Seong-Whan Lee, Sanja Fidler, Raquel Urtasun, Alan L. Yuille
TL;DR: A novel deformable part-based model is proposed that exploits both local context around each candidate detection and global context at the level of the scene, which significantly helps in detecting objects at all scales.
Journal ArticleDOI
TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context
TL;DR: A new approach for learning a discriminative model of object classes, incorporating texture, layout, and context information efficiently, which gives competitive and visually pleasing results for objects that are highly textured, highly structured, and even articulated.
Proceedings ArticleDOI
Attention to Scale: Scale-Aware Semantic Image Segmentation
TL;DR: An attention mechanism is proposed that learns to softly weight multi-scale features at each pixel location; it not only outperforms average- and max-pooling, but also allows diagnostic visualization of the importance of features at different positions and scales.