Open Access Book Chapter (DOI)

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

TLDR
This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries and applies the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.
Abstract
Spatial pyramid pooling modules and encoder-decoder structures are both used in deep neural networks for semantic segmentation. The former encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter capture sharper object boundaries by gradually recovering spatial information. In this work, we propose to combine the advantages of both approaches. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 and Cityscapes datasets, achieving test set performance of 89% and 82.1% respectively, without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow at https://github.com/tensorflow/models/tree/master/research/deeplab.
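The abstract's key building block, atrous (dilated) depthwise separable convolution, factorizes a standard convolution into a per-channel spatial filter with its taps spread apart by an atrous rate, followed by a 1x1 pointwise convolution that mixes channels. The sketch below is an illustrative NumPy version, not the authors' TensorFlow implementation; the function and argument names are my own.

```python
import numpy as np

def atrous_depthwise_separable_conv(x, depthwise, pointwise, rate=2):
    """Atrous depthwise separable convolution with 'same' padding.

    x:         (H, W, C_in) input feature map
    depthwise: (k, k, C_in) one spatial filter per input channel (k odd)
    pointwise: (C_in, C_out) 1x1 convolution that mixes channels
    rate:      atrous rate (dilation); rate=1 is an ordinary convolution
    """
    H, W, C_in = x.shape
    k = depthwise.shape[0]
    pad = rate * (k - 1) // 2                      # 'same' padding for odd k
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((H, W, C_in))
    # Depthwise step: each channel is filtered independently, with the
    # kernel taps spread `rate` pixels apart (larger field of view at the
    # same number of parameters and multiply-adds).
    for i in range(k):
        for j in range(k):
            out += xp[i*rate:i*rate+H, j*rate:j*rate+W, :] * depthwise[i, j, :]
    # Pointwise step: a 1x1 convolution combines the channels.
    return out @ pointwise
```

With rate=1 this reduces to a plain depthwise separable convolution; in the paper's ASPP module, several such branches with different rates run in parallel to capture multi-scale context.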


Citations
Posted Content

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

TL;DR: This paper proposes a new vision Transformer, called Swin Transformer, whose representation is computed with shifted windows to address the differences between the vision and language domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.
Proceedings ArticleDOI

EfficientDet: Scalable and Efficient Object Detection

TL;DR: This paper proposes EfficientDet, built on a weighted bi-directional feature pyramid network (BiFPN) that allows easy and fast multi-scale feature fusion, together with a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, feature network, and box/class prediction networks.
Posted Content

Searching for MobileNetV3.

TL;DR: This paper explores how automated search algorithms and network design can work together, harnessing complementary approaches to improve the overall state of the art of MobileNets.
Proceedings ArticleDOI

Searching for MobileNetV3

TL;DR: MobileNetV3, the next generation of MobileNets, is based on a combination of complementary search techniques and a novel architecture design, and achieves state-of-the-art results for mobile classification, detection, and segmentation.
Journal ArticleDOI

Res2Net: A New Multi-Scale Backbone Architecture

TL;DR: Res2Net constructs hierarchical residual-like connections within a single residual block to represent multi-scale features at a granular level and to increase the range of receptive fields for each network layer.
References
Proceedings ArticleDOI

Hypercolumns for object segmentation and fine-grained localization

TL;DR: In this paper, the authors define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel, and use hypercolumns as pixel descriptors.
Proceedings ArticleDOI

Understanding Convolution for Semantic Segmentation

TL;DR: A dense upsampling convolution (DUC) is designed to generate pixel-level predictions, capturing and decoding detailed information that is generally lost in bilinear upsampling, and a hybrid dilated convolution (HDC) framework is proposed for the encoding phase.
Proceedings ArticleDOI

The Role of Context for Object Detection and Semantic Segmentation in the Wild

TL;DR: A novel deformable part-based model is proposed that exploits both local context around each candidate detection and global context at the level of the scene, significantly helping to detect objects at all scales.
Journal ArticleDOI

TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context

TL;DR: A new approach for learning a discriminative model of object classes, incorporating texture, layout, and context information efficiently, which gives competitive and visually pleasing results for objects that are highly textured, highly structured, and even articulated.
Proceedings ArticleDOI

Attention to Scale: Scale-Aware Semantic Image Segmentation

TL;DR: This paper proposes an attention mechanism that learns to softly weight multi-scale features at each pixel location; it not only outperforms average and max pooling, but also allows diagnostic visualization of the importance of features at different positions and scales.