Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam
- pp. 833–851
TLDR
This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries, and applies depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.
Abstract
Spatial pyramid pooling modules and encoder-decoder structures are used in deep neural networks for semantic segmentation. The former encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter capture sharper object boundaries by gradually recovering spatial information. In this work, we propose to combine the advantages of both approaches. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 and Cityscapes datasets, achieving test set performance of 89.0% and 82.1% without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow at https://github.com/tensorflow/models/tree/master/research/deeplab.
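The atrous separable convolution the abstract describes factors a dilated convolution into a depthwise spatial step followed by a pointwise (1x1) channel-mixing step. The following is a minimal NumPy sketch of that operation, not the paper's reference implementation; the function name, argument shapes, and 'same'-padding choice are our own illustrative assumptions:

```python
import numpy as np

def atrous_sep_conv(x, depthwise, pointwise, rate):
    """Atrous (dilated) depthwise separable convolution with 'same' padding.

    x:         (H, W, C_in) input feature map
    depthwise: (k, k, C_in) one spatial filter per input channel
    pointwise: (C_in, C_out) 1x1 convolution mixing channels
    rate:      dilation rate; rate=1 reduces to an ordinary separable conv
    """
    H, W, C = x.shape
    k = depthwise.shape[0]
    pad = rate * (k // 2)  # zero-pad so the spatial size is preserved
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    dw = np.zeros((H, W, C))
    # Depthwise step: each channel is filtered independently; the filter
    # taps sample the input `rate` pixels apart, enlarging the effective
    # field-of-view without adding parameters.
    for i in range(k):
        for j in range(k):
            dw += depthwise[i, j] * xp[i * rate:i * rate + H,
                                       j * rate:j * rate + W, :]
    # Pointwise step: a 1x1 convolution combines the channels.
    return dw @ pointwise
```

The factorization is what makes the design cheap: per output location, a standard convolution costs on the order of k²·C_in·C_out multiplications, while the separable form costs roughly k²·C_in + C_in·C_out; varying `rate` (as in the ASPP module) then probes features at multiple effective fields-of-view at no extra parameter cost.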
Citations
Posted Content
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.
TL;DR: Swin Transformer is a hierarchical vision Transformer whose self-attention is computed within shifted windows, addressing differences between the vision and language domains such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.
Proceedings ArticleDOI
EfficientDet: Scalable and Efficient Object Detection
TL;DR: EfficientDet proposes a weighted bi-directional feature pyramid network (BiFPN) that allows easy and fast multi-scale feature fusion, together with a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, feature network, and box/class prediction networks at the same time.
Proceedings ArticleDOI
Searching for MobileNetV3
Andrew Howard, Ruoming Pang, Hartwig Adam, Quoc V. Le, Mark Sandler, Bo Chen, Weijun Wang, Liang-Chieh Chen, Mingxing Tan, Grace Chu, Vijay K. Vasudevan, Yukun Zhu
TL;DR: MobileNetV3 is the next generation of MobileNets, based on a combination of complementary search techniques and a novel architecture design, achieving state-of-the-art results for mobile classification, detection, and segmentation.
Journal ArticleDOI
Res2Net: A New Multi-Scale Backbone Architecture
TL;DR: Res2Net constructs hierarchical residual-like connections within a single residual block to represent multi-scale features at a granular level and to increase the range of receptive fields for each network layer.
References
Proceedings ArticleDOI
Hypercolumns for object segmentation and fine-grained localization
TL;DR: In this paper, the authors define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel, and use hypercolumns as pixel descriptors.
Proceedings ArticleDOI
Understanding Convolution for Semantic Segmentation
TL;DR: A dense upsampling convolution (DUC) is designed to generate pixel-level predictions, capturing and decoding detailed information that is generally lost in bilinear upsampling, and a hybrid dilated convolution (HDC) framework is proposed for the encoding phase.
Proceedings ArticleDOI
The Role of Context for Object Detection and Semantic Segmentation in the Wild
Roozbeh Mottaghi, Xianjie Chen, Xiaobai Liu, Nam-Gyu Cho, Seong-Whan Lee, Sanja Fidler, Raquel Urtasun, Alan L. Yuille
TL;DR: A novel deformable part-based model is proposed that exploits both local context around each candidate detection and global context at the level of the scene, which significantly helps in detecting objects at all scales.
Journal ArticleDOI
TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context
TL;DR: A new approach for learning a discriminative model of object classes, incorporating texture, layout, and context information efficiently, which gives competitive and visually pleasing results for objects that are highly textured, highly structured, and even articulated.
Proceedings ArticleDOI
Attention to Scale: Scale-Aware Semantic Image Segmentation
TL;DR: An attention mechanism is proposed that learns to softly weight multi-scale features at each pixel location; it not only outperforms average- and max-pooling, but also allows diagnostic visualization of the importance of features at different positions and scales.