Rethinking Atrous Convolution for Semantic Image Segmentation
Citations
9,381 citations
8,807 citations
Cites methods from "Rethinking Atrous Convolution for S..."
...DeepLabv3 adopts atrous convolution [40, 41, 42], a powerful tool to explicitly control the resolution of computed feature maps, and builds five parallel heads including (a) Atrous Spatial Pyramid Pooling module (ASPP) [43] containing three 3 × 3 convolutions with different atrous rates, (b) 1 × 1 convolution head, and (c) Image-level features [44]....
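As a rough illustration of the atrous rates mentioned in the excerpt above, the following minimal sketch applies a one-dimensional atrous (dilated) convolution in plain Python. The function names `atrous_conv1d` and `effective_kernel_size` are hypothetical helpers introduced here, not part of DeepLabv3 or any library, and the 1D setting is a simplification of the 3 × 3 case. It assumes "valid" padding and a cross-correlation convention.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1D atrous cross-correlation (hypothetical helper).

    Kernel taps are placed `rate` samples apart, so the number of weights
    stays the same while the field of view grows with the rate.
    """
    span = (len(kernel) - 1) * rate + 1  # samples covered by the dilated kernel
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * rate] for k, w in enumerate(kernel)))
    return out


def effective_kernel_size(kernel_size, rate):
    """Field of view of a kernel with `kernel_size` taps at a given atrous rate."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


if __name__ == "__main__":
    ramp = list(range(10))
    print(atrous_conv1d(ramp, [1, 0, -1], rate=1))
    print(atrous_conv1d(ramp, [1, 0, -1], rate=2))
    # Illustrative rates only; a 3-tap kernel covers 2 * rate + 1 samples.
    print([effective_kernel_size(3, r) for r in (6, 12, 18)])
```

With rate 1 this reduces to an ordinary convolution; larger rates widen the context seen by each output without adding parameters, which is the property the parallel ASPP branches rely on.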
[...]
...We have observed that: (a) the inference strategies, including multi-scale inputs and adding left-right flipped images, significantly increase the MAdds and thus are not suitable for on-device applications, (b) using output stride = 16 is more efficient than output stride = 8, (c) MobileNetV1 is already a powerful feature extractor and only requires about 4.9-5.7 times fewer MAdds than ResNet-101 [8] (e.g., mIOU: 78.56 vs 82.70, and MAdds: 941.9B vs 4870.6B), (d) it is more efficient to build DeepLabv3 heads on top of the second last feature map of MobileNetV2 than on the original last-layer feature map, since the second to last feature map contains 320 channels instead of 1280, and by doing so, we attain similar performance, but require about 2.5 times fewer operations than the MobileNetV1 counterparts, and (e) DeepLabv3 heads are computationally expensive and removing the ASPP module significantly reduces the MAdds with only a slight performance degradation....
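The arithmetic behind two of these observations can be checked with a short sketch. It assumes that output stride is the ratio of input side length to feature-map side length, and it uses a 512-pixel input purely as an illustrative value; `feature_map_side` and `madds_ratio` are hypothetical helper names, and only the two MAdds figures come from the excerpt.

```python
def feature_map_side(input_side, output_stride):
    """Side length of the final feature map for a given output stride."""
    return input_side // output_stride


def madds_ratio(heavy_madds, light_madds):
    """How many times fewer multiply-adds the lighter model needs."""
    return heavy_madds / light_madds


if __name__ == "__main__":
    # Illustrative 512-pixel input (an assumption, not a value from the excerpt).
    side_16 = feature_map_side(512, 16)
    side_8 = feature_map_side(512, 8)
    # Halving the output stride quadruples the positions the heads must process.
    print(side_16, side_8, (side_8 ** 2) // (side_16 ** 2))

    # MAdds quoted in the excerpt, in billions: ResNet-101 vs MobileNetV1.
    print(round(madds_ratio(4870.6, 941.9), 2))
```

The quoted pair of MAdds values gives a ratio of roughly 5.2, which sits inside the 4.9-5.7 range stated in the excerpt.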
[...]
...In this section, we compare MobileNetV1 and MobileNetV2 models used as feature extractors with DeepLabv3 [39] for the task of mobile semantic segmentation....
[...]
...To build a mobile model, we experimented with three design variations: (1) different feature extractors, (2) simplifying the DeepLabv3 heads for faster computation, and (3) different inference strategies for boosting the performance....
[...]
7,113 citations
Cites background or methods from "Rethinking Atrous Convolution for S..."
...Spatial pyramid pooling: Models, such as PSPNet [81] or DeepLab [9, 10], perform spatial pyramid pooling [23, 40] at several grid scales (including image-level pooling [47]) or apply several parallel atrous convolution with different rates (called Atrous Spatial Pyramid Pooling, or ASPP)....
[...]
...We then review DeepLabv3 [10] which is used as our encoder module before discussing the proposed decoder module appended to the encoder output....
[...]
...DeepLabv3 as encoder: DeepLabv3 [10] employs atrous convolution [30, 21, 64, 56] to extract the features computed by deep convolutional neural networks at an arbitrary resolution....
[...]
...We follow the same training protocol as in [10] and refer the interested readers to [10] for details....
[...]
...In particular, our proposed model, called DeepLabv3+, extends DeepLabv3 [10] by adding a simple yet effective decoder module to recover the object boundaries, as illustrated in Fig....
[...]
4,327 citations
Cites background or methods from "Rethinking Atrous Convolution for S..."
...First, Deeplabv2 [3] and Deeplabv3 [4] adopt atrous spatial pyramid pooling to embed contextual information, which consist of parallel dilated convolutions with different dilated rates....
[...]
...It should be noted that our method is more effective and flexible than previous methods [4, 29] when dealing with complex and diverse scenes....
[...]
...Following [4,27], we employ a poly learning rate policy where the initial learning rate is multiplied by (1 − iter/total_iter) after each iteration....
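A poly learning rate policy of this kind can be sketched as below. The excerpt as extracted shows no exponent on the decay factor; poly schedules usually raise the factor to a power, and the default `power=0.9` here is an assumption based on the value used in the DeepLab line of work. The function name `poly_lr` and the base rate of 0.01 are hypothetical.

```python
def poly_lr(base_lr, iteration, total_iters, power=0.9):
    """Learning rate at `iteration` under a poly decay schedule.

    The initial rate is scaled by (1 - iteration / total_iters) ** power.
    `power` is an assumed parameter; power=1.0 gives plain linear decay.
    """
    return base_lr * (1 - iteration / total_iters) ** power


if __name__ == "__main__":
    for it in (0, 25, 50, 75, 100):
        print(it, poly_lr(0.01, it, 100))
```

The rate starts at the base value, decreases at every iteration, and reaches zero at the final iteration.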
[...]
...For example, some works [3, 4, 29] aggregate multi-scale contexts via combining feature maps generated by different dilated convolutions and pooling operations....
[...]
...Finally, segmentation map fusion further improves the performance to 81.50%, which outperforms well-known method Deeplabv3 [4] (79.30% on Cityscape val set) by 2.20%....
[...]
2,651 citations
Cites methods from "Rethinking Atrous Convolution for S..."
...We employ the same training protocol as [8, 39]....
[...]
...R-ASPP is a reduced design of the Atrous Spatial Pyramid Pooling module [7, 8, 9], which adopts only two branches consisting of a 1 × 1 convolution and a global-average pooling operation [29, 50]....
[...]
References
123,388 citations
73,978 citations
72,897 citations
Additional excerpts
...[29] to aggregate global context [44, 6, 76]....
[...]
49,914 citations
30,843 citations