Deformable Convolutional Networks

doi:10.1109/ICCV.2017.89

Open Access, Proceedings Article

Jifeng Dai, +6 more

- pp 764-773

TLDR

Deformable convolution and deformable RoI pooling augment the spatial sampling locations of their plain counterparts with additional offsets and learn those offsets from the target task, without additional supervision. Both modules can readily replace their plain counterparts in existing CNNs and can be trained end-to-end by standard backpropagation.

Abstract:

Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks. Extensive experiments validate the performance of our approach. For the first time, we show that learning dense spatial transformation in deep CNNs is effective for sophisticated vision tasks such as object detection and semantic segmentation. The code is released at https://github.com/msracver/Deformable-ConvNets.
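The core idea of deformable convolution, adding an offset to each sampling location of the regular kernel grid, can be illustrated with a minimal sketch. The code below is not the authors' released implementation: it is a dependency-free, single-channel, stride-1 illustration in which the offsets are passed in as an argument (in the paper they are predicted by an additional convolutional layer and learned by back-propagation), and the function names `bilinear_sample` and `deformable_conv2d` are hypothetical. Fractional sampling positions are resolved with bilinear interpolation, and positions outside the feature map are assumed to read as zero.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map at a fractional (y, x) position.

    Assumption for this sketch: positions outside the map contribute 0.
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours around (y, x).
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * feature[yi][xi]
    return value


def deformable_conv2d(feature, kernel, offsets):
    """Single-channel deformable convolution (stride 1, same-size output).

    feature: H x W nested list of floats.
    kernel:  K x K nested list of weights (K odd).
    offsets: H x W x (K*K) nested list of (dy, dx) pairs, one pair per
             output location and kernel tap (hypothetical layout).
    """
    h, w = len(feature), len(feature[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for py in range(h):
        for px in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    dy, dx = offsets[py][px][i * k + j]
                    # Regular grid position plus the per-tap offset.
                    sy = py + (i - r) + dy
                    sx = px + (j - r) + dx
                    acc += kernel[i][j] * bilinear_sample(feature, sy, sx)
            out[py][px] = acc
    return out
```

With all offsets set to zero, the sketch reduces to a plain zero-padded convolution (cross-correlation), which reflects the abstract's point that the deformable module can replace its plain counterpart.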

Citations

Book Chapter

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Liang-Chieh Chen, +4 more

TL;DR: This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries, and applies depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.

Posted Content

Rethinking Atrous Convolution for Semantic Image Segmentation

Liang-Chieh Chen, +3 more

- 17 Jun 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The proposed DeepLabv3 system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains performance comparable to other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

Proceedings Article

Path Aggregation Network for Instance Segmentation

Shu Liu, +4 more

TL;DR: PANet enhances the entire feature hierarchy with accurate localization signals in lower layers by bottom-up path augmentation, which shortens the information path between the lower layers and the topmost feature.

Proceedings Article

Cascade R-CNN: Delving Into High Quality Object Detection

Zhaowei Cai, +1 more

TL;DR: Cascade R-CNN is a multi-stage object detection architecture consisting of a sequence of detectors trained with increasing IoU thresholds, so that the stages are sequentially more selective against close false positives.

Posted Content

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

Ze Liu, +7 more

- 25 Mar 2021 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes Swin Transformer, a hierarchical vision Transformer whose representation is computed with shifted windows, to address the differences between language and vision, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won 1st place on the ILSVRC 2015 classification task.

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: The deep convolutional neural network described by the authors consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

Proceedings Article

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Book Chapter

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, built by gathering images of complex everyday scenes containing common objects in their natural context.

Proceedings Article

Fully convolutional networks for semantic segmentation

Jonathan Long, +2 more

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

Related Papers (5)

Deep Residual Learning for Image Recognition

Microsoft COCO: Common Objects in Context

SSD: Single Shot MultiBox Detector

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

You Only Look Once: Unified, Real-Time Object Detection