End-to-End Instance Segmentation with Recurrent Attention

doi:10.1109/CVPR.2017.39

Open Access, Proceedings Article

Mengye Ren, +1 more

pp. 293-301

TLDR

An end-to-end recurrent neural network (RNN) architecture with an attention mechanism is proposed to model a human-like counting process and produce detailed instance segmentations.

Abstract:

While convolutional neural networks have recently achieved impressive success in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in the scene. Instance segmentation is very important in a variety of applications, such as autonomous driving, image captioning, and visual question answering. Techniques that combine large graphical models with low-level vision have been proposed to address this problem; however, we propose an end-to-end recurrent neural network (RNN) architecture with an attention mechanism to model a human-like counting process and produce detailed instance segmentations. The network is jointly trained to sequentially produce regions of interest as well as a dominant object segmentation within each region. The proposed model achieves competitive results on the CVPPP [27], KITTI [12], and Cityscapes [8] datasets.
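
The sequential process described in the abstract (attend to a region, segment the dominant object inside it, then move on to the next object) can be illustrated with a toy loop. This is only a sketch under strong assumptions and not the authors' network: the learned attention and segmentation modules are replaced by hand-written stand-ins (a fixed-size box centred on the first unexplained pixel, and a flood fill clipped to that box), and every name here, including `segment_instances` and the `half` parameter, is hypothetical.

```python
from collections import deque


def first_unexplained(foreground, explained):
    """Stand-in for attention: return the first foreground pixel not yet assigned."""
    for y, row in enumerate(foreground):
        for x, value in enumerate(row):
            if value and (y, x) not in explained:
                return (y, x)
    return None


def segment_in_box(foreground, explained, seed, box):
    """Stand-in for the segmentation head: flood fill from the seed, clipped to the box."""
    y0, y1, x0, x1 = box
    mask = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = y0 <= ny < y1 and x0 <= nx < x1
            if inside and foreground[ny][nx] and (ny, nx) not in mask and (ny, nx) not in explained:
                mask.add((ny, nx))
                queue.append((ny, nx))
    return mask


def segment_instances(foreground, half=3, max_steps=10):
    """Emit one (box, mask) pair per step until every foreground pixel is explained."""
    height, width = len(foreground), len(foreground[0])
    explained = set()  # pixels already assigned to an earlier instance
    instances = []
    for _ in range(max_steps):
        seed = first_unexplained(foreground, explained)
        if seed is None:  # nothing left to count
            break
        y, x = seed
        box = (max(y - half, 0), min(y + half + 1, height),
               max(x - half, 0), min(x + half + 1, width))
        mask = segment_in_box(foreground, explained, seed, box)
        instances.append((box, mask))
        explained |= mask
    return instances


# Two separate blobs on an 8x8 grid.
image = [[0] * 8 for _ in range(8)]
for y in (1, 2):
    for x in (1, 2):
        image[y][x] = 1
for y in (5, 6):
    for x in (4, 5, 6):
        image[y][x] = 1

instances = segment_instances(image)
print(len(instances), "instances found")
```

The set of already explained pixels keeps the loop from returning the same object twice, so the number of steps taken doubles as an object count. Because the fill is clipped to the box, an object larger than the box would be split across steps in this toy version.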

Citations

Posted Content

End-to-End Object Detection with Transformers

Nicolas Carion, +5 more

- 26 May 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset.

Proceedings Article

Path Aggregation Network for Instance Segmentation

Shu Liu, +4 more

TL;DR: PANet enhances the entire feature hierarchy with accurate localization signals in lower layers by bottom-up path augmentation, which shortens the information path between lower layers and the topmost feature.

Book Chapter

End-to-End Object Detection with Transformers

Nicolas Carion, +5 more

TL;DR: DETR proposes a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture to directly output the final set of predictions in parallel.
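
The idea of forcing unique predictions through bipartite matching can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the matching cost is simplified to an L1 distance between box tuples, the search is a brute-force enumeration that only suits tiny sets, and the names `l1_cost` and `match_predictions` are hypothetical. For realistic sizes, `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently.

```python
from itertools import permutations


def l1_cost(pred_box, true_box):
    """Simplified matching cost (an assumption): L1 distance between two boxes."""
    return sum(abs(p - t) for p, t in zip(pred_box, true_box))


def match_predictions(pred_boxes, true_boxes):
    """Find the minimum-cost one-to-one assignment of predictions to ground truth.

    Returns (pairs, total_cost), where pairs holds (prediction_index, truth_index)
    tuples and no prediction index appears twice.
    """
    if len(pred_boxes) < len(true_boxes):
        raise ValueError("need at least as many predictions as ground-truth boxes")
    best_pairs, best_cost = None, float("inf")
    # Try every way of giving each ground-truth box a distinct prediction.
    for chosen in permutations(range(len(pred_boxes)), len(true_boxes)):
        cost = sum(l1_cost(pred_boxes[p], true_boxes[t]) for t, p in enumerate(chosen))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, t) for t, p in enumerate(chosen)]
    return best_pairs, best_cost


# Three predictions, two ground-truth boxes; prediction 2 nearly duplicates prediction 0.
preds = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.1, 0.1), (0.52, 0.5, 0.2, 0.2)]
truths = [(0.1, 0.1, 0.1, 0.1), (0.5, 0.5, 0.2, 0.2)]

pairs, total = match_predictions(preds, truths)
unmatched = set(range(len(preds))) - {p for p, _ in pairs}
print(pairs, total, unmatched)
```

Since each ground-truth box is paired with exactly one prediction, a near-duplicate prediction is left unmatched, and a loss built on this matching can penalize it instead of rewarding it.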

Posted Content

Image Segmentation Using Deep Learning: A Survey

Shervin Minaee, +5 more

- 15 Jan 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: A comprehensive review of recent pioneering efforts in semantic and instance segmentation is provided, covering convolutional pixel-labeling networks, encoder-decoder architectures, multiscale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings.

Proceedings Article

Representation Learning by Learning to Count

Mehdi Noroozi, +2 more

TL;DR: This paper uses two image transformations in the context of counting, scaling and tiling, to train a neural network with a contrastive loss; the resulting representations perform on par with or exceed the state of the art in transfer learning benchmarks.

References

Proceedings Article

Fully convolutional networks for semantic segmentation

Jonathan Long, +2 more

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

Proceedings Article

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, +3 more

TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.

Proceedings Article

Are we ready for autonomous driving? The KITTI vision benchmark suite

Andreas Geiger, +2 more

TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.

Proceedings Article

The Cityscapes Dataset for Semantic Urban Scene Understanding

Marius Cordts, +8 more

TL;DR: This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.

Proceedings Article

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Kelvin Xu, +10 more

TL;DR: An attention-based model that automatically learns to describe the content of images is introduced; it can be trained in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound.

Related Papers (5)

Deep Residual Learning for Image Recognition

Fully convolutional networks for semantic segmentation

Mask R-CNN

Microsoft COCO: Common Objects in Context

U-Net: Convolutional Networks for Biomedical Image Segmentation