Open Access, Proceedings Article

Mask Encoding for Single Shot Instance Segmentation

TLDR
Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact, fixed-dimensional representation vector, which allows the instance segmentation task to be incorporated into one-stage bounding-box detectors and results in a simple yet efficient instance segmentation framework.
Abstract
To date, instance segmentation has been dominated by two-stage methods, as pioneered by Mask R-CNN. One-stage alternatives, in contrast, cannot compete with Mask R-CNN in mask AP, mainly because masks are difficult to represent compactly, which makes the design of one-stage methods very challenging. In this work, we propose a simple single-shot instance segmentation framework, termed mask encoding based instance segmentation (MEInst). Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact, fixed-dimensional representation vector, which allows the instance segmentation task to be incorporated into one-stage bounding-box detectors and results in a simple yet efficient instance segmentation framework. The proposed one-stage MEInst achieves 36.4% mask AP with a single model (ResNeXt-101-FPN backbone) and single-scale testing on the MS-COCO benchmark. We show that a much simpler and more flexible one-stage instance segmentation method can also achieve competitive performance. This framework can be easily adapted for other instance-level recognition tasks. Code is available at: git.io/AdelaiDet
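
The abstract does not say how a mask is distilled into a fixed-dimensional vector. As a rough illustration only, the sketch below assumes a linear, PCA-style projection fitted on a set of training masks; the encoder choice, the toy rectangular masks, and every function name here are hypothetical and are not taken from the AdelaiDet code.

```python
import numpy as np


def make_toy_masks(n=200, size=28, seed=0):
    """Generate n binary masks, each containing one random rectangle (toy data)."""
    rng = np.random.default_rng(seed)
    masks = np.zeros((n, size, size), dtype=np.uint8)
    for m in masks:  # each m is a view, so writes land in `masks`
        y0, x0 = rng.integers(0, size // 2, size=2)
        h, w = rng.integers(4, size // 2, size=2)
        m[y0:y0 + h, x0:x0 + w] = 1
    return masks


def fit_linear_encoder(masks, dim):
    """Fit a PCA-style encoder (assumed here, not stated in the abstract).

    Returns the mean flattened mask and the top `dim` principal directions,
    the latter with shape (dim, h * w).
    """
    flat = masks.reshape(len(masks), -1).astype(np.float64)
    mean = flat.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:dim]


def encode(mask, mean, components):
    """Project a 2D mask onto the components: a fixed-length vector."""
    return components @ (mask.reshape(-1).astype(np.float64) - mean)


def decode(code, mean, components, shape, threshold=0.5):
    """Reconstruct a binary mask from its code by back-projection and thresholding."""
    recon = components.T @ code + mean
    return recon.reshape(shape) >= threshold


masks = make_toy_masks()
mean, components = fit_linear_encoder(masks, dim=60)
code = encode(masks[0], mean, components)
recon = decode(code, mean, components, masks[0].shape)

target = masks[0].astype(bool)
iou = (recon & target).sum() / (recon | target).sum()
print("code length:", code.shape[0], "reconstruction IoU:", iou)
```

Under this assumption, a detector head would only need to regress the 60 numbers in `code` per instance, alongside the box and class outputs, and the mask would be recovered with `decode` at inference time.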


Citations
Posted Content

SOLOv2: Dynamic and Fast Instance Segmentation

TL;DR: State-of-the-art results in object detection (from the authors' mask byproduct) and panoptic segmentation show the potential of SOLOv2 to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation.
Posted Content

FCOS: A simple and strong anchor-free object detector

TL;DR: In this article, a fully convolutional one-stage object detector (FCOS) is proposed to solve object detection in a per-pixel prediction fashion, analogous to other dense prediction problems such as semantic segmentation.
Posted Content

A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images.

TL;DR: This survey focuses on recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images, and chronologically categorises the approaches into three main periods, namely the pre- and early deep learning era, the fully convolutional era, and the post-FCN era.
Proceedings Article

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

TL;DR: BPR is a post-processing refinement framework that improves boundary quality on top of the results of any instance segmentation model by extracting and refining a series of small boundary patches along the predicted instance boundaries.
Proceedings Article

Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

TL;DR: Li et al. propose STMask, a simple yet effective one-stage video instance segmentation framework based on spatial calibration and temporal fusion, with spatial features calibrated against ground-truth bounding boxes.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won 1st place on the ILSVRC 2015 classification task.
Proceedings Article

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Book Chapter

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, gathering images of complex everyday scenes that contain common objects in their natural context.
Proceedings Article

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings Article

You Only Look Once: Unified, Real-Time Object Detection

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.