Open Access · Proceedings Article · DOI

Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning

TLDR
The proposed method supports different kinds of user input, such as a segmentation mask in the first frame (semi-supervised scenario) or a sparse set of clicked points (interactive scenario), and reaches quality comparable to competing methods with much less interaction.
Abstract
This paper tackles the problem of video object segmentation, given some user annotation indicating the object of interest. The problem is formulated as pixel-wise retrieval in a learned embedding space: we embed pixels of the same object instance into the vicinity of each other, using a fully convolutional network trained with a modified triplet loss as the embedding model. The annotated pixels are then set as the reference, and the remaining pixels are classified using a nearest-neighbor approach. The proposed method supports different kinds of user input, such as a segmentation mask in the first frame (semi-supervised scenario) or a sparse set of clicked points (interactive scenario). In the semi-supervised scenario, we achieve results competitive with the state of the art but at a fraction of the computation cost (275 milliseconds per frame). In the interactive scenario, where the user can refine their input iteratively, the proposed method provides an instant response to each input and reaches quality comparable to competing methods with much less interaction.
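As a rough illustration of the retrieval formulation described above, the sketch below labels each pixel of a new frame by nearest-neighbor search against the annotated reference pixels in an embedding space. The embedding network, tensor shapes, and function names are assumptions for illustration only, not the authors' implementation.

import torch

def segment_by_retrieval(ref_emb, ref_labels, query_emb):
    """Assign each query pixel the label of its nearest reference pixel.

    ref_emb:    (N, D) embeddings of user-annotated reference pixels
    ref_labels: (N,)   object ids of those pixels (0 = background)
    query_emb:  (H, W, D) per-pixel embeddings of the current frame
    """
    H, W, D = query_emb.shape
    q = query_emb.reshape(-1, D)                # (H*W, D)
    dists = torch.cdist(q, ref_emb)             # pairwise distances to references
    nearest = dists.argmin(dim=1)               # index of the closest reference pixel
    return ref_labels[nearest].reshape(H, W)    # per-pixel object ids

# hypothetical usage with an assumed embedding network `embed_net`:
# ref_emb    = embed_net(first_frame)[annotated_coords]
# query_emb  = embed_net(new_frame).permute(1, 2, 0)   # (H, W, D)
# prediction = segment_by_retrieval(ref_emb, ref_labels, query_emb)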



Citations
Proceedings Article · DOI

Fast Online Object Tracking and Segmentation: A Unifying Approach

TL;DR: This method improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task, and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second.
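A hedged sketch of the kind of multi-task objective this summary describes: a Siamese tracker's score loss augmented with a per-pixel binary segmentation term. The loss choices, names, and weighting are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def tracking_with_segmentation_loss(score_logits, score_targets,
                                    mask_logits, mask_targets,
                                    seg_weight=1.0):
    # score loss of the Siamese tracking branch (illustrative choice)
    track_loss = F.binary_cross_entropy_with_logits(score_logits, score_targets)
    # added per-pixel binary segmentation loss on the predicted masks
    seg_loss = F.binary_cross_entropy_with_logits(mask_logits, mask_targets)
    # seg_weight is an assumed hyperparameter balancing the two tasks
    return track_loss + seg_weight * seg_loss
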
Proceedings Article · DOI

Siam R-CNN: Visual Tracking by Re-Detection

TL;DR: This work presents Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking, and combines this with a novel tracklet-based dynamic programming algorithm to model the full history of both the object to be tracked and potential distractor objects.
Proceedings Article · DOI

Video Object Segmentation Using Space-Time Memory Networks

TL;DR: Past frames with object masks form an external memory, and the current frame, used as the query, is segmented using the mask information in the memory; query and memory are densely matched in the feature space, covering all space-time pixel locations in a feed-forward fashion.
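A minimal sketch of the dense space-time matching idea summarized above: keys from past frames (with masks encoded) are matched against keys of the current frame via softmax attention, and the corresponding memory values carrying mask information are retrieved. Shapes and names are assumptions for illustration.

import torch
import torch.nn.functional as F

def memory_read(mem_keys, mem_vals, query_keys):
    """Dense matching between current-frame queries and a space-time memory.

    mem_keys:   (T*H*W, Ck) keys from past frames with object masks
    mem_vals:   (T*H*W, Cv) values carrying the mask information
    query_keys: (H*W, Ck)   keys of the current frame
    """
    sim = query_keys @ mem_keys.t()                       # (H*W, T*H*W) similarities
    attn = F.softmax(sim / mem_keys.shape[1] ** 0.5, dim=1)
    return attn @ mem_vals                                # (H*W, Cv) retrieved mask features
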
Proceedings Article · DOI

FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation

TL;DR: FEELVOS as discussed by the authors uses a semantic pixel-wise embedding together with a global and a local matching mechanism to transfer information from the first frame and from the previous frame of the video to the current frame.
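A rough sketch of the global and local matching mechanism mentioned above: for each current-frame pixel, compute the distance to the nearest first-frame object pixel (global match) and to the nearest previous-frame object pixel within a local window (local match). The window size, shapes, and names are illustrative assumptions.

import torch

def global_match(curr_emb, ref_emb):
    """Distance from every current-frame pixel to its nearest first-frame object pixel.

    curr_emb: (H*W, D) current-frame embeddings
    ref_emb:  (M, D)   embeddings of first-frame pixels belonging to the object
    """
    return torch.cdist(curr_emb, ref_emb).min(dim=1).values   # (H*W,)

def local_match(curr_emb, prev_emb, window=15):
    """Same idea, but each pixel only searches a local window in the previous frame.

    curr_emb, prev_emb: (H, W, D) per-pixel embeddings
    """
    H, W, D = curr_emb.shape
    out = curr_emb.new_full((H, W), float('inf'))
    r = window // 2
    for i in range(H):
        for j in range(W):
            patch = prev_emb[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = (patch - curr_emb[i, j]).norm(dim=-1).min()
    return out
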
Proceedings Article · DOI

Video Instance Segmentation

TL;DR: Wang et al. as discussed by the authors proposed a novel algorithm called MaskTrack R-CNN for instance segmentation in videos, which introduces a new tracking branch to Mask R-CNN to jointly perform detection, segmentation, and tracking.
References
Proceedings Article · DOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; residual nets built on it won 1st place in the ILSVRC 2015 classification task.
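For reference, a minimal residual block of the kind introduced in this paper, written in PyTorch; the layer layout and channel handling are a simplified illustration rather than the exact ResNet architecture.

import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """y = F(x) + x: an identity shortcut added to a small stack of conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # residual connection eases optimization of very deep nets
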
Book Chapter · DOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state of the art in object recognition by placing it in the context of the broader question of scene understanding; it gathers images of complex everyday scenes containing common objects in their natural context.
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
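A small usage sketch of t-SNE as it is commonly exposed in scikit-learn; the input data here is random placeholder data, used only to show the call pattern.

import numpy as np
from sklearn.manifold import TSNE

# 500 points in a 64-dimensional space (placeholder data)
X = np.random.RandomState(0).randn(500, 64)

# map to 2-D; perplexity roughly controls the effective number of neighbors
X_2d = TSNE(n_components=2, perplexity=30.0, init='pca',
            random_state=0).fit_transform(X)
print(X_2d.shape)   # (500, 2)
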
Journal Article · DOI

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

TL;DR: This work addresses the task of semantic image segmentation with deep learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
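A hedged sketch of an atrous spatial pyramid pooling head as described in the summary: parallel atrous (dilated) convolutions with different sampling rates applied to the same feature map and fused. The rates, channel sizes, and fusion by summation are illustrative assumptions.

import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous convolutions at multiple rates probe multiple scales."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # fuse the multi-rate responses (summation assumed here)
        return torch.stack([branch(x) for branch in self.branches]).sum(dim=0)
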
Posted Content

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

TL;DR: DeepLab as discussed by the authors proposes atrous spatial pyramid pooling (ASPP) to segment objects at multiple scales by probing an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view.