Open Access · Proceedings Article · DOI

Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning

TLDR
The proposed method supports different kinds of user input, such as a segmentation mask in the first frame (semi-supervised scenario) or a sparse set of clicked points (interactive scenario), and reaches quality comparable to competing methods with much less interaction.
Abstract
This paper tackles the problem of video object segmentation, given some user annotation indicating the object of interest. The problem is formulated as pixel-wise retrieval in a learned embedding space: we embed pixels of the same object instance into the vicinity of each other, using a fully convolutional network trained with a modified triplet loss as the embedding model. The annotated pixels are then set as the reference, and the remaining pixels are classified using a nearest-neighbor approach. The proposed method supports different kinds of user input, such as a segmentation mask in the first frame (semi-supervised scenario) or a sparse set of clicked points (interactive scenario). In the semi-supervised scenario, we achieve results competitive with the state of the art but at a fraction of the computation cost (275 milliseconds per frame). In the interactive scenario, where the user can refine their input iteratively, the proposed method provides an instant response to each input and reaches quality comparable to competing methods with much less interaction.
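As a rough illustration of the retrieval formulation described above, the sketch below labels each pixel of a new frame by nearest-neighbor search against the annotated reference pixels in an embedding space. The embedding network, tensor shapes, and function names are assumptions for illustration only, not the authors' implementation.

import torch

def segment_by_retrieval(ref_emb, ref_labels, query_emb):
    """Assign each query pixel the label of its nearest reference pixel.

    ref_emb:    (N, D) embeddings of user-annotated reference pixels
    ref_labels: (N,)   object ids of those pixels (0 = background)
    query_emb:  (H, W, D) per-pixel embeddings of the current frame
    """
    H, W, D = query_emb.shape
    q = query_emb.reshape(-1, D)                # (H*W, D)
    dists = torch.cdist(q, ref_emb)             # pairwise distances to references
    nearest = dists.argmin(dim=1)               # index of the closest reference pixel
    return ref_labels[nearest].reshape(H, W)    # per-pixel object ids

# hypothetical usage with an assumed embedding network `embed_net`:
# ref_emb    = embed_net(first_frame)[annotated_coords]
# query_emb  = embed_net(new_frame).permute(1, 2, 0)   # (H, W, D)
# prediction = segment_by_retrieval(ref_emb, ref_labels, query_emb)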



Citations
Proceedings Article · DOI

Fast Online Object Tracking and Segmentation: A Unifying Approach

TL;DR: This method improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task, and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second.
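A hedged sketch of the kind of multi-task objective this summary describes: a Siamese tracker's score loss augmented with a per-pixel binary segmentation term. The loss choices, names, and weighting are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def tracking_with_segmentation_loss(score_logits, score_targets,
                                    mask_logits, mask_targets,
                                    seg_weight=1.0):
    # score loss of the Siamese tracking branch (illustrative choice)
    track_loss = F.binary_cross_entropy_with_logits(score_logits, score_targets)
    # added per-pixel binary segmentation loss on the predicted masks
    seg_loss = F.binary_cross_entropy_with_logits(mask_logits, mask_targets)
    # seg_weight is an assumed hyperparameter balancing the two tasks
    return track_loss + seg_weight * seg_loss
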
Proceedings Article · DOI

Siam R-CNN: Visual Tracking by Re-Detection

TL;DR: This work presents Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking, and combines this with a novel tracklet-based dynamic programming algorithm to model the full history of both the object to be tracked and potential distractor objects.
Proceedings Article · DOI

Video Object Segmentation Using Space-Time Memory Networks

TL;DR: Past frames with object masks form an external memory, and the current frame, used as the query, is segmented using the mask information in the memory; query and memory are densely matched in the feature space, covering all space-time pixel locations in a feed-forward fashion.
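A minimal sketch of the dense space-time matching idea summarized above: keys from past frames (with masks encoded) are matched against keys of the current frame via softmax attention, and the corresponding memory values carrying mask information are retrieved. Shapes and names are assumptions for illustration.

import torch
import torch.nn.functional as F

def memory_read(mem_keys, mem_vals, query_keys):
    """Dense matching between current-frame queries and a space-time memory.

    mem_keys:   (T*H*W, Ck) keys from past frames with object masks
    mem_vals:   (T*H*W, Cv) values carrying the mask information
    query_keys: (H*W, Ck)   keys of the current frame
    """
    sim = query_keys @ mem_keys.t()                       # (H*W, T*H*W) similarities
    attn = F.softmax(sim / mem_keys.shape[1] ** 0.5, dim=1)
    return attn @ mem_vals                                # (H*W, Cv) retrieved mask features
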
Proceedings Article · DOI

FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation

TL;DR: FEELVOS as discussed by the authors uses a semantic pixel-wise embedding together with a global and a local matching mechanism to transfer information from the first frame and from the previous frame of the video to the current frame.
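A rough sketch of the global and local matching mechanism mentioned above: for each current-frame pixel, compute the distance to the nearest first-frame object pixel (global match) and to the nearest previous-frame object pixel within a local window (local match). The window size, shapes, and names are illustrative assumptions.

import torch

def global_match(curr_emb, ref_emb):
    """Distance from every current-frame pixel to its nearest first-frame object pixel.

    curr_emb: (H*W, D) current-frame embeddings
    ref_emb:  (M, D)   embeddings of first-frame pixels belonging to the object
    """
    return torch.cdist(curr_emb, ref_emb).min(dim=1).values   # (H*W,)

def local_match(curr_emb, prev_emb, window=15):
    """Same idea, but each pixel only searches a local window in the previous frame.

    curr_emb, prev_emb: (H, W, D) per-pixel embeddings
    """
    H, W, D = curr_emb.shape
    out = curr_emb.new_full((H, W), float('inf'))
    r = window // 2
    for i in range(H):
        for j in range(W):
            patch = prev_emb[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = (patch - curr_emb[i, j]).norm(dim=-1).min()
    return out
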
Proceedings Article · DOI

Video Instance Segmentation

TL;DR: Wang et al. as discussed by the authors proposed a novel algorithm called MaskTrack R-CNN for instance segmentation in videos, which introduces a new tracking branch to Mask R-CNN to jointly perform detection, segmentation, and tracking.
References
Proceedings Article · DOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; residual nets built on it won 1st place in the ILSVRC 2015 classification task.
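For reference, a minimal residual block of the kind introduced in this paper, written in PyTorch; the layer layout and channel handling are a simplified illustration rather than the exact ResNet architecture.

import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """y = F(x) + x: an identity shortcut added to a small stack of conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # residual connection eases optimization of very deep nets
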
Book Chapter · DOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state of the art in object recognition by placing it in the context of the broader question of scene understanding; it gathers images of complex everyday scenes containing common objects in their natural context.
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
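A small usage sketch of t-SNE as it is commonly exposed in scikit-learn; the input data here is random placeholder data, used only to show the call pattern.

import numpy as np
from sklearn.manifold import TSNE

# 500 points in a 64-dimensional space (placeholder data)
X = np.random.RandomState(0).randn(500, 64)

# map to 2-D; perplexity roughly controls the effective number of neighbors
X_2d = TSNE(n_components=2, perplexity=30.0, init='pca',
            random_state=0).fit_transform(X)
print(X_2d.shape)   # (500, 2)
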
Journal Article · DOI

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

TL;DR: This work addresses the task of semantic image segmentation with deep learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
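A hedged sketch of an atrous spatial pyramid pooling head as described in the summary: parallel atrous (dilated) convolutions with different sampling rates applied to the same feature map and fused. The rates, channel sizes, and fusion by summation are illustrative assumptions.

import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous convolutions at multiple rates probe multiple scales."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # fuse the multi-rate responses (summation assumed here)
        return torch.stack([branch(x) for branch in self.branches]).sum(dim=0)
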
Posted Content

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

TL;DR: DeepLab as discussed by the authors proposes atrous spatial pyramid pooling (ASPP) to segment objects at multiple scales by probing an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view.