Open Access, Proceedings Article

DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation

TLDR
The paper proposes a differentiable matching layer that unrolls a projected gradient descent algorithm whose projection step uses Dykstra's algorithm, and proves that, under mild conditions, the matching is guaranteed to converge to the optimal one.
Abstract
In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem where the initial object masks are provided. Relying on the Mask R-CNN backbone, we extract mask proposals per frame and formulate the matching between object templates and proposals as a linear assignment problem where the cost matrix is predicted by a deep convolutional neural network. We propose a differentiable matching layer which unrolls a projected gradient descent algorithm in which the projection step uses Dykstra's algorithm. We prove that under mild conditions, the matching is guaranteed to converge to the optimal one. In practice, it achieves performance similar to that of the Hungarian algorithm during inference. Meanwhile, we can back-propagate through it to learn the cost matrix. After matching, a U-Net style architecture is used to refine the matched mask per time step. On the DAVIS 2017 dataset, DMM-Net achieves the best performance without online learning on the first frames and the second best with it. Without any fine-tuning, DMM-Net performs comparably to state-of-the-art methods on the SegTrack v2 dataset. Finally, our differentiable matching layer is very simple to implement; we attach the PyTorch code, which is less than 50 lines long, in the supplementary material.
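The matching layer described in the abstract can be illustrated with a minimal sketch. This is not the authors' supplementary PyTorch code. It is a dependency-free Python illustration under simplifying assumptions: a square cost matrix, a relaxed assignment constrained to be doubly stochastic (rows and columns sum to 1, entries non-negative), and hypothetical names (`soft_matching`, `dykstra`, `step`, `num_steps`, `proj_iters`) with arbitrarily chosen defaults. The paper's exact constraint set and hyperparameters may differ. Every operation here is a sum, a subtraction, or a clamp, so the same loop written with tensors in an autograd framework could be back-propagated through to the cost matrix.

```python
def project_rows(X):
    """Euclidean projection onto {X : every row sums to 1}."""
    n = len(X[0])
    out = []
    for row in X:
        shift = (sum(row) - 1.0) / n
        out.append([v - shift for v in row])
    return out

def project_cols(X):
    """Euclidean projection onto {X : every column sums to 1}."""
    m = len(X)
    shifts = [(sum(row[j] for row in X) - 1.0) / m for j in range(len(X[0]))]
    return [[v - shifts[j] for j, v in enumerate(row)] for row in X]

def project_nonneg(X):
    """Euclidean projection onto {X : all entries >= 0}."""
    return [[max(v, 0.0) for v in row] for row in X]

def dykstra(X, num_iters=50):
    """Approximate the projection of X onto the intersection of the three sets
    by cycling through them and keeping one correction term per set."""
    projections = (project_rows, project_cols, project_nonneg)
    corrections = [[[0.0] * len(X[0]) for _ in X] for _ in projections]
    for _ in range(num_iters):
        for k, proj in enumerate(projections):
            shifted = [[x + c for x, c in zip(xr, cr)]
                       for xr, cr in zip(X, corrections[k])]
            Y = proj(shifted)
            corrections[k] = [[s - y for s, y in zip(sr, yr)]
                              for sr, yr in zip(shifted, Y)]
            X = Y
    return X

def soft_matching(cost, step=0.1, num_steps=50, proj_iters=50):
    """Unrolled projected gradient descent on the objective <cost, X>.
    The gradient of the objective with respect to X is the cost matrix itself."""
    n = len(cost)
    X = [[1.0 / n] * n for _ in range(n)]  # start from the uniform assignment
    for _ in range(num_steps):
        X = [[x - step * c for x, c in zip(xr, cr)] for xr, cr in zip(X, cost)]
        X = dykstra(X, proj_iters)
    return X

# Toy example: each template has exactly one cheap proposal.
cost = [[5.0, 1.0, 5.0],
        [5.0, 5.0, 1.0],
        [1.0, 5.0, 5.0]]
X = soft_matching(cost)
assignment = [max(range(len(row)), key=row.__getitem__) for row in X]
print(assignment)  # [1, 2, 0]
```

On this toy cost matrix the relaxed solution settles on the permutation that a discrete solver such as the Hungarian algorithm would also return. The fixed iteration counts make the layer a finite, unrolled computation, which is what allows gradients to flow through it during training.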

