DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation

doi:10.1109/ICCV.2019.00403

Open Access Proceedings Article

- pp 3929-3938

TLDR

The paper proposes a differentiable matching layer that unrolls a projected gradient descent algorithm whose projection step uses Dykstra's algorithm, and proves that, under mild conditions, the matching is guaranteed to converge to the optimal one.

Abstract:

In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem where the initial object masks are provided. Relying on the Mask R-CNN backbone, we extract mask proposals per frame and formulate the matching between object templates and proposals as a linear assignment problem where the cost matrix is predicted by a deep convolutional neural network. We propose a differentiable matching layer which unrolls a projected gradient descent algorithm in which the projection step exploits Dykstra's algorithm. We prove that under mild conditions, the matching is guaranteed to converge to the optimal one. In practice, it achieves performance similar to the Hungarian algorithm during inference. Meanwhile, we can back-propagate through it to learn the cost matrix. After matching, a U-Net style architecture is exploited to refine the matched mask per time step. On the DAVIS 2017 dataset, DMM-Net achieves the best performance without online learning on the first frames and the second best with it. Without any fine-tuning, DMM-Net performs comparably to state-of-the-art methods on the SegTrack v2 dataset. Finally, our differentiable matching layer is very simple to implement; we attach the PyTorch code, which is fewer than 50 lines long, in the supplementary material.
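The matching layer described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' PyTorch code. It assumes a relaxed assignment polytope (each template row sums to one, each proposal column sums to at most one, all entries non-negative), which the abstract does not spell out, and it uses plain Python lists so that it runs without dependencies. The names `soft_match`, `dykstra`, `step`, `num_steps`, and `num_cycles` are hypothetical. Written with tensor operations in an autograd framework, the same unrolled arithmetic would allow gradients to flow back to the cost matrix.

```python
def project_rows(X):
    """Project onto the affine set {X : every row sums to 1}."""
    out = []
    for row in X:
        shift = (sum(row) - 1.0) / len(row)
        out.append([v - shift for v in row])
    return out


def project_cols(X):
    """Project onto {X : every column sums to at most 1}."""
    n, m = len(X), len(X[0])
    out = [row[:] for row in X]
    for j in range(m):
        excess = sum(X[i][j] for i in range(n)) - 1.0
        if excess > 0.0:
            for i in range(n):
                out[i][j] -= excess / n
    return out


def project_nonneg(X):
    """Project onto the non-negative orthant."""
    return [[max(v, 0.0) for v in row] for row in X]


def dykstra(Y, num_cycles=50):
    """Approximate the projection of Y onto the intersection of the
    three convex sets by cycling through them with correction terms."""
    projections = (project_rows, project_cols, project_nonneg)
    n, m = len(Y), len(Y[0])
    X = [row[:] for row in Y]
    # One correction matrix per constraint set, initialised to zero.
    corrections = [[[0.0] * m for _ in range(n)] for _ in projections]
    for _ in range(num_cycles):
        for k, project in enumerate(projections):
            Z = [[X[i][j] + corrections[k][i][j] for j in range(m)]
                 for i in range(n)]
            X = project(Z)
            corrections[k] = [[Z[i][j] - X[i][j] for j in range(m)]
                              for i in range(n)]
    return X


def soft_match(cost, step=0.1, num_steps=30, num_cycles=50):
    """Unrolled projected gradient descent on the linear objective
    <cost, X>; the gradient with respect to X is the cost matrix."""
    n, m = len(cost), len(cost[0])
    X = [[1.0 / m] * m for _ in range(n)]  # uniform initial assignment
    for _ in range(num_steps):
        Y = [[X[i][j] - step * cost[i][j] for j in range(m)]
             for i in range(n)]
        X = dykstra(Y, num_cycles)
    return X


# Toy cost matrix (rows: templates, columns: proposals); illustrative only.
cost = [[1.0, 0.0, 1.0],
        [0.0, 1.0, 1.0],
        [1.0, 1.0, 0.0]]
X = soft_match(cost)
# Round the soft assignment by taking the largest entry in each row.
assignment = [max(range(len(row)), key=lambda j: row[j]) for row in X]
print(assignment)
```

The per-row rounding in the last lines is only a convenience for inspecting the result; the abstract compares the layer against the Hungarian algorithm at inference time, and a discrete solver could be applied to the same cost matrix instead.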

Citations

Book Chapter

Video Object Segmentation with Episodic Graph Memory Networks

Xiankai Lu, +5 more

TL;DR: In this article, a graph memory network is developed to address the novel idea of "learning to update the segmentation model" by exploiting an episodic memory network to store frames as nodes and capture cross-frame correlations by edges.

Posted Content

Video Object Segmentation with Episodic Graph Memory Networks

Xiankai Lu, +5 more

- 14 Jul 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work exploits an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges. It yields a neat yet principled framework that generalizes well to both one-shot and zero-shot video object segmentation tasks.

Posted Content

STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

Ali Athar, +4 more

- 18 Mar 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes a novel approach that segments and tracks instances across space and time in a single stage and is trained end-to-end to learn spatio-temporal embeddings as well as the parameters required to cluster pixels belonging to a specific object instance over an entire video clip.

Book Chapter

Kernelized Memory Network for Video Object Segmentation

Hongje Seong, +2 more

TL;DR: A kernelized memory network (KMN) is proposed that surpasses the state-of-the-art on standard benchmarks by a significant margin and uses the Hide-and-Seek strategy in pre-training to obtain the best possible results in handling occlusions and segment boundary extraction.

Posted Content

Kernelized Memory Network for Video Object Segmentation

Hongje Seong, +2 more

- 16 Jul 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: In this paper, a kernelized memory network (KMN) is proposed to solve the mismatch between STM and VOS, which is pre-trained on static images, as in previous works.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Book Chapter

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.

Proceedings Article

Mask R-CNN

Kaiming He, +3 more

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.

Journal Article

The Hungarian method for the assignment problem

Harold W. Kuhn

- 01 Mar 1955 -

Naval Research Logistics Quarterly

TL;DR: This paper has always been one of my favorite children, combining as it does elements of the duality of linear programming and combinatorial tools from graph theory, and it may be of some interest to tell the story of its origin.

Book Chapter

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Liang-Chieh Chen, +4 more

TL;DR: This work extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries and applies the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network.

Related Papers (5)

Deep Residual Learning for Image Recognition

Fast Video Object Segmentation by Reference-Guided Mask Propagation

One-Shot Video Object Segmentation

A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation

Learning Video Object Segmentation from Static Images