Multigrid Predictive Filter Flow for Unsupervised Learning on Videos.

Open Access, Posted Content

- 02 Apr 2019 -

arXiv: Computer Vision and Pattern Recog...

TLDR

mgPFF is shown not only to estimate long-range flow for frame reconstruction and detect video shot transitions, but also to be readily amenable to video object segmentation and pose tracking, where it substantially outperforms the published state-of-the-art without bells and whistles.

Abstract:

We introduce multigrid Predictive Filter Flow (mgPFF), a framework for unsupervised learning on videos. mgPFF takes a pair of frames as input and outputs per-pixel filters that warp one frame to the other. Compared to the optical flow commonly used for warping frames, mgPFF is more powerful at modeling sub-pixel movement and dealing with corruption (e.g., motion blur). We develop a multigrid coarse-to-fine modeling strategy that avoids the need to learn large filters to capture large displacements. This allows us to train an extremely compact model (4.6MB) that operates progressively over multiple resolutions with shared weights. We train mgPFF on unsupervised, free-form videos and show that it can not only estimate long-range flow for frame reconstruction and detect video shot transitions, but is also readily amenable to video object segmentation and pose tracking, where it substantially outperforms the published state-of-the-art without bells and whistles. Moreover, because mgPFF predicts per-pixel filters, we have the unique opportunity to visualize how each pixel evolves while solving these tasks, thus gaining better interpretability.
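The per-pixel filtering described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `apply_per_pixel_filters` and `shift_filters`, the single-channel frame, the edge-replicating padding, and the choice of non-negative filter weights that sum to one are all assumptions made for illustration, and the multigrid coarse-to-fine stage and the network that predicts the filters are omitted. It shows only why a filter per pixel can express both an integer displacement (a one-hot filter) and a sub-pixel one (weight split across neighbouring taps).

```python
import numpy as np


def apply_per_pixel_filters(frame, filters):
    """Warp `frame` (H, W) with one k x k filter per output pixel.

    `filters` has shape (H, W, k, k); filters[i, j] weights the k x k
    neighbourhood of `frame` centred on pixel (i, j).
    """
    h, w = frame.shape
    k = filters.shape[-1]
    r = k // 2
    # Replicate border pixels so every neighbourhood is fully defined
    # (an assumption of this sketch, not something the abstract specifies).
    padded = np.pad(frame, r, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for du in range(k):
        for dv in range(k):
            # padded[i + du, j + dv] corresponds to frame[i + du - r, j + dv - r]
            out += filters[:, :, du, dv] * padded[du:du + h, dv:dv + w]
    return out


def shift_filters(h, w, k, dy, dx):
    """Hypothetical helper: one-hot filters that copy the pixel at offset (dy, dx)."""
    r = k // 2
    f = np.zeros((h, w, k, k))
    f[:, :, r + dy, r + dx] = 1.0
    return f


frame = np.arange(25, dtype=float).reshape(5, 5)

# A one-hot filter behaves like an integer-displacement flow vector.
warped = apply_per_pixel_filters(frame, shift_filters(5, 5, 3, 0, 1))

# Splitting the weight between two taps gives a half-pixel displacement,
# which a single integer flow vector cannot express.
half = 0.5 * shift_filters(5, 5, 3, 0, 0) + 0.5 * shift_filters(5, 5, 3, 0, 1)
blended = apply_per_pixel_filters(frame, half)
```

With a 3 x 3 filter this sketch can only reach displacements of one pixel. The abstract's coarse-to-fine strategy addresses that limit by applying small filters at several resolutions rather than enlarging the filter.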

Citations

Proceedings Article

Volumetric Correspondence Networks for Optical Flow

Gengshan Yang, +1 more

TL;DR: Introduces several simple modifications that dramatically simplify the use of volumetric layers. They significantly improve accuracy over SOTA on standard benchmarks while being significantly easier to work with: training converges in 10X fewer iterations and, most importantly, the networks generalize across correspondence tasks.

Proceedings Article

Joint-task self-supervised learning for temporal correspondence

Xueting Li, +5 more

TL;DR: This method outperforms the state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object tracking.

Proceedings Article

MAST: A Memory-Augmented Self-Supervised Tracker

Zihang Lai, +2 more

TL;DR: A self-supervised dense tracking model, trained on videos without any annotations, is proposed; it surpasses previous self-supervised methods on existing benchmarks by a significant margin and achieves performance comparable to supervised methods.

Proceedings Article

Learning Video Object Segmentation From Unlabeled Videos

Xiankai Lu, +5 more

TL;DR: In this paper, a unified unsupervised/weakly supervised learning framework, called MuG, is proposed to comprehensively capture intrinsic properties of VOS at multiple granularities.

Posted Content

MAST: A Memory-Augmented Self-supervised Tracker

Zihang Lai, +2 more

- 18 Feb 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: A dense tracking model trained on videos without any annotations is proposed that surpasses previous self-supervised methods on existing benchmarks by a significant margin, and achieves performance comparable to supervised methods.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Proceedings Article

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Book Chapter

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, +2 more

TL;DR: The authors propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, +1 more

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

arXiv: Computer Vision and Pattern Recog...

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

Eddy Ilg, +5 more

Related Papers (5)

Deep Residual Learning for Image Recognition

Spatial transformer networks

A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation

The 2017 DAVIS Challenge on Video Object Segmentation

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks