Learning Correspondence From the Cycle-Consistency of Time

doi:10.1109/CVPR.2019.00267

Proceedings article (open access), pp. 2566-2576

TLDR

A self-supervised method that uses cycle-consistency in time as a free supervisory signal for learning visual representations from scratch. The learned representation generalizes, without finetuning, across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow.
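The cycle-consistency signal can be illustrated with a minimal NumPy sketch. It makes three assumptions:

- Each frame has already been encoded into unit-norm per-pixel feature vectors. The hypothetical `(N, D)` arrays stand in for the learned feature map.
- Tracking is a simplified soft-attention step.
- The helper names `soft_track` and `cycle_consistency_loss`, and the `temperature` value, are illustrative only.

The paper's actual tracker and loss terms are not reproduced here. A pixel is tracked backward from the last frame to the first and then forward again. The loss is the squared distance between its starting position and the attention-weighted position it returns to.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def soft_track(query, target_feats, temperature=0.1):
    """One tracking step by soft attention (hypothetical helper).

    query: (D,) unit-norm feature being tracked.
    target_feats: (N, D) unit-norm per-pixel features of the next frame.
    Returns attention weights over the N pixels and the re-normalized
    attended feature, which becomes the query for the following step.
    """
    weights = softmax(target_feats @ query / temperature)
    attended = weights @ target_feats
    return weights, attended / np.linalg.norm(attended)


def cycle_consistency_loss(frames, start_idx, coords, temperature=0.1):
    """Track pixel `start_idx` of the last frame back to the first frame and
    forward again. Return the squared distance between its starting position
    and the attention-weighted position it returns to (hypothetical helper).

    frames: Python list of (N, D) unit-norm feature arrays ordered in time (len >= 2).
    coords: (N, 2) pixel coordinates, shared by all frames.
    """
    query = frames[-1][start_idx]
    # Backward pass visits frames T-2, ..., 0; forward pass visits 1, ..., T-1.
    path = frames[-2::-1] + frames[1:]
    for feats in path:
        weights, query = soft_track(query, feats, temperature)
    # `weights` now attends over the pixels of the last frame.
    expected_xy = weights @ coords
    return float(np.sum((expected_xy - coords[start_idx]) ** 2))


# Toy 2x2 "image": 4 pixels with (row, col) coordinates.
coords = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
distinct = [np.eye(4) for _ in range(3)]           # every pixel has a unique feature
featureless = [np.full((4, 4), 0.5) for _ in range(3)]  # all pixels look identical
print(cycle_consistency_loss(distinct, 3, coords))     # near 0: the track returns home
print(cycle_consistency_loss(featureless, 3, coords))  # 0.5: attention is uniform
```

In a real training setup, this computation would be written in an autodiff framework so the loss can be backpropagated into the feature encoder. NumPy is used here only to show the arithmetic. The toy example shows why the loss is informative: distinctive features let the track return to its start, while indistinguishable features do not.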

Abstract:

We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as a free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation that is useful for performing cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate the generalizability of the representation, without finetuning, across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.
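The abstract says the representation is used at test time to find nearest neighbors across space and time. One common way to turn that into video object segmentation is k-nearest-neighbor label propagation, sketched below under the following assumptions:

- Features are given as `(N, D)` arrays.
- The reference frame's labels are one-hot.
- `propagate_labels`, the neighbor count `k`, and the softmax `temperature` are hypothetical choices, not settings taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def propagate_labels(ref_feats, ref_labels, tgt_feats, k=5, temperature=0.1):
    """k-NN label propagation from a labeled reference frame (hypothetical helper).

    ref_feats: (N, D) per-pixel features of the reference frame.
    ref_labels: (N, C) one-hot or soft labels of the reference pixels.
    tgt_feats: (M, D) per-pixel features of the target frame.
    Returns (M, C) soft labels for the target pixels.
    """
    sim = l2_normalize(tgt_feats) @ l2_normalize(ref_feats).T  # (M, N)
    k = min(k, sim.shape[1])
    # Indices of the k most similar reference pixels for every target pixel.
    nn_idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    nn_sim = np.take_along_axis(sim, nn_idx, axis=1)  # (M, k)
    # Softmax over the k neighbors turns similarities into weights.
    w = np.exp((nn_sim - nn_sim.max(axis=1, keepdims=True)) / temperature)
    w /= w.sum(axis=1, keepdims=True)
    # Each target pixel receives the weighted average of its neighbors' labels.
    return np.einsum("mk,mkc->mc", w, ref_labels[nn_idx])


# Toy example: two clusters of reference features with labels 0 and 1.
ref_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
ref_labels = np.eye(2)[[0, 0, 1, 1]]
tgt_feats = np.array([[1.0, 0.05], [0.05, 1.0]])
pred = propagate_labels(ref_feats, ref_labels, tgt_feats, k=2)
print(pred.argmax(axis=1))  # [0 1]
```

To use neighbors across time as well as space, one option is to build the reference set from several earlier frames by concatenating their features and labels. That extension is an assumption here, not a detail stated in the abstract.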


