Open Access Posted Content

Multi-task Self-Supervised Visual Learning

TLDR
The results show that deeper networks work better, and that combining tasks, even via a naive multi-head architecture, always improves performance.
Abstract
We investigate methods for combining multiple self-supervised tasks (i.e., supervised tasks where data can be collected without manual labeling) in order to train a single visual representation. First, we provide an apples-to-apples comparison of four different self-supervised tasks using the very deep ResNet-101 architecture. We then combine tasks to jointly train a network. We also explore lasso regularization to encourage the network to factorize the information in its representation, and methods for "harmonizing" network inputs in order to learn a more unified representation. We evaluate all methods on ImageNet classification, PASCAL VOC detection, and NYU depth prediction. Our results show that deeper networks work better, and that combining tasks, even via a naive multi-head architecture, always improves performance. Our best joint network nearly matches the PASCAL performance of a model pre-trained on ImageNet classification, and matches the ImageNet network on NYU depth prediction.
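The "naive multi-head architecture" mentioned in the abstract can be illustrated with a minimal NumPy sketch: one shared trunk feeds several task-specific heads, and the joint objective is simply the sum of per-task losses. All shapes, the toy ReLU trunk, and the placeholder losses below are illustrative assumptions, not the paper's actual ResNet-101 setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def trunk(x, W):
    """Shared representation: a single ReLU layer stands in for the deep trunk."""
    return np.maximum(0.0, x @ W)

def head(h, W_task):
    """Task-specific linear head on top of the shared features."""
    return h @ W_task

# Toy dimensions: 8 images, 32 input dims, 16 shared features, 2 tasks.
x = rng.normal(size=(8, 32))
W_shared = rng.normal(size=(32, 16))
heads = {
    "rotation": rng.normal(size=(16, 4)),      # hypothetical 4-way rotation task
    "colorization": rng.normal(size=(16, 3)),  # hypothetical 3-bin color task
}

h = trunk(x, W_shared)
logits = {task: head(h, W) for task, W in heads.items()}

# Naive joint objective: unweighted sum of per-task losses
# (mean-squared logits here are a placeholder for real task losses).
joint_loss = sum(np.mean(z ** 2) for z in logits.values())
```

The point of the sketch is that each task only adds a cheap head; the shared trunk receives gradients from every task, which is the mechanism behind the "always improves performance" result.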


Citations
Posted Content

Deep Clustering for Unsupervised Learning of Visual Features

TL;DR: This work presents DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features, and that outperforms the current state of the art by a significant margin on all the standard benchmarks.
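The alternation at the heart of DeepCluster can be sketched as follows: extract features, cluster them with k-means to obtain pseudo-labels, then (in the full method) train the network on those pseudo-labels. The feature extractor, dimensions, and the omitted training step below are toy stand-ins, not the actual convnet pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(x, W):
    # Stand-in for the convnet feature extractor.
    return np.maximum(0.0, x @ W)

def kmeans_assign(f, centroids):
    # Assign each feature vector to its nearest centroid; the cluster
    # index serves as the pseudo-label for that image.
    d = ((f[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)

x = rng.normal(size=(20, 10))
W = rng.normal(size=(10, 5))
k = 3
centroids = rng.normal(size=(k, 5))

for _ in range(2):  # alternate: cluster features, then train on pseudo-labels
    f = features(x, W)
    labels = kmeans_assign(f, centroids)
    # Clustering half of the alternation: recompute centroids.
    for j in range(k):
        if (labels == j).any():
            centroids[j] = f[labels == j].mean(0)
    # In DeepCluster proper, W would now be updated by supervised
    # training against `labels`; that step is omitted in this sketch.
```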
Posted Content

Learning deep representations by mutual information estimation and maximization

TL;DR: It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.
Journal ArticleDOI

Self-supervised learning for medical image analysis using image context restoration.

TL;DR: A novel self-supervised learning strategy based on context restoration is proposed in order to better exploit unlabelled images and is validated in three common problems in medical imaging: classification, localization, and segmentation.
Posted Content

Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination

TL;DR: This work formulates this intuition as a non-parametric classification problem at the instance level, and uses noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes.
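The instance-level classification idea can be sketched in a few lines: every image is its own class, embeddings are L2-normalized and stored in a memory bank, and an image is classified by a temperature-scaled softmax over its similarities to all bank entries. The encoder, sizes, and the single-pass setup below are toy assumptions (the temperature value is borrowed from the paper's setup); the real method uses noise-contrastive estimation to avoid the full softmax.

```python
import numpy as np

rng = np.random.default_rng(2)

def embed(x, W):
    # Stand-in encoder: project and L2-normalize, as in the memory-bank setup.
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

n, d = 16, 8             # 16 images, i.e. 16 instance-level "classes"
x = rng.normal(size=(n, 32))
W = rng.normal(size=(32, d))
bank = embed(x, W)       # memory bank: one stored embedding per instance

# Instance-level classification: softmax over similarity to every bank entry.
tau = 0.07               # temperature
logits = embed(x, W) @ bank.T / tau
probs = np.exp(logits - logits.max(1, keepdims=True))
probs /= probs.sum(1, keepdims=True)
pred = probs.argmax(1)   # each image should retrieve its own bank entry
```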
Posted Content

Objects that Sound

TL;DR: In this article, audio and visual embeddings are learned from unlabeled video using only audio-visual correspondence (AVC) as the objective function, which is a form of cross-modal self-supervision from video.
References
Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection spanning hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: Faster R-CNN, as discussed by the authors, proposes a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Book ChapterDOI

Identity Mappings in Deep Residual Networks

TL;DR: In this paper, it is shown that forward and backward signals can be directly propagated from one block to any other block when identity mappings are used as the skip connections and after-addition activation.
Proceedings Article

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

TL;DR: In this paper, the authors show that training with residual connections accelerates the training of Inception networks significantly, and they also present several new streamlined architectures for both residual and non-residual Inception Networks.
Proceedings ArticleDOI

Action Recognition with Improved Trajectories

TL;DR: Dense trajectories, an efficient video representation for action recognition that achieved state-of-the-art results on a variety of datasets, are improved by taking camera motion into account to correct them.