Open Access · Posted Content
Multi-task Self-Supervised Visual Learning
Carl Doersch, Andrew Zisserman, +1 more
TLDR
The results show that deeper networks work better, and that combining tasks—even via a naive multi-head architecture—always improves performance.
Abstract:
We investigate methods for combining multiple self-supervised tasks--i.e., supervised tasks where data can be collected without manual labeling--in order to train a single visual representation. First, we provide an apples-to-apples comparison of four different self-supervised tasks using the very deep ResNet-101 architecture. We then combine tasks to jointly train a network. We also explore lasso regularization to encourage the network to factorize the information in its representation, and methods for "harmonizing" network inputs in order to learn a more unified representation. We evaluate all methods on ImageNet classification, PASCAL VOC detection, and NYU depth prediction. Our results show that deeper networks work better, and that combining tasks--even via a naive multi-head architecture--always improves performance. Our best joint network nearly matches the PASCAL performance of a model pre-trained on ImageNet classification, and matches the ImageNet network on NYU depth prediction.
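The "naive multi-head architecture" the abstract describes can be pictured as a single shared trunk whose features feed one lightweight head per self-supervised task, with an L1 (lasso) penalty on the head weights pushing each task to select a sparse slice of the shared representation. A minimal NumPy sketch, with hypothetical dimensions and task names (the paper's trunk is ResNet-101, not a single linear layer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and task names -- not taken from the paper.
d_in, d_feat = 32, 16
tasks = ["colorization", "context", "motion"]

# Shared trunk: one linear map standing in for the deep network body.
W_trunk = rng.standard_normal((d_in, d_feat)) * 0.1

# One lightweight head per self-supervised task.
heads = {t: rng.standard_normal((d_feat, 4)) * 0.1 for t in tasks}

def forward(x):
    """Shared features feed every task head (the naive multi-head design)."""
    feat = np.maximum(x @ W_trunk, 0.0)        # shared ReLU features
    return {t: feat @ W for t, W in heads.items()}

def lasso_penalty(lam=1e-3):
    """L1 penalty on head weights: encourages each task to pick out a
    sparse subset of the shared representation (factorization)."""
    return lam * sum(np.abs(W).sum() for W in heads.values())

x = rng.standard_normal((8, d_in))             # a batch of 8 inputs
outs = forward(x)
print({t: o.shape for t, o in outs.items()})
print(lasso_penalty() > 0)
```

In training, each head's task loss plus the shared lasso term would be backpropagated through the common trunk; the sketch only shows the forward structure.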
Citations
Posted Content
Deep Clustering for Unsupervised Learning of Visual Features
TL;DR: This work presents DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features and outperforms the current state of the art by a significant margin on all the standard benchmarks.
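The alternation DeepCluster's summary describes can be sketched in a few lines: embed all images, cluster the features, and use the cluster assignments as pseudo-labels for the next round of training. A toy NumPy version, with a fixed random projection standing in for the trainable encoder and the classifier-training step reduced to a comment:

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(feats, k, iters=10):
    """Plain k-means on the current features; assignments become pseudo-labels."""
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(0)
    return labels

# Toy "encoder": a random linear map (trainable in the real method).
X = rng.standard_normal((100, 20))             # 100 unlabeled images
W = rng.standard_normal((20, 8)) * 0.1

for epoch in range(3):                          # the DeepCluster alternation
    feats = np.maximum(X @ W, 0.0)              # 1) embed all images
    pseudo = kmeans(feats, k=5)                 # 2) cluster -> pseudo-labels
    # 3) in the real method, a classifier head is trained on `pseudo`
    #    and its gradients update the encoder W; omitted here.
    sizes = np.bincount(pseudo, minlength=5)

print(sizes.sum())
```

The key point the summary makes is that the cluster assignments and the network parameters are learned jointly: each improves the other across rounds.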
Posted Content
Learning deep representations by mutual information estimation and maximization
R. Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Philip Bachman, Adam Trischler, Yoshua Bengio, +6 more
TL;DR: It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.
Journal ArticleDOI
Self-supervised learning for medical image analysis using image context restoration.
TL;DR: A novel self-supervised learning strategy based on context restoration is proposed in order to better exploit unlabelled images and is validated in three common problems in medical imaging: classification, localization, and segmentation.
Posted Content
Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
TL;DR: This work forms this intuition as a non-parametric classification problem at the instance-level, and uses noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes.
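The "non-parametric classification at the instance level" idea can be illustrated with a small memory bank: every training image gets its own stored embedding, and a new view is scored against all of them with a temperature-scaled cosine softmax (noise-contrastive estimation then approximates this softmax at scale). A minimal sketch with made-up sizes and temperature:

```python
import numpy as np

rng = np.random.default_rng(2)

n_instances, d = 50, 16                         # hypothetical bank size / dim
memory = rng.standard_normal((n_instances, d))
memory /= np.linalg.norm(memory, axis=1, keepdims=True)  # unit-norm bank

def instance_logits(v, tau=0.07):
    """Treat every training image as its own class: score an embedding v
    against all stored instance embeddings (temperature-scaled cosine)."""
    v = v / np.linalg.norm(v)
    return memory @ v / tau

# A slightly perturbed view of instance 7 should score itself highest.
v = memory[7] + 0.05 * rng.standard_normal(d)
probs = np.exp(instance_logits(v))
probs /= probs.sum()
print(int(probs.argmax()))
```

With tens of thousands of instance "classes", evaluating this full softmax is the computational challenge the summary mentions; NCE sidesteps it by contrasting each instance against a few noise samples instead.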
Posted Content
Objects that Sound
TL;DR: In this article, audio and visual embeddings are learned from unlabeled video using only audio-visual correspondence (AVC) as the objective function, which is a form of cross-modal self-supervision from video.
References
Journal ArticleDOI
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei, +11 more
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Posted Content
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are then used by Fast R-CNN for detection.
Book ChapterDOI
Identity Mappings in Deep Residual Networks
TL;DR: In this paper, the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
Proceedings Article
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
TL;DR: In this paper, the authors show that training with residual connections accelerates the training of Inception networks significantly, and they also present several new streamlined architectures for both residual and non-residual Inception networks.
Proceedings ArticleDOI
Action Recognition with Improved Trajectories
Heng Wang, Cordelia Schmid, +1 more
TL;DR: Dense trajectories, shown to be an efficient video representation for action recognition with state-of-the-art results on a variety of datasets, are improved by taking camera motion into account to correct them.