Open Access Posted Content
Self-supervised Pretraining of Visual Features in the Wild
Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek S. Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, Piotr Bojanowski +10 more
TLDR
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods, but only on the highly curated ImageNet dataset; this work tests whether self-supervision can instead learn from random images drawn from an unbounded, uncurated dataset.
Abstract:
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, namely the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to its expectations by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 with access to only 10% of ImageNet. Code: this https URL
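As a rough illustration of the contrastive family of objectives named above (MoCo, SimCLR), here is a minimal NT-Xent loss sketch in NumPy. This is a generic sketch for intuition only: SEER itself is trained with SwAV's clustering-based objective, and the function name, temperature, and shapes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.1):
    """Minimal NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    Each view's positive is its counterpart in the other batch; all other
    2N - 2 views act as negatives. Returns the mean loss over all 2N views.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine sims
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own positive
    # Positive index for view i is (i + n) mod 2n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Cross-entropy: -log softmax of the positive logit per row.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

Because the loss pulls the two views of each image together while pushing all other views apart, identical views yield a lower loss than unrelated ones.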
Citations
Posted Content
Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning
Dahyun Kim, Jonghyun Choi +1 more
TL;DR: This article proposes a self-supervised learning method for binary networks that uses a moving target network, aiming to bring the benefits of unsupervised representation learning to resource-limited devices across various downstream tasks.
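The "moving target network" mentioned above is typically maintained as an exponential moving average (EMA) of the online network, as popularized by BYOL and MoCo. A minimal sketch of one EMA step, with parameters represented as plain arrays (an illustrative simplification, not this paper's code):

```python
import numpy as np

def ema_update(target_params, online_params, momentum=0.99):
    """One exponential-moving-average step for a moving target network:
    target <- momentum * target + (1 - momentum) * online.

    Parameters are dicts of name -> np.ndarray; a new dict is returned,
    leaving the inputs unmodified. The target thus changes slowly and
    provides stable regression targets for the online network.
    """
    return {
        name: momentum * target_params[name] + (1.0 - momentum) * online_params[name]
        for name in target_params
    }
```

With a momentum near 1.0 the target lags far behind the online network; lowering it makes the target track the online weights more quickly.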
Posted Content
What Is Considered Complete for Visual Recognition
TL;DR: In this article, the authors advocate a new type of pre-training task named learning-by-compression, where the computational models are optimized to represent the visual data using compact features, and the features preserve the ability to recover the original data.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
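The residual learning idea summarized above reformulates each block to learn a residual function F(x) added to an identity shortcut, y = x + F(x), so gradients can flow directly through the skip path. A toy NumPy sketch (single linear map plus ReLU as F; shapes and names are illustrative, not the paper's architecture):

```python
import numpy as np

def residual_block(x, weight):
    """Toy residual block: y = x + F(x), with F a linear map followed by ReLU.

    The identity shortcut means that if F learns nothing (zero weights),
    the block degenerates to the identity, which is what eases the
    optimization of very deep networks.
    """
    fx = np.maximum(weight @ x, 0.0)  # F(x): linear + ReLU
    return x + fx                     # identity shortcut
```

With zero weights the block passes its input through unchanged, which is exactly the degenerate case the residual formulation makes easy to represent.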
Journal ArticleDOI
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei +11 more
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Book ChapterDOI
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick +7 more
TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, gathering images of complex everyday scenes containing common objects in their natural context.
Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
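The masked-language-model pretraining that BERT introduced corrupts a fraction of input tokens and trains the model to recover them from bidirectional context. A minimal sketch of the corruption step in NumPy (the 15% rate matches BERT's default; the -100 ignore label and the simplification of always using the mask token, rather than BERT's 80/10/10 mix, are illustrative assumptions):

```python
import numpy as np

def mask_tokens(token_ids, mask_id, prob=0.15, rng=None):
    """Illustrative masked-language-model corruption (BERT-style).

    Each token is replaced by `mask_id` with probability `prob`; the model
    is then trained to predict the originals at masked positions using
    context from both the left and the right. Returns (corrupted, labels),
    where labels is -100 (a common "ignore" convention) at unmasked positions.
    """
    rng = rng or np.random.default_rng()
    ids = np.asarray(token_ids)
    mask = rng.random(ids.shape) < prob
    labels = np.where(mask, ids, -100)
    corrupted = np.where(mask, mask_id, ids)
    return corrupted, labels
```

Setting `prob=1.0` masks everything and `prob=0.0` leaves the sequence untouched, which makes the behavior easy to verify at the extremes.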
Journal ArticleDOI
The Pascal Visual Object Classes (VOC) Challenge
TL;DR: Reviews the state of the art among the evaluated methods for both classification and detection, examining whether the methods are statistically different, what they learn from the images, and which cases they find easy or confusing.