Open Access · Posted Content

Temporal Ensembling for Semi-Supervised Learning

TLDR
Self-ensembling is introduced: a consensus prediction of the unknown labels, formed from the outputs of the network-in-training at different epochs and under different regularization and input augmentation conditions, can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
Abstract
In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus prediction of the unknown labels using the outputs of the network-in-training on different epochs, and most importantly, under different regularization and input augmentation conditions. This ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training. Using our method, we set new records for two standard semi-supervised learning benchmarks, reducing the (non-augmented) classification error rate from 18.44% to 7.05% in SVHN with 500 labels and from 18.63% to 16.55% in CIFAR-10 with 4000 labels, and further to 5.12% and 12.16% by enabling the standard augmentations. We additionally obtain a clear improvement in CIFAR-100 classification accuracy by using random images from the Tiny Images dataset as unlabeled extra inputs during training. Finally, we demonstrate good tolerance to incorrect labels.
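As a concrete illustration of the training target described above, here is a minimal sketch of one epoch of the temporal ensembling variant, assuming a PyTorch classifier, a loader that yields (input, label, index) triples with label -1 for unlabeled samples, and an accumulator Z holding the exponential moving average of past predictions; the names (Z, outputs, alpha, w_unsup) are illustrative, and the ramp-up schedule for the unsupervised weight used in the paper is replaced by a constant for brevity.

```python
import torch
import torch.nn.functional as F

def temporal_ensembling_epoch(model, loader, optimizer, Z, outputs, epoch,
                              alpha=0.6, w_unsup=1.0):
    # Bias-corrected ensemble targets from the accumulated predictions Z.
    targets = Z / (1.0 - alpha ** (epoch + 1))
    for x, y, idx in loader:                  # y == -1 marks unlabeled samples
        optimizer.zero_grad()
        logits = model(x)                     # stochastic augmentation / dropout happen here
        probs = F.softmax(logits, dim=1)
        labeled = y >= 0
        sup_loss = F.cross_entropy(logits[labeled], y[labeled]) if labeled.any() else 0.0
        # Consistency term: pull current predictions toward the ensembled targets.
        unsup_loss = F.mse_loss(probs, targets[idx])
        (sup_loss + w_unsup * unsup_loss).backward()
        optimizer.step()
        outputs[idx] = probs.detach()         # record this epoch's predictions
    # After the epoch, update the exponential moving average of predictions.
    # (In the paper, w_unsup also follows a ramp-up schedule starting at zero.)
    Z.mul_(alpha).add_(outputs, alpha=1.0 - alpha)
    return Z
```

The key point is that the consistency target comes from predictions accumulated over earlier epochs (with bias correction for the zero initialization of Z), not from the current forward pass.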


Citations
Posted Content

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

TL;DR: This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
Proceedings Article

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

TL;DR: The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks, but it becomes unwieldy when learning large datasets, so Mean Teacher, a method that averages model weights instead of label predictions, is proposed.
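The weight averaging this summary refers to can be sketched in a few lines; a hedged sketch, assuming PyTorch modules named student and teacher and an illustrative ema_decay hyperparameter (BatchNorm buffers are ignored for brevity):

```python
import torch

@torch.no_grad()
def update_teacher(student, teacher, ema_decay=0.999):
    # Teacher weights track an exponential moving average of the student weights.
    for p_teacher, p_student in zip(teacher.parameters(), student.parameters()):
        p_teacher.mul_(ema_decay).add_(p_student, alpha=1.0 - ema_decay)
```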
Posted Content

MixMatch: A Holistic Approach to Semi-Supervised Learning

TL;DR: MixMatch, as discussed by the authors, predicts low-entropy labels for unlabeled examples and combines them with the labeled and unlabeled data using MixUp to obtain state-of-the-art results.
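The MixUp step this summary mentions mixes pairs of inputs and their (true or guessed) label distributions with a Beta-distributed coefficient; a minimal sketch with illustrative names, following the common MixMatch choice of keeping the mix biased toward the first argument:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.75):
    # Convexly combine two batches of inputs and their label distributions.
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)   # bias the mix toward the first argument (MixMatch)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```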
Posted Content

Unsupervised Data Augmentation for Consistency Training

TL;DR: A new perspective on how to effectively noise unlabeled examples is presented, and it is argued that the quality of the noising, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
Proceedings Article

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

TL;DR: This paper demonstrates the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling, and shows that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks.
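A hedged sketch of what that combination amounts to for the unlabeled data, assuming a PyTorch model and illustrative weak_aug / strong_aug callables and confidence threshold tau: a pseudo-label from the weakly augmented view supervises the prediction on the strongly augmented view only where the model is confident.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_unlabeled, weak_aug, strong_aug, tau=0.95):
    # Pseudo-label from the weakly augmented view; no gradient flows through it.
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_unlabeled)), dim=1)
        confidence, pseudo_label = probs.max(dim=1)
        mask = (confidence >= tau).float()
    # Cross-entropy on the strongly augmented view, masked to confident samples only.
    logits = model(strong_aug(x_unlabeled))
    per_sample = F.cross_entropy(logits, pseudo_label, reduction="none")
    return (mask * per_sample).mean()
```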
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
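For reference, the moment estimates mentioned here amount to the following parameter update; a NumPy sketch with the usual default hyperparameters (variable names are illustrative, and t is the 1-based step count used for bias correction):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias correction for the zero initialization of m and v.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```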
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
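A minimal sketch of the mechanism, using the common "inverted dropout" formulation that rescales at training time (equivalent in expectation to the test-time weight scaling described in the paper); names are illustrative:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    # Drop units with probability p during training and rescale the survivors,
    # so no change is needed at test time (identity when training=False).
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```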
Posted Content

Adam: A Method for Stochastic Optimization

TL;DR: In this article, the authors present an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Journal Article

Bagging predictors

Leo Breiman
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
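A hedged sketch of bagging with classification trees, assuming scikit-learn, NumPy arrays, and integer class labels; each tree is fit on a bootstrap resample and the predictions are combined by majority vote (names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X, y, X_test, n_estimators=25, seed=0):
    # Fit each tree on a bootstrap resample of (X, y), then take a majority vote.
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
        tree = DecisionTreeClassifier().fit(X[idx], y[idx])
        votes.append(tree.predict(X_test))
    votes = np.stack(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```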
Posted Content

Distilling the Knowledge in a Neural Network

TL;DR: This work shows that distilling the knowledge in an ensemble of models into a single model can significantly improve the acoustic model of a heavily used commercial system, and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
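A minimal sketch of the distillation loss this summary refers to, assuming PyTorch and illustrative names: the student matches temperature-softened teacher probabilities, with the T-squared factor keeping the soft-target gradients on the same scale as the hard-label term.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft targets: teacher probabilities at temperature T.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard targets: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```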