Open Access Proceedings Article

Concrete Dropout

Yarin Gal, +2 more
Vol. 30, pp. 3581–3590
TLDR
In this paper, a continuous relaxation of dropout's discrete masks is proposed, which allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles.
Abstract
Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid-search over the dropout probabilities is necessary—a prohibitive operation with large models, and an impossible one with RL. We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout’s discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks, and give insights into common practice in the field where larger dropout probabilities are often used in deeper model layers.
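The continuous relaxation the abstract describes can be sketched in a few lines: instead of sampling a hard Bernoulli dropout mask, one samples from a Concrete (relaxed Bernoulli) distribution, which produces mask values in (0, 1) that are differentiable with respect to the dropout probability, so that probability can be tuned by gradient descent. The following is a minimal NumPy sketch of that idea, not the paper's released implementation; the function name, temperature value, and rescaling convention are illustrative assumptions.

```python
import numpy as np

def concrete_dropout_mask(shape, p, temperature=0.1, rng=None):
    """Sample a relaxed (continuous) dropout mask.

    A hard mask would be Bernoulli(1 - p); here we draw a Concrete
    relaxation instead, so the sample stays in (0, 1) and is
    differentiable w.r.t. the dropout probability p.
    Illustrative sketch only, not the authors' code.
    """
    rng = rng or np.random.default_rng()
    eps = 1e-7
    u = rng.uniform(eps, 1.0 - eps, size=shape)  # uniform noise
    # Logit of a relaxed Bernoulli sample with drop probability p
    logit = (np.log(p + eps) - np.log(1.0 - p + eps)
             + np.log(u + eps) - np.log(1.0 - u + eps))
    drop = 1.0 / (1.0 + np.exp(-logit / temperature))  # sigmoid
    return 1.0 - drop  # "keep" mask: near 1 keeps a unit, near 0 drops it

# Apply the relaxed mask with the usual dropout rescaling
p = 0.5
x = np.ones((4, 3))
mask = concrete_dropout_mask(x.shape, p)
y = x * mask / (1.0 - p)
```

As the temperature approaches zero the mask values concentrate near 0 and 1, recovering standard binary dropout; a small positive temperature keeps the sample smooth enough for gradients to flow into `p`.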



Citations
Posted Content

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

TL;DR: In this paper, the lottery ticket hypothesis is proposed to find subnetworks that can reach test accuracy comparable to the original network in a similar number of iterations; the winning tickets have won the initialization lottery: their connections have initial weights that make training particularly effective.
Proceedings Article

A Simple Baseline for Bayesian Uncertainty in Deep Learning

TL;DR: In this article, the authors proposed the SWA-Gaussian (SWAG) approach for uncertainty representation and calibration in deep learning, in which the first moment of the stochastic gradient descent (SGD) iterates is computed using a modified learning rate schedule.
Proceedings ArticleDOI

Deep and Confident Prediction for Time Series at Uber

TL;DR: A novel end-to-end Bayesian deep model is proposed that provides time series prediction along with uncertainty estimation at Uber and is successfully applied to large-scale time series anomaly detection at Uber.
Posted Content

Understanding Measures of Uncertainty for Adversarial Example Detection

TL;DR: In this article, failure modes for MC dropout, a widely used approach for estimating uncertainty in deep models, are highlighted, and a proposal to improve the quality of uncertainty estimates using probabilistic model ensembles is made.
References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception, as described in this paper, is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Proceedings Article

Auto-Encoding Variational Bayes

TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Journal ArticleDOI

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
Posted Content

Improving neural networks by preventing co-adaptation of feature detectors

TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.