Dropout: a simple way to prevent neural networks from overfitting

Citations

Proceedings Article

Dropout as a Bayesian approximation: representing model uncertainty in deep learning

Yarin Gal¹, Zoubin Ghahramani¹

University of Cambridge¹

19 Jun 2016

TL;DR: A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.

Abstract: Deep learning tools have gained tremendous attention in applied machine learning. However such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. A direct result of this theory gives us tools to model uncertainty with dropout NNs - extracting information from existing models that has been thrown away so far. This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy. We perform an extensive study of the properties of dropout's uncertainty. Various network architectures and nonlinearities are assessed on tasks of regression and classification, using MNIST as an example. We show a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods, and finish by using dropout's uncertainty in deep reinforcement learning.
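
A minimal sketch of how uncertainty might be read out of a dropout network in practice: dropout stays active at prediction time, several stochastic forward passes are run, and their mean and spread serve as the prediction and a rough uncertainty estimate. The toy one-hidden-layer network, its weights, and the names `dropout_forward` and `mc_dropout_predict` are illustrative assumptions, not the authors' code.

```python
import random
import statistics


def dropout_forward(x, weights, drop_rate, rng):
    """One stochastic pass through a toy one-hidden-layer ReLU network.

    Each hidden unit is dropped with probability drop_rate; surviving
    units are rescaled by 1 / (1 - drop_rate) (inverted dropout).
    """
    w_hidden, w_out = weights
    keep = 1.0 - drop_rate
    output = 0.0
    for w_h, w_o in zip(w_hidden, w_out):
        h = max(0.0, w_h * x)        # ReLU hidden activation
        if rng.random() < keep:      # unit survives this pass
            output += w_o * h / keep
    return output


def mc_dropout_predict(x, weights, drop_rate=0.5, n_samples=200, seed=0):
    """Average several dropout passes; the spread is the uncertainty proxy."""
    rng = random.Random(seed)
    samples = [dropout_forward(x, weights, drop_rate, rng)
               for _ in range(n_samples)]
    return statistics.mean(samples), statistics.pstdev(samples)


# Hypothetical weights: (input-to-hidden, hidden-to-output).
toy_weights = ([0.5, -1.0, 2.0, 1.5], [1.0, 0.3, -0.7, 0.9])
mean, std = mc_dropout_predict(1.0, toy_weights)
print(f"predictive mean = {mean:.3f}, predictive std = {std:.3f}")
```

With `drop_rate=0.0` every pass is identical, so the spread collapses to zero and the mean equals the ordinary deterministic output.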

3,472 citations

Cites background or methods from "Dropout: a simple way to prevent ne..."

...Dropout is used in many models in deep learning as a way to avoid over-fitting (Srivastava et al., 2014), and our interpretation suggests that dropout approximately integrates over the models’ weights....
[...]
...Furthermore, our results carry to other variants of dropout as well (such as drop-connect (Wan et al., 2013), multiplicative Gaussian noise (Srivastava et al., 2014), etc.)....
[...]
...In this paper we give a complete theoretical treatment of the link between Gaussian processes and dropout, and develop the tools necessary to represent uncertainty in deep learning....
[...]

Proceedings Article

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Mingxing Tan¹, Quoc V. Le¹

Google¹

24 May 2019

TL;DR: A new scaling method is proposed that uniformly scales network depth, width, and resolution using a simple compound coefficient; the resulting EfficientNet-B7 achieves state-of-the-art accuracy on ImageNet while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet.

Abstract: Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at this https URL.
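
The compound-coefficient idea can be sketched as follows: a single coefficient `phi` drives depth, width, and resolution together, each through its own base constant. The abstract gives no constants, so the defaults below (`alpha=1.2`, `beta=1.1`, `gamma=1.15`) are assumptions for illustration, as is the rough cost model in which FLOPs grow with depth times width squared times resolution squared.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return depth/width/resolution multipliers for compound coefficient phi.

    alpha, beta, gamma are assumed base constants (not from the abstract).
    """
    return {
        "depth": alpha ** phi,
        "width": beta ** phi,
        "resolution": gamma ** phi,
    }


def flops_multiplier(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Approximate FLOPs growth, assuming cost ~ depth * width^2 * resolution^2."""
    s = compound_scale(phi, alpha, beta, gamma)
    return s["depth"] * s["width"] ** 2 * s["resolution"] ** 2


for phi in range(4):
    s = compound_scale(phi)
    print(phi,
          round(s["depth"], 3),
          round(s["width"], 3),
          round(s["resolution"], 3),
          round(flops_multiplier(phi), 2))
```

Under these assumed constants, each unit increase in `phi` roughly doubles the estimated FLOPs, which is what makes a single knob convenient for matching a resource budget.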

3,445 citations

Posted Content

Unsupervised Domain Adaptation by Backpropagation

Yaroslav Ganin¹, Victor Lempitsky¹

Skolkovo Institute of Science and Technology¹

26 Sep 2014 - arXiv: Machine Learning

TL;DR: In this paper, a gradient reversal layer is proposed to promote the emergence of deep features that are discriminative for the main learning task on the source domain and invariant with respect to the shift between the domains.

Abstract: Top-performing deep architectures are trained on massive amounts of labeled data. In the absence of labeled data for a certain task, domain adaptation often provides an attractive option given that labeled data of similar nature but from a different domain (e.g. synthetic images) are available. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amount of labeled data from the source domain and large amount of unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of "deep" features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation. Overall, the approach can be implemented with little effort using any of the deep-learning packages. The method performs very well in a series of image classification experiments, achieving adaptation effect in the presence of big domain shifts and outperforming previous state-of-the-art on Office datasets.
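
A framework-free sketch of the gradient reversal layer described above: the forward pass is the identity, while the backward pass negates the incoming gradient so that the feature extractor is pushed to confuse a domain classifier. The class name `GradientReversal` and the scaling factor `lam` are assumptions for illustration; in a real deep-learning package this would be written as a custom autograd operation.

```python
class GradientReversal:
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""

    def __init__(self, lam=1.0):
        # lam scales the reversed gradient (hypothetical parameter name).
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # The gradient from the domain classifier is negated (and scaled)
        # before it reaches the feature extractor.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=2.0)
features = [1.0, -2.0, 0.5]
domain_grad = [0.5, -1.0, 0.0]       # gradient of the domain loss w.r.t. features
print(grl.forward(features))         # features are unchanged
print(grl.backward(domain_grad))     # gradient has opposite sign, scaled by lam
```

Because only the backward pass differs from an identity layer, the rest of the network can still be trained with standard backpropagation, as the abstract notes.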

3,222 citations

Proceedings Article

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Adina Williams¹, Nikita Nangia¹, Samuel R. Bowman¹

New York University¹

01 Jun 2018

TL;DR: The Multi-Genre Natural Language Inference (MultiNLI) corpus is introduced, a dataset designed for developing and evaluating machine learning models for sentence understanding; an evaluation with existing models shows that it represents a substantially more difficult task than the Stanford NLI corpus.

Abstract: This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. At 433k examples, this resource is one of the largest corpora available for natural language inference (a.k.a. recognizing textual entailment), improving upon available resources in both its coverage and difficulty. MultiNLI accomplishes this by offering data from ten distinct genres of written and spoken English, making it possible to evaluate systems on nearly the full complexity of the language, while supplying an explicit setting for evaluating cross-genre domain adaptation. In addition, an evaluation using existing machine learning models designed for the Stanford NLI corpus shows that it represents a substantially more difficult task than does that corpus, despite the two showing similar levels of inter-annotator agreement.

3,148 citations

Additional excerpts

...For the CBOW and BiLSTM models, we tune Dropout on the SNLI development set and find that a drop rate of 0.1 works well....
[...]
...We use Dropout (Srivastava et al., 2014) for regularization....
[...]

Proceedings Article

A large annotated corpus for learning natural language inference

Samuel R. Bowman¹, Gabor Angeli¹, Christopher Potts¹, Christopher D. Manning¹

Stanford University¹

21 Aug 2015

TL;DR: The Stanford Natural Language Inference (SNLI) corpus is introduced, a large, freely available collection of labeled sentence pairs written by humans doing a novel grounded task based on image captioning.

Abstract: Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

3,100 citations
