Open Access Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract:
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
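The train/test asymmetry described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a single linear layer, a retention probability `p_keep` of 0.5, and hypothetical helper names (`dropout_train`, `scale_weights_for_test`) that do not come from the paper. During training each unit is kept with probability `p_keep`; at test time the outgoing weights are multiplied by `p_keep`, which is one way to read "a single unthinned network that has smaller weights".

```python
import numpy as np

def dropout_train(x, p_keep, rng):
    """Training-time pass: keep each unit with probability p_keep, zero the rest."""
    mask = rng.random(x.shape) < p_keep
    return x * mask

def scale_weights_for_test(W, p_keep):
    """Test-time weights: scale outgoing weights by the retention probability."""
    return W * p_keep

rng = np.random.default_rng(0)
x = rng.normal(size=8)        # activations of one hidden layer (illustrative values)
W = rng.normal(size=(8, 3))   # outgoing weights (illustrative values)
p_keep = 0.5                  # assumed retention probability

# Monte Carlo average over many randomly "thinned" versions of the layer.
n_samples = 20000
x_batch = np.broadcast_to(x, (n_samples, x.size))
avg_thinned = (dropout_train(x_batch, p_keep, rng) @ W).mean(axis=0)

# Single unthinned layer with scaled (smaller) weights.
unthinned = x @ scale_weights_for_test(W, p_keep)

print("average of thinned outputs:", avg_thinned)
print("unthinned, scaled weights: ", unthinned)
```

For a purely linear layer the scaled-weight output equals the expected thinned output exactly, so the two printed vectors differ only by sampling noise. With nonlinearities between layers the match is only approximate, which is why the abstract speaks of approximating the average.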
Citations
Proceedings Article
Deep Neural Decision Forests
TL;DR: This work proposes a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network.
Book Chapter
Deep Domain Generalization via Conditional Invariant Adversarial Networks
TL;DR: This work proposes an end-to-end conditional invariant deep domain generalization approach by leveraging deep neural networks for domain-invariant representation learning and proves the effectiveness of the proposed method.
Journal Article
Deep Learning Methods for Improved Decoding of Linear Codes
TL;DR: It is shown that deep learning methods can be used to improve a standard belief propagation decoder, and that tying the parameters of the decoders across iterations, so as to form a recurrent neural network architecture, can be implemented with comparable results.
Posted Content
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina
TL;DR: This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape and compares favorably to state-of-the-art techniques in terms of generalization error and training time.
Journal Article
Text Data Augmentation for Deep Learning.
TL;DR: This article surveys data augmentation for text data, summarizing its major motifs as strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance.
Journal Article
Regression Shrinkage and Selection via the Lasso
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Journal Article
Reducing the Dimensionality of Data with Neural Networks
TL;DR: In this article, an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data is described.
Journal Article
A fast learning algorithm for deep belief nets
TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Dissertation
Learning Multiple Layers of Features from Tiny Images
TL;DR: In this paper, the authors describe how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images.