
Showing papers by Geoffrey E. Hinton published in 2014


Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
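To make the training/test asymmetry described above concrete, here is a minimal NumPy sketch of dropout applied to one layer's activations. The retention probability and array sizes are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # probability of retaining a unit (illustrative)

def dropout_forward(h, train=True):
    """Apply dropout to a layer's activations h."""
    if train:
        # Sample one "thinned" network: randomly zero units.
        mask = rng.random(h.shape) < p
        return h * mask
    # At test time, run the full network with activations scaled by p,
    # approximating the average prediction of all thinned networks.
    return h * p

h = rng.standard_normal((4, 8))        # activations of a hidden layer
print(dropout_forward(h, train=True))  # thinned
print(dropout_forward(h, train=False)) # scaled, deterministic
```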

33,597 citations


BookDOI
25 Feb 2014
TL;DR: This book discusses G.E. Hinton's Models of Information Processing in the Brain and Implementing Semantic Networks in Parallel Hardware, and R. Ratcliff's Parallel-Processing Mechanisms and Processing of Organized Information in Human Memory.

799 citations


Journal ArticleDOI
TL;DR: The plain DBN-based model gives a call-routing classification accuracy equal to the best of the other models; however, using additional unlabeled data for DBN pre-training and combining DBN-based learned features with the original features provides significant gains over SVMs, which, in turn, perform better than both MaxEnt and boosting.
Abstract: Applications of Deep Belief Nets (DBN) to various problems have been the subject of a number of recent studies ranging from image classification and speech recognition to audio classification. In this study we apply DBNs to a natural language understanding problem. The recent surge of activity in this area was largely spurred by the development of a greedy layer-wise pretraining method that uses an efficient learning algorithm called Contrastive Divergence (CD). CD allows DBNs to learn a multi-layer generative model from unlabeled data, and the features discovered by this model are then used to initialize a feed-forward neural network which is fine-tuned with backpropagation. We compare a DBN-initialized neural network to three widely used text classification algorithms: Support Vector Machines (SVM), boosting, and Maximum Entropy (MaxEnt). The plain DBN-based model gives a call-routing classification accuracy that is equal to the best of the other models. However, using additional unlabeled data for DBN pre-training and combining DBN-based learned features with the original features provides significant gains over SVMs, which, in turn, perform better than both MaxEnt and boosting.
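As a rough illustration of the CD building block mentioned above, the sketch below performs a single CD-1 update for a binary RBM (biases omitted for brevity). The dimensions and learning rate are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 4, 0.1
W = 0.01 * rng.standard_normal((n_visible, n_hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One Contrastive Divergence (CD-1) weight update for one example."""
    # Positive phase: hidden probabilities given the data vector.
    h0 = sigmoid(v0 @ W)
    # Negative phase: one step of alternating Gibbs sampling.
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ W.T)
    h1 = sigmoid(v1 @ W)
    # Approximate log-likelihood gradient: data term minus model term.
    return lr * (np.outer(v0, h0) - np.outer(v1, h1))

v = (rng.random(n_visible) < 0.5).astype(float)  # one binary training vector
W += cd1_step(v)
```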

430 citations


Posted Content
Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey E. Hinton
TL;DR: This paper proposes a domain-agnostic attention-enhanced sequence-to-sequence model for syntactic constituency parsing, which achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset when trained on a large synthetic corpus annotated using existing parsers.
Abstract: Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that a domain-agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.
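The core trick that lets a sequence-to-sequence model parse is emitting the tree as a flat token sequence. Below is a minimal sketch of one possible depth-first linearization; the tree encoding and tag inventory are illustrative and not necessarily the paper's exact output format.

```python
def linearize(tree):
    """Depth-first linearization of (label, children-or-word) tuples."""
    label, children = tree
    if isinstance(children, str):          # pre-terminal over a word
        return [f"({label}", children, f"){label}"]
    tokens = [f"({label}"]
    for child in children:
        tokens += linearize(child)
    tokens.append(f"){label}")
    return tokens

tree = ("S", [("NP", [("PRP", "I")]),
              ("VP", [("VBP", "love"), ("NP", [("NN", "parsing")])])])
print(" ".join(linearize(tree)))
# (S (NP (PRP I )PRP )NP (VP (VBP love )VBP (NP (NN parsing )NN )NP )VP )S
```

A decoder trained to emit such sequences can be scored against gold trees after de-linearization, which is how parsing reduces to ordinary sequence prediction.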

97 citations


Journal ArticleDOI
TL;DR: Using a stack of RBMs to initialize the weights of a feedforward neural network allows backpropagation to work effectively in much deeper networks and leads to much better generalization.
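A minimal sketch of this initialization scheme follows. Here `train_rbm` is a hypothetical stand-in for a CD-based RBM trainer (such as the CD-1 step sketched earlier), and the layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_rbm(data, n_hidden):
    """Placeholder: would run CD updates and return learned weights."""
    return 0.01 * rng.standard_normal((data.shape[1], n_hidden))

layer_sizes = [784, 500, 250]          # e.g. pixels -> hidden1 -> hidden2
data = rng.random((100, layer_sizes[0]))

weights = []
for n_hidden in layer_sizes[1:]:
    W = train_rbm(data, n_hidden)      # greedy: train one layer at a time
    weights.append(W)
    # The next RBM is trained on this layer's hidden activities.
    data = 1.0 / (1.0 + np.exp(-(data @ W)))

# `weights` now initializes a feedforward net, which is then fine-tuned
# end-to-end with backpropagation.
```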

88 citations


Proceedings ArticleDOI
14 Sep 2014
TL;DR: A simple but effective way of using multi-frame targets to improve the accuracy of Artificial Neural Network-Hidden Markov Model (ANN-HMM) hybrid systems.
Abstract: We describe a simple but effective way of using multi-frame targets to improve the accuracy of Artificial Neural Network-Hidden Markov Model (ANN-HMM) hybrid systems. In this approach a Deep Neural Network (DNN) is trained to predict the forced-alignment state of multiple frames using a separate softmax unit for each of the frames. This is in contrast to the usual method of training a DNN to predict only the state of the central frame. By itself this is not sufficient to improve the accuracy of the system significantly. However, if we average the predictions for each frame from the different contexts it is associated with, we achieve state-of-the-art results on TIMIT using a fully connected Deep Neural Network without convolutional architectures or dropout training. On a 14-hour subset of Wall Street Journal (WSJ) using a context-dependent DNN-HMM system, it leads to a relative improvement of 6.4% on the dev set (test-dev93) and 9.3% on the test set (test-eval92).
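A toy sketch of the two ideas in this abstract: a separate softmax per target frame on top of a shared hidden vector, and averaging the predictions a frame receives from its different contexts. All sizes, the random "network", and the index bookkeeping are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_states, n_hidden = 3, 5, 16  # frames per window, HMM states, hidden units

# One output weight matrix per target frame, all fed by the same hidden vector.
W_out = 0.1 * rng.standard_normal((K, n_hidden, n_states))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_window(h):
    """Separate softmax per target frame: returns a (K, n_states) array."""
    return softmax(h @ W_out)

# Frame t is predicted by K windows (centred at t-1, t, t+1 for K=3).
# Window j, centred at t-1+j, predicts frame t at output index K-1-j.
hidden = rng.standard_normal((K, n_hidden))            # fake hidden vectors
preds = np.stack([predict_window(h) for h in hidden])  # (K, K, n_states)
avg = np.mean([preds[j, K - 1 - j] for j in range(K)], axis=0)
print(avg)  # averaged state posterior for frame t
```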

19 citations