
Showing papers by Geoffrey E. Hinton published in 2014


Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
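To make the training/test asymmetry described above concrete, here is a minimal NumPy sketch of dropout applied to one layer's activations. The retention probability and array sizes are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # probability of retaining a unit (illustrative)

def dropout_forward(h, train=True):
    """Apply dropout to a layer's activations h."""
    if train:
        # Sample one "thinned" network: randomly zero units.
        mask = rng.random(h.shape) < p
        return h * mask
    # At test time, run the full network with activations scaled by p,
    # approximating the average prediction of all thinned networks.
    return h * p

h = rng.standard_normal((4, 8))        # activations of a hidden layer
print(dropout_forward(h, train=True))  # thinned
print(dropout_forward(h, train=False)) # scaled, deterministic
```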

33,597 citations


BookDOI
25 Feb 2014
TL;DR: This book discusses G.E. Hinton's Models of Information Processing in the Brain and Implementing Semantic Networks in Parallel Hardware, and R. Ratcliff's Parallel-Processing Mechanisms and Processing of Organized Information in Human Memory.

799 citations


Journal ArticleDOI
TL;DR: The plain DBN-based model gives a call-routing classification accuracy equal to the best of the other models; however, using additional unlabeled data for DBN pre-training and combining DBN-based learned features with the original features provides significant gains over SVMs, which, in turn, perform better than both MaxEnt and boosting.
Abstract: Applications of Deep Belief Nets (DBN) to various problems have been the subject of a number of recent studies ranging from image classification and speech recognition to audio classification. In this study we apply DBNs to a natural language understanding problem. The recent surge of activity in this area was largely spurred by the development of a greedy layer-wise pretraining method that uses an efficient learning algorithm called Contrastive Divergence (CD). CD allows DBNs to learn a multi-layer generative model from unlabeled data, and the features discovered by this model are then used to initialize a feed-forward neural network which is fine-tuned with backpropagation. We compare a DBN-initialized neural network to three widely used text classification algorithms: Support Vector Machines (SVM), boosting, and Maximum Entropy (MaxEnt). The plain DBN-based model gives a call-routing classification accuracy that is equal to the best of the other models. However, using additional unlabeled data for DBN pre-training and combining DBN-based learned features with the original features provides significant gains over SVMs, which, in turn, perform better than both MaxEnt and boosting.
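As a rough illustration of the CD building block mentioned above, the sketch below performs a single CD-1 update for a binary RBM (biases omitted for brevity). The dimensions and learning rate are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 4, 0.1
W = 0.01 * rng.standard_normal((n_visible, n_hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One Contrastive Divergence (CD-1) weight update for one example."""
    # Positive phase: hidden probabilities given the data vector.
    h0 = sigmoid(v0 @ W)
    # Negative phase: one step of alternating Gibbs sampling.
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ W.T)
    h1 = sigmoid(v1 @ W)
    # Approximate log-likelihood gradient: data term minus model term.
    return lr * (np.outer(v0, h0) - np.outer(v1, h1))

v = (rng.random(n_visible) < 0.5).astype(float)  # one binary training vector
W += cd1_step(v)
```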

430 citations


Posted Content
Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey E. Hinton
TL;DR: This paper proposes a domain-agnostic attention-enhanced sequence-to-sequence model for syntactic constituency parsing, which achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset when trained on a large synthetic corpus annotated using existing parsers.
Abstract: Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that a domain-agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.
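The core trick that lets a sequence-to-sequence model parse is emitting the tree as a flat token sequence. Below is a minimal sketch of one possible depth-first linearization; the tree encoding and tag inventory are illustrative and not necessarily the paper's exact output format.

```python
def linearize(tree):
    """Depth-first linearization of (label, children-or-word) tuples."""
    label, children = tree
    if isinstance(children, str):          # pre-terminal over a word
        return [f"({label}", children, f"){label}"]
    tokens = [f"({label}"]
    for child in children:
        tokens += linearize(child)
    tokens.append(f"){label}")
    return tokens

tree = ("S", [("NP", [("PRP", "I")]),
              ("VP", [("VBP", "love"), ("NP", [("NN", "parsing")])])])
print(" ".join(linearize(tree)))
# (S (NP (PRP I )PRP )NP (VP (VBP love )VBP (NP (NN parsing )NN )NP )VP )S
```

A decoder trained to emit such sequences can be scored against gold trees after de-linearization, which is how parsing reduces to ordinary sequence prediction.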

97 citations


Journal ArticleDOI
TL;DR: Using a stack of RBMs to initialize the weights of a feedforward neural network allows backpropagation to work effectively in much deeper networks and leads to much better generalization.
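A minimal sketch of this initialization scheme follows. Here `train_rbm` is a hypothetical stand-in for a CD-based RBM trainer (such as the CD-1 step sketched earlier), and the layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_rbm(data, n_hidden):
    """Placeholder: would run CD updates and return learned weights."""
    return 0.01 * rng.standard_normal((data.shape[1], n_hidden))

layer_sizes = [784, 500, 250]          # e.g. pixels -> hidden1 -> hidden2
data = rng.random((100, layer_sizes[0]))

weights = []
for n_hidden in layer_sizes[1:]:
    W = train_rbm(data, n_hidden)      # greedy: train one layer at a time
    weights.append(W)
    # The next RBM is trained on this layer's hidden activities.
    data = 1.0 / (1.0 + np.exp(-(data @ W)))

# `weights` now initializes a feedforward net, which is then fine-tuned
# end-to-end with backpropagation.
```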

88 citations


Proceedings ArticleDOI
14 Sep 2014
TL;DR: A simple but effective way of using multi-frame targets to improve the accuracy of Artificial Neural Network-Hidden Markov Model (ANN-HMM) hybrid systems.
Abstract: We describe a simple but effective way of using multi-frame targets to improve the accuracy of Artificial Neural Network-Hidden Markov Model (ANN-HMM) hybrid systems. In this approach a Deep Neural Network (DNN) is trained to predict the forced-alignment state of multiple frames using a separate softmax unit for each of the frames. This is in contrast to the usual method of training a DNN to predict only the state of the central frame. By itself this is not sufficient to improve the accuracy of the system significantly. However, if we average the predictions for each frame from the different contexts it is associated with, we achieve state-of-the-art results on TIMIT using a fully connected Deep Neural Network without convolutional architectures or dropout training. On a 14-hour subset of Wall Street Journal (WSJ) using a context-dependent DNN-HMM system, it leads to a relative improvement of 6.4% on the dev set (test-dev93) and 9.3% on the test set (test-eval92).
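A toy sketch of the two ideas in this abstract: a separate softmax per target frame on top of a shared hidden vector, and averaging the predictions a frame receives from its different contexts. All sizes, the random "network", and the index bookkeeping are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_states, n_hidden = 3, 5, 16  # frames per window, HMM states, hidden units

# One output weight matrix per target frame, all fed by the same hidden vector.
W_out = 0.1 * rng.standard_normal((K, n_hidden, n_states))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_window(h):
    """Separate softmax per target frame: returns a (K, n_states) array."""
    return softmax(h @ W_out)

# Frame t is predicted by K windows (centred at t-1, t, t+1 for K=3).
# Window j, centred at t-1+j, predicts frame t at output index K-1-j.
hidden = rng.standard_normal((K, n_hidden))            # fake hidden vectors
preds = np.stack([predict_window(h) for h in hidden])  # (K, K, n_states)
avg = np.mean([preds[j, K - 1 - j] for j in range(K)], axis=0)
print(avg)  # averaged state posterior for frame t
```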

19 citations