scispace - formally typeset
Search or ask a question

Showing papers by "Geoffrey E. Hinton published in 2015"


Journal ArticleDOI
28 May 2015-Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

46,982 citations


Posted Content
TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
Abstract: A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.

12,857 citations


Proceedings Article
Oriol Vinyals1, Lukasz Kaiser1, Terry Koo1, Slav Petrov1, Ilya Sutskever1, Geoffrey E. Hinton1 
07 Dec 2015
TL;DR: The domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers.
Abstract: Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.

817 citations


Posted Content
TL;DR: This paper proposes a simpler solution that use recurrent neural networks composed of rectified linear units that is comparable to LSTM on four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.
Abstract: Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that use recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.

655 citations


Journal ArticleDOI
TL;DR: The authors propose the use of an unsupervised feature learning algorithm for the analysis of biological tissue imagery and show how this can be very useful because of the paucity of labeled images available at training time.
Abstract: Deep Learning methods aim at learning feature hierarchies. Applications of deep learning to vision tasks date back to convolutional networks in the early 1990s. These methods have been the subject of a recent surge of interest for two main reasons: when labeled data is scarce, unsupervised learning algorithms can learn useful feature hierarchies. When labeled data is abundant, supervised methods can be used to train very large networks on very large datasets through the use of high-performance computers. Such large networks have been shown to outperform previous state-of-theart methods on several perceptual tasks, including categorylevel object recognition, object detection and semantic segmentation. In “Stacked Predictive Sparse Decomposition for Classification of Histology Sections” (doi:10.1007/s11263-0140790-9) the authors propose the use of an unsupervised feature learning algorithm for the analysis of biological tissue imagery. Biomedical applications is one domainwhere unsupervised learning can be very useful because of the paucity of labeled images available at training time.

31 citations


Patent
04 Jun 2015

13 citations