Showing papers by "Geoffrey E. Hinton published in 2015"

PDF

Open Access

Journal Article•DOI•

[...]

Yann LeCun¹, Yann LeCun², Yoshua Bengio³, Geoffrey E. Hinton⁴, Geoffrey E. Hinton⁵ - Show less +1 more•Institutions (5)

New York University¹, Facebook², Université de Montréal³, University of Toronto⁴, Google⁵

28 May 2015-Nature

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.

...read moreread less

Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

...read moreread less

46,982 citations

Posted Content•

Distilling the Knowledge in a Neural Network

[...]

Geoffrey E. Hinton, Oriol Vinyals, Jeffrey Dean

09 Mar 2015-arXiv: Machine Learning

TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.

...read moreread less

Abstract: A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.

...read moreread less

12,857 citations

Proceedings Article•

Grammar as a foreign language

[...]

Oriol Vinyals¹, Lukasz Kaiser¹, Terry Koo¹, Slav Petrov¹, Ilya Sutskever¹, Geoffrey E. Hinton¹ - Show less +2 more•Institutions (1)

Google¹

07 Dec 2015

TL;DR: The domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers.

...read moreread less

Abstract: Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.

...read moreread less

817 citations

Posted Content•

A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

[...]

Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton

03 Apr 2015-arXiv: Neural and Evolutionary Computing

TL;DR: This paper proposes a simpler solution that use recurrent neural networks composed of rectified linear units that is comparable to LSTM on four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.

...read moreread less

Abstract: Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that use recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.

...read moreread less

655 citations

Journal Article•DOI•

Guest Editorial: Deep Learning

[...]

Marc'Aurelio Ranzato¹, Geoffrey E. Hinton², Yann LeCun³•Institutions (3)

Facebook¹, University of Toronto², Courant Institute of Mathematical Sciences³

01 May 2015-International Journal of Computer Vision

TL;DR: The authors propose the use of an unsupervised feature learning algorithm for the analysis of biological tissue imagery and show how this can be very useful because of the paucity of labeled images available at training time.

...read moreread less

Abstract: Deep Learning methods aim at learning feature hierarchies. Applications of deep learning to vision tasks date back to convolutional networks in the early 1990s. These methods have been the subject of a recent surge of interest for two main reasons: when labeled data is scarce, unsupervised learning algorithms can learn useful feature hierarchies. When labeled data is abundant, supervised methods can be used to train very large networks on very large datasets through the use of high-performance computers. Such large networks have been shown to outperform previous state-of-theart methods on several perceptual tasks, including categorylevel object recognition, object detection and semantic segmentation. In “Stacked Predictive Sparse Decomposition for Classification of Histology Sections” (doi:10.1007/s11263-0140790-9) the authors propose the use of an unsupervised feature learning algorithm for the analysis of biological tissue imagery. Biomedical applications is one domainwhere unsupervised learning can be very useful because of the paucity of labeled images available at training time.

...read moreread less

31 citations

Patent•

Training distilled machine learning models

[...]

Oriol Vinyals¹, Jeffrey Dean¹, Geoffrey E. Hinton¹•Institutions (1)

Google¹

04 Jun 2015

13 citations