Open Access · Posted Content

One Model To Learn Them All

TLDR
It is shown that tasks with less data benefit substantially from joint training with other tasks, while performance on large tasks degrades only slightly if at all, and that adding a computational block to the model never hurts performance and in most cases improves it on all tasks.
Abstract
Deep learning yields great results across many fields, from speech recognition and image classification to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains: it contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit substantially from joint training with other tasks, while performance on large tasks degrades only slightly if at all.
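The abstract names three ingredients mixed in one network: convolution, attention, and sparsely-gated layers. The following is a minimal, illustrative PyTorch sketch of such a mixed block; it is not the authors' MultiModel implementation, and all class names, layer sizes, and the top-k gating details are assumptions made only to show how the pieces can compose over a shared sequence representation.

```python
# Illustrative sketch (not the authors' MultiModel): one block mixing the three
# ingredients named in the abstract -- convolution, attention, sparse gating.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Sparsely-gated mixture-of-experts: route each position to its top-k experts."""
    def __init__(self, d_model, n_experts=4, k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                        # x: (batch, seq, d_model)
        scores = self.gate(x)                    # (batch, seq, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        weights = F.softmax(topv, dim=-1)        # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topi[..., slot]                # which expert handles this position
            w = weights[..., slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1).float()
                out = out + mask * w * expert(x)
        return out

class MixedBlock(nn.Module):
    """Convolution + self-attention + sparse MoE, each with a residual connection."""
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.moe = SparseMoE(d_model)

    def forward(self, x):                        # x: (batch, seq, d_model)
        x = x + self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = x + self.attn(x, x, x)[0]
        return x + self.moe(x)

x = torch.randn(2, 10, 128)                      # toy batch: 2 sequences of length 10
print(MixedBlock()(x).shape)                     # torch.Size([2, 10, 128])
```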


Citations
Posted Content

Deep Reinforcement Learning: An Overview

Yuxi Li
25 Jan 2017
TL;DR: This work discusses core RL elements, including the value function (in particular the Deep Q-Network, DQN), policy, reward, model, planning, and exploration, as well as important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn.
Journal Article

A State-of-the-Art Survey on Deep Learning Theory and Architectures

TL;DR: This survey reviews advances in the area of Deep Learning (DL), starting with the Deep Neural Network and going on to cover the Convolutional Neural Network, the Recurrent Neural Network (RNN), and Deep Reinforcement Learning (DRL).
Posted Content

CTRL: A Conditional Transformer Language Model for Controllable Generation

TL;DR: CTRL is released, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior, providing more explicit control over text generation.
Journal Article

A Survey on Deep Learning: Algorithms, Techniques, and Applications

TL;DR: A comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing is presented, followed by an in-depth analysis of pivotal and groundbreaking advances in deep learning applications.
Posted Content

The Natural Language Decathlon: Multitask Learning as Question Answering

TL;DR: The Natural Language Decathlon (decaNLP) casts ten NLP tasks as question answering over a context, and a single multitask question answering network learns all of them jointly without task-specific modules or parameters.
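As a toy illustration of the decaNLP framing described above, every task can be represented as a (question, context, answer) triple consumed by one QA model; the examples below are hypothetical and not drawn from the benchmark.

```python
# Hypothetical (question, context, answer) triples in the decaNLP style:
# one input format shared by summarization, translation, and sentiment tasks.
examples = [
    {"question": "What is the summary?",
     "context":  "Joint training helps low-resource tasks while barely hurting large ones.",
     "answer":   "Multitask training helps small tasks."},
    {"question": "What is the translation from English to German?",
     "context":  "The house is small.",
     "answer":   "Das Haus ist klein."},
    {"question": "Is this sentence positive or negative?",
     "context":  "A genuinely delightful film.",
     "answer":   "positive"},
]
for ex in examples:
    print(ex["question"], "->", ex["answer"])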
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
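The update rule this TL;DR describes can be sketched in a few lines. Below is a minimal NumPy version using the paper's default hyperparameters; `grad_fn`, `theta`, and the quadratic toy objective are placeholders for illustration, not part of any published implementation.

```python
# Minimal NumPy sketch of the Adam update (default hyperparameters from the paper).
import numpy as np

def adam(grad_fn, theta, steps=1000, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = np.zeros_like(theta)            # first-moment (mean) estimate
    v = np.zeros_like(theta)            # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)       # bias correction for the running averages
        v_hat = v / (1 - b2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Usage: minimize f(x) = ||x||^2, whose gradient is 2x.
print(adam(lambda x: 2 * x, np.array([3.0, -2.0])))
```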
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art performance on ImageNet classification.
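A rough PyTorch sketch of the stack this TL;DR describes is shown below; the layer sizes follow the published AlexNet architecture, but this is an illustration only, not the authors' original two-GPU implementation.

```python
# Rough sketch of an AlexNet-style stack: five conv layers, interleaved
# max-pooling, three fully-connected layers, and a 1000-way output.
import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),                       # 1000-way softmax applied in the loss
)
print(alexnet_like(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```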
Journal Article

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1,000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
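The mechanism this TL;DR describes can be seen in a single recurrence step: multiplicative gates regulate a cell state whose update is additive, which is what allows error to flow nearly unchanged across long time lags. Below is a minimal NumPy sketch of one such step; the weight shapes and toy dimensions are assumptions for illustration only.

```python
# Minimal NumPy sketch of one LSTM step for a single example.
import numpy as np

def lstm_step(x, h, c, W, b):
    """One step; W has shape (4*hidden, input+hidden), b has shape (4*hidden,)."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = 1/(1+np.exp(-i)), 1/(1+np.exp(-f)), 1/(1+np.exp(-o))  # gates in (0, 1)
    g = np.tanh(g)                       # candidate cell update
    c = f * c + i * g                    # additive, carousel-style cell update
    h = o * np.tanh(c)
    return h, c

hidden, inp = 8, 4
W = np.random.randn(4 * hidden, inp + hidden) * 0.1
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
for t in range(5):                       # run a few steps over random inputs
    h, c = lstm_step(np.random.randn(inp), h, c, W, b)
print(h.shape, c.shape)                  # (8,) (8,)
```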
Journal Article

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection spanning hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.