Open Access
Posted Content
One Model To Learn Them All
Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
TL;DR: Tasks with less data benefit greatly from joint training with other tasks, while performance on large tasks degrades only slightly, if at all. Adding a computational block to the model never hurts performance and in most cases improves it on all tasks.
Abstract:
Deep learning yields great results across many fields, from speech recognition and image classification to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.
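The abstract lists "sparsely-gated layers" among the model's building blocks without saying how the gating works. The following is a minimal sketch of one common form of sparse gating, top-k gating over a pool of expert sub-networks, assuming that is roughly what "sparsely-gated" means here. The function names `top_k_gates` and `sparse_moe_layer`, the linear gate, and the toy experts are all hypothetical illustrations; noise, load balancing, batching, and training are omitted.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gates(logits: Sequence[float], k: int) -> List[float]:
    """Softmax over the k largest logits; every other expert gets gate 0."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = [0.0] * len(logits)
    for i, p in zip(top, softmax([logits[i] for i in top])):
        gates[i] = p
    return gates


def sparse_moe_layer(x: Vector, gate_weights: Sequence[Vector],
                     experts: Sequence[Expert], k: int = 2) -> Vector:
    """Gate-weighted sum of the top-k experts' outputs for one input vector.

    Hypothetical linear gate: gate_weights has one row per expert, and the
    gate logit for expert e is the dot product of that row with x.
    Experts are assumed to map a vector to a vector of the same length.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gates(logits, k)
    out = [0.0] * len(x)
    for g, expert in zip(gates, experts):
        if g == 0.0:
            continue  # unselected experts are never evaluated: the "sparse" part
        y = expert(x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Toy experts that scale their input by 1, 2, 3, and 4.
    experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
    W = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(sparse_moe_layer([1.0, 1.0], W, experts, k=2))  # prints [2.5, 2.5]
```

With input [1.0, 1.0] the gate logits are [0, 1, 1, -1], so only experts 1 and 2 run, each with weight 0.5. The point of the design is that per-example compute grows with k, not with the total number of experts.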
Citations
Posted Content
Deep Reinforcement Learning: An Overview
TL;DR: This work discusses core RL elements, including the value function (in particular the Deep Q-Network, DQN), policy, reward, model, planning, and exploration, as well as important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn.
Journal Article
A State-of-the-Art Survey on Deep Learning Theory and Architectures
Zahangir Alom, Tarek M. Taha, Chris Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Mahmudul Hasan, Brian Van Essen, Abdul A. S. Awwal, Vijayan K. Asari
TL;DR: A brief survey of advances in Deep Learning (DL), starting with the Deep Neural Network and going on to cover the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), and Deep Reinforcement Learning (DRL).
Posted Content
CTRL: A Conditional Transformer Language Model for Controllable Generation
TL;DR: This work releases CTRL, a 1.63 billion-parameter conditional transformer language model trained to condition on control codes that govern style, content, and task-specific behavior, providing more explicit control over text generation.
Journal Article
A Survey on Deep Learning: Algorithms, Techniques, and Applications
Samira Pouyanfar, Saad Sadiq, Yilin Yan, Haiman Tian, Yudong Tao, Maria Presa Reyes, Mei-Ling Shyu, Shu-Ching Chen, S. Sitharama Iyengar
TL;DR: A comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing, social network analysis, and natural language processing, followed by an in-depth analysis of pivotal and groundbreaking advances in deep learning applications.
Posted Content
The Natural Language Decathlon: Multitask Learning as Question Answering
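TL;DR: The Natural Language Decathlon (decaNLP) is a benchmark spanning ten NLP tasks, all cast as question answering over a context; the paper also introduces the Multitask Question Answering Network (MQAN) to learn them jointly.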
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A large, deep convolutional neural network is trained to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 classes; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Journal Article
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Posted Content
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.