Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
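As an illustration of the text-to-text format described in the abstract, the sketch below runs several tasks through a single model by prepending a short task prefix to the input. It assumes the Hugging Face transformers library and the public "t5-small" checkpoint; the prefixes follow the convention described in the paper, but the snippet is an illustrative sketch, not code from the authors' release.

```python
# Minimal sketch of the text-to-text framing: every task is "text in, text out",
# and a short prefix tells the model which task to perform.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

tasks = [
    "translate English to German: The house is wonderful.",      # translation
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",   # summarization
    "cola sentence: The course is jumping well.",                 # acceptability classification
]

for text in tasks:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because every task shares the same input and output format, the same architecture, loss, and decoding procedure can be reused across translation, summarization, and classification without task-specific heads.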
Citations
Proceedings Article
Improving Multilingual Models with Language-Clustered Vocabularies
TL;DR: This work introduces a novel procedure for multilingual vocabulary generation that combines the separately trained vocabularies of several automatically derived language clusters, thus balancing the trade-off between cross-lingual subword sharing and language-specific vocabularies.
Proceedings Article
Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems
Andrea Madotto, Samuel Cahyawijaya, Genta Indra Winata, Yan Xu, Zihan Liu, Zhaojiang Lin, Pascale Fung
TL;DR: This paper proposes a method to embed a knowledge base (KB) of any size directly into the model parameters; the resulting model requires neither dialogue state tracking nor template responses, does not take the KB as input, and can dynamically update its KB via fine-tuning.
Posted Content
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing.
Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rihawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, Burkhard Rost
TL;DR: ProtTrans trained auto-regressive and auto-encoder transformer language models on large protein sequence databases using self-supervised learning and high-performance computing, and showed that the learned embeddings capture biophysical properties of proteins and transfer to downstream protein prediction tasks.
Proceedings Article
Exploring and Predicting Transferability across NLP Tasks
Tu Vu, Tong Wang, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler, Andrew Mattarella-Micke, Subhransu Maji, Mohit Iyyer
TL;DR: The results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task.
Posted Content
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao, Li Dong, Furu Wei
TL;DR: This paper proposed a self-supervised vision representation model called BEiT, which stands for Bidirectional Encoder representation from Image Transformers, in which each image has two views during pre-training: image patches (such as 16x16 pixels) and visual tokens (i.e., discrete tokens).
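To make the "image patch" view concrete, here is a small, hypothetical illustration (not code from the cited paper) that splits an image into non-overlapping 16x16 patches using PyTorch; the visual-token view, which comes from a separate discrete tokenizer, is not shown.

```python
import torch

# Illustrative only: cut a 224x224 RGB image into non-overlapping 16x16 patches,
# the "image patch" view described in the TL;DR above.
image = torch.randn(3, 224, 224)              # (channels, height, width)
patch = 16
patches = image.unfold(1, patch, patch).unfold(2, patch, patch)           # (3, 14, 14, 16, 16)
patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, 3 * patch * patch)   # (196, 768)
print(patches.shape)  # 196 patches, each flattened to a 768-dimensional vector
```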