Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
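To make the text-to-text framing concrete, below is a minimal sketch that casts two tasks as prefixed input strings and decodes the model's text output. The checkpoint name ("t5-small") and the Hugging Face transformers API are assumptions made for illustration; the paper itself only states that the data set, models, and code are released.

```python
# Minimal sketch of the text-to-text framing: every task becomes
# "task prefix + input text" -> "target text".
# The "t5-small" checkpoint and the `transformers` API are assumptions.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    # Translation: the prefix names the language pair.
    "translate English to German: The house is wonderful.",
    # Summarization: the prefix asks for a summary of the input text.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because both inputs and targets are plain strings, the same model, loss, and decoding procedure serve translation, summarization, classification, and question answering alike.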
Citations
Proceedings Article
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
TL;DR: This paper proposes Optimus, the first large-scale language VAE model: a universal latent embedding space for sentences that is first pre-trained on a large text corpus and then fine-tuned for various language generation and understanding tasks.
Proceedings Article
AdapterHub: A Framework for Adapting Transformers
Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych
TL;DR: In this paper, the authors propose a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages, enabling quick and easy adaptation of state-of-the-art pre-trained models across tasks.
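As a rough sketch of the adapter idea behind this work, the module below shows the bottleneck layer (down-projection, nonlinearity, up-projection, residual connection) that adapter methods insert into each Transformer layer while the pre-trained weights stay frozen. The class name, hidden size, and reduction factor are illustrative assumptions, not AdapterHub's actual API.

```python
# Illustrative bottleneck adapter; names and sizes are assumptions
# for the sketch, not AdapterHub's API.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int = 768, reduction_factor: int = 16):
        super().__init__()
        bottleneck = hidden_size // reduction_factor
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the frozen layer's output;
        # only the small down/up projections are trained per task.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


# Usage: apply the adapter to a Transformer layer's output during fine-tuning.
adapter = BottleneckAdapter()
layer_output = torch.randn(2, 10, 768)  # (batch, sequence, hidden)
adapted = adapter(layer_output)
```

Because each adapter is only a few hundred thousand parameters, many task- or language-specific adapters can be stored and swapped in and out of a single frozen backbone.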
Proceedings Article
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab
TL;DR: The authors propose FlauBERT, a model learned on a very large and heterogeneous French corpus, apply it to various NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation), and show that it outperforms other pre-training approaches most of the time.
Posted Content
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He
TL;DR: ZeRO-Offload democratizes large-scale model training, making it accessible even to data scientists with access to just a single GPU, and combines compute and memory efficiency with ease of use.
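For context, the snippet below sketches the kind of DeepSpeed configuration that enables ZeRO-style optimizer-state offloading to CPU memory, the mechanism ZeRO-Offload builds on. The key names follow the public DeepSpeed documentation and are assumptions here, not details taken from the summary above.

```python
# Sketch of a DeepSpeed configuration enabling optimizer-state offload to CPU.
# Key names follow public DeepSpeed documentation; values are illustrative.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                  # partition optimizer states and gradients
        "offload_optimizer": {
            "device": "cpu",         # keep optimizer states in host memory
            "pin_memory": True,
        },
    },
}

# This dictionary would typically be passed to deepspeed.initialize(...)
# along with the model when launching billion-parameter training on one GPU.
```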
Journal Article
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett
TL;DR: This systematic study identifies the state of the art in compression for each part of BERT, clarifies current best practices for compressing large-scale Transformer models, and provides insights into the inner workings of various methods.