Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
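The text-to-text framing means one model, one loss, and one decoding procedure serve every task. A minimal sketch of the idea using the released T5 checkpoints through the Hugging Face transformers library (the library choice and the "t5-small" size are assumptions here; the task prefixes follow the paper):

```python
# Minimal sketch of the text-to-text framing: every task is "input text ->
# output text", distinguished only by a task prefix on the input.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",
    # Classification: the model emits the label as a literal string,
    # e.g. "acceptable" / "unacceptable" for CoLA grammaticality.
    "cola sentence: The books was on the table.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
]
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because labels are decoded as strings, a single sequence-to-sequence model covers translation, classification, and summarization without task-specific heads.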
Citations
Posted Content
Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth A. Barnes, Ariel Herbert-Voss, William H. Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew M. Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Samuel McCandlish, Ilya Sutskever, Wojciech Zaremba
TL;DR: This paper introduces Codex, a GPT language model fine-tuned on publicly available code from GitHub, and studies its Python code-writing capabilities, showing that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts.
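The repeated-sampling strategy amounts to drawing many candidate programs for a prompt and keeping any that pass the prompt's unit tests. A minimal sketch follows; `sample_candidates` is a hypothetical stand-in for whatever code model you query, and generated code should of course only be executed in a sandbox:

```python
# Sketch of repeated sampling: draw k candidate programs, return the first
# one that passes the unit tests (the basis of pass@k-style evaluation).
from typing import Callable, Iterable, Optional

def passes_tests(program: str, tests: str) -> bool:
    """Run the candidate and its tests in a throwaway namespace."""
    namespace: dict = {}
    try:
        exec(program, namespace)   # caution: only run trusted/sandboxed code
        exec(tests, namespace)
        return True
    except Exception:
        return False

def first_working_solution(
    sample_candidates: Callable[[str, int], Iterable[str]],
    prompt: str,
    tests: str,
    k: int = 100,
) -> Optional[str]:
    """Sample k completions and return the first that passes the tests."""
    for candidate in sample_candidates(prompt, k):
        if passes_tests(candidate, tests):
            return candidate
    return None  # all k samples failed

# Toy usage with a fake "model" that returns canned candidates:
fake = lambda prompt, k: ["def add(a, b): return a - b",
                          "def add(a, b): return a + b"][:k]
print(first_working_solution(fake, "def add(a, b):", "assert add(2, 3) == 5"))
```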
Proceedings Article
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation
TL;DR: This paper presents Data Boost, a powerful and easy-to-deploy text augmentation framework that augments data through reinforcement-learning-guided conditional generation, evaluated on three diverse text classification tasks under five different classifier architectures.
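The paper's policy-gradient guidance is more involved, but the basic generate-then-reward loop can be approximated with off-the-shelf pieces. The sketch below uses GPT-2 for generation and a sentiment classifier as a crude reward filter; this is an illustrative simplification under my own assumptions, not Data Boost's actual RL update:

```python
# Rough approximation of reward-guided augmentation: generate candidates
# from a seed text, keep only those the classifier scores confidently for
# the target label (filtering as a stand-in for policy-gradient guidance).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
classifier = pipeline("sentiment-analysis")

def augment(seed_text: str, label: str, n: int = 8, threshold: float = 0.9):
    """Return generated variants the classifier agrees are `label`."""
    outputs = generator(seed_text, max_new_tokens=30,
                        num_return_sequences=n, do_sample=True)
    kept = []
    for out in outputs:
        text = out["generated_text"]
        pred = classifier(text)[0]
        if pred["label"] == label and pred["score"] >= threshold:
            kept.append(text)
    return kept

print(augment("The movie was absolutely", "POSITIVE"))
```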
Posted Content
When Do You Need Billions of Words of Pretraining Data?
TL;DR: While the ability to encode linguistic features is almost certainly necessary for language understanding, it is likely that other, as-yet-unidentified forms of knowledge are the major drivers of recent improvements in language understanding among large pretrained models.
Posted Content
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
TL;DR: This paper proposes CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed by developer-assigned identifiers; its unified framework seamlessly supports both code understanding and generation tasks and allows for multi-task learning.
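The identifier-aware objective can be illustrated without the model itself: replace developer-assigned identifiers with sentinel tokens and train a seq2seq model to emit them back. The sketch below uses a naive regex tokenizer and T5-style sentinels as assumptions; CodeT5's actual preprocessing differs:

```python
# Sketch of identifier masking: build an (input, target) training pair where
# identifiers become sentinel tokens and the target spells out each mapping.
import re

def mask_identifiers(code: str):
    """Build an (input, target) pair by masking each unique identifier."""
    keywords = {"def", "return", "if", "else", "for", "while", "in"}
    seen: dict = {}

    def repl(match):
        name = match.group(0)
        if name in keywords:           # keep language keywords visible
            return name
        if name not in seen:
            seen[name] = f"<extra_id_{len(seen)}>"
        return seen[name]

    masked = re.sub(r"[A-Za-z_]\w*", repl, code)
    target = " ".join(f"{tok} {name}" for name, tok in seen.items())
    return masked, target

inp, tgt = mask_identifiers("def area(radius): return 3.14159 * radius * radius")
print(inp)  # def <extra_id_0>(<extra_id_1>): return 3.14159 * <extra_id_1> * <extra_id_1>
print(tgt)  # <extra_id_0> area <extra_id_1> radius
```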
Proceedings Article
Few-Shot Conversational Dense Retrieval
TL;DR: This paper proposes ConvDR, which employs an ad hoc dense retriever as the teacher, inherits its document encodings, and learns a student query encoder to mimic the teacher's embeddings on oracle reformulated queries.
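The distillation objective itself is simple: freeze the teacher, feed it the oracle (manually rewritten) query, and minimize the distance between its embedding and the student's embedding of the raw conversational query. A toy PyTorch sketch, with placeholder encoders rather than ConvDR's actual ANCE-based ones:

```python
# Toy embedding-mimicking distillation: the student query encoder learns to
# match the frozen teacher's embedding of the oracle reformulated query.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Placeholder dense encoder: embed tokens, mean-pool to one vector."""
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, token_ids):            # (batch, seq) -> (batch, dim)
        return self.emb(token_ids).mean(dim=1)

teacher, student = TinyEncoder(), TinyEncoder()
teacher.requires_grad_(False)                # teacher stays frozen

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()

# Toy batch: the student sees the raw conversational query, the teacher sees
# the oracle reformulation; both are fake token ids here.
conv_query = torch.randint(0, 1000, (4, 16))
oracle_query = torch.randint(0, 1000, (4, 16))

for step in range(3):
    loss = mse(student(conv_query), teacher(oracle_query))
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: distillation loss = {loss.item():.4f}")
```

Because the student inherits the teacher's document encodings, only the query side is trained, which is what makes the approach practical in the few-shot conversational setting.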