Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
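The unified text-to-text idea from the abstract can be illustrated with a short sketch. This is not the authors' released code; it simply shows how heterogeneous tasks are cast into a single string-in, string-out interface by prepending task prefixes. The prefixes follow the convention used in the T5 paper ("translate English to German:", "summarize:", "cola sentence:"); the helper function name is hypothetical.

```python
# Sketch: casting different NLP tasks into one text-to-text format.
# Every task becomes plain-text input -> plain-text output, so a single
# encoder-decoder model can handle all of them.

def to_text_to_text(task: str, text: str) -> str:
    """Prepend a task-specific prefix to the raw input text."""
    prefixes = {
        "translation": "translate English to German: ",
        "summarization": "summarize: ",
        "classification": "cola sentence: ",  # CoLA acceptability judgments
    }
    return prefixes[task] + text

# All tasks now share one interface: strings in, strings out.
print(to_text_to_text("translation", "That is good."))
# translate English to German: That is good.
```

Because targets are also plain text (a translated sentence, a summary, or a label word such as "acceptable"), classification and generation tasks can share the same training objective and decoding procedure.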
Citations
Posted Content
Exploring the Limits of Out-of-Distribution Detection
TL;DR: In this article, a few-shot outlier exposure setting, where a few examples from outlier classes may be available, was explored, and large-scale pre-trained transformers were used to improve the state of the art on a range of near-OOD tasks across different data modalities.
Posted Content
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal V. Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Tom Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Févry, Jason A. Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers, Thomas Wolf, Alexander M. Rush
TL;DR: This article developed a system for easily mapping general natural language tasks into a human-readable prompted form, and fine-tuned a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
Proceedings ArticleDOI
TeaForN: Teacher-Forcing with N-grams
TL;DR: Teacher-Forcing with N-grams (TeaForN) as discussed by the authors uses a stack of N decoders trained to decode along a secondary time axis that allows model-parameter updates based on N prediction steps.
Proceedings Article
A Cascade Approach to Neural Abstractive Summarization with Content Selection and Fusion.
TL;DR: The authors presented an empirical study in favor of a cascade approach to neural text summarization, showing that a cascaded pipeline that separately identifies important content pieces and stitches them together into a coherent text performs comparably to, or outranks, end-to-end systems.
Posted Content
Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering.
TL;DR: In this paper, a visual retriever retrieves relevant knowledge and a visual reader predicts answers based on that knowledge. The authors introduce various ways to retrieve knowledge using text and images, and two reader styles: classification and extraction.