Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
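The unified text-to-text idea from the abstract can be illustrated with a short sketch. This is not the authors' released code; it simply shows how heterogeneous tasks are cast into a single string-in, string-out interface by prepending task prefixes. The prefixes follow the convention used in the T5 paper ("translate English to German:", "summarize:", "cola sentence:"); the helper function name is hypothetical.

```python
# Sketch: casting different NLP tasks into one text-to-text format.
# Every task becomes plain-text input -> plain-text output, so a single
# encoder-decoder model can handle all of them.

def to_text_to_text(task: str, text: str) -> str:
    """Prepend a task-specific prefix to the raw input text."""
    prefixes = {
        "translation": "translate English to German: ",
        "summarization": "summarize: ",
        "classification": "cola sentence: ",  # CoLA acceptability judgments
    }
    return prefixes[task] + text

# All tasks now share one interface: strings in, strings out.
print(to_text_to_text("translation", "That is good."))
# translate English to German: That is good.
```

Because targets are also plain text (a translated sentence, a summary, or a label word such as "acceptable"), classification and generation tasks can share the same training objective and decoding procedure.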
Citations
Posted Content
Exploring the Limits of Out-of-Distribution Detection
TL;DR: In this article, a few-shot outlier exposure setting, where a few examples from outlier classes may be available, was explored, and large-scale pre-trained transformers were used to improve the state of the art on a range of near-OOD tasks across different data modalities.
Posted Content
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal V. Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Tom Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Févry, Jason A. Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers, Thomas Wolf, Alexander M. Rush
TL;DR: This article developed a system for easily mapping general natural language tasks into a human-readable prompted form, and fine-tuned a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
Proceedings ArticleDOI
TeaForN: Teacher-Forcing with N-grams
TL;DR: Teacher-Forcing with N-grams (TeaForN) as discussed by the authors uses a stack of N decoders trained to decode along a secondary time axis that allows model-parameter updates based on N prediction steps.
Proceedings Article
A Cascade Approach to Neural Abstractive Summarization with Content Selection and Fusion.
TL;DR: The authors presented an empirical study in favor of a cascade approach to neural text summarization, showing that a cascaded pipeline that separately identifies important content pieces and stitches them together into a coherent text performs comparably to, or outranks, end-to-end systems.
Posted Content
Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering.
TL;DR: In this paper, a visual retriever retrieves relevant knowledge and a visual reader predicts answers based on that knowledge. The authors introduce various ways to retrieve knowledge using text and images, and two reader styles: classification and extraction.