Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
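The text-to-text framing means every task is expressed as feeding the model a text string and training it to produce a text string. A minimal sketch of that casting is shown below; the task prefixes mirror those used in the T5 paper (e.g. "translate English to German:", "summarize:", "cola sentence:"), but the helper function and example records are illustrative, not the released preprocessing code.

```python
# Minimal sketch of casting different tasks into the unified text-to-text format.
# Task prefixes follow the T5 paper; the helper and examples are illustrative only.

def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Cast a task-specific example into an (input_text, target_text) string pair."""
    if task == "translation_en_de":
        return ("translate English to German: " + example["en"], example["de"])
    if task == "summarization":
        return ("summarize: " + example["document"], example["summary"])
    if task == "cola":
        # Classification targets are literal label words, not class indices.
        return ("cola sentence: " + example["sentence"],
                "acceptable" if example["label"] == 1 else "unacceptable")
    raise ValueError(f"unknown task: {task}")

print(to_text_to_text("summarization",
                      {"document": "The quick brown fox jumped over the lazy dog.",
                       "summary": "A fox jumped."}))
```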
Citations
Posted Content
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs.
Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi
TL;DR: The authors argue that manually constructed commonsense knowledge graphs (CSKGs) will never achieve the coverage necessary to handle all situations encountered by NLP agents, and propose a new evaluation framework that tests the utility of a KG by how effectively implicit knowledge representations can be learned from it.
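That evaluation hinges on rendering KG triples as text so a language model can be trained to complete them. The sketch below illustrates one such rendering; the relation verbalizations and the example triple are assumptions for illustration, not the COMET-ATOMIC 2020 release.

```python
# Illustrative sketch: turn commonsense KG triples into (prompt, target) text pairs
# for language-model fine-tuning. Verbalizations and the triple are assumed, not
# taken from the COMET-ATOMIC 2020 resources.

RELATION_TEMPLATES = {
    "xIntent": "because they wanted",
    "xEffect": "and as a result,",
}

def triple_to_training_pair(head: str, relation: str, tail: str) -> tuple[str, str]:
    """Render (head, relation, tail) as a text prompt and its target completion."""
    prompt = f"{head} {RELATION_TEMPLATES[relation]}"
    return prompt, tail

print(triple_to_training_pair("PersonX goes to the store", "xIntent", "to buy food"))
```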
Posted Content
ReZero is All You Need: Fast Convergence at Large Depth.
Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, Julian McAuley
TL;DR: This work shows that gating each residual connection with a single zero-initialized parameter, the simplest possible architectural change, satisfies initial dynamical isometry and outperforms more complex approaches; applied to language modeling, it can easily train 120-layer Transformers.
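The residual gating the TL;DR describes amounts to computing x + alpha * F(x) with alpha initialized to zero, so each block starts as the identity. A minimal PyTorch sketch follows; the small MLP sublayer is an arbitrary stand-in, not the architecture from the paper.

```python
# Minimal PyTorch sketch of the ReZero idea: scale each residual branch by a
# single learnable parameter initialized to zero. The MLP sublayer is a stand-in.

import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.sublayer = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Single zero-initialized gate; at initialization the block is the identity.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.sublayer(x)

x = torch.randn(8, 64)
print(ReZeroBlock(64)(x).shape)  # torch.Size([8, 64])
```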
Proceedings ArticleDOI
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li, Percy Liang
TL;DR: The authors propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks that keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, called the prefix.
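A simplified PyTorch sketch of that idea is below: the pretrained model is frozen and only a short sequence of continuous vectors is trained. For brevity the sketch prepends the trainable vectors at the embedding layer only, whereas Li and Liang's method injects prefix activations at every Transformer layer; the tiny encoder stands in for a real pretrained language model.

```python
# Simplified sketch of prefix-tuning: freeze the base model, train only a short
# prefix of continuous vectors prepended to the input (embedding-level only here).

import torch
import torch.nn as nn

class PrefixTuned(nn.Module):
    def __init__(self, base: nn.Module, dim: int, prefix_len: int = 10):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # language model stays frozen
            p.requires_grad = False
        # The prefix is the only trainable set of parameters.
        self.prefix = nn.Parameter(torch.randn(prefix_len, dim) * 0.02)

    def forward(self, embeds: torch.Tensor) -> torch.Tensor:
        # embeds: (batch, seq, dim); prepend the shared prefix to every example.
        prefix = self.prefix.unsqueeze(0).expand(embeds.size(0), -1, -1)
        return self.base(torch.cat([prefix, embeds], dim=1))

base = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
model = PrefixTuned(base, dim=64)
print(model(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 26, 64])
```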
Proceedings ArticleDOI
Document Ranking with a Pretrained Sequence-to-Sequence Model
TL;DR: Surprisingly, the choice of target tokens impacts effectiveness even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation is effective for document ranking.
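The target tokens the TL;DR refers to are the words whose generation probability the sequence-to-sequence model uses as a relevance score for a query-document pair. A hedged sketch of that scoring scheme is below; the prompt template, the "true"/"false" target words, and the use of t5-small are assumptions for illustration, not the paper's released code.

```python
# Sketch of target-token scoring for document ranking with a seq2seq model:
# score a query-document pair by the decoder's probability of a chosen target word.
# Prompt template, target words, and model choice are illustrative assumptions.

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

def relevance_score(query: str, doc: str,
                    pos_word: str = "true", neg_word: str = "false") -> float:
    inputs = tokenizer(f"Query: {query} Document: {doc} Relevant:", return_tensors="pt")
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, 0]
    pos_id = tokenizer(pos_word, add_special_tokens=False).input_ids[0]
    neg_id = tokenizer(neg_word, add_special_tokens=False).input_ids[0]
    # Softmax over just the two candidate target tokens; return P(positive word).
    probs = torch.softmax(logits[[pos_id, neg_id]], dim=-1)
    return probs[0].item()

print(relevance_score("what causes rain", "Rain forms when water vapor condenses."))
```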
Proceedings ArticleDOI
Reformulating Unsupervised Style Transfer as Paraphrase Generation.
TL;DR: This paper reformulates unsupervised style transfer as a paraphrase generation problem, and presents a simple methodology based on fine-tuning pretrained language models on automatically generated paraphrase data that significantly outperforms state-of-the-art style transfer systems on both human and automatic evaluations.
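The "automatically generated paraphrase data" step can be sketched as follows, under the assumption that stylized sentences are first paraphrased into a more neutral form and a model is then fine-tuned to invert that mapping; the neutral_paraphrase function is a placeholder for a real pretrained paraphrase generator.

```python
# Sketch of building pseudo-parallel data for style transfer via paraphrasing.
# neutral_paraphrase is a placeholder; a real system would call a pretrained
# paraphrase model here.

def neutral_paraphrase(sentence: str) -> str:
    """Placeholder for a paraphrase generator that strips stylistic markers."""
    return sentence.lower()

def build_pseudo_parallel_data(styled_corpus: list[str]) -> list[tuple[str, str]]:
    # Each pair trains an "inverse paraphraser": neutral input -> styled output.
    return [(neutral_paraphrase(s), s) for s in styled_corpus]

print(build_pseudo_parallel_data(["Thou art as wise as thou art beautiful."]))
```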