Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
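The text-to-text framework described in the abstract casts every task as mapping an input string to an output string, typically distinguished by a task prefix. A minimal sketch of this idea follows; the task names, field names, and the "sst2" prefix are illustrative assumptions, though the "translate English to German:" and "summarize:" prefixes match the style the paper uses.

```python
def to_text_to_text(task: str, **fields) -> dict:
    """Convert one task instance into an (input, target) text pair."""
    if task == "translate":
        return {"input": f"translate English to German: {fields['sentence']}",
                "target": fields["translation"]}
    if task == "summarize":
        return {"input": f"summarize: {fields['document']}",
                "target": fields["summary"]}
    if task == "classify":  # classification labels are emitted as literal text
        return {"input": f"sst2 sentence: {fields['sentence']}",
                "target": fields["label"]}
    raise ValueError(f"unknown task: {task}")

pair = to_text_to_text("translate",
                       sentence="That is good.",
                       translation="Das ist gut.")
```

Because every task shares this string-in, string-out interface, a single model with a single loss can be pre-trained and fine-tuned across all of them.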
Citations
Proceedings Article
FewshotQA: A simple framework for few-shot learning of question answering tasks using pre-trained text-to-text models.
Rakesh Chada, Pradeep Natarajan
TL;DR: This article proposes a simple fine-tuning framework that leverages pre-trained text-to-text models and is directly aligned with their pre-training objective: the input is constructed as a concatenation of the question, a mask token representing the answer span, and the context.
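The input construction described in this TL;DR can be sketched in a few lines. The `<mask>` token string and the space separator below are assumptions for illustration; the actual mask token depends on the pre-trained model's vocabulary.

```python
def build_fewshot_qa_input(question: str, context: str,
                           mask_token: str = "<mask>") -> str:
    """Concatenate question, a mask token for the answer span, and context."""
    return f"{question} {mask_token} {context}"

inp = build_fewshot_qa_input(
    "Who wrote Hamlet?",
    "Hamlet is a tragedy written by William Shakespeare.")
```

The model is then fine-tuned to generate the answer span in place of the mask, mirroring its span-infilling pre-training objective.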
Proceedings Article
MS^2: Multi-Document Summarization of Medical Studies.
TL;DR: MS^2 is a dataset of over 470k documents and 20k summaries derived from the scientific literature. It facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain.
Posted Content
Good-Enough Example Extrapolation
TL;DR: This article proposed a simple data augmentation protocol called "good-enough example extrapolation" (GE3), which extrapolates the hidden space distribution of text examples from one class onto another.
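One plausible reading of the extrapolation described in this TL;DR is a centroid shift in hidden space: move each source-class representation by the difference between the target and source class means. This is a hedged sketch under that assumption, not the paper's exact procedure.

```python
import numpy as np

def extrapolate(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Shift source-class hidden vectors toward the target-class distribution.

    Each row is one example's hidden representation; the shift is the
    difference between the two class centroids (an illustrative choice).
    """
    shift = target.mean(axis=0) - source.mean(axis=0)
    return source + shift

src = np.array([[0.0, 0.0], [2.0, 0.0]])
tgt = np.array([[10.0, 5.0], [12.0, 5.0]])
synthetic = extrapolate(src, tgt)  # synthetic examples for the target class
```

The shifted vectors preserve the source class's spread while matching the target class's mean, which is what makes them usable as augmented training examples.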
Proceedings Article
How much pretraining data do language models need to learn syntax?
TL;DR: The authors explored the impact of pretraining data size on the linguistic knowledge acquired by models, comparing their performance on three downstream applications: part-of-speech tagging, dependency parsing, and paraphrase identification.
Posted Content
Darmok and Jalad at Tanagra: A Dataset and Model for English-to-Tamarian Translation.
TL;DR: This article constructed a parallel corpus of 456 English-Tamarian utterances from the original Star Trek episode Darmok and several follow-on novels, and trained a machine translation system based on a large language model (T5) to translate from English to Tamarian.