Open Access · Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
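The text-to-text format described above can be illustrated with a small sketch: each task is serialized into a single input string by prepending a task-specific prefix, so one encoder-decoder model with one loss can handle translation, classification, and summarization alike. The prefixes below mirror examples given in the paper; the helper function and task names are a hypothetical illustration, not part of the released codebase.

```python
def to_text_to_text(task: str, **fields: str) -> str:
    """Serialize a task instance into a single text input.

    The model is then trained to emit the target as plain text
    (e.g. a German sentence, the label word "acceptable", or a summary).
    """
    if task == "translate_en_de":
        return f"translate English to German: {fields['sentence']}"
    if task == "cola":  # acceptability judgment; target is "acceptable"/"unacceptable"
        return f"cola sentence: {fields['sentence']}"
    if task == "summarize":
        return f"summarize: {fields['document']}"
    raise ValueError(f"unknown task: {task}")


print(to_text_to_text("translate_en_de", sentence="That is good."))
# translate English to German: That is good.
```

Because every task shares this single string-to-string interface, the same pre-trained checkpoint can be fine-tuned on any of them without task-specific output layers.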
Citations
Book Chapter · DOI
Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models
TL;DR: This article proposed a three-step training procedure to improve explanation quality by up to 7% and avoid sacrificing classification performance on the downstream task, while at the same time reducing the need for human annotations.
Proceedings Article · DOI
Shades of BLEU, Flavours of Success: The Case of MultiWOZ
Tomáš Nekvinda, Ondřej Dušek
TL;DR: In this article, the authors identify inconsistencies in data preprocessing and reporting of three corpus-based metrics used on the MultiWOZ dataset, i.e., the BLEU score and the Inform & Success rates.
Proceedings Article · DOI
TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning
TL;DR: The authors proposed a token-aware contrastive learning (TaCL) model, which encourages BERT to learn an isotropic and discriminative distribution of token representations for NER tasks.
Posted Content
Prompting Contrastive Explanations for Commonsense Reasoning Tasks.
TL;DR: The authors used pre-trained language models to generate contrastive explanations for commonsense reasoning tasks, which are judged by humans to be more relevant for solving the task and facilitate a novel method to evaluate explanation faithfulness.
Posted Content
Language Models are Few-shot Multilingual Learners
TL;DR: The authors showed that given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, and they are significantly better than random prediction.