Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
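As a concrete illustration of the text-to-text framework described in the abstract, here is a minimal sketch using the Hugging Face `transformers` library (an assumption; the paper releases its own code and pre-trained models separately). The task prefixes ("translate English to German:", "summarize:", "cola sentence:") follow the convention described in the paper: every task is cast as feeding the model input text and training it to generate target text.

```python
# Minimal sketch of the unified text-to-text interface: one model, one
# input-text -> output-text signature for translation, summarization,
# and classification alike. Assumes the Hugging Face `transformers`
# library and the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    # Translation: task prefix plus source sentence.
    "translate English to German: The house is wonderful.",
    # Summarization: prefix plus the document to compress.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    # Classification (CoLA acceptability): the label is also generated as text.
    "cola sentence: The book was read by me quickly happy.",
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because every task shares this interface, the same architecture, loss, and decoding procedure can be reused across tasks, which is what makes the paper's side-by-side comparisons of objectives, architectures, and corpora possible.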



Citations
Proceedings ArticleDOI

Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification

TL;DR: The authors proposed an approach that automatically finds a mapping between words and labels given a small amount of training data, and found that the mappings identified by their approach perform almost as well as hand-crafted label-to-word mappings.
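A rough sketch of the underlying idea (not the cited authors' implementation): score candidate label words with an off-the-shelf masked language model on a cloze-style prompt, and keep, for each class, the word the model prefers on that class's few training examples. The model name, prompt template, and toy data below are illustrative assumptions.

```python
# Hypothetical verbalizer search: pick, per class, the candidate word that a
# masked language model most readily predicts for that class's examples.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def score_word(sentence, candidate):
    """Log-probability of `candidate` filling the [MASK] in a simple prompt."""
    prompt = f"{sentence} It was [MASK]."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    word_id = tokenizer.convert_tokens_to_ids(candidate)
    return torch.log_softmax(logits, dim=-1)[word_id].item()

# Tiny hypothetical few-shot training set: (sentence, class) pairs for sentiment.
train = [("the movie was a delight", 1), ("a dull, lifeless film", 0)]
candidates = ["great", "terrible", "good", "bad"]

for label in {y for _, y in train}:
    sents = [s for s, y in train if y == label]
    best = max(candidates, key=lambda w: sum(score_word(s, w) for s in sents))
    print(f"class {label} -> '{best}'")
```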
Proceedings ArticleDOI

Plug-and-Play Conversational Models.

TL;DR: This article proposed and evaluated plug-and-play methods for controllable response generation, which do not require dialogue-specific datasets and do not rely on fine-tuning a large model.
Posted Content

Measuring Systematic Generalization in Neural Proof Generation with Transformers

TL;DR: It is observed that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs, which suggests that Transformers have efficient internal reasoning strategies that are harder to interpret.
Posted Content

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

TL;DR: In this article, a Video-Audio-Text Transformer (VATT) is proposed to learn multimodal representations from unlabeled data using convolution-free Transformer architectures.
Proceedings ArticleDOI

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

TL;DR: This article proposed a self-supervised approach that generates a large, rich distribution of meta-learning tasks from unlabeled text. This is achieved with a cloze-style objective, but separate multi-class classification tasks are created by drawing the tokens to be blanked from only a handful of vocabulary terms.
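A hypothetical illustration of how such cloze-style classification tasks could be built from unlabeled text (not the cited authors' code): choose a small set of vocabulary terms, blank them out of sentences that contain them, and label each example with the index of the blanked word.

```python
# Illustrative sketch: turn unlabeled sentences into one synthetic
# multi-class classification task by masking a handful of chosen words.
import random

def make_cloze_task(sentences, candidate_words, mask_token="[MASK]"):
    """Return (masked_sentence, label_index) pairs forming one synthetic task."""
    examples = []
    for sent in sentences:
        tokens = sent.split()
        for label, word in enumerate(candidate_words):
            if word in tokens:
                masked = " ".join(mask_token if t == word else t for t in tokens)
                examples.append((masked, label))
    random.shuffle(examples)
    return examples

unlabeled = [
    "the cat sat on the mat",
    "a dog barked at the mailman",
    "the cat chased the dog",
]
task = make_cloze_task(unlabeled, candidate_words=["cat", "dog"])
for masked_sentence, label in task:
    print(label, masked_sentence)
```

Repeating this construction with different word sets yields many distinct classification tasks, which is what provides the large meta-learning task distribution the summary refers to.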
Trending Questions
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.