Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
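As an illustration of the text-to-text format described in the abstract, the sketch below runs several tasks through a single model by prepending a short task prefix to the input. It assumes the Hugging Face transformers library and the public "t5-small" checkpoint; the prefixes follow the convention described in the paper, but the snippet is an illustrative sketch, not code from the authors' release.

```python
# Minimal sketch of the text-to-text framing: every task is "text in, text out",
# and a short prefix tells the model which task to perform.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

tasks = [
    "translate English to German: The house is wonderful.",      # translation
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",   # summarization
    "cola sentence: The course is jumping well.",                 # acceptability classification
]

for text in tasks:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because every task shares the same input and output format, the same architecture, loss, and decoding procedure can be reused across translation, summarization, and classification without task-specific heads.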
Citations
Proceedings Article
Improving Multilingual Models with Language-Clustered Vocabularies
TL;DR: This work introduces a novel procedure for multilingual vocabulary generation that combines the separately trained vocabularies of several automatically derived language clusters, thus balancing the trade-off between cross-lingual subword sharing and language-specific vocabularies.
Proceedings Article
Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems
Andrea Madotto, Samuel Cahyawijaya, Genta Indra Winata, Yan Xu, Zihan Liu, Zhaojiang Lin, Pascale Fung
TL;DR: This paper proposes a method to embed a knowledge base (KB) of any size directly into the model parameters; the resulting model requires neither dialogue state tracking nor template responses, does not take the KB as input, and can dynamically update its KB via fine-tuning.
Posted Content
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing.
Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rihawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, Burkhard Rost
TL;DR: ProtTrans trained auto-regressive and auto-encoder transformer language models on large protein sequence databases using self-supervised learning and high-performance computing, and showed that the learned embeddings capture biophysical properties of proteins and transfer to downstream protein prediction tasks.
Proceedings Article
Exploring and Predicting Transferability across NLP Tasks
Tu Vu, Tong Wang, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler, Andrew Mattarella-Micke, Subhransu Maji, Mohit Iyyer
TL;DR: The results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task.
Posted Content
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao, Li Dong, Furu Wei
TL;DR: This paper proposed a self-supervised vision representation model called BEiT, which stands for Bidirectional Encoder representation from Image Transformers, in which each image has two views during pre-training: image patches (such as 16x16 pixels) and visual tokens (i.e., discrete tokens).
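To make the "image patch" view concrete, here is a small, hypothetical illustration (not code from the cited paper) that splits an image into non-overlapping 16x16 patches using PyTorch; the visual-token view, which comes from a separate discrete tokenizer, is not shown.

```python
import torch

# Illustrative only: cut a 224x224 RGB image into non-overlapping 16x16 patches,
# the "image patch" view described in the TL;DR above.
image = torch.randn(3, 224, 224)              # (channels, height, width)
patch = 16
patches = image.unfold(1, patch, patch).unfold(2, patch, patch)           # (3, 14, 14, 16, 16)
patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, 3 * patch * patch)   # (196, 768)
print(patches.shape)  # 196 patches, each flattened to a 768-dimensional vector
```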