Open Access · Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
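The text-to-text framing means every task (translation, summarization, classification, question answering) is handled through one interface: a task-prefixed string goes in, and a string comes out. Below is a minimal sketch of that interface, assuming the publicly released T5 checkpoints as exposed through the Hugging Face transformers library; the library is not part of this page and is used here only for illustration.

```python
# Minimal sketch of the text-to-text interface, assuming the released T5
# checkpoints loaded via the Hugging Face `transformers` library (an
# assumption beyond this page; requires `transformers` and `sentencepiece`).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Different tasks share the same interface; only the task prefix changes.
prompts = [
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "translate English to German: The house is wonderful.",
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because inputs and outputs are always plain strings, the same model, training objective, and decoding procedure serve every task.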



Citations
Posted Content

Whale: A Unified Distributed Training Framework.

TL;DR: Whale is the first framework to support various hybrid distributed strategies within a single system; it is compatible with TensorFlow and can distribute training tasks by adding a few lines of code, without changing the user's model code.
Proceedings Article

FedNLP: An interpretable NLP System to Decode Federal Reserve Communications

TL;DR: FedNLP is an interpretable, multi-component Natural Language Processing system for decoding Federal Reserve communications, designed so that end users can explore how NLP techniques support a holistic understanding of the Fed's communications without writing any code.
Proceedings Article

Text Generation by Learning from Demonstrations

TL;DR: GOLD is an easy-to-optimize algorithm that learns from off-policy demonstrations via importance weighting, upweighting confident tokens and downweighting unconfident ones during training.
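To make the importance-weighting idea concrete, here is a rough, hypothetical sketch of a per-token weighted negative log-likelihood in the spirit of the description above; the function name, tensor shapes, and the exact weighting scheme are assumptions for illustration, and the cited paper's estimator (including any clipping or baselines) may differ.

```python
import torch
import torch.nn.functional as F

def importance_weighted_nll(logits, targets, pad_id=0):
    """Hypothetical sketch: per-token NLL re-weighted by the model's own
    (detached) token probability, so confident tokens are upweighted and
    unconfident ones downweighted. Details differ from the cited paper."""
    # logits: (batch, seq_len, vocab); targets: (batch, seq_len)
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    mask = (targets != pad_id).float()
    weights = token_logp.detach().exp()  # importance weights ~ model confidence
    return -(weights * token_logp * mask).sum() / mask.sum().clamp(min=1.0)

# Toy usage with random logits and targets:
logits = torch.randn(2, 5, 100)
targets = torch.randint(1, 100, (2, 5))
print(importance_weighted_nll(logits, targets))
```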
Posted Content

Which transformer architecture fits my data? A vocabulary bottleneck in self-attention

TL;DR: The authors theoretically predict the existence of an embedding rank bottleneck that limits the contribution of self-attention width to Transformer expressivity, and empirically demonstrate this bottleneck and its implications for the depth-to-width interplay of Transformer architectures, linking architectural variability across domains to the often glossed-over use of different vocabulary sizes or embedding ranks.
Posted Content

Update Frequently, Update Fast: Retraining Semantic Parsing Systems in a Fraction of Time

TL;DR: The authors proposed a method that alleviates catastrophic forgetting and showed that, via fine-tuning, it is possible to match the performance of a model trained from scratch in less than 10% of the time.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.