Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
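To make the text-to-text framing concrete, below is a minimal sketch that casts two tasks as prefixed input strings and decodes the model's text output. The checkpoint name ("t5-small") and the Hugging Face transformers API are assumptions made for illustration; the paper itself only states that the data set, models, and code are released.

```python
# Minimal sketch of the text-to-text framing: every task becomes
# "task prefix + input text" -> "target text".
# The "t5-small" checkpoint and the `transformers` API are assumptions.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    # Translation: the prefix names the language pair.
    "translate English to German: The house is wonderful.",
    # Summarization: the prefix asks for a summary of the input text.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because both inputs and targets are plain strings, the same model, loss, and decoding procedure serve translation, summarization, classification, and question answering alike.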
Citations
Proceedings Article
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
TL;DR: This paper proposes Optimus, the first large-scale language VAE model: a universal latent embedding space for sentences that is first pre-trained on a large text corpus and then fine-tuned for various language generation and understanding tasks.
Proceedings Article
AdapterHub: A Framework for Adapting Transformers
Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych
TL;DR: In this paper, the authors propose a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages, enabling quick and easy adaptation of state-of-the-art pre-trained models across tasks.
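As a rough sketch of the adapter idea behind this work, the module below shows the bottleneck layer (down-projection, nonlinearity, up-projection, residual connection) that adapter methods insert into each Transformer layer while the pre-trained weights stay frozen. The class name, hidden size, and reduction factor are illustrative assumptions, not AdapterHub's actual API.

```python
# Illustrative bottleneck adapter; names and sizes are assumptions
# for the sketch, not AdapterHub's API.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int = 768, reduction_factor: int = 16):
        super().__init__()
        bottleneck = hidden_size // reduction_factor
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the frozen layer's output;
        # only the small down/up projections are trained per task.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


# Usage: apply the adapter to a Transformer layer's output during fine-tuning.
adapter = BottleneckAdapter()
layer_output = torch.randn(2, 10, 768)  # (batch, sequence, hidden)
adapted = adapter(layer_output)
```

Because each adapter is only a few hundred thousand parameters, many task- or language-specific adapters can be stored and swapped in and out of a single frozen backbone.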
Proceedings Article
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab
TL;DR: The authors propose FlauBERT, a model learned on a very large and heterogeneous French corpus, apply it to various NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation), and show that it outperforms other pre-training approaches most of the time.
Posted Content
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He
TL;DR: ZeRO-Offload democratizes large-scale model training, making it accessible even to data scientists with access to just a single GPU, and combines compute and memory efficiency with ease of use.
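For context, the snippet below sketches the kind of DeepSpeed configuration that enables ZeRO-style optimizer-state offloading to CPU memory, the mechanism ZeRO-Offload builds on. The key names follow the public DeepSpeed documentation and are assumptions here, not details taken from the summary above.

```python
# Sketch of a DeepSpeed configuration enabling optimizer-state offload to CPU.
# Key names follow public DeepSpeed documentation; values are illustrative.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                  # partition optimizer states and gradients
        "offload_optimizer": {
            "device": "cpu",         # keep optimizer states in host memory
            "pin_memory": True,
        },
    },
}

# This dictionary would typically be passed to deepspeed.initialize(...)
# along with the model when launching billion-parameter training on one GPU.
```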
Journal Article
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett
TL;DR: This systematic study identifies the state of the art in compression for each part of BERT, clarifies current best practices for compressing large-scale Transformer models, and provides insights into the inner workings of various methods.