Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
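As a concrete illustration of the text-to-text framing, the sketch below (not from the paper itself) shows how different tasks become plain-text inputs with a task prefix and are handled by one generation call. It assumes the Hugging Face transformers library and the released t5-small checkpoint.

```python
# Minimal sketch of the text-to-text idea using a released T5 checkpoint via
# the Hugging Face `transformers` library (an assumption; this is not the
# authors' original codebase).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is plain text in, plain text out: a prefix signals the task, so
# translation, summarization, and classification share the same interface.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The course is jumping well.",  # acceptability judged as text
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the outputs are also text, no task-specific heads are needed; a classification label is simply generated as a string.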



Citations
Proceedings Article

Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space

TL;DR: This paper proposes Optimus, the first large-scale language VAE, which learns a universal latent embedding space for sentences that is first pre-trained on a large text corpus and then fine-tuned for various language generation and understanding tasks.
Proceedings Article

AdapterHub: A Framework for Adapting Transformers

TL;DR: The authors propose a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages, enabling quick and easy adaptation of state-of-the-art pre-trained models across tasks.
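For context, the sketch below shows the general shape of adapter-based fine-tuning described in that paper. It assumes the AdapterHub `adapters` package, and the task name "sentiment_task" is a hypothetical placeholder rather than code from the cited work.

```python
# Hedged sketch of adapter-style fine-tuning in the spirit of AdapterHub,
# assuming the `adapters` package; "sentiment_task" is a hypothetical name.
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Add a small bottleneck adapter and a task head; the pre-trained Transformer
# weights stay frozen, and only the adapter parameters are trained.
model.add_adapter("sentiment_task")
model.add_classification_head("sentiment_task", num_labels=2)
model.train_adapter("sentiment_task")        # freezes the base model
model.set_active_adapters("sentiment_task")  # routes the forward pass through the adapter

# ...train with a standard PyTorch/Trainer loop, then share only the adapter:
# model.save_adapter("./sentiment_adapter", "sentiment_task")
```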
Proceedings Article

FlauBERT: Unsupervised Language Model Pre-training for French

TL;DR: The authors proposed FlauBERT, a model trained on a very large and heterogeneous French corpus, applied it to various NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation), and showed that it outperforms other pre-training approaches most of the time.
Posted Content

ZeRO-Offload: Democratizing Billion-Scale Model Training

TL;DR: ZeRO-Offload democratizes large-scale model training, making it accessible even to data scientists with just a single GPU, and combines compute and memory efficiency with ease of use.
Journal Article

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

TL;DR: This systematic study identifies the state of the art in compression for each part of BERT, clarifies current best practices for compressing large-scale Transformer models, and provides insights into the inner workings of various methods.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not explicitly discuss the limitations of transfer learning with a unified text-to-text transformer.