Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
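As a concrete illustration of the text-to-text framework described in the abstract, here is a minimal sketch using the Hugging Face `transformers` library (an assumption; the paper releases its own code and pre-trained models separately). The task prefixes ("translate English to German:", "summarize:", "cola sentence:") follow the convention described in the paper: every task is cast as feeding the model input text and training it to generate target text.

```python
# Minimal sketch of the unified text-to-text interface: one model, one
# input-text -> output-text signature for translation, summarization,
# and classification alike. Assumes the Hugging Face `transformers`
# library and the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    # Translation: task prefix plus source sentence.
    "translate English to German: The house is wonderful.",
    # Summarization: prefix plus the document to compress.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    # Classification (CoLA acceptability): the label is also generated as text.
    "cola sentence: The book was read by me quickly happy.",
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because every task shares this interface, the same architecture, loss, and decoding procedure can be reused across tasks, which is what makes the paper's side-by-side comparisons of objectives, architectures, and corpora possible.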



Citations
Proceedings ArticleDOI

Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification

TL;DR: The authors proposed an approach that automatically finds a mapping between words and labels given a small amount of training data, and found that the mappings identified by their approach perform almost as well as hand-crafted label-to-word mappings.
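A rough sketch of the underlying idea (not the cited authors' implementation): score candidate label words with an off-the-shelf masked language model on a cloze-style prompt, and keep, for each class, the word the model prefers on that class's few training examples. The model name, prompt template, and toy data below are illustrative assumptions.

```python
# Hypothetical verbalizer search: pick, per class, the candidate word that a
# masked language model most readily predicts for that class's examples.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def score_word(sentence, candidate):
    """Log-probability of `candidate` filling the [MASK] in a simple prompt."""
    prompt = f"{sentence} It was [MASK]."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    word_id = tokenizer.convert_tokens_to_ids(candidate)
    return torch.log_softmax(logits, dim=-1)[word_id].item()

# Tiny hypothetical few-shot training set: (sentence, class) pairs for sentiment.
train = [("the movie was a delight", 1), ("a dull, lifeless film", 0)]
candidates = ["great", "terrible", "good", "bad"]

for label in {y for _, y in train}:
    sents = [s for s, y in train if y == label]
    best = max(candidates, key=lambda w: sum(score_word(s, w) for s in sents))
    print(f"class {label} -> '{best}'")
```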
Proceedings ArticleDOI

Plug-and-Play Conversational Models.

TL;DR: This article proposed and evaluated plug-and-play methods for controllable response generation, which do not require dialogue-specific datasets and do not rely on fine-tuning a large model.
Posted Content

Measuring Systematic Generalization in Neural Proof Generation with Transformers

TL;DR: It is observed that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs, which suggests that Transformers have efficient internal reasoning strategies that are harder to interpret.
Posted Content

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

TL;DR: In this article, a Video-Audio-Text Transformer (VATT) is proposed to learn multimodal representations from unlabeled data using convolution-free Transformer architectures.
Proceedings ArticleDOI

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

TL;DR: This article proposed a self-supervised approach that generates a large, rich distribution of meta-learning tasks from unlabeled text. This is achieved with a cloze-style objective, but separate multi-class classification tasks are created by drawing the tokens to be blanked from only a handful of vocabulary terms.
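A hypothetical illustration of how such cloze-style classification tasks could be built from unlabeled text (not the cited authors' code): choose a small set of vocabulary terms, blank them out of sentences that contain them, and label each example with the index of the blanked word.

```python
# Illustrative sketch: turn unlabeled sentences into one synthetic
# multi-class classification task by masking a handful of chosen words.
import random

def make_cloze_task(sentences, candidate_words, mask_token="[MASK]"):
    """Return (masked_sentence, label_index) pairs forming one synthetic task."""
    examples = []
    for sent in sentences:
        tokens = sent.split()
        for label, word in enumerate(candidate_words):
            if word in tokens:
                masked = " ".join(mask_token if t == word else t for t in tokens)
                examples.append((masked, label))
    random.shuffle(examples)
    return examples

unlabeled = [
    "the cat sat on the mat",
    "a dog barked at the mailman",
    "the cat chased the dog",
]
task = make_cloze_task(unlabeled, candidate_words=["cat", "dog"])
for masked_sentence, label in task:
    print(label, masked_sentence)
```

Repeating this construction with different word sets yields many distinct classification tasks, which is what provides the large meta-learning task distribution the summary refers to.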
Trending Questions
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.