Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
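As a concrete illustration of the text-to-text framing, the sketch below runs several different tasks through a single public T5 checkpoint. The Hugging Face transformers library, the "t5-small" checkpoint, and the specific task prefixes are illustrative assumptions, not part of the paper's own released code.

# Minimal sketch: several NLP tasks cast as text-to-text with one model.
# Assumes the Hugging Face `transformers` library and the public "t5-small"
# checkpoint; every task is expressed as "input text -> output text".
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    # Translation: a short prefix tells the model which task to perform.
    "translate English to German: The house is wonderful.",
    # Summarization: same model, same interface, different prefix.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in natural language processing.",
    # Classification (CoLA acceptability) is also emitted as text.
    "cola sentence: The books was on the table.",
]

for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))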


Citations
Posted Content

COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs.

TL;DR: This work argues that manually constructed commonsense knowledge graphs (CSKGs) will never achieve the coverage necessary to be applicable in all situations encountered by NLP agents, and proposes a new evaluation framework that tests the utility of KGs by how effectively implicit knowledge representations can be learned from them.
Posted Content

ReZero is All You Need: Fast Convergence at Large Depth.

TL;DR: This work shows that a simple architecture change, gating each residual connection with a single zero-initialized parameter, satisfies initial dynamical isometry and outperforms more complex approaches; applied to language modeling, it can easily train 120-layer Transformers.
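The sketch below illustrates the architectural change described in that summary: a residual branch scaled by a single learnable parameter initialized to zero, so each block starts as the identity. It is a minimal, hypothetical PyTorch rendition using a feed-forward branch rather than a full Transformer layer, not the authors' implementation.

import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    """Residual block whose branch is gated by a zero-initialized scalar."""

    def __init__(self, dim: int, hidden: int = 2048):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )
        # The key change: one learnable gate per block, initialized to zero,
        # so the block computes the identity at the start of training.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x + alpha * F(x); training gradually "opens" the residual branch.
        return x + self.alpha * self.branch(x)

block = ReZeroBlock(dim=512)
out = block(torch.randn(2, 16, 512))  # identical to the input at init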
Proceedings Article

Prefix-Tuning: Optimizing Continuous Prompts for Generation

TL;DR: The authors propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps the language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors called the prefix.
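A simplified sketch of that idea follows: the pretrained language model is frozen and only a short sequence of continuous prefix vectors is optimized. For brevity the prefix is prepended at the embedding layer of a Hugging Face GPT-2 model; the actual method injects prefix activations into every attention layer, and all names and hyperparameters here are illustrative.

import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False  # the language model stays frozen

prefix_len = 10
embed_dim = model.config.n_embd
# The only trainable parameters: a short sequence of continuous vectors.
prefix = nn.Parameter(torch.randn(prefix_len, embed_dim) * 0.02)

def forward_with_prefix(input_ids: torch.Tensor) -> torch.Tensor:
    token_embeds = model.transformer.wte(input_ids)            # (B, T, D)
    batch = token_embeds.size(0)
    prefix_embeds = prefix.unsqueeze(0).expand(batch, -1, -1)  # (B, P, D)
    inputs_embeds = torch.cat([prefix_embeds, token_embeds], dim=1)
    return model(inputs_embeds=inputs_embeds).logits

input_ids = tokenizer("Summarize the table:", return_tensors="pt").input_ids
logits = forward_with_prefix(input_ids)
# During fine-tuning, only the prefix would be passed to the optimizer.
optimizer = torch.optim.AdamW([prefix], lr=5e-4)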
Proceedings Article

Document Ranking with a Pretrained Sequence-to-Sequence Model

TL;DR: Surprisingly, the choice of target tokens is found to impact effectiveness, even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation is effective for document ranking.
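The general recipe summarized above can be sketched as follows: format the query and document into a prompt and use the model's probability of a chosen target token as the relevance score. The prompt wording, the target words "true"/"false", and the checkpoint are reconstructed assumptions in the spirit of the cited paper, not its exact setup.

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def relevance_score(query: str, document: str) -> float:
    """Score a (query, document) pair by the probability of the word 'true'."""
    prompt = f"Query: {query} Document: {document} Relevant:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Feed only the decoder start token and read the distribution over the
    # first generated position.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(input_ids=input_ids,
                   decoder_input_ids=decoder_input_ids).logits[0, -1]
    true_id = tokenizer.encode("true")[0]    # first sub-word piece of "true"
    false_id = tokenizer.encode("false")[0]  # first sub-word piece of "false"
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()

score = relevance_score(
    "what is transfer learning",
    "Transfer learning pre-trains a model on a data-rich task before "
    "fine-tuning it on a downstream task.",
)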
Proceedings Article

Reformulating Unsupervised Style Transfer as Paraphrase Generation.

TL;DR: This paper reformulates unsupervised style transfer as a paraphrase generation problem and presents a simple methodology based on fine-tuning pretrained language models on automatically generated paraphrase data; the approach significantly outperforms state-of-the-art style transfer systems in both human and automatic evaluations.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.