Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and systematically compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
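
To make the text-to-text format concrete, below is a minimal sketch using the publicly released T5 checkpoints through the Hugging Face transformers library (an assumption about the reader's toolchain, not part of the paper). Every task is expressed as plain text in and plain text out, distinguished only by a task prefix; the input strings are illustrative.

```python
# Minimal sketch of T5's text-to-text interface: the same model and decoding
# loop handle translation and summarization, selected only by a task prefix.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    # Translation: cast as plain text in, plain text out.
    "translate English to German: The house is wonderful.",
    # Summarization: same interface, different task prefix.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```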



Citations
Proceedings Article

Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

TL;DR: This paper presents Data Boost, a powerful and easy-to-deploy text augmentation framework that augments data through reinforcement learning guided conditional generation, and evaluates it on three diverse text classification tasks under five different classifier architectures.
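
A simplified sketch of class-conditional generation for augmentation in the spirit of Data Boost: it conditions an off-the-shelf GPT-2 on a label prompt and samples continuations, omitting the reinforcement-learning guidance the actual method uses to steer decoding. The prompt format and label names are illustrative assumptions.

```python
# Simplified label-conditioned augmentation loop (no RL guidance): prepend a
# class label as a prompt and sample diverse continuations as new examples.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def augment(label, seed_text, num_samples=3):
    """Sample label-conditioned continuations to use as augmented examples."""
    prompt = f"{label}: {seed_text}"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(
        input_ids,
        do_sample=True,          # stochastic decoding for diverse augmentations
        top_p=0.9,
        max_new_tokens=30,
        num_return_sequences=num_samples,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

print(augment("positive review", "The movie was"))
```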
Posted Content

When Do You Need Billions of Words of Pretraining Data?

TL;DR: While the ability to encode linguistic features is almost certainly necessary for language understanding, it is likely that other, unidentified, forms of knowledge are the major drivers of recent improvements in language understanding among large pretrained models.
Posted Content

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

TL;DR: CodeT5 is a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed by developer-assigned identifiers; it supports both code understanding and generation tasks within a single framework and allows for multi-task learning.
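
The released CodeT5 checkpoints expose the same text-to-text interface; below is a minimal sketch of loading the Salesforce/codet5-base checkpoint through Hugging Face transformers. The code snippet is illustrative, and a task-specific fine-tuned variant would be needed for high-quality summaries.

```python
# Minimal sketch: CodeT5 uses the standard encoder-decoder generate() interface,
# here fed a toy Python function as input text.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

code = "def add(a, b):\n    return a + b"
input_ids = tokenizer(code, return_tensors="pt").input_ids
summary_ids = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```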
Proceedings Article

Few-Shot Conversational Dense Retrieval

TL;DR: ConvDR employs an ad hoc dense retriever as the teacher, inherits its document encodings, and learns a student query encoder to mimic the teacher embeddings on oracle reformulated queries.
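
A minimal sketch of the embedding-mimicking idea described above, with toy encoders standing in for the real retrievers: a frozen teacher encodes the oracle-reformulated query, and the student query encoder is trained with an MSE loss to reproduce that embedding from the raw conversational query. All module names, shapes, and hyperparameters here are hypothetical stand-ins, not ConvDR's actual implementation.

```python
# Toy teacher-student distillation step: the student learns to match the
# frozen teacher's embedding of the oracle-reformulated query.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy text encoder (bag of token embeddings), for illustration only."""
    def __init__(self, vocab_size=30522, dim=768):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids, offsets):
        return self.embed(token_ids, offsets)

teacher = Encoder()            # stands in for the frozen ad hoc dense retriever
student = Encoder()            # conversational query encoder being trained
for p in teacher.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(student.parameters(), lr=1e-5)
mse = nn.MSELoss()

# Token ids would come from a real tokenizer; the random ids are placeholders.
conv_query_ids = torch.randint(0, 30522, (12,))    # raw conversational query
oracle_query_ids = torch.randint(0, 30522, (8,))   # oracle reformulated query
offsets = torch.tensor([0])                        # one query per batch

with torch.no_grad():
    target = teacher(oracle_query_ids, offsets)    # frozen teacher embedding

optimizer.zero_grad()
pred = student(conv_query_ids, offsets)            # student embedding
loss = mse(pred, target)                           # mimic the teacher
loss.backward()
optimizer.step()
```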
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.