Open Access · Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
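
To make the text-to-text framing concrete, the sketch below (a minimal example assuming the Hugging Face transformers library and the publicly released t5-small checkpoint, rather than the paper's own codebase) casts translation, summarization, and sentence-acceptability classification as the same string-in, string-out interface by prepending a task prefix to the input text.

# Minimal text-to-text sketch: every task is "input text -> output text".
# Assumes the Hugging Face `transformers` library and the released t5-small checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task is selected by a plain-text prefix; the model always emits text.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The course is jumping well.",
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Even the classification example returns a short text label (e.g., "acceptable"), which is how the paper unifies classification and generation tasks under a single training objective and decoding procedure.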



Citations
Proceedings ArticleDOI

Improving Multilingual Models with Language-Clustered Vocabularies

TL;DR: This work introduces a novel procedure for multilingual vocabulary generation that combines the separately trained vocabularies of several automatically derived language clusters, thus balancing the trade-off between cross-lingual subword sharing and language-specific vocabularies.
Proceedings ArticleDOI

Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems

TL;DR: This paper proposes embedding a knowledge base (KB) of any size directly into the model parameters; the resulting model requires no dialogue state tracking, template responses, or the KB as input, and its KB can be updated dynamically via fine-tuning.
Posted Content

ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing.

TL;DR: ProtTrans trains auto-regressive and auto-encoder transformer language models (including Transformer-XL, XLNet, BERT, Albert, Electra, and T5) on large protein sequence corpora, showing that embeddings from self-supervised protein language modeling transfer to downstream tasks such as secondary structure and subcellular localization prediction.
Proceedings ArticleDOI

Exploring and Predicting Transferability across NLP Tasks

TL;DR: The results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task.
Posted Content

BEiT: BERT Pre-Training of Image Transformers

TL;DR: BEiT (Bidirectional Encoder representation from Image Transformers) is a self-supervised vision representation model in which each image has two views during pre-training: image patches (e.g., 16x16 pixels) and discrete visual tokens.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not explicitly discuss the limitations of transfer learning with a unified text-to-text transformer.