Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
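The abstract's central idea is that every task is expressed as feeding the model text and training it to produce text, with the task identified only by a prefix on the input. The sketch below illustrates this, assuming the released T5 checkpoints are loaded through the Hugging Face `transformers` interface (an assumption; the paper's own release is the TensorFlow-based `t5` library). The task prefixes follow the convention described in the paper.

```python
# Minimal sketch of the text-to-text framing: translation, acceptability
# classification, and summarization all use the same model and interface,
# distinguished only by a textual prefix on the input.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    "translate English to German: The house is wonderful.",
    "cola sentence: The course is jumping well.",   # linguistic acceptability
    "summarize: state authorities dispatched emergency crews tuesday to survey the damage ...",
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    # The "answer" is always decoded text, whatever the underlying task is.
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```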



Citations
Book Chapter

Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models

TL;DR: This article proposed a three-step training procedure to improve explanation quality by up to 7% and avoid sacrificing classification performance on the downstream task, while at the same time reducing the need for human annotations.
Proceedings Article

Shades of BLEU, Flavours of Success: The Case of MultiWOZ

TL;DR: In this article, the authors identify inconsistencies in data preprocessing and reporting of three corpus-based metrics used on the MultiWOZ dataset, i.e., BLEU score and Inform & Success rates.
Proceedings Article

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

TL;DR: The authors proposed token-aware contrastive learning (TaCL), which encourages BERT to learn an isotropic and discriminative distribution of token representations for NER tasks.
Posted Content

Prompting Contrastive Explanations for Commonsense Reasoning Tasks.

TL;DR: The authors used pre-trained language models to generate contrastive explanations for commonsense reasoning tasks; humans judge these explanations to be more relevant for solving the task, and they enable a novel method for evaluating explanation faithfulness.
Posted Content

Language Models are Few-shot Multilingual Learners

TL;DR: The authors showed that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, with accuracy significantly better than random prediction.
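The few-shot setup summarized above amounts to concatenating a handful of labeled English examples before the test input and letting the model continue the pattern. A rough illustration follows; the task, labels, and prompt wording are hypothetical and not the authors' exact setup.

```python
# Illustrative few-shot prompt construction: English demonstrations
# followed by a non-English test sentence for the model to label.
english_demos = [
    ("The movie was fantastic.", "positive"),
    ("I hated every minute of it.", "negative"),
    ("An instant classic.", "positive"),
]
non_english_query = "La película fue aburrida."  # Spanish test sample

prompt = "".join(f"Review: {text}\nSentiment: {label}\n\n"
                 for text, label in english_demos)
prompt += f"Review: {non_english_query}\nSentiment:"

# The prompt is fed as-is to a pre-trained language model,
# which continues the pattern with a predicted label.
print(prompt)
```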
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not explicitly discuss the limitations of transfer learning with a unified text-to-text transformer.