Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
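To make the text-to-text framing concrete, here is a minimal sketch using the publicly released T5 checkpoints through the Hugging Face Transformers library (an assumed environment; the paper's own codebase is the authors' text-to-text-transfer-transformer repository). Each task is selected purely by a textual prefix on the input, so one model, one objective, and one decoding procedure cover translation, summarization, and classification alike.

```python
# Minimal sketch of the text-to-text framing, using the public T5 checkpoints
# via Hugging Face Transformers (assumed environment, not the paper's codebase).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as "input text -> output text"; the task is chosen by a
# prefix, so the same model and maximum-likelihood objective cover all of them.
examples = [
    "translate English to German: The house is wonderful.",                 # translation
    "summarize: studies have shown that owning a dog is good for you ...",  # summarization
    "cola sentence: The course is jumping well.",                           # acceptability classification
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Classification targets are produced as literal label strings (for CoLA, "acceptable" or "unacceptable"), which is what lets every benchmark in the study share the same training and inference machinery.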



Citations
Posted Content

Structural analysis of an all-purpose question answering model.

TL;DR: The authors conducted a structural analysis of a new all-purpose question answering model and observed that attention heads specialize in a particular task and some heads are more conducive to learning than others in both the multi-task and single-task settings.
Proceedings Article

Improving Compositional Generalization with Self-Training for Data-to-Text Generation

TL;DR: Mehta et al. presented this work at the 60th Annual Meeting of the Association for Computational Linguistics (ACL), Long Papers, proposing self-training as a way to improve compositional generalization in data-to-text generation.
Proceedings Article

A Simple and Effective Positional Encoding for Transformers

TL;DR: This paper proposes Decoupled Positional Attention for Transformers (DIET), a simple yet effective mechanism for encoding position and segment information into Transformer models; it offers faster training and inference while achieving competitive performance on the GLUE, XTREME, and WMT benchmarks.
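The summary only names the mechanism, so the following is a loose sketch of one plausible form of decoupled positional attention, assuming that position and segment information enter as additive per-head biases on the attention logits rather than being mixed into the token embeddings; it is an illustration, not the DIET reference implementation.

```python
# Hypothetical sketch of decoupled positional attention: position and segment
# information are added as per-head biases on the attention logits instead of
# being summed into token embeddings. Illustrative only, not the DIET code.
import math
import torch
import torch.nn as nn

class DecoupledAttention(nn.Module):
    def __init__(self, d_model, n_heads, max_len=512, n_segments=2):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Learned per-head biases indexed by (query position, key position)
        # and by (query segment, key segment).
        self.pos_bias = nn.Parameter(torch.zeros(n_heads, max_len, max_len))
        self.seg_bias = nn.Parameter(torch.zeros(n_heads, n_segments, n_segments))

    def forward(self, x, segment_ids):
        B, L, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, L, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        # Content-content term: standard scaled dot-product attention.
        logits = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        # Decoupled additive terms for position and segment membership.
        logits = logits + self.pos_bias[:, :L, :L]
        seg = self.seg_bias[:, segment_ids[:, :, None], segment_ids[:, None, :]]
        logits = logits + seg.permute(1, 0, 2, 3)
        attn = logits.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(B, L, -1)
        return self.out(ctx)

# Quick shape check on random inputs.
layer = DecoupledAttention(d_model=256, n_heads=8)
x = torch.randn(2, 16, 256)
segments = torch.zeros(2, 16, dtype=torch.long)
y = layer(x, segments)   # -> (2, 16, 256)
```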
Posted Content

Automated question generation and question answering from Turkish texts using text-to-text transformers.

TL;DR: The authors fine-tuned a multilingual T5 transformer in a multi-task setting for question answering (QA), question generation (QG), and answer extraction on a Turkish QA dataset and achieved state-of-the-art performance.
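As a rough illustration of how such a multi-task setup fits the text-to-text format, the snippet below casts QA, QG, and answer extraction as prefixed sequence-to-sequence examples for an mT5 checkpoint; the prefix strings and the make_example helper are hypothetical, not the authors' actual preprocessing.

```python
# Rough sketch of multi-task formatting for mT5 fine-tuning on QA, QG, and
# answer extraction. Prefixes and the make_example helper are hypothetical;
# the Hugging Face mT5 checkpoint stands in for the authors' setup.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

def make_example(task, context, question=None, answer=None):
    """Cast each task as text-to-text; the prefix selects the task."""
    if task == "qa":           # question + context -> answer
        return f"question: {question} context: {context}", answer
    if task == "qg":           # answer + context -> question
        return f"generate question: answer: {answer} context: {context}", question
    if task == "answer_ext":   # context -> answer rendered as text
        return f"extract answer: {context}", answer
    raise ValueError(f"unknown task: {task}")

source, target = make_example(
    "qa",
    context="Ankara, Türkiye'nin başkentidir.",
    question="Türkiye'nin başkenti neresidir?",
    answer="Ankara",
)
batch = tokenizer(source, text_target=target, return_tensors="pt")
loss = model(**batch).loss   # the same seq2seq cross-entropy for every task
```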
Posted Content

Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding

TL;DR: The authors showed that task-adaptive pre-training (TAPT) and self-training are complementary under a simple protocol that follows the TAPT -> Fine-tuning -> Self-training (TFS) process.
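A schematic of that recipe, with the training utilities passed in as callables because the summary gives only the protocol, not the implementation (all helper names here are placeholders): continue the language-modeling objective on in-domain unlabeled text (TAPT), fine-tune on the labeled set, then self-train by pseudo-labeling the unlabeled text with the fine-tuned model and fine-tuning again on the combined data.

```python
# Schematic of the TAPT -> Fine-tuning -> Self-training (TFS) recipe.
# adapt_lm, finetune, and predict are placeholder callables for ordinary
# training/inference loops; this mirrors the protocol, not the authors' code.

def tfs(pretrained_lm, unlabeled_texts, labeled_data,
        adapt_lm, finetune, predict, confidence_threshold=0.9):
    # 1) Task-adaptive pre-training: continue the LM objective (e.g. masked
    #    language modeling) on unlabeled text from the task's domain.
    adapted_lm = adapt_lm(pretrained_lm, unlabeled_texts)

    # 2) Fine-tuning: train on the labeled task data.
    model = finetune(adapted_lm, labeled_data)

    # 3) Self-training: pseudo-label the unlabeled text with the fine-tuned
    #    model, keep confident predictions, and fine-tune again on the union.
    pseudo_labeled = [
        (text, label)
        for text, (label, confidence) in zip(unlabeled_texts, predict(model, unlabeled_texts))
        if confidence >= confidence_threshold   # threshold is an illustrative choice
    ]
    return finetune(adapted_lm, labeled_data + pseudo_labeled)
```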
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.