Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
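To make the text-to-text framing concrete, here is a minimal sketch using the publicly released T5 checkpoints through the Hugging Face Transformers library (an assumed environment; the paper's own codebase is the authors' text-to-text-transfer-transformer repository). Each task is selected purely by a textual prefix on the input, so one model, one objective, and one decoding procedure cover translation, summarization, and classification alike.

```python
# Minimal sketch of the text-to-text framing, using the public T5 checkpoints
# via Hugging Face Transformers (assumed environment, not the paper's codebase).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as "input text -> output text"; the task is chosen by a
# prefix, so the same model and maximum-likelihood objective cover all of them.
examples = [
    "translate English to German: The house is wonderful.",                 # translation
    "summarize: studies have shown that owning a dog is good for you ...",  # summarization
    "cola sentence: The course is jumping well.",                           # acceptability classification
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Classification targets are produced as literal label strings (for CoLA, "acceptable" or "unacceptable"), which is what lets every benchmark in the study share the same training and inference machinery.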



Citations
Posted Content

Structural analysis of an all-purpose question answering model.

TL;DR: The authors conducted a structural analysis of a new all-purpose question answering model and observed that attention heads specialize in a particular task and some heads are more conducive to learning than others in both the multi-task and single-task settings.
Proceedings Article

Improving Compositional Generalization with Self-Training for Data-to-Text Generation

TL;DR: Mehta et al. presented this work at the 60th Annual Meeting of the Association for Computational Linguistics (ACL), Long Papers, proposing self-training as a way to improve compositional generalization in data-to-text generation.
Proceedings Article

A Simple and Effective Positional Encoding for Transformers

TL;DR: This paper proposes Decoupled Positional Attention for Transformers (DIET), a simple yet effective mechanism for encoding position and segment information into Transformer models; it offers faster training and inference while achieving competitive performance on the GLUE, XTREME, and WMT benchmarks.
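The summary only names the mechanism, so the following is a loose sketch of one plausible form of decoupled positional attention, assuming that position and segment information enter as additive per-head biases on the attention logits rather than being mixed into the token embeddings; it is an illustration, not the DIET reference implementation.

```python
# Hypothetical sketch of decoupled positional attention: position and segment
# information are added as per-head biases on the attention logits instead of
# being summed into token embeddings. Illustrative only, not the DIET code.
import math
import torch
import torch.nn as nn

class DecoupledAttention(nn.Module):
    def __init__(self, d_model, n_heads, max_len=512, n_segments=2):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Learned per-head biases indexed by (query position, key position)
        # and by (query segment, key segment).
        self.pos_bias = nn.Parameter(torch.zeros(n_heads, max_len, max_len))
        self.seg_bias = nn.Parameter(torch.zeros(n_heads, n_segments, n_segments))

    def forward(self, x, segment_ids):
        B, L, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, L, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        # Content-content term: standard scaled dot-product attention.
        logits = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        # Decoupled additive terms for position and segment membership.
        logits = logits + self.pos_bias[:, :L, :L]
        seg = self.seg_bias[:, segment_ids[:, :, None], segment_ids[:, None, :]]
        logits = logits + seg.permute(1, 0, 2, 3)
        attn = logits.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(B, L, -1)
        return self.out(ctx)

# Quick shape check on random inputs.
layer = DecoupledAttention(d_model=256, n_heads=8)
x = torch.randn(2, 16, 256)
segments = torch.zeros(2, 16, dtype=torch.long)
y = layer(x, segments)   # -> (2, 16, 256)
```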
Posted Content

Automated question generation and question answering from Turkish texts using text-to-text transformers.

TL;DR: The authors fine-tuned a multilingual T5 transformer in a multi-task setting for question answering (QA), question generation (QG), and answer extraction on a Turkish QA dataset and achieved state-of-the-art performance.
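As a rough illustration of how such a multi-task setup fits the text-to-text format, the snippet below casts QA, QG, and answer extraction as prefixed sequence-to-sequence examples for an mT5 checkpoint; the prefix strings and the make_example helper are hypothetical, not the authors' actual preprocessing.

```python
# Rough sketch of multi-task formatting for mT5 fine-tuning on QA, QG, and
# answer extraction. Prefixes and the make_example helper are hypothetical;
# the Hugging Face mT5 checkpoint stands in for the authors' setup.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

def make_example(task, context, question=None, answer=None):
    """Cast each task as text-to-text; the prefix selects the task."""
    if task == "qa":           # question + context -> answer
        return f"question: {question} context: {context}", answer
    if task == "qg":           # answer + context -> question
        return f"generate question: answer: {answer} context: {context}", question
    if task == "answer_ext":   # context -> answer rendered as text
        return f"extract answer: {context}", answer
    raise ValueError(f"unknown task: {task}")

source, target = make_example(
    "qa",
    context="Ankara, Türkiye'nin başkentidir.",
    question="Türkiye'nin başkenti neresidir?",
    answer="Ankara",
)
batch = tokenizer(source, text_target=target, return_tensors="pt")
loss = model(**batch).loss   # the same seq2seq cross-entropy for every task
```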
Posted Content

Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding

TL;DR: The authors showed that task-adaptive pre-training (TAPT) and self-training are complementary under a simple protocol that follows the TAPT -> Fine-tuning -> Self-training (TFS) process.
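A schematic of that recipe, with the training utilities passed in as callables because the summary gives only the protocol, not the implementation (all helper names here are placeholders): continue the language-modeling objective on in-domain unlabeled text (TAPT), fine-tune on the labeled set, then self-train by pseudo-labeling the unlabeled text with the fine-tuned model and fine-tuning again on the combined data.

```python
# Schematic of the TAPT -> Fine-tuning -> Self-training (TFS) recipe.
# adapt_lm, finetune, and predict are placeholder callables for ordinary
# training/inference loops; this mirrors the protocol, not the authors' code.

def tfs(pretrained_lm, unlabeled_texts, labeled_data,
        adapt_lm, finetune, predict, confidence_threshold=0.9):
    # 1) Task-adaptive pre-training: continue the LM objective (e.g. masked
    #    language modeling) on unlabeled text from the task's domain.
    adapted_lm = adapt_lm(pretrained_lm, unlabeled_texts)

    # 2) Fine-tuning: train on the labeled task data.
    model = finetune(adapted_lm, labeled_data)

    # 3) Self-training: pseudo-label the unlabeled text with the fine-tuned
    #    model, keep confident predictions, and fine-tune again on the union.
    pseudo_labeled = [
        (text, label)
        for text, (label, confidence) in zip(unlabeled_texts, predict(model, unlabeled_texts))
        if confidence >= confidence_threshold   # threshold is an illustrative choice
    ]
    return finetune(adapted_lm, labeled_data + pseudo_labeled)
```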
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.