Open Access · Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
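The text-to-text casting described in the abstract can be sketched as follows: every task is expressed as "input string in, target string out", with a task prefix prepended to the input so a single model can serve all tasks. The prefix strings below follow the convention used in the paper, but the helper function and its exact wording are illustrative, not the authors' code.

```python
# Minimal sketch of the unified text-to-text format: each task becomes a
# (prefixed input, target text) pair that one sequence-to-sequence model
# can be trained on. Prefix strings are illustrative.

def to_text_to_text(task: str, text: str) -> str:
    """Prepend a task prefix so every problem is plain text-to-text."""
    prefixes = {
        "translate_en_de": "translate English to German: ",
        "summarize": "summarize: ",
        "cola": "cola sentence: ",
    }
    return prefixes[task] + text

# Inputs for three different tasks, all in the same format; the model
# emits the answer as text (a translation, a summary, or a label word).
inp = to_text_to_text("summarize", "Transfer learning has emerged as ...")
```

Because targets are also plain text (e.g. the label "acceptable" for a classification task), the same training objective, decoding procedure, and hyperparameters apply across tasks, which is what enables the paper's controlled comparisons.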
Citations
Posted Content
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
TL;DR: The proposed method significantly outperforms baselines trained on English data only, and a new state of the art is reported on four datasets: MLQA, XQuAD, SQuAD-it, and PIAF (fr).
Posted Content
RoFormer: Enhanced Transformer with Rotary Position Embedding.
TL;DR: The authors propose rotary position embedding (RoPE), which encodes absolute position information with a rotation matrix and naturally incorporates explicit relative position dependency into the self-attention formulation. RoPE has valuable properties such as the flexibility to extend to any sequence length, decaying inter-token dependency with increasing relative distance, and the capability of equipping linear self-attention with relative position encoding.
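The key mechanism in the summary above is that rotating query/key feature pairs by a position-dependent angle makes their inner product depend only on the relative distance between positions. A minimal sketch, using a hypothetical `rope` helper and the standard pairing of the first and second halves of the feature vector (the paper interleaves adjacent dimensions, but the relative-position property is the same):

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate feature pairs of x by position-dependent angles (RoPE sketch).

    x: flat list of floats with even length d. Pair i = (x[i], x[half+i])
    is rotated by pos * base**(-2i/d), so dot(rope(q, m), rope(k, n))
    depends only on the relative offset m - n.
    """
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[half + i]
        out[i] = x1 * c - x2 * s
        out[half + i] = x1 * s + x2 * c
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))
```

Shifting both positions by the same amount leaves the attention score unchanged, e.g. `dot(rope(q, 3), rope(k, 5))` equals `dot(rope(q, 10), rope(k, 12))`, which is the relative-position property the TL;DR refers to.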
Proceedings ArticleDOI
Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs
TL;DR: This paper proposes Graformer, a novel Transformer-based encoder-decoder architecture for graph-to-text generation that learns to weight node-node relations differently for different attention heads, thus virtually learning differently connected views of the input graph.
Proceedings ArticleDOI
On the importance of pre-training data volume for compact language models
TL;DR: This paper studies the impact of pre-training data volume on compact language models, showing that well-performing models can be obtained with as little as 100 MB of text, and that below critically low amounts of pre-training data, an intermediate pre-training step on the task-specific corpus does not yield substantial improvements.
Proceedings Article
Mixed-Lingual Pre-training for Cross-lingual Summarization
TL;DR: This work proposes a solution based on mixed-lingual pre-training that leverages massive monolingual data to enhance its language modeling; the approach has no task-specific components, which saves memory and increases optimization efficiency.