Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Open Access Journal Article

Colin Raffel, +8 more

- 01 Jan 2020 -

Journal of Machine Learning Research

- Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract:

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
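
The abstract does not spell out how a problem is cast into text-to-text form. A minimal sketch of one way to do it is to prepend a short task prefix to the input and write the label or output as plain text. The prefix strings and the `to_text_to_text` helper below are hypothetical, chosen only for illustration, and are not taken from the released code.

```python
from typing import Dict, Tuple

# Hypothetical task prefixes (an assumption for this sketch).
PREFIXES: Dict[str, str] = {
    "translation": "translate English to German: ",
    "summarization": "summarize: ",
    "classification": "classify sentiment: ",
}


def to_text_to_text(task: str, source: str, target: str) -> Tuple[str, str]:
    """Cast one labeled example as an (input text, target text) pair."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task: {task}")
    # Both sides are plain strings, so one model and one loss can serve
    # every task; only the prefix tells the tasks apart.
    return PREFIXES[task] + source.strip(), target.strip()


if __name__ == "__main__":
    examples = [
        to_text_to_text("translation", "That is good.", "Das ist gut."),
        to_text_to_text("classification", "A delightful film.", "positive"),
    ]
    for model_input, model_target in examples:
        print(repr(model_input), "->", repr(model_target))
```

Because a class label such as "positive" becomes an ordinary target string, classification, translation, and summarization examples can share the same training pipeline in this sketch.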

Citations

Posted Content

How Can We Accelerate Progress Towards Human-like Linguistic Generalization?

Tal Linzen

- 03 May 2020 -

arXiv: Computation and Language

TL;DR: This position paper describes and critiques the Pretraining-Agnostic Identically Distributed (PAID) evaluation paradigm, and advocates for supplementing or replacing PAID with paradigms that reward architectures that generalize as quickly and robustly as humans.

Proceedings Article

BioMegatron: Larger Biomedical Domain Language Model

Hoo-Chang Shin, +6 more

TL;DR: This work empirically studies and evaluates several factors that can affect performance on domain language applications, such as the sub-word vocabulary set, model size, pre-training corpus, and domain transfer.

Proceedings Article

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training

Yizhe Zhang, +5 more

TL;DR: This work proposes POINTER (PrOgressive INsertion-based TransformER), a simple yet novel insertion-based approach for hard-constrained text generation that achieves state-of-the-art performance on constrained text generation.

Posted Content

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

Vishvak Murahari, +3 more

- 05 Dec 2019 -

arXiv: Learning

TL;DR: This work adapts the recently proposed ViLBERT model for multi-turn visually-grounded conversations and finds that additional finetuning using "dense" annotations in VisDial leads to even higher NDCG but hurts MRR, highlighting a trade-off between the two primary metrics.

Proceedings Article

LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding

Yang Xu, +11 more

TL;DR: This article proposes a two-stream multi-modal Transformer encoder to model the interaction among text, layout, and image in a single multi-modal framework.
