Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
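To make the text-to-text format concrete, below is a minimal sketch, assuming the publicly released T5 checkpoints and the Hugging Face transformers library rather than the authors' original codebase: each task, whether translation, summarization, or classification, is phrased as an input string with a task prefix, and the model produces its answer as a string.

# Minimal sketch of the text-to-text idea using a released T5 checkpoint via
# Hugging Face transformers (assumption: not the paper's own training code).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Task prefixes such as "translate English to German:" and "summarize:" follow
# the convention described in the paper; "cola sentence:" casts a
# classification task as generation, so the model emits the label as text.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The course is jumping well.",
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Running this prints a German translation, a short summary, and an acceptability label, all as plain text, illustrating how a single model and objective cover tasks that would otherwise require task-specific output layers.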



Citations
Proceedings Article (DOI)

Element Intervention for Open Relation Extraction

TL;DR: This paper revisits the procedure of Open Relation Extraction (OpenRE) from a causal view, formulating OpenRE with a structural causal model and identifying that its errors stem from spurious correlations from entities and context to the relation type.
Posted Content

BERT & Family Eat Word Salad: Experiments with Text Understanding

TL;DR: This paper studies the response of large models from the BERT family to incoherent inputs that should confuse any model that claims to understand natural language, and shows that if models are explicitly trained to recognize invalid inputs, they can be robust to such attacks without a drop in performance.
Proceedings Article (DOI)

Controlling Industrial Robots with High-Level Verbal Commands

TL;DR: In this paper, a pre-trained language model is fine-tuned to translate verbal instructions into robot tasks, outperforming other semantic parsing methods, and the system handles, through dialogue, a variety of exceptions that arise during human-robot interaction, including unknown tasks, user interruptions, and changes in the world state.
Posted Content

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation.

TL;DR: The HiTab dataset provides fine-grained annotations for both entity and quantity alignment, which help models largely reduce spurious predictions in the QA task and help NLG models generate better results in a conditional generation setting.
Posted Content

Augmented Natural Language for Generative Sequence Labeling

TL;DR: This paper proposed a generative framework for joint sequence labeling and sentence-level classification using a single, shared natural language output space, which achieved state-of-the-art performance on several NER tasks.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.