Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
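The text-to-text framing described above means every task, whether translation, summarization, or classification, is posed as mapping an input string to an output string, with a short task prefix identifying the task. The following is a minimal sketch of that idea, assuming the Hugging Face transformers library and the released "t5-small" checkpoint; the prefixes follow the conventions reported in the paper.

```python
# Minimal sketch of the text-to-text framing: every task is string in, string out.
# Assumes the Hugging Face `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    # Translation: task prefix names the language pair.
    "translate English to German: The house is wonderful.",
    # Summarization: same model, different prefix.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    # Classification (CoLA acceptability): the label is generated as text.
    "cola sentence: The course is jumping well.",
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```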



Citations
Posted Content

Relative Positional Encoding for Transformers with Linear Complexity

TL;DR: The authors propose Stochastic Positional Encoding (SPE), an alternative to the classical additive (sinusoidal) PE that provably behaves like relative positional encoding (RPE) for linear variants of the Transformer.
Posted Content

Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words.

TL;DR: The authors showed that PLMs can be interpreted as serial dual-route models, i.e., the meanings of complex words are either stored or computed from the subwords, which implies that maximally meaningful input tokens should allow for the best generalization to new words.
Posted Content

Boosting Search Engines with Interactive Agents

TL;DR: The authors used machine reading to guide the selection of refinement terms from aggregated search results and then empowered agents with simple but effective search operators to exert fine-grained and transparent control over queries and search results.
Proceedings ArticleDOI

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

TL;DR: The Common Crawl Question Answering dataset (CCQA) is a large-scale open-domain question-answering dataset for model pre-training, containing a previously unseen scale of around 130 million multilingual question-answer pairs.
Posted Content

Large Scale Multi-Actor Generative Dialog Modeling

TL;DR: The authors introduced the Generative Conversation Control model, an augmented and fine-tuned GPT-2 language model that conditions on past reference conversations to probabilistically model multi-turn conversations in the actor's persona.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not explicitly discuss the limitations of transfer learning with a unified text-to-text transformer.