Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
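The text-to-text framing means one model, one loss, and one decoding procedure serve every task. A minimal sketch of the idea using the released T5 checkpoints through the Hugging Face transformers library (the library choice and the "t5-small" size are assumptions here; the task prefixes follow the paper):

```python
# Minimal sketch of the text-to-text framing: every task is "input text ->
# output text", distinguished only by a task prefix on the input.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",
    # Classification: the model emits the label as a literal string,
    # e.g. "acceptable" / "unacceptable" for CoLA grammaticality.
    "cola sentence: The books was on the table.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
]
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because labels are decoded as strings, a single sequence-to-sequence model covers translation, classification, and summarization without task-specific heads.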
Citations
Posted Content
Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth A. Barnes, Ariel Herbert-Voss, William H. Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew M. Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Samuel McCandlish, Ilya Sutskever, Wojciech Zaremba
TL;DR: This paper introduces Codex, a GPT language model fine-tuned on publicly available code from GitHub, and studies its Python code-writing capabilities, showing that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts.
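The repeated-sampling strategy amounts to drawing many candidate programs for a prompt and keeping any that pass the prompt's unit tests. A minimal sketch follows; `sample_candidates` is a hypothetical stand-in for whatever code model you query, and generated code should of course only be executed in a sandbox:

```python
# Sketch of repeated sampling: draw k candidate programs, return the first
# one that passes the unit tests (the basis of pass@k-style evaluation).
from typing import Callable, Iterable, Optional

def passes_tests(program: str, tests: str) -> bool:
    """Run the candidate and its tests in a throwaway namespace."""
    namespace: dict = {}
    try:
        exec(program, namespace)   # caution: only run trusted/sandboxed code
        exec(tests, namespace)
        return True
    except Exception:
        return False

def first_working_solution(
    sample_candidates: Callable[[str, int], Iterable[str]],
    prompt: str,
    tests: str,
    k: int = 100,
) -> Optional[str]:
    """Sample k completions and return the first that passes the tests."""
    for candidate in sample_candidates(prompt, k):
        if passes_tests(candidate, tests):
            return candidate
    return None  # all k samples failed

# Toy usage with a fake "model" that returns canned candidates:
fake = lambda prompt, k: ["def add(a, b): return a - b",
                          "def add(a, b): return a + b"][:k]
print(first_working_solution(fake, "def add(a, b):", "assert add(2, 3) == 5"))
```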
Proceedings Article
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation
TL;DR: This paper presents Data Boost, a powerful and easy-to-deploy text augmentation framework that augments data through reinforcement-learning-guided conditional generation, evaluated on three diverse text classification tasks under five different classifier architectures.
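The paper's policy-gradient guidance is more involved, but the basic generate-then-reward loop can be approximated with off-the-shelf pieces. The sketch below uses GPT-2 for generation and a sentiment classifier as a crude reward filter; this is an illustrative simplification under my own assumptions, not Data Boost's actual RL update:

```python
# Rough approximation of reward-guided augmentation: generate candidates
# from a seed text, keep only those the classifier scores confidently for
# the target label (filtering as a stand-in for policy-gradient guidance).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
classifier = pipeline("sentiment-analysis")

def augment(seed_text: str, label: str, n: int = 8, threshold: float = 0.9):
    """Return generated variants the classifier agrees are `label`."""
    outputs = generator(seed_text, max_new_tokens=30,
                        num_return_sequences=n, do_sample=True)
    kept = []
    for out in outputs:
        text = out["generated_text"]
        pred = classifier(text)[0]
        if pred["label"] == label and pred["score"] >= threshold:
            kept.append(text)
    return kept

print(augment("The movie was absolutely", "POSITIVE"))
```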
Posted Content
When Do You Need Billions of Words of Pretraining Data?
TL;DR: While the ability to encode linguistic features is almost certainly necessary for language understanding, it is likely that other, as-yet-unidentified forms of knowledge are the major drivers of recent improvements in language understanding among large pretrained models.
Posted Content
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
TL;DR: This paper proposes CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed by developer-assigned identifiers; its unified framework seamlessly supports both code understanding and generation tasks and allows for multi-task learning.
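The identifier-aware objective can be illustrated without the model itself: replace developer-assigned identifiers with sentinel tokens and train a seq2seq model to emit them back. The sketch below uses a naive regex tokenizer and T5-style sentinels as assumptions; CodeT5's actual preprocessing differs:

```python
# Sketch of identifier masking: build an (input, target) training pair where
# identifiers become sentinel tokens and the target spells out each mapping.
import re

def mask_identifiers(code: str):
    """Build an (input, target) pair by masking each unique identifier."""
    keywords = {"def", "return", "if", "else", "for", "while", "in"}
    seen: dict = {}

    def repl(match):
        name = match.group(0)
        if name in keywords:           # keep language keywords visible
            return name
        if name not in seen:
            seen[name] = f"<extra_id_{len(seen)}>"
        return seen[name]

    masked = re.sub(r"[A-Za-z_]\w*", repl, code)
    target = " ".join(f"{tok} {name}" for name, tok in seen.items())
    return masked, target

inp, tgt = mask_identifiers("def area(radius): return 3.14159 * radius * radius")
print(inp)  # def <extra_id_0>(<extra_id_1>): return 3.14159 * <extra_id_1> * <extra_id_1>
print(tgt)  # <extra_id_0> area <extra_id_1> radius
```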
Proceedings Article
Few-Shot Conversational Dense Retrieval
TL;DR: This paper proposes ConvDR, which employs an ad hoc dense retriever as the teacher, inherits its document encodings, and learns a student query encoder to mimic the teacher's embeddings on oracle reformulated queries.
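The distillation objective itself is simple: freeze the teacher, feed it the oracle (manually rewritten) query, and minimize the distance between its embedding and the student's embedding of the raw conversational query. A toy PyTorch sketch, with placeholder encoders rather than ConvDR's actual ANCE-based ones:

```python
# Toy embedding-mimicking distillation: the student query encoder learns to
# match the frozen teacher's embedding of the oracle reformulated query.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Placeholder dense encoder: embed tokens, mean-pool to one vector."""
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, token_ids):            # (batch, seq) -> (batch, dim)
        return self.emb(token_ids).mean(dim=1)

teacher, student = TinyEncoder(), TinyEncoder()
teacher.requires_grad_(False)                # teacher stays frozen

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()

# Toy batch: the student sees the raw conversational query, the teacher sees
# the oracle reformulation; both are fake token ids here.
conv_query = torch.randint(0, 1000, (4, 16))
oracle_query = torch.randint(0, 1000, (4, 16))

for step in range(3):
    loss = mse(student(conv_query), teacher(oracle_query))
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: distillation loss = {loss.item():.4f}")
```

Because the student inherits the teacher's document encodings, only the query side is trained, which is what makes the approach practical in the few-shot conversational setting.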