Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
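The unified format described in the abstract reduces every task to an input-string/target-string pair by prepending a task prefix. A minimal sketch of that casting step, using the prefix conventions associated with T5 (the field names and this helper are illustrative, not the authors' code):

```python
# Illustrative sketch: casting heterogeneous NLP tasks into a single
# text-to-text format. Each task becomes an (input, target) string pair
# distinguished only by a task prefix; even classification labels are
# emitted as literal text.

def to_text_to_text(task: str, **fields) -> tuple:
    """Return an (input_text, target_text) pair for a given task."""
    if task == "translation":
        return (f"translate English to German: {fields['source']}",
                fields["target"])
    if task == "summarization":
        return (f"summarize: {fields['document']}", fields["summary"])
    if task == "classification":
        # The class label is generated as text (e.g. "acceptable").
        return (f"cola sentence: {fields['sentence']}", fields["label"])
    raise ValueError(f"unknown task: {task}")

inp, tgt = to_text_to_text("summarization",
                           document="The quick brown fox jumped.",
                           summary="A fox jumped.")
print(inp)  # summarize: The quick brown fox jumped.
print(tgt)  # A fox jumped.
```

Because every task shares this interface, a single encoder-decoder model with one training objective can serve all of them.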
Citations
Proceedings Article
RetroNLU: Retrieval Augmented Task-Oriented Semantic Parsing
TL;DR: This paper extended a sequence-to-sequence model with a retrieval component that retrieves existing similar samples and presents them as additional context to the model, and analyzed the quality, model sensitivity, and performance of the nearest-neighbor retrieval component for semantic parses of varied utterance complexity.
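The retrieval step this TL;DR describes can be sketched as nearest-neighbor lookup over stored (utterance, parse) examples, with the retrieved exemplar prepended to the parser's input. Everything below is a hypothetical stand-in (the embeddings, input template, and function names are not from the paper):

```python
# Hypothetical sketch of retrieval-augmented semantic parsing: find the
# stored example most similar to the query and present it as extra context.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve_nearest(query_vec, index):
    """index: list of (embedding, utterance, parse) triples."""
    return max(index, key=lambda item: cosine(query_vec, item[0]))

def build_input(utterance, query_vec, index):
    """Concatenate the retrieved exemplar with the new utterance."""
    _, nn_utt, nn_parse = retrieve_nearest(query_vec, index)
    return f"example: {nn_utt} => {nn_parse} | parse: {utterance}"
```

A real system would use learned dense embeddings and an approximate-nearest-neighbor index rather than this exhaustive scan.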
Book Chapter
High Performance Computing for Understanding Natural Language
TL;DR: This chapter gives an overview of state-of-the-art natural language processing problems, algorithms, models, and libraries, and details of a few specific applications that use pre-training or self-supervised learning on large amounts of data in text understanding.
Proceedings Article
K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce
TL;DR: The authors proposed a knowledge-injected pre-trained language model based on the encoder-decoder transformer that learns a diverse set of domain-specific knowledge and can be transferred to both natural language understanding and generation tasks.
Journal Article
CINS: Comprehensive Instruction for Few-Shot Learning in Task-Oriented Dialog Systems
TL;DR: The authors proposed Comprehensive Instruction (CINS), which augments pre-trained language models (PLMs) with extra task-specific instructions to better exploit their power for few-shot learning in task-oriented dialog.
Posted Content
Exploring Text-transformers in AAAI 2021 Shared Task: COVID-19 Fake News Detection in English
TL;DR: This article proposed an ensemble of different pre-trained language models such as BERT, RoBERTa, and ERNIE, combined with training strategies including warm-up, learning-rate scheduling, and k-fold cross-validation.