Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
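As a concrete illustration of the text-to-text framing described above, here is a minimal sketch that runs a few task-prefixed inputs through a released T5 checkpoint. It assumes the Hugging Face transformers library and the public t5-small checkpoint; the task prefixes follow the conventions reported in the paper, but the snippet itself is illustrative rather than the authors' released code.

```python
# Minimal sketch of the text-to-text framing, assuming the Hugging Face
# `transformers` library and the public `t5-small` checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task -- translation, summarization, classification -- is cast as
# "text in, text out" by prepending a task-specific prefix to the input.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning has emerged as a powerful technique in NLP ...",
    "cola sentence: The course is jumping well.",  # acceptability classification
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```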



Citations
Journal Article

WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models

TL;DR: WuDaoCorpora, a super large-scale Chinese corpus containing about 3 TB of training data and 1.08 trillion Chinese characters, is introduced, and experiments show that models trained on this corpus achieve excellent performance on Chinese tasks.

Data Augmentation using Pre-trained Transformer Models

TL;DR: This article showed that prepending class labels to text sequences provides a simple yet effective way to condition pre-trained models for data augmentation, and that a pre-trained Seq2Seq model outperforms other data augmentation methods in a low-resource setting.
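A rough sketch of the label-prepending idea summarized above is given below. It assumes a causal LM (gpt2) from the Hugging Face transformers library; the separator formatting and the prompt-only generation step are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative sketch of label-conditioned data augmentation: prepend the class
# label to each training text so a causal LM learns p(text | label), then sample
# new examples by prompting with the label alone. Checkpoint and separator
# formatting are assumptions; the fine-tuning loop is omitted for brevity.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

labeled_data = [
    ("positive", "the film is a delight from start to finish"),
    ("negative", "a tedious plot and wooden performances"),
]

# 1) Build label-conditioned fine-tuning sequences.
train_texts = [f"{label} {tokenizer.eos_token} {text}" for label, text in labeled_data]

# 2) After fine-tuning on `train_texts`, generate synthetic examples for a class
#    by prompting with its label.
prompt = tokenizer("positive", return_tensors="pt")
sample = model.generate(**prompt, do_sample=True, top_k=50, max_new_tokens=20,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(sample[0], skip_special_tokens=True))
```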
Proceedings Article

SLM: Learning a Discourse Language Representation with Sentence Unshuffling

TL;DR: Introduces Sentence-level Language Modeling, a new pre-training objective for learning a discourse-level language representation in a fully self-supervised manner: the input sentences are shuffled and a hierarchical transformer model is trained to reconstruct their original ordering.
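A minimal, framework-free sketch of how such unshuffling examples might be constructed is shown below; the hierarchical transformer and training loop are omitted, and the naive period-based sentence splitting is purely illustrative.

```python
# Illustrative construction of a sentence-unshuffling example: shuffle the
# sentences of a paragraph and keep the permutation that restores the original
# order as the reconstruction target. Model and training loop are assumed.
import random

def make_unshuffling_example(paragraph, rng=random):
    sentences = [s.strip() + "." for s in paragraph.split(".") if s.strip()]
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]
    # target[i] gives the position in `shuffled` of the i-th original sentence.
    target = [order.index(i) for i in range(len(sentences))]
    return shuffled, target

shuffled, target = make_unshuffling_example(
    "Transfer learning is powerful. It reuses pre-trained weights. Fine-tuning adapts them."
)
print(shuffled)  # sentences in a random order
print(target)    # indices that recover the original ordering
```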
Posted Content

ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic.

TL;DR: The authors introduced two powerful deep bidirectional transformer-based models, ARBERT and MARBERT, for multi-dialectal Arabic language understanding, achieving state-of-the-art results on the majority of tasks (37 of 48 classification tasks across 42 datasets).
Posted Content

Retrieval Augmentation Reduces Hallucination in Conversation

TL;DR: This paper explores the use of neural retrieval-in-the-loop architectures for knowledge-grounded dialogue, a task that is arguably more challenging because it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses.
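To make the retrieval-in-the-loop idea concrete, here is a toy sketch that pairs a TF-IDF retriever (scikit-learn) with a string-level conditioning step; the neural retriever and generator studied in the paper are not reproduced, and the [SEP]/[KNOWLEDGE] markers are illustrative assumptions.

```python
# Toy sketch of retrieval-in-the-loop, knowledge-grounded dialogue: retrieve a
# passage conditioned on the full multi-turn context, then build the input that
# a response generator (omitted) would be conditioned on.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge = [
    "T5 casts every NLP task as text-to-text.",
    "The Eiffel Tower is located in Paris and opened in 1889.",
    "Retrieval augmentation grounds responses in retrieved documents.",
]
dialogue_context = ["Hi! I love visiting landmarks.", "When did the Eiffel Tower open?"]

# Rank knowledge passages against the concatenated dialogue context.
vectorizer = TfidfVectorizer().fit(knowledge + dialogue_context)
scores = cosine_similarity(
    vectorizer.transform([" ".join(dialogue_context)]),
    vectorizer.transform(knowledge),
)[0]
best_passage = knowledge[scores.argmax()]

# The generator would be conditioned on both the context and the evidence.
generator_input = " [SEP] ".join(dialogue_context) + " [KNOWLEDGE] " + best_passage
print(generator_input)
```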
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.