Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
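The released checkpoints make the text-to-text recipe straightforward to try. Below is a minimal sketch, assuming the community T5 checkpoints distributed through the Hugging Face `transformers` library rather than the paper's original codebase: every task is expressed as a plain-text input with a task prefix, and the model emits plain text as output, so a single architecture and decoding loop covers translation, summarization, and classification alike.

```python
# Minimal sketch of T5's text-to-text framing (assumes the Hugging Face
# `transformers` library and the community "t5-small" checkpoint, which are
# not part of this page's content).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Each task is just text with a task prefix; the prefixes below follow the
# conventions described in the paper.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in natural language processing.",
    "cola sentence: The course is jumping well.",  # grammaticality (GLUE CoLA)
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    # The output is also plain text: a translation, a summary, or the
    # string label "acceptable"/"unacceptable" for the CoLA example.
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```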



Citations
Book Chapter

Memorization in Deep Neural Networks: Does the Loss Function Matter?

TL;DR: The authors empirically show that a symmetric loss function, as opposed to either cross-entropy or squared-error loss, significantly improves the network's ability to resist overfitting through memorization.
Proceedings Article

DialFact: A Benchmark for Fact-Checking in Dialogue

TL;DR: In this paper, the authors introduce the task of fact-checking in dialogue, a relatively unexplored area, and construct DialFact, a benchmark dataset of 22,245 annotated conversational claims, each paired with evidence from Wikipedia.
Journal Article

Deep neural networks architecture driven by problem-specific information

TL;DR: The authors propose using problem-specific information to impose constraints on the network architecture, transforming a fully connected DNN (fc-DNN) into a partially connected DNN (pc-DNN) so that the network topology is driven by prior knowledge.
Proceedings Article

A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems

TL;DR: This paper showed that knowledge inherent in cross-lingual language models can help generate responses in open-domain Korean dialogue systems, even when only English knowledge is given to the system, and developed a knowledge-grounded Korean dialogue model based on KE-T5.
Posted Content

CDLM: Cross-Document Language Modeling

TL;DR: This paper proposed a Cross-Document Language Model (CDLM) for multi-document language modeling, which incorporates two key ideas into the masked language modeling self-supervised objective: first, instead of considering documents in isolation, it pretrains over sets of multiple related documents; second, it improves over recent long-range transformers by introducing dynamic global attention for predicting masked tokens.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.